Debug a simple PK update SQL takes 500+ seconds in Oracle

We run into a problem recently in Oracle database at Solaris. A simple PK update SQL takes 500+ seconds to finish. It didn’t happen all the time, less than 1% chances. It causes app timeout when it happens.

We observed more wait time on “SQL*Net Message to client” in statspack data, no other obviously abnormal events beside this. We suspect it’s caused by network issue. SA told us the network usage is only 20%~30%. We did ping test from db host to client. It didn’t show any obviously problems.

Later, we did a test to switch 2 DB’s host in a db family. It makes the problem moved to the other host. At this point, we are sure it’s not DB problem. It must be host or network issue. Later we found the “TCP retransmission” rate is high at the problematic host. Finally, SA changed the network interface to fix this issue.

We can check the “TCP retransmission” using below scripts:

while true loop
do
 echo -n "`date` : "
 netstat -s -P tcp | grep tcpRetrans
 sleep 1
done;

You can reference this webpage for more about the TCP retrasmission issue.