We had a piece of client software which would open a persistent Oracle SQL*Net (Net8) connection to the server. However, due to the ways the users would work, often this connection would sit idle for very long times.
We began getting reports from our service desk that users at remote sites and users connecting over VPN would get a bizarre Oracle error and the application would crash. After much digging, we discovered the software would attempt to issue an Oracle command, and it seemed that the server would send a TCP RST in reply.
It turns out, however, that the server was not sending the RST–it was a PIX firewall sending this. The default behavior on Cisco PIX and ASA firewalls is to timeout an idle TCP connection after 1 hour. This essentially is a broken behavior, as the RFC 793 says nothing about the longevity of a TCP session or about keepalives. However, firewalls are a reality, so we need some way of getting around this.
Increasing the global parameter on the PIX or ASA firewall is an option, but on a very busy firewall, this could begin to use up resources, so I don’t recommend it. It might be an option to set this to eight hours on a fairly quiet firewall:
timeout conn 08:00:00
A more attractive option is to use TCP keepalives. Oracle by default does not send keepalives, and, in a Windows client environment, the OS only typically sends keepalives once every two hours. So, two changes are needed.
- In
tnsnames.ora add the atom (ENABLE=BROKEN) in the (DESCRIPTION) container. This will cause Oracle SQL*Net (or Net8) to ask the OS to send TCP keepalives.
- In the Windows Registry, add a
REG_DWORD named HKEY_LOCAL_MACHINE\System\CurrentControlSet \Services\Tcpip\Parameters\KeepAliveTime with a value of 1800000 decimal. This corresponds to 30 minutes.
I’m fairly certain a reboot is in order, as this is Windows.
When we applied this to all our mobile users, the problem went away completely.