One of our customers has intermittent database connection problems. The error logs periodically show messages like this:
A database error occurred. : Software caused connection abort: recv failed
at which point someone must restart the application, because the connection pool does not recover from the error.
This post describes a problem; one of the answers points to Wireshark.
Suspicions
I suspect the problem is either:
1. someone wrote a database script, running on the server, that closes sleeping connections
2. an error in their network
As nobody has owned up to #1, I need to hunt down #2.
My question
- Once I have captured the traffic, what should I look for in it that points to a root cause?
Example
For example, I assume based on past experience that the server logs will show everything "runs fine" until an error like this occurs:
2012-11-01 04:02:14 A database error occurred. : Software caused connection abort: recv failed
Then I'd like to look at the traffic in the seconds or minutes before 04:02:14 for any "smoking gun".
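To make the "seconds/minutes before" idea concrete, here is a rough Python sketch that turns a log line like the one above into a Wireshark display filter for that time window. The helper names (`wireshark_time`, `window_filter`) and the 5-minute default are my own placeholders. It assumes the log timestamp is the first 19 characters of the line and that the application server's clock agrees with the clock and time zone of the capture machine, which I have not verified.

```python
from datetime import datetime, timedelta

# English month abbreviations, spelled out so the result does not depend on locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def wireshark_time(moment):
    """Format a datetime the way frame.time comparisons are usually written."""
    return "%s %d, %d %s" % (
        MONTHS[moment.month - 1],
        moment.day,
        moment.year,
        moment.strftime("%H:%M:%S"),
    )


def window_filter(log_line, minutes_before=5, port=1433):
    """Build a display filter covering the minutes before the logged error.

    Assumes the line starts with a "YYYY-MM-DD HH:MM:SS" timestamp.
    """
    error_time = datetime.strptime(log_line[:19], "%Y-%m-%d %H:%M:%S")
    start = error_time - timedelta(minutes=minutes_before)
    return 'tcp.port == %d && frame.time >= "%s" && frame.time <= "%s"' % (
        port,
        wireshark_time(start),
        wireshark_time(error_time),
    )


line = ("2012-11-01 04:02:14 A database error occurred. : "
        "Software caused connection abort: recv failed")
print(window_filter(line))
```

Pasting the printed expression into the display filter bar should narrow the capture to port 1433 traffic in that window, provided the clocks really do line up.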
Pointers?
Additionally, I ask for pointers on these items:
- setting up the traffic capture. (Is capturing only the database server's port 1433, which carries the JDBC traffic, enough?)
- port-mirroring tips. I assume I can ask their "network guy" to set up port mirroring to a Windows box running Wireshark, and that he can do it easily. ("Port mirroring" copies all network traffic on a particular switch port to another port for later analysis.)
- any Wireshark "scalability issues": will the volume of data overload Wireshark?
- filtering through the data for "interesting events"
- "smoking guns": what to look for prior to that "connection recv error"
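To show what I have in mind for the capture and filtering items, here is a rough Python sketch. It builds a `dumpcap` command line that uses a capture filter on port 1433 and a ring buffer of files, which is my guess at how to keep the capture from growing without bound, and it lists a few display filters I would try first. The function name, file sizes, file count, and output file name are placeholders I made up, and the filters are only candidates to inspect, not a diagnosis.

```python
def dumpcap_command(interface, port=1433, file_kb=102400, file_count=50,
                    out_file="jdbc.pcap"):
    """Return a dumpcap argument list for a size-limited ring-buffer capture.

    -f          capture filter, so only traffic on the given TCP port is kept
    -b filesize start a new file after this many kB
    -b files    keep at most this many files, overwriting the oldest
    """
    return [
        "dumpcap",
        "-i", interface,
        "-f", "tcp port %d" % port,
        "-b", "filesize:%d" % file_kb,
        "-b", "files:%d" % file_count,
        "-w", out_file,
    ]


# Display filters to try on the captured files (assumed starting points).
CANDIDATE_FILTERS = {
    "connection resets": "tcp.flags.reset == 1",
    "retransmissions": "tcp.analysis.retransmission",
    "zero window": "tcp.analysis.zero_window",
}

# "eth0" is a hypothetical interface name; `dumpcap -D` lists the real ones.
command = dumpcap_command("eth0")
print(command)
for label, expression in sorted(CANDIDATE_FILTERS.items()):
    print("%s: %s" % (label, expression))
```

The argument list can be passed directly to `subprocess.call`, which avoids quoting problems with the space-containing capture filter.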
Environment
- MSSQL (2005 or later; it's on the customer site and I don't know the details)
- JBoss 4.2.2
- Microsoft JDBC driver
- JDK 1.6 (not sure of the exact version)
asked 01 Nov '12, 05:54
milspec