This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Troubleshooting Network Slowness by Comparing I/O Graphs

Hi Fellow Wireshark Enthusiasts

I'm having a hard time troubleshooting a real-time stock trading application. Apparently, this is the only Real-Time Application in the company I'm working at that's experiencing slowness. Here are the steps I've taken so far:

Ran traceroute from App to DB, Client to App, and DB to App; the traceroute times average 4 ms.

I have also captured traffic with Wireshark. At first I captured at only one device at a time, which didn't give me enough detail, so I captured at two devices simultaneously: the switch closest to the Trade Application Server and the switch closest to the Database Server. I compared their I/O graphs and found that the packet delay from App to DB is the same on both switches, but the latency from DB to App differs. The switch closest to the DB shows 10 ms or less latency, while the switch closest to the Trade server shows 100 - 254ms delay!
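A minimal sketch of that two-point comparison, for readers who want to quantify it per packet rather than by eye from the I/O graphs. It assumes both capture hosts have synchronized clocks and that each packet has already been exported as a (timestamp, key) pair; the key (here an IP ID and TCP sequence number) and the sample values are hypothetical, not taken from the captures described above.

```python
def transit_delays(capture_a, capture_b):
    """Match packets seen at two capture points; return B-minus-A delays in ms.

    Each capture is a list of (timestamp_seconds, packet_key) tuples.
    packet_key is a hypothetical identifier such as (ip_id, tcp_seq),
    assumed to be unique within the capture window.
    """
    first_seen_at_a = {}
    for ts, key in capture_a:
        # Keep only the first sighting so retransmissions do not overwrite it.
        first_seen_at_a.setdefault(key, ts)

    delays = []
    for ts, key in capture_b:
        if key in first_seen_at_a:
            # pop() so a retransmission seen at B is not matched a second time.
            delays.append((ts - first_seen_at_a.pop(key)) * 1000.0)
    return delays


# Illustrative data only: DB-to-App packets seen near the DB, then near the App.
near_db = [(10.000, (1, 100)), (10.020, (2, 200)), (10.040, (3, 300))]
near_app = [(10.002, (1, 100)), (10.150, (2, 200)), (10.043, (3, 300))]

delays = transit_delays(near_db, near_app)
print([round(d) for d in delays])
```

If the clocks on the two capture machines are not synchronized, the absolute values are meaningless, although changes in the delay over time can still be informative.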

One problem I'm having with Wireshark, though, is that all our switches (except those that connect to the clients' PCs) perform load balancing, so it's hard to monitor a specific transaction. Also, the communication from Client PC to App and from App to DB is encrypted, so all I can look at is tcp.time_delta.

Unfortunately, I don't know whether the high delay is caused by the application response time or by network transmission. Also, there are a lot of network devices between the App, the DB and the clients' PCs: around 30 devices, including two firewalls. I'm planning to capture at all the devices between App and DB, excluding the firewalls, which would take six laptops running at the same time.

Before I perform that though, I have a few questions:

1.) Am I doing the troubleshooting process correctly?

2.) Are there other things I could look at/check?

3.) Does the number of network hops increase the latency? If yes, does it vary between switches, routers and firewalls?

4.) The vendor of the trading application recommends implementing QoS for this trading application. Is that really necessary?

Thank you in advance for all your help and sorry for the long block of text. :)

asked 08 Aug '14, 03:07

Sharknado

"switch closest to the Trade server shows 100 - 254ms delay!" It's very unlikely that a switch would be buffering or queueing frames for this long. How are you determining this result?

It sounds like you are using a reasonable approach. It is probably important to determine the application response time near the server. On many occasions I have been asked to troubleshoot or diagnose "network performance issues", only to find things like DNS queries not resolving correctly and falling back to defaults (leading to seconds of delay), authentication issues (where client and server spend a lot of time negotiating their way around various options), applications running in debug mode and spending precious seconds on wasted processing, and finally poorly understood database queries that consistently chew up seconds as the many CPUs seem to pull in the whole database index while they search for a result. This seems to be more prevalent in custom or bespoke applications, where developers really don't have a good handle on what they are asking their server to do, and only when they deploy to production do they realise that what worked for a handful of users in dev doesn't scale to hundreds of real-world users.

I usually even try to take Wireshark captures on the server itself (running transactions to localhost), and I also run the server's CPU/disk performance monitoring tools while the transactions are being run.
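As a rough illustration of measuring application response time near the server, the sketch below takes the time from the last client data packet of a request to the first server data packet that follows it. It assumes the packets of a single TCP connection have already been exported in time order as (timestamp, source IP, TCP payload length) tuples, and that the TCP headers are still visible even though the payload is encrypted; the function name, addresses and sample values are hypothetical.

```python
def response_times(packets, server_ip):
    """Return per-request server response times in milliseconds.

    packets: list of (timestamp_seconds, src_ip, tcp_payload_len) tuples for
    ONE TCP connection, in time order, captured close to the server.
    """
    times = []
    last_request_ts = None
    for ts, src, payload_len in packets:
        if payload_len == 0:
            continue  # pure ACKs carry no request or response data
        if src != server_ip:
            # Client data: remember the most recent request packet.
            last_request_ts = ts
        elif last_request_ts is not None:
            # First server data packet after a request closes the measurement.
            times.append((ts - last_request_ts) * 1000.0)
            last_request_ts = None
    return times


# Illustrative data only: client 10.0.0.5 talking to server 10.0.0.9.
sample = [
    (0.0000, "10.0.0.5", 120),  # request
    (0.0001, "10.0.0.9", 0),    # ACK from server
    (0.1800, "10.0.0.9", 900),  # first response packet
    (0.1810, "10.0.0.9", 900),  # rest of the response
    (1.0000, "10.0.0.5", 80),   # next request
    (1.0040, "10.0.0.9", 64),   # response
]

times = response_times(sample, "10.0.0.9")
print([round(t) for t in times])
```

Because the capture point is next to the server, network transit contributes very little to these values, so a large number here points at the application or database rather than the network.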

(10 Aug '14, 18:30) martyvis

2 Answers:

this is the only Real-Time Application in the company I'm working at that's experiencing slowness.

Please define 'slowness' first.

1.) Am I doing the troubleshooting process correctly?

Basically, yes. In detail, it's hard to say, as you did not provide enough information about the whole application data flow.

2.) Are there other things I could look at/check?

Obviously the logs of all involved systems (app server, db server, firewalls, etc.)

3.) Does the number of network hops increase the latency?

Sure, every node needs some time to process the packet.

If yes, does it vary between switches, routers and firewalls?

Sure, as each type of device needs a different amount of time to process the packet. A firewall usually needs more time than a switch, as it has to do much more with the packet. It also depends on the load (CPU, line, etc.).
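To make the arithmetic concrete: one-way latency is roughly the sum of the per-hop delays, so listing them shows which device dominates. The figures below are made-up placeholders, not measurements of any real device, and the function names are hypothetical.

```python
def one_way_latency_ms(hops):
    """Sum the per-hop delays (in ms) along a path."""
    return sum(delay for _, delay in hops)


def slowest_hop(hops):
    """Return the (name, delay_ms) entry that contributes the most delay."""
    return max(hops, key=lambda hop: hop[1])


# Placeholder values for illustration only; measure your own devices.
path = [
    ("access switch", 0.02),
    ("core switch", 0.02),
    ("router", 0.10),
    ("firewall", 0.50),
]

total = one_way_latency_ms(path)
worst = slowest_hop(path)
print(f"total {total:.2f} ms, slowest hop: {worst[0]}")
```

Under load the per-hop values change, so a budget like this is only a starting point for deciding where to capture.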

4.) The vendor of the trading application recommends implementing QoS for this trading application. Is that really necessary?

Well, that's hard to tell with the amount of information you have provided so far. If the traffic crosses the Internet, you can forget QoS, as you have no influence over a packet once it leaves your network.

If you implement QoS you'll have to do it properly on all involved systems. You mentioned 30 devices between the client and the backend. So, good luck with that endeavor ;-)) But if this is really a 'real time' application (please define that as well), you'll probably need QoS to make the application work fast enough.

Regards
Kurt

answered 10 Aug '14, 07:27

Kurt Knochner ♦

Hi sharknado,

An alternative approach is to use a Wireshark plugin called TRANSUM. You can find a demo of using it for a similar problem at http://www.lovemytool.com/blog/2014/08/transum-analyzing-a-website-problem-by-paul-offord.html

TRANSUM is available at http://www.tribelabzero.com/resources

It's a top-down approach: you start by identifying which flows are a problem and then drill down to determine whether it is a service problem or a network problem.

Best regards...Paul

answered 11 Aug '14, 14:43

PaulOfford