This is our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

I am helping a volunteer community run a local library. We are working with the local authority and have to use their book loan/reservation system. Currently this is driving the volunteers crazy with unexplained and undiagnosed IT problems. The problem is not unique to our site - other volunteer libraries also experience it with the same frequency (up to 10 or more times per day). We are using a web based app and while part way through a transaction we see the app freeze - this is usually apparent when the bar code reader beeps as the bar code is read but no response message is displayed from the central system (this should happen without the need for any key being pressed). If a volunteer presses the enter key when they suspect a freeze they get a message saying "xxx.gov.uk is not responding". If they take any action the record in question is locked for the rest of the day - alternatively they can do nothing and wait for the above message to go away of its own accord - if they do this (and it takes around 2 minutes for this to happen) then no record is locked and they can continue. I have used Wireshark to trace the client side of the transactions and the page spanning the problem is shown here.

The volunteers say the problem started at 12:41:26 but this is definately later than when it started because they had to realise they had a problem and then get the time stamp, so the problem will have started several seconds, or possibly longer, before. The system recovered by itself by 14:43:27 - this time is probably 2 seconds or so after the "not responding" message had gone away, again because of the time it would have taken the volunteers to respond.

I'm really out of my depth in terms of understanding the exact meaning of these trace records hence my appeal for anyone with better knowledge to help. I can supply the detailed trace file if needed.

Many thanks for any help. Overview Records

asked 21 Jun '12, 08:19

Bernard46's gravatar image

Bernard46
1334
accept rate: 0%

edited 21 Jun '12, 08:24

"...and the page spanning the problem is shown here." - where exactly? :) Can you upload the trace to www.cloudshark.org so we can take a look? Only do that if the trace doesn't contain critical data, like plain text login details etc.

(21 Jun '12, 08:27) Jasper ♦♦

Sorry about that - I couldn't quite work out how to get the image uploaded - despite copying what someone else appeared to have done I still got it wrong. I have uploaded the trace to Cloudshark as you suggested and the name is https://www.cloudshark.org/captures/546436d1939d - I noticed the times are displayed as relative times (not absolute) - do you need me to give the record numbers for the times I mentioned? Thanks for your help.

(21 Jun '12, 08:41) Bernard46

I've upload a trace for a successful session from logon to logoff as https://www.cloudshark.org/captures/a0fa1c782945 This consisted of the following end user actions: 1) App start 2) Logon 3) Enter Loan section of app 4) Read borrower's card via bar code scanner and auto submit 5) Return to main menu 6) Select book return option 7) Read book bar code via scanner and auto submit 8) Logoff I agree with you that the app is probably not the best in the world - my objective is to try and provide the evidence for the Library Authority to go back to the vendor and get them to fix it so that it stops driving our volunteers mad. Right now we're in danger f losing volunteers because of the way this app is bahaving unfortunately. Anything you can suggest would be very welcome. Thanks for helping.

(21 Jun '12, 09:34) Bernard46

It would indeed really help if you can share the tracefile, please look at Jaspers suggestion for that.

Looking at your description, there are a few things that pop up in my mind:

  • There is a difference in Application behavior in the way the session ends. So a full trace of both cases (pressing enter or waiting two minutes) would be beneficial. Please also add a trace of a transaction without problem for reference on how things should be working.
  • You mention an Encrypted Alert. These usually are CloseNotify messages which are the SSL equivalent of a TCP FIN. Ie, it is indicating that the sender of the CloseNotify is ready to terminate the SSL session. So that in itself is not a problem, but it may indicate where a problem might be present

All in all, it feels like the application does not cope well with unexpected things in the network. Looking at the traces will tell more :-)

permanent link

answered 21 Jun '12, 08:42

SYN-bit's gravatar image

SYN-bit ♦♦
17.1k957245
accept rate: 20%

looking at your trace files and accessing the https server, I wonder what I get:

https://xxx.yyyyyyyy.gov.uk/ (aaa.bbb.ccc.ddd) UPDATE: info blanked, on request of OP
fred was here

Who is fred and why was he there???

Anyway, can you please connect to the site through Fiddler (http://www.fiddler2.com) and post the *.saz file somewhere? HINT: Please activate the SSL "proxy" in Fiddler (see options).

Regards
Kurt

permanent link

answered 21 Jun '12, 10:26

Kurt%20Knochner's gravatar image

Kurt Knochner ♦
24.8k1039237
accept rate: 15%

edited 21 Jun '12, 13:32

Hi Kurt I didn't want to mention the actual site name in the Q&A (hence the xxx in the original post)because I didn't want to draw too much attention to where the issue is but its not that important. However you will probably appreciate from my earlier comments about us being volunteers and that this is a local government website I'm a bit limited in what I can do. The PC we have to use for this app is actually owned by the local govt organisation and is locked down such that I cannot install anything on it or change anything. They have total control over it, and I mean total. I did manage to persuade them to install Wireshark but I'm not at all sure I would get away with Fiddler as well - in any case I have to get them to come out to the PC to do anything with it - they won't give me the password to install code. If you can make any inspired guesses about what might be going on I can try asking for permission to get further diagnostic info. One thing I have noticed is that the clock on this particular PC is about 7 minutes fast compared with real time - previously it was 60 minutes slow - I don't know (but I think its irrelevant here) whether this could have any bearing on the problem? regards, Bernard

(21 Jun '12, 11:07) Bernard46

Hi Kurt I didn't want to mention the actual site name in the Q&A

Sorry, I did not understand that. As you posted the whole capture file, I thought that information was public anyway. I "blanked" the site in my answer.

The PC we have ... is locked down such that I cannot install anything on it or change anything.

Well, not really ideal for troubleshooting ;-) Anyway, see my other answer/idea.

(21 Jun '12, 13:23) Kurt Knochner ♦

No problem Kurt, if it had been really important I would have said when I loaded the trace, but thanks for editing you response anyway.

I've passed on your comments to my contact, including the request for using Fiddler, so we'll have to wait and see what comes back later tomorrow - at least I hope they will respond tomorrow. Many thanks for your help, much appreciated. Bernard

(21 Jun '12, 15:16) Bernard46

If you can make any inspired guesses about what might be going on

How about this?

When we run this command:

tshark.exe -r Tue.cap -R "ip.src eq 195.225.188.193" -T fields -e frame.number -e tcp.stream -e frame.time_relative -e ip.src -e ip.id -E header=y

we can see alternating IP IDs for connections from the same IP. IP IDs are increased in steps of one: 0x1063, 0x1064, 0x1065, .... Then there is a new set of IP IDs: 0x1f26, 0x1f27, 0x1f28, then they jump back to: 0x1069, 0x106a, 0x106b, then this sequence: 0x5cde, 0x2f90, 0x2f91, 0x5cdf.

This looks like a Load balancer for me. There are different TCP connections to at least 3-4 backend servers. Probably they did NOT configure "sticky sessions" on the load balancer (or they don't work properly).

Again, I can only guess, but maybe your application needs the client to connect to the same server for a defined amount of time (usually the duration of an application session), which is probably not the case (according to your capture file). Please ask the server admins, if there is a load balancer in place and if "sticky sessions" are enabled for that virtual server.


frame.number    tcp.stream      frame.time_relative     ip.src  ip.dst  ip.id
141     1       52.312666000    195.225.188.193 192.168.1.251   0x0000
144     1       52.360153000    195.225.188.193 192.168.1.251   0x1063
145     1       52.364080000    195.225.188.193 192.168.1.251   0x1064
146     1       52.365726000    195.225.188.193 192.168.1.251   0x1065
148     1       52.407958000    195.225.188.193 192.168.1.251   0x1066
153     1       52.455524000    195.225.188.193 192.168.1.251   0x1067
158     1       52.706312000    195.225.188.193 192.168.1.251   0x1068
174     0       55.310021000    195.225.188.193 192.168.1.251   0x0000
177     0       55.352403000    195.225.188.193 192.168.1.251   0x1f26
178     0       55.352960000    195.225.188.193 192.168.1.251   0x1f27
181     0       55.403374000    195.225.188.193 192.168.1.251   0x1f28
182     0       55.415094000    195.225.188.193 192.168.1.251   0x1f29
183     0       55.415096000    195.225.188.193 192.168.1.251   0x1f2a
184     0       55.415097000    195.225.188.193 192.168.1.251   0x1f2b
187     0       55.459248000    195.225.188.193 192.168.1.251   0x1f2c
189     1       55.535121000    195.225.188.193 192.168.1.251   0x1069
190     1       55.535125000    195.225.188.193 192.168.1.251   0x106a
193     1       55.576531000    195.225.188.193 192.168.1.251   0x106b
283     2       59.679644000    195.225.188.193 192.168.1.251   0x0000
289     2       59.728471000    195.225.188.193 192.168.1.251   0x4d90
290     2       59.731326000    195.225.188.193 192.168.1.251   0x4d91
291     2       59.733016000    195.225.188.193 192.168.1.251   0x4d92
293     2       59.775033000    195.225.188.193 192.168.1.251   0x4d93
297     2       59.887827000    195.225.188.193 192.168.1.251   0x4d94
298     2       59.891748000    195.225.188.193 192.168.1.251   0x4d95
300     2       59.971949000    195.225.188.193 192.168.1.251   0x4d96
301     2       59.972342000    195.225.188.193 192.168.1.251   0x4d97
302     2       59.972343000    195.225.188.193 192.168.1.251   0x4d98
307     2       60.015764000    195.225.188.193 192.168.1.251   0x4d99
313     3       60.512457000    195.225.188.193 192.168.1.251   0x0000
315     4       60.512927000    195.225.188.193 192.168.1.251   0x0000
319     3       60.555590000    195.225.188.193 192.168.1.251   0x5cdd
320     3       60.556136000    195.225.188.193 192.168.1.251   0x5cde
322     4       60.559388000    195.225.188.193 192.168.1.251   0x2f90
323     4       60.560549000    195.225.188.193 192.168.1.251   0x2f91
327     3       60.612383000    195.225.188.193 192.168.1.251   0x5cdf
328     4       60.657790000    195.225.188.193 192.168.1.251   0x2f92
329     4       60.665440000    195.225.188.193 192.168.1.251   0x2f93
330     4       60.665446000    195.225.188.193 192.168.1.251   0x2f94
334     4       60.713424000    195.225.188.193 192.168.1.251   0x2f95
335     5       60.741567000    195.225.188.193 192.168.1.251   0x0000
338     5       60.793973000    195.225.188.193 192.168.1.251   0x0093
339     5       60.794927000    195.225.188.193 192.168.1.251   0x0094
342     5       60.847745000    195.225.188.193 192.168.1.251   0x0095
343     5       60.848353000    195.225.188.193 192.168.1.251   0x0096
344     5       60.848354000    195.225.188.193 192.168.1.251   0x0097
347     5       60.889826000    195.225.188.193 192.168.1.251   0x0098

Regards
Kurt

permanent link

answered 21 Jun '12, 13:18

Kurt%20Knochner's gravatar image

Kurt Knochner ♦
24.8k1039237
accept rate: 15%

edited 21 Jun '12, 13:58

Hi Kurt,

The response from the server admin is that there is no load balancer in place since there is only a single server involved, so it looks like we draw a blank on that one.

I had a longish discussion with the lady who is our interface point this morning and she told me the app was designed to run on a dedicated corporate network without any 'outside' internet connectivity. In this context she said her telco and server admin people think the issue is related to the relatively slow speed connection we have to their corporate network, and possibly other components like additional firewalls required to protect them from non-approved access attempts. We have an ADSL link giving us 8 mbits download and up to 1 mbit upload - there is no vast amount of data transfer involved in this app and in speed tests we get very near to the rated speeds and our line utilisation is very low.

I'm a bit reluctant to accept this explanation since I'm not sure the relative slowness of our link compared with a corporate LAN (which will incorporate remote links over many miles) should give rise to these sort of problems.

What do you think? regards, Bernard

(22 Jun '12, 04:13) Bernard46

Bernard,

O.K. if there is no load balancer in place, the "jumping" IP IDs, still indicate that something interferes with the traffic. Could be a firewall with Layer 7 security (Content Scanning), a WAF (Web App Firewall), or similar things.

However, there are other "issues" in the capture file as well, which require the use of Fiddler.

I found several TCP connections, where the client does not send any data for a few seconds. Take a look at this:

tcp.stream eq 27: Frame 451 - 453. Delta: 15.2 seconds  
tcp.stream eq 51: Frame 952 - 960. Delta: 15.4 seconds  
tcp.stream eq 52: Frame 1069 - 1070. Delta: 5.8 seconds  
tcp.stream eq 73: Frame 1473 - 1474. Delta: 8.2 seconds  

Without further information about the application (ActiveX or Java used), and without insight into the decrypted traffic, it's hard to say why the client waits that long before it sends data to the server. It's most certainly the HTTP GET/POST, as it's allway the first packet with 'Application Data' from the Client.

I really suggest to use Fiddler!

(28 Jun '12, 03:04) Kurt Knochner ♦

Kurt, Thanks for your feedback and the time you are spending on this - I'm sorry I can't reply in a more positive manner to some of your suggestions about follow-up actions but the issue is that I'm helping a volunteer organisation to run a library but the local government organisation they have to interface with provide us with a PC to access their system for managing the Library Stock. Unfortunately this system is giving us a load of problems but as you might expect we are finding it difficult to work with the organisation at a detail level because they quote government security regulations at us whenever I suggest anything. It is not as if we are dealing with state secrets (actually its library books!) but it is very difficult to get them to do specific things for us - I'm trying to identify patterns in the problem from the outside so that I can try to persuade them to look at particular areas for further investigation.

In your earlier analysis you mentioned increasing ip ids - I asked the question you posed about load balancers and as I mentioned they replied there was only one server and no load balancer. However I have to admit I did not really understand what you meant by increasing ip ids - as far as I could see there was only one ip address involved at the server end - were you referring to the port number? There could certainly be firewalls and other 'security' mechanisms in place because as I have mentioned we are a 'non-trusted' external organisation, so we are not allowed to access the system through the same physical and logical path as the organisation's own libraries. It could well be that these additional security hurdles are causing the problem, but what I need is some evidence that points to this so that I can get them to investigate - hence my efforts and questions here.

One thing that does bother me is that whenever we do get a problem we see a flurry of NTP records from the client - it seems to be asking 4 different time servers for the time and makes around 7 requests - each of these requests incurs a delay of 5 seconds or more. Now the clock on this client is 7 minutes ahead of UTC time - I'm wondering if this out-of-sync time or flurry of NTP requests are anything to do with our problem - incidentally each of these problem scenarios ends with an "encrypted alert" message. Does any of this help at all or just confuse the picture?

Bernard

(30 Jun '12, 15:12) Bernard46

because they quote government security regulations at us whenever I suggest anything.

That's a convenient way to say: It's not our problem ;-)

I have to admit I did not really understand what you meant by increasing ip ids

don't care about the IP IDs any longer. If there is no load balancer, the problem I suspected (sticky sessions) is not the reason.

Regarding the NTP traffic. Look at your original capture file with a filter on the client IP address (Tue). You will see NTP traffic, but there is no relation between the NTP requests (only few) and the SSL connections.

I don't think that the NTP requests are a problem, at least they are not a problem in the capture you posted.

Here is another idea: Can't you force the PC to use a proxy (Fiddler on another PC) by changing the proxy settings of IE?

If that does not work (because IE locked is down), you could add a local DNS server (LLMNR) that answers the clients WPAD requests. See WPAD on Wikipedia.

If that does not work, you could still add a transparent proxy between the client and the LAN and do SSL decryption there (SQUID). However, that's a lot of effort for a volunteer project.

Maybe it's easier to insist that the local government organisation provides a working solution.

BTW: Did you try to use that PC on another network?

(01 Jul '12, 03:08) Kurt Knochner ♦
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×14
×11

question asked: 21 Jun '12, 08:19

question was seen: 7,011 times

last updated: 01 Jul '12, 04:15

p​o​w​e​r​e​d by O​S​Q​A