This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

HELP! Looking at a specific (HTTPS) TCP stream (1936 packets) I start to get Malformed Packet: SSL after packet 782

All,

We're baffled by an issue we've run into. We're monitoring (using tshark) off an inline TAP sitting between a client browser (PC) and a web server. I can see the 3-way handshake and the SSL v3 handshake that follows. I can also see the GET / requests from client to server and the 'Continuation or non-HTTP traffic' packets from the server back to the client (carrying the HTTP response). All fine so far.

At packet 782 we start to see 'Continuation Data[Malformed Packet]' messages and I'm unable to decrypt the conversation thereafter... No more SSL dissector :(

It's also at this stage where I get heaps of duplicate ACKs from the client back to the server. The server then responds with a number of TCP retransmissions. This continues till the connection is closed (RST).

I don't see any dropped packets from the captured interface so we should have all the segments and hence be able to decrypt it, no? We really have our hair in a twist with this one :) I'm happy to share the pcap sample with anyone keen enough to help. I've searched the Internet and this forum and I didn't come up with anything tangible.

Please help!

Thx

Jaco Greyling

asked 05 Dec '12, 08:23

jgreyling

We've been seeing this behavior too. It started about a year ago and is getting more and more common. The only thing we have been able to do so far is rule out the sending side and the inline network probe as the culprit. The receiver side seems to be OK too. It is as if something on the wire is forcing TCP congestion handling to deal brutally with the traffic.

What has been observed so far is that the TCP window fills up pretty fast, but is not reduced until after some time. We see most ACKs (up to 100) for SEQ #1, and then it gradually goes down until the window size is adapted to the network speed. The fact that several TCP Window Full messages show up tends to point to dynamic rerouting into networks of different speeds during the conversation. Could that be the case? Measurements indicate this happens mainly on 3G-initiated connections. Manually reducing the speed (traffic shaper) did not bring any conclusive answer or hint.

However, even if this is the case, why would we all of a sudden have so many problems decrypting the traffic in a monitoring environment using a MiMa method with the private key etc.? We are using a probe that relies on the OSI application layer and requires the HTTP traffic to be decrypted correctly so that it can be analyzed for errors. (It's quality/SLA monitoring done here, not security.)

Any hints? What should we check for? We'd really just like to understand why it is happening. Thx.

(06 Dec '12, 01:01) Smurphy

2 Answers:

What kind of TAP is your "inline" TAP? If it is a link aggregation TAP (meaning that it outputs RX/TX on one single port to the tshark PC) you may be hitting one of the frequent problems with link aggregation TAPs. I have plenty of cases where those TAPs got me or someone else in trouble, for example when a TAP did not deliver packets that were really on the link (which had a 10% load, so no buffering should occur), resulting in missing packets in the trace file. Other cases include CRC errors induced on a connection between a Cisco switch and a router, which were gone as soon as we removed the TAP. Both cases happened with major vendor devices, so these were not cheap knockoffs but devices worth a four-digit number of dollars each.

Personally, I only use full duplex TAPs to avoid that kind of trouble, since it is annoying to be unsure if the TAP worked correctly or not, and in case of link aggregation TAPs I had too many bad experiences.

answered 05 Dec '12, 08:34

Jasper ♦♦

Thank you for the prompt response!

The TAP is a VSS Monitoring TAP Model: V12.4C.C-F-V3 running software version 2.5.65

The thing is, in an attempt to rule out the TAP, I ran a Wireshark capture from my laptop to compare it with the pcap from the monitoring device. I used a mobile 3G connection and I got over 200 malformed SSL packets just with the secure website login?! So the fact that I could reproduce it on the 'live' wire as well as at the TAP tells me it's not the TAP (am I wrong in my conclusion?). I did the same exercise over a fixed line and I got 0 malformed SSL packets. How do you explain that?

(05 Dec '12, 21:32) jgreyling

When I looked at it, I didn't pick up any missing packets...(I compared the sequence numbers with the acknowledgements) so surely the SSL libraries should be able to decrypt the conversation (irrespective of the 'quality' of the network - retransmits, dup ACKs, etc.), right? Under what conditions won't Wireshark be able to decrypt a specific packet?

Again, I have plenty of samples if someone would like to take a look, we're really stumped :(

(05 Dec '12, 21:35) jgreyling
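
The comparison of sequence numbers against acknowledgements mentioned in the comment above can be sketched roughly as follows. This is only an illustration, assuming the (sequence number, payload length) pairs for one direction have already been exported from the capture; the function name `find_gaps` and its parameters are hypothetical and not part of Wireshark or tshark. Note that a gap-free byte stream is necessary for decryption but does not guarantee that the dissector will cope with it.

```python
from typing import List, Tuple

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def find_gaps(segments: List[Tuple[int, int]], isn: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges that never appear in one direction.

    segments: (sequence_number, payload_length) pairs for one direction.
    isn: initial sequence number of that direction (taken from the SYN).
    Offsets are relative to the first data byte (ISN + 1).
    Assumption: the stream carries less than 4 GiB, so offsets do not wrap.
    """
    ranges = []
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no payload
        start = (seq - isn - 1) % SEQ_MOD
        ranges.append((start, start + length))
    ranges.sort()

    gaps = []
    covered = 0  # next byte offset we expect to see
    for start, end in ranges:
        if start > covered:
            gaps.append((covered, start))
        # Retransmissions and overlaps simply do not advance 'covered'.
        covered = max(covered, end)
    return gaps
```

An empty result means every payload byte up to the highest one seen is present in the trace, retransmissions included.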

Do I understand this correctly:

1. You captured on the client where you opened the SSL session over 3G and had errors
2. You captured on the client using a fixed line and had no errors
3. Capturing at the TAP on a fixed line you had errors

Can you post the samples on www.cloudshark.org and give us the links with an explanation what the capture setup was for each sample?

(06 Dec '12, 01:50) Jasper ♦♦

Sure thing, here is the upload:

http://www.cloudshark.org/captures/175e88a3891b

Please note that I was unable to upload the SSL session key as well. Let me know if you need it and I can email the session key to you directly.

41.160.153.130 is the client and 196.11.125.154 is the web server. In this example I captured it off the inline TAP.

http://www.cloudshark.org/captures/261e0b82e70d

In this example I captured off my PC using Wireshark. Here I am the client (192.168.1.101) connecting to the same server. Here I'm not capturing off the TAP but on the live wire so to speak.

Thx for your time!

(06 Dec '12, 03:43) jgreyling

Sure, try sending the key to jasper [at] packet-foo.com, but only if the key can be shared and isn't secret (meaning: it should not be a production key).

(06 Dec '12, 04:31) Jasper ♦♦

Am I correct in saying that if I share the session key (exported from Wireshark) you will ONLY be able to decrypt that one session?

(06 Dec '12, 06:20) jgreyling

For the session key that is correct AFAIK. Please do not share the private server key, because with that any session can be decrypted.

(06 Dec '12, 07:31) Jasper ♦♦

Okay, looks to me like there is a problem with the SSL decoding module, since as you already mentioned it starts running into exceptions in packet 782 for no apparent reason - all packets up to that number are seen in the trace with nothing missing. I guess it would make sense to open a bug report in the bug tracker for this. Maybe a developer can take a look and determine the cause for the exception of the decoder.

(06 Dec '12, 14:28) Jasper ♦♦

On a side note - the client is pretty slow when acknowledging packets - which leads to window full messages. At the beginning the ACK times are alright, but they degrade continuously even though the client window size climbs to 64k.

As a result, the server becomes impatient and retransmits frames that are not really lost (just acked too late). The client clearly indicates that the retransmissions it receives are duplicates by using SACK options with edges below the acknowledge number, so it really got the original AND the retransmission. I haven't checked all frames, but it seems consistent.

(06 Dec '12, 14:42) Jasper ♦♦
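
The observation in the comment above (SACK edges below the acknowledgement number, meaning the client received both the original and the retransmission) corresponds to the duplicate-SACK case described in RFC 2883. A minimal sketch of that check, assuming the ACK number and SACK blocks have already been extracted from a packet; the helper names `seq_lt` and `reports_duplicate` are made up for illustration, and only the "first block covered by the cumulative ACK" case is handled:

```python
from typing import List, Tuple


def seq_lt(a: int, b: int) -> bool:
    """True if sequence number a comes before b, allowing 32-bit wraparound."""
    return a != b and ((b - a) % 2 ** 32) < 2 ** 31


def reports_duplicate(ack: int, sack_blocks: List[Tuple[int, int]]) -> bool:
    """True if the first SACK block lies entirely at or below the cumulative ACK.

    sack_blocks: (left_edge, right_edge) pairs in the order they appear
    in the TCP option. Such a block describes data that was already
    acknowledged, i.e. the receiver saw the same bytes a second time.
    """
    if not sack_blocks:
        return False
    left, right = sack_blocks[0]
    # left < ack and right <= ack, both in wraparound-safe arithmetic
    return seq_lt(left, ack) and not seq_lt(ack, right)
```

For example, an ACK of 5000 carrying a first SACK block of (3000, 4460) would be flagged, while a block of (6000, 7460) is an ordinary SACK for data above a hole.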

Yes - that's what we experience too. For no apparent reason, decryption stops. On the other hand, the clients are not that slow: a standard laptop (2.4GHz CPU, 8GB RAM, 7200RPM HD), which usually has no problem over an ADSL link, where the issue doesn't show up. IMHO something in the path from the server to the client is slowing things down that much. Can that be? And can the interaction in the path (slowdown, congestion handling, etc.) cause such a sudden decryption problem? Thx

(07 Dec '12, 00:43) Smurphy

All-

Thanks to Jasper and Evan this issue is now resolved. It turned out to be a software bug in the SSL dissector after all. Evan committed the change to the trunk (revision 46510) and we've tested it this morning.

Log Message: Use the complete fragment length to reassembly SSL frames. The old method of picking them up one at a time failed on jumbo frames.

The full report can be found here: https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=8075
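
To illustrate the idea in the log message, here is a rough sketch of splitting a reassembled TCP byte stream into SSL/TLS records by using the length field of the 5-byte record header (content type, version, length) and waiting until the complete record is available. This is not the dissector's code (Wireshark is written in C, and the actual change is in the bug report above); the function name `split_records` is hypothetical.

```python
import struct
from typing import List, Tuple

HEADER_LEN = 5  # content type (1 byte) + version (2 bytes) + length (2 bytes)


def split_records(stream: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Split a TCP byte stream into complete SSL/TLS records.

    Returns (records, leftover): records is a list of
    (content_type, payload) pairs, leftover is the trailing partial
    record that still needs bytes from later TCP segments.
    """
    records = []
    pos = 0
    while len(stream) - pos >= HEADER_LEN:
        content_type, _version, length = struct.unpack_from("!BHH", stream, pos)
        end = pos + HEADER_LEN + length
        if end > len(stream):
            break  # the record continues in a later TCP segment
        records.append((content_type, stream[pos + HEADER_LEN:end]))
        pos = end
    return records, stream[pos:]
```

A caller would prepend `leftover` to the payload of the next segment and call the function again, so a record spanning several segments is only handed on once all of its bytes have arrived.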

Thanks

Jaco

answered 12 Dec '12, 06:21

jgreyling