I have started seeing a strange issue on a new platform I am using. It is a Microsoft hyper-v environment with virtual guests running XP. I installed the latest wireshark/winpcap and started testing some applications but noticed that the timestamps do two very strange things.
In both of these instances, I am running it in a pretty focused way (no more than about 4000 packets/hour) and often Wireshark will crash with the "Microsoft visual c++ runtime error" message. asked 24 Sep '13, 12:28 ipchains1 edited 24 Sep '13, 12:34 |
One Answer:
Strange, indeed. I have seen things like that when capturing on multiple NICs at the same time - do you capture on a single card, or are there multiple cards involved? Out of sync by 3.14 seconds is A LOT - are you sure that it's not milliseconds or microseconds (just making sure)? I suspect that the system time of your VM gets adjusted by the host, and that causes the jump in time. Keeping time in VMs is a big problem because the OS doesn't get hardware interupts in regular intervals to advance the time, because the hardware interupts are received by the virtualization host. So the time of the VMs is often advanced in unregular intervals, and sometimes the virtualization host has to "push" the time ahead. This can lead to time jumps into the future. If the VMs use other time setting mechanisms like NTP or the time from the Domain Controller, it may get adjusted backwards. All in all, this can lead to very strange timestamps when you capture inside a VM. If your timestamps are off, but the frames are in the correct order when sorting by time, you can use reordercap (installed together with Wireshark) to create a correctly ordered trace file from your existing file. Oh, and the "packets per hour" do not count with regard to Wireshark crashes, it's the total sum of packets that does it. See my blog post at http://blog.packet-foo.com/2013/05/the-notorious-wireshark-out-of-memory-problem/ about the reasons for this behavior. answered 24 Sep '13, 13:38 Jasper ♦♦ edited 24 Sep '13, 22:27 showing 5 of 34 show 29 more comments |
Thanks for the feedback. The hosts are doing NTP time sync but I suspect that doesn't stop the VM host from messing with their clock; but why would it be doing it on a strange 20 minute pattern? The problem with the timestamps is that they are demonstrably false, request/reply sequences are shown in order (I am mostly looking at snmp traffic) but the time stamp insists the reply arrived before the request. As to the 3.14 figure, I got that by switching to "time since last captured packet" and many of them show up as -3.14... and while it seemed odd that they were right around pi, I doubt that there are relevant trig functions in wireshark (or are there?)
I do mostly capture with two interfaces selected but I have tried one session (I will do more after your suggestion) with only one interface and it still crashed. The captures are always "goofing" in less than 20 minutes, and always crash after less than two hours (less than 10,000 packets total, barely over 1MB of data).
Can you check the memory use of the Wireshark process during the run too determine if it is running out of memory?
I have seen broken NTP servers that distributed a false time. You could monitor the NTP traffic with Wireshark to figure out if there are time leap signs in the NTP protocol.
Regarding the two interfaces - I asked that in correlation to the negative time stamps, not the crashing. Crashing is a out-of-memory situation caused by the amount of packets and their meta data. Can you confirm that there are no negative time stamps if you capture on just one interface?
The NTP is the windows default which I think only updates every 7 days or something like that. The captures don't show any NTP traffic, the time jumps do seem to be common during a group of STP packets but those are constant anyway so it is probably not related. The rollback is always 1200 sec and is never on a wall clock interval (i.e. never on the hour or half hour or even to the minute). And yes, the invalid timestamps/jumps have so far only happened during captures across two interfaces.
As for memory yes that does seem to be causing the crashes. The memory usage is growing by about 2 megabytes per packet received, so after 30 seconds of capture I have about 30 packets (a quiet day on the network) and the memory has skyrocketed, from about 100mb at startup to almost 200mb.
okay, then the time jumps are explained. They happen because frame time stamping is done by the NIC, and then delivered to the capturing program. When two NICs compete to deliver frames it sometimes happens that one NIC delivers sooner than the other, basically leading to older frames appearing after newer frames in the trace. Since the time stamp was already assigned you'll end up with negative delta times. Use reordercap to reorder your captures after recording them, so that the TCP expert doesn't get crazy :-)
I was capturing on two interfaces but all the packets for that particular exchange should have only been on one interface since it was specific to one subnet (the two interfaces are used in a multihome setup and not a redundancy setup) so unless there is something untoward going on with the way hyper-v manages the network cards, I would think the offset cant be explained that way. Also, the -1200s offset happened today on a capture set up for just one interface (I am using dumpcapgui now to avoid crashes and only specify one interface in a session).
what is dumpcapgui?
BTW: Where do you capture the traffic? In the virtual machine (XP) or on the hyper-v host?
@Kurt: DumpcapGUI - http://www.netskillz.com/dumpcapui/ I also added it with a screen shot at the end of my post at http://blog.packet-foo.com/2013/05/the-notorious-wireshark-out-of-memory-problem/
Thanks. http://bit.ly/184xOj2 did not show that much when I was searching. It's showing up now, maybe due to this question :-))
Dumpcapgui is great since I have had so much crash trouble lately. I can start dumpcapgui working, and open the cap file in wireshark and refresh it as needed. Sadly wireshark still likes to crash when browsing files > 20k-30k packets but I can restart it easily, and the capture marches on without losing anything.
To Kurt's question re: the capture location, I am capturing within the XP guest and not on the Hyper-v host so I can see what the OS experiences, since Hyper-v has none of the really cool promiscuity features like VMware to make off-guest capturing practical.
O.K. to figure out if you experience time jumps, you could write a short script that records the current date/time into a file. As soon as you detect those time stamp jumps, you could check the file to see if it was the OS time that jumped. If the OS time is 'stable', it must be another strange effect. Is it possible for you to do that and report back, as I'm always interested in these little nasty hard to kill bugs :-))
Sure, I will run a script for 1 hour that takes the windows timestamp from a vbscript timer function call (the best way I can think of on short notice) and also run a wireshark cap for the hour and see how the two compare. If you have a more specific test I would be happy to try it.
-update: I just ran the test, it only took a few minutes before I saw the first time jump. The data the script gathered is pitch perfect, the vbscript timer stamped out values that matched the wall clock during the same time that the wireshark stamp jumped back 1200 seconds.
O.K. Actually you just need the time with a resolution of a second, as the time leaps are much bigger than a second. However, you should record the time every second, if possible with vbscript, every 1/10 of a seconds, just to be sure not to miss short time leaps.
let's wait for the results and then think about more/other tests ;-)
Argh, I cant post imgur links and as punishment it throws away my comment? Is there a good way to share images? I took a screenshot of the shark cap and the log file formatted in excel, that shows the weird loop in 'shark and a normal march of time in the vbscript log.
How about this, if you want to see it make your way to imgur and use the image code CPUYCp5 (just put it after the normal url of that site).
well, anyone of these.
or any other ;-)
BTW: I 'upped' your question, which gave you some karma. Can you please try to upload an image now?
If it still does not work, use one of the above.
Nope, it still throws it away.
O.K. can you upload on one of the sites I mentioned and post the link?
I think any url I use is flagged as spam. Tried tinypic and imgur and both were rejected.
can you send me an e-mail? click on my name for the address.
May I post the images here?
@Kurt, sure, you can post the images there is no sensitive info in those packets.
Here we go. As we can see, there is a time leap in the capture file, but not in the recorded time (see graph). Can you please filter the capture file you sent me (filter: tcp.stream eq 0) and post the exported frames on cloudshark.org ? Maybe somebody else has an idea. I can't explains what's going wrong here.
One last idea: Did you check the properties of the virtual interface in the virtual machine. Does the driver support TCP/IP Offloading or anything that sounds like time stamping? If so, please disable all of those features and try again (maybe after a reboot).
Was there a reason to focus on tcp stream 0? I took a capture and filtered for STP (since it is a constant heartbeat every 2 seconds) which demonstrates the issue nicely. I also posted it to cloudshark but as usual I cant post any URL since it is rejected as spam. So, if you want to see it you can use this tag at the URL of cloudshark: http://www.cloudshark.org/captures/e8ecdd72fcf2
Interestingly, cloudshark doesn't recognize the negative timestamps, instead it starts again at 0 and it also ignores the last 200 packets or so (the capture has 866, the cloudshark window only shows 664) so apparently they don't decode the same way wireshark does.
Just to limit the amount of data.
I added the link for you.
BTW: Did you check the TCP/IP Offloading (see my last comment)?
I checked the interface info in Wireshark and at the OS level, and there appear to be no offloading options. It is using the "Microsoft virtual machine bus network adapter".
well, then it's either a bug in WinPcap or a bug in Hyper-V. Can you please capture with Microsoft Network Monitor and check if you see the same effect.
If no, I'd say there could be a WinPcap bug.
If yes, it's more likely to be a 'bug' of Hyper-V.
After running the two side by side the Microsoft NM 3.4 capture doesn't show any strange rollback when Wireshark does. It does have an odd stamp that runs 44 seconds faster than the clock on the virtual OS (so packets appear timestamped 44 seconds in the future). The "offset" column does not appear to be accurate since the first packet (seen <2 sec after start) is stamped with an offset of 44s-45s. The clocks in the virtual guest and virtual host are synced to the same NTP server (And I forced them both at the start of this test) so it is not explainable by clock drift. After observing through a rollback in Wireshark (where wireshark started stamping 20m in the past) the MS NM window still showed the same +44 offset.
O.K. sounds like a big problem with WinPcap and a minor problem with Network Monitor ;-) Maybe you just replace your hypervisor with a product that does not cause any trouble :-))
If it were my hypervisor I would do that, but alas. I tried one last effort at gathering clues, I ran the capture from two different guests at the exact same time (as near as I could get, < 2 sec start difference) and they both looped -1200s at close to the same time, 13 seconds apart, which is outside the window that would be explained by the startup delay. So, whatever is doing this is slightly nondeterminstic and not instigated by the hypervisor.
Prior to this test I also synced all clocks via NTP, and then disabled NTP on the host and guests just to avoid re-sync issues.
What is your host OS version?
Server 2008 R2 STD with SP1. One other clue I gathered (in case anyone else comes along to pick up the torch) after looking at way too many capture files is that the loop is actually -1199.85 seconds (+/- 10ms) which I thought was odd, it never centered on 1200 +/-.
maybe a pcap modifying hoax malware ;-))