I've been trying to capture a trace on a network where SNMP messages are exchanged between a server and a PC. Some of the SNMP messages sent by the server never reach the PC, while others do, and on the ones that go missing Wireshark shows a bad checksum message. I also have a screenshot of this capture. Can anybody advise on this?
This question is marked "community wiki".
asked 14 Jan '11, 03:01
Okay, first off, it is important to know where you are capturing the data. If you're capturing on either the server or the PC, your checksum errors are most likely caused by the checksum offloading mechanism of your network card and are not real errors. There are plenty of questions on this board where you can look that up in greater detail.
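To make the offloading point a bit more concrete: the UDP checksum covers a pseudo-header (source IP, destination IP, protocol number, UDP length) plus the UDP header and payload. With offloading enabled, the network card fills in the checksum field after the capturing host has already seen the outgoing packet, so the value in the capture does not match. Below is a minimal Python sketch of the IPv4 UDP checksum calculation as described in RFC 768. The function names are my own, chosen for illustration, and it assumes you have the raw UDP segment bytes at hand.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum over data, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: src, dst, zero byte, protocol 17 (UDP), UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Checksum to transmit; the checksum field in udp_segment must be zero."""
    data = pseudo_header(src_ip, dst_ip, len(udp_segment)) + udp_segment
    csum = ~ones_complement_sum16(data) & 0xFFFF
    return csum or 0xFFFF  # a computed 0 is sent as 0xFFFF


def udp_checksum_ok(src_ip: str, dst_ip: str, udp_segment: bytes) -> bool:
    """True if the stored checksum is consistent with the segment contents."""
    (stored,) = struct.unpack("!H", udp_segment[6:8])
    if stored == 0:
        return True  # over IPv4, zero means the sender did not compute one
    data = pseudo_header(src_ip, dst_ip, len(udp_segment)) + udp_segment
    return ones_complement_sum16(data) == 0xFFFF
```

If the same packet fails this check in a capture taken on the sender but passes in a capture taken on the receiver, offloading is the likely explanation. If it fails on the wire or at the receiver, something really did damage it.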
If, on the other hand, the frames with checksum errors are exactly those SNMP messages that do not reach the PC, you have a problem where some device damages frames somewhere in your network. That could be caused by a faulty network interface, a bad cable, strong interference, or (sadly enough, though it is very rare) a network aggregation tap. So if I were you, I'd determine whether the checksum errors correlate with the missing SNMP messages and, if so, find the source of the damaged transmission (check the CRC error counters on your switches/routers).
If you have checksum errors on packets that DO get received, you can probably rule checksum errors out as a cause. In that case you'll have to move your Wireshark box(es) along the communication path and track down the location where the frames go in but do not come out on the other side. The best way to do this is to capture simultaneously on both sides so you have something to compare. It can be a very tedious process, but usually there's no way around it.
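Comparing two simultaneous captures by hand gets old quickly, so here is a rough Python sketch of the comparison step. It assumes you have already exported one key per packet from each capture, for example the source IP, destination IP and IP identification value (in tshark something like `-T fields -e ip.src -e ip.dst -e ip.id`). The function name and the sample values are made up for illustration.

```python
from collections import Counter


def missing_downstream(upstream, downstream):
    """Return the packet keys seen at the upstream capture point that have
    no matching packet at the downstream capture point, in upstream order.

    Keys can be any hashable value, e.g. (src_ip, dst_ip, ip_id) tuples.
    Duplicate keys are matched one-to-one, so a retransmission that only
    shows up once downstream is still reported once.
    """
    remaining = Counter(downstream)
    lost = []
    for key in upstream:
        if remaining[key] > 0:
            remaining[key] -= 1  # matched: consume one downstream packet
        else:
            lost.append(key)     # went in, never came out
    return lost


if __name__ == "__main__":
    # Hypothetical sample data: three packets in, two out.
    before_switch = [
        ("10.0.0.1", "10.0.0.20", 0x1A01),
        ("10.0.0.1", "10.0.0.20", 0x1A02),
        ("10.0.0.1", "10.0.0.20", 0x1A03),
    ]
    after_switch = [before_switch[0], before_switch[2]]
    for key in missing_downstream(before_switch, after_switch):
        print("lost between capture points:", key)
```

Keep in mind that IP IDs can repeat over a long capture, so it helps to limit the comparison to a short time window or add another field to the key.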
answered 14 Jan '11, 04:07
I can verify that this does not seem to be just a case of false alarms. I have checked both the server and the client PC, and in every case I looked at, when the checksum error occurred no message reached the client PC. If it were the first case, where the checksum error is caused by the offloading mechanism, wouldn't I get the message on the other side? I should also add that this server transmits to several client PCs and not all of them experience this malfunction, and even among the ones that do not get their messages correctly, not all of them show this behavior all the time. Any ideas?
answered 17 Jan '11, 03:08
Yes, you are correct: if offloading caused the checksum errors, you'd still see the same packets, but with valid checksums, on the recipient side. So I guess we can rule that out.
Okay, before we go into the quite annoying option of tracking down a packet "destroyer" on the network: have you checked whether the clients that do not receive the packets have local firewall software running that blocks them? I would probably verify that the frame never gets to the client at all by tapping into its cable directly and capturing with a passive Wireshark PC just to make sure, but first I'd try turning off any personal firewall on the client and see if the packets get through.
The other possibility is that a device on the path from server to client is sometimes damaging packets. If you say it doesn't happen to all clients, you can try to identify what the affected clients have in common (same physical path?). Check the error counters on all switches and routers over a period of time and investigate whether they go up, and maybe whether that even corresponds to the lost messages. If you can't do that it gets tricky, because then you have to track down the location where the packets get corrupted yourself. I usually do that by capturing on both sides of each device in the path from sender to receiver until I find the device where the packet goes in okay and comes out damaged (or is lost, so does not come out at all). This is usually very time-consuming work and no fun at all.
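For the counter check, the idea is simply to sample a port's error counter at intervals and see whether the intervals in which it increased line up with the times messages went missing. A small Python sketch of that correlation follows. It assumes you have collected (timestamp, counter value) samples by whatever means your devices offer; the function names and numbers are hypothetical, and counter resets or wraparound are ignored.

```python
def intervals_with_errors(samples):
    """samples: list of (timestamp, error_counter), sorted by timestamp.
    Return (start, end, delta) for every interval where the counter rose."""
    windows = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0
        if delta > 0:
            windows.append((t0, t1, delta))
    return windows


def losses_in_error_windows(samples, loss_times):
    """Return the loss timestamps that fall inside an interval in which
    the error counter increased."""
    windows = intervals_with_errors(samples)
    return [t for t in loss_times
            if any(start < t <= end for start, end, _ in windows)]


if __name__ == "__main__":
    # Hypothetical data: counter polled every 60 s, three lost messages.
    samples = [(0, 100), (60, 100), (120, 103), (180, 103)]
    loss_times = [30, 90, 150]
    print(intervals_with_errors(samples))                # [(60, 120, 3)]
    print(losses_in_error_windows(samples, loss_times))  # [90]
```

If most losses land in windows where a particular port's counter went up, that port (or its cable) is a good place to start capturing on both sides.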
answered 17 Jan '11, 05:24
Jasper, thank you very much for your answer!
I can verify that all the recipients run the same software. And we are not talking about 1-2 PCs but more like 200, of which about 20 overall show this faulty behavior. There is also no firewall set up on those machines. So from what we've discussed, it seems like a network issue rather than anything else.
answered 18 Jan '11, 00:05
Agreed, this looks like a network issue. I think either some device is damaging packets or you might have a congested link somewhere that results in your SNMP packets being dropped for performance reasons. They are usually the first to get axed because they're UDP and low priority anyway. Maybe you even have strict QoS settings that contribute to it. Good luck tracking it down, and let me/us know if you have further questions.
answered 18 Jan '11, 02:50
SNMP runs over UDP, so delivery is best effort.
Try a continuous ping from client to server and see if any ICMP packets are dropped.
Is this a network monitor of some type (PRTG, MRTG, Orion, etc.)?
answered 18 Jan '11, 10:01