This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

NLB only works with Wireshark capturing


How is this for weird: we have been having some issues with a Microsoft NLB cluster, so I put Wireshark on one of the nodes, then stopped and powered off the second node. The cluster stopped working after a short while (part of the issue we have been trying to resolve), meaning that hitting the vIP for the cluster did not return the website hosted on the server. I ran Wireshark on my laptop and saw the request followed by TCP retransmissions, with no replies at all. I then ran Wireshark on the server, and immediately the site worked. When I stop capturing (with Wireshark still running), it stops working. When I start the capture again, it starts working. I can reproduce this at will. Does anyone have any idea what is causing this?

The server is Windows Server 2012 R2 running as a VMware VM (ESXi 5.5), with NLB set up in IGMP multicast mode and IIS 8.5 with ARR 3. Wireshark is 1.12.2 PortableApp. WinPcap is 4.1.0.2980.

Other information that might be useful: a static ARP entry is configured on a WatchGuard firewall. There are also a static ARP entry and CAM table entries on the Cisco 2960 stack connected to all the hosts. Routing for this VLAN is done on the WatchGuard.

asked 03 Feb '15, 03:55


cakelord


One Answer:


When you capture traffic with Wireshark, the NIC is put into promiscuous mode by default. This means the NIC forwards all frames it receives to the OS. In normal (non-promiscuous) mode, the NIC only forwards:

  • Unicast frames addressed to the MAC address of the system
  • Broadcast frames
  • Multicast frames, but only for the multicast groups to which the system was subscribed

So, if starting a Wireshark trace makes the setup work, it is most probably because some frames were otherwise rejected by the NIC, most likely the multicast frames.
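The forwarding rule described above can be sketched in a few lines of Python. This is only an illustrative model of the filtering logic, not how any particular NIC or driver implements it; the function names and the example MAC addresses in the usage note are made up for the illustration.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_group_address(mac: str) -> bool:
    """True if the least-significant bit of the first octet is set
    (this covers multicast and broadcast destination addresses)."""
    return int(mac.split(":")[0], 16) & 1 == 1


def nic_accepts(dst_mac, own_mac, subscribed_groups, promiscuous=False):
    """Simplified model of which frames a NIC hands to the OS.

    dst_mac, own_mac: colon-separated MAC strings.
    subscribed_groups: multicast MACs the host has joined (hypothetical input).
    """
    dst = dst_mac.lower()
    if promiscuous:
        return True            # promiscuous mode: everything is passed up
    if dst == own_mac.lower():
        return True            # unicast to this system
    if dst == BROADCAST:
        return True            # broadcast
    if is_group_address(dst):
        # multicast: only groups the system is subscribed to
        return dst in {g.lower() for g in subscribed_groups}
    return False               # unicast for some other station
```

With a made-up cluster MAC of `01:00:5e:7f:01:0a` and an empty subscription set, `nic_accepts("01:00:5e:7f:01:0a", "00:50:56:aa:bb:cc", set())` is false, while the same call with `promiscuous=True` is true. That matches the behaviour seen in the question, where the site only answers while a capture is running.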

You can do a little test by making a trace with Wireshark while you disable the option to put the NIC in promiscuous mode (go to the capture settings, double-click on the interface on which you want to capture and disable "promiscuous mode"). My bet is that now the server won't respond, just like when no capture was made.

You can look at the differences between the capture with promiscuous mode on and the capture with promiscuous mode off to learn which frames were dropped by the NIC. I'm no NLB expert, but my bet is you need to configure something on the server to make it accept the multicast traffic in a multicast NLB setup.
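One way to make that comparison less manual is to diff the destination MAC addresses seen in the two captures. The sketch below assumes the destination addresses have already been exported from each capture as plain lists of strings (for example, one `eth.dst` value per frame); the function name and the sample values in the usage note are hypothetical.

```python
from collections import Counter


def dsts_only_in_promiscuous(promisc_dsts, normal_dsts):
    """Return {dst_mac: frame_count} for destination MACs that appear in the
    promiscuous-mode capture but never in the non-promiscuous one."""
    seen_in_normal = {mac.lower() for mac in normal_dsts}
    counts = Counter(mac.lower() for mac in promisc_dsts)
    return {mac: n for mac, n in counts.items() if mac not in seen_in_normal}
```

For example, `dsts_only_in_promiscuous(["01:00:5e:7f:01:0a", "ff:ff:ff:ff:ff:ff"], ["ff:ff:ff:ff:ff:ff"])` returns only the first address. Any destination that shows up this way is a candidate for a frame the NIC filtered out when promiscuous mode was off.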

answered 03 Feb '15, 04:16


SYN-bit ♦♦

I think that is most likely the issue: restarting the server and uninstalling WinPcap has made it work. That makes me think that when I stopped a capture in Wireshark (I will have run one before and just forgotten), it took the NIC out of promiscuous mode and NLB didn't know to put it back into that mode. I have a snapshot from before these changes that I will do some testing on later to see if I can confirm it. Thanks.

(03 Feb '15, 04:25) cakelord

Also, VMware and Microsoft NLB can get into trouble when ESXi sends RARP frames to the network, e.g. when a VM moves via vMotion or when a snapshot is taken or removed.

(03 Feb '15, 14:15) Jasper ♦♦