We had an event in our network that generated an ARP storm. There were two contributors to this:

1) Gratuitous ARP requests from VMware: at 09:35:02, we started seeing gratuitous ARPs with the Ethernet source MAC of a particular VM guest and the Ethernet destination MAC set to broadcast (all ff's). In the ARP payload:

  • Sender MAC all zeros,
  • Sender IP all zeros,
  • Target MAC All FF's,
  • Target IP all Zeros

This gradually increased to a very high level from multiple VM Guests across every VLAN that the ESX host has on its trunk. Trying to understand:

  • Has anyone seen this before?
  • Why are there All zeros in those ARP fields?
  • What could generate this traffic?

2) Once this hit one of our VLANs, I saw GARP replies from 5 different hosts, each with its own MAC in the Sender MAC field but all zeros in the other sender/target fields. This huge storm led to a collapse of our older core switches because the ARP storm reached >2GBps. My question on this traffic: why would a host reply to the GARP in this way?

Packet dissection below for request and reply.

Thanks in advance for help.

-Tim

GARP Request:
 No.     RelativeTime   Delta Display  Time                          Source                Src Port Destination           vlan_id    Length Info 
2715936 96.183749      0.001018       2016-05-06 09:36:06.341627    Vmware_ac:00:57                Broadcast             2081,127   68     Gratuitous ARP for 0.0.0.0 (Request)

Frame 2715936: 68 bytes on wire (544 bits), 68 bytes captured (544 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: May 6, 2016 09:36:06.341627000 Eastern Daylight Time
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1462541766.341627000 seconds
    [Time delta from previous captured frame: 0.000000000 seconds]
    [Time delta from previous displayed frame: 0.001018000 seconds]
    [Time since reference or first frame: 96.183749000 seconds]
    Frame Number: 2715936
    Frame Length: 68 bytes (544 bits)
    Capture Length: 68 bytes (544 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:vlan:ethertype:vlan:ethertype:arp]
    [Coloring Rule Name: ARP]
    [Coloring Rule String: arp]
Ethernet II, Src: Vmware_ac:00:57 (00:50:56:ac:00:57), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
    Destination: Broadcast (ff:ff:ff:ff:ff:ff)
        Address: Broadcast (ff:ff:ff:ff:ff:ff)
        .... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
        .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
    Source: Vmware_ac:00:57 (00:50:56:ac:00:57)
        Address: Vmware_ac:00:57 (00:50:56:ac:00:57)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: 802.1Q Virtual LAN (0x8100)
802.1Q Virtual LAN, PRI: 1, CFI: 0, ID: 2081
    001. .... .... .... = Priority: Background (1)
    ...0 .... .... .... = CFI: Canonical (0)
    .... 1000 0010 0001 = ID: 2081
    Type: 802.1Q Virtual LAN (0x8100)
802.1Q Virtual LAN, PRI: 0, CFI: 0, ID: 127
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = CFI: Canonical (0)
    .... 0000 0111 1111 = ID: 127
    Type: ARP (0x0806)
    Padding: 00000000000000000000
    Trailer: 0000000000000000
Address Resolution Protocol (request/gratuitous ARP)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    [Is gratuitous: True]
    Sender MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Sender IP address: 0.0.0.0
    Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
    Target IP address: 0.0.0.0

GARP Reply:
 No.     RelativeTime   Delta Display  Time                          Source                Src Port Destination           Dst Port vlan_id    Length Info
1571028 0.984442       0.000000       2016-05-06 09:37:23.970442    Netscout_02:ce:e2              00:00:00_00:00:00              2340,228   68     Gratuitous ARP for 0.0.0.0 (Reply)

Frame 1571028: 68 bytes on wire (544 bits), 68 bytes captured (544 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: May 6, 2016 09:37:23.970442000 Eastern Daylight Time
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1462541843.970442000 seconds
    [Time delta from previous captured frame: 0.000000000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.984442000 seconds]
    Frame Number: 1571028
    Frame Length: 68 bytes (544 bits)
    Capture Length: 68 bytes (544 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:vlan:ethertype:vlan:ethertype:arp]
    [Coloring Rule Name: ARP]
    [Coloring Rule String: arp]
Ethernet II, Src: Netscout_02:ce:e2 (00:80:8c:02:ce:e2), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Destination: 00:00:00_00:00:00 (00:00:00:00:00:00)
        Address: 00:00:00_00:00:00 (00:00:00:00:00:00)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: Netscout_02:ce:e2 (00:80:8c:02:ce:e2)
        Address: Netscout_02:ce:e2 (00:80:8c:02:ce:e2)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: 802.1Q Virtual LAN (0x8100)
802.1Q Virtual LAN, PRI: 1, CFI: 0, ID: 2340
    001. .... .... .... = Priority: Background (1)
    ...0 .... .... .... = CFI: Canonical (0)
    .... 1001 0010 0100 = ID: 2340
    Type: 802.1Q Virtual LAN (0x8100)
802.1Q Virtual LAN, PRI: 0, CFI: 0, ID: 228
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = CFI: Canonical (0)
    .... 0000 1110 0100 = ID: 228
    Type: ARP (0x0806)
    Padding: 00000000000000000000
    Trailer: 0000000000000000
Address Resolution Protocol (reply/gratuitous ARP)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    [Is gratuitous: True]
    Sender MAC address: Netscout_02:ce:e2 (00:80:8c:02:ce:e2)
    Sender IP address: 0.0.0.0
    Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Target IP address: 0.0.0.0
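
For anyone who wants to hunt for these frames in their own captures, here is a minimal Python sketch that rebuilds the request above from the dissected field values, parses it back (skipping the two stacked 802.1Q tags), and lists what looks odd about it. The helper names (`build_frame`, `parse_arp_frame`, `find_anomalies`) and the anomaly wording are made up for illustration and are not part of Wireshark or any library. The `gratuitous` check simply assumes that a frame counts as gratuitous when sender IP equals target IP, which would explain why both frames above carry that label.

```python
import struct

ETH_8021Q = 0x8100
ETH_ARP = 0x0806
BROADCAST = b"\xff" * 6
ZERO_MAC = b"\x00" * 6
ZERO_IP = b"\x00" * 4


def build_frame(dst, src, tcis, opcode, sha, spa, tha, tpa):
    """Build an Ethernet/802.1Q/ARP frame, zero-padded to 68 bytes as in the capture."""
    frame = dst + src
    for tci in tcis:
        frame += struct.pack("!HH", ETH_8021Q, tci)
    frame += struct.pack("!H", ETH_ARP)
    # Hardware type Ethernet (1), protocol IPv4 (0x0800), sizes 6 and 4
    frame += struct.pack("!HHBBH", 1, 0x0800, 6, 4, opcode)
    frame += sha + spa + tha + tpa
    return frame.ljust(68, b"\x00")


def parse_arp_frame(frame):
    """Parse an Ethernet frame carrying ARP, skipping any stacked 802.1Q tags."""
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    offset = 14
    vlans = []
    while ethertype == ETH_8021Q:
        tci, ethertype = struct.unpack("!HH", frame[offset:offset + 4])
        vlans.append(tci & 0x0FFF)  # low 12 bits of the TCI are the VLAN ID
        offset += 4
    if ethertype != ETH_ARP:
        raise ValueError("not an ARP frame")
    opcode = struct.unpack("!HHBBH", frame[offset:offset + 8])[4]
    body = frame[offset + 8:offset + 28]
    return {
        "eth_dst": dst, "eth_src": src, "vlans": vlans, "opcode": opcode,
        "sha": body[0:6], "spa": body[6:10], "tha": body[10:16], "tpa": body[16:20],
    }


def find_anomalies(info):
    """Return human-readable notes about suspicious ARP field values."""
    notes = []
    if info["sha"] == ZERO_MAC:
        notes.append("sender MAC is all zeros")
    elif info["sha"] != info["eth_src"]:
        notes.append("sender MAC differs from Ethernet source")
    if info["spa"] == ZERO_IP:
        notes.append("sender IP is 0.0.0.0")
    if info["tpa"] == ZERO_IP:
        notes.append("target IP is 0.0.0.0")
    if info["eth_dst"] == ZERO_MAC:
        notes.append("Ethernet destination is all zeros")
    return notes


# The GARP request from the capture: outer VLAN 2081 (priority 1), inner VLAN 127
request = build_frame(
    dst=BROADCAST, src=bytes.fromhex("005056ac0057"),
    tcis=[(1 << 13) | 2081, 127], opcode=1,
    sha=ZERO_MAC, spa=ZERO_IP, tha=BROADCAST, tpa=ZERO_IP,
)
info = parse_arp_frame(request)
gratuitous = info["spa"] == info["tpa"]  # assumed rule: sender IP == target IP
print(info["vlans"], info["opcode"], gratuitous, find_anomalies(info))
```

The same `parse_arp_frame` call can be fed raw frame bytes read from a capture file by whatever reader you prefer; only the byte layout matters here.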

asked 07 May '16, 09:39

CMH_Tim

edited 07 May '16, 19:53

Apologies for the poor formatting. This is my first time posting, and I wasn't sure how to maintain a clean view.

Please advise if a repost is needed.

(07 May '16, 09:40) CMH_Tim

Thanks, Jasper for the formatting assist! What button should I have hit?

(07 May '16, 19:51) CMH_Tim

Forget about the buttons as many of the necessary ones are missing. Press "edit" under the Question or Answer post (the page layout is different when editing comments); to the right from the text entry pane with the buttons above, there is another (read-only) one, called "Markdown Basics", with a link to "learn more about Markdown". And look for "code" there.

Or edit your own Question and see what formatting characters Jasper had to add so that the text would look that way.

(08 May '16, 01:25) sindy

I don't think that the ARPs were the root cause.
I think the broadcast storm is more of a follow-on effect. I think you had a loop in the network. But some info about that kind of ARP packet can be found here:

https://crnetpackets.com/2015/08/28/special-type-of-arp-packets/

answered 07 May '16, 14:10

Christian_R

Thanks Christian for the response and the link. My response below:

1) I believe a loop is a possibility, but we've not found proof of it. I'm scrubbing packets to see if there are any spanning tree changes noted. Nothing in our logs so far indicates any network change that would have suddenly caused us to have a loop and triggered this. Regardless of that outcome, I do still believe that VMWare's packets are not properly formatted.

2) Maybe I missed it, but that link doesn't show any packet with an all-zeros sender MAC or all zeros in both the sender and target IP address. My understanding is that the sender should always put its own MAC there at the very least.

Since posting, I've also confirmed that these packets are NOT actually originating from the guest but are generated by the ESX host (had a local packet capture agent running on some guests during the start of this event that didn't show the GARP packets when my network taps did show them).

I've done some more research and it appears that this behavior may be related to VMWare's "Notify Switch" setting where it sends a RARP packet when a VMGuest joins the network. This RARP has both sender and target MAC set to the Guest MAC. Here's a reference: http://rickardnobel.se/vswitch-notify-switches-setting/ Since original post, I've confirmed the RARP's are there and look correct (also not seen on guest VM capture), but the GARP's are not mentioned in that reference and look wrong to me.
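
To make the comparison with the malformed GARPs concrete, here is a rough Python sketch of what such a "Notify Switch" RARP could look like on the wire. Only the detail that sender and target MAC both carry the guest MAC comes from the description above. The broadcast destination, the zeroed IP fields, the use of the "request reverse" opcode (3), and the padding to 60 bytes are assumptions made for illustration, and `build_notify_rarp` is a hypothetical helper, not a VMware API.

```python
import struct

ETH_RARP = 0x8035   # EtherType assigned to RARP
RARP_REQUEST = 3    # "request reverse" opcode


def build_notify_rarp(guest_mac: bytes) -> bytes:
    """Build a RARP frame with the guest MAC as both sender and target MAC."""
    # Assumption: the frame is sent to the Ethernet broadcast address
    eth = b"\xff" * 6 + guest_mac + struct.pack("!H", ETH_RARP)
    # Hardware type Ethernet (1), protocol IPv4 (0x0800), sizes 6 and 4
    hdr = struct.pack("!HHBBH", 1, 0x0800, 6, 4, RARP_REQUEST)
    # Assumption: both IP fields are left as 0.0.0.0
    body = guest_mac + b"\x00" * 4 + guest_mac + b"\x00" * 4
    # Pad to the 60-byte Ethernet minimum (without FCS)
    return (eth + hdr + body).ljust(60, b"\x00")


guest = bytes.fromhex("005056ac0057")  # guest MAC taken from the capture above
frame = build_notify_rarp(guest)
sender_mac, target_mac = frame[22:28], frame[32:38]
print(frame[12:14].hex(), sender_mac == target_mac == guest)
```

The point of the contrast: in this frame the switch can learn the guest MAC from both the Ethernet source and the payload, whereas the GARP requests in question carry an all-zeros sender MAC in the payload.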

So I'm working with our design engineers to figure out how a loop could have occurred, but still need answers to my original questions about why the GARP requests and subsequent GARP replies were generated.

Thanks, Tim

(07 May '16, 19:51) CMH_Tim

still need answers to my original questions about why the GARP requests and subsequent GARP replies were generated

While it is hard to say why the original GARP requests were generated (the most likely candidates are a plain bug or some malware), I'm afraid that many (if not all) implementations would respond to an ARP request containing

Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
Target IP address: 0.0.0.0

regardless of whether such a request was sent as gratuitous or not. The reason is that 0.0.0.0 normally means "any of my local IPs" in a local context (see section 3.2.1.3 of RFC 1122, especially the point "must not be sent except as source under specific circumstances").

So you don't even need a loop, it is enough that all machines on the LAN segment respond to a single broadcast ARP request to have a kind of LAN Smurf Attack.
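
A toy Python model of that amplification, under loudly stated assumptions: `lenient_would_reply` stands for a hypothetical stack that treats target IP 0.0.0.0 as "any of my local IPs", `strict_would_reply` for one that only answers for an address it really owns, and the host counts and request rate are invented numbers. It is not a description of any particular operating system's ARP code.

```python
def lenient_would_reply(local_ips, target_ip):
    """Hypothetical lenient stack: 0.0.0.0 matches any local address."""
    return target_ip == "0.0.0.0" or target_ip in local_ips


def strict_would_reply(local_ips, target_ip):
    """Hypothetical strict stack: only answers for an address it owns."""
    return target_ip in local_ips


def count_replies(hosts, target_ip):
    """hosts is a list of (local_ips, policy) pairs; returns replies per request."""
    return sum(1 for local_ips, policy in hosts if policy(local_ips, target_ip))


# Invented segment: 5 lenient hosts and 195 strict ones
hosts = [({f"10.0.0.{i}"}, lenient_would_reply) for i in range(1, 6)]
hosts += [({f"10.0.1.{i}"}, strict_would_reply) for i in range(1, 196)]

requests_per_second = 1000  # assumed rate of malformed broadcast requests
replies = count_replies(hosts, "0.0.0.0")
print(replies, replies * requests_per_second)
```

In the capture above the replies are addressed to 00:00:00:00:00:00, a MAC no switch port will have learned, so each reply would presumably be flooded as unknown unicast and reach every port in the VLAN just like the request did.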

(08 May '16, 01:49) sindy

Well, if I were you, I would take my findings into the lab and try to get a better understanding of these ARPs and their interaction with the network devices.

There are so many ARP implementations out there, and each one is a little bit different.

Let's assume this is the root cause of your problem! How often do you see these requests? Do they always tear down the network? How many hosts answer? And for how long do you see this high broadcast rate?

Also, a loop can occur without any log entries. Your link was filled with >2GBit/s of broadcast traffic, and that is normally caused by a loop. How did you stop this incident?

But I agree with you and @sindy: these ARP packets look different from normal GARPs.

(08 May '16, 12:12) Christian_R

@sindy - Thanks for the replies. That makes sense, although not every device replies like that, of course, so I think a lot of implementations are interpreting those as incorrect/not relevant and not responding.

@Christian_R - We are working with the vendor but have not had much luck getting them to figure out why they generate these packets. We do have a low volume still going on without noticeable impact, but I'm concerned that if we hit the VLAN where devices actually respond, we could see minor issues again.

That being said, I'm fairly certain now that those are a symptom - no matter how strange - but not root cause. As noted below, I'm pretty sure that you were right that the root cause was a loop.

I did go back into our logs from Friday's event and found a 3750 switch stack that logged MAC flapping shortly after the GARP storm started. The flaps were on its two uplinks to our core switches and contained the MACs of the core switches, so there was some type of loop at that time, just no logs of it before the GARP's. Switch/router engineers took a look but couldn't find any evidence of a loop so Saturday night, we proceeded to bring redundancy back by restarting and reconnecting all links to the 2nd core switch.
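
As a side note on spotting this kind of MAC flapping from learning events, here is a small Python sketch. The event format `(timestamp, mac, port)`, the thresholds, and the sample data (MAC values and port names) are all invented for illustration; real switch logs would need their own parsing first.

```python
from collections import defaultdict


def find_mac_flaps(events, window=10.0, min_moves=3):
    """Return MACs that changed port at least min_moves times within window seconds.

    events: iterable of (timestamp_seconds, mac, port), sorted by timestamp.
    """
    last_port = {}
    move_times = defaultdict(list)
    for ts, mac, port in events:
        previous = last_port.get(mac)
        if previous is not None and previous != port:
            move_times[mac].append(ts)  # MAC was relearned on a different port
        last_port[mac] = port

    flapping = set()
    for mac, times in move_times.items():
        start = 0
        for end in range(len(times)):
            # Shrink the window until it spans at most `window` seconds
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= min_moves:
                flapping.add(mac)
                break
    return flapping


# Invented sample: a core switch MAC bouncing between two uplinks, plus a quiet host
events = [
    (0.0, "core-mac", "uplink-1"),
    (0.0, "host-mac", "port-5"),
    (1.0, "core-mac", "uplink-2"),
    (2.0, "core-mac", "uplink-1"),
    (3.0, "core-mac", "uplink-2"),
    (4.0, "host-mac", "port-5"),
]
print(find_mac_flaps(events))
```

A MAC that belongs to an upstream core switch showing up alternately on two uplinks, as in the sample, is the pattern described above and is a typical sign of a forwarding loop.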

When we brought the 2nd core switch back into the network late Saturday night and began reconnecting the redundant links between the core and edges, all went well until that 3750 stack was connected. Once that happened, we had another GARP storm and same impact as before.

No one made any changes on Friday when this happened, but my guess is that something happened that started the loop and nothing got logged. On Saturday's reconnect, we're certain there was no storm before the MAC flaps started again.

Thanks again for the help. Would love to hear from someone who could explain those GARP requests.

(09 May '16, 06:57) CMH_Tim

More info on these ARPs can be found here:
https://ask.wireshark.org/questions/5178/why-gratuitous-arps-for-0000
http://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/8021x/116529-problemsolution-product-00.html
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1028373
But I have not read them all, and I have no real-world experience with this special type of ARP.

Back to your loop:

  • Sometimes high CPU load can cause a loop.
  • Sometimes this load can be caused by ARP storms when you use managed switches.
  • Some spanning tree configurations can end in high CPU load, too.

(09 May '16, 11:24) Christian_R