O.K. I admit that this is a rather long post, but if you are willing to wade through it, you may (or may not ;-)) learn something new about the arp subsystem of the linux kernel, as I did during my 'research/tests'.
As @packethunter mentioned, there is the parameter delay_first_probe_time and its default value of 5 seconds matches the observed delay, however the description in the man pages does not really match the observed behavior and there is actually a difference in behavior between ICMP and TCP (see my tests below).
I did some tests with Win XP SP3 (client - 192.168.158.139) and Ubuntu 12.04, Kernel 3.2.0-32 (server - 192.168.158.154). Here is what I observed.
RFC RFC1122 defines how to handle validation of ARP cache entries.
2.3.2.1 ARP Cache Validation
That chapter defines four methods to flush out-of-date cache entries (please read the RFC). One of the methods is "Unicast Poll", which periodically sends ARP requests to verify ARP cache entries. The proposed interval is ~ 1 minute.
arp man page
http://www.kernel.org/doc/man-pages/online/pages/man7/arp.7.html
When there is no positive feedback for an existing mapping after some time
(see the /proc interfaces below), a neighbor cache entry is considered stale.
Positive feedback can be gotten from a higher layer; for example from a
successful TCP ACK. Other protocols can signal forward progress using the
MSG_CONFIRM flag to sendmsg(2). When there is no forward progress, ARP tries
to reprobe.
O.K. So, if a higher level protocol signals the arp subsystem, that the ARP entry is still good, it will not probe (for some time), otherwise it will probe. The example mentions just TCP ACK and that's where I started my tests.
=================================================
== TEST #1: ping from windows -> linux
=================================================
I cleared all arp caches and then I captured the traffic on linux with tcpdump and this filter: 'icmp or arp'.
I ran that command in background (shell). The output went to STDOUT and thus the output of tcpdump and the output of the loop command (see below) were merged. In the foreground I queried the arp cache with a loop.
[email protected]:~# while true; do date; ip neigh show; sleep 1; done
Result: As you can see below (TEST #1: Data), there is no ARP entry on linux until I started ping on windows, which triggered an ARP request from windows. Immediately after the ARP reply there is an entry in the linux arp cache, that is marked with DELAY
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
So, linux obviously knows the MAC address of the windows box (from the ARP request). However, the entry is marked DELAY instead of REACHABLE (although a higher layer protocol could have given positive feedback - in this case ICMP).
Then, after 5 seconds, there is an arp request to verify the entry. After a valid answer, the entry is marked as REACHABLE. It remains in that state for 46 seconds. Then it's marked as STALE
So, how does that match the definition in the arp man page?
Cite: Positive feedback can be gotten from a higher layer ... When there is no forward progress, ARP tries to reprobe.
So, apparently ICMP does not give positive feedback to the arp subsystem, otherwise there would have been no probe (this is different for TCP, see below).
Now, let's look at the time until the probe is sent (according to the capture files: 5 seconds). There is the following definition in the arp man page:
delay_first_probe_time (since Linux 2.2)
Delay before first probe after it has been decided that a neighbor is
stale. Defaults to 5 seconds.
The default value is 5 seconds, but the description does not match the observed behavior. The entry was not STALE (as described in the man page). It was in the state DELAY.
So, how can we explain this discrepancy?
- it could be an error in the man page
- the value could be used to probe DELAY entries as well
- the value is unrelated to the probe sent for the DELAY entries, but then which value is used and where is that one defined (also 5 seconds)?
I had not yet time to check the kernel source to find an answer to that question.
Finally: How long is an entry considered to be REACHABLE before it changes to STALE. From the arp man page.
base_reachable_time (since Linux 2.2)
Once a neighbor has been found, the entry is considered to be valid for
at least a random value between base_reachable_time/2 and
3*base_reachable_time/2. An entry's validity will be extended if it
receives positive feedback from higher level protocols. Defaults to 30
seconds. This file is now obsolete in favor of base_reachable_time_ms.
base_reachable_time_ms (since Linux 2.6.12)
As for base_reachable_time, but measures time in milliseconds.
Defaults to 30000 milliseconds.</code></pre><p>So, the enrty is <strong>REACHABLE</strong> for a random amount of seconds in the range of <strong>base_reachable_time/2</strong> and <strong>3*base_reachable_time/2</strong>. The observed 46 seconds do fall into that range and if you repeat the test, you will get different values.</p><p><strong>=================================================</strong><br />
== TEST #1: Data
=================================================
Sun Nov 11 21:42:07 CET 2012
Sun Nov 11 21:42:08 CET 2012
Sun Nov 11 21:42:09 CET 2012
Sun Nov 11 21:42:10 CET 2012
Sun Nov 11 21:42:11 CET 2012
Sun Nov 11 21:42:12 CET 2012
21:42:12.473391 ARP, Request who-has 192.168.158.154 tell 192.168.158.139, length 46
21:42:12.473434 ARP, Reply 192.168.158.154 is-at 00:0c:29:c5:f6:9b, length 28
21:42:12.473572 IP 192.168.158.139 > 192.168.158.154: ICMP echo request, id 512, seq 4096, length 40
21:42:12.473656 IP 192.168.158.154 > 192.168.158.139: ICMP echo reply, id 512, seq 4096, length 40
Sun Nov 11 21:42:13 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
Sun Nov 11 21:42:14 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
Sun Nov 11 21:42:15 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
Sun Nov 11 21:42:16 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
Sun Nov 11 21:42:17 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
21:42:17.481965 ARP, Request who-has 192.168.158.139 tell 192.168.158.154, length 28
21:42:17.482126 ARP, Reply 192.168.158.139 is-at 00:0c:29:34:0b:de, length 46
Sun Nov 11 21:42:18 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:42:19 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:42:20 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:42:21 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:42:22 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:42:23 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:42:24 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:42:25 CET 2012
...
Sun Nov 11 21:43:00 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:43:01 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:43:02 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de STALE
Sun Nov 11 21:43:03 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de STALE
Sun Nov 11 21:43:04 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de STALE
Sun Nov 11 21:43:05 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de STALE
Sun Nov 11 21:43:06 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de STALE
=================================================
== TEST #2: tcp connect from windows -> linux:22
=================================================
I cleared all arp caches and then I captured the traffic on linux with tcpdump and this filter: 'icmp or arp or tcp[13]&2 ==2'.
I ran that command in background (shell). In the foreground I queried the arp cache with this command:
[email protected]:~# while true; do date; ip neigh show; sleep 1; done
Result: As you can see below, there is no ARP entry on linux until I connected to port 22 of the linux box, which triggered an ARP request from the windows system. Immediately after the ARP reply there is an entry in the linux arp cache, that is marked with DELAY
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
So, linux obviously knows the MAC address of the windows box (from the ARP request). The entry is marked with DELAY.
HOWEVER, this time there is no arp probe after 5 seconds. The entry changes from DELAY to REACHABLE automatically after 5 seconds (again that value!). So, this looks like a 'positive feedback from a higher layer', as described in the arp man page.
=================================================
== TEST #2: Data
=================================================
[email protected]:~# while true; do date; ip neigh show; sleep 1; done
Sun Nov 11 21:51:08 CET 2012
Sun Nov 11 21:51:09 CET 2012
Sun Nov 11 21:51:10 CET 2012
Sun Nov 11 21:51:11 CET 2012
Sun Nov 11 21:51:12 CET 2012
21:51:12.986592 ARP, Request who-has 192.168.158.154 tell 192.168.158.139, length 46
21:51:12.986638 ARP, Reply 192.168.158.154 is-at 00:0c:29:c5:f6:9b, length 28
21:51:12.986720 IP 192.168.158.139.3393 > 192.168.158.154.22: Flags [S], seq 1430831265, win 64240, options [mss 1460,nop,nop,sackOK], length 0
21:51:12.986742 IP 192.168.158.154.22 > 192.168.158.139.3393: Flags [S.], seq 1752069330, ack 1430831266, win 14600, options [mss 1460,nop,nop,sackOK], length 0
Sun Nov 11 21:51:13 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
Sun Nov 11 21:51:14 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
Sun Nov 11 21:51:15 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
Sun Nov 11 21:51:16 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
Sun Nov 11 21:51:17 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de DELAY
Sun Nov 11 21:51:18 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:19 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:20 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:21 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:22 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:23 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:24 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:25 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:26 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:27 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:28 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:29 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:30 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:31 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:32 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:33 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:34 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:35 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:36 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:37 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:38 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:39 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:40 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:41 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:42 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:43 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:44 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de REACHABLE
Sun Nov 11 21:51:45 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de STALE
Sun Nov 11 21:51:46 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de STALE
Sun Nov 11 21:51:47 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de STALE
Sun Nov 11 21:51:48 CET 2012
192.168.158.139 dev eth0 lladdr 00:0c:29:34:0b:de STALE
CONCLUSION: One needs to check the kernel source code to fully understand why the linux kernel behaves in the observed way. If one want's to know why it was implemented that way (the difference between icmp and tcp), I suggest to post a question on the kernel developer list and prepare to get some friendly rant ;-)
Regards
Kurt
Kurt, packethunter: thank you for replies! Both hosts are Debian machines with 2.6.32-5-amd64 kernel. Content of the /proc/sys/net/ipv4/neigh/eth0/gc_stale_time is following:
I tested this several times and 10.10.10.1 always sent ARP request 5s later. One example where I waited about 8min before and after the ICMP "echo request":
Wireshark output can be seen here.
Maybe it is some sort of default behavior of Linux IP stack to check the accuracy of ARP table if the ARP table entry was built not by the ARP reply packet but some other unicast packet(ICMP “echo request” in this case)? In addition, is it possible to configure Linux kernel to use broadcast messages for building up ARP table?
well, hard to say, based on a single incident. But nevertheless, the idea about arp cache stale checking is a good idea.
O.K. now it’s clear, that it is not coincidental. So, it looks like a feature of the IP stack and the mentioned stale check is a good candidate.
@m4rtin: can you please retry and post the output of the following command before and after the ping. Run the command on 10.10.10.1.
It will show, if the ARP entry is stale.
Does anybody find it curious that 10.10.10.1 is answering the ping without ARPing for a MAC address?
@m4rtin, I am assuming that you cleared the ARP cache on both systems?