This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Modbus TCP connection drops after 7 hours and 20 minutes

The setup is a Windows Server 2012 guest on VMware ESXi 5.5, connected via a VLAN with 3 hops (Cisco switches) to the following Modbus TCP devices:

  • Advantech ADAM-5000/TCP (x5)
  • Advantech ADAM-6260 TCP (x1)

The devices can be polled using the Advantech ADAM/Apax.net utility version 2.05.10.

I'm running Wonderware System Platform 2014 R2 SP1 as the SCADA with DASMBTCP Version 3.0.1 Data Acquisition Server.

The ADAM-6260 device is rock solid and never loses connection.

The ADAM-5000/TCP devices all drop their connection on port 502 after 7 hours 20 minutes of stable operation.

Pinging the devices is always possible, even after the Modbus TCP port closes.

A physical reset of the ADAM5000 brings the device back to life.

If I connect a Win7 laptop to the Server switch and route through the VLAN to the ADAM 5000, connection is rock solid.

If I connect a Win7 laptop to the ADAM 5000 via cross-over cable the connection is rock solid.

As soon as the device is connected and polled from the VMWare Windows Server 2012, the Modbus TCP fails after 7 hours and 20 minutes.

I've tried connecting with a Telnet client, and port 502 is definitely closed.
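The Telnet check can be repeated from a script so that it is easy to run on a schedule and log. This is a minimal sketch using only the Python standard library; the function name `check_port` and the 3 second timeout are illustrative assumptions, not part of any Advantech or Wonderware tool.

```python
import socket

def check_port(host, port=502, timeout=3.0):
    """Try a plain TCP connect and report how the attempt ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # three-way handshake completed
    except ConnectionRefusedError:
        return "refused"           # peer answered the SYN with RST
    except socket.timeout:
        return "timeout"           # no answer to the SYN at all
    except OSError as exc:
        return f"error: {exc}"     # e.g. no route to host

# Example call (address taken from the capture in this post):
#   print(check_port("123.111.4.10"))
```

Note that "open" only means the handshake completed. In the capture excerpt further down, the device still completes the handshake after the drop-out and resets the connection only once a query arrives, so a connect-only test may not show the fault.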

The Advantech Utility error message is "Connect Module Failed! Reached maximum number of connections"

The issue seems to be independent of the Wonderware DA Server and the Advantech ADAM/Apax.net Utility, because when both are disabled and not polling, the devices still drop out after 7 hours 20 minutes.

The ADAM6260 is solid and Advantech support put this down to a function / feature called "Host Idle Timeout". This is not available on the ADAM5000, although I have requested it.

The details below should help in reading the Wireshark log:

  • IP Source - 123.111.4.101
  • IP Dest - 123.111.4.10
  • Device ADAM 5000L TCP
  • Source Port before drop out 53609
  • Source Ports after drop out 61582, then 61587
  • Time in wireshark log 16:04:51

    55688 2017-04-12 16:04:01.496375 123.111.4.10 123.111.4.101 Modbus/TCP 79 Response: Trans: 1319; Unit: 1, Func: 3: Read Holding Registers
    55726 2017-04-12 16:04:01.557603 123.111.4.101 123.111.4.10 TCP 54 53609 → 502 [ACK] Seq=469 Ack=976 Win=64867 Len=0
    57140 2017-04-12 16:04:11.495071 123.111.4.101 123.111.4.10 Modbus/TCP 66 Query: Trans: 1320; Unit: 1, Func: 3: Read Holding Registers
    57162 2017-04-12 16:04:11.616774 123.111.4.10 123.111.4.101 TCP 60 502 → 53609 [ACK] Seq=976 Ack=481 Win=2368 Len=0
    62801 2017-04-12 16:04:51.493880 123.111.4.101 123.111.4.10 TCP 54 53609 → 502 [FIN, ACK] Seq=481 Ack=976 Win=64867 Len=0
    62802 2017-04-12 16:04:51.494132 123.111.4.101 123.111.4.10 TCP 66 61582 → 502 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    62803 2017-04-12 16:04:51.495346 123.111.4.10 123.111.4.101 TCP 60 502 → 53609 [ACK] Seq=976 Ack=482 Win=2367 Len=0
    62804 2017-04-12 16:04:51.496270 123.111.4.10 123.111.4.101 TCP 60 502 → 61582 [SYN, ACK] Seq=0 Ack=1 Win=2368 Len=0 MSS=528
    62805 2017-04-12 16:04:51.496303 123.111.4.101 123.111.4.10 TCP 54 61582 → 502 [ACK] Seq=1 Ack=1 Win=65392 Len=0
    62806 2017-04-12 16:04:51.496403 123.111.4.101 123.111.4.10 Modbus/TCP 66 Query: Trans: 1321; Unit: 1, Func: 3: Read Holding Registers
    62807 2017-04-12 16:04:51.497312 123.111.4.10 123.111.4.101 TCP 60 502 → 61582 [RST, ACK] Seq=1 Ack=1 Win=2368 Len=0
    62808 2017-04-12 16:04:51.498110 123.111.4.10 123.111.4.101 TCP 60 502 → 61582 [RST, ACK] Seq=1 Ack=13 Win=2368 Len=0
    64258 2017-04-12 16:05:01.493965 123.111.4.101 123.111.4.10 TCP 66 61587 → 502 [SYN, ECN, CWR] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
    64259 2017-04-12 16:05:01.494879 123.111.4.10 123.111.4.101 TCP 60 502 → 61587 [SYN, ACK] Seq=0 Ack=1 Win=2368 Len=0 MSS=528
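The timing in an excerpt like this can be checked with a few lines of Python. The sketch below assumes the packet list was exported as text with the columns shown above (frame number, date, time, then the rest); the helper names `parse_line` and `gap_seconds` are made up for this example.

```python
from datetime import datetime

def parse_line(line):
    """Split one exported packet-list line into (frame number, timestamp, rest)."""
    frame, date, time, rest = line.split(None, 3)
    stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")
    return int(frame), stamp, rest

def gap_seconds(earlier, later):
    """Seconds elapsed between two exported packet-list lines."""
    return (parse_line(later)[1] - parse_line(earlier)[1]).total_seconds()

# Frames 57140 (last query) and 62801 (FIN from the master), copied from above.
QUERY = ("57140 2017-04-12 16:04:11.495071 123.111.4.101 123.111.4.10 "
         "Modbus/TCP 66 Query: Trans: 1320; Unit: 1, Func: 3: Read Holding Registers")
FIN = ("62801 2017-04-12 16:04:51.493880 123.111.4.101 123.111.4.10 "
       "TCP 54 53609 → 502 [FIN, ACK] Seq=481 Ack=976 Win=64867 Len=0")

if __name__ == "__main__":
    print(f"{gap_seconds(QUERY, FIN):.3f} s between the last query and the FIN")
```

For these two frames the gap is just under 40 seconds, with no Modbus response from the device in between.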

I wanted to attach several screenshots of the packet captures and upload a capture to Cloudshark, but company policy doesn't allow it and I'm unable to add files directly to this post.

Any guidance or insights would be very much appreciated. I've been working on this issue for 6 weeks now. Many thanks in advance.

Chris Dell.

asked 13 Apr '17, 03:47

chrisdell

edited 13 Apr '17, 14:51


One Answer:

Working with the brief text excerpt of the capture I can see the following:

  • In frame 57140 the master makes a slave query (from port 53609)
  • In frame 57162 the slave sends a TCP ack for the query segment
  • In frame 62801, 40 seconds after the query, the master closes the connection with a FIN, probably due to timeout for no slave reply.
  • In frame 62802 the master opens another connection (from port 61582).
  • In frame 62803 the slave acknowledges the FIN from 62801 but doesn't send its own FIN, so its side of the connection stays open.
  • In frame 62804 the slave responds to the SYN in frame 62802 with a SYN, ACK.
  • In frame 62805 the master completes the 3-way handshake for the connection from port 61582.
  • In frame 62806 the master sends a query, presumably using the connection just established.
  • In frames 62807 and 62808 the slave sends TCP RSTs for port 61582, thus closing the connection.
  • In frame 64258 the master attempts to open another connection (port 61587).
  • In frame 64259 the slave responds to the SYN in frame 64258 with a SYN, ACK.

In summary: the master sends a request, the slave fails to respond, the master closes the connection, the slave fails to complete the connection close, and the master successfully opens another connection and sends a query, to which the slave responds by hard-closing the connection.

Looks to me like an application issue on the slave.
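For anyone checking the byte counts in the excerpt, the sketch below builds a Modbus/TCP Read Holding Registers request and compares its size with the frame lengths shown. The start address of 0 is hypothetical, since the excerpt doesn't show it. The quantity of 8 registers is inferred from the 79-byte response in frame 55688, assuming each poll asks for the same registers and that the Ethernet, IPv4 and TCP headers total 54 bytes with no options (as the 54-byte, Len=0 frames from the master suggest).

```python
import struct

HEADER_BYTES = 14 + 20 + 20  # Ethernet + IPv4 + TCP without options (assumed)

def read_holding_registers_request(transaction_id, unit_id, start_address, quantity):
    """Build the MBAP header plus PDU for function code 3."""
    pdu = struct.pack(">BHH", 0x03, start_address, quantity)
    # MBAP: transaction id, protocol id (always 0), length of unit id + PDU, unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

def response_length(quantity):
    """MBAP (7) + function code (1) + byte count (1) + 2 bytes per register."""
    return 7 + 1 + 1 + 2 * quantity

# Trans 1320 and Unit 1 come from frame 57140; address 0 is a placeholder.
request = read_holding_registers_request(1320, 1, 0, 8)

if __name__ == "__main__":
    print(len(request), "byte request,", HEADER_BYTES + len(request), "bytes on the wire")
    print(HEADER_BYTES + response_length(8), "bytes expected for an 8-register response")
```

The 12-byte request matches both the 66-byte frame 57140 and the master's sequence number moving from 469 to 481. It also matches the Ack=13 in the second RST (frame 62808), which suggests the slave's TCP stack received the whole query on the new connection before resetting it.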

answered 13 Apr '17, 05:22

grahamb ♦

Thanks for your quick response Graham. I've tried to get the vendor to make a firmware change but so far I've hit resistance. If you don't mind I'm going to send your explanation to Advantech and see if they can make any firmware change.

(13 Apr '17, 08:05) chrisdell

Having access to the full capture would also enable analysis of other things such as multiple master connection attempts etc.

Personally I don't think there's any sensitive data in a Modbus capture, it's only register values. The exception is where multiple registers are used to return such things as strings, then there might be sensitive data. The IP addresses can be anonymized, but you've already shown those (although they might have been fuzzed).

(13 Apr '17, 08:57) grahamb ♦

This afternoon I tried a Tofino Modbus Firewall between the network and the ADAM5000 and early tests are looking good. The units aren't pingable anymore but the Modbus traffic is steady. I'll know for sure over the weekend.

(13 Apr '17, 13:34) chrisdell

I missed the part in your question about it only being a problem when running in VMWare.

This implies that the VMWare solution is doing something different, in which case where are you capturing? If capturing on the VMWare guest or host you may not be seeing the "real" network traffic. Ideally capture with a tap, or a mirror or span port, between the slave and its switch.

As it works other than when under VMWare, it would seem that the different behaviour can only be seen by analysing the full captures. As you're unable to provide them, you're on your own there.

The Modbus firewall must be subtly modifying the traffic in some way such that it doesn't trigger the abnormal slave behaviour. Again you'll need to analyse the captures (again at the slave side of the switch) to try and spot any differences.

(13 Apr '17, 15:40) grahamb ♦

The unit behind the firewall device has been stable for over 16 hours now. I'm going to implement this as the solution. Further investigation will have to wait, as the project has been on hold since I hit this issue over a month ago. Thanks for your input.

(14 Apr '17, 02:49) chrisdell

I know that sometimes business needs preclude completing an investigation, but I'd still be concerned that the issue has been postponed and not actually fixed.

(14 Apr '17, 05:32) grahamb ♦