
Hello guys,

I'm working on an issue with my Nagios server. Nagios monitoring was working fine, but for a few days now I have been seeing these errors:

"CHECK_NRPE: Error - Could not complete SSL handshake." But these errors are not consistent: a check first returns this error, and 5 minutes later the same check is OK again.

So I checked my configuration, but no changes had been made recently and I found no issues there.

Then I tried to analyse the Nagios traffic with Wireshark. Mostly it looks OK to me, but I found a strange thing: when Nagios tries to establish the SSL handshake, it sends a packet whose protocol is shown in Wireshark as "SSL". It receives no answer. Then a minute later it sends the same SSL handshake packet, but this time it is shown with the TLSv1 protocol, and then it works fine. http://piccy.info/view3/4921610/583ca764bdb98d778b0f605c3e0b3a22/orig/

So the question is: what is the difference between these SSL and TLSv1 protocols? They look the same to me.

http://www.cloudshark.org/captures/e957011a18ef

asked 30 Jul '13, 08:51

Macumazan


The Client Hello is a TLS 1.0 handshake in both connections (tcp.stream eq 10 and tcp.stream eq 11).

The difference in the Protocol interpretation (SSL vs. TLSv1) is due to the fact that in stream 11 the negotiation does not complete, and Wireshark labels the packet SSL in that case.

When I extracted only the first 5 packets of tcp stream 10, the Protocol field changed to SSL as well, whereas it showed TLSv1 with the full handshake.
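To make the point concrete that the version is carried in the bytes of the Client Hello itself and does not depend on the Protocol column, here is a minimal Python sketch that reads the two version fields from a raw record. The function name and the sample bytes are made up for illustration and are not taken from the capture; only the header layout (content type, version, length, then handshake type, length, client version) follows the SSL/TLS record format.

```python
import struct

# Names for the two-byte protocol versions carried in SSL/TLS headers.
VERSION_NAMES = {
    (3, 0): "SSL 3.0",
    (3, 1): "TLS 1.0",
    (3, 2): "TLS 1.1",
    (3, 3): "TLS 1.2",
}


def client_hello_versions(record: bytes):
    """Return (record_version, hello_version) names for a ClientHello record.

    Layout: content type (1 byte), record version (2), record length (2),
    handshake type (1), handshake length (3), client version (2).
    """
    if len(record) < 11:
        raise ValueError("too short to hold a ClientHello header")
    content_type, rec_major, rec_minor, _length = struct.unpack("!BBBH", record[:5])
    if content_type != 22:  # 22 = handshake
        raise ValueError("not a handshake record")
    if record[5] != 1:  # 1 = ClientHello
        raise ValueError("not a ClientHello")
    hello_version = (record[9], record[10])

    def name(version):
        return VERSION_NAMES.get(version, "unknown %d.%d" % version)

    return name((rec_major, rec_minor)), name(hello_version)


# Hypothetical, truncated example bytes (NOT from the capture in this thread).
sample = bytes([22, 3, 1, 0, 6, 1, 0, 0, 2, 3, 1])
print(client_hello_versions(sample))
```

Both fields decode to TLS 1.0 for this sample, whatever a dissector later decides to show as the protocol name.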

So the real question is: why does the "server" send a FIN in the middle of the SSL handshake? Looking at the RTT and TTL, it is probably NOT the real server but maybe the Riverbed appliance, but this is just a guess.

answered 31 Jul '13, 00:05

mrEEde2

Thank you for a great explanation.

(31 Jul '13, 01:54) Macumazan

As @mrEEde2 points out, the SSL version in the Client Hello is actually the same in both cases; it is Wireshark's interpretation, based on the rest of the session, that makes it show SSL or TLSv1. So that is not the issue.

What I do see in your trace is that all traffic is sent to a TyanComp system with the mac address 00:e0:81:45:5c:a8 and that the return traffic either comes from a Cisco device with mac address 64:00:f1:c1:da:01 or from a Riverbed device with mac address 00:0e:b6:99:9e:e4. There is only one session that fails in the trace file. It is after a couple of sessions over the Cisco and before a couple of sessions over the Riverbed. As the Riverbed device is most likely a WAN optimizer, could it be that the tunnel to the remote location is flapping and that when Nagios polls while the tunnel is being rebuilt, the SSL session to the server 10.49.32.186 fails?

What is the LAN setup at the nagios side of the connection?

answered 31 Jul '13, 01:38

SYN-bit ♦♦

Thank you for the reply. Traffic from Nagios goes to the router. The router sends packets to the WAN provider's router through the Riverbed hardware. The setup is the same on the other side of the WAN.

I don't think the tunnel to the other location is flapping, but is there a way to check this? I have access to the routers before the Riverbed, but the WAN provider's routers are not accessible to me.

(31 Jul '13, 02:01) Macumazan

As the return packets from the Riverbed in stream 11 have an ip.ttl of 64, it looks like the Riverbed is directly connected to (in the same vlan/ip-subnet as) the Nagios server. Are you sure it is behind the router with the TyanComp mac-address (as seen from the Nagios server)?
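The TTL reasoning can be sketched in a few lines of Python. This is only a heuristic: it assumes the sender started from one of the commonly used initial TTL values (64, 128 or 255), which is an assumption and not something the capture proves. The function name is hypothetical.

```python
# Commonly used initial TTL values; assumed here, not read from the capture.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> int:
    """Guess how many routers decremented the TTL.

    Picks the smallest common initial TTL that is >= the observed value and
    returns the difference between the two.
    """
    if observed_ttl < 1:
        raise ValueError("TTL must be at least 1")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial - observed_ttl
    raise ValueError("TTL cannot exceed 255")


# An observed TTL of 64 gives 0 under this assumption: no router in between.
print(estimate_hops(64))
```

A packet that really crossed several routed hops from a sender starting at 64 would arrive with a lower value, which is why a TTL of exactly 64 points to a device on the local segment.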

Can you identify all mac-addresses in the trace (TyanComp and Cisco) and tell me which device uses which mac-address? What does a traceroute from Nagios to the server show and what does a traceroute from the server to Nagios show? Are the routers redundant? What kind of first hop redundancy protocol do you use (HSRP, VRRP, etc)?

(31 Jul '13, 02:18) SYN-bit ♦♦

So, the network looks like this:

Nagios -> Linux host as router -> HP switch -> Riverbed hardware -> Verizon router -> ... WAN ... -> Verizon router -> Riverbed hardware -> HP switch -> Linux host as router -> Nagios client

Cisco is the Verizon router.
TyanComp is the Linux host acting as router.

Traceroute from Nagios:
traceroute to 10.49.32.186 (10.49.32.186), 30 hops max, 40 byte packets
1 10.32.240.92 0.197 ms 0.194 ms 0.191 ms -> Linux router
2 10.32.255.99 0.606 ms 0.605 ms 1.072 ms -> Verizon router
3 * 3.579 ms 3.587 ms 3.586 ms
4 * 19.380 ms 19.384 ms 19.873 ms
5 * 37.538 ms 37.776 ms 38.008 ms
6 10.49.32.186 37.529 ms 37.382 ms 37.359 ms

Traceroute from Nagios client:
traceroute to 10.32.241.141 (10.32.241.141), 30 hops max, 40 byte packets
1 10.49.32.202 0.081 ms 0.062 ms 0.063 ms
2 10.49.47.99 1.229 ms 1.451 ms 1.467 ms
3 * 15.417 ms 15.539 ms 15.446 ms
4 * 38.673 ms 38.908 ms 38.912 ms
5 * 34.135 ms 34.131 ms 34.355 ms
6 10.32.241.141 37.851 ms 38.078 ms 38.076 ms

I think we don't use any redundancy protocols.

(31 Jul '13, 03:54) Macumazan

Is the Nagios host connected to the same HP switch? And is it on the same VLAN as the Linux router and the Riverbed?

What is the subnet mask used on the Nagios host, the Linux router and the Verizon router? I suspect a subnet mask of 255.255.240.0, which would put all devices in the same IP subnet and therefore create asymmetric routing. I bet step 5 in the reverse Nagios trace was actually a response from the Verizon router (public interface).

Does the Verizon router point back to the Linux router (if its subnet mask is smaller than 255.255.240.0)?

Regarding the flapping of the Riverbed tunnel, do you have access to the Riverbed device? Or can you contact someone with access to it to check whether there is anything in the logging at the times Nagios reports the server as down?

(31 Jul '13, 04:11) SYN-bit ♦♦

Yes, the VLAN is the same for them, and Nagios is connected to the same switch. The subnet mask is 255.255.240.0, yes. Step 5 is the Verizon router, and it points to the Linux router.

Yes, I found that when Nagios shows this error, the Riverbed logs an error as well:

[io/inner/prod.ERR] 128089795 {10.32.241.141:60513 10.49.32.192:5666} Err while reading: Connection reset by peer

This is from another host that showed the same error today.

(31 Jul '13, 04:30) Macumazan

OK, that indeed explains the asymmetric routing seen in the tracefile. As the Verizon router is in the same subnet as the Nagios host, it will send the return traffic straight to the Nagios host instead of to the Linux router.
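The subnet arithmetic can be checked with Python's standard ipaddress module. The addresses are the ones from the traceroutes in this thread; the sketch assumes every device really uses the 255.255.240.0 mask, and the helper name is made up for illustration.

```python
import ipaddress

# Nagios address and mask as reported in this thread.
nagios = ipaddress.ip_interface("10.32.241.141/255.255.240.0")
subnet = nagios.network  # the /20 network that contains the Nagios host

hosts = {
    "Linux router": "10.32.240.92",
    "Verizon router": "10.32.255.99",
    "monitored server": "10.49.32.186",
}


def on_link(address: str, network=subnet) -> bool:
    """True if the address falls inside the given subnet (reachable without a gateway)."""
    return ipaddress.ip_address(address) in network


print("subnet:", subnet)
for name, addr in hosts.items():
    print(name, addr, "on-link" if on_link(addr) else "via gateway")
```

With this mask the Linux router and the Verizon router both land in 10.32.240.0/20 together with Nagios, so a device on that segment can answer Nagios directly without passing through the Linux router.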

Although the reason for this design is not clear to me, I don't think it is the reason for the failing connections. It would be interesting to see a packet capture made on the Riverbed when this problem occurs, on both the inside interface (connected to the switch) and the outside interface (connected to the Verizon router).

I still suspect the Riverbed tunnel as the response time for the FIN after the Client Hello is ~11 ms, while the RTT in the 3-way-handshake was ~45 ms. This means the Riverbed must have decided to send the FIN without waiting on the response to the ClientHello from the other side.
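As a small worked check of that timing argument: a reply that arrives in much less than one end-to-end round trip cannot have come from the far end. The sketch below uses the approximate figures from the trace (~11 ms and ~45 ms); the function name and the 0.5 margin are arbitrary assumptions chosen only for illustration.

```python
def answered_locally(response_ms: float, handshake_rtt_ms: float, margin: float = 0.5) -> bool:
    """True if a reply arrived too quickly to have crossed the full path.

    handshake_rtt_ms is the round trip measured in the TCP 3-way handshake;
    margin is an assumed safety factor, not a measured value.
    """
    return response_ms < handshake_rtt_ms * margin


# Approximate values from the trace: FIN after ~11 ms, handshake RTT ~45 ms.
print(answered_locally(11.0, 45.0))
```

A reply that takes about as long as the handshake RTT would return False here, which is what one would expect if the real server had answered.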

Did you also make a packet capture on the remote side? It would be interesting to see what is seen on the network there (preferably also before and after the riverbed device).

(31 Jul '13, 05:09) SYN-bit ♦♦

I think I found the cause: the Riverbed hardware. After checking the Riverbed logs, I found that it sometimes reports errors like these:

[admission_control.NOTICE] - {- -} Connection limit achieved. Total Connections 611,Branched Warmed Connections 0
Jul 31 15:37:38 sport[7943]: [admission_control.WARN] - {- -} Pausing intercept: Connection limit achieved;
Jul 31 15:37:38 sport[7943]: [admission_control.NOTICE] - {- -} Memory Usage: 1341 Current Connections: 611 TCP Memory Usage: 114;

So it looks like we have a limit of 611 optimized connections. I configured the Nagios server traffic to pass through without optimization, and I don't see any SSL issues for now.
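To watch for this condition over time, the connection count can be pulled out of the log lines with a short Python sketch. The regular expression is written against the two lines quoted in this thread only; the exact Riverbed log format on other versions is an assumption, and the function name is hypothetical.

```python
import re

# Matches admission-control lines that report a "Current Connections" figure.
PATTERN = re.compile(
    r"\[admission_control\.(?P<level>\w+)\].*?Current Connections: (?P<count>\d+)"
)


def current_connections(lines):
    """Yield (level, count) for each line that reports Current Connections."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield match.group("level"), int(match.group("count"))


# Log lines copied from the post above.
log = [
    "Jul 31 15:37:38 sport[7943]: [admission_control.WARN] - {- -} "
    "Pausing intercept: Connection limit achieved;",
    "Jul 31 15:37:38 sport[7943]: [admission_control.NOTICE] - {- -} "
    "Memory Usage: 1341 Current Connections: 611 TCP Memory Usage: 114;",
]
print(list(current_connections(log)))
```

Only the second line carries a connection count, so the sketch reports a single NOTICE entry with 611 connections.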

I will keep checking until tomorrow to see whether the issue is fixed.

It was very nice of you to help me figure this out! It was like a lesson, and a pointer to where I need to develop my network debugging skills :)

(31 Jul '13, 08:16) Macumazan

question asked: 30 Jul '13, 08:51

question was seen: 22,984 times

last updated: 31 Jul '13, 08:43
