Sorry if this is a bit long. I have been losing sleep over it for 5 weeks now... I left on vacation for 2 days and over the weekend there was a power outage. To my knowledge, none of the servers, routers, etc. went down, thanks to the UPS.
We have two sites, Ferndale (main, 192.168.142.x) and Warren (192.168.143.x), connected by an MPLS link running at 20 Mbps. The firewall is housed at the ISP's location with an internal address of 192.168.10.2. The Ferndale site houses the main Windows servers, an iSeries (AS/400) running our ERP, and the main Mitel phone controller with the PRI. The Warren location has another phone controller that links to the one in Ferndale over the MPLS using VoIP. So Warren is basically a satellite, with its users running 5250 emulation (telnet) to talk to the iSeries.
Starting the Monday I returned, users began getting kicked out of their telnet sessions, and the phone controllers would drop communication with each other, causing anyone on a call to lose it. Sometimes this happens a few times in several hours, other times very frequently, causing major problems with our shipping in Warren.
I started with various traces and ping tools and noticed packet loss. I contacted the ISP; they ran their ping test with 100 packets and said all was good on their end. We had replaced the switches a month earlier, so I began pulling them. 6 weeks later, the problem is no better. I ripped out all of the switches and put the old ones back, replaced the cables, and tested everything I could think of. During testing (plugged straight into their router), I noticed I could only get 50% upload speed in Ferndale from FTP and couldn't run any of the speed test sites at either location. Ferndale and Warren do NOT touch the firewall when speaking to each other. That is only for internet access. I called the ISP, and they sent a tech who plugged into the router using their own network IP, going out their own internet gateway.
They could run speedtest.net and it came back fine, so again, not their issue... it's our firewall. The problem with that is:
1) Any XP/2003 Server machine could run speedtest.net (yes, we tried many other speed sites without success).
2) OCCASIONALLY speedtest.net and megapath would complete on the other machines, but it was rare. It ALWAYS failed during the second half of the upload.
3) I configured my machine (Win10) to turn off TCP auto-tuning and I can now run it most of the time.
4) The firewall shows NO rejections for any of the machines running the tests. A firewall wouldn't do this if it were the cause.
I have been running Wireshark and the captures are full of Dup ACKs and retransmits. I have even sent them to the ISP. At https://www.dropbox.com/sh/8ujc1k86dektf2q/AAC1uir8CFKdixVOK88iKnysa?dl=0 you can find 3 packet captures, all taken at the same time: one for Ferndale, one for Warren, and one for the firewall. I am concerned with 3 machines: 192.168.142.7 (the AS/400), and 192.168.142.210 and 192.168.143.216, which are the phone controllers talking to each other.
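For anyone who would rather tally the Dup ACKs and retransmits per host pair than scroll through the captures, here is a minimal Python sketch. It assumes the flagged packets were first exported as comma-separated text, for example with tshark -r capture.pcap -Y "tcp.analysis.retransmission || tcp.analysis.duplicate_ack" -T fields -e ip.src -e ip.dst -E separator=, where capture.pcap is a placeholder file name. The function name count_flagged is also just made up for illustration.

```python
from collections import Counter


def count_flagged(lines):
    """Count flagged packets per (source, destination) pair.

    Each input line is assumed to look like "192.168.142.210,192.168.143.216",
    i.e. exactly two comma-separated fields (ip.src, ip.dst).
    Blank or malformed lines are skipped rather than guessed at.
    """
    counts = Counter()
    for line in lines:
        parts = line.strip().split(",")
        # Keep only lines with exactly two non-empty fields.
        if len(parts) != 2 or not all(parts):
            continue
        counts[(parts[0], parts[1])] += 1
    return counts
```

Feeding it the exported lines of each capture and printing counts.most_common() should show which conversations carry most of the flagged packets, which could then be compared across the three capture points.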
If you look at the firewall capture (line 2116) and the Ferndale capture (line 46304), this is my machine, .75, doing a speed test. It doesn't appear that all of the packets are making it through the MPLS. The captures were taken by mirroring the ISP router ports and the entry into the firewall (Checkpoint).
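The kind of comparison I mean between two capture points could be sketched roughly as follows in Python. It assumes each capture was exported as one line per packet with fields that identify the packet at both points, for example tshark's -T fields -e ip.src -e ip.dst -e ip.id -e tcp.seq -E separator=, with absolute (not relative) sequence numbers. It also assumes nothing between the two points rewrites those fields and that both captures cover the same time window. The function name missing_downstream is hypothetical.

```python
from collections import Counter


def missing_downstream(upstream_keys, downstream_keys):
    """Return packet keys seen upstream that have no match downstream.

    Each key is an opaque string describing one packet (e.g. the exported
    "src,dst,ip_id,seq" line). Counters are used so that a key appearing
    twice upstream but once downstream is reported as missing once.
    """
    up = Counter(k.strip() for k in upstream_keys if k.strip())
    down = Counter(k.strip() for k in downstream_keys if k.strip())
    # Counter subtraction keeps only keys with a positive remaining count.
    return up - down
```

For an upload from my machine, the Ferndale capture would be the upstream input and the firewall capture the downstream one. Anything left in the result would be a packet that left Ferndale but was never seen arriving at the firewall, under the assumptions above.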
Can SOMEONE tell me what is going on? I have run out of equipment to replace, and there have been no changes to the phones or iSeries in months....
asked 11 Aug '16, 10:50
edited 11 Aug '16, 11:25