This is my first post to the forum, and I couldn't find a detailed analysis of my issue anywhere. I've worked in network forensics analysis for a while at a major financial company using enterprise tools, but never really dug into all the intricacies of Wireshark - until I changed jobs and my new employer didn't have any tools! Now I'm working on my Wireshark cert and taking every opportunity to look at any traffic of interest. As part of this effort, I bought a cheap 100 Mbps aggregating TAP for use at home - and didn't install it. I knew that I should be watching my own traffic, but just didn't have the time or motivation to get started.
I recently changed ISPs from AT&T to Charter for various good reasons, and that's when the need for the TAP suddenly presented itself. I hadn't anticipated problems, and at first the 100 Mbps bandwidth was awesome. However, weirdness began after about a month of service.
I had sent AT&T's 2WIRE router back to them, and moved my personal Buffalo AirStation into position connecting to Charter's Cisco cable modem with the rest of my home network behind it. I didn't optimally configure the internet link on the router, so I wasn't very surprised when I first experienced lost connectivity. I figured that Charter had changed DNS or DHCP servers, the default gateway, or some combination thereof. I'd heard of that before.
I properly reconfigured my router for DHCP on the internet link, released/renewed the lease, and obtained a new Charter IP. Service was restored, problem solved (for the moment). A week later I had another disconnect. When I removed the router and directly connected the old XP workstation I had originally used to test the link, service was again restored. I reconnected the router, rebooted it, and was online again for another few days. Then the disconnects started happening every day or two.
At this point, my wife was flaming me to fix it (must be my fault) - and I wanted some background info before calling Charter. I knew that they were going to blame my router, suggest I should pay for theirs instead, and it was going to be a long and painful support call.
So I inserted the TAP between the cable modem and the router, installed a USB Ethernet adapter on my XP PC for TAP data collection, unbound the TCP/IP stack from that adapter so it would only listen, and configured dumpcap to collect a large ring buffer of files with the snaplen set to 92 bytes. The XP's primary interface was still connected inside the router, and I downloaded a ping plotter and set it to ping Charter's gateway for my current subnet. I was ready for the next outage.
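A minimal sketch of an equivalent dumpcap ring-buffer setup, written in Python so the command line can be checked before it runs. Only the 92-byte snaplen comes from the post; the interface name, output path, 100000 kB file size and 50-file limit are hypothetical placeholders. The flags used (`-i`, `-s`, `-b filesize:`, `-b files:`, `-w`) are standard dumpcap options, but confirm them with `dumpcap -h` on your installed version.

```python
import shlex
from typing import List


def dumpcap_ring_buffer_cmd(interface: str, outfile: str, snaplen: int = 92,
                            filesize_kb: int = 100_000, max_files: int = 50) -> List[str]:
    """Build a dumpcap command that captures into a rotating ring buffer.

    Every default except snaplen is a hypothetical example value.
    """
    return [
        "dumpcap",
        "-i", interface,                  # capture interface (the TAP-facing USB adapter)
        "-s", str(snaplen),               # keep only the first `snaplen` bytes of each frame
        "-b", f"filesize:{filesize_kb}",  # switch to a new file after this many kB
        "-b", f"files:{max_files}",       # keep at most this many files, overwriting the oldest
        "-w", outfile,                    # base name for the ring-buffer files
    ]


if __name__ == "__main__":
    # Hypothetical interface name and path; print the command rather than run it.
    cmd = dumpcap_ring_buffer_cmd("TAP-adapter", r"C:\captures\tap.pcapng")
    print(shlex.join(cmd))
```

To actually start the capture, the list can be passed to `subprocess.run(cmd, check=True)`. A small snaplen keeps a long-running ring buffer compact, at the cost of truncating everything past the first few headers.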
Within a couple of days, I had another incident. This time I downloaded the router's system log before rebooting it, checked the ping plotter to see exactly when the disconnect happened, and grabbed the corresponding PCAP to keep it from being overwritten. The ISP link continues to drop, so now I have several instances recorded. It usually happens between 4:30 and 5:30 AM. There is no indication of a problem in the router's logs.
While the connection is up, DHCP requests initiated by the router complete successfully. The XP workstation, when connected directly to the cable modem, always reconnects via DHCP without any intervention. The ping plotter logs its attempts to ping the gateway, sent every 10-15 seconds; round-trip times average 10 ms almost all night until the failure occurs.
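The ping plotter's log format isn't described here, so this minimal sketch assumes the log has already been parsed into (timestamp, RTT in ms or None for a timeout) pairs. It marks the outage as the first run of three consecutive timeouts (an arbitrary threshold, chosen so a single lost ping isn't counted), which gives a timestamp to match against the ring-buffer files, and reports the mean RTT before that point.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# (time the ping was sent, round-trip time in ms, or None if it timed out)
Sample = Tuple[datetime, Optional[float]]


def find_outage_start(samples: List[Sample], consecutive: int = 3) -> Optional[datetime]:
    """Return the send time of the first ping in the first run of `consecutive` timeouts."""
    run_start: Optional[datetime] = None
    run_len = 0
    for ts, rtt in samples:
        if rtt is None:
            if run_len == 0:
                run_start = ts  # a new run of timeouts begins here
            run_len += 1
            if run_len >= consecutive:
                return run_start
        else:
            run_len = 0  # any reply ends the current run
    return None


def mean_rtt_before(samples: List[Sample], cutoff: datetime) -> Optional[float]:
    """Average RTT of the successful pings sent before `cutoff`."""
    rtts = [rtt for ts, rtt in samples if ts < cutoff and rtt is not None]
    return sum(rtts) / len(rtts) if rtts else None
```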
The traces show lots of Charter network activity, but nothing that indicates a specific conversation initiated by any Charter system toward my router or workstation. The only thing you see just before the pings stop is lots of ARP traffic (but NO user activity). After the gateway ping fails, the router starts broadcasting ARP requests to find the gateway, and it continues without ever reconnecting on its own.
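One way to quantify "lots of ARP" is to count ARP frames per minute and see whether the rate rises before the pings stop. This is a minimal, dependency-free sketch that reads a classic pcap file; it assumes the capture has been saved or converted to classic pcap (dumpcap writes pcapng by default, and `editcap -F pcap in.pcapng out.pcap` can convert it) with an Ethernet link type. The 92-byte snaplen is enough, since only the 14-byte Ethernet header (18 with a VLAN tag) is inspected.

```python
import struct
from collections import Counter

LINKTYPE_ETHERNET = 1
ETHERTYPE_ARP = 0x0806
ETHERTYPE_VLAN = 0x8100  # 802.1Q tag; the real EtherType follows 4 bytes later

# Classic pcap magic numbers as they appear on disk (microsecond and nanosecond variants)
LITTLE_ENDIAN_MAGICS = {b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"}
BIG_ENDIAN_MAGICS = {b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"}


def arp_per_bucket(data: bytes, bucket_seconds: int = 60) -> Counter:
    """Count ARP frames per time bucket in a classic pcap file (not pcapng)."""
    magic = data[:4]
    if magic in LITTLE_ENDIAN_MAGICS:
        endian = "<"
    elif magic in BIG_ENDIAN_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file; convert pcapng first")
    # Global header: magic, version (2x2 bytes), thiszone, sigfigs, snaplen, link type
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")

    counts: Counter = Counter()
    offset = 24
    while offset + 16 <= len(data):
        # Record header: seconds, sub-second fraction, captured length, original length
        ts_sec, _frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 14:
            continue  # too short to hold an Ethernet header
        (ethertype,) = struct.unpack("!H", frame[12:14])
        if ethertype == ETHERTYPE_VLAN and len(frame) >= 18:
            (ethertype,) = struct.unpack("!H", frame[16:18])
        if ethertype == ETHERTYPE_ARP:
            counts[ts_sec - ts_sec % bucket_seconds] += 1  # bucket key = start of the interval (epoch seconds)
    return counts
```

Reading a hypothetical `outage.pcap` with `open(..., "rb").read()` and printing the sorted buckets gives a per-minute ARP rate that can be lined up against the ping plotter's failure time.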
There also appear to be some other attempted connections (browser activity) from the router's IP that show up as retransmissions during this time. DHCP requests are either directed to the wrong server or go unanswered. I haven't ruled out possible malware on my network, but even if it's present, I don't think it's contributing to this problem. Charter isn't telling me that they're disconnecting me because of my outbound traffic.
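Whether a DHCP request is going to "the wrong server" can be checked against option 54 (server identifier), which a client includes in a REQUEST sent after choosing an OFFER, and which servers include in OFFER and ACK messages; renewals are unicast, so for those the IP destination shows the target server instead. This minimal Python sketch assumes the UDP payload of each DHCP message has already been extracted and was captured in full. That second assumption matters here: with a 92-byte snaplen, the Ethernet (14), IPv4 without options (20) and UDP (8) headers leave only 50 bytes of the DHCP message, while options start at byte 240, so this check would need a separate capture with a larger snaplen (for example, filtered to UDP ports 67 and 68).

```python
import ipaddress
from typing import Dict, Optional, Tuple

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
OPTIONS_OFFSET = 240  # 236-byte fixed BOOTP header + 4-byte magic cookie
OPT_PAD, OPT_END = 0, 255
OPT_MSG_TYPE, OPT_SERVER_ID = 53, 54
MSG_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
             5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def dhcp_options(payload: bytes) -> Dict[int, bytes]:
    """Parse the options of a DHCP message (the UDP payload) into {code: value}."""
    if payload[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("no DHCP magic cookie (payload truncated or not DHCP)")
    opts: Dict[int, bytes] = {}
    i = OPTIONS_OFFSET
    while i < len(payload):
        code = payload[i]
        if code == OPT_PAD:
            i += 1
            continue
        if code == OPT_END or i + 1 >= len(payload):
            break  # end of options, or a truncated option header
        length = payload[i + 1]
        opts[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return opts


def summarize(payload: bytes) -> Tuple[str, Optional[str]]:
    """Return (message type, server identifier) for one DHCP message."""
    opts = dhcp_options(payload)
    mt = opts.get(OPT_MSG_TYPE)
    msg = MSG_TYPES.get(mt[0], "UNKNOWN") if mt else "UNKNOWN"
    sid = opts.get(OPT_SERVER_ID)
    server = str(ipaddress.IPv4Address(sid)) if sid and len(sid) == 4 else None
    return msg, server
```

Comparing the server identifier in each REQUEST with the one in the ACK that granted the original lease would show whether requests really are aimed at a different server.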
You have to physically power cycle both the modem and the router to re-establish the link; just releasing/renewing the IP on the router won't work. My router is patched to the latest firmware version. Right now the only solution I have is to give my wife a wireless light switch that the cable modem and router are plugged into, so that when the internet doesn't work she can turn it off and then back on.
I had to google for a phone support number, but after that I opened a support case with Charter, got an English-speaking tech on the phone within 5 minutes, and started hoping that maybe it wouldn't be so bad. In fact, he determined in less than 5 minutes that if the directly connected PC worked fine, then it HAD TO BE my router, and he was done. Sigh.
I have anonymized my capture with TraceWrangler (cool tool!), and can provide it for analysis. I'm hoping that I have enough info to prove, or at least accurately infer, what Charter is doing at 5 AM that causes this. I'll probably never get them to admit any fault, and unless Buffalo has a suggestion, I will most likely have to continue with the remote power cycling. However, I'm hoping that we can discuss enough detail here to at least shame them publicly and confirm what everybody already suspects - it's not always the user's router. Maybe we can even diagnose the activity well enough to determine whether they could avoid causing the condition with a change in their procedures, and make the world better for Charter customers. Unless they're TRYING to make you use their router...
Is this the place to out an ISP? Looking forward to your replies!
asked 09 Jun '15, 21:20
edited 13 Jun '15, 18:45
WARNING (I guess that's fair, but unnecessary if you have good people skills, and kinda irritating, don't you think?) - you may not like what you read here. I'm going to call this my answer to my own question and close it out. You must get a lot of exercise jumping to conclusions. I didn't say I was going to sue Charter; I'm not that naive. I've been insulted by smarter guys than you before about lots of different network issues, and they were wrong too. Everybody's entitled to their opinion.
I think I did prove something here - these problems are NOT the fault of my router or network behind it, and they DO originate from Charter's network. As Laura Chappell or Gerald Coombs would tell you - packets don't lie. HOWEVER, they don't always tell ALL of the truth, depending on where and how they were collected, and when taken out of context. From the packet data I was able to collect from my TAP, I don't think that any of my statements can be ruled out.
I never really expected to fix this problem, I just wanted more eyes on the traces to make sure I wasn't missing anything, get used to the forum, play with anonymizing data, use CloudShark, and leave some food for thought about Charter along with solid packet details (my trace) - mission accomplished! So even though I can't prove any of my accusations against Charter, I'm also not the first person to make these same claims. In this case if it's NOT MY ROUTER, then how many other Charter customers are experiencing the same problems - and it's not their routers either? What Charter is doing is WRONG whether by stupid negligence or intent, and more people should know.
I won't be taking your recommendation, because that's also what Charter wants and it goes against my moral standards. I'll just keep power cycling for now, wait until AT&T catches up to and matches the price and bandwidth that Charter currently offers, and switch back when they do. Notice I used some all caps words, but stopped short of bolding things and including little winks. Enjoy!
answered 17 Jun '15, 21:00
edited 17 Jun '15, 21:02
A few comments:
ALWAYS take a systematic approach to troubleshooting packet captures, and start with what the user expects to be able to do. Here the troubleshooting seems a bit all over the place. Not only is it technically wrong to look at DNS as a cause of a failed ping to an IP address (for example), but when you go down that rabbit hole you needlessly make a simple problem more complicated. Trust me, you can spend hours chasing red herrings like this if you just start by opening a capture file and looking around.
From the trace file, assuming your topology info is correct and your router not being in the same network as your IP gateway is just a result of packet sanitization, we can say for sure that during the service interruption your router could not ping its IP gateway. THAT is something you can take to your ISP without speculating on a cause, and it's a symptom their first-line support should understand. Further, if you see it on the tap leaving your router, it puts your thoughts about 'network malware' to bed, so you can focus on this single L3 hop without speculating about (uncommon) things you suspect they may be doing beyond their CMTS.
As has been said, there isn't enough information in that trace file to conclude anything. A ping request going unanswered after pings had succeeded in packets 1-2 means that this level of connectivity was lost, but there's no way to say why from this, and no 'tail to be pinned', so to speak, at this stage.
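To make the "request not responded to" check concrete, a minimal sketch that pairs ICMP echo requests with replies by identifier and sequence number. It assumes the ICMP fields have already been decoded (for example, exported from Wireshark); the `Icmp` tuple is a hypothetical representation, not a Wireshark API. As noted above, the result only shows that gateway reachability was lost and from which frame, not why.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

ECHO_REPLY, ECHO_REQUEST = 0, 8


class Icmp(NamedTuple):
    frame: int      # frame number in the capture
    icmp_type: int  # 8 = echo request, 0 = echo reply
    ident: int      # ICMP identifier
    seq: int        # ICMP sequence number


def unanswered_requests(packets: Iterable[Icmp]) -> List[int]:
    """Return frame numbers of echo requests that never got a matching reply."""
    pending: Dict[Tuple[int, int], int] = {}
    for p in packets:
        key = (p.ident, p.seq)
        if p.icmp_type == ECHO_REQUEST:
            pending[key] = p.frame  # wait for a reply with the same ident/seq
        elif p.icmp_type == ECHO_REPLY:
            pending.pop(key, None)  # matched: this request was answered
    return sorted(pending.values())
```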
Having said that, regarding the comments about ISPs not caring because of the math of one customer's monthly rate versus the support cost of a call: I can't speak for all ISPs, but I'd never look at it that way myself. In my experience, if a problem is enough of an outlier not to fit within what the initial support lines can troubleshoot, it WILL reach engineering-type support people who understand what a network tap fairly quickly (provided it really is an outlier, and not a common problem being over-complicated). Even now, in a design/planning role and mainly on the cellular side, I still regularly read through escalated 'one-off' tickets for residential internet service and put an A-game effort into them because 1) they are the squeaky wheels of a larger customer base, and those wheels deserve attention, and 2) one-off cases reported by individuals can often point to bigger problems that few people have reported yet, and I really want to know about those. Many/most network engineering types feel the same way, and the HFC/DOCSIS network admins I know care about every single cable modem.
answered 18 Jun '15, 19:05
WARNING: You may not like the facts in the following text! Continue reading at your own risk :-)
I don't think that ANY (large) ISP out there is interested in a conversation like this with (small) customers (home users) and you won't get the right people on the phone, unless you are a really big customer.
So, what are your options? First of all, they are very, very limited ;-))
Yes, you can try to prove that they are doing something wrong, but this is going to take a looooong time and the chances that they will listen to you are extremely low, because they get hundreds of similar tickets every day, all claiming that the ISP is doing something wrong, and in 99% of the cases it's the user who is doing something wrong. So, for them, you will automatically fall into the 99%.
I've done such investigations in professional environments with business contracts worth several thousand dollars per month, and even there the ISPs claimed it wasn't their fault, regardless of the fact that the proof was right there in front of them. So, good luck with that approach as a "small home user" ;-))
However, I don't want to discourage you. So, if you think you can achieve something, go ahead and post the capture file somewhere, so we can have a look. But, don't invest too much time, as your chances that something will change are not that good. And, after all: it could be your router ;-)
What you could try:
Then, after the next "disconnect", take a look at the ping results.
As you see, it's not that easy to draw meaningful conclusions from ping tests, if you are capturing only on your side. You would need to convince the ISP to do a packet capture in parallel on their side, which is 100% not going to happen for a home user!
answered 10 Jun '15, 03:38
Kurt Knochner ♦
edited 10 Jun '15, 03:40