I did some analysis of your data and have attached a picture that captures a few separate problems.

First, on 11/9 at 8:37pm, your router locked up. You made note of that (nice work!) and it's pretty obvious because hop 1 stopped responding. Since hop 1 is entirely on your network, packet loss at hop 1 is localized to hardware under your direct control. Note that there was some packet loss in the period before the router locked up. I'm not sure why it locked up, but for the period before lockup, hop 1 showed quite a bit of packet loss that did not affect downstream hops. This means that your router was not always answering packets whose TTL expired there (the ICMP "time exceeded" replies that make hop 1 show up in the trace), but was still passing everything else downstream. It's an interesting pattern, but isn't the problem you were asking about, so I won't spend any more time on this.
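If it helps to see the "loss that doesn't affect downstream hops" rule spelled out, here is a rough Python sketch. The function name, the tolerance, and the loss percentages are all made up for illustration; they are not taken from your data or from any tool.

```python
def loss_carries_downstream(loss_pct, hop_index, tolerance=1.0):
    """Return True if the loss seen at hop_index also shows up at every later hop.

    loss_pct is a list of per-hop packet loss percentages (hop 1 first).
    tolerance is a hypothetical allowance for normal sample noise.
    """
    hop_loss = loss_pct[hop_index]
    later_hops = loss_pct[hop_index + 1:]
    if hop_loss <= tolerance:
        return False  # nothing worth calling loss at this hop
    # Real loss on the path must be visible at every hop behind it.
    return all(later >= hop_loss - tolerance for later in later_hops)


# Made-up numbers: hop 1 misses 30% of replies, later hops are clean.
before_lockup = [30.0, 0.0, 0.0, 0.0, 0.0]
# Made-up numbers: hop 1 is dead, so nothing behind it answers either.
after_lockup = [100.0, 100.0, 100.0, 100.0, 100.0]

print(loss_carries_downstream(before_lockup, 0))  # False
print(loss_carries_downstream(after_lockup, 0))   # True
```

The first case is the "router is just not answering some expired-TTL packets" pattern, and the second is the lockup itself.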

There is a pretty bad latency curve, along with some packet loss, from the beginning of the graph all the way through about 3:00am on the 10th. I focused the upper graph on this period because it's relatively interesting. Hop 4 is nice and solid, while hop 5 has packet loss and latency - abnormally so.

One possible reason for this is that there's not enough bandwidth to service all the customers. Hop 5 is probably a peering link, and the bandwidth between hop 4 and hop 5 is probably something that your ISP has to pay for. It looks like there's not enough.
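The way I picked out the hop 4 to hop 5 link can be sketched like this: look for the first hop where latency jumps and then stays high for every hop after it. This is only an illustration. The function name, the 30 ms jump threshold, and the latency values are hypothetical, not read from your graph.

```python
def first_problem_hop(latency_ms, jump_ms=30.0):
    """Return the 1-based hop number where latency first jumps and stays high.

    latency_ms is a list of average latencies per hop (hop 1 first).
    jump_ms is a hypothetical threshold for what counts as a jump.
    Returns None if no hop shows a jump that persists to the end.
    """
    for i in range(1, len(latency_ms)):
        baseline = latency_ms[i - 1]
        jumped = latency_ms[i] - baseline > jump_ms
        # The increase has to carry through to the final hop to matter.
        persists = all(later - baseline > jump_ms for later in latency_ms[i:])
        if jumped and persists:
            return i + 1
    return None


# Made-up averages: hops 1 to 4 are solid, hop 5 onward is slow.
sample = [1.0, 8.0, 10.0, 12.0, 95.0, 97.0, 99.0]
print(first_problem_hop(sample))  # 5
```

With a pattern like that, the link between the last good hop and the first bad one is the place to look.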

The second and third packet loss periods you noted both start with the first hop inside your ISP (a private address, which is a bit odd!). They may have had equipment problems and been working on them (although the outage at 6:00 pm is pretty long). I can't really speculate on what caused that - you'll need to have your ISP explain that to you.

At roughly 12:45 pm, two more hops were inserted between hops 4 and 5. Maybe your ISP decided to insert some monitoring devices, or maybe they reconfigured some of their hardware to decrement the TTL on packets. Again, it's hard to speculate what happened. There really wasn't anything odd happening before those two hops were inserted.

I would definitely be looking at your ISP here for a solution. The long packet loss periods are almost certainly their doing, as is the latency problem.


Attachments
1547-64.85.65.229-1.png




Edited by Pete Ness (11/09/05 11:48 PM)