#1791 - 12/13/06 08:08 AM Constant packet loss on our servers
DavidH Offline


Registered: 12/13/06
Posts: 2
Hi Peter,

We've been using PingPlotter Pro for a while now, and it has helped us solve a myriad of problems. Great tool!
Straight to the point: we are constantly experiencing rather heavy packet loss on our servers, as you can see from the picture. I'm aware of the fact that packet loss on routers is irrelevant as long as it doesn't appear on the destination server (see hop 6 on the graph). However, hop 11 (217.204.92.34; it's actually hop 10 on the graph above) is constantly showing packet loss that's clearly being carried over to our server (82.110.148.130).

We are experiencing the same packet loss from different remote offices (on different ISPs) connecting to our Data Centre once they cross to the Data Centre's ISP (namely router 217.204.92.34). Typical symptoms such as lag, stickiness, and the cursor catching up while typing are present.

Would that signify a problem with that particular router (which our ISP denies)? If not, how come "the red packet loss shape" on our server always mirrors the packet loss shape on their router?

Sorry for the long post.

Many thanks in advance.

David


#1792 - 12/13/06 12:19 PM Re: Constant packet loss on our servers [Re: DavidH]
Pete Ness Offline



Registered: 08/30/99
Posts: 1106
Loc: Boise, Idaho
Hi, David.

It looks like you may be having some route changes too - I say this just because the time graphs are showing the wrong hop numbers. If you're not seeing a regular route change where the route length changes, then please contact us via email so we can troubleshoot why the hop numbers on the time graphs don't match the trace graph.

Moving on from that ... for the rest of this email, when I say "hop x", I mean the hop number shown on the upper graph.

First off, having "Samples to include" set to only 10 makes it a bit tough to have any vision into hops that don't have their time graphs turned on. Setting it to 500 would probably give better vision on the upper graph, since you're only seeing about 5% loss at the final destination (and at 5% loss, there's a pretty good chance we'll see some statistical trends if we look at 500 samples, but not much chance if we only look at 10).
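To put rough numbers on the sample-size point, here is a small Python sketch. It assumes each sample is lost independently with a fixed 5% probability, which is a simplification (real loss is often bursty), and the function name is made up for illustration.

```python
def chance_of_seeing_loss(loss_rate: float, samples: int) -> float:
    """Probability that at least one of `samples` packets is lost,
    assuming independent losses at a fixed rate (a simplification)."""
    return 1.0 - (1.0 - loss_rate) ** samples


for n in (10, 500):
    expected_lost = 0.05 * n  # average number of lost samples in the window
    p_any = chance_of_seeing_loss(0.05, n)
    print(f"{n:>4} samples: expect ~{expected_lost:g} lost, "
          f"P(at least one lost) = {p_any:.3f}")
```

Under that assumption, a 10-sample window averages half a lost packet and shows no loss at all roughly 60% of the time, while a 500-sample window averages about 25 lost packets, which is enough for a pattern to show up.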

Hop 5 is showing packet loss, but as you noted, this isn't translating to downstream hops / routers, or the final destination, so that packet loss doesn't necessarily indicate any problem. This is especially the case because hop 6 looks pretty good (although not perfect).

It looks like there is some router between hop 6 and hop 11 that is the reason for the packet loss (and latency!) at hop 11. Because hop 10 shows this problem, we *know* it's not a problem with hop 11 itself. Based on the information I see here, it could be a problem with hop 7, though, or hop 8 (since hop 8 is showing some packet loss and jitter). You'd have to look at the time graphs for all the hops to find the pattern that matches what you see at the final destination (which it sounds like you've already done).

Since your data is staying on the Easynet network, it's pretty easy to point to their network as the culprit here. The fact that hop 6 looks good and hop 10 does not pretty much eliminates your own network as the problem (I'm assuming that your own equipment is all before hop 2 or after hop 10).

If you were trying to make a compelling case to your ISP in your situation, you'd want to show time graphs for 3 hops: the final destination, the first hop showing problems, and the hop just before the one showing problems. Ideally, if hop 9 looks good, you'd want to show them hops 9, 10 and 11 (good, bad, bad). Sometimes, though, you'll find that the "good" hop is also one that has its own packet loss reporting problems (like hop 5 does). Hop 9 looks pretty good in your case, though, since it has 10 consecutive samples of 8ms.
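The good/bad/bad selection described above can be sketched in Python. This is only an illustration under stated assumptions: the per-hop loss percentages are invented, the 2% threshold is arbitrary, and `first_persistent_loss_hop` is a hypothetical helper, not a PingPlotter feature.

```python
from typing import Optional


def first_persistent_loss_hop(loss_pct: list[float],
                              threshold: float = 2.0) -> Optional[int]:
    """Return the 1-based number of the first hop from which every hop
    through the final destination shows loss >= threshold.
    Returns None if the destination itself is below the threshold."""
    if not loss_pct or loss_pct[-1] < threshold:
        return None
    first = len(loss_pct)
    # Walk backwards from the destination while loss stays at or above threshold.
    for hop in range(len(loss_pct), 0, -1):
        if loss_pct[hop - 1] >= threshold:
            first = hop
        else:
            break
    return first


# Invented numbers for an 11-hop route: hop 5 drops some replies, but the
# loss does not carry downstream; loss from hop 10 onward does.
loss = [0, 0, 0, 0, 12, 0, 0, 1, 0, 6, 5]
bad = first_persistent_loss_hop(loss)
if bad is not None:
    # Last good hop, first bad hop, and the final destination.
    print("hops to show:", sorted({bad - 1, bad, len(loss)} - {0}))
```

With these made-up numbers the sketch picks hops 9, 10 and 11. Loss at hop 5 is ignored because hops 6 through 9 sit below the threshold, which mirrors the reasoning that loss only matters when it carries through to the destination.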

Based on what I see here, this shows problems with Easynet relatively conclusively. I'd want to show hop 9. I'd also want to increase my "Samples to include" to 500 or 1000 (or some other 250+ number that shows a good picture of the problem). I'd also want to focus on a time period that is characteristic of problem periods so the upper graph shows a good summary of the hops for the time in question. I'd also want to tie some other application performance problem to the PingPlotter data. If you have reports of an application problem (say, slow Citrix performance or a dropped VoIP call), you can note that on the time graph by right-clicking and "Create Comment". This correlates the network problems you're feeling with the data in PingPlotter, which makes it harder for your ISP to dismiss that as "It's just traceroute - that doesn't mean anything".

You might also trace the other direction and show that you're seeing packet loss at hop 3 (or whatever the hop is) on all outbound data from the other site.

Here is some reference information you may find useful:

If your ISP says traceroute / PingPlotter isn't a good way to troubleshoot:
http://www.nessoft.com/kb/46

If you want to look at some good examples of problems / troubleshooting (some with significant similarities to your problem):
http://www.pingplotter.com/tutorial/VoipTroubleshooting.html

Let us know if this leaves you with questions.

- Pete

#1793 - 12/14/06 06:14 AM Re: Constant packet loss on our servers [Re: Pete Ness]
DavidH Offline


Registered: 12/13/06
Posts: 2
Many thanks Pete for your time and nice explanation.

Below are the latest graph stats, which confirm the problem on Easynet's side (samples to include set to 1000). The first ISP is Viatel here (different gateway), but there is no problem up to hop 9, where the first red appears. I've also included hop 10, which you were after: no problem there.
Hop 11, though, seems to be the culprit, with quite heavy packet loss (5%). Hop 12 is Easynet's last router (in my original question I incorrectly named it as our server; ICMP was being blocked at the time, so that was the last possible hop).

My last question, Pete, would be:
Hop 12 doesn't seem to be affected by the packet loss on hop 11 (it shows high jitter, though); however, our server (hop 13) suffers from packet loss regardless. Would you still say hop 11 is to blame?

I can confirm that we get very similar graphs from all our remote offices in the UK (last 3 hops constantly in red).

I really appreciate your help, and all the best with the next versions of your great tool.

David


Edited by DavidH (12/14/06 06:19 AM)

#1794 - 12/14/06 12:11 PM Re: Constant packet loss on our servers [Re: DavidH]
Pete Ness Offline



Registered: 08/30/99
Posts: 1106
Loc: Boise, Idaho
This complicates things somewhat, since hop 12 only shows light packet loss. We can't conclusively say that hop 11 is the culprit; in fact, this even argues that the problem may not be hop 11. If your ISP is looking for an "out", then this data won't help your cause much. There are a number of things that could cause this type of picture, though, including problems at hop 11. Usually, however, it's a problem with a router that can only be seen by tracing the reverse direction. Another possibility is that there are frequent route changes that are disrupting the data collection to the final destination.

Do you have control of the far end of things? Can you reverse the trace and run it the other direction? That would probably add some clarity to this situation. If you have control of that site but don't normally work there (or have easy access to a remote computer), you may be able to install the remote agent there so you can run a trace from that side while sitting at another site (http://www.pingplotterpro.com/remote_trace.html).

- Pete
