#1537 - 11/11/05 04:24 PM Latency diagnostic request: proxad routers
Anonymous
Unregistered


Hello,

Just discovering your program and loving it. Fabulous docs, too.

In the previous thread, you point to Haynsey's ISP as the cause for service interruption. The graph is fairly explicit in that respect. Over here in France, we've been having connection trouble for a few weeks, and trying hard to locate the exact source of latency/packet loss on the network, since our ISP hasn't been communicating much on the problem.

A couple of screenshots:





From another user, same thing:



Reverse traceroute (from http://www.eu.org/cgi-bin/nph-traceroute?212.27.34.254):
Quote:
traceroute to 212.27.34.254 (212.27.34.254), 64 hops max, 44 byte packets
1 giga-2.enst.fr (137.194.2.254) 0.266 ms 0.181 ms 0.174 ms
2 gw-enst-free.enst.fr (137.194.4.253) 0.488 ms 0.927 ms 0.539 ms
3 gw-free-th2.enst.fr (137.194.4.2) 1.984 ms * 3.914 ms
4 * * *
5 bzn-6k-2-po2.intf.routers.proxad.net (212.27.56.6) 4.163 ms * 4.217 ms
6 * * *
7 * * *
8 * * *
9 * * *, etc.
64 * * *
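
For anyone who wants to tally probe loss per hop from a traceroute like the one quoted above, here is a minimal Python sketch. It assumes the classic three-probe text format shown in the quote; the function name `probe_loss` is made up for illustration.

```python
import re

def probe_loss(line):
    """Return (hop, lost, total) for one traceroute output line, or None
    if the line does not start with a hop number (e.g. the header line)."""
    m = re.match(r"\s*(\d+)\s+(.*)", line)
    if not m:
        return None
    hop, rest = int(m.group(1)), m.group(2)
    # A "*" standing on its own marks a probe that got no reply.
    lost = len(re.findall(r"(?<!\S)\*", rest))
    # Every "<number> ms" is a probe that was answered.
    answered = len(re.findall(r"[\d.]+ ms", rest))
    return hop, lost, lost + answered
```

Applied to hop 3 of the quote, this reports one lost probe out of three; applied to hop 4, three out of three. Keep in mind that a hop that does not answer probes is not necessarily dropping the traffic it forwards.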

It is believed that distributed QoS is being performed on DSL connections without fixed IP addresses, and that high ports are being blocked. My main issue is with SSH terminal sessions and SFTP, which do not seem prone to disconnections but are very unresponsive: I sometimes have to wait 20-30 seconds before characters typed at the keyboard are sent and echoed back. Others report trouble with VPN sessions, and many others with lag and disconnections in online games. For anything other than regular web browsing and mail access, the result is, um, consistently erratic.

Thanks for taking a look.
Eric

#1538 - 11/11/05 04:40 PM Re: Latency diagnostic request: proxad routers
Pete Ness Offline



Registered: 08/30/99
Posts: 1106
Loc: Boise, Idaho
Hi, Eric.

There are a ton of lost packets and a lot of latency at the intermediate hops, but those hops are passing downstream data just fine. That means you pretty much have to ignore these hops in your analysis (or, at a minimum, ignore the packet loss and latency that they show all the time).

We talk about this a little bit here:

http://www.nessoft.com/kb/24

and a bit more obliquely here:

http://www.nessoft.com/kb/46

Ideally, what you're looking to do is correlate what PingPlotter shows with a performance problem in some other application (e.g. SSH / SFTP / VPN). Most of these protocols use TCP, so you might set up PingPlotter's TCP packet type with the port of the application you're using (e.g. port 22 for SSH) and then run PingPlotter while you're using that application.
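
If you want a quick cross-check outside PingPlotter, a minimal Python sketch like the one below times a TCP handshake to the application's port. This is not how PingPlotter works internally: it only measures the end-to-end connect time, with no per-hop detail, and the function name `tcp_connect_ms` is hypothetical.

```python
import socket
import time

def tcp_connect_ms(host, port, timeout=5.0):
    """Time one TCP handshake to host:port.
    Returns milliseconds, or None if the connection fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None
```

Calling it in a loop with your server's host name and the port your application uses, and logging the result with a timestamp, gives you a rough record to line up against the moments when the application feels slow.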

When you experience slowness, go to PingPlotter and make a note of it with comments (or, if you have the time, you might analyse things right then).

What you're trying to do is determine if there's a pattern, recognizable in the PingPlotter graphs, that happens when performance problems occur. If you can establish this pattern, then you might be able to determine where it first occurs (i.e. which hop or combination of hops) - which can help you understand what's causing the problem.

The huge packet loss numbers you're constantly seeing at the intermediate hops are not causing the problem - since that occurs all the time, even when performance is OK. The fact that those hops respond poorly makes it a bit more difficult to troubleshoot, but certainly not impossible.

Look for patterns in the final destination - packet loss periods, latency. Put comments on them when they happen. Then, go back and see if you can see things that happened in common at the times when you're seeing problems. Once you've identified that, use the route information and turn on some of the other time graphs to find the first hop where it occurs. That's a likely area where you'd start with phone calls / emails for more help.
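
The "find the first hop where it occurs" step can be sketched in Python as below. The data layout (a dict of hop number to timestamped latencies, with None for a lost packet) and the function names are assumptions for illustration, not PingPlotter's export format. Hops that look bad all the time are skipped, since they don't correlate with the problem.

```python
def is_bad(latency_ms, threshold_ms):
    """A sample is bad if the packet was lost (None) or too slow."""
    return latency_ms is None or latency_ms > threshold_ms

def first_correlated_hop(samples, problem_times, threshold_ms):
    """samples: {hop: {timestamp: latency_ms or None}}.
    Return the lowest hop that is bad at every problem time and fine
    at all other times, or None if no hop fits that pattern."""
    problem_times = set(problem_times)
    for hop in sorted(samples):
        at_problem = [is_bad(v, threshold_ms)
                      for t, v in samples[hop].items() if t in problem_times]
        otherwise = [is_bad(v, threshold_ms)
                     for t, v in samples[hop].items() if t not in problem_times]
        if at_problem and all(at_problem) and not any(otherwise):
            return hop
    return None
```

Real data is noisier than this, so in practice you would loosen the "every" and "all" conditions to percentages, but the idea is the same: the hop has to be bad when the application is bad, and good when it is good.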

- Pete

#1539 - 11/12/05 09:45 AM Re: Latency diagnostic request: proxad routers [Re: Pete Ness]
Anonymous
Unregistered


Hello, and thanks for the prompt reply!

Opening an SSH session on my server's port (set to 6***, not the standard 22), I immediately see erratic behaviour: one time it takes 15 seconds for the "ls -la" command to appear in PuTTY, the next time it shows up instantly (it's not the server: load is low and memory usage is within limits).

So I fire up PP and go to the options to trace TCP packets on port 6*** (leaving all other values at default). All I see is: Destination address unreachable (100% packet loss) right after the router's address (192.168.0.254). Ethereal shows that 192.168.0.254 sends "Time to live exceeded" ICMP messages back to 192.168.0.2, even when I take down the firewall. I must not be using the right settings.

When I stop the trace in PP, Ethereal reports packets out of order, retransmissions and many TCP Dup ACK while SSH commands are sent and acknowledged.

#1540 - 11/12/05 12:27 PM Re: Latency diagnostic request: proxad routers
Pete Ness Offline



Hi, Eric.

Try changing your packet size to 40 bytes and see if that changes anything.

The default 56 byte packet size in PingPlotter means we add some additional "cargo" padded on the end of a packet - and that cargo isn't really an official part of the TCP SYN packet spec.
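
To make the arithmetic concrete, here is a small Python sketch. It assumes the packet size setting counts the IPv4 and TCP headers (which fits 40 bytes being the size with no cargo) and that neither header carries options; the names are made up for illustration.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options
TCP_HEADER_BYTES = 20   # minimum TCP header, no options

def padding_bytes(packet_size):
    """Bytes of extra "cargo" in a SYN packet of the given total size."""
    return packet_size - (IPV4_HEADER_BYTES + TCP_HEADER_BYTES)
```

Under those assumptions, `padding_bytes(56)` is 16 and `padding_bytes(40)` is 0, so a 40 byte packet is a bare SYN with nothing added.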

ICMP TTL Exceeded packets are expected - that's how we know about 192.168.0.254. If things are working right, you'll see a bunch of ICMP TTL Exceeded packets from the entire route (except for the final destination, where we'll see a TCP SYN-ACK packet).

- Pete

#1541 - 11/13/05 06:45 PM Re: Latency diagnostic request: proxad routers [Re: Pete Ness]
Anonymous
Unregistered


Cool, that works: I can see useful data with the proper port and 40-byte packets.

1.

Here's what I get when trying to synch and download from an FTP server in Eclipse's FTP perspective, on small PHP files:



Most of the time I get incomplete synchronization data, and as seen here, I often have to make several attempts in order to download all the needed files.

Do I need to graph more points for the trace to be significant, or is this sufficient to pinpoint the router that's causing packet loss?

Thanks,
Eric

#1542 - 11/13/05 06:52 PM Re: Latency diagnostic request: proxad routers
Pete Ness Offline



That's probably enough, but the route changes are making it difficult to see the intermediate hops very well. Can you send the save file (.pp2) to support@pingplotter.com so we can do a bit of analysis on it?

Otherwise, try combining the routes into a single one, where possible. To do this, select the routes you want to combine in the route change window. The hops that are changing will be highlighted. Right-click on those hops and choose "Add route change mask", which will combine those routers.

Of course, combining routes might hide important information. Send a .pp2 file if you can and we'll see if we can extract some good screen shots (we can mask the target server name before we post anything, or we can just interact over email and you can post a reply back here with the findings, if you'd prefer to vet anything we post with your data).

- Pete

#1543 - 11/14/05 09:56 AM Re: Latency diagnostic request: proxad routers [Re: Pete Ness]
Anonymous
Unregistered


Duh, forgot to save the data file...

Ran another FTP session, and sent you the data by mail at the given address. Quick graph from this second Eclipse FTP session:

2.



I'm also sending you data on latencies and disconnections to an online game server (port 3004), which might show complementary patterns and provide more data points.

Thanks,
Eric

#1544 - 11/14/05 12:11 PM Re: Latency diagnostic request: proxad routers
Pete Ness Offline



The route changes we're seeing in your data indicate that there is a router somewhere in your path that doesn't like multiple outstanding requests. There are about 6 routers that just "disappear", which is a sure sign that the data we're collecting is wrong. That doesn't help us troubleshoot much! Here's a knowledgebase article on what I suspect is happening:

http://www.nessoft.com/kb/22

Unfortunately, the TCP packet type doesn't respect some of these settings as much as we'd like. In particular, all packets use the same thread, so changing the number of threads doesn't change the number of outstanding requests. The only way you can control outstanding requests is to "pace" the outgoing requests.

Try changing the setting "Time interval between hop traces" to something on the order of "350", and then rerun the collection. You'll need to have a trace interval of 5 seconds or longer (which will support roughly a 15 hop length).
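
The relationship between the two settings can be sketched in Python as below. It assumes the "350" is in milliseconds and that one hop is probed per pacing interval; the function name is hypothetical.

```python
def max_paced_hops(trace_interval_ms, hop_interval_ms):
    """Number of hop probes that fit in one trace interval when one
    probe is sent every hop_interval_ms."""
    return trace_interval_ms // hop_interval_ms
```

With those assumptions, `max_paced_hops(5000, 350)` gives 14, which is where the "roughly 15 hops" figure comes from. A longer route needs a longer trace interval or a shorter pacing interval.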

Collect data like that and send us the results - we'll see if things change in a way that we can do anything else to improve the quality of the results. Once we get results that seem relatively real, we can try and build a case to communicate that information to whoever might be able to help solve things.

- Pete

#1545 - 11/17/05 09:45 AM Re: Latency diagnostic request: proxad routers [Re: Pete Ness]
Anonymous
Unregistered


Hello again,

Did as you suggested. FTP seems to be a little better today, although it left two temporary upload files (.ns...) on the server:



Also did a connection test to a gaming area, could not log in:



Am also sending .pp2 data via email.
Eric

#1546 - 12/19/05 10:39 AM Re: Latency diagnostic request: proxad routers
Anonymous
Unregistered


Hello again,

Just to let you know this problem is (still!) affecting many users. Apparently it only affects IP/DSL lines run by a different operator than the ISP we connect to (meaning the phone line is run by one company, and DSL access by another; there's probably a technical term for that but I don't know what it is...). Our ISP has been low on bandwidth for some time now on those lines run by a third party, and appears to be conducting QoS (more precisely, bandwidth prioritization) on non-standard ports until the extra bandwidth is received. The web, mail, etc. are running at reasonable speed, while access to ports > 1024 is problematic, with the latencies and packet loss shown on the graphs provided earlier.

Thanks for your earlier responses. Although I'm still far from being an expert at building and reading these graphs, I learned a few things in the process :-)

Best,
Eric
