#1145 - 05/19/04 11:11 AM Understanding route changes
blashley


Registered: 05/19/04
Posts: 4
Loc: Boise, ID
Hi,

I would like to discuss the route changes listed in PingPlotter. Is a high number of route changes in a certain time period bad? I know how to exclude an IP address, but I would rather leave all route changes listed so that I can see how many occur when troubleshooting connection issues with remote clients. I will give an example.

I have a client that keeps getting disconnected from our servers. When I run a trace route, I see extremely high numbers of route changes in the Savvis.net network. Just yesterday, I counted over 1300 route changes in a 5-hour period. Is this normal?

My remote clients use Citrix to connect to our servers, and the web client software is very sensitive to network connectivity to begin with, without also having to contend with numerous route changes. Would this kind of connection be enough to 'simulate' an interrupted connection for a remote client?

I have contacted my ISP, and they assured me that route changes are normal and needed. That was when I was seeing fewer than 100 route changes in an 8-hour period, and the changes were taking place inside Fiberpipe.net's network.

I am running tests to my remote client again this morning, and I have already seen 100+ route changes in the Savvis.net network over a 2-hour period.

Any suggestions would be very helpful. Thank you.

#1146 - 05/19/04 11:37 AM Re: Understanding route changes [Re: blashley]
Pete Ness



Registered: 08/30/99
Posts: 1106
Loc: Boise, Idaho
A route change, by itself, isn't necessarily bad.

*MANY* service providers use load balancing, and a route will oscillate between two or three routers at the same load-balanced hop. This is not a bad thing, and it shouldn't have any effect on the end user experience.

There *can* be interesting relationships between packet loss or latency and route changes, however.

One of the valuable things that the route change information in PingPlotter can tell you is when a route change occurs that causes or fixes packet loss or latency issues. This type of situation might show deteriorating conditions (more packet loss, more latency), and then it suddenly clears up. If you look at the route change history in PingPlotter, you'll see a route change right when things cleared up - someone or something saw that some router was causing problems, and routed data around that router. This is a good thing.

The thing you always want to look for is packet loss or latency at the final hop. If the final hop is showing consistent latency and no packet loss, the route changes you're seeing are probably helping to keep the network performing well. Unless you're seeing big latency variances or packet loss, this is almost certainly not going to cause problems with your Citrix connections.
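As a rough illustration of what "check the final hop" means numerically, here is a minimal sketch that summarizes loss and latency for the destination only. It assumes the final-hop samples are available as a plain list of round-trip times in milliseconds, with None marking a lost packet; that format and the function name are hypothetical, not PingPlotter's own.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def final_hop_summary(samples: Sequence[Optional[float]]) -> dict:
    """Summarize final-hop samples; None marks a lost packet (hypothetical format)."""
    replies = [s for s in samples if s is not None]
    lost = len(samples) - len(replies)
    return {
        # Loss percentage over every sample sent to the final hop
        "loss_pct": 100.0 * lost / len(samples) if samples else 0.0,
        # Latency statistics over the samples that actually got a reply
        "mean_ms": mean(replies) if replies else None,
        "stdev_ms": pstdev(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }


# Hypothetical final-hop round-trip times in ms (None = no reply)
samples = [105.0, 108.0, None, 110.0, 107.0]
summary = final_hop_summary(samples)
print(summary)
```

Consistent latency (a small standard deviation) and near-zero loss at the destination is the "healthy" case described above, regardless of how many route changes occur along the way.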

Most often, a route with an oscillating router like you're probably seeing (a single hop that is shifting between multiple routers) can be safely ignored with a route change mask, which then highlights the route changes that make a real difference. If you're getting 1300 in 5 hours, then these route changes have reached the level of "noise", so you should almost certainly mask the particular oscillating router. Note that this only excludes that specific oscillation - and if a new router is introduced, you'll still be notified of that via the route change panel. The easiest way to create this mask is by right-clicking on the hop in the upper graph and using the add route change mask option there.
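The masking behavior described above (hide oscillation among known routers, but still report a new router) can be sketched like this. The route history is assumed to be hypothetical (timestamp, hop, router IP) records, and the IPs are documentation addresses; this illustrates the idea only and is not PingPlotter's actual implementation or file format.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical route history record: (timestamp_seconds, hop_number, router_ip)
RouteObs = Tuple[float, int, str]


def route_changes(history: List[RouteObs], masks: Dict[int, Set[str]]) -> List[RouteObs]:
    """Return observations where a hop's router changed, ignoring changes that
    stay inside a masked set of known load-balanced routers for that hop."""
    last: Dict[int, str] = {}
    changes: List[RouteObs] = []
    for ts, hop, ip in sorted(history):
        prev = last.get(hop)
        if prev is not None and prev != ip:
            allowed = masks.get(hop, set())
            # The mask only hides oscillation between routers already in the set;
            # a router outside the set is still reported as a route change.
            if not (prev in allowed and ip in allowed):
                changes.append((ts, hop, ip))
        last[hop] = ip
    return changes


# Hop 7 oscillates between two load-balanced routers, then a new router appears.
history = [
    (0.0, 7, "192.0.2.1"),
    (1.0, 7, "192.0.2.2"),
    (2.0, 7, "192.0.2.1"),
    (3.0, 7, "192.0.2.9"),
]
unmasked = route_changes(history, {})
masked = route_changes(history, {7: {"192.0.2.1", "192.0.2.2"}})
print(len(unmasked), masked)
```

Without the mask, every oscillation counts as a change; with it, only the switch to the unexpected router at hop 7 is reported.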

The importance of starting your troubleshooting efforts by reviewing the final destination is discussed in our knowledge base: http://www.nessoft.com/kb/24

This is a great discussion topic - thanks for asking this question. Please feel free to continue the discussion if you think the route oscillation is causing packet loss or latency variances, or if you still think that the route changes are impacting the end user experience. We'll be happy to do what we can to help.

#1147 - 05/19/04 12:25 PM Re: Understanding route changes [Re: Pete Ness]
blashley


Registered: 05/19/04
Posts: 4
Loc: Boise, ID
Thank you Pete.

I have started a new trace route to the oscillating IP addresses. I set up the option to mask the IP so that it doesn't show in the main trace route to my remote client. When I trace route the 2 different oscillating IP addresses, I am seeing packet loss at the hop just before these addresses. It isn't a great amount of PL, about 5% at the most, but I do not see this loss when running the trace route to the remote client.

The packet loss seems to be random. Sometimes it is within Fiberpipe.net, and then the next 5 hops are fine. The next 4 hops have packet loss, and then the next 2 hops are fine. Finally, I will have packet loss at the hop just before the destination hop. Now, this is only occurring when I trace route the oscillating IP addresses within the Savvis.net network.

If numerous route changes would not cause an interruption in the connection, then I need to start troubleshooting the issue at the remote site. It is only affecting 2 clients, and they are experiencing the same symptom: disconnects. They have separate IP addresses, but both trace routes to separate locations show numerous route changes. My main concern was the route changes and how they affect network connectivity. I need to eliminate all possibilities. ;)

Thanks again.

#1148 - 05/19/04 01:41 PM Re: Understanding route changes [Re: blashley]
Pete Ness



Registered: 08/30/99
Posts: 1106
Loc: Boise, Idaho
Quote:
I have started a new trace route to the oscillating IP addresses. I set up the option to mask the IP so that it doesn't show in the main trace route to my remote client.


I might be misunderstanding what you're using for a trace target here, but routers should generally *not* be targets for PingPlotter. Since you never actually send application data to routers (just through routers, based on the routing tables of providers), the data you collect by targeting a router is generally discounted (with reason) by anyone you might send troubleshooting data to. You really want to target a final destination that you're using, and look for packet loss and latency patterns there.

Quote:
The packet loss seems to be random. Sometimes it is within Fiberpipe.net, and then the next 5 hops are fine. The next 4 hops have packet loss, and then the next 2 hops are fine. Finally, I will have packet loss at the hop just before the destination hop. Now, this is only occurring when I trace route the oscillating IP addresses within the Savvis.net network.


The pattern you describe here can often be correlated with a problem in the *return* route. Unfortunately, the internet protocols we have available to us today don't give us an opportunity to investigate the return route. Each router along the way has its own routing table to determine which route it sends packets down. In *many* instances, an intermediate router will send its replies back on a different route than the final destination uses. When this occurs, you'll often get results similar to yours. An additional complication is that some organizations set ICMP priorities such that a router (or group of routers) will drop packets - sometimes on purpose, sometimes because of priority issues. It's relatively difficult to say exactly what is causing this without tracing the return route from the router in question.

Since this is only occurring when you target one of the oscillating IP addresses, you should probably just ignore that data and move back to tracing the final destination. Internal routers often respond poorly to ICMP requests, even when they pass data through just fine.
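One common rule of thumb behind this advice: loss that really affects forwarded traffic should show up at every later hop, including the destination. A hop that shows more loss than the final hop is more likely deprioritizing or dropping its own ICMP replies (or sending them back on a troubled return path). The sketch below flags such hops; the per-hop loss list and the tolerance value are hypothetical inputs chosen for illustration, not PingPlotter settings.

```python
from typing import List


def misleading_hops(loss_by_hop: List[float], tolerance: float = 0.5) -> List[int]:
    """Return 1-based hop numbers whose loss (in percent) exceeds the final hop's
    loss by more than `tolerance`, i.e. loss that is not carried through to the
    destination and so probably reflects how that router answers probes."""
    final = loss_by_hop[-1]
    return [i + 1 for i, loss in enumerate(loss_by_hop[:-1]) if loss > final + tolerance]


# Hypothetical per-hop loss percentages; the last entry is the destination.
loss = [0.0, 5.0, 0.0, 0.0, 3.0, 0.0]
print(misleading_hops(loss))
```

If the destination itself showed the same 5% loss, none of the earlier hops would be flagged, because the loss would then be carried all the way through.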

Quote:
If numerous route changes would not cause an interruption in the connection, then I need to start troubleshooting the issue at the remote site. It is only affecting 2 clients, and they are experiencing the same symptom: disconnects. They have separate IP addresses, but both trace routes to separate locations show numerous route changes. My main concern was the route changes and how they affect network connectivity. I need to eliminate all possibilities. ;)


I suspect that you're tracing from your location back to your clients. Although this can often be an extremely useful technique, at other times you have to trace directly from the PC experiencing problems out to the target it's having problems with.

One technique that customers in your situation have had significant success with is to have their own customer run PingPlotter for the course of a day. When the customer experiences an outage, they go to PingPlotter and add a comment by right-clicking the period in the time graph where the outage happened and entering something like "lost connection here". The 3-hour graph works pretty well for this, as it only shows 3 hours and gives them enough detail to pick the right time period if they don't create a comment immediately.

If they experience outages during the day, then at the end of the day (or at some other point) have them save the data in PingPlotter and email the resulting .pp2 file to you (it's already compressed, so there's no need to .zip it up first unless you have email filter reasons for doing so). Once you get the file, you can try to correlate packet loss or latency numbers with their noted outages, and try to determine which hop is causing the problem. In the vast majority of similar cases, you'll see that the problem exists in their network, in their connection to the internet, or someplace *very* close to this point. The information from PingPlotter will help you pinpoint which hop, and from there you can make suggestions about contacting their ISP, checking for excessive bandwidth usage on their network (see here for a discussion on that), replacing some piece of internal hardware, or some other action, depending on where their problem originates.
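The correlation step can be pictured as "look at per-hop loss in a window around each noted outage". This is a minimal sketch assuming the samples have been exported into a hypothetical list of (time, {hop: latency or None}) records; it does not read .pp2 files, and the record layout and window size are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical per-sample record: (sample time, {hop_number: latency_ms, None if lost})
Sample = Tuple[datetime, Dict[int, Optional[float]]]


def loss_around(samples: List[Sample], outage: datetime, window: timedelta) -> Dict[int, float]:
    """Packet loss percentage per hop for samples within +/- window of a noted outage."""
    nearby = [hops for t, hops in samples if abs(t - outage) <= window]
    totals: Dict[int, int] = {}
    lost: Dict[int, int] = {}
    for hops in nearby:
        for hop, latency in hops.items():
            totals[hop] = totals.get(hop, 0) + 1
            if latency is None:
                lost[hop] = lost.get(hop, 0) + 1
    return {hop: 100.0 * lost.get(hop, 0) / n for hop, n in sorted(totals.items())}


# Made-up data: one sample per minute; hop 2 drops packets from minute 4 to 6.
base = datetime(2004, 5, 20, 14, 0)
samples = [
    (base + timedelta(minutes=m), {1: 1.0, 2: None if 4 <= m <= 6 else 20.0})
    for m in range(10)
]
# Customer noted "lost connection here" at 14:05; look 2 minutes either side.
result = loss_around(samples, base + timedelta(minutes=5), timedelta(minutes=2))
print(result)
```

The first hop whose loss rises around the noted outage (and stays elevated for the hops after it) is the natural place to start looking.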

- Pete

#1149 - 05/20/04 02:44 PM Re: Understanding route changes [Re: Pete Ness]
blashley


Registered: 05/19/04
Posts: 4
Loc: Boise, ID
OK, let me see if I can explain this without too much confusion. My remote client has PingPlotter installed. I had them run a trace route to the server here in my office, and I asked them to save a sample of the data every time they got the disconnect message. I have received 4 different samples so far, and I am having a very difficult time determining exactly what this data is telling me.

The data samples I have show an excellent connection, with average response times of 107 ms. There are 27 hops between the remote client and the server. I do see some spikes up to 500 ms, but the average times are around the 110 mark, which tells me that the spikes do not last very long or are not very frequent. At first glance, I would say there isn't a problem with this client.
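A quick arithmetic check shows why an average near 110 ms is consistent with occasional 500 ms spikes: a handful of large values barely moves the mean of a long trace. The numbers below are made up for illustration and are not taken from the attached data.

```python
from statistics import mean

# Hypothetical trace: 98 samples at 105 ms plus two 500 ms spikes
latencies = [105.0] * 98 + [500.0] * 2

average = mean(latencies)  # (98 * 105 + 2 * 500) / 100
spikes = [x for x in latencies if x >= 500.0]

print(f"average {average:.1f} ms, max {max(latencies):.0f} ms, {len(spikes)} spikes")
```

The average only rises from 105 ms to about 113 ms, so looking at the maximum and the number of spikes says more about short disruptions than the average does.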

Now for the mysterious part...

The graph at the bottom of the software has a strange gray area that is just blank. The graph shows a red space whenever packet loss has occurred anywhere along the path. This graph does not show any PL. It only shows a 'break' in the black lifeline, if you will, of the connection. If I were to give my opinion, I would say there was a break in the connection altogether.

Are these gray blank areas route changes?

FYI, this graph is labeled with the destination IP address.

Thank you.


Attachments
1164-2 209-161-16-209.payrollamerica.com1.png



#1150 - 05/20/04 11:02 PM Re: Understanding route changes [Re: blashley]
Pete Ness



Registered: 08/30/99
Posts: 1106
Loc: Boise, Idaho
The blank spot in the graph indicates that no data was being collected during that period. This happens if the trace is stopped, and then resumed, or if the data is saved, then PingPlotter is closed and reloaded.
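Since a blank spot means "no samples were recorded", gaps can be found by looking for consecutive sample timestamps that are much further apart than the trace interval. This sketch assumes a hypothetical list of sample times in seconds and a 2.5-second interval; the interval and the 1.5x slack factor are illustrative assumptions.

```python
from typing import List, Tuple


def find_gaps(timestamps: List[float], interval: float, slack: float = 1.5) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive samples are more than
    slack * interval seconds apart, i.e. periods with no data collected."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > slack * interval]


# Hypothetical sample times: the trace was stopped after 7.5 s and resumed at 60 s.
stamps = [0.0, 2.5, 5.0, 7.5, 60.0, 62.5]
print(find_gaps(stamps, interval=2.5))
```

A gap that overlaps a reported disconnect means the trace simply wasn't running at that moment, so it can't say anything about that outage.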

The ideal situation is really to have a data collection period running all day, and then to have the customer make note of the times of outages. If they save the data to then send you, you want to make sure they keep tracing - they might be stopping it to send data to you, and then resuming during the next outage (and possibly missing the next outage). You want to make sure they were collecting data during the period of the outage.

The periods of 500 ms might be causing problems with Citrix. I've heard reports of that - 500+ ms latencies can cause disconnects in Citrix.
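To line those high-latency periods up with the disconnect times, it helps to group consecutive slow samples into episodes. This is a minimal sketch assuming time-ordered (timestamp, latency) pairs for the final hop; the 500 ms threshold comes from the anecdotal reports mentioned above, not from any documented Citrix limit.

```python
from typing import List, Optional, Tuple


def high_latency_episodes(samples: List[Tuple[float, float]],
                          threshold_ms: float = 500.0) -> List[Tuple[float, float]]:
    """Group consecutive samples at or above threshold_ms into (start, end) episodes.
    `samples` are (timestamp_seconds, latency_ms) pairs in time order."""
    episodes: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last: float = 0.0
    for ts, latency in samples:
        if latency >= threshold_ms:
            if start is None:
                start = ts  # a new slow episode begins
            last = ts
        elif start is not None:
            episodes.append((start, last))  # the episode ended on the previous sample
            start = None
    if start is not None:
        episodes.append((start, last))  # trace ended while still slow
    return episodes


# Hypothetical final-hop samples taken every 2.5 seconds
trace = [(0.0, 110.0), (2.5, 520.0), (5.0, 610.0), (7.5, 108.0), (10.0, 505.0), (12.5, 107.0)]
print(high_latency_episodes(trace))
```

Episodes that repeatedly start just before a noted disconnect would support the latency theory. Disconnects with no nearby episode point elsewhere.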

If you want to email a copy of the data files to support@pingplotter.com, I'll have a look at them.

#1151 - 06/08/04 11:58 AM Re: Understanding route changes [Re: Pete Ness]
blashley


Registered: 05/19/04
Posts: 4
Loc: Boise, ID
Thank you, Pete. I have another question. Does PingPlotter record any TCP/IP or UDP errors when tracing (e.g., the ones shown by netstat -e -s)? If so, are these counted as packet loss? I am still at a loss trying to figure out this disconnect issue.

#1152 - 06/08/04 01:39 PM Re: Understanding route changes [Re: blashley]
Pete Ness



Registered: 08/30/99
Posts: 1106
Loc: Boise, Idaho
Any TCP/IP or UDP errors that impacted PingPlotter's data collection will be shown as lost packets in PingPlotter. There are a few numbers in Netstat that have a direct relationship to PingPlotter data.

Destination Unreachable - in UDP mode, this will increment when a probe reaches the final target.

Time Exceeded - intermediate hops reply with Time Exceeded packets, so both ICMP and UDP traces will increment this number pretty quickly.

Echos - Sent - in ICMP mode, each sample increments this once for each hop, so PingPlotter will change this number relatively rapidly. This is the number *sent*, so it doesn't really mean anything for stability / connection.

There is no direct relationship between "Errors" and PingPlotter data, although if an error happens while sending a PingPlotter packet, it will be captured in PingPlotter.
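The relationships above follow from the ICMP message types that traceroute-style probing normally produces (type numbers as defined for ICMPv4). The sketch below maps a probe mode and hop position to the reply you would expect; it models standard traceroute behavior in general and is not a description of PingPlotter's internals.

```python
from enum import IntEnum


class IcmpType(IntEnum):
    ECHO_REPLY = 0         # final target answering an ICMP Echo probe
    DEST_UNREACHABLE = 3   # code 3 (port unreachable) ends a UDP trace
    ECHO_REQUEST = 8       # what ICMP-mode probes are sent as ("Echos - Sent")
    TIME_EXCEEDED = 11     # TTL expired at an intermediate hop


def expected_reply(mode: str, is_final_hop: bool) -> IcmpType:
    """ICMP message a traceroute-style probe normally elicits.
    mode is "icmp" (Echo Request probes) or "udp" (probes to an unused high port)."""
    if not is_final_hop:
        return IcmpType.TIME_EXCEEDED
    if mode == "icmp":
        return IcmpType.ECHO_REPLY
    if mode == "udp":
        return IcmpType.DEST_UNREACHABLE
    raise ValueError(f"unknown mode: {mode}")


print(expected_reply("udp", True).name, expected_reply("icmp", False).name)
```

This is why Time Exceeded climbs quickly in either mode, while Destination Unreachable only moves in UDP mode when the final target answers.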

I would suggest trying to correlate the real-life results with PingPlotter data. When a disconnect happens, go to PingPlotter, right-click the lower graph at the time point where the disconnect occurred, and then record "Service got disconnected, couldn't reconnect for 30 seconds", or "Painfully slow performance - lasted 10 minutes", or whatever symptom is being observed. After recording a few of these, examine the PingPlotter data for anything that might have caused these symptoms. If PingPlotter is showing solid latencies and no packet loss, then the problem might be something that PingPlotter can't help capture (i.e., application problems, network problems specific to the application in use, or similar).
