#1893 - 07/19/07 04:17 AM Data Analysis - Help!
Roflcopter


Registered: 06/29/07
Posts: 6
Hi All,

I've been logging data from one of our customers over the last couple of weeks. We have two PingPlotter (PP) traces running at 2.5-second intervals: one from the customer's site to our servers and the other from a server in our datacentre to the customer's site.

The problem is that the customer is quite often reporting that our application is performing slowly, when we're not seeing the same behaviour ourselves. I'm really at a loss as to where the problem might lie. I'll attach the two trace images in a sec...

Thanks in advance for any advice that can be given!

Chris.

#1894 - 07/19/07 04:18 AM Re: Data Analysis - Help! [Re: Roflcopter]
Roflcopter
Attached is the data tracing the route from our customer to our servers.


Attachments
1929-To Servers.png



#1895 - 07/19/07 04:19 AM Re: Data Analysis - Help! [Re: Roflcopter]
Roflcopter
And here is the trace from our servers to the customer... notice the sudden 100% packet loss on two of the hops - what's going on there!?


Attachments
1930-From servers.png




Edited by Roflcopter (07/19/07 04:20 AM)

#1896 - 07/19/07 02:13 PM Re: Data Analysis - Help! [Re: Roflcopter]
Pete Ness



Registered: 08/30/99
Posts: 1106
Loc: Boise, Idaho
Ideally, you'd want to have your customer record the points when they have performance problems. PingPlotter lets you do this by right-clicking on a time graph at the point where you had a problem, then using the "Add Comment" menu option. This gives you a great way of comparing collected PingPlotter data with the customer experience, and it also makes your customer think about the problem a bit as it's happening, which sometimes makes the problem description a bit more useful than "It's slow sometimes!". Alternatively, if you can have them write down the problem times and send them to you, you can look at PingPlotter at those times and see whether it's a network problem that PingPlotter is capturing.
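Once the customer has sent problem times, the check boils down to pulling out the samples around each reported time and summarizing them. The sketch below assumes the final-hop samples have already been exported into a hypothetical list of `Sample(time, rtt_ms)` records (this is not a PingPlotter file format), with `rtt_ms` set to `None` for a lost packet. The ±5-minute window is also an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Sample:
    """Hypothetical record for one final-hop sample: send time and
    round-trip time in ms, or None if the packet was lost."""
    time: datetime
    rtt_ms: Optional[float]


def summarize_around(samples, reported: datetime, window=timedelta(minutes=5)):
    """Summarize the samples within +/- window of a customer-reported problem time.
    Returns None if no samples fall inside the window."""
    nearby = [s for s in samples if abs(s.time - reported) <= window]
    if not nearby:
        return None
    replies = [s.rtt_ms for s in nearby if s.rtt_ms is not None]
    loss_pct = 100.0 * (len(nearby) - len(replies)) / len(nearby)
    avg_rtt = sum(replies) / len(replies) if replies else None
    return {"samples": len(nearby), "loss_pct": loss_pct, "avg_rtt_ms": avg_rtt}
```

Running this for each reported time and comparing against a few quiet periods shows quickly whether the complaints line up with loss or latency that PingPlotter actually recorded.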

Let's have a look at your data. First off, there's some packet loss when tracing from one side that isn't there when tracing from the other side. When going from your server to the customer, the hop 5 -> hop 6 link is dropping packets. This is probably the return route dropping ICMP TTL expired packets, and it isn't a problem that needs to be addressed, since the dropped packets aren't data packets; they're status packets about an error condition. The fact that this is happening on all hops to the final destination may indicate a bandwidth constraint somewhere there that is causing the router to drop these packets, but that's just speculation, since it could also be a router rule that always drops this amount of that type of packet.
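The reasoning above follows a common rule of thumb for reading per-hop loss. Loss that shows up at one hop but disappears at later hops is usually that router deprioritizing or rate limiting its own ICMP replies, not dropped data. Loss that persists all the way to the destination, as in the server-to-customer trace here, deserves a closer look. The sketch below expresses that heuristic. The per-hop loss list and the 2-percentage-point `tolerance` are assumptions for illustration, and the heuristic cannot tell a bandwidth constraint from a router rule.

```python
from typing import List, Sequence


def classify_hop_loss(hop_loss_pct: Sequence[float], tolerance: float = 2.0) -> List[str]:
    """Label each hop's packet loss (%) as 'clean', 'carried' or 'not carried'.

    hop_loss_pct is ordered from hop 1 to the final destination. Loss counts as
    'carried' when every later hop shows at least roughly the same loss
    (within `tolerance` percentage points). The final hop has no later hops,
    so any loss there counts as carried.
    """
    labels = []
    for i, loss in enumerate(hop_loss_pct):
        later = hop_loss_pct[i + 1:]
        if loss <= tolerance:
            labels.append("clean")
        elif not later or min(later) >= loss - tolerance:
            # Loss persists through to the destination: real loss in at least one direction.
            labels.append("carried")
        else:
            # Loss vanishes at later hops: usually this router deprioritizing or
            # rate limiting its own ICMP TTL-expired replies, not dropped data.
            labels.append("not carried")
    return labels
```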

You asked about the 100% packet loss at the two intermediate hops. It's pretty likely that a route change occurred when this packet loss started and that other routers are now servicing those hops. Have a look at the route change area to check: clicking on one of the route change times will show the route that was current during that time period, which might show you different routers.
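The route change area does this comparison visually. Here is the same idea in code: walk the samples in order and report every point where the route differs from the previous one, along with which hop numbers changed. If a hop's old router stops appearing after such a change, its graph can show 100% loss even though traffic is flowing through a different router. This is only a sketch over a hypothetical list of per-sample routes, not PingPlotter's implementation; the addresses in the test are from documentation ranges.

```python
from typing import List, Optional, Sequence, Tuple

# One route per sample: hop addresses in order, None for a hop that did not reply.
Route = Tuple[Optional[str], ...]


def route_changes(routes: Sequence[Route]) -> List[Tuple[int, List[int]]]:
    """Return (sample index, changed hop numbers) wherever a sample's route
    differs from the previous sample's route. Hop numbers are 1-based."""
    changes = []
    for i in range(1, len(routes)):
        prev, cur = routes[i - 1], routes[i]
        if prev == cur:
            continue
        length = max(len(prev), len(cur))
        changed = [
            h + 1
            for h in range(length)
            # A hop missing from the shorter route is treated as None.
            if (prev[h] if h < len(prev) else None) != (cur[h] if h < len(cur) else None)
        ]
        changes.append((i, changed))
    return changes
```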

There is one interesting problem that shows up on both of your graphs - at 7pm or so on the 15th there is a packet loss spike all the way through your route. This also caused the previously mentioned route change. There was probably a network disruption at this point - and it happened somewhere near Edinburgh.
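A loss spike that appears at every hop from some point onward places the disruption at, or on the link just before, the first hop where it appears. This is how a trace can point to a location such as Edinburgh; the hop-to-router mapping comes from the trace itself. The sketch below finds that hop from per-hop loss figures taken before and during the event. The 20-percentage-point `jump` threshold is an arbitrary assumption, and both lists are assumed to have one entry per hop.

```python
from typing import Optional, Sequence


def first_hop_of_spike(loss_before: Sequence[float], loss_during: Sequence[float],
                       jump: float = 20.0) -> Optional[int]:
    """Return the 1-based hop number from which a loss increase of at least `jump`
    percentage points is seen at that hop and every hop after it, or None if the
    increase does not reach the final hop."""
    jumped = [during - before >= jump for before, during in zip(loss_before, loss_during)]
    for i in range(len(jumped)):
        if all(jumped[i:]):
            return i + 1
    return None
```

Applying it to both directions' graphs, as with the two traces here, helps confirm that the spike is the same event seen from each side.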

There's also another "disruption" at 9am or so on the 16th, but this one is a lot closer to the customer. There's not quite enough data here to see where (not all the time graphs are turned on), but this is a distinct problem, separate from the Edinburgh problem the day before.

If the customer is complaining about problems that happen more than a few times a day, there is some latency and packet loss showing up on the 16th, but the 48-hour time scale makes the specifics hard to see. It looks, though, to be happening between hops 1 and 2 for the customer, possibly some kind of bandwidth saturation. If you zoom in to a 30-minute view of the data where the customer is tracing to you, you'll see some packet loss and latency spikes that are carried from hop 2 (or maybe 3) all the way through to the final destination. The customer might perceive these as lag in some kinds of applications (Citrix, for example). You'd need to zoom in on that data a bit more to be sure, though; I can't judge the scale of these problems without seeing them in more detail.
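Zooming in matters because of sample counts. At a 2.5-second interval, a 30-minute view covers 720 samples, while a 48-hour view covers 69,120, so each point on the long graph has to summarize many samples. The sketch below illustrates the general effect with a simple mean/max bucketing. How PingPlotter itself summarizes samples for a time graph isn't described here, so treat this only as an illustration of why short spikes get diluted. Lost packets are left out for simplicity.

```python
from statistics import mean
from typing import List, Sequence, Tuple

SAMPLE_INTERVAL_S = 2.5  # trace interval used in this thread


def samples_in(span_s: float, interval_s: float = SAMPLE_INTERVAL_S) -> int:
    """Number of samples collected over a time span at a fixed interval."""
    return int(span_s // interval_s)


def bucket(rtts_ms: Sequence[float], bucket_size: int) -> List[Tuple[float, float]]:
    """Group consecutive RTTs (ms) into buckets and return (mean, max) for each.
    Lost packets are not handled; a real summary would count them separately."""
    summary = []
    for start in range(0, len(rtts_ms), bucket_size):
        chunk = rtts_ms[start:start + bucket_size]
        summary.append((mean(chunk), max(chunk)))
    return summary
```

For example, ten 400 ms spikes against a 30 ms baseline over 30 minutes raise the single 30-minute mean only to about 35 ms, while one-minute buckets (24 samples each) keep every spike visible in their maximum.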

- Pete

#1897 - 07/19/07 04:45 PM Re: Data Analysis - Help! [Re: Pete Ness]
Roflcopter
Brilliant reply (as always, it seems!), thanks for that Pete.

I'm trying to get details of when they're having problems. Unfortunately, since they're on the other side of the country, I have to rely on this information being sent to me separately, and I don't have enough to draw any conclusions yet.

I'm continuing to gather data from both ends, so I'll give it a few more days and see if we can get a better idea of when they're having problems.

Once again, thanks for the quick and detailed reply - the best software support I've ever come across!

Chris.
