Author Name that fault: or are you ready for the real world of networking?
teza

2001-02-24, 3:40 pm

Here is a true-life fault for everyone to ponder. See if you can get the answer (or at least work out what steps you would take to track it down). I'll tell you how I solved it in a few days.

A customer of mine reported that they could not connect to host systems in Australia (they are in New Zealand). They are connected to Australia by a router-to-router circuit from New Zealand that carries a Frame Relay PVC and uses STATIC routes only (no dynamic routing is used). Static routes are used because the Australian side of the network is run by another company and they don't want to use BGP etc. (boo hiss).
1: The circuit to Australia is up.
2: The PVC appears to be up.
3: The other network provider in Australia says his network is working.
4: There appear to be no problems with the local (New Zealand) LAN.
5: Routes between the two countries are static.

Here is a brief diagram of the connection:

Australian Router :------ PVC ------: NZ Router :----: LAN
This is all the info I had to work with at the start. What fault-finding steps would you take knowing this? Post commands, fault-finding steps etc. and I'll tell you what results you would get, i.e. whether there was a problem with the line and so on.
This should be an interesting exercise for those about to do the support exam or those interested in how it works in the real world.
This fault actually happened and the resolution will surprise you.

2001-02-24, 5:22 pm

I assume you have access to the New Zealand router and that the serial interface is up/up and error free. From the New Zealand router, ping Australia's ethernet interface using NZ's ethernet as the source IP. Since you say there is a problem, I imagine that ping will time out. Do a sh ip route from NZ for the Australian ethernet. Ping the serial int on the Aust router that is directly connected. This is a basic circuit and the solution should be easy to find.

Yankee

2001-02-24, 6:43 pm

Yankee is correct... provided that the assumption is true (NZ is in up/up state at the serial port). The next step (actually one I'd do before pinging) is to determine the port/protocol state of the AU router. If the port is up but the protocol is not... do shut - no shut on the serial interface... if it is still down, check with the telco NOC for LMI. This happens more than I like...

2001-02-24, 8:39 pm

NZ router is up. Serial interface in NZ is up and up. No CRCs, no timeouts, no interface resets. The NZ router is able to ping the Australian router (different company) from serial to serial, serial to eth and eth to eth. From the NZ router I could also ping the hosts in the Australian network. Australia said that they could also ping the NZ router interfaces from the DIRECTLY connected router (hint hint).
Yup, it seems a basic problem, but often they are not.
Right, so far we have... The physical interface is up, and we can ping the directly connected router in Australia. No one has said to check the Frame Relay part of the circuit yet (always work from the bottom up, i.e. physical, data link etc.), although in this case it is not the cause of the fault.



2001-02-24, 9:42 pm

OK - so the circuit between the sites is up... check the AU router for access lists? I am thinking that maybe this has something to do with the configuration of the eth port on the AU router (based on the info in teza's latest post). Teza - I assume from your description that AU can only ping NZ from the router and not from any hosts - is that correct?

2001-02-25, 1:09 am

The Australian network provider couldn't test from the hosts as they had no access to them, but they had routes in their network to the hosts and could ping them (so they said). I could ping the hosts from the router in New Zealand. There were no access lists applied to either(?) router that would stop traffic.
Now we have the following:
Physical line up and no problems noted.
Frame Relay up, no errors.
Able to ping the hosts from the New Zealand router.
Aussie able to ping the New Zealand router from the directly attached router.
No access lists on either router that could affect traffic.
There are still a few test steps that haven't yet been carried out and should be next on your list.
I told you it wasn't a simple fault.


2001-02-25, 5:16 am

One of the first things I have to ask when something like this happens is (assuming this was a previously working link): What changes were made, regardless of how small they seem, before this "outage".
While waiting for the answer to that, I would have to trace from the router in NZ to a host in Aus. Watch the trace . . . see where it goes. If it fails along the way, then you have a focal point for further troubleshooting.
In addition to that, trace from Aus to NZ. Same logic applies.
On the frame side, has LMI been verified? For S&Gs (Sh**s and giggles) check the logs on all routers involved to see if there have been any low-level hardware problems.
If possible, turn on debugging where necessary (IP packet if you can). Watch the messages, act accordingly.
Just some thoughts, I'll wait for a response before I add anything else.
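
The trace-in-both-directions idea can be sketched in a few lines of Python. This is only a toy model: the router names, the NEXT_HOP tables and the trace helper are all made up for illustration and say nothing about the real network in this thread. Each router holds static next hops per destination network, one router is deliberately left without a return route, and the walk stops at the first router that has no entry, which gives the focal point for further troubleshooting.

```python
# Toy model of tracing across statically routed hops.
# Every name here (routers, networks, tables) is hypothetical.

# Per-router static next hops: router -> {destination network: next router}.
# "DELIVER" means the destination network is attached to that router.
NEXT_HOP = {
    "nz-router": {"au-lan": "au-edge", "nz-lan": "DELIVER"},
    "au-edge": {"au-lan": "au-core", "nz-lan": "nz-router"},
    # au-core deliberately has no route back to nz-lan
    "au-core": {"au-lan": "DELIVER"},
}


def trace(start, dest_net, max_hops=10):
    """Return (hops_visited, reached) for a walk from start toward dest_net."""
    hops = [start]
    current = start
    for _ in range(max_hops):
        nxt = NEXT_HOP.get(current, {}).get(dest_net)
        if nxt is None:
            return hops, False   # no route here: the trace dies at this hop
        if nxt == "DELIVER":
            return hops, True    # destination network is attached to this hop
        hops.append(nxt)
        current = nxt
    return hops, False           # gave up after max_hops (likely a loop)


forward = trace("nz-router", "au-lan")
reverse = trace("au-core", "nz-lan")
print("NZ -> AU:", forward)
print("AU -> NZ:", reverse)
```

In this model the forward trace reaches the far network while the reverse trace stops at its first hop, which is why tracing from one end only can hide a one-way routing fault.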

2001-02-25, 6:49 am

Seems to me that we have proven "connectivity" from NZ to AU... I guess my question at this point is what specifically is not working. At the start of this, Teza said "they could not connect to host systems in Australia". Here's the "anal" question - what does that mean? What kind of "connections" were they having problems with? It couldn't be "all" connections because you were able to ping from workstations at NZ to hosts at AU... that is considered a connection... so they "were" able to "connect" to hosts in AU from that perspective. Am I rambling...?? Sorry-

T! please give us some more info on what the workstations in NZ could and could not connect to in AU. Is this an IP only network?

Meanwhile... I'll go get an icepack for my head.

2001-02-25, 8:15 am


No need to debug frame LMI to check the "frame side of the circuit" as you say. Protocol is up that means Layer 2 is up. You neatly avoided answering the sh ip route comment I made, which of course would explain the "directly connected" hint you unnecessarily gave.

I troubleshoot frame problems every day and this one would take only a couple of minutes to fix with access to both routers or with a tech on the router at the other end.

Yankee

2001-02-25, 1:56 pm

Yankee: It wasn't a Frame Relay problem, and yes, I thought it would take only a couple of minutes to fix as well (I've worked with frame for 3-4 years). The interface being up/up doesn't mean there cannot be a frame problem; it's always good to make sure there are no problems on the PVC itself, i.e. BECNs, FECNs etc.
Network geek: You are definitely on the right track to solve the problem. If you read back a bit, I have so far only attempted to prove connectivity from the router (to prove out the international link etc.). The next step would be to ping and traceroute from a workstation on the NZ LAN (NT) to see where the path is falling down. I did this, and from the workstations on the NZ LAN I could traceroute and ping only as far as the serial port in Australia. Now, as I said earlier, it is all static routes to Aussie. The routes in NZ were all there and correct; I checked them and also got someone else to check (which is always a good idea, a second pair of eyes always helps).
This led my co-workers and me to surmise that there was a problem in the Australian routing table, which led us to ask the question that Blue baron asked (and that we had asked before): had there been a change in the Australian network? The answers we got from 2 different techs in Australia were 'yes, but it shouldn't affect us' from one and 'no, there have been no changes' from the other (who was their boss). When you get those kinds of conflicting answers you start to worry. We requested read access to the Aussie router to check their routing tables and make sure they had routes back to the LAN, but were not allowed to log on. I then asked them to fax me a copy of the static routes to our networks. After about 30-45 minutes they arrived. When I checked them they were correct.
We (in NZ) were starting to get a bit perplexed and were about to give someone else a call to get some advice. I did another check from the workstation and, lo and behold, I could now ping the hosts in Aussie. Hmmm, they were not working before they faxed us the routing table but were after..... They said they changed nothing... I checked with one of my colleagues (CCIE) and he came to the same conclusion as I did: there must have been missing routes in the Australian router, and they had been added after we requested the fax.
We are pretty sure of this because of the step-by-step fault-finding process we went through.
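
One way to see how a missing return route can let router-to-router pings work while LAN workstations get nothing back is to model the lookup the far router does for the reply. The sketch below is a simplification, not the actual configuration: the addresses (10.1.0.0/16 for the NZ LAN, 192.168.255.0/30 for the link, 10.2.0.0/16 for the AU side), the interface labels and the lookup helper are all invented for illustration. It assumes a ping issued on the NZ router itself is sourced from the outgoing serial address, which is the usual behaviour unless another source is specified.

```python
import ipaddress

# Hypothetical addressing, chosen only for illustration.
NZ_LAN = ipaddress.ip_network("10.1.0.0/16")
WAN_LINK = ipaddress.ip_network("192.168.255.0/30")
AU_LAN = ipaddress.ip_network("10.2.0.0/16")


def lookup(table, dest):
    """Longest-prefix match: return the exit for dest, or None if no route."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, exit_) for net, exit_ in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# AU router with the static route back to the NZ LAN missing.
au_broken = [
    (WAN_LINK, "serial0 (connected)"),
    (AU_LAN, "ethernet0 (connected)"),
]
# Same router once the static route has been added.
au_fixed = au_broken + [(NZ_LAN, "serial0 via 192.168.255.2")]

nz_router_serial = "192.168.255.2"   # source of a plain ping from the NZ router
nz_workstation = "10.1.0.50"         # source of a ping from the NZ LAN

for name, table in (("broken", au_broken), ("fixed", au_fixed)):
    print(name, "reply to router:", lookup(table, nz_router_serial))
    print(name, "reply to workstation:", lookup(table, nz_workstation))
```

With the route missing, the reply to the router's serial address still leaves via the connected link, but the reply to the workstation has nowhere to go. That is why a ping sourced from the LAN side is the more telling test.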
The worst outcome of this whole affair was that the Australian network provider (who my customer was not too happy about even before this fault) has told my customer's head office (in Australia) that the cause of the fault was us. As the fault caused a large amount of disruption to my customer, it has gone all the way to their executive board, i.e. directors, chairman etc. (eek). I spent 2 days last week preparing an incident report for them, and as I documented everything I did during the fault we are off the hook. It's not the way I would have liked to get my name in front of my customer's bigwigs, but they now seem to be confident in my company's running of the network, and this could lead to gaining the business in Australia at the expense of the other provider.

Lessons to be learnt?
1: Check the line.
2: Check the PVC.
3: Check layer 3 connectivity from routers and LANs.
4: If possible, get your work peer reviewed (2 pairs of eyes are better than 1).
5: Document everything.
6: Keep the customer updated on progress (even if there is none).
7: NEVER believe the other provider (that's not to say they are all like the ones I worked with); always double check.
8: Never punch a desk (it hurts).
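
The first three steps amount to walking the stack from the bottom up and stopping at the first check that fails. A minimal sketch of that ordering follows; the check names, the CHECKS list and the first_failure helper are hypothetical, and the observed results are typed in by hand to mirror the thread rather than taken from real router output.

```python
# Bottom-up checklist runner. The check names mirror the list above;
# the results passed in are hand-entered observations, not router output.

CHECKS = [
    ("physical line", "layer 1"),
    ("frame relay PVC", "layer 2"),
    ("ping from router", "layer 3"),
    ("ping from LAN workstation", "layer 3"),
]


def first_failure(results):
    """Walk the checks in order; return the first failed (name, layer) or None.

    A check with no recorded result counts as failed, so nothing is skipped.
    """
    for name, layer in CHECKS:
        if not results.get(name, False):
            return name, layer
    return None


observed = {
    "physical line": True,
    "frame relay PVC": True,
    "ping from router": True,
    "ping from LAN workstation": False,
}
print(first_failure(observed))
```

Treating an unrecorded check as a failure is a deliberate choice here: it forces every layer to be tested rather than assumed good.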



2001-02-25, 2:15 pm

Don't you just hate it when you spend days troubleshooting a problem - while being under intense pressure from the customer - getting NO help from the "other" guys (telco) or outright lies... They all point their fingers at YOU! But through much patience, determination and know-how you finally break the problem and prove yourself - causing the customer to praise you all the more - victory is YOURS!

2001-02-25, 3:15 pm

Not wrong there. I always try to give the other guy the benefit of the doubt, but I guess I'll have to change that. The worst thing is that if you want to diagnose faults and haven't got access to the other router, then you rely on the other techs to help, and if they are more interested in covering their butt than solving the fault then you get nowhere. If I've cocked up then I would admit it; trying to cover up causes all sorts of problems. Oh well, you live and learn.

I think that putting real-life problems like this on the board and then getting everybody to try and solve them is a great way for people to apply the knowledge they have learned, so let's have a few more. I'm sure there are a lot of interesting faults out there.


2001-02-26, 5:27 pm

If, as you initially stated, there was no connectivity to hosts on the other ethernet, you are wasting your time looking for DE packets and FECNs/BECNs. While I agree those can be problems, they don't cause your symptoms.

In another post you said that you could ping Aussie hosts from NZ. This couldn't be true if the static route was still missing, because the return ICMP wouldn't know how to get back to NZ.

Had you done the show ip route and seen that your routes to the Aussies were good, but no hosts were reachable on one ethernet, then it had to be a routing issue. The disadvantage you had was not having access to the AU router, and I really feel for you there! In that case you're only as good as the help at the other end.

Yankee