Some important details of BGP

September 14, 2012 11 comments

We start out with a basic topology of three routers: R1, R2 and R3.

R2 and R3 will peer to each other's loopbacks. I have set up OSPF for full reachability
in the network. First we test connectivity.

R2#ping 3.3.3.3 so lo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds:
Packet sent with a source address of 2.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/53/80 ms

There is connectivity. We set up the peering and set ebgp-multihop to 2, since
this is what most people do. I will explain why this is not a good idea.

R2(config)#router bgp 1
R2(config-router)#nei 3.3.3.3 remote-as 3
R2(config-router)#nei 3.3.3.3 update-source loopback 0
R2(config-router)#nei 3.3.3.3 ebgp-multihop 2
R3(config)#router bgp 3
R3(config-router)#nei 2.2.2.2 remote-as 1
R3(config-router)#nei 2.2.2.2 update-source loopback 0
R3(config-router)#nei 2.2.2.2 ebgp-multihop 2

The session comes up.

 %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Up

All good so far. We are not advertising anything yet. We add another loopback
on R3 and advertise that into BGP. We check if R2 is receiving it.

R2#sh bgp ipv4 uni
BGP table version is 3, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 33.33.33.33/32   3.3.3.3                  0             0 3 i

It looks good so far. Now let's think about what ebgp-multihop actually does. By
default, eBGP checks that the neighbor address is on a directly connected network
(the connected-check) and sends BGP packets with a TTL of 1. So the default is to
do a connected-check with ebgp-multihop = 1. When we set ebgp-multihop 2, the
outgoing TTL is set to 2 and the connected-check is disabled. We confirm this
with a packet capture.
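As a rough mental model of the three configurations compared in this post, the Python sketch below decides whether a session is attempted and with which outgoing TTL. The names `EbgpNeighbor` and `session_params` are hypothetical, the model covers only the behaviour described above (not the full IOS logic), and R2's connected networks (12.12.12.0/24, 23.23.23.0/24 and its loopback) are assumed from the addresses that appear later in this post.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EbgpNeighbor:
    address: str
    ebgp_multihop: Optional[int] = None   # None = command not configured
    disable_connected_check: bool = False


def session_params(nbr: EbgpNeighbor, connected: List[str]) -> Tuple[bool, int]:
    """Return (session attempted?, outgoing TTL) for an eBGP neighbor.

    Simplified model of the behaviour described in this post:
    - defaults: TTL 1 and the neighbor must be on a connected network
    - ebgp-multihop N: TTL N and the connected-check is skipped
    - disable-connected-check: TTL stays 1, connected-check is skipped
    """
    ttl = nbr.ebgp_multihop or 1
    do_check = nbr.ebgp_multihop is None and not nbr.disable_connected_check
    if do_check:
        addr = ipaddress.ip_address(nbr.address)
        if not any(addr in ipaddress.ip_network(net) for net in connected):
            return False, ttl  # session stays Idle: not directly connected
    return True, ttl


# Assumed connected networks on R2
R2_CONNECTED = ["12.12.12.0/24", "23.23.23.0/24", "2.2.2.2/32"]
print(session_params(EbgpNeighbor("3.3.3.3"), R2_CONNECTED))                   # (False, 1)
print(session_params(EbgpNeighbor("3.3.3.3", ebgp_multihop=2), R2_CONNECTED))  # (True, 2)
print(session_params(EbgpNeighbor("3.3.3.3", disable_connected_check=True),
                     R2_CONNECTED))                                            # (True, 1)
```

The point of the model is that ebgp-multihop changes two things at once, while disable-connected-check changes only the check and leaves the TTL at 1.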

So the TTL is set to 2. Is this really necessary? The common argument is that
because we are peering to a loopback, the TTL must be set to 2, since the TTL
is decremented before the packet reaches the loopback. But when do routers
modify packets? When they forward them out of an egress interface, right? We
test this theory by setting up a peering between R1 and R3. We start without
ebgp-multihop and then run debug ip icmp. We have to disable the connected-check,
otherwise the session will stay Idle, because a loopback address on another
router is never directly connected.

R1(config-router)#nei 3.3.3.3 remote-as 3
R1(config-router)#nei 3.3.3.3 update-source lo0
R1(config-router)#nei 3.3.3.3 disable-connected-check
R3(config-router)#nei 1.1.1.1 remote-as 1
R3(config-router)#nei 1.1.1.1 update lo0
R3(config-router)#nei 1.1.1.1 disable-connected-check

With debug ip icmp running, we can now see that R2 sends ICMP time exceeded messages to R1 and R3.

R1: ICMP: time exceeded rcvd from 12.12.12.2
R3: ICMP: time exceeded rcvd from 23.23.23.2

This is because the TTL was set to 1, so it expired at R2 while the packets were in transit.

Now we set up a peering between R1 and R2 using the loopbacks. Again, we disable
the connected-check.

R1(config-router)#nei 2.2.2.2 remote-as 1
R1(config-router)#nei 2.2.2.2 update lo0
R1(config-router)#nei 2.2.2.2 disable-connected-check
R2(config-router)#nei 1.1.1.1 remote-as 1
R2(config-router)#nei 1.1.1.1 update lo0
R2(config-router)#nei 1.1.1.1 disable-connected-check

Some people say that the TTL must be 2 for a loopback peering to come up; we
will prove that this is wrong. A loopback peering does not come up by default
because BGP checks whether the neighbor is directly connected. Let's look at a
BGP packet sent when disable-connected-check is configured.

We clearly see that the TTL is 1, but the session still comes up. This proves
that an expiring TTL is not what breaks peering to loopbacks!

R1#sh bgp all sum
For address family: IPv4 Unicast
BGP router identifier 1.1.1.1, local AS number 1
BGP table version is 9, main routing table version 9
2 network entries using 240 bytes of memory
2 path entries using 104 bytes of memory
3/2 BGP path/bestpath attribute entries using 372 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
Bitfield cache entries: current 1 (at peak 2) using 32 bytes of memory
BGP using 772 total bytes of memory
BGP activity 5/3 prefixes, 5/3 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2.2.2.2         4     2      83      80        9    0    0 00:02:45        1
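The results of these experiments can be summarised in a few lines of Python. This is a simplified sketch, not router code: `send` is a hypothetical helper, and it assumes the usual forwarding rule that only a router forwarding a packet decrements the TTL and drops it (answering with ICMP time exceeded) when the TTL would reach zero, while a packet addressed to the receiving router itself is never forwarded.

```python
from typing import List, Tuple


def send(ttl: int, path: List[str]) -> Tuple[str, str]:
    """Follow a packet from path[0] to path[-1], which owns the destination.

    Returns ("delivered", router) or ("time exceeded", router).
    Only transit routers (path[1:-1]) forward, so only they check and
    decrement the TTL; the final router consumes the packet.
    """
    for router in path[1:-1]:
        if ttl <= 1:
            return ("time exceeded", router)  # ICMP sent back to path[0]
        ttl -= 1
    return ("delivered", path[-1])


print(send(1, ["R1", "R2"]))        # ('delivered', 'R2')
print(send(1, ["R1", "R2", "R3"]))  # ('time exceeded', 'R2')
print(send(2, ["R1", "R2", "R3"]))  # ('delivered', 'R3')
```

This matches the debug output above: the R1-R3 session with TTL 1 triggers time exceeded at R2, while the R1-R2 loopback session comes up with TTL 1.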

Finally, I want to bring up another disadvantage of using the ebgp-multihop
command when peering between directly connected routers using loopbacks.
We have a peering between R2 and R3. What happens when we shut down the
interface on either router?

R2(config-router)#int f1/0
R2(config-if)#sh
R2(config-if)#
%OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
R2(config-if)#
%LINK-5-CHANGED: Interface FastEthernet1/0, changed state to administratively down
%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0, changed state to down
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 11, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*  33.33.33.33/32   3.3.3.3                  0             0 3 i

When we shut down the interface, the peering still stays up. This is because
fast-external-fallover, which resets a directly connected eBGP session as soon as the
interface goes down, does not apply to sessions configured with ebgp-multihop. This could
lead to black holes, since the peering stays up until the hold time expires (180 s by
default). In our case the route is no longer marked best because the next hop is not
reachable, but what if we put in a default route?

R2(config)#ip route 0.0.0.0 0.0.0.0 12.12.12.1
R2(config)#int f1/0
R2(config-if)#sh
R2(config-if)#do
%OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
R2(config-if)#do
%LINK-5-CHANGED: Interface FastEthernet1/0, changed state to administratively down
%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0, changed state to down
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 12, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 33.33.33.33/32   3.3.3.3                  0             0 3 i

Now the route stays valid and best in the BGP table until the hold time expires,
because the default route makes the next hop 3.3.3.3 resolvable. This creates a
black hole: the prefix is kept even though the directly connected link to R3 is down.
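The default route keeps the prefix alive because of next-hop recursion: the next hop is looked up in the routing table with a longest-prefix match, and 0.0.0.0/0 matches every address. A minimal Python sketch of that lookup follows; `resolve` is a hypothetical helper, and R2's routing-table contents are assumed from the addresses in this post.

```python
import ipaddress
from typing import List, Optional


def resolve(next_hop: str, rib: List[str]) -> Optional[str]:
    """Longest-prefix match of next_hop against the prefixes in rib.

    Returns the matching prefix, or None if the next hop is unreachable.
    """
    addr = ipaddress.ip_address(next_hop)
    matches = [ipaddress.ip_network(p) for p in rib
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


before = ["2.2.2.2/32", "12.12.12.0/24", "23.23.23.0/24", "3.3.3.3/32"]
after_shutdown = ["2.2.2.2/32", "12.12.12.0/24"]
with_default = after_shutdown + ["0.0.0.0/0"]

print(resolve("3.3.3.3", before))          # 3.3.3.3/32 (learned via OSPF)
print(resolve("3.3.3.3", after_shutdown))  # None -> path not best
print(resolve("3.3.3.3", with_default))    # 0.0.0.0/0 -> path stays best
```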

With this post I hope you have gained a better understanding of these BGP features
and of how a router handles control plane packets. As usual, post in the comments
if you have any feedback or questions.

Conditional BGP, showing status

April 6, 2011

These are some notes for the post I did on conditional BGP advertisement.
I found an easier way of seeing the status of the advertise-map: it is
shown in the neighbor details, as in the following output.

Rack7R3#show bgp ipv4 uni nei 155.7.37.7 | i Condition
Condition-map NON_EXIST, Advertise-map ADVERTISE, status: Advertise

The prefix is currently being advertised. Then we bring up the interface.

Rack7R3(config)#int s1/2
Rack7R3(config-if)#no sh

Rack7R3#show bgp ipv4 uni nei 155.7.37.7 | i Condition
Condition-map NON_EXIST, Advertise-map ADVERTISE, status: Withdraw

The prefix is no longer advertised. Remember that the BGP scanner runs every
60 seconds by default, so it may take up to a minute before we see the result.

Categories: BGP, CCIE

Conditional BGP advertisement with advertise-map

April 5, 2011 1 comment

This post describes how to do conditional advertisement with BGP. In a real-life scenario, this can be used to announce routes to your backup provider only when your primary link is down. In a lab, it can be used when a task says that traffic must come in on interface X/X, but if that interface fails it should come in on interface Y/Y. The image below describes the scenario.

We start by putting addresses on the interfaces and enabling basic BGP. The loopbacks on the Cust router are used for announcing networks.

Cust:

interface Loopback1
ip address 1.1.1.1 255.255.255.0
!
interface Loopback2
ip address 2.2.2.2 255.255.255.0
!
interface Loopback3
ip address 3.3.3.3 255.255.255.0
!
interface Loopback4
ip address 4.4.4.4 255.255.255.0
!
interface FastEthernet0/0
ip address 136.1.13.3 255.255.255.0
no shut
!
interface Serial0/0
ip address 136.1.23.3 255.255.255.0
clock rate 2000000
no shut
!
router bgp 300
no synchronization
bgp log-neighbor-changes
network 1.1.1.0 mask 255.255.255.0
network 2.2.2.0 mask 255.255.255.0
network 3.3.3.0 mask 255.255.255.0
network 4.4.4.0 mask 255.255.255.0
network 136.1.13.0 mask 255.255.255.0
neighbor 136.1.13.1 remote-as 100
neighbor 136.1.23.2 remote-as 200
no auto-summary

ISP1:

interface FastEthernet0/0
ip address 136.1.12.1 255.255.255.0
no shut
!
interface FastEthernet0/1
ip address 136.1.13.1 255.255.255.0
no shut
!
router bgp 100
no synchronization
bgp log-neighbor-changes
network 136.1.12.0 mask 255.255.255.0
neighbor 136.1.12.2 remote-as 200
neighbor 136.1.13.3 remote-as 300
no auto-summary

ISP2:

interface FastEthernet0/0
ip address 136.1.12.2 255.255.255.0
no shut
!
interface Serial0/0
ip address 136.1.23.2 255.255.255.0
clock rate 2000000
no shut
!
router bgp 200
no synchronization
bgp log-neighbor-changes
neighbor 136.1.12.1 remote-as 100
neighbor 136.1.23.3 remote-as 300
no auto-summary

If we look at ISP2, we have two active BGP sessions with four prefixes received over each.

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
136.1.12.1      4   100       6       7        9    0    0 00:01:47        4
136.1.23.3      4   300       5       6        9    0    0 00:00:33        4

ISP2#sh ip bgp
BGP table version is 9, local router ID is 136.1.23.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.1.0/24       136.1.23.3               0             0 300 i
*                   136.1.12.1                             0 100 300 i
*> 2.2.2.0/24       136.1.23.3               0             0 300 i
*                   136.1.12.1                             0 100 300 i
*> 3.3.3.0/24       136.1.23.3               0             0 300 i
*                   136.1.12.1                             0 100 300 i
*> 4.4.4.0/24       136.1.23.3               0             0 300 i
*                   136.1.12.1                             0 100 300 i
 

Let's do a ping and a traceroute to verify reachability first.

ISP2#ping 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/38/88 ms
ISP2#trace
ISP2#traceroute 1.1.1.1
Type escape sequence to abort.
Tracing the route to 1.1.1.1
  1 136.1.23.3 44 msec *  84 msec

We have reachability. The next step is to announce the Ethernet link on the
Cust router into BGP. We need this prefix in BGP to be able to track it.

Cust(config)#router bgp 300
Cust(config-router)#network 136.1.13.0 mask 255.255.255.0

ISP1 will see this prefix as a RIB-failure, since it already has a route with a better AD (connected).
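The RIB-failure comes from administrative distance: for each prefix the router installs the source with the lowest distance, and a BGP path that loses is flagged with r. The Python sketch below uses the Cisco default distances; `rib_winner` and `is_rib_failure` are hypothetical helpers for illustration only.

```python
from typing import List

# Cisco default administrative distances (lower is preferred)
DEFAULT_AD = {"connected": 0, "static": 1, "ebgp": 20,
              "eigrp": 90,  # internal EIGRP
              "ospf": 110, "ibgp": 200}


def rib_winner(sources: List[str]) -> str:
    """Source that gets installed in the RIB for one prefix."""
    return min(sources, key=lambda s: DEFAULT_AD[s])


def is_rib_failure(sources: List[str]) -> bool:
    """True if BGP offers the prefix but another source is installed."""
    bgp = [s for s in sources if s in ("ebgp", "ibgp")]
    return bool(bgp) and rib_winner(sources) not in bgp


# On ISP1, 136.1.13.0/24 is both connected and learned via eBGP from Cust
print(rib_winner(["connected", "ebgp"]))      # connected
print(is_rib_failure(["connected", "ebgp"]))  # True
print(is_rib_failure(["ebgp"]))               # False
```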

We then configure the Cust router to advertise 1.1.1.0/24 to ISP2 only if the Ethernet link is down.

Cust#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
Cust(config)#ip prefix-list 1-NETWORK seq 5 permit 1.1.1.0/24
Cust(config)#ip prefix-list 13-NETWORK seq 5 permit 136.1.13.0/24
Cust(config)#route-map ADVERTISE permit 10
Cust(config-route-map)#match ip address prefix-list 1-NETWORK
Cust(config-route-map)#exit
Cust(config)#route-map NON_EXIST permit 10
Cust(config-route-map)#match ip address prefix-list 13-NETWORK
Cust(config-route-map)#exit
Cust(config)#router bgp 300
Cust(config-router)#neighbor 136.1.23.2 advertise-map ADVERTISE non-exist-map NON_EXIST
Cust(config-router)#^Z

With a non-exist-map, the prefixes matched by the advertise-map (ADVERTISE) are announced only while the prefixes matched by NON_EXIST are not in the BGP table.
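The decision can be written as a small set operation. This Python sketch is a model under stated assumptions, not the IOS implementation: `advertised` is a hypothetical helper, prefixes are plain strings, and the condition is treated as met only when none of the NON_EXIST prefixes is in the BGP table.

```python
from typing import Set


def advertised(bgp_table: Set[str], advertise: Set[str],
               non_exist: Set[str]) -> Set[str]:
    """Prefixes sent to a neighbor configured with
    'advertise-map ADVERTISE non-exist-map NON_EXIST'.

    Prefixes matched by ADVERTISE are sent only while no prefix matched by
    NON_EXIST is present in the BGP table; other prefixes are unaffected.
    """
    condition_met = not (non_exist & bgp_table)
    return {p for p in bgp_table if p not in advertise or condition_met}


loopbacks = {"1.1.1.0/24", "2.2.2.0/24", "3.3.3.0/24", "4.4.4.0/24"}
link_up = loopbacks | {"136.1.13.0/24"}
link_down = set(loopbacks)  # 136.1.13.0/24 disappears once f0/0 is shut

print(sorted(advertised(link_up, {"1.1.1.0/24"}, {"136.1.13.0/24"})))
print(sorted(advertised(link_down, {"1.1.1.0/24"}, {"136.1.13.0/24"})))
```

With the Ethernet link up, 1.1.1.0/24 is held back from ISP2; with the link down, all four loopbacks are sent, which is the behaviour the advertised-routes output shows.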

Other prefixes are not affected by this configuration. Let's look at what Cust is announcing to ISP2.

Cust#sh bgp ipv4 uni nei 136.1.23.2 advertised-routes
BGP table version is 7, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 2.2.2.0/24       0.0.0.0                  0         32768 i
*> 3.3.3.0/24       0.0.0.0                  0         32768 i
*> 4.4.4.0/24       0.0.0.0                  0         32768 i
*> 136.1.13.0/24    0.0.0.0                  0         32768 i
Total number of prefixes 4

We can see that 1.1.1.0/24 is no longer being announced. A ping from ISP2 confirms reachability, and a traceroute shows that traffic passes through ISP1.

ISP2#ping 1.1.1.1      
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 44/76/108 ms
ISP2#traceroute 1.1.1.1
Type escape sequence to abort.
Tracing the route to 1.1.1.1
  1 136.1.12.1 100 msec 96 msec 44 msec
  2 136.1.13.3 [AS 300] 92 msec *  116 msec

We then shut down the Ethernet link on Cust and look at the results.

Cust#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
Cust(config)#int f0/0
Cust(config-if)#sh
Cust(config-if)#
*Mar  1 00:27:22.007: %BGP-5-ADJCHANGE: neighbor 136.1.13.1 Down Interface flap
Cust(config-if)#
*Mar  1 00:27:23.983: %LINK-5-CHANGED: Interface FastEthernet0/0, changed state to administratively down
*Mar  1 00:27:24.983: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to down
Cust(config-if)#
Cust#sh bgp ipv4 uni nei 136.1.23.2 advertised-routes
BGP table version is 12, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.1.0/24       0.0.0.0                  0         32768 i
*> 2.2.2.0/24       0.0.0.0                  0         32768 i
*> 3.3.3.0/24       0.0.0.0                  0         32768 i
*> 4.4.4.0/24       0.0.0.0                  0         32768 i
Total number of prefixes 4

Here is the BGP table on ISP2. Ping works, and traffic now takes the direct path.

ISP2#sh ip bgp
BGP table version is 15, local router ID is 136.1.23.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.1.0/24       136.1.23.3               0             0 300 i
*                   136.1.12.1                             0 100 300 i
*> 2.2.2.0/24       136.1.23.3               0             0 300 i
*                   136.1.12.1                             0 100 300 i
*> 3.3.3.0/24       136.1.23.3               0             0 300 i
*                   136.1.12.1                             0 100 300 i
*> 4.4.4.0/24       136.1.23.3               0             0 300 i
*                   136.1.12.1                             0 100 300 i
r> 136.1.12.0/24    136.1.12.1               0             0 100 i
*> 136.1.13.0/24    136.1.12.1                             0 100 300 i
ISP2#ping 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/40/80 ms
ISP2#tra
ISP2#traceroute 1.1.1.1
Type escape sequence to abort.
Tracing the route to 1.1.1.1
1 136.1.23.3 92 msec * 24 msec

If we debug BGP updates, we will see entries like this.

*Mar  1 01:05:03.067: BPG(0): Condition NON_EXIST changes to Withdraw
*Mar  1 01:05:03.067: BPG(0): Condition NON_EXIST changes to Withdraw
*Mar  1 01:06:03.079: BPG(0): Condition NON_EXIST changes to Advertise
*Mar  1 01:06:03.079: BPG(0): Condition NON_EXIST changes to Advertise

Categories: BGP, CCIE

BGP troubleshooting – route not installed

February 12, 2011

Sometimes BGP prefixes do not get installed into the routing table. If the route is also learned via an IGP with a better administrative distance, that may be the reason, but then the route would be flagged as a RIB-failure. This scenario shows another possible source of problems. Once again, here is the topology.

All internal routers are running iBGP in a full mesh. Routers R4 and R6 have eBGP peerings to the backbone routers, which inject external prefixes into the AS. All internal routers announce their loopbacks into BGP. SW3 is trying to reach 119.0.0.1 in the prefix 119.0.0.0/8 but is unable to do so. Let's look at some output.

Rack1SW3#ping 119.0.0.1 so lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 119.0.0.1, timeout is 2 seconds:
Packet sent with a source address of 150.1.9.9
.....
Success rate is 0 percent (0/5)

SW3 can’t reach 119.0.0.1, why?

Rack1SW3#sh ip route 119.0.0.0
% Network not in table

We have no route there. What routes do we have from BGP?

Rack1SW3#sh ip route bgp
     150.1.0.0/24 is subnetted, 10 subnets
B       150.1.7.0 [200/0] via 155.1.79.7, 01:26:58
B       150.1.6.0 [200/0] via 155.1.67.6, 01:26:58
B       150.1.5.0 [200/0] via 155.1.45.5, 01:26:58
B       150.1.4.0 [200/0] via 155.1.146.4, 01:26:58
B       150.1.3.0 [200/0] via 155.1.37.3, 01:26:58
B       150.1.2.0 [200/0] via 155.1.23.2, 01:26:58
B       150.1.1.0 [200/0] via 155.1.146.1, 01:26:58
B       150.1.10.0 [200/0] via 155.1.108.10, 01:26:45
B       150.1.8.0 [200/0] via 155.1.58.8, 01:26:45

We can see all the loopbacks just fine but we have no route to the external prefixes. What is R6 announcing to us?

Rack1SW3# sh ip bgp nei 155.1.67.6 routes
BGP table version is 11, local router ID is 150.1.9.9
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
              r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i28.119.16.0/24   54.1.1.254               0    100      0 54 i
* i28.119.17.0/24   54.1.1.254               0    100      0 54 i
* i112.0.0.0            54.1.1.254               0    100      0 54 50 60 i
* i113.0.0.0            54.1.1.254               0    100      0 54 50 60 i
* i114.0.0.0            54.1.1.254               0    100      0 54 i
* i115.0.0.0            54.1.1.254               0    100      0 54 i
* i116.0.0.0            54.1.1.254               0    100      0 54 i
* i117.0.0.0            54.1.1.254               0    100      0 54 i
* i118.0.0.0            54.1.1.254               0    100      0 54 i
* i119.0.0.0            54.1.1.254               0    100      0 54 i
*150.1.6.0/24        155.1.67.6               0    100       0 i
Total number of prefixes 11

R6 is announcing the external prefixes to us but what do we have in our BGP table? Output has been abbreviated.

Rack1SW3#sh ip bgp
BGP table version is 11, local router ID is 150.1.9.9
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i119.0.0.0        204.12.1.254             0    100      0 54 i
* i                 54.1.1.254               0    100      0 54 i

So we do have 119.0.0.0/8 via 204.12.1.254 and 54.1.1.254, but how do we reach those next hops? Remember that route recursion will occur, and that the first rule of the BGP best-path algorithm is that the next hop must be reachable. We can see that the paths are valid but neither is marked best.

Rack1SW3#sh ip route 54.1.1.254
% Network not in table

We have no route to the next hop, which is why the route is not being installed. Let's fix this.

Rack1R6(config)#router eigrp 100
Rack1R6(config-router)#network 54.1.1.0 0.0.0.255
Rack1R4(config)#router eigrp 100
Rack1R4(config-router)#network 204.12.1.0 0.0.0.255

That should take care of the next hops. Let's check the routing table.

Rack1SW3#sh ip route 54.1.1.254
Routing entry for 54.1.1.0/24
  Known via "eigrp 100", distance 90, metric 2174976, type internal
  Redistributing via eigrp 100
  Last update from 155.1.79.7 on Vlan79, 00:02:04 ago
  Routing Descriptor Blocks:
  * 155.1.79.7, from 155.1.79.7, 00:02:04 ago, via Vlan79
      Route metric is 2174976, traffic share count is 1
      Total delay is 20200 microseconds, minimum bandwidth is 1544 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 2

We now have a route for the next-hop. Lets look at the BGP table again.

Rack1SW3#sh ip bgp
BGP table version is 31, local router ID is 150.1.9.9
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*>i119.0.0.0        204.12.1.254             0    100      0 54 i
* i                 54.1.1.254               0    100      0 54 i

So the prefix now has a best path. Is it in the routing table?

Rack1SW3#sh ip route bgp
B    119.0.0.0/8 [200/0] via 204.12.1.254, 00:02:27

Route is installed, we should be good to go.

Rack1SW3#ping 119.0.0.1 so lo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 119.0.0.1, timeout is 2 seconds:
Packet sent with a source address of 150.1.9.9
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/37/84 ms

Success. Always remember that BGP needs a reachable next hop. The next hop is changed when a route is advertised over eBGP but, by default, not over iBGP. To resolve this kind of problem, either advertise the connected interface facing the external peer into the IGP (with a network statement or by redistributing connected), or use next-hop-self on the iBGP peerings. A route-map that sets the next hop can also achieve the same thing. I hope this post has shown you how to troubleshoot BGP step by step.
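The next-hop rule can be sketched in Python as below. `next_hop_to_send` is a hypothetical helper that ignores special cases such as third-party next hops and route-maps, and R6's update-source address 150.1.6.6 is an assumption (only its 150.1.6.0/24 loopback subnet appears above).

```python
def next_hop_to_send(peer_type: str, learned_next_hop: str,
                     local_address: str, next_hop_self: bool = False) -> str:
    """Next hop a router puts in an UPDATE for a route learned via eBGP.

    - eBGP peer: the router sets its own address as next hop.
    - iBGP peer: the learned next hop is passed on unchanged unless
      next-hop-self is configured for that neighbor.
    """
    if peer_type == "ebgp" or next_hop_self:
        return local_address
    return learned_next_hop


# 119.0.0.0/8 learned by R6 from the backbone router 54.1.1.254
print(next_hop_to_send("ibgp", "54.1.1.254", "150.1.6.6"))                      # 54.1.1.254
print(next_hop_to_send("ibgp", "54.1.1.254", "150.1.6.6", next_hop_self=True))  # 150.1.6.6
```

Without next-hop-self, SW3 has to reach 54.1.1.254 through the IGP, which is what the EIGRP network statements above provided.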

BGP added to flash cards

February 11, 2011 2 comments

I have added BGP content to the flash cards. There are now 112 questions in total. More content in other areas will be added as I go through them in my studies. The file, as always, is located here.

Categories: Anki, BGP, CCIE

BGP troubleshooting – peer address not matching

February 8, 2011

Yesterday I did some Internetwork Expert Vol. 1 labs on BGP. I was having trouble getting some of the peers to come up and had to troubleshoot. This post describes how to troubleshoot when peers won't form. First, let's look at the topology. Thanks to DennisD on the IEOC forums for the image.

R2 and R5 should peer with each other in AS 100. R2 is set up to peer with R5's IP 155.1.45.5, and R5 is set up to peer with R2's IP 155.1.23.2. It would have been better to peer over the 155.1.0.0/24 subnet directly, but this setup shows the troubleshooting steps. The session will not form. Why? Let's look at some output from debug ip tcp transactions.

Rack1R5#*Mar 1 01:31:39.291: TCP: sending SYN, seq 478218125, ack 0
*Mar 1 01:31:39.291: TCP0: Connection to 155.1.23.2:179, advertising MSS 536
*Mar 1 01:31:39.291: TCP0: state was CLOSED -> SYNSENT [56275 -> 155.1.23.2(179)]
*Mar 1 01:31:39.311: Released port 56275 in Transport Port Agent for TCP IP type 1 delay 240000
*Mar 1 01:31:39.311: TCP0: state was SYNSENT -> CLOSED [56275 -> 155.1.23.2(179)]
*Mar 1 01:31:39.311: TCP0: bad seg from 155.1.23.2 -- closing connection: port 56275 seq 0 ack 478218126 rcvnxt 0 rcvwnd 0 len 0
*Mar 1 01:31:39.311: TCP0: connection closed - remote sent RST
*Mar 1 01:31:39.311: TCB 0x651784FC destroyed

We can see that R5 is initiating the connection: it sends a TCP SYN to R2 on port 179, but R2 responds with a TCP RST, which resets the connection. This could indicate either that R2 is not running BGP or that there is a problem with the neighbor statements.

So we want to know which IP address R5 is using when sending TCP packets to R2. Let's debug IP packets.

Rack1R5(config)#access-list 101 permit tcp any host 155.1.23.2
Rack1R5#debug ip packet 101
*Mar 1 01:36:01.611: IP: tableid=0, s=155.1.0.5 (local), d=155.1.23.2 (Serial0/0), routed via FIB
*Mar 1 01:36:01.615: IP: s=155.1.0.5 (local), d=155.1.23.2 (Serial0/0), len 44, sending

R5 is using its IP of 155.1.0.5 to communicate with 155.1.23.2 but R2 expects R5 to setup the BGP session from the IP of 155.1.45.5. Let’s verify why R5 is using 155.1.0.5 to get to 155.1.23.2. This is a look at the routing table.

Rack1R5#sh ip route 155.1.23.0
Routing entry for 155.1.23.0/24
  Known via "eigrp 100", distance 90, metric 2681856, type internal
  Redistributing via eigrp 100
  Last update from 155.1.0.3 on Serial0/0, 00:42:06 ago
  Routing Descriptor Blocks:
    155.1.0.3, from 155.1.0.3, 00:42:06 ago, via Serial0/0
      Route metric is 2681856, traffic share count is 1
      Total delay is 40000 microseconds, minimum bandwidth is 1544 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 1
  * 155.1.0.2, from 155.1.0.2, 00:42:06 ago, via Serial0/0
      Route metric is 2681856, traffic share count is 1
      Total delay is 40000 microseconds, minimum bandwidth is 1544 Kbit
      Reliability 255/255, minimum MTU 1500 bytes
      Loading 1/255, Hops 1

We can see that R5 has two equal-cost paths to R2's IP. The next hop is either 155.1.0.2 or 155.1.0.3, and both are reachable via the connected subnet on Serial0/0. That is why R5 uses 155.1.0.5 to source the packets. How can we solve this? Either we change the neighbor statement on R2 to point at 155.1.0.5, or we change the update-source on R5.

Rack1R5(config-router)#neighbor 155.1.23.2 update-source s0/1

A debug IP packet confirms that the right interface is now being used.

*Mar 1 01:36:31.663: IP: tableid=0, s=155.1.45.5 (local), d=155.1.23.2 (Serial0/0), routed via FIB
*Mar 1 01:36:31.663: IP: s=155.1.45.5 (local), d=155.1.23.2 (Serial0/0), len 44, sending
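The chain of cause and effect in this post can be modelled in Python: the source address is taken from the outgoing interface unless update-source is set, and the remote side rejects an unknown source with a RST. Both functions are hypothetical simplifications; R5's interface addresses are taken from the debug output, and Serial0/1 holding 155.1.45.5 is assumed from the update-source fix.

```python
import ipaddress
from typing import Dict, Optional, Set


def bgp_source(peer: str, routes: Dict[str, str], interfaces: Dict[str, str],
               update_source: Optional[str] = None) -> str:
    """Source address for the BGP TCP session (simplified).

    routes maps prefix -> outgoing interface; interfaces maps interface ->
    address. Without update-source, the address of the interface chosen by
    a longest-prefix match on the peer is used. Assumes a matching route exists.
    """
    if update_source:
        return interfaces[update_source]
    addr = ipaddress.ip_address(peer)
    best = max((ipaddress.ip_network(p) for p in routes
                if addr in ipaddress.ip_network(p)),
               key=lambda net: net.prefixlen)
    return interfaces[routes[str(best)]]


def accepted(source: str, configured_neighbors: Set[str]) -> bool:
    """The remote side only accepts the session from a configured neighbor
    address; anything else is answered with a TCP RST."""
    return source in configured_neighbors


R5_ROUTES = {"155.1.23.0/24": "Serial0/0"}
R5_INTERFACES = {"Serial0/0": "155.1.0.5", "Serial0/1": "155.1.45.5"}
R2_NEIGHBORS = {"155.1.45.5"}

src = bgp_source("155.1.23.2", R5_ROUTES, R5_INTERFACES)
print(src, accepted(src, R2_NEIGHBORS))  # 155.1.0.5 False
src = bgp_source("155.1.23.2", R5_ROUTES, R5_INTERFACES, update_source="Serial0/1")
print(src, accepted(src, R2_NEIGHBORS))  # 155.1.45.5 True
```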

The show ip bgp summary output confirms that they are now peers.

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
155.1.23.2      4   100       5       5        9    0    0 00:00:05        1

show tcp brief is a good command for seeing TCP sessions to and from the router.

Rack1R5#show tcp brief
TCB       Local Address               Foreign Address             (state)
651791FC  155.1.45.5.26655            155.1.23.2.179              ESTAB

And this is how to do basic BGP troubleshooting.

Border Gateway Protocol (BGP) – notes

November 22, 2010 3 comments

  • Uses TCP as transport, port 179
  • Path vector protocol

Checks before becoming a neighbor

  • The TCP connection request must come from an IP address configured with a neighbor command
  • The AS number in the peer's Open message must match the remote-as in the neighbor statement
  • The routers cannot have duplicate router IDs
  • If authentication is configured, it must also match
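The four checks above can be expressed as a single validation function. This is a hypothetical Python sketch: it lumps the TCP source address, the AS number and router ID from the Open message, and the (TCP MD5) password into one record for simplicity, which is not how the protocol actually carries them. The router IDs in the example are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PeerAttempt:
    source_ip: str           # source address of the incoming TCP connection
    asn: int                 # AS number from the peer's Open
    router_id: str           # BGP identifier from the peer's Open
    password: Optional[str]  # TCP MD5 password used by the peer, if any


def neighbor_check(attempt: PeerAttempt, neighbors: Dict[str, int],
                   local_router_id: str,
                   passwords: Dict[str, str]) -> Optional[str]:
    """Return None if the peering may proceed, else the reason it fails.

    neighbors maps neighbor IP -> configured remote-as;
    passwords maps neighbor IP -> configured password (absent = none).
    """
    if attempt.source_ip not in neighbors:
        return "no neighbor statement for source address"
    if attempt.asn != neighbors[attempt.source_ip]:
        return "AS number mismatch"
    if attempt.router_id == local_router_id:
        return "duplicate router ID"
    if passwords.get(attempt.source_ip) != attempt.password:
        return "authentication mismatch"
    return None


# R2 expects R5 at 155.1.45.5 in AS 100 (router IDs are hypothetical)
r2_neighbors = {"155.1.45.5": 100}
print(neighbor_check(PeerAttempt("155.1.0.5", 100, "150.1.5.5", None),
                     r2_neighbors, "150.1.2.2", {}))
```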

Timers

Uses keepalive and hold timers, which default to 60 and 180 seconds respectively.
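The hold timer is what ends a session when keepalives stop arriving, and it explains the black-hole window in the ebgp-multihop post above. A hypothetical Python sketch, assuming fast-external-fallover is enabled (the IOS default) and applies only to directly connected eBGP sessions:

```python
def seconds_until_reset(failure_at: float, last_keepalive_at: float,
                        hold_time: float = 180.0, multihop: bool = False,
                        fast_external_fallover: bool = True) -> float:
    """Seconds between a link failure and the BGP session going down.

    Directly connected eBGP with fast-external-fallover: reset immediately.
    Otherwise the session waits for the hold timer, which runs from the
    last keepalive (or update) received.
    """
    if fast_external_fallover and not multihop:
        return 0.0
    return max(0.0, last_keepalive_at + hold_time - failure_at)


# Keepalives every 60 s: the last one arrived 0-60 s before the failure
print(seconds_until_reset(100, 100, multihop=True))  # 180.0
print(seconds_until_reset(100, 40, multihop=True))   # 120.0
print(seconds_until_reset(100, 40))                  # 0.0
```

In this model, a multihop session with default timers keeps forwarding into a dead link for roughly 120 to 180 seconds.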

BGP neighbor states

Idle  -  BGP not initiated yet
Connect  - Listening for TCP
Active  - Initiate TCP
Open sent - Open message sent, TCP is up
Open confirm - Open message received, waiting for Keepalive
Established - Peering has been established
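The states above form a finite state machine. The Python sketch below loosely follows the classic RFC 1771 description and covers only the normal path and a few failure events; the event names are hypothetical labels, and any unexpected event (including a Notification) simply drops the session back to Idle, which is a simplification.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "Open sent"
    OPEN_CONFIRM = "Open confirm"
    ESTABLISHED = "Established"


# Simplified (state, event) -> next state table
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_up"): State.OPEN_SENT,
    (State.CONNECT, "tcp_failed"): State.ACTIVE,
    (State.ACTIVE, "tcp_up"): State.OPEN_SENT,
    (State.ACTIVE, "connect_retry_expired"): State.CONNECT,
    (State.OPEN_SENT, "open_received"): State.OPEN_CONFIRM,
    (State.OPEN_CONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "update_received"): State.ESTABLISHED,
}


def run(events: Iterable[str], state: State = State.IDLE) -> State:
    """Feed events to the FSM; any event not in the table resets to Idle."""
    for event in events:
        state = TRANSITIONS.get((state, event), State.IDLE)
    return state


print(run(["start", "tcp_up", "open_received", "keepalive_received"]).value)  # Established
print(run(["start", "tcp_failed"]).value)                                     # Active
```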

BGP message types

Open  - Used to establish neighbor session and exchange parameters
Keepalive - Used to maintain the neighbor relationship
Update  - Used to exchange routing information
Notification - Used when BGP errors occur, resets neighbor session

Confederations

  • Uses sub-ASNs: the real AS is divided into smaller sections, each with a private ASN
  • The private range is from 64512 to 65534
  • Every sub-AS has to be fully meshed internally and uses iBGP logic
  • Connections between different sub-ASes act as eBGP connections
  • Confederation ASNs are not considered when calculating the AS-path length
  • Painful to migrate to, since it requires changing the AS number in the router bgp command
  • The real AS is identified with bgp confederation identifier
  • Peers are defined with bgp confederation peers
  • Confederation AS numbers in the AS-path are removed before advertising to a true eBGP peer
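The AS-path rules for confederations listed above can be sketched in Python. The segment type names come from the BGP specification, but the functions are hypothetical helpers, and the sub-AS numbers in the example are made up.

```python
from typing import List, Tuple

Segment = Tuple[str, List[int]]  # (segment type, AS numbers)


def as_path_length(path: List[Segment]) -> int:
    """AS-path length for best-path selection: each AS in an AS_SEQUENCE
    counts, an AS_SET counts as one, confederation segments count as zero."""
    length = 0
    for seg_type, asns in path:
        if seg_type == "AS_SEQUENCE":
            length += len(asns)
        elif seg_type == "AS_SET":
            length += 1
    return length


def export_to_true_ebgp(path: List[Segment], confed_identifier: int) -> List[Segment]:
    """Remove confederation segments and prepend the real AS
    (the bgp confederation identifier) before sending to a true eBGP peer."""
    stripped = [(t, list(a)) for t, a in path if not t.startswith("AS_CONFED")]
    if stripped and stripped[0][0] == "AS_SEQUENCE":
        return [("AS_SEQUENCE", [confed_identifier] + stripped[0][1])] + stripped[1:]
    return [("AS_SEQUENCE", [confed_identifier])] + stripped


# Hypothetical path inside confederation 100 with sub-ASes 65001 and 65002
path = [("AS_CONFED_SEQUENCE", [65001, 65002]), ("AS_SEQUENCE", [54, 50])]
print(as_path_length(path))            # 2
print(export_to_true_ebgp(path, 100))  # [('AS_SEQUENCE', [100, 54, 50])]
```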

Route reflectors

  • Removes the need for full mesh, all iBGP routers peer with route reflector
  • RR responsible for reflecting routes to clients, RR is usually not in forwarding path
  • No change is needed on clients to implement RR
  • The RR and its clients create a cluster, it is possible to have multiple RRs in a cluster
  • Route reflectors in different clusters should be fully meshed

To prevent loops in this topology, BGP uses two new attributes:

Cluster_list - Route reflectors add their cluster ID to this attribute before reflecting an update. Updates that already contain the local RR's cluster ID are discarded.

Originator_ID - The router ID of the router that originated the prefix in the AS. If a router sees its own ID in this attribute, it will not use or propagate the prefix.
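Both loop checks fit in a short Python sketch. `Route`, `reflect` and `accept` are hypothetical names, and the router and cluster IDs in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)


def reflect(route: Route, learned_from: str, my_cluster_id: str) -> Route:
    """Copy an RR sends on: ORIGINATOR_ID is set to the router ID of the
    iBGP peer the route was learned from (if not already set), and the RR's
    cluster ID is prepended to CLUSTER_LIST."""
    return Route(route.prefix,
                 route.originator_id or learned_from,
                 [my_cluster_id] + route.cluster_list)


def accept(route: Route, my_router_id: str,
           my_cluster_id: Optional[str] = None) -> bool:
    """Loop checks on receipt: drop the route if it carries our own router
    ID as ORIGINATOR_ID, or (on an RR) our own cluster ID in CLUSTER_LIST."""
    if route.originator_id == my_router_id:
        return False
    if my_cluster_id is not None and my_cluster_id in route.cluster_list:
        return False
    return True


# Hypothetical IDs: client 10.0.0.1 originates, RR in cluster 1.1.1.1 reflects
r = reflect(Route("119.0.0.0/8"), learned_from="10.0.0.1", my_cluster_id="1.1.1.1")
print(accept(r, my_router_id="10.0.0.1"))                           # False: own route
print(accept(r, my_router_id="10.0.0.9", my_cluster_id="1.1.1.1"))  # False: own cluster
print(accept(r, my_router_id="10.0.0.9"))                           # True
```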

BGP path attributes (PA)

AS_PATH - Lists the ASNs through which the route has been advertised - Well-known mandatory
NEXT_HOP - Lists the next-hop IP address used to reach the NLRI - Well-known mandatory
AGGREGATOR - Lists the RID and ASN of the router that created a summary NLRI - Optional transitive
ATOMIC_AGGREGATE - Tags a summary NLRI as being a summary - Well-known discretionary
ORIGIN - The origin of the route: IGP, EGP or incomplete - Well-known mandatory
ORIGINATOR_ID - The RID of the iBGP neighbor that injected an NLRI into the AS - Optional nontransitive
CLUSTER_LIST - Used by RRs to list the RR cluster IDs in order to prevent loops - Optional nontransitive

Injecting routes into BGP

Done via the network command, or by redistributing from an IGP or static routes.

Injecting a default route into BGP

Use the network 0.0.0.0 command - Requires that 0.0.0.0 exists in routing table
neighbor default-originate - Always advertise default route even if not present in local routing table
default-information originate - Requires route in routing table and a redistribute command

BGP best path algorithm

0. Discard routes with invalid next-hop
1. Routes with highest weight (Cisco proprietary)
2. Routes with highest local preference
3. Routes locally injected
4. Routes with shortest AS-path
5. Routes with best origin
6. Routes with lowest Multiple Exit Discriminator (MED)
7. Prefer eBGP over iBGP (confederation eBGP treated as iBGP)
8. Routes with lowest metric to next-hop
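The ordered list above maps naturally onto a sort key. This Python sketch is a simplification under stated assumptions: it compares MED across all paths (IOS by default compares MED only between paths from the same neighboring AS), it stops at step 8, and when paths tie completely it returns the first one. The `Path` fields and `best_path` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}  # IGP < EGP < incomplete


@dataclass
class Path:
    next_hop_reachable: bool
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path_len: int = 0
    origin: str = "i"
    med: int = 0
    ebgp: bool = False
    igp_metric: int = 0  # metric to the next hop


def best_path(paths: List[Path]) -> Optional[Path]:
    # Step 0: paths with an unreachable next hop are not candidates
    candidates = [p for p in paths if p.next_hop_reachable]
    if not candidates:
        return None
    # Steps 1-8 as a tuple key; min() returns the first path on a full tie
    return min(candidates, key=lambda p: (
        -p.weight,                 # 1. highest weight
        -p.local_pref,             # 2. highest local preference
        not p.locally_originated,  # 3. locally injected routes first
        p.as_path_len,             # 4. shortest AS path
        ORIGIN_RANK[p.origin],     # 5. best origin
        p.med,                     # 6. lowest MED
        not p.ebgp,                # 7. eBGP over iBGP
        p.igp_metric,              # 8. lowest metric to next hop
    ))


# Two hypothetical iBGP paths that differ only in IGP metric
a = Path(True, as_path_len=1, igp_metric=10)
b = Path(True, as_path_len=1, igp_metric=20)
print(best_path([b, a]) is a)               # True
print(best_path([Path(False), Path(False)]))  # None
```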

Categories: BGP, CCIE, Notes