Archive
Some important details of BGP
We start out with a basic topology of 3 routers.
R2 and R3 will peer to each other's loopbacks. I have set up OSPF for full reachability
in the network. First we test connectivity.
R2#ping 3.3.3.3 so lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds:
Packet sent with a source address of 2.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/53/80 ms
There is connectivity. We set up the peering and set ebgp-multihop to 2 since
this is what most people do. I will explain why this is not a good idea.
R2(config)#router bgp 1
R2(config-router)#nei 3.3.3.3 remote-as 3
R2(config-router)#nei 3.3.3.3 update-source loopback 0
R2(config-router)#nei 3.3.3.3 ebgp-multihop 2
R3(config)#router bgp 3
R3(config-router)#nei 2.2.2.2 remote-as 1
R3(config-router)#nei 2.2.2.2 update-source loopback 0
R3(config-router)#nei 2.2.2.2 ebgp-multihop 2
The session comes up.
%BGP-5-ADJCHANGE: neighbor 2.2.2.2 Up
All good so far. We are not advertising anything yet. We add another loopback
on R3 and advertise that into BGP. We check if R2 is receiving it.
R2#sh bgp ipv4 uni
BGP table version is 3, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 33.33.33.33/32 3.3.3.3 0 0 3 i
It looks good so far. Now let's think for a while about what ebgp-multihop
actually does. The default setting for eBGP is to check that incoming BGP
packets are destined for a directly connected interface. So the default is
to do a connected-check with a TTL of 1 (ebgp-multihop = 1). When we set
ebgp-multihop 2 the outgoing TTL is set to 2 and the connected-check is
disabled. We confirm this with a packet capture.
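The settings discussed in this post can be summarised in a small Python sketch. It is only a model of the behaviour described here, not IOS code; the class and function names are made up for illustration, and the disable_connected_check flag stands for the disable-connected-check neighbor option used further down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EbgpSessionSettings:
    outgoing_ttl: int       # TTL placed in outgoing BGP packets
    connected_check: bool   # True if the neighbor must be directly connected


def ebgp_settings(ebgp_multihop: Optional[int] = None,
                  disable_connected_check: bool = False) -> EbgpSessionSettings:
    """Hypothetical model of the eBGP defaults described in the text."""
    if ebgp_multihop is not None:
        # ebgp-multihop N: the TTL becomes N and the connected-check is skipped.
        return EbgpSessionSettings(outgoing_ttl=ebgp_multihop,
                                   connected_check=False)
    # Default eBGP: TTL 1; the check runs unless it is explicitly disabled.
    return EbgpSessionSettings(outgoing_ttl=1,
                               connected_check=not disable_connected_check)
```

Calling ebgp_settings(ebgp_multihop=2) gives a TTL of 2 with no connected-check, while ebgp_settings(disable_connected_check=True) keeps the TTL at 1.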
So the TTL is set to 2, but is this really necessary? The common argument is that
because we are peering to a loopback the TTL must be set to 2, since the
TTL is decremented before the packet reaches the loopback. But when do routers
modify packets before transmitting them? On the egress interface, right? We test
this theory by setting up a peering between R1 and R3. We will not use
ebgp-multihop to begin with, and we will run debug ip icmp. We have to disable
the connected-check, otherwise BGP will stay idle because a loopback can never
be directly connected.
R1(config-router)#nei 3.3.3.3 remote-as 3
R1(config-router)#nei 3.3.3.3 update-source lo0
R1(config-router)#nei 3.3.3.3 disable-connected-check
R3(config-router)#nei 1.1.1.1 remote-as 1
R3(config-router)#nei 1.1.1.1 update lo0
R3(config-router)#nei 1.1.1.1 disable-connected-check
We can now see that R2 is sending ICMP time exceeded messages to R1 and R3.
R1: ICMP: time exceeded rcvd from 12.12.12.2
R3: ICMP: time exceeded rcvd from 23.23.23.2
This is because the TTL was set to 1. The TTL expired while in transit.
Now we set up a peering between R1 and R2 using the loopbacks. We will disable
the connected-check.
R1(config-router)#nei 2.2.2.2 remote-as 1
R1(config-router)#nei 2.2.2.2 update lo0
R1(config-router)#nei 2.2.2.2 disable-connected-check
R2(config-router)#nei 1.1.1.1 remote-as 1
R2(config-router)#nei 1.1.1.1 update lo0
R2(config-router)#nei 1.1.1.1 disable-connected-check
Some people say that the TTL must be 2 for the peering to come up. We will
prove that this is wrong. The reason the peering does not come up when using
loopbacks is that BGP checks whether the neighbor is directly connected. We take
a look at a BGP packet sent when using disable-connected-check.
We clearly see that the TTL is 1 but the session still comes up. This proves
that it is not the TTL that is expiring when peering to loopbacks!
R1#sh bgp all sum
For address family: IPv4 Unicast
BGP router identifier 1.1.1.1, local AS number 1
BGP table version is 9, main routing table version 9
2 network entries using 240 bytes of memory
2 path entries using 104 bytes of memory
3/2 BGP path/bestpath attribute entries using 372 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
Bitfield cache entries: current 1 (at peak 2) using 32 bytes of memory
BGP using 772 total bytes of memory
BGP activity 5/3 prefixes, 5/3 paths, scan interval 60 secs
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
2.2.2.2 4 2 83 80 9 0 0 00:02:45 1
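The TTL argument can be checked with a few lines of Python. This is a simplified model, assuming that only routers that forward the packet decrement the TTL and that the destination router accepts the packet without decrementing it, even when the destination is a loopback. The function name is made up for illustration.

```python
def delivered(initial_ttl: int, transit_routers: int) -> bool:
    """Return True if a packet reaches its destination before the TTL expires.

    Each transit router decrements the TTL when it forwards the packet and
    drops it (sending ICMP time exceeded) if the TTL reaches zero.
    The destination router receives the packet locally and does not decrement.
    """
    ttl = initial_ttl
    for _ in range(transit_routers):
        ttl -= 1
        if ttl <= 0:
            return False  # expired in transit
    return True


# R1 -> R2 loopback: no transit router, TTL 1 is enough.
r1_to_r2 = delivered(initial_ttl=1, transit_routers=0)
# R1 -> R3 loopback via R2: one transit router, TTL 1 expires on R2.
r1_to_r3 = delivered(initial_ttl=1, transit_routers=1)
```

In this model the R1 to R2 session works with a TTL of 1, while the R1 to R3 session through R2 needs a TTL of at least 2.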
Finally I want to bring up another disadvantage of using the ebgp-multihop
command when peering between directly connected routers using loopbacks.
We have a peering between R2 and R3. What happens when we shut down the
interface on either router?
R2(config-router)#int f1/0
R2(config-if)#sh
R2(config-if)#
%OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
R2(config-if)#
%LINK-5-CHANGED: Interface FastEthernet1/0, changed state to administratively down
%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0, changed state to down
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 11, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
* 33.33.33.33/32 3.3.3.3 0 0 3 i
When we shut down the interface the peering still stays up. This is because
fast-external-fallover cannot be used together with ebgp-multihop. This could
lead to black holes since the peering stays up until the hold time expires (180s). In our
case we have no valid next-hop, but what if we put in a default route?
R2(config)#ip route 0.0.0.0 0.0.0.0 12.12.12.1
R2(config)#int f1/0
R2(config-if)#sh
R2(config-if)#do
%OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
R2(config-if)#do
%LINK-5-CHANGED: Interface FastEthernet1/0, changed state to administratively down
%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0, changed state to down
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 12, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 33.33.33.33/32 3.3.3.3 0 0 3 i
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 12, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 33.33.33.33/32 3.3.3.3 0 0 3 i
Now the route stays in the BGP table as the best path until the hold time expires,
which creates a black hole. The default route makes the next-hop look reachable
even though the link to R3 is down.
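How long the black hole lasts can be estimated with a small Python sketch. It is a simplified model that assumes the hold timer restarts every time a keepalive is received and that a directly connected peering without ebgp-multihop is reset as soon as the link goes down. The function and parameter names are made up for illustration.

```python
def blackhole_window(uses_ebgp_multihop: bool,
                     seconds_since_last_keepalive: int,
                     hold_time: int = 180) -> int:
    """Seconds between the link failure and the BGP session being torn down."""
    if not uses_ebgp_multihop:
        # Fast external fallover resets the session when the interface goes down.
        return 0
    # With ebgp-multihop we wait for the rest of the hold timer.
    return max(hold_time - seconds_since_last_keepalive, 0)
```

With the default timers and a keepalive received 10 seconds before the failure, the stale route would stay in use for about 170 seconds.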
I hope this post has given you a better understanding of these BGP features
and how a router handles control plane packets. As usual, post in the comments
if you have any feedback or questions.
Conditional BGP, showing status
These are some notes for the post that I did on conditional BGP advertisement.
I found an easier way of seeing the status of the advertise-map. It is
available in the neighbor status; look at the following output.
Rack7R3#show bgp ipv4 uni nei 155.7.37.7 | i Condition
Condition-map NON_EXIST, Advertise-map ADVERTISE, status: Advertise
The prefix is currently being advertised. Then we bring up the interface.
Rack7R3(config)#int s1/2
Rack7R3(config-if)#no sh
Rack7R3#show bgp ipv4 uni nei 155.7.37.7 | i Condition
Condition-map NON_EXIST, Advertise-map ADVERTISE, status: Withdraw
The prefix is no longer advertised. Remember that the BGP scanner runs every
60 seconds by default so it may take some time before we see results.
Conditional BGP advertisement with advertise-map
This post will describe how to do conditional advertising with BGP. In a real-life scenario this can be used to only announce routes to your backup provider when your primary link is down. In a lab scenario this can be used when the task says that traffic must come in on interface X/X, but if that interface fails it should come in on interface Y/Y. The image below describes the scenario.
We start by putting addresses on the interfaces and enabling basic BGP. The loopbacks on the Cust router are used for announcing networks.
Cust:
interface Loopback1
ip address 1.1.1.1 255.255.255.0
!
interface Loopback2
ip address 2.2.2.2 255.255.255.0
!
interface Loopback3
ip address 3.3.3.3 255.255.255.0
!
interface Loopback4
ip address 4.4.4.4 255.255.255.0
!
interface FastEthernet0/0
ip address 136.1.13.3 255.255.255.0
no shut
!
interface Serial0/0
ip address 136.1.23.3 255.255.255.0
clock rate 2000000
no shut
!
router bgp 300
no synchronization
bgp log-neighbor-changes
network 1.1.1.0 mask 255.255.255.0
network 2.2.2.0 mask 255.255.255.0
network 3.3.3.0 mask 255.255.255.0
network 4.4.4.0 mask 255.255.255.0
network 136.1.13.0 mask 255.255.255.0
neighbor 136.1.13.1 remote-as 100
neighbor 136.1.23.2 remote-as 200
no auto-summary
ISP1:
interface FastEthernet0/0
ip address 136.1.12.1 255.255.255.0
no shut
!
interface FastEthernet0/1
ip address 136.1.13.1 255.255.255.0
no shut
!
router bgp 100
no synchronization
bgp log-neighbor-changes
network 136.1.12.0 mask 255.255.255.0
neighbor 136.1.12.2 remote-as 200
neighbor 136.1.13.3 remote-as 300
no auto-summary
ISP2:
interface FastEthernet0/0
ip address 136.1.12.2 255.255.255.0
no shut
!
interface Serial0/0
ip address 136.1.23.2 255.255.255.0
clock rate 2000000
no shut
!
router bgp 200
no synchronization
bgp log-neighbor-changes
neighbor 136.1.12.1 remote-as 100
neighbor 136.1.23.3 remote-as 300
no auto-summary
If we look at ISP2 we have two active BGP sessions with four prefixes over each.
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
136.1.12.1 4 100 6 7 9 0 0 00:01:47 4
136.1.23.3 4 300 5 6 9 0 0 00:00:33 4
ISP2#sh ip bgp
BGP table version is 9, local router ID is 136.1.23.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 136.1.23.3 0 0 300 i
* 136.1.12.1 0 100 300 i
*> 2.2.2.0/24 136.1.23.3 0 0 300 i
* 136.1.12.1 0 100 300 i
*> 3.3.3.0/24 136.1.23.3 0 0 300 i
* 136.1.12.1 0 100 300 i
*> 4.4.4.0/24 136.1.23.3 0 0 300 i
* 136.1.12.1 0 100 300 i
Let's do a ping and traceroute to verify reachability first.
ISP2#ping 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/38/88 ms
ISP2#trace
ISP2#traceroute 1.1.1.1
Type escape sequence to abort.
Tracing the route to 1.1.1.1
1 136.1.23.3 44 msec * 84 msec
We have reachability. The next step is to announce the Ethernet link on the
Cust router into BGP. We need this prefix in BGP to be able to track it.
Cust(config)#router bgp 300
Cust(config-router)#network 136.1.13.0 mask 255.255.255.0
ISP1 will see this prefix as a RIB-failure since it already has a route with a better AD (connected).
We then configure the Cust router to only advertise 1.1.1.0/24 to ISP2 if the Ethernet link is down.
Cust#conf t
Enter configuration commands, one per line. End with CNTL/Z.
Cust(config)#ip prefix-list 1-NETWORK seq 5 permit 1.1.1.0/24
Cust(config)#ip prefix-list 13-NETWORK seq 5 permit 136.1.13.0/24
Cust(config)#route-map ADVERTISE permit 10
Cust(config-route-map)#match ip address prefix-list 1-NETWORK
Cust(config-route-map)#exit
Cust(config)#route-map NON_EXIST permit 10
Cust(config-route-map)#match ip address prefix-list 13-NETWORK
Cust(config-route-map)#exit
Cust(config)#router bgp 300
Cust(config-router)#neighbor 136.1.23.2 advertise-map ADVERTISE non-exist-map NON_EXIST
Cust(config-router)#^Z
The advertise-map selects the prefixes that are announced only when the prefixes matched by the NON_EXIST map are not in the BGP table.
Other prefixes are not affected by this configuration. Let's look at what Cust is announcing to ISP2.
Cust#sh bgp ipv4 uni nei 136.1.23.2 advertised-routes
BGP table version is 7, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 2.2.2.0/24 0.0.0.0 0 32768 i
*> 3.3.3.0/24 0.0.0.0 0 32768 i
*> 4.4.4.0/24 0.0.0.0 0 32768 i
*> 136.1.13.0/24 0.0.0.0 0 32768 i
Total number of prefixes 4
We can see that 1.1.1.0/24 is no longer being announced. Ping from ISP2 confirms reachability and a traceroute shows that traffic is passing through ISP1.
ISP2#ping 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 44/76/108 ms
ISP2#traceroute 1.1.1.1
Type escape sequence to abort.
Tracing the route to 1.1.1.1
1 136.1.12.1 100 msec 96 msec 44 msec
2 136.1.13.3 [AS 300] 92 msec * 116 msec
We then do a shutdown of the Ethernet link on Cust and look at the results.
Cust#conf t
Enter configuration commands, one per line. End with CNTL/Z.
Cust(config)#int f0/0
Cust(config-if)#sh
Cust(config-if)#
*Mar 1 00:27:22.007: %BGP-5-ADJCHANGE: neighbor 136.1.13.1 Down Interface flap
Cust(config-if)#
*Mar 1 00:27:23.983: %LINK-5-CHANGED: Interface FastEthernet0/0, changed state to administratively down
*Mar 1 00:27:24.983: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to down
Cust(config-if)#
Cust#sh bgp ipv4 uni nei 136.1.23.2 advertised-routes
BGP table version is 12, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 0.0.0.0 0 32768 i
*> 2.2.2.0/24 0.0.0.0 0 32768 i
*> 3.3.3.0/24 0.0.0.0 0 32768 i
*> 4.4.4.0/24 0.0.0.0 0 32768 i
Total number of prefixes 4
This is the BGP table on ISP2. Ping is working and traffic is now taking the direct path.
ISP2#sh ip bgp
BGP table version is 15, local router ID is 136.1.23.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 136.1.23.3 0 0 300 i
* 136.1.12.1 0 100 300 i
*> 2.2.2.0/24 136.1.23.3 0 0 300 i
* 136.1.12.1 0 100 300 i
*> 3.3.3.0/24 136.1.23.3 0 0 300 i
* 136.1.12.1 0 100 300 i
*> 4.4.4.0/24 136.1.23.3 0 0 300 i
* 136.1.12.1 0 100 300 i
r> 136.1.12.0/24 136.1.12.1 0 0 100 i
*> 136.1.13.0/24 136.1.12.1 0 100 300 i
ISP2#ping 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/40/80 ms
ISP2#tra
ISP2#traceroute 1.1.1.1
Type escape sequence to abort.
Tracing the route to 1.1.1.1
1 136.1.23.3 92 msec * 24 msec
If we debug BGP updates we will see entries like this.
*Mar 1 01:05:03.067: BPG(0): Condition NON_EXIST changes to Withdraw
*Mar 1 01:05:03.067: BPG(0): Condition NON_EXIST changes to Withdraw
*Mar 1 01:06:03.079: BPG(0): Condition NON_EXIST changes to Advertise
*Mar 1 01:06:03.079: BPG(0): Condition NON_EXIST changes to Advertise
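The non-exist-map logic used in this post can be modelled in a few lines of Python. This is a minimal sketch, assuming the route-maps are reduced to plain sets of prefixes; the function name is made up for illustration and nothing here is IOS code.

```python
def advertised_routes(bgp_table: set, advertise_map: set, non_exist_map: set) -> set:
    """Prefixes sent to the neighbor that has the advertise-map applied."""
    # The condition is met when none of the tracked prefixes are in the BGP table.
    condition_met = not (bgp_table & non_exist_map)
    # Prefixes not matched by the advertise-map are always announced.
    unaffected = bgp_table - advertise_map
    conditional = bgp_table & advertise_map if condition_met else set()
    return unaffected | conditional


ADVERTISE = {"1.1.1.0/24"}
NON_EXIST = {"136.1.13.0/24"}
loopbacks = {"1.1.1.0/24", "2.2.2.0/24", "3.3.3.0/24", "4.4.4.0/24"}

link_up = advertised_routes(loopbacks | NON_EXIST, ADVERTISE, NON_EXIST)
link_down = advertised_routes(loopbacks, ADVERTISE, NON_EXIST)
```

With the Ethernet link up, 1.1.1.0/24 is held back and the other four prefixes are sent. With the link down, 136.1.13.0/24 disappears from the table and 1.1.1.0/24 is announced, which matches the two advertised-routes outputs above.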
BGP troubleshooting – route not installed
Sometimes prefixes in BGP do not get installed into the routing table. If the route is also known via an IGP that might be the reason, but then a RIB-failure would be indicated. This scenario shows another possible source of problems. Once again, the topology is this.
All internal routers are running iBGP in a full mesh. Routers R4 and R6 have eBGP peerings to the backbone routers which are injecting external prefixes into the AS. All internal routers are announcing their loopbacks into BGP. SW3 is trying to reach 119.0.0.1 in the prefix 119.0.0.0/8 but is unable to do so. Let's look at some output.
Rack1SW3#ping 119.0.0.1 so lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 119.0.0.1, timeout is 2 seconds:
Packet sent with a source address of 150.1.9.9
.....
Success rate is 0 percent (0/5)
SW3 can’t reach 119.0.0.1, why?
Rack1SW3#sh ip route 119.0.0.0
% Network not in table
We have no route there, what routes can we see from BGP?
Rack1SW3#sh ip route bgp
150.1.0.0/24 is subnetted, 10 subnets
B 150.1.7.0 [200/0] via 155.1.79.7, 01:26:58
B 150.1.6.0 [200/0] via 155.1.67.6, 01:26:58
B 150.1.5.0 [200/0] via 155.1.45.5, 01:26:58
B 150.1.4.0 [200/0] via 155.1.146.4, 01:26:58
B 150.1.3.0 [200/0] via 155.1.37.3, 01:26:58
B 150.1.2.0 [200/0] via 155.1.23.2, 01:26:58
B 150.1.1.0 [200/0] via 155.1.146.1, 01:26:58
B 150.1.10.0 [200/0] via 155.1.108.10, 01:26:45
B 150.1.8.0 [200/0] via 155.1.58.8, 01:26:45
We can see all the loopbacks just fine but we have no route to the external prefixes. What is R6 announcing to us?
Rack1SW3# sh ip bgp nei 155.1.67.6 routes
BGP table version is 11, local router ID is 150.1.9.9
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
* i28.119.16.0/24 54.1.1.254 0 100 0 54 i
* i28.119.17.0/24 54.1.1.254 0 100 0 54 i
* i112.0.0.0 54.1.1.254 0 100 0 54 50 60 i
* i113.0.0.0 54.1.1.254 0 100 0 54 50 60 i
* i114.0.0.0 54.1.1.254 0 100 0 54 i
* i115.0.0.0 54.1.1.254 0 100 0 54 i
* i116.0.0.0 54.1.1.254 0 100 0 54 i
* i117.0.0.0 54.1.1.254 0 100 0 54 i
* i118.0.0.0 54.1.1.254 0 100 0 54 i
* i119.0.0.0 54.1.1.254 0 100 0 54 i
*150.1.6.0/24 155.1.67.6 0 100 0 i
Total number of prefixes 11
R6 is announcing the external prefixes to us but what do we have in our BGP table? Output has been abbreviated.
Rack1SW3#sh ip bgp
BGP table version is 11, local router ID is 150.1.9.9
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
* i119.0.0.0 204.12.1.254 0 100 0 54 i
* i 54.1.1.254 0 100 0 54 i
So we do have 119.0.0.0/8 via 204.12.1.254 and 54.1.1.254, but how do we get to the next-hops? Remember that route recursion will occur and that the first rule of BGP best path selection is that we must have a valid next-hop. We can see that the route is valid but not best.
Rack1SW3#sh ip route 54.1.1.254
% Network not in table
We have an invalid next-hop, and that is why the route is not being installed. Let's fix this.
Rack1R6(config)#router eigrp 100
Rack1R6(config-router)#network 54.1.1.0 0.0.0.255
Rack1R4(config)#router eigrp 100
Rack1R4(config-router)#network 204.12.1.0 0.0.0.255
That should take care of the next-hops. Let's check the routing table.
Rack1SW3#sh ip route 54.1.1.254
Routing entry for 54.1.1.0/24
Known via "eigrp 100", distance 90, metric 2174976, type internal
Redistributing via eigrp 100
Last update from 155.1.79.7 on Vlan79, 00:02:04 ago
Routing Descriptor Blocks:
* 155.1.79.7, from 155.1.79.7, 00:02:04 ago, via Vlan79
Route metric is 2174976, traffic share count is 1
Total delay is 20200 microseconds, minimum bandwidth is 1544 Kbit
Reliability 255/255, minimum MTU 1500 bytes
Loading 1/255, Hops 2
We now have a route for the next-hop. Let's look at the BGP table again.
Rack1SW3#sh ip bgp
BGP table version is 31, local router ID is 150.1.9.9
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*>i119.0.0.0 204.12.1.254 0 100 0 54 i
* i 54.1.1.254 0 100 0 54 i
So we now have a best path. Is it in the routing table?
Rack1SW3#sh ip route bgp
B 119.0.0.0/8 [200/0] via 204.12.1.254, 00:02:27
Route is installed, we should be good to go.
Rack1SW3#ping 119.0.0.1 so lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 119.0.0.1, timeout is 2 seconds:
Packet sent with a source address of 150.1.9.9
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/37/84 ms
Success. Always remember to have a valid next-hop in BGP. The next-hop is changed when a route is advertised over an eBGP peering but not over iBGP. To resolve this kind of problem, either redistribute the connected interface facing the external peer into the IGP or use next-hop-self on the iBGP peerings. A route-map can also be used to achieve the same thing. I hope this post has shown you how to do BGP troubleshooting step by step.
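The next-hop check that caused the problem can be sketched in Python with the standard ipaddress module. This is a simplified model that treats the routing table as a plain list of prefixes and does a longest-prefix match; the function name and the sample prefixes are assumptions for illustration.

```python
import ipaddress
from typing import Iterable, Optional


def resolve_next_hop(next_hop: str, routing_table: Iterable[str]) -> Optional[str]:
    """Return the most specific prefix covering next_hop, or None if unreachable."""
    address = ipaddress.ip_address(next_hop)
    matches = [ipaddress.ip_network(prefix) for prefix in routing_table
               if address in ipaddress.ip_network(prefix)]
    if not matches:
        return None  # BGP cannot use a path whose next-hop does not resolve
    return str(max(matches, key=lambda network: network.prefixlen))


# Before the fix SW3 has no route covering 54.1.1.254.
before = ["150.1.6.0/24", "155.1.67.0/24"]
# After R6 advertises 54.1.1.0/24 into EIGRP the next-hop resolves.
after = before + ["54.1.1.0/24"]
```

In this model resolve_next_hop("54.1.1.254", before) returns None, so the path cannot become best, while the same call against the second table returns 54.1.1.0/24.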
BGP added to flash cards
I have added content for BGP to the flash cards. There are now 112 questions in total. More content in other areas will be added as I go through them in my studies. The file as always is located here.
BGP troubleshooting – peer address not matching
Yesterday I did some Internetwork Expert vol1 labs on BGP. I was having trouble getting some of the peers to come up and had to troubleshoot. This post will describe how to troubleshoot when peers won't form. First, let's look at the topology. Thanks to DennisD on IEOC forums for the image.
R2 and R5 should peer with each other in AS 100. R2 is set up to peer with R5's IP 155.1.45.5 and R5 is set up to peer with R2's IP 155.1.23.2. It would have been better to peer over the 155.1.0.0/24 subnet directly but this is to show the steps of troubleshooting. So the session will not form, why? Let's look at some output from debug ip tcp transactions.
Rack1R5#*Mar 1 01:31:39.291: TCP: sending SYN, seq 478218125, ack 0
*Mar 1 01:31:39.291: TCP0: Connection to 155.1.23.2:179, advertising MSS 536
*Mar 1 01:31:39.291: TCP0: state was CLOSED -> SYNSENT [56275 -> 155.1.23.2(179)]
*Mar 1 01:31:39.311: Released port 56275 in Transport Port Agent for TCP IP type 1 delay 240000
*Mar 1 01:31:39.311: TCP0: state was SYNSENT -> CLOSED [56275 -> 155.1.23.2(179)]
*Mar 1 01:31:39.311: TCP0: bad seg from 155.1.23.2 -- closing connection: port 56275 seq 0 ack 478218126 rcvnxt 0
rcvwnd 0 len 0
*Mar 1 01:31:39.311: TCP0: connection closed - remote sent RST
*Mar 1 01:31:39.311: TCB 0x651784FC destroyed
We can see that R5 is initiating the connection. It is sending a TCP SYN to R2 on port 179 but R2 responds with a TCP RST which resets the connection. This could indicate that either R2 is not running BGP or that there is a problem with the neighbor statements.
So we want to know what IP R5 is using when sending TCP packets to R2. Let's debug IP packets.
Rack1R5(config)#access-list 101 permit tcp any host 155.1.23.2
Rack1R5#debug ip packet 101
*Mar 1 01:36:01.611: IP: tableid=0, s=155.1.0.5 (local), d=155.1.23.2 (Serial0/0), routed via FIB
*Mar 1 01:36:01.615: IP: s=155.1.0.5 (local), d=155.1.23.2 (Serial0/0), len 44, sending
R5 is using its IP of 155.1.0.5 to communicate with 155.1.23.2 but R2 expects R5 to set up the BGP session from the IP of 155.1.45.5. Let's verify why R5 is using 155.1.0.5 to get to 155.1.23.2. This is a look at the routing table.
Rack1R5#sh ip route 155.1.23.0
Routing entry for 155.1.23.0/24
Known via "eigrp 100", distance 90, metric 2681856, type internal
Redistributing via eigrp 100
Last update from 155.1.0.3 on Serial0/0, 00:42:06 ago
Routing Descriptor Blocks:
155.1.0.3, from 155.1.0.3, 00:42:06 ago, via Serial0/0
Route metric is 2681856, traffic share count is 1
Total delay is 40000 microseconds, minimum bandwidth is 1544 Kbit
Reliability 255/255, minimum MTU 1500 bytes
Loading 1/255, Hops 1
* 155.1.0.2, from 155.1.0.2, 00:42:06 ago, via Serial0/0
Route metric is 2681856, traffic share count is 1
Total delay is 40000 microseconds, minimum bandwidth is 1544 Kbit
Reliability 255/255, minimum MTU 1500 bytes
Loading 1/255, Hops 1
We can see that R5 has two equal cost paths to reach the IP of R2. The next hop is either 155.1.0.2 or 155.1.0.3 and these are reachable via the connected subnet of Serial0/0. That is why R5 is using the IP of 155.1.0.5 to source packets. How can we solve this? Either we can set up the neighbor statement on R2 to point at 155.1.0.5 or we can change the update-source on R5.
Rack1R5(config-router)#neighbor 155.1.23.2 update-source s0/1
A debug IP packet confirms that the right interface is now being used.
*Mar 1 01:36:31.663: IP: tableid=0, s=155.1.45.5 (local), d=155.1.23.2 (Serial0/0), routed via FIB
*Mar 1 01:36:31.663: IP: s=155.1.45.5 (local), d=155.1.23.2 (Serial0/0), len 44, sending
The BGP summary confirms that they are now peers.
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
155.1.23.2 4 100 5 5 9 0 0 00:00:05 1
Show tcp brief is a good command to see TCP sessions to/from the router.
Rack1R5#show tcp brief
TCB Local Address Foreign Address (state)
651791FC 155.1.45.5.26655 155.1.23.2.179 ESTAB
And this is how to do basic BGP troubleshooting.
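The mismatch found above can be reduced to two small rules, sketched here in Python. This is a simplified model under the assumption that a router sources the TCP session from the outgoing interface address unless update-source is configured, and that the peer only accepts connections from addresses in its neighbor statements. The function names are made up for illustration.

```python
from typing import Optional, Set


def session_source(outgoing_interface_ip: str,
                   update_source_ip: Optional[str] = None) -> str:
    """Source address used for the TCP session towards the neighbor."""
    return update_source_ip or outgoing_interface_ip


def peer_accepts(source_ip: str, configured_neighbors: Set[str]) -> bool:
    """The peer resets connections from addresses it has no neighbor statement for."""
    return source_ip in configured_neighbors


r2_neighbors = {"155.1.45.5"}  # R2 expects R5 to connect from this address

before_fix = peer_accepts(session_source("155.1.0.5"), r2_neighbors)
after_fix = peer_accepts(session_source("155.1.0.5", "155.1.45.5"), r2_neighbors)
```

Before the fix R5 connects from 155.1.0.5 and is rejected; with the update-source pointing at the interface that owns 155.1.45.5 the connection is accepted.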
Border Gateway Protocol (BGP) – notes
- Uses TCP as transport, port 179
- Path vector protocol
Checks before becoming a neighbor
- The TCP connection request must come from an IP associated with a neighbor command
- The AS number must match that in the neighbor statement
- The routers cannot have duplicate router IDs
- If authentication is configured it must also match
Timers
Uses a keepalive and hold timer, defaults to 60 and 180 seconds.
BGP neighbor states
Idle - BGP not initiated yet
Connect - Listening for TCP
Active - Initiate TCP
Open sent - Open sent, TCP is up
Open confirm - Open received, TCP is up
Established - Peering has been established
BGP message types
Open - Used to establish neighbor session and exchange parameters
Keepalive - Used to maintain the neighbor relationship
Update - Used to exchange routing information
Notification - Used when BGP errors occur, resets neighbor session
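All four message types share the same fixed header. As a minimal sketch, assuming the RFC 4271 layout (16-byte marker of all ones, 2-byte length, 1-byte type) and its type codes, the header can be built and parsed in Python like this. The function names are made up for illustration.

```python
import struct

# Type codes as defined in RFC 4271.
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

# Network byte order: 16-byte marker, 16-bit length, 8-bit type (19 bytes total).
HEADER = struct.Struct("!16sHB")
MARKER = b"\xff" * 16


def build_keepalive() -> bytes:
    """A KEEPALIVE consists of the header only."""
    return HEADER.pack(MARKER, HEADER.size, 4)


def parse_header(data: bytes):
    """Return (length, type name) for the BGP message starting at data[0]."""
    marker, length, message_type = HEADER.unpack(data[:HEADER.size])
    if marker != MARKER:
        raise ValueError("invalid BGP marker")
    return length, MESSAGE_TYPES.get(message_type, "UNKNOWN")
```

Parsing the output of build_keepalive() gives a length of 19 bytes and the type KEEPALIVE.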
Confederations
- Uses a sub ASN, real AS divided into smaller sections where each section has a private ASN
- The private ASN range is from 64512 to 65534 (65535 is reserved)
- Every sub-AS has to be fully meshed internally and uses iBGP logic
- Connections between different sub ASes act as eBGP connections
- Confederation ASNs are not considered when deciding the AS-path length
- Painful to migrate since it requires changing the AS number in the router bgp command
- Real AS identified with bgp confederation identifier
- Peers defined with bgp confederation peers
- Confederation AS numbers in AS-path will be removed before advertising to true eBGP peer
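Two of the points above, the AS-path length and the removal of confederation ASNs, can be sketched in Python. This is a simplified model that represents an AS-path as a list of (segment type, ASNs) tuples; the segment type names follow RFC 5065, while the function names and the sample sub-AS numbers are made up for illustration.

```python
CONFED_SEGMENTS = {"AS_CONFED_SEQUENCE", "AS_CONFED_SET"}


def as_path_length(as_path) -> int:
    """AS-path length used for best path selection; confederation segments count as 0."""
    length = 0
    for kind, asns in as_path:
        if kind in CONFED_SEGMENTS:
            continue
        # An AS_SET counts as one hop regardless of how many ASNs it contains.
        length += 1 if kind == "AS_SET" else len(asns)
    return length


def strip_confederation(as_path):
    """Remove confederation segments before advertising to a true eBGP peer."""
    return [(kind, asns) for kind, asns in as_path if kind not in CONFED_SEGMENTS]


path = [("AS_CONFED_SEQUENCE", [65001, 65002]), ("AS_SEQUENCE", [54, 50, 60])]
```

For this path the length is 3, and only the AS_SEQUENCE segment is left after stripping. A real router would also prepend the confederation identifier when sending to the eBGP peer, which the sketch leaves out.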
Route reflectors
- Removes the need for full mesh, all iBGP routers peer with route reflector
- RR responsible for reflecting routes to clients, RR is usually not in forwarding path
- No change is needed on clients to implement RR
- The RR and its clients create a cluster, it is possible to have multiple RRs in a cluster
- Route reflectors in different clusters should be fully meshed
To ensure no loops in this topology BGP needs two new attributes:
Cluster_list - Route reflectors add their cluster ID to this attribute before sending an update. Updates with same cluster ID as local RR will be discarded.
Originator_ID - The ID of the router that originated the prefix. If a router sees its own ID in this attribute it will not use or propagate this prefix.
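The two loop prevention checks can be sketched in Python. This is a minimal model of the rules in these notes, not a real BGP implementation; the function and parameter names are made up for illustration.

```python
from typing import List, Optional


def accept_update(local_router_id: str,
                  cluster_list: List[str],
                  originator_id: Optional[str],
                  local_cluster_id: Optional[str] = None) -> bool:
    """Return True if a reflected update passes both loop checks."""
    # Originator_ID check: a router ignores a prefix it originated itself.
    if originator_id == local_router_id:
        return False
    # Cluster_list check: only relevant on a route reflector with a cluster ID.
    if local_cluster_id is not None and local_cluster_id in cluster_list:
        return False
    return True


def reflect(cluster_list: List[str], local_cluster_id: str) -> List[str]:
    """A route reflector prepends its cluster ID before reflecting the update."""
    return [local_cluster_id] + cluster_list
```

An update that already carries the local cluster ID, or whose Originator_ID equals the local router ID, is rejected in this model.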
BGP PA
AS_PATH - Lists ASNs through which the route has been advertised - Well known Mandatory
NEXT_HOP - Lists the next-hop IP address used to reach the NLRI - Well known Mandatory
AGGREGATOR - Lists the RID and ASN of the router that created a summary NLRI - Optional Transitive
ATOMIC_AGGREGATE - Tags a summary NLRI as being a summary - Well known Discretionary
ORIGIN - The origin of the route, igp, egp or incomplete - Well known Mandatory
ORIGINATOR_ID - The RID of the iBGP neighbor that injected a NLRI into the AS - Optional Nontransitive
CLUSTER_LIST - Used by RRs to list the RR cluster IDs in order to prevent loops - Optional Nontransitive
Injecting routes into BGP
Done via network command or redistribute from an IGP or static routes.
Injecting a default route into BGP
Use the network 0.0.0.0 command - Requires that 0.0.0.0 exists in routing table
neighbor default-originate - Always advertise default route even if not present in local routing table
default-information originate - Requires route in routing table and a redistribute command
BGP best path algorithm
0. Discard routes with invalid next-hop
1. Routes with highest weight (Cisco proprietary)
2. Routes with highest local preference
3. Routes locally injected
4. Routes with shortest AS-path
5. Routes with best origin
6. Routes with lowest Multiple Exit Discriminator (MED)
7. Prefer eBGP over iBGP (confederation eBGP treated as iBGP)
8. Routes with lowest metric to next-hop
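The steps above can be sketched as a sort key in Python. This is a simplified model of the list in these notes: it compares MED between all paths (a real router normally only compares MED between paths from the same neighboring AS) and stops at step 8. The Path fields and function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower is better


@dataclass(frozen=True)
class Path:
    next_hop_valid: bool = True
    weight: int = 0
    local_pref: int = 100
    locally_injected: bool = False
    as_path_len: int = 0
    origin: str = "igp"
    med: int = 0
    ebgp: bool = False
    igp_metric: int = 0


def preference_key(path: Path):
    """Tuple ordered like steps 1 to 8; the smallest tuple wins."""
    return (-path.weight,               # 1. highest weight
            -path.local_pref,           # 2. highest local preference
            not path.locally_injected,  # 3. locally injected first
            path.as_path_len,           # 4. shortest AS-path
            ORIGIN_RANK[path.origin],   # 5. best origin
            path.med,                   # 6. lowest MED
            not path.ebgp,              # 7. eBGP over iBGP
            path.igp_metric)            # 8. lowest metric to next-hop


def best_path(paths: Iterable[Path]) -> Optional[Path]:
    # 0. Discard routes with an invalid next-hop.
    candidates = [path for path in paths if path.next_hop_valid]
    return min(candidates, key=preference_key) if candidates else None
```

For example, a path with local preference 200 beats a path with local preference 100 even if its AS-path is longer, and a path with an invalid next-hop never wins regardless of its weight.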





