We start out with a basic topology of three routers.
R2 and R3 will peer with each other's loopbacks. I have set up OSPF for full reachability
in the network. First we test connectivity.
R2#ping 184.108.40.206 so lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 220.127.116.11, timeout is 2 seconds:
Packet sent with a source address of 18.104.22.168
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/53/80 ms
There is connectivity. We set up the peering and set ebgp-multihop to 2, since
this is what most people do. I will explain why this is not a good idea.
R2(config)#router bgp 1
R2(config-router)#nei 22.214.171.124 remote-as 3
R2(config-router)#nei 126.96.36.199 update-source loopback 0
R2(config-router)#nei 188.8.131.52 ebgp-multihop 2
R3(config)#router bgp 3
R3(config-router)#nei 184.108.40.206 remote-as 1
R3(config-router)#nei 220.127.116.11 update-source loopback 0
R3(config-router)#nei 18.104.22.168 ebgp-multihop 2
The session comes up.
%BGP-5-ADJCHANGE: neighbor 22.214.171.124 Up
All good so far. We are not advertising anything yet. We add another loopback
on R3 and advertise that into BGP. We check if R2 is receiving it.
R2#sh bgp ipv4 uni
BGP table version is 3, local router ID is 126.96.36.199
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network           Next Hop            Metric LocPrf Weight Path
*> 188.8.131.52/32   184.108.40.206           0             0 3 i
It looks good so far. Now let's think for a while about what ebgp-multihop
actually does. By default, eBGP checks that the neighbor address is on a directly
connected network. So the default is to do a connected-check and to send BGP packets
with a TTL of 1, as with ebgp-multihop 1. When we set ebgp-multihop 2,
the outgoing TTL is set to 2 and the connected-check is disabled. We confirm
this with a packet capture.
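To keep the three knobs straight, here is a small Python model of the behavior described above: the eBGP default, ebgp-multihop 2 and disable-connected-check. It is a sketch of the behavior as described in this post, not of IOS internals, and the class and method names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EbgpNeighbor:
    # None means "ebgp-multihop not configured"; a number models "ebgp-multihop N"
    ebgp_multihop: Optional[int] = None
    disable_connected_check: bool = False

    def outgoing_ttl(self) -> int:
        # Single-hop eBGP sends with TTL 1; ebgp-multihop N raises it to N
        return self.ebgp_multihop if self.ebgp_multihop is not None else 1

    def connected_check(self) -> bool:
        # On by default; in this model ebgp-multihop above 1 or
        # disable-connected-check turns it off
        multihop = self.ebgp_multihop is not None and self.ebgp_multihop > 1
        return not (multihop or self.disable_connected_check)


for label, nbr in [
    ("default", EbgpNeighbor()),
    ("ebgp-multihop 2", EbgpNeighbor(ebgp_multihop=2)),
    ("disable-connected-check", EbgpNeighbor(disable_connected_check=True)),
]:
    print(f"{label}: TTL={nbr.outgoing_ttl()} connected-check={nbr.connected_check()}")
```

The last case is the interesting one: disable-connected-check removes the check but leaves the TTL at 1.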
So the TTL is set to 2, but is this really necessary? The common argument is that
because we are peering to a loopback, the TTL must be set to 2, since the
TTL is decremented before the packet reaches the loopback. When does a router decrement
the TTL? When it forwards a packet out an egress interface, right? A packet addressed to
the router's own loopback is not forwarded at all. We test this theory by
setting up a peering between R1 and R3. We will not use ebgp-multihop to begin
with, and we will run debug ip icmp. We have to disable the connected-check,
otherwise BGP will stay idle, because a loopback can never be directly connected.
R1(config-router)#nei 220.127.116.11 remote-as 3
R1(config-router)#nei 18.104.22.168 update-source lo0
R1(config-router)#nei 22.214.171.124 disable-connected-check
R3(config-router)#nei 126.96.36.199 remote-as 1
R3(config-router)#nei 188.8.131.52 update lo0
R3(config-router)#nei 184.108.40.206 disable-connected-check
We can now see that R2 is sending ICMP time exceeded messages to R1 and R3.
R1: ICMP: time exceeded rcvd from 220.127.116.11
R3: ICMP: time exceeded rcvd from 18.104.22.168
This is because the TTL was set to 1, so it expired in transit at R2.
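A toy model of why the R1 to R3 session fails while a loopback peering between neighbors does not: each router that forwards the packet decrements the TTL first, while the router that owns the destination address accepts it without decrementing. This encodes the assumption argued in this post; the function and router paths are illustrative.

```python
def send(path: list[str], ttl: int) -> str:
    """Walk a packet along path (sender first, destination last).

    Each router in between must forward the packet, so it decrements the
    TTL and drops the packet if the TTL reaches 0. The destination router
    accepts the packet locally without decrementing it.
    """
    for router in path[1:-1]:
        ttl -= 1
        if ttl <= 0:
            return f"time exceeded at {router}"
    return f"delivered to {path[-1]}"


print(send(["R1", "R2", "R3"], ttl=1))  # time exceeded at R2
print(send(["R1", "R2"], ttl=1))        # delivered to R2
print(send(["R1", "R2", "R3"], ttl=2))  # delivered to R3
```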
Now we set up a peering between R1 and R2 using the loopbacks. We will disable the connected-check here as well.
R1(config-router)#nei 22.214.171.124 remote-as 1
R1(config-router)#nei 126.96.36.199 update lo0
R1(config-router)#nei 188.8.131.52 disable-connected-check
R2(config-router)#nei 184.108.40.206 remote-as 1
R2(config-router)#nei 220.127.116.11 update lo0
R2(config-router)#nei 18.104.22.168 disable-connected-check
Now, according to the people who say that the TTL must be 2 for the peering to come up,
this should fail, and we will prove that they are wrong. The reason the peering does not come up when using
loopbacks is that BGP checks whether the neighbor is directly connected or not. We take a
look at a BGP packet sent when using disable-connected-check.
We clearly see that the TTL is 1, but the session still comes up. This proves
that it is not the TTL that is expiring when peering to loopbacks!
R1#sh bgp all sum
For address family: IPv4 Unicast
BGP router identifier 22.214.171.124, local AS number 1
BGP table version is 9, main routing table version 9
2 network entries using 240 bytes of memory
2 path entries using 104 bytes of memory
3/2 BGP path/bestpath attribute entries using 372 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
Bitfield cache entries: current 1 (at peak 2) using 32 bytes of memory
BGP using 772 total bytes of memory
BGP activity 5/3 prefixes, 5/3 paths, scan interval 60 secs
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
126.96.36.199   4     2      83      80        9    0    0 00:02:45        1
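The connected-check itself boils down to a subnet membership test. This sketch uses Python's ipaddress module with hypothetical documentation addresses (not the lab's); it models the check as described in this post, not how IOS implements it.

```python
import ipaddress


def passes_connected_check(neighbor: str, connected_subnets: list[str]) -> bool:
    """True if the neighbor address falls inside a directly connected subnet."""
    addr = ipaddress.ip_address(neighbor)
    return any(addr in ipaddress.ip_network(net) for net in connected_subnets)


# Hypothetical addressing: 192.0.2.0/30 is the R1-R2 link,
# 198.51.100.1/32 is R1's own loopback
r1_connected = ["192.0.2.0/30", "198.51.100.1/32"]

print(passes_connected_check("192.0.2.2", r1_connected))     # True: R2's link address
print(passes_connected_check("198.51.100.2", r1_connected))  # False: R2's loopback
```

A peer's loopback never lands in one of our connected subnets, which is why the session stays idle until the check is disabled, regardless of the TTL.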
Finally, I want to bring up another disadvantage of using the ebgp-multihop
command when peering between directly connected routers using loopbacks.
We have a peering between R2 and R3. What happens when we shut down the
interface on either router?
R2(config-router)#int f1/0
R2(config-if)#sh
R2(config-if)#
%OSPF-5-ADJCHG: Process 1, Nbr 188.8.131.52 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
R2(config-if)#
%LINK-5-CHANGED: Interface FastEthernet1/0, changed state to administratively down
%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0, changed state to down
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 11, local router ID is 184.108.40.206
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network           Next Hop            Metric LocPrf Weight Path
*  220.127.116.11/32 18.104.22.168            0             0 3 i
When we shut down the interface, the peering still stays up. This is because fast-external-fallover
only applies to directly connected eBGP neighbors, so it has no effect when ebgp-multihop is used. This can
lead to blackholes, since the peering stays up until the hold time expires (180 s by default). In our
case there is no valid next hop, but what if we add a default route?
R2(config)#ip route 0.0.0.0 0.0.0.0 22.214.171.124
R2(config)#int f1/0
R2(config-if)#sh
R2(config-if)#do
%OSPF-5-ADJCHG: Process 1, Nbr 126.96.36.199 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
R2(config-if)#do
%LINK-5-CHANGED: Interface FastEthernet1/0, changed state to administratively down
%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0, changed state to down
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 12, local router ID is 188.8.131.52
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network           Next Hop            Metric LocPrf Weight Path
*> 184.108.40.206/32 220.127.116.11           0             0 3 i
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 12, local router ID is 18.104.22.168
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network           Next Hop            Metric LocPrf Weight Path
*> 22.214.171.124/32 126.96.36.199            0             0 3 i
Now the route stays in the BGP table until the hold time expires, which creates a
black hole. The default route now makes sure the next hop can still be resolved, so the route remains best.
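Two rough calculations behind this section, as a sketch. The first is a longest-prefix-match lookup showing why the default route keeps the BGP next hop resolvable after Fa1/0 goes down. The second estimates the blackhole window, assuming the default 60 s keepalive and 180 s hold time and keepalives arriving on schedule. Addresses, route descriptions and function names are hypothetical.

```python
import ipaddress
from typing import Optional

KEEPALIVE = 60   # default keepalive interval (seconds)
HOLD_TIME = 180  # default hold time (seconds)


def lookup(dest: str, rib: dict[str, str]) -> Optional[str]:
    """Longest-prefix match: return the route with the longest mask covering dest."""
    addr = ipaddress.ip_address(dest)
    best_net, best_route = None, None
    for prefix, route in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_route = net, route
    return best_route


def detection_delay(since_last_keepalive: float) -> float:
    """Seconds from link failure until the hold timer tears the session down,
    given how long before the failure the last keepalive arrived."""
    return float(max(HOLD_TIME - since_last_keepalive, 0))


# Hypothetical: 198.51.100.3 stands in for R3's loopback, the BGP next hop
rib = {"198.51.100.3/32": "OSPF via Fa1/0", "0.0.0.0/0": "static default"}
print(lookup("198.51.100.3", rib))  # OSPF via Fa1/0
del rib["198.51.100.3/32"]          # Fa1/0 shut: the OSPF route is withdrawn
print(lookup("198.51.100.3", rib))  # static default: the next hop still resolves
del rib["0.0.0.0/0"]
print(lookup("198.51.100.3", rib))  # None: the next hop is unresolvable

# The last keepalive arrived 0 to 60 s before the failure
print(detection_delay(0), detection_delay(KEEPALIVE))  # 180.0 120.0
```

So with default timers the stale route can linger for two to three minutes, and the default route keeps it looking usable the whole time.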
I hope this post has given you a better understanding of these BGP features
and of how a router handles control plane packets. As usual, post in the comments
if you have any feedback or questions.
These are some notes for the post that I did on conditional BGP advertisement.
I found an easier way of seeing the status of the advertise-map: it is
shown in the neighbor status. Look at the following output.
The prefix is currently being advertised. Then we bring up the interface.
The prefix is no longer advertised. Remember that the BGP scanner runs every
60 seconds by default, so it may take some time before we see results.
This post describes how to do conditional advertisement with BGP. In a real-life scenario, this can be used to announce routes to your backup provider only when your primary link is down. In a lab, it can be used when a task says that traffic must come in on interface X/X, but if that interface fails it should come in on interface Y/Y. The image below describes the scenario.
We start by configuring addresses on the interfaces and enabling basic BGP. The loopbacks on the Cust router are used for announcing networks.
If we look at ISP2, we have two active BGP sessions with four prefixes received over each.
Let's do a ping and a traceroute to verify reachability first.
We have reachability. The next step is to announce the Ethernet link on the
Cust router into BGP. We need this prefix in BGP to be able to track it.
The ISP will see this prefix as a RIB-failure, since it has a route with a better AD (connected).
We then configure the Cust router to only advertise 188.8.131.52/24 if the Ethernet link is down.
The advertise-map permits prefixes to be announced when the prefixes in the NON_EXIST map are not in the BGP table.
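The advertise-map/non-exist-map logic can be modeled as a set filter. This sketch assumes a single tracked prefix in the non-exist-map, as in this post; how IOS evaluates a non-exist-map matching several prefixes is not modeled. The prefixes are hypothetical stand-ins for the Ethernet link and the conditionally advertised loopback.

```python
def announced(bgp_table: set[str], advertise_map: set[str], tracked: str) -> set[str]:
    """Prefixes sent to the neighbor with advertise-map ... non-exist-map.

    Prefixes matched by the advertise-map are only sent while the tracked
    prefix (matched by the non-exist-map) is absent from the BGP table.
    Every other prefix is sent as usual.
    """
    condition_met = tracked not in bgp_table
    return {p for p in bgp_table if p not in advertise_map or condition_met}


# Hypothetical prefixes: the Ethernet link and the conditional /24
link, conditional = "192.0.2.0/24", "203.0.113.0/24"
others = {"198.51.100.0/24"}

# Ethernet link up: the conditional prefix is held back
print(sorted(announced({link, conditional} | others, {conditional}, link)))
# Ethernet link down: its prefix leaves the table and the conditional one is sent
print(sorted(announced({conditional} | others, {conditional}, link)))
```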
Other prefixes are not affected by this configuration. Let's look at what Cust is announcing to ISP2.
We can see that 184.108.40.206/24 is no longer being announced. Ping from ISP2 confirms reachability and a traceroute shows that traffic is passing through ISP1.
We then shut down the Ethernet link on Cust and look at the results.
Here is the BGP table on ISP2. The ping works and traffic now takes the direct path.
If we debug BGP updates, we will see entries like this.
Sometimes prefixes in BGP do not get installed into the routing table. If the route is also in an IGP, that might be the reason, but then a RIB-failure would be indicated. This scenario shows another possible source of problems. Once again, the topology is this:
All internal routers are running iBGP in a full mesh. Routers R4 and R6 have eBGP peerings to the backbone routers, which are injecting external prefixes into the AS. All internal routers are announcing their loopbacks into BGP. SW3 is trying to reach 220.127.116.11 in the prefix 18.104.22.168/8 but is unable to do so. Let's look at some output.
SW3 can’t reach 22.214.171.124, why?
We have no route there, what routes can we see from BGP?
We can see all the loopbacks just fine but we have no route to the external prefixes. What is R6 announcing to us?
R6 is announcing the external prefixes to us but what do we have in our BGP table? Output has been abbreviated.
So we do have 126.96.36.199/8 via 188.8.131.52 and 184.108.40.206, but how do we get to the next hops? Remember that route recursion will occur, and that the first rule of the BGP best path selection is that we must have a valid next hop. We can see that the route is valid but not best.
We have an invalid next hop, so that is why the route is not being installed. Let's fix this.
That should take care of the next hops. Let's check the routing table.
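Route recursion is what trips SW3 up. A minimal sketch of recursive next-hop resolution, assuming a toy RIB where connected routes name an interface and IGP routes name a next hop that must itself be resolved; the addresses and interface name are hypothetical.

```python
import ipaddress
from typing import Optional

# Toy RIB entry: (next hop, interface). Connected routes have no next hop;
# IGP routes have a next hop that must be resolved again.
Route = tuple[Optional[str], Optional[str]]


def longest_match(addr: str, rib: dict[str, Route]) -> Optional[str]:
    ip = ipaddress.ip_address(addr)
    hits = [p for p in rib if ip in ipaddress.ip_network(p)]
    return max(hits, key=lambda p: ipaddress.ip_network(p).prefixlen, default=None)


def resolve(addr: str, rib: dict[str, Route], max_depth: int = 8) -> Optional[str]:
    """Recursively resolve addr to an outgoing interface, or None if unreachable."""
    for _ in range(max_depth):
        prefix = longest_match(addr, rib)
        if prefix is None:
            return None              # no route: the BGP next hop is invalid
        next_hop, interface = rib[prefix]
        if next_hop is None:
            return interface         # a connected route ends the recursion
        addr = next_hop              # recurse on the IGP next hop
    return None


rib: dict[str, Route] = {
    "10.0.0.0/30": (None, "Gi0/1"),         # connected link toward the iBGP peer
    "198.51.100.6/32": ("10.0.0.2", None),  # peer loopback learned via the IGP
}
print(resolve("198.51.100.6", rib))  # Gi0/1
print(resolve("192.0.2.1", rib))     # None: eBGP next hop unknown to the IGP
rib["192.0.2.0/30"] = ("10.0.0.2", None)  # e.g. the eBGP link redistributed into the IGP
print(resolve("192.0.2.1", rib))     # Gi0/1
```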
We now have a route for the next-hop. Lets look at the BGP table again.
So the prefix now has a best path. Is it in the routing table?
Route is installed, we should be good to go.
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 220.127.116.11, timeout is 2 seconds:
Packet sent with a source address of 18.104.22.168
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/37/84 ms
Success. Always remember to have a valid next hop in BGP. By default, next hops are modified over eBGP peerings but not over iBGP. To resolve this kind of problem, either redistribute the interface connected to the external peer into the IGP or use next-hop-self on the iBGP peerings. A route-map can also be used to achieve the same thing. I hope this post has shown you how to do BGP troubleshooting step by step.
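The default next-hop handling described above, as a simplified Python function. It ignores special cases such as routes the router originates itself and third-party next hops on shared segments; the function name and addresses are hypothetical.

```python
def next_hop_sent(current_next_hop: str, own_address: str,
                  peer_is_ebgp: bool, next_hop_self: bool = False) -> str:
    """Next hop a router puts in an update it sends to a peer (simplified).

    eBGP peer: the router sets its own peering address as next hop.
    iBGP peer: a next hop learned from eBGP is passed on unchanged,
    unless next-hop-self is configured for that neighbor.
    """
    if peer_is_ebgp or next_hop_self:
        return own_address
    return current_next_hop


# Hypothetical: R6 learned a prefix from the backbone with next hop 192.0.2.1
# and advertises it to SW3 over iBGP from its loopback 198.51.100.6
print(next_hop_sent("192.0.2.1", "198.51.100.6", peer_is_ebgp=False))  # 192.0.2.1
print(next_hop_sent("192.0.2.1", "198.51.100.6", peer_is_ebgp=False,
                    next_hop_self=True))                              # 198.51.100.6
```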
I have added content for BGP to the flash cards. There is now a total of 112 questions. More content in other areas will be added as I go through them in my studies. As always, the file is located here.
Yesterday I did some Internetwork Expert vol1 labs on BGP. I was having trouble getting some of the peers to come up and had to troubleshoot. This post describes how to troubleshoot when peers won’t form. First, let's look at the topology. Thanks to DennisD on the IEOC forums for the image.
R2 and R5 should peer with each other in AS 100. R2 is set up to peer with R5's IP 22.214.171.124 and R5 is set up to peer with R2's IP 126.96.36.199. It would have been better to peer over the 188.8.131.52/24 subnet directly, but this is to show the steps of troubleshooting. So why will the session not form? Let's look at some output from debug ip tcp transactions.
We can see that R5 is initiating the connection: it is sending a TCP SYN to R2 on port 179, but R2 responds with a TCP RST, which resets the connection. This could indicate either that R2 is not running BGP or that there is a problem with the neighbor statements.
So we want to know what IP R5 is using when sending TCP packets to R2. Let's debug IP packets.
R5 is using its IP of 184.108.40.206 to communicate with 220.127.116.11 but R2 expects R5 to setup the BGP session from the IP of 18.104.22.168. Let’s verify why R5 is using 22.214.171.124 to get to 126.96.36.199. This is a look at the routing table.
We can see that R5 has two equal-cost paths to reach the IP of R2. The next hop is either 188.8.131.52 or 184.108.40.206, and both are reachable via the connected subnet of Serial0/0. That is why R5 is using the IP of 220.127.116.11 to source packets. How can we solve this? Either we can change the neighbor statement on R2 to point at 18.104.22.168, or we can change the update-source on R5.
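The failure can be summarized in two small functions: how R5 picks its source address, and how R2 answers the SYN. This is a simplified sketch (real source selection and neighbor matching have more detail), and the addresses are hypothetical rather than the lab's.

```python
from typing import Optional


def bgp_source_ip(update_source_ip: Optional[str], egress_interface_ip: str) -> str:
    """Source address of the BGP TCP session: the update-source interface
    address if configured, otherwise the address of the egress interface."""
    return update_source_ip or egress_interface_ip


def reply_to_syn(src_ip: str, configured_neighbors: set[str]) -> str:
    """Answer to an incoming SYN on TCP port 179: only a configured neighbor
    address gets the handshake, anything else is reset."""
    return "SYN-ACK" if src_ip in configured_neighbors else "RST"


# Hypothetical: R2's neighbor statement points at R5's 192.0.2.5,
# but R5's route to R2 leaves via Serial0/0 (10.0.0.5)
r2_neighbors = {"192.0.2.5"}
print(reply_to_syn(bgp_source_ip(None, "10.0.0.5"), r2_neighbors))         # RST
print(reply_to_syn(bgp_source_ip("192.0.2.5", "10.0.0.5"), r2_neighbors))  # SYN-ACK
```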
A debug IP packet confirms that the right interface is now being used.
Show ip bgp confirms that they are now peers.
Show tcp brief is a good command to see TCP sessions to/from the router.
And this is how to do basic BGP troubleshooting.
- Uses TCP as transport, port 179
- Path vector protocol
Checks before becoming a neighbor
- The TCP connection request must come from an IP associated with a neighbor command
- The AS number must match that in the neighbor statement
- The routers cannot have duplicate router IDs
- If authentication is configured it must also match
Uses keepalive and hold timers, defaulting to 60 and 180 seconds.
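How the timers end up on a session, per RFC 4271: the smaller of the two proposed hold times is used, it must be 0 or at least 3 seconds, and one third of the hold time is the suggested keepalive interval. A sketch; IOS may pick the keepalive differently when the configured timers differ.

```python
def negotiate_timers(local_hold: int, peer_hold: int) -> tuple[int, int]:
    """Return (hold time, keepalive interval) for the session in seconds.

    Hold time 0 means no keepalives and no hold timer at all.
    """
    hold = min(local_hold, peer_hold)
    if 0 < hold < 3:
        raise ValueError("unacceptable hold time")
    return hold, hold // 3


print(negotiate_timers(180, 180))  # (180, 60): the defaults
print(negotiate_timers(180, 90))   # (90, 30): the lower proposal wins
```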
BGP neighbor states
Idle - BGP not initiated yet
Connect - Listening for TCP
Active - Initiate TCP
Open sent - Open sent, TCP is up
Open confirm - Open received, TCP is up
Established - Peering has been established
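The happy path through these states, as a small transition table. The event names are simplified, hypothetical labels rather than the RFC 4271 event list, error handling is reduced to "a NOTIFICATION returns the session to Idle", and events not in the table leave the state unchanged in this sketch.

```python
# (state, event) -> next state, happy path only
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",     # OPEN sent once TCP is up
    ("Active", "tcp_established"): "OpenSent",
    ("OpenSent", "open_received"): "OpenConfirm",   # KEEPALIVE sent in reply
    ("OpenConfirm", "keepalive_received"): "Established",
}


def step(state: str, event: str) -> str:
    if event == "notification":
        return "Idle"  # a NOTIFICATION closes the session
    return TRANSITIONS.get((state, event), state)


state = "Idle"
for event in ["start", "tcp_established", "open_received", "keepalive_received"]:
    state = step(state, event)
    print(f"{event:<20} -> {state}")
```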
BGP message types
Open - Used to establish neighbor session and exchange parameters
Keepalive - Used to maintain the neighbor relationship
Update - Used to exchange routing information
Notification - Used when BGP errors occur, resets neighbor session
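All four message types share the same 19-byte header defined in RFC 4271: a 16-byte marker of all ones, a 2-byte total length and a 1-byte type code. A KEEPALIVE is just that header, which this Python sketch builds with struct.

```python
import struct

MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def bgp_message(msg_type: int, body: bytes = b"") -> bytes:
    """Marker (16 x 0xFF), length (header + body, network order), type, body."""
    length = 19 + len(body)
    return struct.pack("!16sHB", b"\xff" * 16, length, msg_type) + body


keepalive = bgp_message(4)  # a KEEPALIVE has no body
print(len(keepalive), MESSAGE_TYPES[keepalive[18]])  # 19 KEEPALIVE
```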
Confederations
- Uses sub-ASNs: the real AS is divided into smaller sections, where each section has a private ASN
- The private ASN range is 64512 to 65534 (65535 is reserved)
- Every sub-AS has to be fully meshed internally and uses iBGP logic
- Connections between different sub-ASes act as eBGP connections
- Confederation ASNs are not counted in the AS-path length
- Painful to migrate to, since it requires changing the AS number in the router bgp command
- Real AS identified with bgp confederation identifier
- Peers defined with bgp confederation peers
- Confederation AS numbers in AS-path will be removed before advertising to true eBGP peer
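How confederation segments are handled, as a sketch: confederation segments are not counted in the path length, and they are stripped (with the confederation identifier prepended) before advertising to a true eBGP peer. The segment representation and ASNs are hypothetical; an AS_SET counting as one follows RFC 4271.

```python
Segment = tuple[str, list[int]]


def as_path_length(segments: list[Segment]) -> int:
    """Each AS in an AS_SEQUENCE counts 1, an AS_SET counts 1 in total,
    AS_CONFED_SEQUENCE and AS_CONFED_SET segments count 0."""
    length = 0
    for seg_type, asns in segments:
        if seg_type == "AS_SEQUENCE":
            length += len(asns)
        elif seg_type == "AS_SET":
            length += 1
    return length


def to_true_ebgp_peer(segments: list[Segment], confed_id: int) -> list[Segment]:
    """Strip confederation segments and prepend the confederation identifier."""
    kept = [(t, list(a)) for t, a in segments if t in ("AS_SEQUENCE", "AS_SET")]
    if kept and kept[0][0] == "AS_SEQUENCE":
        kept[0] = ("AS_SEQUENCE", [confed_id] + kept[0][1])
    else:
        kept.insert(0, ("AS_SEQUENCE", [confed_id]))
    return kept


path = [("AS_CONFED_SEQUENCE", [65002, 65001]), ("AS_SEQUENCE", [200, 300])]
print(as_path_length(path))          # 2
print(to_true_ebgp_peer(path, 100))  # [('AS_SEQUENCE', [100, 200, 300])]
```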
Route reflectors
- Removes the need for a full mesh; all iBGP routers peer with the route reflector
- The RR is responsible for reflecting routes to clients; the RR is usually not in the forwarding path
- No change is needed on clients to implement RR
- The RR and its clients create a cluster, it is possible to have multiple RRs in a cluster
- Route reflectors in different clusters should be fully meshed
To prevent loops in this topology, BGP uses two additional attributes:
Cluster_list - A route reflector adds its cluster ID to this attribute before reflecting an update. An RR discards updates that already carry its own cluster ID.
Originator_ID - The ID of the router that originated the prefix. If a router sees its own ID in this attribute it will not use or propagate this prefix.
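The two loop-prevention checks, sketched in Python with updates as plain dicts. The dict keys, router IDs and cluster IDs are hypothetical; per RFC 4456, the reflector sets ORIGINATOR_ID only if it is not already present and prepends its cluster ID to CLUSTER_LIST.

```python
from typing import Optional


def reflect(update: dict, cluster_id: str, learned_from: str) -> dict:
    """Attributes a route reflector adds before reflecting an iBGP update."""
    out = dict(update)
    # Set once, to the router ID of the iBGP speaker that injected the route
    out.setdefault("originator_id", learned_from)
    # Each reflector prepends its own cluster ID
    out["cluster_list"] = [cluster_id] + update.get("cluster_list", [])
    return out


def accept(update: dict, router_id: str, cluster_id: Optional[str] = None) -> bool:
    """Receive-side checks; cluster_id is only given for route reflectors."""
    if update.get("originator_id") == router_id:
        return False  # our own route came back
    if cluster_id is not None and cluster_id in update.get("cluster_list", []):
        return False  # already passed through our cluster
    return True


u = reflect({"prefix": "203.0.113.0/24"}, cluster_id="10.255.0.1", learned_from="10.0.0.4")
print(accept(u, router_id="10.0.0.4"))                           # False
print(accept(u, router_id="10.0.0.9", cluster_id="10.255.0.1"))  # False
print(accept(u, router_id="10.0.0.9", cluster_id="10.255.0.2"))  # True
```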
BGP path attributes
AS_PATH - Lists the ASNs through which the route has been advertised - Well known Mandatory
NEXT_HOP - Lists the next-hop IP address used to reach the NLRI - Well known Mandatory
AGGREGATOR - Lists the RID and ASN of the router that created a summary NLRI - Optional Transitive
ATOMIC_AGGREGATE - Tags a summary NLRI as being a summary - Well known Discretionary
ORIGIN - The origin of the route, igp, egp or incomplete - Well known Mandatory
ORIGINATOR_ID - The RID of the iBGP neighbor that injected an NLRI into the AS - Optional Nontransitive
CLUSTER_LIST - Used by RRs to list the RR cluster IDs in order to prevent loops - Optional Nontransitive
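The categories in this list map onto the Optional and Transitive bits of the attribute flags octet (RFC 4271); type codes 9 and 10 come from RFC 4456. A small sketch that prints type code and flags for the attributes listed above; the Partial and Extended Length bits are left out.

```python
ATTRIBUTES = {
    "ORIGIN": (1, "well-known mandatory"),
    "AS_PATH": (2, "well-known mandatory"),
    "NEXT_HOP": (3, "well-known mandatory"),
    "ATOMIC_AGGREGATE": (6, "well-known discretionary"),
    "AGGREGATOR": (7, "optional transitive"),
    "ORIGINATOR_ID": (9, "optional non-transitive"),
    "CLUSTER_LIST": (10, "optional non-transitive"),
}

OPTIONAL, TRANSITIVE = 0x80, 0x40  # two high-order bits of the flags octet


def flags(category: str) -> int:
    """Optional/Transitive flag bits for an attribute category."""
    if category.startswith("well-known"):
        return TRANSITIVE          # well-known attributes are always transitive
    if category == "optional transitive":
        return OPTIONAL | TRANSITIVE
    return OPTIONAL                # optional non-transitive


for name, (code, category) in ATTRIBUTES.items():
    print(f"{name:<17} type {code:>2}  flags 0x{flags(category):02X}")
```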
Injecting routes into BGP
Done via network command or redistribute from an IGP or static routes.
Injecting a default route into BGP
Use the network 0.0.0.0 command - Requires that 0.0.0.0 exists in routing table
neighbor default-originate - Always advertise default route even if not present in local routing table
default-information originate - Requires route in routing table and a redistribute command
BGP best path algorithm
0. Discard routes with invalid next-hop
1. Routes with highest weight (Cisco proprietary)
2. Routes with highest local preference
3. Routes locally injected
4. Routes with shortest AS-path
5. Routes with best origin (IGP over EGP over incomplete)
6. Routes with lowest Multiple Exit Discriminator (MED)
7. Prefer eBGP over iBGP (confederation eBGP treated as iBGP)
8. Routes with lowest IGP metric to next-hop
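Steps 0 to 8 above expressed as a Python sort key. This is a simplified sketch: MED is compared across all paths here, whereas IOS by default only compares MED between paths from the same neighboring AS, and the tie-breakers after step 8 are not modeled. The Path fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    name: str
    next_hop_valid: bool = True
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path_len: int = 0
    origin: str = "igp"
    med: int = 0
    ebgp: bool = False
    igp_metric: int = 0


def best_path(paths: list[Path]) -> Optional[Path]:
    candidates = [p for p in paths if p.next_hop_valid]  # step 0
    if not candidates:
        return None
    return min(candidates, key=lambda p: (
        -p.weight,                  # 1. highest weight
        -p.local_pref,              # 2. highest local preference
        not p.locally_originated,   # 3. locally injected first
        p.as_path_len,              # 4. shortest AS-path
        ORIGIN_RANK[p.origin],      # 5. best origin
        p.med,                      # 6. lowest MED
        not p.ebgp,                 # 7. eBGP over iBGP
        p.igp_metric,               # 8. lowest IGP metric to next-hop
    ))


paths = [
    Path("via R4", as_path_len=2, igp_metric=20),
    Path("via R6", as_path_len=2, igp_metric=10),
    Path("unresolved", next_hop_valid=False, weight=1000),
]
print(best_path(paths).name)  # via R6
```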