We start out with a basic topology of three routers.
R2 and R3 will peer with each other's loopbacks. I have set up OSPF to provide full
reachability in the network. First we test connectivity.
R2#ping 22.214.171.124 so lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 126.96.36.199, timeout is 2 seconds:
Packet sent with a source address of 188.8.131.52
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/53/80 ms
We have connectivity. Next we set up the peering and set ebgp-multihop to 2, since
this is what most people do. I will explain later why this is not a good idea.
R2(config)#router bgp 1
R2(config-router)#nei 184.108.40.206 remote-as 3
R2(config-router)#nei 220.127.116.11 update-source loopback 0
R2(config-router)#nei 18.104.22.168 ebgp-multihop 2

R3(config)#router bgp 3
R3(config-router)#nei 22.214.171.124 remote-as 1
R3(config-router)#nei 126.96.36.199 update-source loopback 0
R3(config-router)#nei 188.8.131.52 ebgp-multihop 2
The session comes up.
%BGP-5-ADJCHANGE: neighbor 184.108.40.206 Up
All good so far. We are not advertising anything yet, so we add another loopback
on R3, advertise it into BGP and check whether R2 receives it.
R2#sh bgp ipv4 uni
BGP table version is 3, local router ID is 220.127.116.11
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network            Next Hop            Metric LocPrf Weight Path
*> 18.104.22.168/32   22.214.171.124           0             0 3 i
It looks good so far. Now let's think for a while about what ebgp-multihop
actually does. By default, eBGP checks that the neighbor address is on a
directly connected network. So the default is a connected-check combined with
the equivalent of ebgp-multihop 1, meaning an outgoing TTL of 1. When we set
ebgp-multihop 2, the outgoing TTL is set to 2 and the connected-check is
disabled. We confirm this with a packet capture.
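The defaults described above can be summarised as a small lookup. This is a simplified model written only to make the three combinations explicit; the `EbgpNeighbor` class and `session_params` function are hypothetical names, not IOS internals, and the rules encode only what this post states (TTL 1 plus connected-check by default, ebgp-multihop raising the TTL and skipping the check, disable-connected-check skipping the check while keeping TTL 1).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EbgpNeighbor:
    """Hypothetical, simplified view of the eBGP neighbor knobs in this post."""
    ebgp_multihop: Optional[int] = None  # None: ebgp-multihop not configured
    disable_connected_check: bool = False


def session_params(nbr: EbgpNeighbor) -> Tuple[int, bool]:
    """Return (outgoing TTL, connected-check enabled) for an eBGP neighbor."""
    if nbr.ebgp_multihop is not None:
        # ebgp-multihop N: TTL N, and the connected-check is skipped
        return nbr.ebgp_multihop, False
    # Single-hop eBGP: TTL 1, check on unless explicitly disabled
    return 1, not nbr.disable_connected_check


for label, nbr in [
    ("default", EbgpNeighbor()),
    ("ebgp-multihop 2", EbgpNeighbor(ebgp_multihop=2)),
    ("disable-connected-check", EbgpNeighbor(disable_connected_check=True)),
]:
    ttl, check = session_params(nbr)
    print(f"{label:24} TTL={ttl} connected-check={'on' if check else 'off'}")
```

The point the table makes is that ebgp-multihop changes two things at once, while disable-connected-check changes only one of them.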
So the TTL is set to 2, but is this really necessary? The common argument is that
because we are peering to a loopback, the TTL must be set to 2, since the TTL is
decremented before the packet reaches the loopback. But when do routers modify
packets before transmitting them? On the egress interface, right? We test this
theory by setting up a peering between R1 and R3. We begin without ebgp-multihop
and then run debug ip icmp. We have to disable the connected-check, otherwise BGP
will just stay idle, because a loopback can never be directly connected.
R1(config-router)#nei 126.96.36.199 remote-as 3
R1(config-router)#nei 188.8.131.52 update-source lo0
R1(config-router)#nei 184.108.40.206 disable-connected-check

R3(config-router)#nei 220.127.116.11 remote-as 1
R3(config-router)#nei 18.104.22.168 update lo0
R3(config-router)#nei 22.214.171.124 disable-connected-check
We can now see that R2 is sending ICMP time exceeded messages to R1 and R3.
R1: ICMP: time exceeded rcvd from 126.96.36.199
R3: ICMP: time exceeded rcvd from 188.8.131.52
This is because the TTL was set to 1, so it expired in transit at R2.
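The theory being tested here (only a router that forwards a packet touches its TTL, while the destination router delivers it locally, loopback included) can be sketched as a toy walk along a path. This is a simplified model of the general IP forwarding rule, not of IOS internals; the `send` function is a hypothetical helper and only the router names come from the lab.

```python
from typing import List, Tuple


def send(path: List[str], ttl: int) -> Tuple[bool, str]:
    """Walk a packet from path[0] to path[-1] with the given initial TTL.

    Simplified model: only transit routers decrement the TTL. A transit
    router that would have to forward a packet with TTL <= 1 drops it and
    returns ICMP time exceeded. The destination router accepts the packet
    for local delivery (e.g. to its loopback) without a decrement.
    """
    for transit in path[1:-1]:
        if ttl <= 1:
            return False, f"time exceeded from {transit}"
        ttl -= 1
    return True, f"delivered to {path[-1]} with TTL {ttl}"


print(send(["R1", "R2"], ttl=1))        # R1 to R2's loopback, directly connected
print(send(["R1", "R2", "R3"], ttl=1))  # R1 to R3's loopback through R2
print(send(["R1", "R2", "R3"], ttl=2))  # same path, as with ebgp-multihop 2
```

Under this model, TTL 1 is only a problem when a third router sits in the path, which is exactly the R1 to R3 case above.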
Now we set up a peering between R1 and R2 using the loopbacks. We will disable the connected-check again.
R1(config-router)#nei 184.108.40.206 remote-as 1
R1(config-router)#nei 220.127.116.11 update lo0
R1(config-router)#nei 18.104.22.168 disable-connected-check

R2(config-router)#nei 22.214.171.124 remote-as 1
R2(config-router)#nei 126.96.36.199 update lo0
R2(config-router)#nei 188.8.131.52 disable-connected-check
According to the people who say that the TTL must be 2 for the peering to come
up, this session should fail. We will prove that this is wrong. The reason the
peering does not come up when using loopbacks is that BGP checks whether the
neighbor is directly connected or not. Let's take a look at a BGP packet sent
when using disable-connected-check.
We clearly see that the TTL is 1, but the session still comes up. This proves
that it is not the TTL expiring that breaks peering between loopbacks!
R1#sh bgp all sum
For address family: IPv4 Unicast
BGP router identifier 184.108.40.206, local AS number 1
BGP table version is 9, main routing table version 9
2 network entries using 240 bytes of memory
2 path entries using 104 bytes of memory
3/2 BGP path/bestpath attribute entries using 372 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
Bitfield cache entries: current 1 (at peak 2) using 32 bytes of memory
BGP using 772 total bytes of memory
BGP activity 5/3 prefixes, 5/3 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
220.127.116.11  4     2      83      80        9    0    0 00:02:45        1
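The check that actually keeps a loopback session Idle can be sketched with Python's standard `ipaddress` module: is the neighbor address inside one of our connected subnets? This is a simplified model, not IOS code. The functions `connected_check` and `can_start_session` are hypothetical, and the addressing (10.0.12.0/24 for the R1 to R2 link, 10.255.0.x/32 loopbacks) is invented because the real lab addresses are not recoverable from the post.

```python
import ipaddress
from typing import Sequence


def connected_check(neighbor: str, connected: Sequence[str]) -> bool:
    """True if the neighbor address falls inside one of our connected subnets."""
    addr = ipaddress.ip_address(neighbor)
    return any(addr in ipaddress.ip_network(net) for net in connected)


def can_start_session(neighbor: str, connected: Sequence[str],
                      check_enabled: bool) -> bool:
    """Simplified model: with the check on, a non-connected peer stays Idle."""
    return connected_check(neighbor, connected) if check_enabled else True


# Hypothetical addressing for R1: the R1-R2 link and R1's own loopback.
r1_connected = ["10.0.12.0/24", "10.255.0.1/32"]

print(can_start_session("10.0.12.2", r1_connected, check_enabled=True))    # R2 link address
print(can_start_session("10.255.0.2", r1_connected, check_enabled=True))   # R2 loopback
print(can_start_session("10.255.0.2", r1_connected, check_enabled=False))  # disable-connected-check
```

R2's loopback is never inside one of R1's connected subnets, so with the check enabled the session cannot start, regardless of the TTL.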
Finally, I want to bring up another disadvantage of using the ebgp-multihop
command when peering between the loopbacks of directly connected routers.
We have a peering between R2 and R3. What happens when we shut down the
interface on either router?
R2(config-router)#int f1/0
R2(config-if)#sh
R2(config-if)#
%OSPF-5-ADJCHG: Process 1, Nbr 18.104.22.168 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
R2(config-if)#
%LINK-5-CHANGED: Interface FastEthernet1/0, changed state to administratively down
%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0, changed state to down
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 11, local router ID is 22.214.171.124
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network            Next Hop            Metric LocPrf Weight Path
*  126.96.36.199/32   188.8.131.52             0             0 3 i
When we shut down the interface, the peering still stays up. This is because
fast-external-fallover cannot be used together with ebgp-multihop. This could
lead to black holes, since the peering stays up until the hold time expires
(180 seconds by default). In our case there is no valid next hop, so the route
is not marked as best. But what if we put in a default route?
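The question is really about recursive next-hop resolution: the BGP route is only usable if its next hop matches some route in the routing table, and a default route matches everything. A minimal longest-prefix-match sketch with the standard `ipaddress` module follows. The helper `resolve_next_hop` is hypothetical, the prefixes are invented (10.255.0.3 stands in for R3's loopback, 10.0.12.0/24 for the R1 to R2 link that is still up), and real IOS next-hop tracking has more rules than this.

```python
import ipaddress
from typing import Optional, Sequence


def resolve_next_hop(next_hop: str, rib: Sequence[str]) -> Optional[str]:
    """Return the longest RIB prefix covering next_hop, or None if unresolvable."""
    addr = ipaddress.ip_address(next_hop)
    matches = [ipaddress.ip_network(p) for p in rib
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


# Hypothetical R2 routing table after f1/0 is shut: the OSPF route to R3's
# loopback is gone, only R2's own loopback and the R1-R2 link remain.
rib_without_default = ["10.255.0.2/32", "10.0.12.0/24"]
rib_with_default = rib_without_default + ["0.0.0.0/0"]

print(resolve_next_hop("10.255.0.3", rib_without_default))  # unresolvable
print(resolve_next_hop("10.255.0.3", rib_with_default))     # resolves via default
```

Without the default route the next hop cannot be resolved; with it, the lookup succeeds even though the real path to R3 is gone.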
R2(config)#ip route 0.0.0.0 0.0.0.0 184.108.40.206
R2(config)#int f1/0
R2(config-if)#sh
R2(config-if)#do
%OSPF-5-ADJCHG: Process 1, Nbr 220.127.116.11 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
R2(config-if)#do
%LINK-5-CHANGED: Interface FastEthernet1/0, changed state to administratively down
%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0, changed state to down
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 12, local router ID is 18.104.22.168
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network            Next Hop            Metric LocPrf Weight Path
*> 22.214.171.124/32  126.96.36.199            0             0 3 i
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 12, local router ID is 188.8.131.52
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network            Next Hop            Metric LocPrf Weight Path
*> 184.108.40.206/32  220.127.116.11           0             0 3 i
Now the route stays in the BGP table, marked as best, until the hold time
expires, which creates a black hole. The default route makes sure that the next
hop can still be resolved.
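How long the black hole lasts depends on when the last keepalive arrived, because that keepalive restarted the hold timer. A small worked sketch, assuming the IOS defaults (keepalive 60 seconds, hold time 180 seconds), no keepalive jitter and no lost keepalives; `blackhole_window` is a hypothetical helper name.

```python
def blackhole_window(elapsed_since_keepalive: float,
                     hold_time: float = 180.0,
                     keepalive: float = 60.0) -> float:
    """Seconds a multihop session stays up after a silent link failure.

    elapsed_since_keepalive: time between the last keepalive received from
    the peer and the moment the link failed. That keepalive restarted the
    hold timer, so the timer expires hold_time seconds after the keepalive,
    not after the failure.
    """
    if not 0 <= elapsed_since_keepalive < keepalive:
        raise ValueError("expected the failure within one keepalive interval")
    return hold_time - elapsed_since_keepalive


print(blackhole_window(0))   # failure right after a keepalive (worst case)
print(blackhole_window(59))  # failure just before the next keepalive
```

With the defaults, the session therefore survives somewhere between roughly two and three minutes after the link goes down, and traffic following the stale best path is dropped for that whole window.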
With this post I hope you have gained a better understanding of these BGP features
and of how a router handles control plane packets. As usual, post in the comments
if you have any feedback or questions.
15 thoughts on “Some important details of BGP”
Great article, you dispelled my confusion on the same problem, Daniel.
So to put it another way, use eBGP multi-hop or ttl security for what it is intended for i.e. when you have multiple hops between you and your remote peer.
Yeah, exactly. I might bring up TTL-security in another post but that one is pretty self explanatory I think.
On R2 and R3, advertise Loopback 0 of R2 and R3 into eBGP. You will see the neighbor relationship between R2 and R3 going “up” and “down”. 🙂
Yeah. That would be a recursive loop 🙂 Kind of like using a tunnel and learning the tunnel prefix over it.
Yes, you can use bgp backdoor. 😀
You’re welcome 🙂
That’s quite a serious statement you’re making here, mate. Your explanation certainly makes sense, but the CCIE SG (4th edition) makes the very mistake you mentioned by saying that you HAVE to set ‘ebgp-multihop 2’ in order to peer between loopbacks of directly connected routers (p. 376). Could it be just a blunder by Mr. Odom et al., or is it some sort of ‘official misconception’ of Cisco? I mean, what if you came across this in the lab? Would you still do it your way (without the multihop command)?
Yes, I know it is a big statement but it works and I got the idea for this post while in the bootcamp with Brian Dennis who explained the behavior. I have more trust in Brian than Odom and I have tested it for myself.
Unfortunately books aren’t always correct; that is why we have to test for ourselves to really learn. Even the bible, Routing TCP/IP by Doyle, has some errors. I found some while studying RIP. There is just no way of writing a 500+ page book without making a single mistake.
Sometimes there could be an update in the errata for errors that went into print.
Unfortunately we can’t send packets with a TTL of 1 easily within IOS. At least not in a way that I know of. That would be perfect for testing otherwise.
Regarding the lab I am 99.999% sure that you would get points for ebgp-multihop as well so it should not make a difference from that point of view.
Any chance of a confirmation on this from Cisco? Everywhere else is pushing the 2 hop line, including this old thread https://learningnetwork.cisco.com/thread/29894 that was never answered.
Logically, the loopback could be the egress interface to the RP and hence cause an extra decrement (your testing indicates otherwise), but Cisco are the only ones who will know for sure.
I suppose the behaviour could be an undocumented feature, which could subsequently be broken in a later release? A TAC case is required for the official answer.
I spoke to someone at Cisco, and as he explained it, disabling the connected-check takes care of both the connected check and the TTL expiring. So in theory, if the TTL was set to 1 the packet should not reach the loopback, but it does because the connected-check is disabled.
I see it’s an old post, but I was labbing something similar when I came across this site of yours.
One thing that stands out is that a router will not decrement the TTL if the packet is destined to itself. This leads me to believe that disable-connected-check works when peering between loopbacks of ‘directly connected’ neighbors. Now, if there were a router in between the two eBGP peers, disable-connected-check would not help and the neighborship would not form. This makes sense, since the TTL did not change (it remained 1), and the router in between would reduce it to 0.
For example, in the topology above, forming an eBGP neighborship between R1 and R3, sourced from the loopback IPs, WITH disable-connected-check on both R1 and R3, will not work. R1 will send the initial packet with TTL 1, and R2 will decrement it to 0 and send an ICMP message back to R1. In that case, you have to increase the TTL by setting ebgp-multihop to something bigger.
But again, great post !! Liked reading it.
Thanks again, and regards
I’m not 100% sure how the internals of IOS work, so I’m not sure whether it’s purely TTL-related or not, but it’s not common knowledge that this works.