Some important details of BGP

We start out with a basic topology of three routers, R1, R2 and R3, connected in a line with R2 in the middle.

R2 and R3 will peer with each other's loopbacks. I have set up OSPF for full reachability
in the network. First we test connectivity.

R2#ping 3.3.3.3 so lo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds:
Packet sent with a source address of 2.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/53/80 ms

There is connectivity. We set up the peering and set ebgp-multihop to 2, since
this is what most people do. I will explain why this is not a good idea.

R2(config)#router bgp 1
R2(config-router)#nei 3.3.3.3 remote-as 3
R2(config-router)#nei 3.3.3.3 update-source loopback 0
R2(config-router)#nei 3.3.3.3 ebgp-multihop 2

R3(config)#router bgp 3
R3(config-router)#nei 2.2.2.2 remote-as 1
R3(config-router)#nei 2.2.2.2 update-source loopback 0
R3(config-router)#nei 2.2.2.2 ebgp-multihop 2

The session comes up.

 %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Up

All good so far, but we are not advertising anything yet. We add another loopback
on R3, advertise it into BGP and check whether R2 receives it.

R2#sh bgp ipv4 uni
BGP table version is 3, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 33.33.33.33/32   3.3.3.3                  0             0 3 i

It looks good so far. Now let's think for a while about what ebgp-multihop
actually does. By default, eBGP checks that the neighbor address is on a
directly connected network, and it sends its packets with a TTL of 1. So the
default is connected-check enabled and ebgp-multihop 1. When we set
ebgp-multihop 2, the outgoing TTL is set to 2 and the connected check is
disabled. We confirm this with a packet capture.
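To make the two knobs concrete, here is a minimal Python model of the behavior described above. It is a simplified sketch, not IOS code: the class and function names are hypothetical, the /24 masks on R2's transit links are assumed (the logs only show the .2 addresses), and 23.23.23.3 is assumed to be R3's interface address.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass
class EbgpNeighbor:
    """Hypothetical, simplified model of an eBGP neighbor's transport knobs."""
    address: str
    ebgp_multihop: Optional[int] = None   # None: command not configured
    disable_connected_check: bool = False

    def outgoing_ttl(self) -> int:
        # Plain eBGP sends with TTL 1; ebgp-multihop N raises it to N.
        return self.ebgp_multihop if self.ebgp_multihop is not None else 1

    def connected_check_enabled(self) -> bool:
        # As described in the post, either knob turns the connected check off.
        return self.ebgp_multihop is None and not self.disable_connected_check


def passes_connected_check(neighbor: EbgpNeighbor, connected: List[str]) -> bool:
    """True if the check is off or the neighbor address is on a connected subnet."""
    if not neighbor.connected_check_enabled():
        return True
    addr = ip_address(neighbor.address)
    return any(addr in ip_network(prefix) for prefix in connected)


if __name__ == "__main__":
    # R2's connected prefixes (masks assumed).
    r2_connected = ["12.12.12.0/24", "23.23.23.0/24", "2.2.2.2/32"]
    for nbr in (EbgpNeighbor("23.23.23.3"),
                EbgpNeighbor("3.3.3.3"),
                EbgpNeighbor("3.3.3.3", ebgp_multihop=2)):
        print(nbr.address, nbr.outgoing_ttl(), passes_connected_check(nbr, r2_connected))
```

The interface address passes the check with TTL 1, the loopback 3.3.3.3 fails it, and adding ebgp-multihop 2 both raises the TTL and skips the check, which is why that command "works" for loopback peering.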

So the TTL is set to 2, but is this really necessary? The common argument is that
because we are peering to a loopback, the TTL must be set to 2, since it is
decremented before the packet reaches the loopback. But when do routers modify
packets? When they forward them out of an egress interface, right? We test this
theory by setting up a peering between R1 and R3. We start without ebgp-multihop
and run debug ip icmp. We have to disable the connected check, otherwise the
session will just stay idle, because a loopback can never be directly connected.

R1(config-router)#nei 3.3.3.3 remote-as 3
R1(config-router)#nei 3.3.3.3 update-source lo0
R1(config-router)#nei 3.3.3.3 disable-connected-check

R3(config-router)#nei 1.1.1.1 remote-as 1
R3(config-router)#nei 1.1.1.1 update lo0
R3(config-router)#nei 1.1.1.1 disable-connected-check

We can now see that R2 is sending ICMP time exceeded messages to both R1 and R3.

R1: ICMP: time exceeded rcvd from 12.12.12.2
R3: ICMP: time exceeded rcvd from 23.23.23.2

This is because the TTL was set to 1. R2 decrements it to 0 while forwarding the packet, so the TTL expires in transit.
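The result fits a simple rule: a router decrements the TTL when it forwards a packet, not when it receives a packet addressed to itself. Below is a minimal Python sketch of that rule on the R1 - R2 - R3 chain from this post. The function name is hypothetical and the model ignores everything except TTL handling.

```python
from typing import List, Tuple


def deliver(path: List[str], ttl: int) -> Tuple[bool, str]:
    """Walk a packet from path[0] to path[-1] with the given initial TTL.

    Only transit routers (path[1:-1]) decrement the TTL. The sender does not,
    and the destination accepts a packet addressed to itself (for example its
    own loopback) without decrementing it.
    Returns (delivered, router where the packet ended up).
    """
    for transit in path[1:-1]:
        ttl -= 1
        if ttl == 0:
            # The transit router drops the packet and sends ICMP time exceeded.
            return False, transit
    return True, path[-1]


if __name__ == "__main__":
    print(deliver(["R1", "R2", "R3"], ttl=1))  # R1 -> R3 loopback, TTL 1
    print(deliver(["R1", "R2", "R3"], ttl=2))  # R1 -> R3 loopback, TTL 2
    print(deliver(["R1", "R2"], ttl=1))        # R1 -> R2 loopback, TTL 1
```

The first call ends at R2, matching the time exceeded messages above; the last call shows that TTL 1 is enough to reach the loopback of a directly connected router.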

Now we set up a peering between R1 and R2 using the loopbacks, again disabling
the connected check. For this test R2 is in AS 2, as the neighbor table further
down shows, so the session is eBGP and not iBGP.

R1(config-router)#nei 2.2.2.2 remote-as 2
R1(config-router)#nei 2.2.2.2 update lo0
R1(config-router)#nei 2.2.2.2 disable-connected-check

R2(config-router)#nei 1.1.1.1 remote-as 1
R2(config-router)#nei 1.1.1.1 update lo0
R2(config-router)#nei 1.1.1.1 disable-connected-check

Many people say that the TTL must be 2 for the peering to come up; we will now
prove that this is wrong. The reason peering does not come up when using
loopbacks is that BGP checks whether the neighbor is directly connected or not.
We take a look at a BGP packet sent with disable-connected-check configured.

We clearly see that the TTL is 1, but the session still comes up. This proves
that it is not the TTL expiring that prevents peering between loopbacks!
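Putting the connected check and the TTL rule together gives a small decision model that reproduces the experiments in this post. It is a hypothetical sketch under the simplifications stated in the comments, not a description of IOS internals.

```python
from typing import Optional


def session_can_open(transit_routers: int, peer_on_connected_subnet: bool,
                     ebgp_multihop: Optional[int] = None,
                     disable_connected_check: bool = False) -> bool:
    """Hypothetical model: can an eBGP session to this peer come up?"""
    ttl = ebgp_multihop if ebgp_multihop is not None else 1
    connected_check = ebgp_multihop is None and not disable_connected_check
    if connected_check and not peer_on_connected_subnet:
        return False  # BGP stays Idle and never opens the TCP session
    # Each transit router decrements the TTL once; the peer itself does not.
    return ttl - transit_routers >= 1


if __name__ == "__main__":
    # (description, transit routers, peer on a connected subnet?, knobs)
    cases = [
        ("R2-R3 loopbacks, ebgp-multihop 2", 0, False, {"ebgp_multihop": 2}),
        ("R1-R3 loopbacks, disable-connected-check", 1, False, {"disable_connected_check": True}),
        ("R1-R2 loopbacks, disable-connected-check", 0, False, {"disable_connected_check": True}),
        ("R1-R2 loopbacks, no extra knobs", 0, False, {}),
    ]
    for name, hops, connected, knobs in cases:
        print(f"{name}: {session_can_open(hops, connected, **knobs)}")
```

Only the R1-R3 case needs a TTL above 1, because it is the only one with a router in the middle; for the directly connected pairs the connected check is the only obstacle.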

R1#sh bgp all sum
For address family: IPv4 Unicast
BGP router identifier 1.1.1.1, local AS number 1
BGP table version is 9, main routing table version 9
2 network entries using 240 bytes of memory
2 path entries using 104 bytes of memory
3/2 BGP path/bestpath attribute entries using 372 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
Bitfield cache entries: current 1 (at peak 2) using 32 bytes of memory
BGP using 772 total bytes of memory
BGP activity 5/3 prefixes, 5/3 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2.2.2.2         4     2      83      80        9    0    0 00:02:45        1

Finally, I want to bring up another disadvantage of using the ebgp-multihop
command when peering between the loopbacks of directly connected routers.
We have a peering between R2 and R3. What happens when we shut down the
interface on either router?

R2(config-router)#int f1/0
R2(config-if)#sh
R2(config-if)#
%OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
R2(config-if)#
%LINK-5-CHANGED: Interface FastEthernet1/0, changed state to administratively down
%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0, changed state to down
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 11, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*  33.33.33.33/32   3.3.3.3                  0             0 3 i

When we shut down the interface, the peering still stays up. This is because
fast-external-fallover only applies to directly connected eBGP sessions, and a
session using ebgp-multihop is not treated as one. This could lead to black holes,
since the peering stays up until the hold time expires (180 s by default). In our
case the route is not used because there is no valid next hop (note the missing >
in the output above), but what if we put in a default route?
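To put a number on the exposure, here is a rough Python model of how long R2 keeps the session after the link fails. It assumes the IOS defaults of a 60 s keepalive and 180 s hold time and ignores keepalive jitter; the function name is hypothetical.

```python
HOLD_TIME = 180   # seconds, IOS default hold time
KEEPALIVE = 60    # seconds, IOS default keepalive interval


def seconds_until_reset(single_hop: bool, fast_external_fallover: bool = True,
                        since_last_keepalive: float = 0.0) -> float:
    """Hypothetical model of how long a session survives after its link goes down.

    since_last_keepalive is how long before the failure the last keepalive
    (or update) from the peer arrived; it restarts the hold timer.
    """
    if single_hop and fast_external_fallover:
        return 0.0  # torn down as soon as the interface goes down
    # Otherwise the hold timer has to run out.
    return max(HOLD_TIME - since_last_keepalive, 0.0)


if __name__ == "__main__":
    print("single-hop eBGP:", seconds_until_reset(single_hop=True))
    worst = seconds_until_reset(single_hop=False, since_last_keepalive=0)
    best = seconds_until_reset(single_hop=False, since_last_keepalive=KEEPALIVE)
    print(f"ebgp-multihop: session lingers {best:.0f}-{worst:.0f} s")
```

With default timers, a multihop session therefore lingers for roughly 120 to 180 seconds after the link is gone, while a single-hop session is reset immediately.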

R2(config)#ip route 0.0.0.0 0.0.0.0 12.12.12.1
R2(config)#int f1/0
R2(config-if)#sh
R2(config-if)#do
%OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached
R2(config-if)#do
%LINK-5-CHANGED: Interface FastEthernet1/0, changed state to administratively down
%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0, changed state to down
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 12, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 33.33.33.33/32   3.3.3.3                  0             0 3 i
R2(config-if)#do sh bgp ipv4 uni
BGP table version is 12, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 33.33.33.33/32   3.3.3.3                  0             0 3 i

Now the route stays valid and best in the BGP table until the hold time expires,
which creates a black hole. The default route keeps the next hop 3.3.3.3
resolvable, even though traffic sent that way no longer reaches R3.
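Why does the default route change the outcome? BGP only needs the next hop 3.3.3.3 to resolve in the routing table, and a longest-prefix match falls back to 0.0.0.0/0 once the OSPF /32 is gone. The sketch below models only that lookup; the RIB snapshots are hypothetical and the OSPF next hop 23.23.23.3 is assumed.

```python
from ipaddress import ip_address, ip_network
from typing import Dict, Optional


def resolve(next_hop: str, rib: Dict[str, str]) -> Optional[str]:
    """Longest-prefix match of a BGP next hop against a simplified RIB.

    rib maps prefix -> route description. Returns the best matching prefix,
    or None when the next hop is unreachable (the route would lose its '>').
    """
    addr = ip_address(next_hop)
    matches = [ip_network(prefix) for prefix in rib if addr in ip_network(prefix)]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


if __name__ == "__main__":
    # Hypothetical RIB snapshots on R2.
    snapshots = {
        "link up": {"3.3.3.3/32": "OSPF via 23.23.23.3", "12.12.12.0/24": "connected"},
        "link down": {"12.12.12.0/24": "connected"},
        "link down + default": {"12.12.12.0/24": "connected",
                                "0.0.0.0/0": "static via 12.12.12.1"},
    }
    for state, rib in snapshots.items():
        print(f"{state}: 3.3.3.3 resolves via {resolve('3.3.3.3', rib)}")
```

The three snapshots correspond to the outputs above: a /32 match while the link is up, no match (route not best) after the shutdown, and a match on 0.0.0.0/0 once the static default is added.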

I hope this post has given you a better understanding of these BGP features
and of how a router handles control plane packets. As usual, post in the comments
if you have any feedback or questions.

Aaron

September 14, 2012 at 11:52 pm

Great article, you dispelled my confusion on the same problem, Daniel.

Pat

September 15, 2012 at 6:06 pm

So to put it another way, use eBGP multihop or TTL security for what they are intended for, i.e. when you have multiple hops between you and your remote peer.

reaper81
September 15, 2012 at 7:08 pm

Yeah, exactly. I might bring up TTL security in another post, but that one is pretty self-explanatory, I think.

networkingtrainingvn

September 15, 2012 at 6:15 pm

Hi,
If you advertise Loopback0 of R2 and R3 into eBGP, you will see the neighbor relationship between R2 and R3 go “up” and “down”. 🙂

reaper81
September 15, 2012 at 7:09 pm

Yeah. That would be a recursive loop 🙂 Kind of like using a tunnel and learning the tunnel prefix over it.

networkingtrainingvn
September 17, 2012 at 5:40 am

Yes, you can use: bgp backdoor 😀

Petter Bruland

September 16, 2012 at 7:03 am

Thanks!

reaper81
September 16, 2012 at 7:36 am

You’re welcome 🙂

Michael

September 24, 2012 at 6:08 am

Hi Daniel,
that’s quite a serious statement you’re making here, mate. Your explanation certainly makes sense, but the CCIE SG (4th edition) makes the very mistake you mentioned by saying that you HAVE to set ‘ebgp-multihop 2’ in order to peer between loopbacks of directly connected routers (p. 376). Could it just be a blunder by Mr. Odom et al., or is it a sort of ‘official misconception’ from Cisco? I mean, what if you came across this in the Lab? Would you still do it your way (without the multihop command)?

reaper81
September 24, 2012 at 7:07 am

Hi Michael,

Yes, I know it is a big statement, but it works. I got the idea for this post while at the bootcamp with Brian Dennis, who explained the behavior. I have more trust in Brian than in Odom, and I have tested it for myself.

Unfortunately books aren’t always correct; that is why we have to test for ourselves to really learn. Even the TCP/IP bible by Doyle has some errors. I found some while studying RIP. There is just no way of writing a 500+ page book without making a single mistake.

Errors that went into print are sometimes corrected in the errata.

Unfortunately we can’t easily send packets with a TTL of 1 from within IOS, at least not in any way that I know of. Otherwise that would be perfect for testing.

Regarding the lab, I am 99.999% sure that you would get the points with ebgp-multihop as well, so it should not make a difference from that point of view.

Jac
October 5, 2012 at 1:24 am

Daniel,

Any chance of a confirmation on this from Cisco? Everywhere else is pushing the 2 hop line, including this old thread https://learningnetwork.cisco.com/thread/29894 that was never answered.
Logically, the loopback could be the egress interface to the RP and hence cause an extra decrement (your testing indicates otherwise), but Cisco are the only ones who will know for sure.

May 29, 2013 at 1:44 pm

I suppose the behaviour could be an undocumented feature, which could subsequently be broken in a later release? A TAC case is required for the official answer.

reaper81
May 29, 2013 at 1:54 pm

I spoke to someone at Cisco, and as he explained it, disabling the connected check takes care of both the connected check and the TTL expiring. So in theory, with the TTL set to 1 the packet should not reach the loopback, but it does because the connected check is disabled.

Alvin

August 5, 2014 at 11:20 am

Hi Daniel,
I see it’s an old post, but I was labbing something similar when I came across this site of yours.
One thing that stands out is that a router will not decrement the TTL if the packet is destined to itself. This leads me to believe that disable-connected-check works when peering between loopbacks of ‘directly connected’ neighbors. Now, if there was a router in between the two eBGP peers, then disable-connected-check would not help and the neighborship would not form. This makes sense, since the TTL didn’t change (it remained 1), and the router in between would reduce it to 0.
For example, in the topology listed above, forming an eBGP neighborship between R1 and R3, sourcing the loopback IPs, WITH disable-connected-check on both R1 and R3, will not work. R1 will send the initial packet with TTL 1, and R2 will decrement it to 0, sending an ICMP message back to R1. In that case, you have to increase the TTL with ebgp-multihop set to something bigger.
But again, great post!! I liked reading it.

Thanks again, and regards

reaper81
August 5, 2014 at 6:16 pm

Thanks Alvin,

I’m not 100% sure how the internals of IOS work, so I’m not sure whether it’s just TTL related or not, but not many people know that this works.

