BGP is a path vector protocol, similar to distance vector protocols such as RIP. Unlike link state protocols such as OSPF and IS-IS, protocols of this kind are not aware of the topology; they can only act on information received from peers. Information is not flooded the way it is in link state IGPs, where a change in connectivity is flooded immediately and every router runs SPF. With distance and path vector protocols, good news travels fast and bad news travels slowly. What does this mean, and what does it have to do with path hunting?
To explain what path hunting is, let’s work with the topology below:
Given how the routers and autonomous systems are connected, under stable conditions RT05 has prefix 198.51.100.1/32 via the following AS paths:
- 64513 64512
- 64514 64513 64512
- 64515 64514 64513 64512
This is also shown visually in the following diagram:
There are three paths in total, listed from shortest to longest. Let's verify this:
```
RT05#sh bgp ipv4 uni
BGP table version is 11, local router ID is 192.168.128.174
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>  198.51.100.1/32  192.0.2.9                              0 64513 64512 i
 *                    192.0.2.21                             0 64515 64514 64513 64512 i
 *                    192.0.2.17                             0 64514 64513 64512 i
```
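As a cross-check, the three paths can be enumerated from the AS-level adjacency. The diagram is not reproduced here, so the edge list below is an assumption reconstructed from the AS paths in this post (RT01 = 64512, RT02 = 64513, RT03 = 64514, RT04 = 64515, RT05 = 64516). Keep in mind that a BGP speaker only learns the paths its neighbors have selected as best, so in general it can see fewer paths than this enumeration returns; in this topology the two happen to match.

```python
from typing import Dict, List, Set, Tuple

# Assumed AS-level adjacency, inferred from the AS paths in the post
# (not taken from the diagram itself).
EDGES: List[Tuple[int, int]] = [
    (64512, 64513), (64513, 64514), (64513, 64516),
    (64514, 64515), (64514, 64516), (64515, 64516),
]


def loop_free_paths(edges: List[Tuple[int, int]], src: int, dst: int) -> List[List[int]]:
    """Return every loop-free AS path from src to dst in AS_PATH order
    (nearest AS first, origin last, src itself excluded), shortest first."""
    nbrs: Dict[int, List[int]] = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)

    paths: List[List[int]] = []

    def dfs(node: int, visited: Set[int], path: List[int]) -> None:
        if node == dst:
            paths.append(path)
            return
        for nxt in nbrs[node]:
            if nxt not in visited:  # never revisit an AS: that would be a loop
                dfs(nxt, visited | {nxt}, path + [nxt])

    dfs(src, {src}, [])
    return sorted(paths, key=len)


for path in loop_free_paths(EDGES, 64516, 64512):
    print(" ".join(map(str, path)))
# 64513 64512
# 64514 64513 64512
# 64515 64514 64513 64512
```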
Before starting our experiment, let’s verify what the Minimum Route Advertisement Interval (MRAI) is set to:
```
RT01#sh bgp ipv4 uni nei 192.0.2.2 | i advertisement
  Default minimum time between advertisement runs is 30 seconds
```
On Cisco routers, the default is 30 seconds for eBGP peers, and the timer applies per neighbor, not per prefix. Now, let's remove 198.51.100.1/32 from RT01 by shutting down Loopback0:
```
RT01(config)#int lo0
RT01(config-if)#shut
```
My expectation was that MRAI would delay convergence here, but the initial withdrawal was not delayed. We can still see path hunting, though. Starting at RT01, it withdraws the route:
```
Sep 25 07:41:29.178: BGP(0): redist event (2) request for 198.51.100.1/32
Sep 25 07:41:29.178: BGP(0): route 198.51.100.1/32 down
Sep 25 07:41:29.178: BGP(0): no valid path for 198.51.100.1/32
Sep 25 07:41:29.178: BGP: topo global:IPv4 Unicast:base Remove_fwdroute for 198.51.100.1/32
Sep 25 07:41:29.179: BGP(0): (base) 192.0.2.2 send unreachable (format) 198.51.100.1/32
Sep 25 07:41:29.186: BGP(0): 192.0.2.2 rcv UPDATE about 198.51.100.1/32 -- withdrawn
```
RT02 receives the update:
```
Sep 25 07:41:29.194: BGP(0): 192.0.2.1 rcv UPDATE about 198.51.100.1/32 -- withdrawn
Sep 25 07:41:29.194: BGP(0): no valid path for 198.51.100.1/32
Sep 25 07:41:29.194: BGP: topo global:IPv4 Unicast:base Remove_fwdroute for 198.51.100.1/32
Sep 25 07:41:29.195: BGP(0): (base) 192.0.2.1 send unreachable (format) 198.51.100.1/32
```
RT02 also starts receiving updates from RT03 and RT05 about new paths, which are denied because RT02's own AS (64513) is already in the AS path. This is path hunting!
```
Sep 25 07:41:29.205: BGP(0): 192.0.2.6 rcv UPDATE w/ attr: nexthop 192.0.2.6, origin i, originator 0.0.0.0, merged path 64514 64516 64513 64512, AS_PATH , community , large community , extended community , SSA attribute
Sep 25 07:41:29.206: BGPSSA ssacount is 0, Tunnel attribute
Sep 25 07:41:29.206: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Sep 25 07:41:29.206: BGP(0): 192.0.2.6 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS;
Sep 25 07:41:29.208: BGP(0): 192.0.2.10 rcv UPDATE w/ attr: nexthop 192.0.2.10, origin i, originator 0.0.0.0, merged path 64516 64514 64513 64512, AS_PATH , community , large community , extended community , SSA attribute
Sep 25 07:41:29.208: BGPSSA ssacount is 0, Tunnel attribute
Sep 25 07:41:29.208: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Sep 25 07:41:29.209: BGP(0): 192.0.2.10 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS;
Sep 25 07:41:59.307: BGP(0): 192.0.2.6 rcv UPDATE w/ attr: nexthop 192.0.2.6, origin i, originator 0.0.0.0, merged path 64514 64515 64516 64513 64512, AS_PATH , community , large community , extended community , SSA attribute
Sep 25 07:41:59.307: BGPSSA ssacount is 0, Tunnel attribute
Sep 25 07:41:59.307: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Sep 25 07:41:59.307: BGP(0): 192.0.2.6 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS;
Sep 25 07:41:59.598: BGP(0): 192.0.2.10 rcv UPDATE about 198.51.100.1/32 -- withdrawn
```
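The DENIED lines are eBGP loop prevention at work: a router rejects any route whose AS_PATH already contains its own AS number, and each router prepends its own AS when it advertises a route to an eBGP peer. A minimal sketch of both rules, checked against the paths in the debug above (the function names are illustrative, not taken from any router OS):

```python
from typing import List, Sequence


def accept_update(local_as: int, as_path: Sequence[int]) -> bool:
    """Receive-side loop check: reject the route if our own AS is already in AS_PATH."""
    return local_as not in as_path


def export_path(local_as: int, as_path: Sequence[int]) -> List[int]:
    """Prepend our own AS when advertising the route to an eBGP peer."""
    return [local_as, *as_path]


# RT03 (AS 64514) falls back to the path via RT05 and advertises it to RT02.
from_rt03 = export_path(64514, [64516, 64513, 64512])
print(from_rt03)                        # [64514, 64516, 64513, 64512]
print(accept_update(64513, from_rt03))  # False: RT02 finds AS 64513 in the path
```

The same check is why RT05 accepts RT03's normal path 64514 64513 64512 but rejects anything containing 64516.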
Let’s take a look at RT05. It first receives the update from RT02:
Sep 25 07:41:29.218: BGP(0): 192.0.2.9 rcv UPDATE about 198.51.100.1/32 -- withdrawn
It then installs the path via RT03 (192.0.2.17) and sends an update:
```
Sep 25 07:41:29.219: BGP(0): Revise route installing 1 of 1 routes for 198.51.100.1/32 -> 192.0.2.17(global) to main IP table
Sep 25 07:41:29.219: BGP(0): 192.0.2.17 NEXT_HOP is on same subnet as the bgp peer and set to 192.0.2.17 for net 198.51.100.1/32, flags 200, sb: C0000210, mask: FFFFFFFC
Sep 25 07:41:29.219: BGP(0): (base) 192.0.2.17 send UPDATE (format) 198.51.100.1/32, next 192.0.2.17, metric 0, path 64514 64513 64512
```
About 30 seconds later, it receives updates from RT03 and RT04:
```
Sep 25 07:41:59.327: BGP(0): 192.0.2.17 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS;
Sep 25 07:41:59.559: BGP: 192.0.2.21 Next hop is our own address 192.0.2.22
Sep 25 07:41:59.559: BGP(0): 192.0.2.21 rcv UPDATE w/ attr: nexthop 192.0.2.22, origin i, originator 0.0.0.0, merged path 64515 64516 64514 64513 64512, AS_PATH , community , large community , extended community , SSA attribute
Sep 25 07:41:59.560: BGPSSA ssacount is 0, Tunnel attribute
Sep 25 07:41:59.560: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Sep 25 07:41:59.561: BGP(0): 192.0.2.21 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS; NEXTHOP is our own address;
Sep 25 07:41:59.614: BGP(0): (base) 192.0.2.17 send unreachable (format) 198.51.100.1/32
```
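The roughly 30-second gap between RT05's update at 07:41:29 and the next round of updates at 07:41:59 is consistent with a per-neighbor MRAI. As a rough mental model, and not Cisco's actual implementation, MRAI can be treated as a per-neighbor timer: after an advertisement run towards a neighbor, anything that becomes ready for that neighbor is held and sent together in the next run, no earlier than one interval later. The class and method names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MraiScheduler:
    """Toy per-neighbor MRAI model (hypothetical, not a vendor implementation).

    Updates must be submitted in non-decreasing order of `ready_at`.
    """
    interval: float = 30.0
    # Time of the most recent (or already scheduled) advertisement run per neighbor.
    last_run: Dict[str, float] = field(default_factory=dict)

    def send_time(self, neighbor: str, ready_at: float) -> float:
        """Return when an update that became ready at `ready_at` is sent."""
        last = self.last_run.get(neighbor)
        if last is not None and ready_at <= last:
            return last  # joins a run that is already scheduled
        if last is None or ready_at >= last + self.interval:
            run = ready_at  # timer has expired: send immediately
        else:
            run = last + self.interval  # hold until the timer expires
        self.last_run[neighbor] = run
        return run


mrai = MraiScheduler(interval=30.0)
print(mrai.send_time("RT05", 0.0))  # 0.0: first update goes out immediately
print(mrai.send_time("RT05", 0.1))  # 30.0: held by the timer
print(mrai.send_time("RT05", 0.2))  # 30.0: batched into the same run
print(mrai.send_time("RT02", 0.2))  # 0.2: the timer is per neighbor
```

With `interval=0.0` every update is sent as soon as it is ready, which is the behavior tested later in this post.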
Here we see MRAI kicking in: RT03 and RT04 had already sent updates and had to wait 30 seconds before sending new ones to RT05. In our lab, it took roughly 30 seconds to converge. In a real network, however, many more updates go back and forth between peers. Let's simulate this by having routers shut down loopbacks, so that more updates are generated, and see what effect that has with MRAI. Again, the loopback is shut down on RT01:
```
Sep 25 08:42:30.490: BGP(0): redist event (2) request for 198.51.100.1/32
Sep 25 08:42:30.490: BGP(0): route 198.51.100.1/32 down
```
This time, it takes 30 seconds before RT05 receives the first update about this loopback:
```
Sep 25 08:43:00.573: BGP(0): 192.0.2.9 rcv UPDATE about 198.51.100.1/32 -- withdrawn
```
It also starts receiving updates with its own AS in the path (this one is for 198.51.100.3/32):
```
Sep 25 08:43:00.573: BGP(0): 192.0.2.9 rcv UPDATE w/ attr: nexthop 192.0.2.10, origin i, originator 0.0.0.0, merged path 64513 64516 64514, AS_PATH , community , large community , extended community , SSA attribute
Sep 25 08:43:00.574: BGPSSA ssacount is 0, Tunnel attribute
Sep 25 08:43:00.575: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Sep 25 08:43:00.575: BGP(0): 192.0.2.9 rcv UPDATE about 198.51.100.3/32 -- DENIED due to: AS-PATH contains our own AS; NEXTHOP is our own address;
```
Finally, around 90 seconds after the original event, everything has converged:
```
Sep 25 08:44:02.000: BGP(0): 192.0.2.17 rcv UPDATE about 198.51.100.1/32 -- withdrawn
```
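The 30- and 90-second figures can be read straight off the debug timestamps. A small helper for that is sketched below; the function names and the placeholder year are assumptions (IOS debug timestamps carry no year), and `%b` assumes English month abbreviations, as in Python's default C locale.

```python
from datetime import datetime

# IOS debug timestamps carry no year; a placeholder year (an assumption) is
# prepended so that strptime receives a complete date.
PLACEHOLDER_YEAR = 2000


def parse_ts(stamp: str) -> datetime:
    """Parse a debug timestamp such as 'Sep 25 08:42:30.490'."""
    return datetime.strptime(f"{PLACEHOLDER_YEAR} {stamp}", "%Y %b %d %H:%M:%S.%f")


def elapsed(start: str, end: str) -> float:
    """Seconds between two debug timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


print(elapsed("Sep 25 08:42:30.490", "Sep 25 08:43:00.573"))  # 30.083: first update at RT05
print(elapsed("Sep 25 08:42:30.490", "Sep 25 08:44:02.000"))  # 91.51: final withdrawal at RT05
```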
It took 90 seconds and a lot of updates before BGP converged, even though there was no alternate path to install. This is the drawback of path vector compared to link state: OSPF would have converged within seconds in the same situation. In this scenario it does not matter much, as there is no alternate path that provides connectivity, but what if there was? Let's make the scenario a bit more interesting by adding an additional path that, under normal conditions, is worse than the best path.
RT05 now has the following paths in BGP:
```
RT05#sh bgp ipv4 uni
BGP table version is 39, local router ID is 192.168.128.174
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *   198.51.100.1/32  192.0.2.30                             0 64517 64517 64517 64517 64512 i
 *>                   192.0.2.9                              0 64513 64512 i
 *                    192.0.2.21                             0 64515 64514 64513 64512 i
 *                    192.0.2.17                             0 64514 64513 64512 i
```
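In this table weight is 0 for every path and no local preference is set, so AS path length is what separates the candidates. The sketch below ranks RT05's paths on that single attribute; it is a deliberate simplification of the full best-path algorithm, which compares other attributes first, and the helper name is illustrative.

```python
from typing import Dict, List

# RT05's paths for 198.51.100.1/32, copied from the table above (next hop -> AS_PATH).
RT05_PATHS: Dict[str, List[int]] = {
    "192.0.2.30": [64517, 64517, 64517, 64517, 64512],  # via RT06, AS 64517 prepended
    "192.0.2.9": [64513, 64512],                        # via RT02
    "192.0.2.21": [64515, 64514, 64513, 64512],         # via RT04
    "192.0.2.17": [64514, 64513, 64512],                # via RT03
}


def rank_by_as_path_length(paths: Dict[str, List[int]]) -> List[str]:
    """Order next hops by AS_PATH length only (shortest first).

    Simplification: real best-path selection compares weight, local
    preference and more before AS_PATH length; they are equal here.
    """
    return sorted(paths, key=lambda next_hop: len(paths[next_hop]))


print(rank_by_as_path_length(RT05_PATHS))
# ['192.0.2.9', '192.0.2.17', '192.0.2.21', '192.0.2.30']
```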
The path through RT06 is currently the worst because of its AS path length (AS 64517 appears four times). Now, let's start a ping on RT05 and do the following on RT01 to simulate a failure:
- Apply a prefix filter towards RT02.
- Apply an access list blocking ICMP on the interface towards RT02.
- Trigger a BGP route refresh towards RT02.
```
RT05#ping 198.51.100.1 so lo0 re 200000
Type escape sequence to abort.
Sending 200000, 100-byte ICMP Echos to 198.51.100.1, timeout is 2 seconds:
Packet sent with a source address of 198.51.100.5
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<SNIP>
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!U.U.U.U.U.U.U.U.U.U.U.
U.U.U.U.U..........................!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<SNIP>
Success rate is 99 percent (14277/14334), round-trip min/avg/max = 1/1/10 ms
```
57 packets were lost. If we assume each lost packet represents roughly one second, BGP took almost a minute to converge, even though there was an alternate path all along. This is due to the combination of path hunting and MRAI. What happens if we set MRAI to zero?
```
RT01(config-router)#nei 192.0.2.2 advertisement-interval 0
RT01(config-router)#nei 192.0.2.26 advertisement-interval 0
```
Repeat this on all routers, then run the same test again:
```
RT05#ping 198.51.100.1 so lo0 re 200000
Type escape sequence to abort.
Sending 200000, 100-byte ICMP Echos to 198.51.100.1, timeout is 2 seconds:
Packet sent with a source address of 198.51.100.5
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<SNIP>
!!!!!!!U.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 99 percent (14087/14089), round-trip min/avg/max = 1/1/8 ms
```
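The two runs can be compared directly from the "Success rate" line of each ping. A small sketch (the helper name is illustrative); the truncating division reproduces the "99 percent" IOS printed for 14277/14334, which would round to 100 if it were rounded:

```python
from typing import Tuple


def ping_loss(sent: int, received: int) -> Tuple[int, int]:
    """Return (lost packets, success rate truncated to whole percent)."""
    return sent - received, received * 100 // sent


for label, sent, received in [("MRAI 30 s", 14334, 14277), ("MRAI 0", 14089, 14087)]:
    lost, pct = ping_loss(sent, received)
    print(f"{label}: {lost} lost, {pct} percent")
# MRAI 30 s: 57 lost, 99 percent
# MRAI 0: 2 lost, 99 percent
```

Both runs print as "99 percent", which is why the absolute loss count is the more useful figure here.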
With MRAI set to zero, convergence is much faster: only two packets were lost. This shows that path hunting itself does not take very long; the main limiting factor is how quickly the prefix is advertised. The debugs on RT05 show the events that take place:
```
Oct 7 08:40:35.061: BGP(0): 192.0.2.9 rcv UPDATE about 198.51.100.1/32 -- withdrawn
Oct 7 08:40:35.061: BGP(0): Revise route installing 1 of 1 routes for 198.51.100.1/32 -> 192.0.2.17(global) to main IP table
Oct 7 08:40:35.062: BGP(0): 192.0.2.17 NEXT_HOP is on same subnet as the bgp peer and set to 192.0.2.17 for net 198.51.100.1/32, flags 200, sb: C0000210, mask: FFFFFFFC
Oct 7 08:40:35.062: BGP(0): (base) 192.0.2.17 send UPDATE (format) 198.51.100.1/32, next 192.0.2.17, metric 0, path 64514 64513 64512
Oct 7 08:40:35.067: BGP: 192.0.2.17 Next hop is our own address 192.0.2.18
Oct 7 08:40:35.067: BGP(0): 192.0.2.17 rcv UPDATE w/ attr: nexthop 192.0.2.18, origin i, originator 0.0.0.0, merged path 64514 64516 64513 64512, AS_PATH , community , large community , extended community , SSA attribute
Oct 7 08:40:35.068: BGPSSA ssacount is 0, Tunnel attribute
Oct 7 08:40:35.068: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Oct 7 08:40:35.069: BGP(0): 192.0.2.17 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS; NEXTHOP is our own address;
Oct 7 08:40:35.069: BGP(0): Revise route installing 1 of 1 routes for 198.51.100.1/32 -> 192.0.2.21(global) to main IP table
Oct 7 08:40:35.069: BGP(0): 192.0.2.17 NEXT_HOP is on same subnet as the bgp peer 192.0.2.21 and set to 192.0.2.21 for net 198.51.100.1/32, flags 200, sb: C0000210, mask: FFFFFFFC
Oct 7 08:40:35.069: BGP(0): (base) 192.0.2.17 send UPDATE (format) 198.51.100.1/32, next 192.0.2.21, metric 0, path 64515 64514 64513 64512
Oct 7 08:40:35.074: BGP(0): 192.0.2.17 rcv UPDATE about 198.51.100.1/32 -- withdrawn
Oct 7 08:40:35.074: BGP: 192.0.2.21 Next hop is our own address 192.0.2.22
Oct 7 08:40:35.075: BGP(0): 192.0.2.21 rcv UPDATE w/ attr: nexthop 192.0.2.22, origin i, originator 0.0.0.0, merged path 64515 64516 64513 64512, AS_PATH , community , large community , extended community , SSA attribute
Oct 7 08:40:35.075: BGPSSA ssacount is 0, Tunnel attribute
Oct 7 08:40:35.075: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Oct 7 08:40:35.075: BGP(0): 192.0.2.21 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS; NEXTHOP is our own address;
Oct 7 08:40:35.076: BGP: 192.0.2.21 Next hop is our own address 192.0.2.22
Oct 7 08:40:35.076: BGP(0): 192.0.2.21 rcv UPDATE w/ attr: nexthop 192.0.2.22, origin i, originator 0.0.0.0, merged path 64515 64516 64514 64513 64512, AS_PATH , community , large community , extended community , SSA attribute
Oct 7 08:40:35.076: BGPSSA ssacount is 0, Tunnel attribute
Oct 7 08:40:35.076: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Oct 7 08:40:35.077: BGP(0): 192.0.2.21 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS; NEXTHOP is our own address;
Oct 7 08:40:35.077: BGP(0): Revise route installing 1 of 1 routes for 198.51.100.1/32 -> 192.0.2.30(global) to main IP table
Oct 7 08:40:35.078: BGP(0): 192.0.2.17 NEXT_HOP is on same subnet as the bgp peer 192.0.2.30 and set to 192.0.2.30 for net 198.51.100.1/32, flags 200, sb: C0000210, mask: FFFFFFFC
Oct 7 08:40:35.078: BGP(0): (base) 192.0.2.17 send UPDATE (format) 198.51.100.1/32, next 192.0.2.30, metric 0, path 64517 64517 64517 64517 64512
Oct 7 08:40:35.082: BGP(0): 192.0.2.21 rcv UPDATE about 198.51.100.1/32 -- withdrawn
Oct 7 08:40:35.084: BGP: 192.0.2.21 Next hop is our own address 192.0.2.22
Oct 7 08:40:35.084: BGP(0): 192.0.2.21 rcv UPDATE w/ attr: nexthop 192.0.2.22, origin i, originator 0.0.0.0, merged path 64515 64516 64517 64517 64517 64517 64512, AS_PATH , community , large community , extended community , SSA attribute
Oct 7 08:40:35.085: BGPSSA ssacount is 0, Tunnel attribute
Oct 7 08:40:35.085: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Oct 7 08:40:35.086: BGP(0): 192.0.2.21 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS; NEXTHOP is our own address;
Oct 7 08:40:35.086: BGP: 192.0.2.17 Next hop is our own address 192.0.2.18
Oct 7 08:40:35.086: BGP(0): 192.0.2.17 rcv UPDATE w/ attr: nexthop 192.0.2.18, origin i, originator 0.0.0.0, merged path 64514 64516 64517 64517 64517 64517 64512, AS_PATH , community , large community , extended community , SSA attribute
Oct 7 08:40:35.087: BGPSSA ssacount is 0, Tunnel attribute
Oct 7 08:40:35.087: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Oct 7 08:40:35.087: BGP(0): 192.0.2.17 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS; NEXTHOP is our own address;
Oct 7 08:40:35.087: BGP: 192.0.2.9 Next hop is our own address 192.0.2.10
Oct 7 08:40:35.088: BGP(0): 192.0.2.9 rcv UPDATE w/ attr: nexthop 192.0.2.10, origin i, originator 0.0.0.0, merged path 64513 64516 64517 64517 64517 64517 64512, AS_PATH , community , large community , extended community , SSA attribute
Oct 7 08:40:35.088: BGPSSA ssacount is 0, Tunnel attribute
Oct 7 08:40:35.088: Tunnel encap type: 0, encap size: 0, Link-state attribute: {}, PrefixSid attribute:
Oct 7 08:40:35.088: BGP(0): 192.0.2.9 rcv UPDATE about 198.51.100.1/32 -- DENIED due to: AS-PATH contains our own AS; NEXTHOP is our own address;
```
The debug shows the following:
- 08:40:35.061 – RT02 withdraws 198.51.100.1/32 towards RT05.
- 08:40:35.067 – RT03 sends 198.51.100.1/32 towards RT05 with RT05 as next hop and AS path 64514 64516 64513 64512.
- 08:40:35.074 – RT03 sends a withdrawal towards RT05.
- 08:40:35.075 – RT04 sends 198.51.100.1/32 towards RT05 with RT05 as next hop and AS path 64515 64516 64513 64512.
- 08:40:35.077 – RT05 installs the route to 198.51.100.1/32 via RT06.
- 08:40:35.078 – RT05 sends 198.51.100.1/32 towards RT03 with AS path 64517 64517 64517 64517 64512.
- 08:40:35.082 – RT04 sends a withdrawal towards RT05.
- 08:40:35.084 – RT04 sends 198.51.100.1/32 towards RT05 with RT05 as next hop and AS path 64515 64516 64517 64517 64517 64517 64512.
- 08:40:35.086 – RT03 sends 198.51.100.1/32 towards RT05 with RT05 as next hop and AS path 64514 64516 64517 64517 64517 64517 64512.
- 08:40:35.088 – RT02 sends 198.51.100.1/32 towards RT05 with RT05 as next hop and AS path 64513 64516 64517 64517 64517 64517 64512.
For 27 milliseconds, BGP hunts for a path. Many updates go back and forth in which peers advertise to RT05 that they know a path, but that path goes through RT05 itself. The paths keep getting longer, as the AS paths in the updates show.
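Path hunting can be reproduced with a small simulation. This is a simplified sketch under stated assumptions, not how any router implements BGP: all ASes exchange updates in synchronous rounds, each picks the shortest loop-free AS path, ties are broken by the lowest neighbor ASN (an assumption), and the adjacency is reconstructed from the AS paths in this post. It models the first experiment, where no alternate path exists. The exact sequence of updates in the lab differs because real routers are not synchronized and use the full best-path algorithm.

```python
from typing import Dict, List, Optional, Tuple

Path = Optional[List[int]]  # None means "no route / withdrawn"

# Assumed AS adjacency, reconstructed from the AS paths in the post:
# RT01=64512, RT02=64513, RT03=64514, RT04=64515, RT05=64516.
EDGES: List[Tuple[int, int]] = [
    (64512, 64513), (64513, 64514), (64513, 64516),
    (64514, 64515), (64514, 64516), (64515, 64516),
]
ORIGIN, WATCH = 64512, 64516  # RT01 originates the prefix; we watch RT05


def neighbors(edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    nbrs: Dict[int, List[int]] = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    return nbrs


def best_path(asn: int, rib_in: Dict[int, Path]) -> Path:
    """Shortest loop-free path; ties go to the lowest neighbor ASN (assumption)."""
    candidates = [(len(p), nbr, p) for nbr, p in rib_in.items()
                  if p is not None and asn not in p]  # AS_PATH loop check
    return min(candidates)[2] if candidates else None


def advertise(asn: int, rib: Dict[int, Dict[int, Path]], origin_up: bool) -> Path:
    """AS_PATH this AS sends to every neighbor (None means withdrawn)."""
    if asn == ORIGIN:
        return [asn] if origin_up else None
    best = best_path(asn, rib[asn])
    return [asn] + best if best is not None else None


def run(nbrs: Dict[int, List[int]], rib: Dict[int, Dict[int, Path]],
        origin_up: bool) -> List[Path]:
    """Exchange updates in synchronous rounds until nothing changes.

    Returns WATCH's best path after every round that changed something.
    """
    trace: List[Path] = []
    while True:
        # Every AS computes its advertisement from the previous round's state...
        adv = {asn: advertise(asn, rib, origin_up) for asn in nbrs}
        changed = False
        # ...and all advertisements are delivered at the same time.
        for asn, peers in nbrs.items():
            for peer in peers:
                if rib[peer][asn] != adv[asn]:
                    rib[peer][asn] = adv[asn]
                    changed = True
        if not changed:
            return trace
        trace.append(best_path(WATCH, rib[WATCH]))


nbrs = neighbors(EDGES)
rib = {asn: {peer: None for peer in peers} for asn, peers in nbrs.items()}
run(nbrs, rib, origin_up=True)          # converge to the steady state
hunt = run(nbrs, rib, origin_up=False)  # RT01 withdraws 198.51.100.1/32
for round_no, path in enumerate(hunt, 1):
    print(f"round {round_no}: RT05 best AS path = {path}")
```

In this model, RT05's best path grows from two to three to four ASes before the route disappears entirely. If each round had to wait for an MRAI interval, every extra step of the hunt would add that interval to the convergence time.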
Let’s summarize what we learned in this post:
- BGP is a path vector protocol that is unaware of any topology.
- During convergence, BGP will hunt for another path.
- BGP happily advertises these paths to peers even though the path is through the peer itself.
- The paths keep growing in length until BGP is done hunting.
- The hunting may or may not produce an alternate path.
- With a non-zero MRAI, it takes a long time to converge while BGP is hunting.
- With MRAI set to zero, BGP can converge quickly.
- The optimal MRAI value depends on the scenario, platform, and OS.
I’ll see you in the next blog post!