This post is inspired by a discussion on Twitter with Ivan Pepelnjak and
Nicolas Michel. Nicolas asked what happens when the same route is learned from two
different OSPF processes. Which one will be selected? Ivan explained how
to use the distance command. Before I show how it works and why, we
need to get a few basic concepts explained.
LSDB – Link State Database – All OSPF LSAs populate the LSDB
RIB – Routing Information Base – The best routes from every protocol compete to get installed in the RIB
FIB – Forwarding Information Base – Routes are copied from the RIB and used for forwarding (CEF)
CEF – Cisco Express Forwarding – The algorithm that Cisco uses for forwarding (FIB)
Taking OSPF as an example, this is how a route gets selected for the global RIB.
The routers exchange LSAs with each other, and within an area every router has the same
view of the network. These LSAs populate the LSDB. If there are multiple paths to
a destination they compete with each other, unless they are of the same type and equal
cost. Intra-area routes are preferred first, then inter-area and finally external routes. There is no
way of modifying this behaviour. The best route then goes to the OSPF RIB; there could be several
if they are equal. From there the route competes with routes from other routing protocols, and the
AD (administrative distance) decides which one is installed. If the OSPF route is best, it goes to the global
RIB. Finally, the RIB populates the FIB with this information and forwarding can ensue.
This is a picture I made that describes the process.
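The selection steps described above can also be sketched in a few lines of Python. This is only a simplified model, not how IOS is implemented: the class and function names are made up for illustration, the prefix and next hops are documentation addresses, external routes are treated as a single type (no E1/E2 distinction), and RIP with AD 120 is used as an example competing protocol.

```python
from dataclasses import dataclass

# Preference order of OSPF path types as described above (lower is better).
PATH_TYPE_RANK = {"intra": 0, "inter": 1, "external": 2}


@dataclass(frozen=True)
class OspfPath:            # hypothetical name, for illustration only
    prefix: str
    path_type: str         # "intra", "inter" or "external"
    cost: int
    next_hop: str


@dataclass(frozen=True)
class RibCandidate:        # hypothetical name, for illustration only
    prefix: str
    protocol: str
    admin_distance: int
    next_hops: tuple


def ospf_best_paths(paths):
    """Pick the best path(s) for one prefix inside a single OSPF process."""
    best_key = min((PATH_TYPE_RANK[p.path_type], p.cost) for p in paths)
    # Several paths survive only if they share the best type and the same cost.
    return [p for p in paths
            if (PATH_TYPE_RANK[p.path_type], p.cost) == best_key]


def global_rib_winner(candidates):
    """Across protocols, the lowest administrative distance is installed."""
    return min(candidates, key=lambda c: c.admin_distance)


def build_fib(rib):
    """The FIB is populated from whatever the global RIB holds."""
    return {route.prefix: route.next_hops for route in rib}


paths = [
    OspfPath("10.0.0.0/24", "inter", 5, "192.0.2.1"),
    OspfPath("10.0.0.0/24", "intra", 20, "192.0.2.2"),
]
best = ospf_best_paths(paths)  # intra-area wins despite the higher cost
ospf = RibCandidate("10.0.0.0/24", "ospf", 110,
                    tuple(p.next_hop for p in best))
rip = RibCandidate("10.0.0.0/24", "rip", 120, ("192.0.2.3",))
fib = build_fib([global_rib_winner([ospf, rip])])
print(fib)  # {'10.0.0.0/24': ('192.0.2.2',)}
```

Note that the model has no rule for two candidates with the same AD, which is exactly the case the rest of this post looks at.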
We start out with a very basic topology looking like this.
R1 and R3 will announce the same network 1.1.1.1/32. R2 will use two different OSPF processes.
We start out with the basic configuration:
R1
R1(config)#int f1/0
R1(config-if)#ip add 12.12.12.1 255.255.255.0
R1(config-if)#no sh
R1(config-if)#ip ospf 1 area 0
R1(config-if)#int lo0
R1(config-if)#ip add 1.1.1.1 255.255.255.255
R1(config-if)#ip ospf 1 area 0
R2
R2(config)#int f1/0
R2(config-if)#ip add 12.12.12.2 255.255.255.0
R2(config-if)#no sh
R2(config-if)#ip ospf 1 area 0
R2(config-if)#int f1/1
R2(config-if)#ip add 23.23.23.2 255.255.255.0
R2(config-if)#no sh
R2(config-if)#ip ospf 3 area 0
%OSPF-5-ADJCHG: Process 1, Nbr 12.12.12.1 on FastEthernet1/0 from LOADING to FULL, Loading Done

We see the adjacency coming up immediately. Now let's bring up R3 as well.
R3
R3(config)#int f1/0
R3(config-if)#ip add 23.23.23.3 255.255.255.0
R3(config-if)#no sh
R3(config-if)#ip ospf 3 area 0
R3(config-if)#int lo0
R3(config-if)#ip add 1.1.1.1 255.255.255.255
R3(config-if)#ip ospf 3 area 0
%OSPF-5-ADJCHG: Process 3, Nbr 23.23.23.2 on FastEthernet1/0 from LOADING to FULL, Loading Done

Both OSPF adjacencies are up. Now let's follow the steps that were shown in
the picture above, starting by looking at the database.
R2#sh ip ospf data router 12.12.12.1
OSPF Router with ID (23.23.23.2) (Process ID 3)
OSPF Router with ID (12.12.12.2) (Process ID 1)
Router Link States (Area 0)
  LS age: 184
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 12.12.12.1
  Advertising Router: 12.12.12.1
  LS Seq Number: 80000003
  Checksum: 0xF78
  Length: 48
  Number of Links: 2
    Link connected to: a Stub Network
     (Link ID) Network/subnet number: 1.1.1.1
     (Link Data) Network Mask: 255.255.255.255
      Number of MTID metrics: 0
       TOS 0 Metrics: 1
    Link connected to: a Transit Network
     (Link ID) Designated Router address: 12.12.12.1
     (Link Data) Router Interface address: 12.12.12.1
      Number of MTID metrics: 0
       TOS 0 Metrics: 1
We see that R1 is announcing 1.1.1.1/32 as a stub network with a metric of 1, which
gives R2 a total metric of 2 to it once the cost of 1 for its own link to R1 is added.
Do we see R3 announcing that as well?
R2#sh ip ospf data router 23.23.23.3
OSPF Router with ID (23.23.23.2) (Process ID 3)
Router Link States (Area 0)
  LS age: 148
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 23.23.23.3
  Advertising Router: 23.23.23.3
  LS Seq Number: 80000003
  Checksum: 0x54A7
  Length: 48
  Number of Links: 2
    Link connected to: a Stub Network
     (Link ID) Network/subnet number: 1.1.1.1
     (Link Data) Network Mask: 255.255.255.255
      Number of MTID metrics: 0
       TOS 0 Metrics: 1
    Link connected to: a Transit Network
     (Link ID) Designated Router address: 23.23.23.2
     (Link Data) Router Interface address: 23.23.23.3
      Number of MTID metrics: 0
       TOS 0 Metrics: 1
Yes, it’s there. Now we take a look at the OSPF RIB. Which ones do we see there?
R2#sh ip ospf rib
OSPF Router with ID (23.23.23.2) (Process ID 3)
Base Topology (MTID 0)
OSPF local RIB
Codes: * - Best, > - Installed in global RIB
*   1.1.1.1/32, Intra, cost 2, area 0
      via 23.23.23.3, FastEthernet1/1
*   23.23.23.0/24, Intra, cost 1, area 0, Connected
      via 23.23.23.2, FastEthernet1/1
OSPF Router with ID (12.12.12.2) (Process ID 1)
Base Topology (MTID 0)
OSPF local RIB
Codes: * - Best, > - Installed in global RIB
*>  1.1.1.1/32, Intra, cost 2, area 0
      via 12.12.12.1, FastEthernet1/0
*   12.12.12.0/24, Intra, cost 1, area 0, Connected
      via 12.12.12.2, FastEthernet1/0
The greater-than sign indicates that the route from OSPF process 1 was selected.
Why? When running multiple OSPF processes, the expected behaviour is that the process
that installs its route first keeps it in the global RIB. Now we confirm by looking in the
global RIB.
R2# show ip route ospf
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       + - replicated route, % - next hop override
Gateway of last resort is not set
      1.0.0.0/32 is subnetted, 1 subnets
O        1.1.1.1 [110/2] via 12.12.12.1, 00:06:35, FastEthernet1/0

Yes, that looks correct. The final step is to verify that the FIB is also updated.
R2#sh ip cef 1.1.1.1/32
1.1.1.1/32
  nexthop 12.12.12.1 FastEthernet1/0

So it looks like the one that first writes to the global RIB wins. Now let's bring down the
process that is currently winning.
R2(config)#int f1/0
R2(config-if)#sh
R2(config-if)#
The OSPF RIB and global RIB should now be updated.
R2#show ip ospf rib
OSPF Router with ID (23.23.23.2) (Process ID 3)
Base Topology (MTID 0)
OSPF local RIB
Codes: * - Best, > - Installed in global RIB
*>  1.1.1.1/32, Intra, cost 2, area 0
      via 23.23.23.3, FastEthernet1/1
*   23.23.23.0/24, Intra, cost 1, area 0, Connected
      via 23.23.23.2, FastEthernet1/1
OSPF Router with ID (12.12.12.2) (Process ID 1)
Base Topology (MTID 0)
OSPF local RIB
Codes: * - Best, > - Installed in global RIB

R2#show ip route ospf
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       + - replicated route, % - next hop override
Gateway of last resort is not set
      1.0.0.0/32 is subnetted, 1 subnets
O        1.1.1.1 [110/2] via 23.23.23.3, 00:00:42, FastEthernet1/1
Now if we bring back OSPF process 1, what will happen? Process 3 should still be
winning since it installed to global RIB first.
R2(config)#int f1/0
R2(config-if)#no sh

R2#sh ip ospf rib
OSPF Router with ID (2.2.2.2) (Process ID 11)
Base Topology (MTID 0)
OSPF local RIB
Codes: * - Best, > - Installed in global RIB
OSPF Router with ID (23.23.23.2) (Process ID 3)
Base Topology (MTID 0)
OSPF local RIB
Codes: * - Best, > - Installed in global RIB
*   1.1.1.1/32, Intra, cost 2, area 0
      via 23.23.23.3, FastEthernet1/1
*   23.23.23.0/24, Intra, cost 1, area 0, Connected
      via 23.23.23.2, FastEthernet1/1
OSPF Router with ID (12.12.12.2) (Process ID 1)
Base Topology (MTID 0)
OSPF local RIB
Codes: * - Best, > - Installed in global RIB
*>  1.1.1.1/32, Intra, cost 2, area 0
      via 12.12.12.1, FastEthernet1/0
*   12.12.12.0/24, Intra, cost 1, area 0, Connected
      via 12.12.12.2, FastEthernet1/0

Now process 1 is winning, which is odd. Let's use debug ip routing to see what is
really happening. We shut down the interface in process 1.
*Mar 14 23:26:36.555: RT: del 1.1.1.1 via 12.12.12.1, ospf metric [110/2]
*Mar 14 23:26:36.559: RT: delete subnet route to 1.1.1.1/32
*Mar 14 23:26:36.579: RT: updating ospf 1.1.1.1/32 (0x0): via 23.23.23.3 Fa1/1
*Mar 14 23:26:36.583: RT: add 1.1.1.1/32 via 23.23.23.3, ospf metric [110/2]
Now we bring back process 1.
*Mar 14 23:29:04.163: RT: updating ospf 1.1.1.1/32 (0x0): via 12.12.12.1 Fa1/0
*Mar 14 23:29:04.171: RT: closer admin distance for 1.1.1.1, flushing 1 routes
*Mar 14 23:29:04.175: RT: add 1.1.1.1/32 via 12.12.12.1, ospf metric [110/2]
We can see that IOS claims the admin distance is closer, which is clearly not the case:
both routes are added as [110/2].
What happens if we change process 1 to process 11 and shut down the interface
in process 3?
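To make the comparison explicit, here is a small Python sketch that pulls the [distance/metric] pair out of the two "RT: add" lines from the debug output above. The regular expression and the parse_add helper are my own and only cover the line format seen in this post.

```python
import re

# Matches an "RT: add" debug line and captures the [distance/metric] pair.
ADD_LINE = re.compile(
    r"RT: add (?P<prefix>\S+) via (?P<next_hop>[\d.]+), "
    r"(?P<source>\w+) metric \[(?P<ad>\d+)/(?P<metric>\d+)\]"
)


def parse_add(line):
    """Return the fields of an 'RT: add' line, or None if it does not match."""
    m = ADD_LINE.search(line)
    if m is None:
        return None
    return {
        "prefix": m["prefix"],
        "next_hop": m["next_hop"],
        "source": m["source"],
        "ad": int(m["ad"]),
        "metric": int(m["metric"]),
    }


lines = [
    "*Mar 14 23:26:36.583: RT: add 1.1.1.1/32 via 23.23.23.3, ospf metric [110/2]",
    "*Mar 14 23:29:04.175: RT: add 1.1.1.1/32 via 12.12.12.1, ospf metric [110/2]",
]
old, new = (parse_add(line) for line in lines)

# Same distance and same metric, yet the newer route replaced the older one.
print(old["ad"] == new["ad"], old["metric"] == new["metric"])  # True True
```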
R2(config)#int f1/1
R2(config-if)#sh
R2(config-if)#int f1/0
R2(config-if)#ip ospf 11 area 0
Now we look at the output from the debug.
*Mar 14 23:33:27.615: RT: updating ospf 1.1.1.1/32 (0x0): via 12.12.12.1 Fa1/0
*Mar 14 23:33:27.619: RT: add 1.1.1.1/32 via 12.12.12.1, ospf metric [110/2]
*Mar 14 23:33:39.927: RT: updating connected 23.23.23.0/24 (0x0): via 0.0.0.0 Fa1/1
*Mar 14 23:33:39.931: RT: add 23.23.23.0/24 via 0.0.0.0, connected metric [0/0]
*Mar 14 23:33:39.939: RT: interface FastEthernet1/1 added to routing table
*Mar 14 23:33:39.947: RT: updating connected 23.23.23.2/32 (0x0): via 0.0.0.0 Fa1/1
*Mar 14 23:33:39.951: RT: network 23.0.0.0 is now variably masked
*Mar 14 23:33:39.951: RT: add 23.23.23.2/32 via 0.0.0.0, connected metric [0/0]
*Mar 14 23:33:55.447: RT: updating ospf 1.1.1.1/32 (0x0): via 23.23.23.3 Fa1/1
*Mar 14 23:33:55.455: RT: closer admin distance for 1.1.1.1, flushing 1 routes
*Mar 14 23:33:55.455: RT: add 1.1.1.1/32 via 23.23.23.3, ospf metric [110/2]
We can see that at first process 11 is the only option available, so the 1.1.1.1/32
route is installed via f1/0. Then f1/1 comes back up, 1.1.1.1/32 becomes reachable
via f1/1 as well, and that route is chosen because of “closer admin distance”, which is not true
since both routes are [110/2]. This must mean that the OSPF process number is the tiebreaker.
We take a look at the OSPF RIB and global RIB to verify once more.
R2#sh ip ospf rib
OSPF Router with ID (22.22.22.22) (Process ID 11)
Base Topology (MTID 0)
OSPF local RIB
Codes: * - Best, > - Installed in global RIB
*   1.1.1.1/32, Intra, cost 2, area 0
      via 12.12.12.1, FastEthernet1/0
*   12.12.12.0/24, Intra, cost 1, area 0, Connected
      via 12.12.12.2, FastEthernet1/0
OSPF Router with ID (23.23.23.2) (Process ID 3)
Base Topology (MTID 0)
OSPF local RIB
Codes: * - Best, > - Installed in global RIB
*>  1.1.1.1/32, Intra, cost 2, area 0
      via 23.23.23.3, FastEthernet1/1
*   23.23.23.0/24, Intra, cost 1, area 0, Connected
      via 23.23.23.2, FastEthernet1/1
OSPF Router with ID (12.12.12.2) (Process ID 1)
Base Topology (MTID 0)
OSPF local RIB
Codes: * - Best, > - Installed in global RIB

R2#sh ip route ospf
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       + - replicated route, % - next hop override
Gateway of last resort is not set
      1.0.0.0/32 is subnetted, 1 subnets
O        1.1.1.1 [110/2] via 23.23.23.3, 00:09:02, FastEthernet1/1
What if we change the AD of process 11?
R2(config)#router ospf 11
R2(config-router)#distance ospf intra-area 100

*Mar 14 23:43:31.315: RT: updating ospf 1.1.1.1/32 (0x0): via 12.12.12.1 Fa1/0
*Mar 14 23:43:31.319: RT: closer admin distance for 1.1.1.1, flushing 1 routes
*Mar 14 23:43:31.323: RT: add 1.1.1.1/32 via 12.12.12.1, ospf metric [100/2]
That makes process 11 win again, this time because its AD of 100 really is lower than 110.
So these tests seem to indicate that if everything else is the same, the tiebreaker is the
lowest process number. For EIGRP it is the lowest AS number, so maybe Cisco chose to make it comparable.
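The behaviour seen in these tests can be summarised as a sort key: lowest AD first, then lowest process number. The Python sketch below only encodes that hypothesis so it can be checked against the three outcomes observed on R2. It says nothing about how IOS actually implements the decision, and the ProcessRoute and observed_winner names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRoute:        # hypothetical name, for illustration only
    process_id: int
    admin_distance: int
    metric: int
    next_hop: str


def observed_winner(routes):
    """Hypothesis from the tests: lowest AD, then lowest process number."""
    return min(routes, key=lambda r: (r.admin_distance, r.process_id))


p1 = ProcessRoute(1, 110, 2, "12.12.12.1")
p3 = ProcessRoute(3, 110, 2, "23.23.23.3")
p11 = ProcessRoute(11, 110, 2, "12.12.12.1")
p11_ad100 = ProcessRoute(11, 100, 2, "12.12.12.1")

# Process 1 took over from process 3 even though 3 was installed first.
print(observed_winner([p3, p1]).process_id)         # 1
# Process 3 took over from process 11 with identical AD and metric.
print(observed_winner([p11, p3]).process_id)        # 3
# Lowering the AD of process 11 to 100 made it win again.
print(observed_winner([p3, p11_ad100]).process_id)  # 11
```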
Also take a look at what Ivan is saying at IOS hints
I was under the impression that there was just one RIB. This was confirmed by this post: http://www.dasblinkenlichten.com/?p=2249 Now I see “OSPF local RIB” in the output of sh ip ospf rib.
I know it’s just semantics after all, but I admit I’m confused. 🙂
I’ve looked at it as a RIB and that’s how it was described to me but I could be wrong. I’ll have to look into it some more. I’ll get back to you.
Actually, “show ip ospf rib” shows the routes submitted to the RIB process by OSPF. The RIB process is a complex system. The best-known part of the system is a “cache” that stores routing information, and it is called the routing table. Other parts include the route selection process, APIs that interface with other processes like OSPF and BGP, the redistribution process, and data structures like prefix, path and interface descriptors.
The main functions of the RIB system are to interface with the different routing information sources, receive the information, select the “best” one, store it in the routing table cache, feed the forwarding process’ cache (FIB) and maintain/verify the routing table entries with both event-driven and periodic algorithms.
Since in the above scenario there are two OSPF processes, each will interface with the RIB process using different APIs and thus they will be treated as different routing information sources by the RIB. It is the “route selection” process, part of the RIB process, that will determine which one to use. Most often it will determine this by using the AD value submitted by the routing information source (the routing protocol, in this case).
Hi Daniel,
Very nice experiment.
I’m not sure which IOS version you used, but I think you have run into a “bug” as mentioned in this site:
http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080531fd2.shtml
“Before Cisco bug ID CSCdw10987 (registered customers only) (integrated in Cisco IOS Software Releases 12.2(07.04)S, 12.2(07.04)T, and later), the last process to make an shortest path first algorithm (SPF) would have won, and the two processes overwrite other routes in the routing table.”
I think you were shutting down the interface for the “lower” OSPF process every time, making it the last process to run SPF when you bring the interface back up, so it overwrites the other process’ entries.
I have recreated your scenario using 12.4(15)T, and the result is that the first OSPF process to install the entry always wins, in accordance with the Cisco page above.
I tested it on 15.2 something. I’ll try to recreate it but configure the process with the higher ID first.
What happens when you make the lower process submit an internal route with a high metric, and the second, higher process submit an E2 route with a lower metric? 🙂
Even in this scenario, the first route to enter will still win the RIB route selection process. The RIB will treat the two different OSPF processes as if they were different routing protocol sources.
Please check chapter 4, on routing table maintenance, of Alex Zinin’s book:
” c. If the AD values are equal but the routes are from different sources, the behavior depends on whether the routes are submitted by processes of the same type. If they are, the metrics of the routes are compared as if the routes were from the same protocol. If metrics are the same, the route from the process with the lower process number wins. If the routes are from different sources, the new route is ignored. (Note that this check is implemented for EIGRP routes only. All other routes are treated as from different protocols, so the old route is left in the routing table.) ”
I wonder what the reason behind the EIGRP exception is. Maybe Cisco’s designers wanted to be unique for their own indigenous protocol?
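Read literally, the quoted rule for equal AD values could be sketched as below. This is only my reading of the quoted paragraph, with made-up names (Route, equal_ad_decision), and it is not meant to describe what any particular IOS release does. The post above shows a newer IOS behaving differently for OSPF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:               # hypothetical name, for illustration only
    protocol: str          # e.g. "eigrp" or "ospf"
    process_id: int        # AS number for EIGRP, process ID for OSPF
    admin_distance: int
    metric: int


def equal_ad_decision(old, new):
    """Return the route kept in the table when both have the same AD."""
    if old.admin_distance != new.admin_distance:
        raise ValueError("this rule only covers equal AD values")
    # Per the quote, the same-type check is implemented for EIGRP only.
    if old.protocol == new.protocol == "eigrp":
        if new.metric != old.metric:
            return new if new.metric < old.metric else old
        return new if new.process_id < old.process_id else old
    # Everything else is treated as different protocols: the old route stays.
    return old


eigrp_20 = Route("eigrp", 20, 90, 3072)
eigrp_10 = Route("eigrp", 10, 90, 3072)
ospf_3 = Route("ospf", 3, 110, 2)
ospf_1 = Route("ospf", 1, 110, 2)

print(equal_ad_decision(eigrp_20, eigrp_10).process_id)  # 10
print(equal_ad_decision(ospf_3, ospf_1).process_id)      # 3
```

The second result is the "oldest route wins" behaviour described in the comments, which is the opposite of what the debug output in the post shows on 15.2.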
I’ve just confirmed with 12.2 SRE (33) and I don’t see that. I get the same result as Daniel in that whatever process number on R1 is the lowest, will win. It doesn’t matter if I increase the metric to either of them, the lowest process still wins.
The only time the lowest process doesn’t win, is when the lowest process number has an external route and the higher process number has an internal route, in that case the higher process wins.
But if they both submit type5s or type1s, then the route from the lowest process will win, and even take over the route from the other.
At least in the version I tested.
Try to use higher IOS versions. I tested using 12.4(15)T and the oldest route always won. I didn’t test using internal/external routes, but I don’t think the route type can or should make any difference. In Daniel’s test, he always shut down the interface with the lower process number, making it the last one to run SPF when he brought the interface back up.
Alternatively, try to shut down the interfaces of both processes consecutively. If both processes can overwrite each other’s routing table entry, then the reported bug exists.
15.2 and 12.2(33)SRE are both later versions than 12.4(15)T and show the ‘lowest process number wins’ behaviour.
Hi Daniel,
I’m using the version (Cisco IOS Software, 3600 Software (C3660-IS-M), Version 12.3(11)T2, RELEASE SOFTWARE (fc1)) in GNS3. When I try to use the command “show ip ospf rib” it says the command is invalid.
Please help me with this.
That’s a very old IOS. You should be using at least 12.4T. Try to use show ip ospf route instead.
Hi Daniel,
I tried with both IOS versions (12.3 and 12.4), but neither supports the commands “sh ip ospf route” and “sh ip ospf rib”.
Thanks Daniel. Let me download and try it.
Great blog Daniel. I originally thought it would have used the cost and was expecting loops but your blog & Ivan’s cleared it up.
One interesting point: I have a GNS3 lab set up with a CSR1k running IOS 15.4(2)s. It would seem that Cisco has changed the way OSPF prefers paths. OSPF now uses the lowest process ID as a tiebreaker, like EIGRP.
I’ve experienced the same behaviour on VIRL with IOSv (version 15.6). I was trying to replicate the behaviour as seen in a video and noticed the difference.
In VIRL at least the router (now) prefers the lowest process ID. I wonder if there’s any documentation to be found on this subject. So far I haven’t been able to find any.
So it would seem that this is another possible means of stopping loops from forming, although it would have almost zero use in production environments, as it would only stop loops from forming one way. I did test this by changing the process ID.