In a previous post, I walked through how a packet gets bridged in a VXLAN/EVPN network. In this post, I’ll go through how a packet gets routed, that is, how it gets from one VNI to another. The following topology will be used:
The lab has the following characteristics:
- OSPF in the underlay.
- Ingress replication for BUM traffic through the use of EVPN.
- ARP suppression is enabled.
Server-2 initiates a ping towards Server-4:
Frame 562: 98 bytes on wire (784 bits), 98 bytes captured (784 bits) on interface ens257, id 4
Ethernet II, Src: 00:50:56:ad:f4:8d, Dst: 00:01:00:01:00:01
Internet Protocol Version 4, Src: 10.0.0.22, Dst: 198.51.100.44
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0xd745 [correct]
    [Checksum Status: Good]
    Identifier (BE): 17 (0x0011)
    Identifier (LE): 4352 (0x1100)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    [Response frame: 563]
    Timestamp from icmp data: Mar 3, 2024 08:38:35.804470000 Romance Standard Time
    [Timestamp from icmp data (relative): 0.000701509 seconds]
    Data (40 bytes)
The destination MAC is 0001.0001.0001, which is the Anycast GW MAC configured on Leaf-2. As this MAC is used on the SVI for VLAN 20 on Leaf-2, the packet needs to be routed. Let’s check the routing table:
Leaf2# show ip route 198.51.100.44/24 longer-prefixes vrf Tenant1
IP Route Table for VRF "Tenant1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
198.51.100.0/24, ubest/mbest: 2/0
    *via 203.0.113.1%default, [200/0], 5w6d, bgp-65000, internal, tag 65000, segid: 10001 tunnelid: 0xcb007101 encap: VXLAN
    *via 203.0.113.4%default, [200/0], 5w6d, bgp-65000, internal, tag 65000, segid: 10001 tunnelid: 0xcb007104 encap: VXLAN
198.51.100.11/32, ubest/mbest: 1/0
    *via 203.0.113.1%default, [200/0], 5w6d, bgp-65000, internal, tag 65000, segid: 10001 tunnelid: 0xcb007101 encap: VXLAN
198.51.100.44/32, ubest/mbest: 1/0
    *via 203.0.113.4%default, [200/0], 5w6d, bgp-65000, internal, tag 65000, segid: 10001 tunnelid: 0xcb007104 encap: VXLAN
Notice that there are two paths for 198.51.100.0/24, which have been learned via RT5, and a more specific route, 198.51.100.44/32, which has been learned via RT2:
Leaf2# show bgp l2vpn evpn 198.51.100.44
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.0.2.6:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.56ad.7d68]:[32]:[198.51.100.44]/272, version 6444
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
  Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    203.0.113.4 (metric 81) from 192.0.2.12 (192.0.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10001
      Extcommunity: RT:65000:10000 RT:65000:10001 ENCAP:8 Router MAC:00ad.7083.1b08
      Originator: 192.0.2.6 Cluster list: 192.0.2.2
  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported to 3 destination(s)
             Imported paths list: Tenant1 L3-10001 L2-10000
  AS-Path: NONE, path sourced internal to AS
    203.0.113.4 (metric 81) from 192.0.2.11 (192.0.2.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10001
      Extcommunity: RT:65000:10000 RT:65000:10001 ENCAP:8 Router MAC:00ad.7083.1b08
      Originator: 192.0.2.6 Cluster list: 192.0.2.1
  Path-id 1 not advertised to any peer
Notice where it says imported paths list and that it has been imported to Tenant1, which is the VRF that is associated with the L3 VNI 10001. It has also been imported into L2 VNI 10000. Also note the Router MAC of 00ad.7083.1b08.
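To make the RT2 entry easier to read, here is a minimal Python sketch that splits the bracketed route key from the output above into named fields. The field order (route type, ESI, Ethernet tag, MAC length, MAC, IP length, IP) is my reading of how NX-OS prints the key, and the `EvpnType2Key` class and `parse_type2_key` function are hypothetical names used only for this illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class EvpnType2Key:
    # Field names are illustrative; the order is assumed from the NX-OS display.
    route_type: int
    esi: str
    ethernet_tag: int
    mac_length: int
    mac: str
    ip_length: int
    ip: str
    key_length: int


def parse_type2_key(text: str) -> EvpnType2Key:
    """Split a '[2]:[..]:...:[ip]/len' string into its bracketed fields."""
    fields = re.findall(r"\[([^\]]*)\]", text)
    if len(fields) != 7 or fields[0] != "2":
        raise ValueError(f"not a type-2 route key: {text!r}")
    # The number after the final '/' is the length NX-OS prints for the key.
    key_length = int(text.rsplit("/", 1)[1])
    return EvpnType2Key(
        route_type=int(fields[0]),
        esi=fields[1],
        ethernet_tag=int(fields[2]),
        mac_length=int(fields[3]),
        mac=fields[4],
        ip_length=int(fields[5]),
        ip=fields[6],
        key_length=key_length,
    )


key = parse_type2_key("[2]:[0]:[0]:[48]:[0050.56ad.7d68]:[32]:[198.51.100.44]/272")
print(key.mac, key.ip)
```

The MAC and IP in the key are Server-4's, which is why this single route can feed both the L2 VNI and the VRF.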
The next-hop for this route is 203.0.113.4 so a lookup is done to find what paths are available:
Leaf2# show ip route 203.0.113.4
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
203.0.113.4/32, ubest/mbest: 2/0
    *via 192.0.2.1, Eth1/1, [110/81], 7w5d, ospf-UNDERLAY, intra
    *via 192.0.2.2, Eth1/2, [110/81], 7w5d, ospf-UNDERLAY, intra
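With two equal-cost paths in the underlay, the leaf has to pick one per flow. The sketch below only illustrates the general idea of hashing the outer header fields and taking the result modulo the number of next hops. The real hash is platform-specific and not shown in this post, so `zlib.crc32` is a stand-in, and `pick_next_hop` is a hypothetical helper. The UDP source port 54276 is the value from this lab's capture.

```python
import zlib


def pick_next_hop(flow, next_hops):
    """Map a flow tuple to one of the equal-cost next hops.

    CRC32 is used here only as a deterministic stand-in for the
    hardware hash, which is not documented in this post.
    """
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# Outer header of the encapsulated packet: src IP, dst IP, protocol, src port, dst port.
flow = ("203.0.113.2", "203.0.113.4", 17, 54276, 4789)
next_hops = ["192.0.2.1 via Eth1/1", "192.0.2.2 via Eth1/2"]

print(pick_next_hop(flow, next_hops))
```

Because the outer IP addresses and destination port are the same for all traffic between two VTEPs, the UDP source port is the field that lets different inner flows land on different paths.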
Leaf-2 encapsulates the packet and forwards it towards Spine-1:
Frame 564: 148 bytes on wire (1184 bits), 148 bytes captured (1184 bits) on interface ens225, id 5
Ethernet II, Src: 00:ad:f3:bb:1b:08, Dst: 00:ad:7b:30:1b:08
Internet Protocol Version 4, Src: 203.0.113.2, Dst: 203.0.113.4
User Datagram Protocol, Src Port: 54276, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10001
    Reserved: 0
Ethernet II, Src: 00:ad:f3:bb:1b:08, Dst: 00:ad:70:83:1b:08
Internet Protocol Version 4, Src: 10.0.0.22, Dst: 198.51.100.44
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0xd745 [correct]
    [Checksum Status: Good]
    Identifier (BE): 17 (0x0011)
    Identifier (LE): 4352 (0x1100)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    [No response seen]
    Timestamp from icmp data: Mar 3, 2024 08:38:35.804470000 Romance Standard Time
    [Timestamp from icmp data (relative): 0.001872153 seconds]
    Data (40 bytes)
Notice that L3 VNI 10001 is used, as this is symmetric IRB. Also notice that the inner destination MAC is set to 00ad.7083.1b08, which is the Router MAC of Leaf-4:
Leaf4# show int vlan 100 | i Ether
  Hardware is EtherSVI, address is 00ad.7083.1b08
Refer to the post on Advertising IPs in RT2 for more details on the Router MAC.
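The capture above grew from 98 to 148 bytes after encapsulation. A small sketch of where those 50 bytes come from, and of what the 8-byte VXLAN header looks like for VNI 10001, is shown below. It assumes an untagged outer Ethernet header and an outer IPv4 header without options, which matches the sizes seen in this lab. The function names are hypothetical.

```python
import struct

# Outer headers added by the ingress VTEP (assumed: no 802.1Q tag, no IP options).
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags, reserved bits, 24-bit VNI, reserved byte."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # I flag: the VNI field is valid
    # First word: flags in the top byte. Second word: VNI in the top 24 bits.
    return struct.pack("!II", flags << 24, vni << 8)


def encapsulated_size(inner_frame_len: int) -> int:
    """Size on the wire once the inner Ethernet frame is wrapped in VXLAN."""
    return inner_frame_len + OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


print(vxlan_header(10001).hex())   # 0800000000271100
print(encapsulated_size(98))       # 148
```

The first two bytes, 08 00, are what Wireshark displays as Flags: 0x0800, and 0x002711 is 10001 in decimal.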
Spine-1 forwards the packet towards Leaf-4:
Frame 565: 148 bytes on wire (1184 bits), 148 bytes captured (1184 bits) on interface ens162, id 7
Ethernet II, Src: 00:ad:7b:30:1b:08, Dst: 00:ad:70:83:1b:08
Internet Protocol Version 4, Src: 203.0.113.2, Dst: 203.0.113.4
User Datagram Protocol, Src Port: 54276, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10001
    Reserved: 0
Ethernet II, Src: 00:ad:f3:bb:1b:08, Dst: 00:ad:70:83:1b:08
Internet Protocol Version 4, Src: 10.0.0.22, Dst: 198.51.100.44
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0xd745 [correct]
    [Checksum Status: Good]
    Identifier (BE): 17 (0x0011)
    Identifier (LE): 4352 (0x1100)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    [No response seen]
    Timestamp from icmp data: Mar 3, 2024 08:38:35.804470000 Romance Standard Time
    [Timestamp from icmp data (relative): 0.002762421 seconds]
    Data (40 bytes)
Leaf-4 decapsulates the packet. As the inner destination MAC is set to 00ad.7083.1b08, which is its own Router MAC, Leaf-4 will route the packet. Let’s check for 198.51.100.44 in the routing table:
Leaf4# show ip route 198.51.100.44/24 longer-prefixes vrf Tenant1
IP Route Table for VRF "Tenant1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
198.51.100.0/24, ubest/mbest: 1/0, attached
    *via 198.51.100.1, Vlan10, [0/0], 7w6d, direct
198.51.100.11/32, ubest/mbest: 1/0
    *via 203.0.113.1%default, [200/0], 7w4d, bgp-65000, internal, tag 65000, segid: 10001 tunnelid: 0xcb007101 encap: VXLAN
There is a route for 198.51.100.0/24 and this route is attached. This means that Leaf-4 will now check the ARP cache to get the MAC of Server-4:
Leaf4# show ip arp vrf Tenant1
Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface
IP ARP Table for context Tenant1
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
198.51.100.44   00:02:53  0050.56ad.7d68  Vlan10
Leaf-4 does an L2 lookup to find where to forward the packet:
Leaf4# show mac address-table address 0050.56ad.7d68 vni 10000
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan,
        (NA)- Not Applicable
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*   10     0050.56ad.7d68   dynamic  NA         F      F    Eth1/3
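The three lookups Leaf-4 just performed (route, ARP, MAC table) can be chained together in a short sketch. The table contents are copied from the Leaf-4 outputs above. The dictionary layout and the `longest_match` and `egress_lookup` helpers are my own illustration and not how the switch stores this state.

```python
import ipaddress

# Values copied from the Leaf-4 outputs; the data structures are illustrative.
ROUTES = {
    "198.51.100.0/24": {"kind": "attached", "interface": "Vlan10"},
    "198.51.100.11/32": {"kind": "vxlan", "next_hop": "203.0.113.1", "vni": 10001},
}
ARP = {"198.51.100.44": "0050.56ad.7d68"}
MAC_TABLE = {(10, "0050.56ad.7d68"): "Eth1/3"}


def longest_match(dst):
    """Return (prefix, route) for the most specific prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [p for p in ROUTES if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)
    return best, ROUTES[best]


def egress_lookup(dst):
    """Resolve dst to (SVI, destination MAC, egress port) for an attached route."""
    match = longest_match(dst)
    if match is None or match[1]["kind"] != "attached":
        raise LookupError(f"{dst} is not reachable via an attached route")
    svi = match[1]["interface"]
    mac = ARP[dst]                      # L3 adjacency: IP to MAC
    vlan = int(svi[len("Vlan"):])       # VLAN ID taken from the SVI name
    return svi, mac, MAC_TABLE[(vlan, mac)]  # L2 lookup: MAC to port


print(egress_lookup("198.51.100.44"))
```

A lookup for 198.51.100.11 would instead hit the /32 and go back out over VXLAN, which is why the sketch only resolves a local port for attached routes.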
Leaf-4 forwards the packet towards Server-4:
Frame 566: 98 bytes on wire (784 bits), 98 bytes captured (784 bits) on interface ens194, id 8
Ethernet II, Src: 00:ad:70:83:1b:08, Dst: 00:50:56:ad:7d:68
Internet Protocol Version 4, Src: 10.0.0.22, Dst: 198.51.100.44
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0xd745 [correct]
    [Checksum Status: Good]
    Identifier (BE): 17 (0x0011)
    Identifier (LE): 4352 (0x1100)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    [Response frame: 567]
    Timestamp from icmp data: Mar 3, 2024 08:38:35.804470000 Romance Standard Time
    [Timestamp from icmp data (relative): 0.003884583 seconds]
    Data (40 bytes)
This is shown visually below:
Server-4 responds to the ping from Server-2:
Frame 567: 98 bytes on wire (784 bits), 98 bytes captured (784 bits) on interface ens194, id 8
Ethernet II, Src: 00:50:56:ad:7d:68, Dst: 00:01:00:01:00:01
Internet Protocol Version 4, Src: 198.51.100.44, Dst: 10.0.0.22
Internet Control Message Protocol
    Type: 0 (Echo (ping) reply)
    Code: 0
    Checksum: 0xdf45 [correct]
    [Checksum Status: Good]
    Identifier (BE): 17 (0x0011)
    Identifier (LE): 4352 (0x1100)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    [Request frame: 566]
    [Response time: 0,188 ms]
    Timestamp from icmp data: Mar 3, 2024 08:38:35.804470000 Romance Standard Time
    [Timestamp from icmp data (relative): 0.004072854 seconds]
    Data (40 bytes)
The destination MAC is 0001.0001.0001, which is the Anycast GW MAC configured on Leaf-4. As this MAC is used on the SVI for VLAN 10 on Leaf-4, the packet needs to be routed. Let’s check the routing table:
Leaf4# show ip route 10.0.0.22/24 longer-prefixes vrf Tenant1
IP Route Table for VRF "Tenant1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
10.0.0.22/32, ubest/mbest: 1/0
    *via 203.0.113.2%default, [200/0], 7w3d, bgp-65000, internal, tag 65000, segid: 10001 tunnelid: 0xcb007102 encap: VXLAN
One thing to note is that there is only a single /32 route here. Why is there no /24? This is because I haven’t configured redistribute direct on Leaf-2, so it’s not originating an RT5. The RT2 is below:
Leaf4# show bgp l2vpn evpn 10.0.0.22
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.0.2.4:32787
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.56ad.f48d]:[32]:[10.0.0.22]/272, version 1638
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
  Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    203.0.113.2 (metric 81) from 192.0.2.12 (192.0.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10002 10001
      Extcommunity: RT:65000:10001 RT:65000:10002 ENCAP:8 Router MAC:00ad.f3bb.1b08
      Originator: 192.0.2.4 Cluster list: 192.0.2.2
  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported to 2 destination(s)
             Imported paths list: Tenant1 L3-10001
  AS-Path: NONE, path sourced internal to AS
    203.0.113.2 (metric 81) from 192.0.2.11 (192.0.2.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10002 10001
      Extcommunity: RT:65000:10001 RT:65000:10002 ENCAP:8 Router MAC:00ad.f3bb.1b08
      Originator: 192.0.2.4 Cluster list: 192.0.2.1
  Path-id 1 not advertised to any peer
Note the Router MAC of 00ad.f3bb.1b08. Another thing to note here is that the Imported paths list only has the L3 VNI 10001 and not the L2 VNI 10002. Why? This is because Leaf-4 does not have VNI 10002 configured. Earlier we saw that Leaf-2 imported into VNI 10000, as it has this VNI configured. Why would it also import routes into the L2 VNI? This is because it uses this information for the ARP suppression cache:
Leaf2# show ip arp suppression-cache remote
Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry
Ip Address       Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs
198.51.100.44    00:00:13 0050.56ad.7d68   10 (null)              R        203.0.113.4
198.51.100.11    00:00:13 0050.56ad.8506   10 (null)              R        203.0.113.1
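The import behaviour on the two leafs comes down to matching the route targets carried by the RT2 against what is configured locally. Here is a minimal sketch of that matching. It assumes each VNI imports a single route target of the form ASN:VNI, which is consistent with the Extcommunity values in the outputs but is not shown as configuration in this post. The `import_destinations` helper and the per-leaf dictionaries are hypothetical.

```python
def import_destinations(route_rts, local_import_rts):
    """Return the local VNIs whose import route target appears on the route."""
    return [name for name, rt in local_import_rts.items() if rt in route_rts]


# Assumed local import RTs, one per configured VNI (ASN:VNI form).
LEAF2 = {"L2-10000": "65000:10000", "L2-10002": "65000:10002", "L3-10001": "65000:10001"}
LEAF4 = {"L2-10000": "65000:10000", "L3-10001": "65000:10001"}

# Route targets copied from the two RT2 outputs.
rt2_server4 = {"65000:10000", "65000:10001"}   # 198.51.100.44, originated by Leaf-4
rt2_server2 = {"65000:10001", "65000:10002"}   # 10.0.0.22, originated by Leaf-2

print(import_destinations(rt2_server4, LEAF2))  # ['L2-10000', 'L3-10001']
print(import_destinations(rt2_server2, LEAF4))  # ['L3-10001']
```

The CLI additionally lists the VRF Tenant1 as a destination, since that is the VRF associated with L3 VNI 10001.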
Now back to the packet walk. Leaf-4 needs a route for 203.0.113.2:
Leaf4# show ip route 203.0.113.2
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
203.0.113.2/32, ubest/mbest: 2/0
    *via 192.0.2.1, Eth1/1, [110/81], 7w5d, ospf-UNDERLAY, intra
    *via 192.0.2.2, Eth1/2, [110/81], 7w5d, ospf-UNDERLAY, intra
Leaf-4 encapsulates the packet and sends it towards Spine-1:
Frame 547: 148 bytes on wire (1184 bits), 148 bytes captured (1184 bits) on interface ens161, id 0
Ethernet II, Src: 00:ad:70:83:1b:08, Dst: 00:ad:b3:fd:1b:08
Internet Protocol Version 4, Src: 203.0.113.4, Dst: 203.0.113.2
User Datagram Protocol, Src Port: 49708, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10001
    Reserved: 0
Ethernet II, Src: 00:ad:70:83:1b:08, Dst: 00:ad:f3:bb:1b:08
Internet Protocol Version 4, Src: 198.51.100.44, Dst: 10.0.0.22
Internet Control Message Protocol
    Type: 0 (Echo (ping) reply)
    Code: 0
    Checksum: 0xdf45 [correct]
    [Checksum Status: Good]
    Identifier (BE): 17 (0x0011)
    Identifier (LE): 4352 (0x1100)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    Timestamp from icmp data: Mar 3, 2024 08:38:35.804470000 Romance Standard Time
    [Timestamp from icmp data (relative): 0.004693453 seconds]
    Data (40 bytes)
Once again, notice that VNI 10001 is used which is the L3 VNI. Also note the inner destination MAC of 00ad.f3bb.1b08 which is the router MAC of Leaf-2:
Leaf2# show int vlan 100 | i Ether
  Hardware is EtherSVI, address is 00ad.f3bb.1b08
Spine-1 forwards the packet towards Leaf-2:
Frame 559: 148 bytes on wire (1184 bits), 148 bytes captured (1184 bits) on interface ens224, id 2
Ethernet II, Src: 00:ad:b3:fd:1b:08, Dst: 00:ad:f3:bb:1b:08
Internet Protocol Version 4, Src: 203.0.113.4, Dst: 203.0.113.2
User Datagram Protocol, Src Port: 49708, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10001
    Reserved: 0
Ethernet II, Src: 00:ad:70:83:1b:08, Dst: 00:ad:f3:bb:1b:08
Internet Protocol Version 4, Src: 198.51.100.44, Dst: 10.0.0.22
Internet Control Message Protocol
    Type: 0 (Echo (ping) reply)
    Code: 0
    Checksum: 0xdf45 [correct]
    [Checksum Status: Good]
    Identifier (BE): 17 (0x0011)
    Identifier (LE): 4352 (0x1100)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    Timestamp from icmp data: Mar 3, 2024 08:38:35.804470000 Romance Standard Time
    [Timestamp from icmp data (relative): 0.005738551 seconds]
    Data (40 bytes)
Leaf-2 decapsulates the packet. As the inner destination MAC is set to 00ad.f3bb.1b08, Leaf-2 will route the packet. Let’s check for 10.0.0.22 in the routing table:
Leaf2# show ip route 10.0.0.22/24 longer-prefixes vrf Tenant1
IP Route Table for VRF "Tenant1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
10.0.0.0/24, ubest/mbest: 1/0, attached
    *via 10.0.0.1, Vlan20, [0/0], 7w3d, direct
There is a route for 10.0.0.0/24 and it’s directly attached. Leaf-2 checks the ARP cache to get the MAC address of Server-2:
Leaf2# show ip arp vrf Tenant1
Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface
IP ARP Table for context Tenant1
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
10.0.0.22       00:01:26  0050.56ad.f48d  Vlan20
Leaf-2 does an L2 lookup to find where to forward the packet:
Leaf2# show mac address-table address 0050.56ad.f48d vni 10002
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan,
        (NA)- Not Applicable
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*   20     0050.56ad.f48d   dynamic  NA         F      F    Eth1/3
Leaf-2 forwards the packet to Server-2:
Frame 563: 98 bytes on wire (784 bits), 98 bytes captured (784 bits) on interface ens257, id 4
Ethernet II, Src: 00:ad:f3:bb:1b:08, Dst: 00:50:56:ad:f4:8d
Internet Protocol Version 4, Src: 198.51.100.44, Dst: 10.0.0.22
Internet Control Message Protocol
    Type: 0 (Echo (ping) reply)
    Code: 0
    Checksum: 0xdf45 [correct]
    [Checksum Status: Good]
    Identifier (BE): 17 (0x0011)
    Identifier (LE): 4352 (0x1100)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    [Request frame: 562]
    [Response time: 5,899 ms]
    Timestamp from icmp data: Mar 3, 2024 08:38:35.804470000 Romance Standard Time
    [Timestamp from icmp data (relative): 0.006600705 seconds]
    Data (40 bytes)
This is shown visually below:
The packet walk is complete! In this post we learned:
- How VXLAN packets get routed in a symmetric IRB topology.
- How the Router MAC is crucial for forwarding with symmetric IRB.
- How, with symmetric IRB, both the ingress and egress VTEPs perform L2 and L3 lookups.
- How the packet changes along the path to its destination.
- How to check forwarding tables to see where to forward the packet.
I hope you were able to follow along. See you in the next one!