There are two main methods that can be used to forward broadcast, unknown unicast, and multicast (BUM) frames in a VXLAN network:

  • Ingress replication.
  • Multicast in the underlay.

In this post, we take a detailed look at forwarding BUM frames by running multicast in the underlay. We are using the topology from my Building a VXLAN Lab Using Nexus9000v post. The Spine switches are configured with the NX-OS Anycast RP feature; that is, no MSDP is used between the RPs. To be able to forward broadcast frames such as ARP in our topology, the following is required:

  • The VTEPs must signal that they want to join the shared tree for 239.0.0.1 (to receive multicast) using a PIM Join.
  • The VTEPs must signal that they intend to send multicast for 239.0.0.1 on the source tree using a PIM Register.
  • The RPs must share information about the sources that they know of by forwarding PIM Register messages.
  • The VTEPs must encapsulate ARP frames in VXLAN and forward them in the underlay using multicast.
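
The Spine side is not the focus of this post, but for context, a minimal sketch of an NX-OS Anycast RP configuration on Spine1 might look like this. The loopback number is my assumption; the anycast address 192.0.2.255 and the member addresses 192.0.2.1 and 192.0.2.2 are taken from the outputs below:

ip pim rp-address 192.0.2.255 group-list 224.0.0.0/4
ip pim anycast-rp 192.0.2.255 192.0.2.1
ip pim anycast-rp 192.0.2.255 192.0.2.2

interface loopback254
  ip address 192.0.2.255/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode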

The Leaf switches have the following configuration to enable multicast:

ip pim rp-address 192.0.2.255 group-list 224.0.0.0/4

interface nve1
  source-interface loopback1
  member vni 1000
    mcast-group 239.0.0.1

interface Ethernet1/1
  no switchport
  mtu 9216
  medium p2p
  ip address 198.51.100.1/31
  ip ospf network point-to-point
  no ip ospf passive-interface
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  mtu 9216
  medium p2p
  ip address 198.51.100.9/31
  ip ospf network point-to-point
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown
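
Not shown above is loopback1, the NVE source interface. From the outputs later in this post, Leaf1's loopback1 address is 203.0.113.1; a sketch of what that configuration would look like (assumed, since the post does not show it):

interface loopback1
  ip address 203.0.113.1/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode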

Let’s first verify that the Leaf has PIM neighbors and is aware of the RP:

Leaf1# show ip pim nei
PIM Neighbor Status for VRF "default"
Neighbor        Interface            Uptime    Expires   DR       Bidir-  BFD    ECMP Redirect
                                                         Priority Capable State     Capable
198.51.100.0    Ethernet1/1          3d21h     00:01:19  1        yes     n/a     no
198.51.100.8    Ethernet1/2          3d22h     00:01:31  1        yes     n/a     no
Leaf1# show ip pim rp vrf default
PIM RP Status Information for VRF "default"
BSR disabled
Auto-RP disabled
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None

RP: 192.0.2.255, (0), 
 uptime: 3d22h   priority: 255, 
 RP-source: (local),  
 group ranges:
 224.0.0.0/4   
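
It is also worth confirming that the anycast RP address is reachable from the Leaf, ideally via both Spines (ECMP). Output omitted here, but on NX-OS these commands show it:

Leaf1# show ip route 192.0.2.255
Leaf1# show ip pim interface brief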

Let’s also verify that the Spines are aware of each other as RPs:

Spine1# show ip pim rp vrf default
PIM RP Status Information for VRF "default"
BSR disabled
Auto-RP disabled
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None

Anycast-RP 192.0.2.255 members:
  192.0.2.1*  192.0.2.2  

RP: 192.0.2.255*, (0), 
 uptime: 10w0d   priority: 255, 
 RP-source: (local),  
 group ranges:
 224.0.0.0/4   

That is all looking as expected.

Currently, the Leafs have their NVE interfaces shut down, which lets us capture the entire interaction. I will start a packet capture in Wireshark on my sniffer to capture the PIM packets.
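
If you are capturing with tshark instead of Wireshark, a capture filter along these lines grabs both the PIM signaling (IP protocol 103) and the VXLAN-encapsulated traffic; the interface name is just an example:

tshark -i eth0 -f "ip proto 103 or udp port 4789"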

At this point, Leaf1 has no outgoing interface for 239.0.0.1:

Leaf1# show ip mroute
IP Multicast Routing Table for VRF "default"

(*, 232.0.0.0/8), uptime: 3d22h, pim ip 
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)

The NVE interface is brought up, and Leaf1 sends a PIM Join towards the RP to join the shared tree for 239.0.0.1. Leaf1 also sends a PIM Register, as its NVE source interface is now a source for the group. The RP (Spine1) then sends a PIM Register-Stop. Spine1 also forwards the PIM Register to Spine2, as they are part of the same Anycast RP set; this is how the RPs share source information without MSDP.

At this point, Leaf1 is on both the shared tree and the source tree:

Leaf1# show ip mroute 239.0.0.1
IP Multicast Routing Table for VRF "default"

(*, 239.0.0.1/32), uptime: 00:27:06, nve ip pim 
  Incoming interface: Ethernet1/1, RPF nbr: 198.51.100.0
  Outgoing interface list: (count: 1)
    nve1, uptime: 00:27:06, nve


(203.0.113.1/32, 239.0.0.1/32), uptime: 00:27:06, nve mrib ip pim 
  Incoming interface: loopback1, RPF nbr: 203.0.113.1
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 00:26:08, pim

For the shared tree (*, 239.0.0.1), the incoming interface is Eth1/1, which is the interface towards Spine1, and the outgoing interface is the NVE interface. For the source tree (203.0.113.1, 239.0.0.1), the source is loopback1, which is the source interface for the NVE, and the outgoing interface is Eth1/1 towards Spine1.
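
The same state can be verified from the NVE side. Output omitted here, but on NX-OS, show nve vni lists the multicast group configured per VNI, and show nve peers lists any discovered remote VTEPs:

Leaf1# show nve vni
Leaf1# show nve peers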

The RP is also aware of Leaf1:

Spine1# show ip mroute 239.0.0.1
IP Multicast Routing Table for VRF "default"

(*, 239.0.0.1/32), uptime: 00:30:17, pim ip 
  Incoming interface: loopback1, RPF nbr: 192.0.2.255
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 00:30:17, pim


(203.0.113.1/32, 239.0.0.1/32), uptime: 00:29:20, pim mrib ip 
  Incoming interface: Ethernet1/1, RPF nbr: 198.51.100.1, internal
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 00:26:20, pim, (RPF)
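
Next, we bring up the NVE on Leaf4. Its configuration mirrors Leaf1's; based on the mroute outputs below, Leaf4's loopback1 (the NVE source) is 203.0.113.4, so the relevant part would look like this (assumed, as Leaf4's configuration is not shown):

interface loopback1
  ip address 203.0.113.4/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

interface nve1
  source-interface loopback1
  member vni 1000
    mcast-group 239.0.0.1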

With Leaf4's NVE up, we verify the mroutes again on the Spines:

Spine1# show ip mroute 239.0.0.1
IP Multicast Routing Table for VRF "default"

(*, 239.0.0.1/32), uptime: 00:37:00, pim ip 
  Incoming interface: loopback1, RPF nbr: 192.0.2.255
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 00:37:00, pim


(203.0.113.1/32, 239.0.0.1/32), uptime: 00:36:03, pim mrib ip 
  Incoming interface: Ethernet1/1, RPF nbr: 198.51.100.1, internal
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 00:33:03, pim, (RPF)


(203.0.113.4/32, 239.0.0.1/32), uptime: 00:04:32, pim mrib ip 
  Incoming interface: Ethernet1/4, RPF nbr: 198.51.100.7, internal
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 00:04:32, pim

Spine2# show ip mroute 239.0.0.1
IP Multicast Routing Table for VRF "default"

(*, 239.0.0.1/32), uptime: 00:02:08, pim ip 
  Incoming interface: loopback1, RPF nbr: 192.0.2.255
  Outgoing interface list: (count: 1)
    Ethernet1/4, uptime: 00:02:08, pim


(203.0.113.1/32, 239.0.0.1/32), uptime: 00:33:04, pim ip 
  Incoming interface: Ethernet1/1, RPF nbr: 198.51.100.9, internal
  Outgoing interface list: (count: 1)
    Ethernet1/4, uptime: 00:02:08, pim


(203.0.113.4/32, 239.0.0.1/32), uptime: 00:01:33, pim mrib ip 
  Incoming interface: Ethernet1/4, RPF nbr: 198.51.100.15, internal
  Outgoing interface list: (count: 0)

Note that on Spine2 the (203.0.113.4, 239.0.0.1) entry has an empty outgoing interface list; Leaf1 joined the source tree for 203.0.113.4 via Spine1, so Spine2 has no downstream receivers for that source. Now let’s clear the ARP cache on Server1 (198.51.100.11) and ping Server4 (198.51.100.44):

cisco@server1:~$ sudo ip neighbor flush dev ens160
cisco@server1:~$ ping 198.51.100.44
PING 198.51.100.44 (198.51.100.44) 56(84) bytes of data.
64 bytes from 198.51.100.44: icmp_seq=1 ttl=64 time=9.46 ms
64 bytes from 198.51.100.44: icmp_seq=2 ttl=64 time=5.03 ms
64 bytes from 198.51.100.44: icmp_seq=3 ttl=64 time=4.19 ms
64 bytes from 198.51.100.44: icmp_seq=4 ttl=64 time=4.35 ms

Server1 sends an ARP request, which is a broadcast frame, as the MAC address of Server4 is unknown:

Note that the frame is sent to 239.0.0.1 using multicast, but that it is VXLAN encapsulated. The destination UDP port is 4789 (VXLAN) and the source UDP port is computed from a hash of the inner frame's headers, which provides entropy for ECMP load balancing in the underlay.
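
To isolate this exchange in a capture, a Wireshark display filter along these lines works (vxlan.vni is the field name used by Wireshark's VXLAN dissector):

vxlan.vni == 1000 && arp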

Note that when the ARP response comes back, Server4 has learned the MAC address of Server1, so there is no need for a broadcast. The ARP response is unicast, and it is also sent as unicast between Leaf4 and Leaf1, VXLAN encapsulated towards Leaf1's VTEP address:
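
This works because Leaf4 learned Server1's MAC address against Leaf1's VTEP IP when the flooded ARP request arrived; that is the learn part of flood and learn. Output omitted here, but the remote MAC shows up in the MAC address table pointing at nve1:

Leaf4# show mac address-table dynamic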

There you have it. VXLAN with flood and learn using multicast in the underlay for BUM frames. In another post we will take a look at ingress replication. See you next time!
