Many vendors offer MLAG features, that is, the ability to form a PortChannel (some vendors call it trunk or bond) towards two separate devices. In this post, I will cover the following:
- Briefly describe vPC in a traditional network.
- Describe vPC in a VXLAN/EVPN network.
- Configure leaf switches to support vPC.
- Set up an Ubuntu Linux host to bond two interfaces and use LACP.
- Verify the setup.
Traditional vPC
On Cisco Nexus switches, virtual Port Channel (vPC) has been a widely used feature for many years. It has been used towards other network devices such as firewalls, routers, and switches, but also towards hosts running hypervisors such as ESX.
Unlike technologies such as Virtual Switching System (VSS) or StackWise Virtual, vPC does not require the two switches to become one device to provide MLAG. Instead, the two devices appear as one in PDUs such as LACP, STP, and IGMP, by using a vPC system MAC address as the source MAC. With MLAG features, each switch needs to verify that the other is alive, synchronize state, and perform consistency checking. This is done by connecting them with a vPC peer keepalive link and a vPC peer link that runs the Cisco Fabric Services (CFS) protocol. The diagram below shows the main vPC terminology:
I’ll expand a bit on the different links and their use before moving on to the VXLAN/EVPN part.
vPC peer keepalive link:
- Periodic heartbeat between vPC peers to verify liveness.
- Sent on UDP port 3200.
- Sent every second.
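To make the liveness check concrete, here is a minimal Python sketch of how a switch could track heartbeats from its peer. It is only a model of the idea, not how NX-OS implements it: the port and interval come from the list above, while the `PeerLiveness` class and its `timeout_s` value are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

KEEPALIVE_PORT = 3200        # UDP port stated in the text
KEEPALIVE_INTERVAL_S = 1.0   # send interval stated in the text


@dataclass
class PeerLiveness:
    """Tracks when the last heartbeat from the vPC peer was seen."""

    timeout_s: float = 5.0            # assumption for this sketch, not from the text
    last_seen: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        # Record the arrival time of a keepalive from the peer.
        self.last_seen = now

    def is_alive(self, now: float) -> bool:
        # The peer counts as alive if a heartbeat arrived within the timeout.
        if self.last_seen is None:
            return False
        return (now - self.last_seen) <= self.timeout_s


peer = PeerLiveness()
peer.heartbeat(now=10.0)
print(peer.is_alive(now=12.0))
print(peer.is_alive(now=20.0))
```

With a one-second interval, several heartbeats have to go missing before the sketch declares the peer dead, which is why the timeout is a multiple of the interval.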
vPC peer link carries:
- vPC VLANs.
- CFS messages.
- Flooded traffic from peer device.
- STP BPDUs, HSRP Hello messages, and IGMP updates.
- Multicast traffic.
Please note that the vPC peer link should never be in a blocking state.
Rules for vPC:
- It is L2.
- Can be trunk or access.
- VLANs allowed on vPC need to be allowed on the peer link.
- Supports static configuration as well as LACP.
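The VLAN rule above is easy to check mechanically. The following Python sketch shows the idea, assuming the allowed VLAN lists have already been collected as sets of integers; the function name and the sample VLAN numbers are hypothetical.

```python
from typing import Set


def vlans_missing_on_peer_link(vpc_vlans: Set[int], peer_link_vlans: Set[int]) -> Set[int]:
    """Return the VLANs allowed on the vPC but not on the peer link."""
    return vpc_vlans - peer_link_vlans


# Hypothetical example: VLAN 30 is allowed on the vPC but not on the peer link.
vpc_vlans = {10, 20, 30}
peer_link_vlans = {10, 20}

missing = vlans_missing_on_peer_link(vpc_vlans, peer_link_vlans)
if missing:
    print("VLANs missing on the peer link:", sorted(missing))
```

An empty result means every VLAN allowed on the vPC is also allowed on the peer link.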
Cisco CFS characteristics:
- Used for synchronization and consistency checking.
- Runs on the peer link.
- Validation and comparison for consistency check.
- Synchronization of MAC addresses for member ports.
- Status of member ports is advertised.
- STP management.
- Synchronization of HSRP and IGMP snooping.
- It is enabled by default.
vPC in VXLAN/EVPN Network
To support vPC in a VXLAN/EVPN network, there are two things we need to solve:
- Having two VTEPs appear as one from a VXLAN and EVPN perspective.
- Connecting the leafs so that they provide the functions of the vPC peer link and the vPC peer keepalive link.
Let’s start with how to make two VTEPs appear as one. This is where Anycast VTEP comes into play.
Anycast VTEP
Normally, EVPN updates are sent with the advertising leaf’s own VTEP as the next hop. What would happen if this were also the case when using vPC? Take a look at the topology below:
In this diagram, Leaf-3 picks one of the routes advertised by Leaf-1 and Leaf-2. Let’s say that it picks Leaf-1. Traffic from Host-1 could then go towards Leaf-1 or Leaf-2 based on load sharing, but traffic to Host-1 would always come in via Leaf-1. This might not sound so bad, but keep in mind that there could be several hosts connected via vPC, and BGP would pick the same leaf for all of them because the best path algorithm produces the same result each time. All traffic towards vPC-connected hosts would then come in via Leaf-1, which could overload its links while Leaf-2 sits almost idle. It also means that a problem on Leaf-1 affects all the vPC-connected hosts, whereas only part of the traffic would be affected if it were split between Leaf-1 and Leaf-2.
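The effect of a deterministic tie-break can be illustrated with a small Python sketch. It is a deliberately simplified stand-in for BGP best path selection: all attributes are assumed equal and the lowest address wins. The two VTEP addresses, the host names beyond Host-1, and the function names are hypothetical.

```python
from collections import Counter
from ipaddress import IPv4Address

# Hypothetical unique VTEP addresses for the two vPC peers (not from the post).
LEAF1_VTEP = "203.0.113.1"
LEAF2_VTEP = "203.0.113.2"


def pick_best(next_hops):
    """Simplified stand-in for BGP best path selection.

    All other attributes are assumed equal, so the tie-break always
    lands on the numerically lowest address.
    """
    return min(next_hops, key=IPv4Address)


def ingress_distribution(hosts, next_hops):
    """Count how many hosts the remote leaf reaches via each next hop."""
    return Counter(pick_best(next_hops) for _ in hosts)


hosts = ["Host-1", "Host-2", "Host-3", "Host-4"]
distribution = ingress_distribution(hosts, [LEAF1_VTEP, LEAF2_VTEP])
print(distribution)
```

Because the inputs to the decision are identical for every host, every host ends up behind the same next hop and the other leaf receives none of the traffic.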
Now that we know what the problem is, how can we fix it? By adding an additional VTEP that is used by both leafs. This is referred to as an Anycast VTEP as the two leafs will use the same IP. They will keep the unique VTEP that they already have, but add an additional one that is used for anycast. Do not confuse this with the Anycast Gateway functionality that is used towards hosts. The concept of Anycast VTEP is shown below:
The Anycast VTEP is configured by adding a secondary IP to the loopback used as the source of the VTEP:
Note that implementing the secondary IP on the loopback used as source for NVE will be very disruptive!
At this point, nothing with vPC has been configured. The feature isn’t even enabled, but there is already a change in how the routes get advertised via BGP. Take a look at a route type 2 (RT2) on Leaf-4:
Leaf4# show bgp l2vpn evpn 0050.56ad.8506
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.0.2.3:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.56ad.8506]:[0]:[0.0.0.0]/216, version 8771
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
  Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    203.0.113.12 (metric 81) from 192.0.2.12 (192.0.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 SOO:203.0.113.12:0 ENCAP:8
          MAC Mobility Sequence:00:2
      Originator: 192.0.2.3 Cluster list: 192.0.2.2
  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported to 1 destination(s)
             Imported paths list: L2-10000
  AS-Path: NONE, path sourced internal to AS
    203.0.113.12 (metric 81) from 192.0.2.11 (192.0.2.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 SOO:203.0.113.12:0 ENCAP:8
          MAC Mobility Sequence:00:2
      Originator: 192.0.2.3 Cluster list: 192.0.2.1
  Path-id 1 not advertised to any peer
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.56ad.8506]:[32]:[198.51.100.11]/272, version 8769
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
  Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    203.0.113.12 (metric 81) from 192.0.2.12 (192.0.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10001
      Extcommunity: RT:65000:10000 RT:65000:10001 SOO:203.0.113.12:0 ENCAP:8
          MAC Mobility Sequence:00:2 Router MAC:00ad.e688.1b08
      Originator: 192.0.2.3 Cluster list: 192.0.2.2
  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported to 3 destination(s)
             Imported paths list: Tenant1 L3-10001 L2-10000
  AS-Path: NONE, path sourced internal to AS
    203.0.113.12 (metric 81) from 192.0.2.11 (192.0.2.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10001
      Extcommunity: RT:65000:10000 RT:65000:10001 SOO:203.0.113.12:0 ENCAP:8
          MAC Mobility Sequence:00:2 Router MAC:00ad.e688.1b08
      Originator: 192.0.2.3 Cluster list: 192.0.2.1
  Path-id 1 not advertised to any peer
The next-hop is now 203.0.113.12, which is the Anycast VTEP. There is also an extended community added, Site of Origin (SoO) 203.0.113.12:0. Why has this community been added? The SoO community is used in BGP for loop prevention. When a switch is intended to use vPC, it should not learn any routes from the other vPC member via BGP, as both members should be connected to the same hosts and networks. There are of course exceptions to this, which we will cover later, but in general the switch should only learn MAC addresses and routes either locally or over the vPC peer link. As Leaf-1 and Leaf-2 are now advertising BGP updates with the same SoO, they will ignore each other’s updates. This can be seen in the BGP logs below:
[M 27] [bgp] E_DEBUG [bgp_af_process_nlri:7438] (default) PFX: [L2VPN EVPN] Dropping prefix [3]:[0]:[32]:[203.0.113.12]/88 from peer 192.0.2.12, due to attribute error
[M 27] [bgp] E_DEBUG [bgp_af_process_nlri:7438] (default) PFX: [L2VPN EVPN] Dropping prefix [2]:[0]:[0]:[48]:[0050.56ad.8506]:[0]:[0.0.0.0]/112 from peer 192.0.2.12, due to attribute error
[M 27] [bgp] E_DEBUG [bgp_af_process_nlri:7438] (default) PFX: [L2VPN EVPN] Dropping prefix [2]:[0]:[0]:[48]:[0050.56ad.b4a4]:[0]:[0.0.0.0]/112 from peer 192.0.2.12, due to attribute error
[M 27] [bgp] E_DEBUG [bgp_af_process_nlri:7438] (default) PFX: [L2VPN EVPN] Dropping prefix [2]:[0]:[0]:[48]:[0050.56ad.8506]:[32]:[198.51.100.11]/144 from peer 192.0.2.12, due to attribute error
[M 27] [bgp] E_DEBUG [bgp_af_process_nlri:7438] (default) PFX: [L2VPN EVPN] Dropping prefix [5]:[0]:[0]:[24]:[198.51.100.0]/88 from peer 192.0.2.12, due to attribute error
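The drop behaviour in these logs can be modelled with a few lines of Python. This is only a sketch of the SoO check as described above, not the NX-OS implementation: the `EvpnRoute` class, the `accept` function, and the second route with its MAC address and SoO value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvpnRoute:
    prefix: str
    soo: str  # Site of Origin extended community carried by the route


# SoO shared by Leaf-1 and Leaf-2, as seen in the output from Leaf-4.
LOCAL_SOO = "203.0.113.12:0"


def accept(route: EvpnRoute, local_soo: str = LOCAL_SOO) -> bool:
    """Reject a route whose SoO matches our own, since it came from the vPC peer."""
    return route.soo != local_soo


routes = [
    # Advertised by the vPC peer, so it carries the shared SoO.
    EvpnRoute("[2]:[0]:[0]:[48]:[0050.56ad.8506]:[0]:[0.0.0.0]", "203.0.113.12:0"),
    # Hypothetical route from another vPC pair with a different SoO.
    EvpnRoute("[2]:[0]:[0]:[48]:[0000.0000.0001]:[0]:[0.0.0.0]", "203.0.113.34:0"),
]

accepted = [r.prefix for r in routes if accept(r)]
print(accepted)
```

Only the route with a different SoO survives the check, which matches the leafs ignoring each other’s updates while still learning routes from the rest of the fabric.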
With Anycast VTEP explained, the next blog post will cover how to configure vPC.