In my last post on Configuring EVPN, we set up EVPN but configured no services. In this post we will configure a basic L2 service so we can dive into the different EVPN route types. Route types 2 and 3 are covered together, as you will commonly see them side by side. This post will cover:
- Discovery of VTEPs.
- How to map a VLAN to a VNI.
- Automatic generation of RD and RT.
- Advertising MAC addresses and, optionally, IP addresses (route type 2).
- Ingress replication with dynamic discovery of VTEPs (route type 3).
The topology we will use for this post is shown below:
Before diving into configuration, let’s discuss something that is often overlooked: VTEP discovery.
VTEP discovery
Without EVPN, VXLAN uses flood and learn behavior for discovery of VTEPs. This means that any host sending VXLAN frames would be considered a trusted VTEP in the network. This is obviously not great from a security perspective. When using EVPN, adding VTEPs is based on BGP messages. A VTEP will learn about other VTEPs based on these BGP updates. It’s not a specific route type, but rather any type of EVPN message. This makes it more difficult to add a rogue VTEP as once BGP has been enabled, only VTEPs discovered by BGP are allowed to forward VXLAN frames. VXLAN frames coming from other devices will be dropped. An additional layer of security can be added to BGP by requiring MD5 authentication for BGP peers.
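To make the discovery behavior concrete, here is a minimal Python sketch of the idea. It is a toy model and not how NX-OS is implemented: the class name and methods are made up for illustration, and it assumes the peer is identified by the BGP next hop of a received EVPN route.

```python
from dataclasses import dataclass, field


@dataclass
class NvePeerTable:
    """Toy model of control-plane VTEP learning (hypothetical, not NX-OS code)."""
    peers: set = field(default_factory=set)

    def on_evpn_route(self, next_hop: str) -> None:
        # Any received EVPN route, regardless of route type, reveals a VTEP.
        self.peers.add(next_hop)

    def accept_vxlan_frame(self, outer_src_ip: str) -> bool:
        # Frames from sources that were not learned via BGP are dropped.
        return outer_src_ip in self.peers


table = NvePeerTable()
print(table.accept_vxlan_frame("203.0.113.4"))    # False, nothing learned yet
table.on_evpn_route("203.0.113.4")
print(table.accept_vxlan_frame("203.0.113.4"))    # True
print(table.accept_vxlan_frame("198.51.100.66"))  # False, unknown source
```

The address 198.51.100.66 is a made-up rogue source used only to show the drop case.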
As this lab starts, the list of NVE peers is empty as there are no entries in the BGP table:
Leaf1# show nve peers
Leaf1# show bgp l2vpn evpn
Leaf1#
Next, let’s configure an L2 service.
Configuring L2 VNI
In our example, a VLAN will be mapped to a VNI. This mapping applies to the entire switch. On some platforms it is also possible to do a port-local mapping. The configuration below is applied:
After the configuration has been applied, we can now see NVE peers:
Leaf1# show nve peers
Interface Peer-IP                                 State LearnType Uptime   Router-Mac
--------- --------------------------------------  ----- --------- -------- -----------------
nve1      203.0.113.4                             Up    CP        23:26:52 n/a
Before diving into route types, let’s cover RD and RT.
Automatic generation of RD and RT
In my previous post Introduction to EVPN In VXLAN Networks, I described the different formats for RD and RT as can be seen below:
Instead of coming up with a scheme for generating RDs and RTs and keeping track of the assigned values, NX-OS can generate them automatically. For the RD, the type 1 format is used, with four bytes holding the BGP RID and two bytes holding an ID for the MAC VRF. The ID assigned to the MAC VRF is 32767 + VLAN ID. On Leaf1, the BGP RID is 192.0.2.3 and the VLAN we mapped to VNI 10000 is 10, so the expected RD is 192.0.2.3:32777.
When it comes to the RT, the type 0 format is used, with two bytes holding the ASN and four bytes holding the VNI. On Leaf1, AS 65000 is used and the VNI is 10000, so we expect an RT of 65000:10000.
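The arithmetic is simple enough to sketch in a few lines of Python. The function names are made up for illustration; the formulas are the ones described above (32767 + VLAN ID for the RD, ASN:VNI for the RT).

```python
def auto_rd(bgp_router_id: str, vlan_id: int) -> str:
    """Type 1 RD: 4-byte BGP router ID, 2-byte MAC VRF ID (32767 + VLAN ID)."""
    return f"{bgp_router_id}:{32767 + vlan_id}"


def auto_rt(asn: int, vni: int) -> str:
    """Type 0 RT: 2-byte ASN, 4-byte VNI."""
    if not 0 < asn <= 0xFFFF:
        raise ValueError("the type 0 format only fits a 2-byte ASN")
    return f"{asn}:{vni}"


print(auto_rd("192.0.2.3", 10))  # 192.0.2.3:32777
print(auto_rt(65000, 10000))     # 65000:10000
```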
To confirm the RD and RT, the command show bgp evi can be used:
Leaf1# show bgp evi
-----------------------------------------------
  L2VNI ID                     : 10000 (L2-10000)
  RD                           : 192.0.2.3:32777
  Prefixes (local/total)       : 2/4
  Created                      : Dec 31 01:30:53.699048
  Last Oper Up/Down            : Dec 31 01:30:53.699837 / never
  Enabled                      : Yes
  Active Export RT list        : 65000:10000
  Active Import RT list        : 65000:10000
The expected RD and RT have been generated. Now let’s dive into the route types.
EVPN route type 2
EVPN route type 2 is used to carry endpoint information. This is normally the MAC address, but also optionally the IP address. Before diving into the details, let’s walk through some of the use cases for route type 2:
- Host MAC address advertisement – This is the obvious use case where type 2 routes advertise MAC addresses so that hosts belonging to the same L2 network on different VTEPs can communicate with each other.
- Host ARP advertisement – EVPN can be leveraged to advertise the MAC/IP binding of a host. With this information, features like ARP suppression can be used, where a VTEP acts as a proxy for ARP requests and minimizes the amount of BUM traffic sent on the network. There are also host mobility scenarios where, for example, a VM moves to another location and sends a GARP, and the local VTEP advertises the route. The original VTEP will then be aware that the host has moved.
- Host IP route advertisement – A /32 route is advertised with the IP of a host so that hosts can communicate with each other in a distributed gateway scenario. If several VTEPs have the same IP configured towards hosts (anycast gateway), then /32 routes need to be advertised as advertising the actual prefix, such as /24, would not work when that prefix is already in the local routing table.
- ND entry advertisement – As with v4, it’s possible to advertise a combination of MAC/IPv6 address to enable features like ND suppression.
- Host IPv6 route advertisement – A /128 route is advertised with the IPv6 address of a host so that v6 hosts can communicate with each other in a distributed gateway scenario.
Let’s get into the details! RFC 7432 defines the following fields in route type 2:
Note that some fields must have information and some are optional. Let’s get deeper into each field:
- RD – Route Distinguisher to create unique prefixes.
- ESI – Ethernet Segment Identifier. Identifies an Ethernet segment. This is zero for single-homed hosts.
- Ethernet Tag ID – Identifies a broadcast domain in an EVPN instance. This is zero except when using VLAN-aware bundle services.
- MAC Address Length – Length of MAC address carried in route. This is 48 bits.
- MAC Address – The MAC address carried in route.
- IP Address Length – The length of the IP address carried in the route, if any. Set to 0 if no IP address is being carried. Otherwise, set to 32 or 128, depending on whether an IPv4 or IPv6 address is advertised.
- IP Address – The IP address carried in the route. The field is omitted when IP Address Length is 0 (NX-OS then displays it as 0.0.0.0). Otherwise, it holds a 32-bit IPv4 or 128-bit IPv6 address.
- MPLS Label1 – This is set to the L2 VNI. Used in data plane to forward frames to correct MAC VRF.
- MPLS Label2 – This is set to L3 VNI if IPs are being advertised. Otherwise this label is not used. Used in data plane to forward packets to correct IP VRF.
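To see how these fields add up on the wire, here is a rough Python sketch that packs them in the order listed above. It is an illustration rather than a full BGP implementation: the function names are made up, and it assumes VXLAN encapsulation where the 24-bit VNI fills the whole 3-byte label field. The second route uses a hypothetical host IP (198.51.100.10) and L3 VNI (50000) that are not part of this lab.

```python
import ipaddress
import struct
from typing import Optional


def encode_rd_type1(router_id: str, assigned_number: int) -> bytes:
    # Type 1 RD: 2-byte type, 4-byte IPv4 address, 2-byte assigned number.
    return struct.pack("!H4sH", 1, ipaddress.IPv4Address(router_id).packed, assigned_number)


def encode_type2_nlri(rd: bytes, mac: str, l2_vni: int,
                      ip: Optional[str] = None, l3_vni: Optional[int] = None) -> bytes:
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = ipaddress.ip_address(ip).packed if ip else b""
    nlri = rd                                        # RD, 8 bytes
    nlri += bytes(10)                                # ESI, all zero for a single-homed host
    nlri += struct.pack("!I", 0)                     # Ethernet Tag ID
    nlri += bytes([len(mac_bytes) * 8]) + mac_bytes  # MAC Address Length (bits) + MAC
    nlri += bytes([len(ip_bytes) * 8]) + ip_bytes    # IP Address Length (bits) + optional IP
    nlri += l2_vni.to_bytes(3, "big")                # MPLS Label1 carries the L2 VNI
    if l3_vni is not None:
        nlri += l3_vni.to_bytes(3, "big")            # MPLS Label2 carries the L3 VNI
    return nlri


rd = encode_rd_type1("192.0.2.6", 32777)
mac_only = encode_type2_nlri(rd, "00:50:56:ad:7d:68", 10000)
mac_ip = encode_type2_nlri(rd, "00:50:56:ad:7d:68", 10000,
                           ip="198.51.100.10", l3_vni=50000)  # hypothetical values
print(len(mac_only))  # 33
print(len(mac_ip))    # 40
```

A MAC-only route comes out at 33 bytes. Adding an IPv4 address and a second label adds 4 + 3 bytes.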
Now let’s look at what a type 2 route looks like on the CLI:
Leaf1# show bgp l2vpn evpn vni-id 10000
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 11, Local Router ID is 192.0.2.3
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 192.0.2.3:32777    (L2VNI 10000)
*>i[2]:[0]:[0]:[48]:[0050.56ad.7d68]:[0]:[0.0.0.0]/216
                      203.0.113.4                       100          0 i
*>l[2]:[0]:[0]:[48]:[0050.56ad.8506]:[0]:[0.0.0.0]/216
                      203.0.113.1                       100      32768 i
This all looks a bit confusing so let’s break it down for one of the routes:
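One way to read the bracketed prefix is to label each field. The Python sketch below does that, assuming the field order NX-OS commonly uses for type 2 routes: route type, ESI, Ethernet Tag ID, MAC address length, MAC address, IP address length, IP address, followed by the prefix length after the slash. The function name and field labels are my own.

```python
import re

# Assumed field order for an NX-OS type 2 prefix.
TYPE2_FIELDS = ("route_type", "esi", "ethernet_tag", "mac_length",
                "mac", "ip_length", "ip")


def parse_type2_prefix(prefix: str) -> dict:
    fields, _, prefix_length = prefix.partition("/")
    values = re.findall(r"\[([^\]]*)\]", fields)  # text inside each [...]
    if len(values) != len(TYPE2_FIELDS) or values[0] != "2":
        raise ValueError(f"not a type 2 prefix: {prefix}")
    parsed = dict(zip(TYPE2_FIELDS, values))
    parsed["prefix_length"] = prefix_length
    return parsed


route = parse_type2_prefix("[2]:[0]:[0]:[48]:[0050.56ad.7d68]:[0]:[0.0.0.0]/216")
for name, value in route.items():
    print(f"{name:14} {value}")
```

For this route, the IP length of 0 together with 0.0.0.0 tells us that only the MAC address is being advertised.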
We can also look at a route in more detail:
This is what the route looks like from a packet capture:
Frame 604: 170 bytes on wire (1360 bits), 170 bytes captured (1360 bits) on interface ens162, id 7
Ethernet II, Src: 00:ad:70:83:1b:08, Dst: 00:ad:7b:30:1b:08
Internet Protocol Version 4, Src: 192.0.2.104, Dst: 192.0.2.12
Transmission Control Protocol, Src Port: 179, Dst Port: 33688, Seq: 447, Ack: 699, Len: 104
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 104
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 81
    Path attributes
        Path Attribute - MP_REACH_NLRI
            Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
            Type Code: MP_REACH_NLRI (14)
            Length: 44
            Address family identifier (AFI): Layer-2 VPN (25)
            Subsequent address family identifier (SAFI): EVPN (70)
            Next hop: 203.0.113.4
            Number of Subnetwork points of attachment (SNPA): 0
            Network Layer Reachability Information (NLRI)
                EVPN NLRI: MAC Advertisement Route
                    Route Type: MAC Advertisement Route (2)
                    Length: 33
                    Route Distinguisher: 0001c00002068009 (192.0.2.6:32777)
                    ESI: 00:00:00:00:00:00:00:00:00:00
                    Ethernet Tag ID: 0
                    MAC Address Length: 48
                    MAC Address: 00:50:56:ad:7d:68
                    IP Address Length: 0
                    IP Address: NOT INCLUDED
                    VNI: 10000
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: empty
        Path Attribute - LOCAL_PREF: 100
        Path Attribute - EXTENDED_COMMUNITIES
            Flags: 0xc0, Optional, Transitive, Complete
            Type Code: EXTENDED_COMMUNITIES (16)
            Length: 16
            Carried extended communities: (2 communities)
                Route Target: 65000:10000 [Transitive 2-Octet AS-Specific]
                    Type: Transitive 2-Octet AS-Specific (0x00)
                    Subtype (AS2): Route Target (0x02)
                    2-Octet AS: 65000
                    Local Administrator: 0x00, Type: VID (802.1Q VLAN ID)
                    Service Id: 10000
                Encapsulation: VXLAN Encapsulation [Transitive Opaque]
                    Type: Transitive Opaque (0x03)
                    Subtype (Opaque): Encapsulation (0x0c)
                    Tunnel type: VXLAN Encapsulation (8)
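The Route Distinguisher in the capture is shown both as raw hex and in decoded form. As a small sanity check, the Python sketch below decodes the eight bytes by hand, using the three standard RD formats (type 0, 1 and 2). The function name is made up for illustration.

```python
import ipaddress
import struct


def decode_rd(rd_hex: str) -> str:
    raw = bytes.fromhex(rd_hex)
    if len(raw) != 8:
        raise ValueError("an RD is always 8 bytes")
    (rd_type,) = struct.unpack("!H", raw[:2])
    if rd_type == 0:    # 2-byte ASN : 4-byte number
        asn, number = struct.unpack("!HI", raw[2:])
        return f"{asn}:{number}"
    if rd_type == 1:    # 4-byte IPv4 address : 2-byte number
        addr, number = struct.unpack("!4sH", raw[2:])
        return f"{ipaddress.IPv4Address(addr)}:{number}"
    if rd_type == 2:    # 4-byte ASN : 2-byte number
        asn, number = struct.unpack("!IH", raw[2:])
        return f"{asn}:{number}"
    raise ValueError(f"unknown RD type {rd_type}")


print(decode_rd("0001c00002068009"))  # 192.0.2.6:32777
```

The leading 0001 is the type 1 marker, c0000206 is 192.0.2.6, and 0x8009 is 32777, which is 32767 + VLAN 10.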
Now let’s move on to route type 3.
EVPN route type 3
EVPN route type 3, referred to as Inclusive Multicast Ethernet Tag (IMET), is a route type that confuses many. What is the purpose of this route? It is used to populate the ingress replication list with the VTEPs to forward BUM frames to for a specific VNI. Essentially, we want to populate the list below:
Leaf1# show nve vni ingress-replication
Interface VNI      Replication List  Source   Up Time
--------- -------- ----------------- -------- -------
nve1      10000    203.0.113.4       BGP-IMET 2d03h
A VTEP will advertise itself as interested in a VNI as soon as the VNI is configured and the replication protocol is set to BGP. In a previous post I covered how to do Static Ingress Replication. This is done with configuration similar to the one below:
interface nve1
  no shutdown
  source-interface loopback1
  member vni 1000
    ingress-replication protocol static
      peer-ip 203.0.113.2
      peer-ip 203.0.113.3
      peer-ip 203.0.113.4
The drawback to this approach is of course that it’s static. If another leaf comes online, all the other leafs need to be updated. This could of course be done via automation, but the new leaf would not be functional until the automation has run. With an EVPN type 3 route, this would be automatic as soon as the leaf has been configured.
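The dynamic behavior can be sketched as a small Python model. This is only an illustration of the idea, not NX-OS internals: the class and method names are made up, and 203.0.113.5 is a hypothetical new leaf that does not exist in this lab.

```python
from collections import defaultdict


class ReplicationLists:
    """Toy model: per-VNI flood lists built from received type 3 routes."""

    def __init__(self, local_vtep: str) -> None:
        self.local_vtep = local_vtep
        self._lists = defaultdict(set)

    def on_imet_update(self, vni: int, originator: str) -> None:
        # Our own IMET route is not a flood target.
        if originator != self.local_vtep:
            self._lists[vni].add(originator)

    def on_imet_withdraw(self, vni: int, originator: str) -> None:
        self._lists[vni].discard(originator)

    def flood_targets(self, vni: int) -> list:
        return sorted(self._lists.get(vni, set()))


leaf1 = ReplicationLists(local_vtep="203.0.113.1")
leaf1.on_imet_update(10000, "203.0.113.1")   # locally originated, ignored
leaf1.on_imet_update(10000, "203.0.113.4")
print(leaf1.flood_targets(10000))            # ['203.0.113.4']

leaf1.on_imet_update(10000, "203.0.113.5")   # hypothetical new leaf comes online
print(leaf1.flood_targets(10000))            # ['203.0.113.4', '203.0.113.5']
```

No configuration change is needed on Leaf1 when the new leaf appears; the list follows the BGP table.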
Now that we know what this route is for, let’s get deeper by looking at the fields of this route:
These fields should be familiar from route type 2. The only new field is the originating router’s IP address, which is pretty self-explanatory.
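For reference, RFC 7432 lays out the type 3 NLRI as RD (8 bytes), Ethernet Tag ID (4 bytes), IP Address Length (1 byte) and the originating router’s IP address (4 or 16 bytes). A short Python sketch of that layout, with a made-up function name, looks like this:

```python
import ipaddress
import struct


def encode_type3_nlri(rd: bytes, originator_ip: str, eth_tag: int = 0) -> bytes:
    ip_bytes = ipaddress.ip_address(originator_ip).packed
    # Ethernet Tag ID (4 bytes) followed by the IP address length in bits (1 byte).
    return rd + struct.pack("!IB", eth_tag, len(ip_bytes) * 8) + ip_bytes


rd = bytes.fromhex("0001c00002068009")  # 192.0.2.6:32777
nlri = encode_type3_nlri(rd, "203.0.113.4")
print(len(nlri))   # 17
print(nlri.hex())
```

With an IPv4 originator the NLRI is 8 + 4 + 1 + 4 = 17 bytes.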
Let’s take a look at route type 3 from the CLI:
Leaf1# show bgp l2vpn evpn vni-id 10000
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 11, Local Router ID is 192.0.2.3
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 192.0.2.3:32777    (L2VNI 10000)
*>l[3]:[0]:[32]:[203.0.113.1]/88
                      203.0.113.1                       100      32768 i
*>i[3]:[0]:[32]:[203.0.113.4]/88
                      203.0.113.4                       100          0 i
As with route type 2, the output can be a bit confusing. Let’s break it down:
We can also look at the route in further detail:
Leaf1# show bgp l2vpn evpn 203.0.113.4
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.0.2.3:32777    (L2VNI 10000)
BGP routing table entry for [3]:[0]:[32]:[203.0.113.4]/88, version 7
Paths: (1 available, best #1)
Flags: (0x000012) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported from 192.0.2.6:32777:[3]:[0]:[32]:[203.0.113.4]/88
  AS-Path: NONE, path sourced internal to AS
    203.0.113.4 (metric 81) from 192.0.2.11 (192.0.2.1)
      Origin IGP, MED not set, localpref 100, weight 0
      Extcommunity: RT:65000:10000 ENCAP:8
      Originator: 192.0.2.6 Cluster list: 192.0.2.1
      PMSI Tunnel Attribute:
        flags: 0x00, Tunnel type: Ingress Replication
        Label: 10000, Tunnel Id: 203.0.113.4

  Path-id 1 not advertised to any peer
This has revealed some new information to us. There is an interesting attribute called PMSI Tunnel Attribute. The term PMSI is probably new to most people that don’t have a background with Multicast VPNs (MVPNs). PMSI stands for Provider Multicast Service Interface. RFC 7432 says the following:
If the PE that originates the advertisement uses ingress replication for the P-tunnel for EVPN, the route MUST include the PMSI Tunnel attribute with the Tunnel Type set to Ingress Replication and the Tunnel Identifier set to a routable address of the PE. The PMSI Tunnel attribute MUST carry a downstream assigned MPLS label. This label is used to demultiplex the broadcast, multicast, or unknown unicast EVPN traffic received over an MP2P tunnel by the PE.
Let’s look at this information highlighted from the CLI:
Don’t get confused by the term PMSI. It was originally intended for MPLS networks, where the PMSI was a kind of overlay used to forward multicast frames in L3 VPNs. The people involved with EVPN decided to reuse previous work and mostly exchanged the MPLS label for a VXLAN VNI. Finally, let’s look at a type 3 route from a packet capture:
Frame 276: 166 bytes on wire (1328 bits), 166 bytes captured (1328 bits) on interface ens161, id 0
Ethernet II, Src: 00:ad:70:83:1b:08, Dst: 00:ad:b3:fd:1b:08
Internet Protocol Version 4, Src: 192.0.2.104, Dst: 192.0.2.11
Transmission Control Protocol, Src Port: 43222, Dst Port: 179, Seq: 153, Ack: 267, Len: 100
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 100
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 77
    Path attributes
        Path Attribute - MP_REACH_NLRI
            Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
            Type Code: MP_REACH_NLRI (14)
            Length: 28
            Address family identifier (AFI): Layer-2 VPN (25)
            Subsequent address family identifier (SAFI): EVPN (70)
            Next hop: 203.0.113.4
            Number of Subnetwork points of attachment (SNPA): 0
            Network Layer Reachability Information (NLRI)
                EVPN NLRI: Inclusive Multicast Route
                    Route Type: Inclusive Multicast Route (3)
                    Length: 17
                    Route Distinguisher: 0001c00002068009 (192.0.2.6:32777)
                    Ethernet Tag ID: 0
                    IP Address Length: 32
                    IPv4 address: 203.0.113.4
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: empty
        Path Attribute - LOCAL_PREF: 100
        Path Attribute - EXTENDED_COMMUNITIES
        Path Attribute - PMSI_TUNNEL_ATTRIBUTE
            Flags: 0xc0, Optional, Transitive, Complete
            Type Code: PMSI_TUNNEL_ATTRIBUTE (22)
            Length: 9
            Flags: 0
            Tunnel Type: Ingress Replication (6)
            VNI: 10000
            Tunnel ID: tunnel end point -> 203.0.113.4
                Tunnel type ingress replication IP end point: 203.0.113.4
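The PMSI Tunnel attribute value in the capture is 9 bytes long. The Python sketch below shows where those bytes come from, assuming an IPv4 tunnel endpoint and VXLAN encapsulation where the 3-byte label field carries the VNI as is. The function names are made up for illustration.

```python
import ipaddress

INGRESS_REPLICATION = 6  # PMSI tunnel type, as seen in the capture


def encode_pmsi_ingress_replication(vni: int, tunnel_endpoint: str, flags: int = 0) -> bytes:
    return (bytes([flags, INGRESS_REPLICATION])               # flags + tunnel type
            + vni.to_bytes(3, "big")                          # label field holding the VNI
            + ipaddress.IPv4Address(tunnel_endpoint).packed)  # tunnel identifier


def decode_pmsi(value: bytes) -> dict:
    return {
        "flags": value[0],
        "tunnel_type": value[1],
        "vni": int.from_bytes(value[2:5], "big"),
        "tunnel_id": str(ipaddress.IPv4Address(value[5:9])),
    }


value = encode_pmsi_ingress_replication(10000, "203.0.113.4")
print(len(value))          # 9
print(decode_pmsi(value))
```

One flags byte, one tunnel type byte, three label bytes and four address bytes give the length of 9.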
This should all be familiar by now. In this post we learned that:
- Without EVPN, VXLAN networks rely on flood and learn to discover hosts and VTEPs.
- EVPN can be used to provide VTEP discovery through the reception of EVPN routes.
- How to configure an L2 VNI on NX-OS.
- That route type 2 can advertise both the MAC address and IP address of a host.
- That route type 3 is used to signal which VTEPs are interested in getting BUM frames for specific L2 VNIs through ingress replication.
I hope this has been informative and see you next time!
Hi Dan,
I have a question about how a VTEP routes traffic.
For example, VTEP1 connects to two other VTEPs (VTEP2 and VTEP3), and all three VTEPs are mapped to the same VNI. Host1, Host2, and Host3 connect to the VTEPs respectively. My understanding is that the hosts’ MAC and IP addresses are advertised by the VTEPs, so if Host1 tries to reach Host2, VTEP1 will search its BGP table to find the correct entry and encapsulate the data as a VXLAN packet. The traffic won’t go to VTEP3 because the BGP table shows that the destination MAC/IP is behind VTEP2.
What if the remote host information does not exist in the BGP table? How can the VTEP get the info?
Hi,
This scenario can sometimes happen when you have a silent host. The VTEP would then have to flood the frame using for example multicast in underlay or ingress replication. The VTEP where the silent host is connected would then advertise MAC or MAC/IP in EVPN RT2.
Hi Daniel,
Thanks for your explanation. I want to ask whether a VTEP announces the IP/MAC information via type 2 only when it receives an ARP request. For example, we can define the destination host’s ARP entry statically on the source host. In this scenario, the source host does not send any ARP request but sends traffic directly to the destination host. The VTEP will then see no ARP request, only traffic going to the destination host.
In this scenario, will the VTEP also send a type 2 host route to the other VTEPs?
A VTEP can advertise a MAC-only T2, or both a MAC-only T2 and a MAC/IP T2. If there is a host that is only communicating at L2, then I would expect that the VTEP has learned the MAC of the source, but not the IP, so it would be a single T2. If the host is also communicating at L3, it would have ARPed for its GW, and the VTEP should then learn the IP to MAC binding and be able to advertise that in a T2.
I haven’t tried using static ARP binding, but in theory it should behave the same as learning something dynamically.