Before diving into a new technology, it is always useful to understand the previous generation of technology, what its limitations were, and how the new technology intends to overcome them. In this post, let’s look at some of the challenges with L2-based networks and how VXLAN/EVPN can overcome them. Before starting, I want to balance the messaging a bit on the bad reputation that STP gets:

  • Radia Perlman did an excellent job with what was available at that time.
  • A lot of the bad reputation comes from a misunderstanding of the protocol.
  • STP-based networks can run just fine but they are often misconfigured (related to the point above).
  • Many issues come from misbehaving end user devices where protection mechanisms have not been implemented (see the point above).
  • It’s natural for technologies to evolve as more compute becomes available and we gain experience.

Keep in mind that the original 802.1D standard was published in 1990. This was long before the internet was generally available and before our networks became as critically important to us as they are today. At that time, we didn’t measure outages in seconds or even minutes. That said, let’s look at the limitations of a traditional L2 network.

Convergence – In the original 802.1D standard, convergence was slow because ports had to go through the blocking, listening, and learning states before reaching forwarding. With the default timers (a 20-second max age plus two 15-second forward delays), this took around 50 seconds. This was much improved in 802.1w (RSTP), which uses a proposal/agreement synchronization process and can often converge within a couple of seconds or less.
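The ~50 seconds falls straight out of the default 802.1D timers. The sketch below is a back-of-the-envelope calculation, assuming default timers and an indirect failure where the blocking port must first wait for the stored BPDU to age out before moving through listening and learning. The function name is purely illustrative.

```python
# Default 802.1D timers, in seconds.
MAX_AGE = 20        # time before a stored BPDU ages out on a blocking port
FORWARD_DELAY = 15  # time spent in EACH of the listening and learning states


def stp_worst_case_convergence(max_age: int = MAX_AGE,
                               forward_delay: int = FORWARD_DELAY) -> int:
    """Blocking (max age) -> listening (forward delay) -> learning (forward delay) -> forwarding."""
    return max_age + 2 * forward_delay


print(stp_worst_case_convergence())  # 50
```

Tuning the timers down shortens this, but the state machine itself stays timer-driven, which is exactly what 802.1w moved away from.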

VXLAN/EVPN networks are routed. This means we can leverage technologies like ECMP to install multiple paths. If a link fails, convergence is almost instantaneous because another path is already present in the RIB/FIB. This will always be faster than what an L2 network can provide.
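To make the ECMP idea concrete, here is a minimal sketch of per-flow path selection: the 5-tuple is hashed so every packet of a flow uses the same uplink, and when a link fails the flow is simply hashed over the paths that are already installed. This is an illustration only; real switches use their own hardware hash functions, and the `Flow` type, `pick_next_hop` helper, and spine names are hypothetical.

```python
import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def pick_next_hop(flow: Flow, next_hops: list[str]) -> str:
    """Hash the 5-tuple so all packets of one flow take the same path (no reordering)."""
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.proto}|{flow.src_port}|{flow.dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]


# Hypothetical leaf with four equal-cost uplinks, one per spine.
paths = ["spine1", "spine2", "spine3", "spine4"]
flow = Flow("10.0.1.10", "10.0.2.20", 6, 49152, 443)
before = pick_next_hop(flow, paths)

# Link failure: no new path has to be computed, the flow is re-hashed
# over the equal-cost paths that remain installed.
remaining = [p for p in paths if p != before]
after = pick_next_hop(flow, remaining)
print(before, "->", after)
```

Plain modulo hashing is the simplest choice here, but it moves many flows when the number of paths changes; platforms often offer resilient hashing to limit that churn.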

Inefficient use of available links – An L2-based network using STP builds a tree topology, so some links must block traffic to keep the topology loop-free. Not all links can be used, and links that could have carried traffic sit idle. This can be seen as a poor use of the funds invested in the network.

In a routed network using a Clos topology, commonly called leaf and spine, there is no need to build a tree. All links can be fully utilized and installed as equal-cost paths using ECMP. This provides a better return on investment.
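A quick way to quantify the difference: a spanning tree over N switches keeps exactly N - 1 links forwarding, while a leaf and spine fabric with L leaves and S spines has L × S links that ECMP can all use. The sketch assumes one link between each leaf and spine, no leaf-to-leaf or spine-to-spine links, no MLAG, and that the L2 network is run over the same physical cabling; the fabric sizes are arbitrary examples.

```python
def fabric_links(leaves: int, spines: int) -> int:
    # Every leaf connects to every spine exactly once.
    return leaves * spines


def stp_forwarding_links(leaves: int, spines: int) -> int:
    # A spanning tree over N switches keeps exactly N - 1 links forwarding.
    return leaves + spines - 1


for leaves, spines in [(4, 2), (16, 4), (32, 8)]:
    total = fabric_links(leaves, spines)
    used = stp_forwarding_links(leaves, spines)
    print(f"{leaves} leaves x {spines} spines: {total} links, "
          f"STP forwards on {used} ({used / total:.0%}), ECMP uses all {total}")
```

The larger the fabric, the smaller the share of links a single tree can keep forwarding.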

Suboptimal forwarding – Because the topology is a tree rooted at the Root switch, there may be direct paths between switches that can’t be used, as traffic must follow the tree towards the Root.

In a VXLAN/EVPN network built on a leaf and spine topology, traffic can flow optimally because the topology is not a tree. Any two leaves are only one hop, via a spine, apart.
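The detour through the Root can be shown with a plain breadth-first search over two versions of the same hypothetical topology: the full physical topology and the tree left after STP blocks a link. The switch names, and the assumption that STP blocks the direct sw1-sw2 link, are illustrative only; the last case shows that in a leaf and spine fabric every leaf-to-leaf path crosses exactly one spine.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns the switches on the path."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    if dst not in previous:
        return []  # destination unreachable
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical triangle: sw1 and sw2 both uplink to the Root and share a direct link.
physical = [("sw1", "root"), ("sw2", "root"), ("sw1", "sw2")]
# Assume STP blocks the direct sw1-sw2 link, leaving only the tree links forwarding.
stp_tree = [("sw1", "root"), ("sw2", "root")]
print(shortest_path(stp_tree, "sw1", "sw2"))  # ['sw1', 'root', 'sw2']
print(shortest_path(physical, "sw1", "sw2"))  # ['sw1', 'sw2']

# Hypothetical leaf and spine fabric: every leaf connects to every spine.
leaves = ["leaf1", "leaf2", "leaf3"]
spines = ["spine1", "spine2"]
fabric = [(leaf, spine) for leaf in leaves for spine in spines]
print(shortest_path(fabric, "leaf1", "leaf3"))  # ['leaf1', 'spine1', 'leaf3']
```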

Lack of ECMP – There is no concept of ECMP in L2-based networks. As described before, if multiple links are available between two switches, all but one of them will be blocked. The only way to overcome this is link aggregation, for example using LACP.

We have already established that VXLAN/EVPN supports ECMP.

Network scale – 802.1Q provides 12 bits for the VLAN ID, meaning that only 4096 VLAN IDs exist (4094 usable, as 0 and 4095 are reserved). This is not enough in a large DC environment. Technologies such as Q-in-Q (802.1ad) were designed to overcome this limitation, but they were mainly used in service provider networks.
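The 12-bit limit is visible in the tag itself. The sketch below packs a 4-byte 802.1Q tag: a 16-bit TPID (0x8100) followed by the 16-bit Tag Control Information, which holds PCP (3 bits), DEI (1 bit), and the VLAN ID (12 bits). The `dot1q_tag` helper is hypothetical and only builds the tag bytes, not a full frame.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def dot1q_tag(vlan_id: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID (16 bits) + PCP (3) + DEI (1) + VID (12)."""
    if not 0 <= vlan_id < 2**12:
        raise ValueError("VLAN ID must fit in 12 bits (0-4095)")
    if not 0 <= pcp < 2**3 or dei not in (0, 1):
        raise ValueError("PCP must fit in 3 bits and DEI in 1 bit")
    tci = (pcp << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


print(2**12)                 # 4096 possible VLAN ID values
print(dot1q_tag(100).hex())  # 81000064
```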

With VXLAN, the Virtual Network Identifier (VNI) is 24 bits, supporting up to 16 million (2^24 = 16,777,216) different networks, far more than 802.1Q can provide.
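For comparison, here is a sketch that packs the 8-byte VXLAN header as laid out in RFC 7348: an 8-bit flags field with the I bit set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The `vxlan_header` helper and the example VNI are hypothetical; the sketch builds only the VXLAN header, not the outer Ethernet/IP/UDP headers.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" flag: signals that the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags (8) + reserved (24) + VNI (24) + reserved (8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # First 32-bit word: flags in the top byte, 24 reserved bits left as zero.
    # Second 32-bit word: VNI in the top 24 bits, 8 reserved bits left as zero.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


print(2**24)                      # 16777216 possible VNIs
print(2**24 // 2**12)             # 4096 times as many as 802.1Q VLAN IDs
print(vxlan_header(10100).hex())  # 0800000000277400
```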

There are of course more limitations, but these are some of the most prevalent ones. I hope you will join me in the next post, which will introduce the concepts of VXLAN and EVPN.

VXLAN/EVPN – What Are the Challenges in L2-based Networks?

9 thoughts on “VXLAN/EVPN – What Are the Challenges in L2-based Networks?”

  • August 10, 2023 at 8:58 am

    This is useful information, thank you so much.

  • August 10, 2023 at 8:11 pm

    Very well summarized. Good job!

  • August 11, 2023 at 4:18 am

    These challenges are very good, but one of the biggest challenges is L2 extension. For example, in legacy DCs you cannot use the same subnet of one VLAN on two switches unless you create a trunk link between them. If you use an L3 link instead, the two devices are L3-segmented, so you are back to using an L2 trunk link, and this leads to STP loops and port blocking.
    How do we solve this? By combining the benefits of L3 (no STP) with the benefits of L2 (subnet extension).
    Who can do this? FabricPath and VXLAN, but VXLAN is better as it supports more than 4096 VLANs and is standards-based.

    • August 11, 2023 at 6:12 am

      L2 extension is often a sign of a poor design where the problem is solved in the wrong layer. Still, in some scenarios it is required, and VXLAN/EVPN is a good solution for that. There have been many DCI technologies in the past, such as OTV, but moving to L3 and running L2 on top is definitely a better solution.

  • August 11, 2023 at 5:16 am

    The technology is great, but how do I convince an infrastructure manager to shift from a collapsed-core data center (DC) to a spine-and-leaf topology? How do I break it down so he understands the benefits? My infrastructure manager likes having top-of-rack (ToR) switches from different vendors as the A and B sides, so even if we decide to go the EVPN/VXLAN path, I may be forced to deploy it using a multi-vendor approach. This is not common at the moment.
    This is my current worry.

    • August 11, 2023 at 6:22 am

      A network should always be designed based on its requirements. There is no need to use the same topology or technologies everywhere. That said, if building a new DC, I don’t see any reason not to build it using a Leaf and Spine topology. It provides one hop between any two leaves, redundancy, fast convergence, and multipathing. Using L3 also lowers the risk of the outages that were common in the past, where one misbehaving server could take out the entire network.

      I don’t think it makes sense to build with multiple vendors, even when using standards. There is always pain when integrating, and it also means engineers must work on different platforms. If using multiple vendors, it should be in different parts of the network.

      Try to use the arguments that were described in the post and that it has become an industry standard to build your DC in this manner. Good luck!

  • August 17, 2023 at 11:08 am

    Very well explained the challenges in layer 2 networks.

    Thank you for writing it, Daniel!

