Someone asked the other day how fast STP converges depending on whether PVST+,
RPVST+ or MST is running. The usual answer is 30-50 seconds for PVST+, while
RPVST+ is fast, maybe less than a second. I thought I would
explore this and compare PVST+ with RPVST+, and also look at
PVST+ with features like UplinkFast.

This post assumes you already have a good basic understanding of STP. This
is not an introductory post on STP.

This is the topology being used:

[Figure: STP-convergence]

SW1 is the root and ports towards the routers have been configured with VLAN 23
and portfast. I will run NTP to have the clocks properly synchronized. Currently
the port roles look like this:

[Figure: STP-port-roles]

I will configure the routers in the 23.23.23.0/24 subnet and do a ping to verify connectivity.

Working fine so far. Now let’s take a look at some different failure scenarios.
We turn on logging to a buffer to avoid flooding the console. We will be looking at
spanning tree events.

What happens when the root port is shut down? In theory, when the switch detects
loss of carrier on the link, it should fall back to the BPDU stored on the alternate
port and take that port through the listening and learning states. This should take
around 30 seconds.

This is output from SW2.

The timing is almost perfect. The port spends 15 seconds each in listening and
learning before it goes to forwarding almost exactly 30 seconds after
the port was shut down.
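
To make the arithmetic explicit, here is a minimal sketch of the timeline the alternate port should follow after a direct failure. It assumes the default forward delay of 15 seconds and that the switch notices the link going down immediately. The function name `pvst_port_timeline` is just something I made up for illustration.

```python
def pvst_port_timeline(forward_delay=15.0):
    """Return (seconds_after_link_down, state) pairs for the alternate port.

    Assumption: carrier loss is detected at t=0, so listening starts right away.
    """
    states = ["listening", "learning", "forwarding"]
    # Each state after listening begins one forward delay after the previous one.
    return [(index * forward_delay, state) for index, state in enumerate(states)]


timeline = pvst_port_timeline()
for seconds, state in timeline:
    print(f"t+{seconds:.0f}s  {state}")
```

The last entry is the moment traffic flows again, which is two forward delays after the link went down.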

What happens when there is an indirect failure? The switch has to age out the stored
BPDU from the root before it accepts other BPDUs with a worse cost. This should take
around 20 seconds, since Max Age is 20 seconds by default.

This time we will simulate a passive failure by configuring a BPDU filter on SW1 towards
SW2.

This has created a bridging loop, but in this case we just want to see how long it
takes before the alternate port comes up as the root port.

So it took almost 20 seconds for the BPDU to expire. Then the port goes through
the ordinary state changes. Roughly 48.5 seconds after the filter was applied
the port went into forwarding. For passive failures when running PVST+ the
maximum recovery time should be 50 seconds.
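
As a rough sanity check on the 48.5 seconds, the sketch below works out the window the recovery time should fall in. It assumes default timers (hello 2, Max Age 20, forward delay 15) and that the stored BPDU came straight from the root with a message age of 0. The function name is hypothetical.

```python
def indirect_failure_window(max_age=20.0, forward_delay=15.0, hello=2.0):
    """Return (best, worst) seconds from the failure until forwarding under PVST+.

    Assumption: the last BPDU from the root had message age 0, so it is kept
    for the full Max Age counted from the moment it was received.
    """
    # Worst case: the last BPDU arrived just before the failure.
    worst = max_age + 2 * forward_delay
    # Best case: the last BPDU arrived almost a full hello interval before the failure.
    best = worst - hello
    return best, worst


best, worst = indirect_failure_window()
print(best, worst)  # 48.0 50.0
```

With these assumptions the lab result of about 48.5 seconds sits inside the 48 to 50 second window, which suggests the last BPDU arrived roughly 1.5 seconds before the filter was applied.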

Now let’s look at PVST+ with UplinkFast configured. The theory is that when a
root port fails, the alternate port should bypass the listening and learning
states and go directly to forwarding. Let’s try this out.

It took only 2 seconds from realizing the port was down to putting the alternate
port into forwarding. For PVST+ this is a great enhancement. What if there is
a passive error?

There is nothing to be done about the Maxage expiring but the port is
brought up after that. So instead of 50 seconds it takes about 20 seconds.
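
Putting the PVST+ cases side by side, here is a small sketch of the timer-driven part of the recovery with and without UplinkFast. It only models the default timers (Max Age 20, forward delay 15) and ignores detection and processing time, which is why the direct failure with UplinkFast shows 0 here while the lab showed about 2 seconds. `pvst_recovery` is a made-up helper, not anything from IOS.

```python
def pvst_recovery(failure, uplinkfast=False, max_age=20, forward_delay=15):
    """Return the theoretical timer-driven recovery time in seconds for PVST+.

    failure: "direct" (link down is detected) or "indirect" (BPDUs stop arriving).
    Assumption: the worst case for Max Age, and no processing overhead.
    """
    if failure not in ("direct", "indirect"):
        raise ValueError("failure must be 'direct' or 'indirect'")
    # An indirect failure first has to wait for the stored BPDU to age out.
    wait = max_age if failure == "indirect" else 0
    # UplinkFast skips listening and learning on the alternate port.
    transition = 0 if uplinkfast else 2 * forward_delay
    return wait + transition


for failure in ("direct", "indirect"):
    for uplinkfast in (False, True):
        seconds = pvst_recovery(failure, uplinkfast)
        print(f"{failure:<8} uplinkfast={uplinkfast!s:<5} {seconds}s")
```

The point of writing it this way is that UplinkFast only removes the two forward delays. The Max Age wait is untouched, which matches what we saw above.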

That’s it for PVST+. Now let’s move on to RPVST+. RPVST+ works by synchronizing
the topology and it has optimizations built in. If a port fails, it should
converge almost instantly.

It instantly fails over to the alternate port and then starts synchronizing
the topology by sending out proposals. What if there was a passive failure?
In theory, after RPVST+ misses 3 BPDUs it should realize that it needs to
start using the alternate path. Let’s try it out.

Only around 4 seconds later the topology has converged. It should take
at most 6 seconds, depending on when the last BPDU was received before the
failure.
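
The 4 to 6 second range falls out of the hello timer. Here is a minimal sketch, assuming the default hello time of 2 seconds and that the information is dropped after 3 missed BPDUs counted from the last one received. The function name is my own.

```python
def rpvst_indirect_recovery(since_last_bpdu, hello=2.0, missed=3):
    """Return seconds from the failure until RPVST+ gives up on the root port.

    since_last_bpdu: how long before the failure the last BPDU was received.
    Assumption: the port ages out `missed` hello intervals after the last BPDU.
    """
    if not 0 <= since_last_bpdu <= hello:
        raise ValueError("since_last_bpdu must be between 0 and the hello time")
    return missed * hello - since_last_bpdu


print(rpvst_indirect_recovery(0.0))  # BPDU arrived right before the failure
print(rpvst_indirect_recovery(2.0))  # failure hit just as the next BPDU was due
```

So the roughly 4 seconds seen in the lab is the lucky end of the range, where the failure happened just before the next BPDU was due.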

As you can see, it’s very important to detect carrier down. If you do detect it
and are running RPVST+ then convergence is almost immediate. So when designing your
network, try to avoid fiber converters and similar devices that won’t shut down the
RJ45 side if the optical side goes down. Designing for convergence is not just about
protocols, you also need to consider the physical infrastructure.

I hope this post has given you good insight into the convergence of STP.

10 thoughts on “Spanning tree convergence”

  • May 7, 2013 at 11:15 am

    An even nicer comparison would be with standards based MST (rather than proprietary based RPVST+)

    • May 7, 2013 at 11:27 am

      MST uses RSTP though. Not just per VLAN. So in theory the convergence should be the same. I could test if there is any difference between the CIST and the ISTs though.

    • May 11, 2013 at 8:20 pm

      Thanks Jochen! This was just labbing. I just wanted to drive the point home that convergence is just not about protocols. You need to consider it from day one when you do your design. Hope you are enjoying your new digits 🙂

  • May 29, 2013 at 9:09 am

    “There is nothing to be done about the Maxage expiring but the port is
    brought up after that. So instead of 50 seconds it takes about 20 seconds.”

    what about backbone fast ? 😉 the only piece missing in a great article

    • May 29, 2013 at 9:25 am

      Yeah, backbonefast could be used for such situations. I’m preparing a lecture on STP for a group. Once it’s done and I have presented it I might make it public.

