Someone asked the other day how fast STP converges depending on whether PVST+,
RPVST+ or MST is running. The usual answer is 30-50 seconds for PVST+, and for
RPVST+ that it’s fast, often less than a second. I thought I would explore this
and check the difference between PVST+ and RPVST+, and also try PVST+ with
features like uplinkfast.

This post assumes you already have a good basic understanding of STP. This
is not an introductory post on STP.

This is the topology being used:

[Figure: STP-convergence]

SW1 is the root, and the ports towards the routers have been configured in VLAN 23
with portfast. I will run NTP to keep the clocks properly synchronized. Currently
the port roles look like this:

[Figure: STP-port-roles]

I will configure the routers in the 23.23.23.0/24 subnet and do a ping to verify connectivity.

R2#ping 23.23.23.3

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 23.23.23.3, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 1/3/4 ms

Working fine so far. Now let’s take a look at some different failure scenarios.
We send debug output to the logging buffer so that it does not flood the console.
We will be looking at spanning tree events.

SW1(config)#logging con 6
SW1(config)#logging buff 7
SW1(config)#logging buff 32768
SW1(config)#do debug spanning-tree events
Spanning Tree event debugging is on

What happens when the root port is shut down? In theory, when the switch detects
loss of carrier on the link, it should use the BPDU stored on the alternate port
and start taking that port through the different port states. This should take
around 30 seconds.

This is output from SW2.

May  7 10:27:03.314: STP: VLAN0023 new root port Fa0/16, cost 38
May  7 10:27:18.321: STP: VLAN0023 Fa0/16 -> learning
May  7 10:27:33.329: STP: VLAN0023 sent Topology Change Notice on Fa0/16
May  7 10:27:33.329: STP: VLAN0023 Fa0/16 -> forwarding

The timing is almost perfect. The port spends 15 seconds each in listening and
learning before it goes to forwarding, almost exactly 30 seconds after
the port was shut down.
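
To double check the arithmetic, here is a small Python sketch that pulls the timestamps out of the debug lines above and works out how long the port spent in each state. The helper name to_ms is my own, and it assumes the timestamp format shown in this post and that all lines are from the same day.

```python
# Debug lines copied from the SW2 output above.
LOG = [
    "May  7 10:27:03.314: STP: VLAN0023 new root port Fa0/16, cost 38",
    "May  7 10:27:18.321: STP: VLAN0023 Fa0/16 -> learning",
    "May  7 10:27:33.329: STP: VLAN0023 Fa0/16 -> forwarding",
]


def to_ms(line):
    """Return the time of day of a debug line in milliseconds.

    Assumes the 'Mon  D HH:MM:SS.mmm:' prefix used in this post.
    """
    clock = line.split()[2].rstrip(":")          # e.g. "10:27:03.314"
    hms, millis = clock.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


t_root, t_learning, t_forwarding = (to_ms(line) for line in LOG)

# The port enters listening when it becomes the new root port.
print(f"listening: {(t_learning - t_root) / 1000:.3f} s")
print(f"learning:  {(t_forwarding - t_learning) / 1000:.3f} s")
print(f"total:     {(t_forwarding - t_root) / 1000:.3f} s")
```

Both intervals come out a few milliseconds over the 15 second forward delay.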

What happens when there is an indirect failure? The switch has to let the stored root
BPDU expire before it believes other BPDUs with a worse cost. This should take around
20 seconds, since Max Age is set to 20 seconds by default.

SW1#sh span | i Age
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
SW2#sh span int f0/13 det | i age
   Timers: message age 1, forward delay 0, hold 0

This time we will simulate a passive failure by configuring BPDU filter on SW1 towards
SW2.

SW1(config-if)#span bpdufilter enable   
SW1(config-if)#do sh clock
10:39:05.598 UTC Tue May 7 2013

This has created a bridging loop, but in this case we just want to see how long it
takes before the alternate port takes over as root port.

May  7 10:39:24.046: STP: VLAN0023 new root port Fa0/16, cost 38
May  7 10:39:24.046: STP: VLAN0023 Fa0/16 -> listening
May  7 10:39:39.053: STP: VLAN0023 Fa0/16 -> learning
May  7 10:39:54.061: STP: VLAN0023 sent Topology Change Notice on Fa0/16
May  7 10:39:54.061: STP: VLAN0023 Fa0/16 -> forwarding

So it took almost 20 seconds for the BPDU to expire. Then the port goes through
the ordinary state changes. Roughly 48.5 seconds after the filter was applied
the port went into forwarding. For passive failures when running PVST+ the
maximum recovery time should be 50 seconds: 20 seconds of Max Age plus two
forward delay intervals of 15 seconds each.
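
The numbers can be checked with a short Python sketch. The timer values are the defaults shown by "sh span" above, the clock values are copied from the outputs above, and the helper clock_ms is just something I made up for the example.

```python
MAX_AGE = 20        # seconds, default shown by "sh span" on SW1
FORWARD_DELAY = 15  # seconds, default shown by "sh span" on SW1


def clock_ms(clock):
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hms, millis = clock.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


# Worst case: wait out Max Age, then spend one forward delay
# in listening and one in learning.
worst_case = MAX_AGE + 2 * FORWARD_DELAY

filter_applied = clock_ms("10:39:05.598")  # "sh clock" on SW1
new_root_port = clock_ms("10:39:24.046")   # debug on SW2
forwarding = clock_ms("10:39:54.061")      # debug on SW2

expiry_s = (new_root_port - filter_applied) / 1000
total_s = (forwarding - filter_applied) / 1000

print(f"worst case:       {worst_case} s")
print(f"BPDU expired in:  {expiry_s:.3f} s")
print(f"forwarding after: {total_s:.3f} s")
```

Note that "sh clock" was typed after the filter was applied, so the real elapsed times are slightly longer than what this calculates.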

Now let’s look at PVST+ with Uplinkfast configured. The theory is that when a
root port fails, the Alternate port should bypass the listening and learning
states and go directly to forwarding. Let’s try this out.

SW2(config)#spanning-tree uplinkfast
May  7 10:46:37.260: STP: VLAN0023 new root port Fa0/16, cost 3038
May  7 10:46:38.249: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/13, changed state to down
May  7 10:46:39.264: %LINK-3-UPDOWN: Interface FastEthernet0/13, changed state to down
May  7 10:46:39.264: STP: VLAN0023 sent Topology Change Notice on Fa0/16

It took only 2 seconds from realizing the port was down to putting the alternate
port into forwarding. For PVST+ this is a great enhancement. What if there is
a passive failure?

SW1(config-if)#span bpdufilter enable
SW1(config-if)#do sh clock
10:51:11.870 UTC Tue May 7 2013
May  7 10:51:30.216: STP: VLAN0023 new root port Fa0/16, cost 3038
May  7 10:51:30.216: STP: VLAN0023 sent Topology Change Notice on Fa0/16

There is nothing to be done about Max Age expiring, but the port is
brought up right after that. So instead of 50 seconds it takes about 20 seconds.
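
To summarize the PVST+ cases so far, here is a rough Python model of the timer budget. It is only a sketch based on the reasoning in this post: the function name is hypothetical, and it ignores the time needed to detect link down and any processing delay in the switch.

```python
def pvst_recovery_budget(indirect, uplinkfast, max_age=20, forward_delay=15):
    """Rough upper bound in seconds for PVST+ to recover from a root port failure.

    indirect:   True if the failure is passive, so the BPDU must age out first.
    uplinkfast: True if the alternate port may skip listening and learning.
    """
    wait_for_expiry = max_age if indirect else 0
    state_transitions = 0 if uplinkfast else 2 * forward_delay
    return wait_for_expiry + state_transitions


for indirect in (False, True):
    for uplinkfast in (False, True):
        kind = "indirect" if indirect else "direct"
        feature = "uplinkfast" if uplinkfast else "plain PVST+"
        budget = pvst_recovery_budget(indirect, uplinkfast)
        print(f"{kind:8} {feature:11} {budget:2d} s")
```

The measured results fit this model: about 30 and 48.5 seconds without uplinkfast, and about 2 and 18.5 seconds with it.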

That’s it for PVST+. Now let’s move on to RPVST+. RPVST+ works by synchronizing
the topology and it has optimizations built in. If a port fails, it should
converge almost instantly.

May  7 10:56:34.421: RSTP(1): updt roles, root port Fa0/13 going down
May  7 10:56:34.421: RSTP(1): Fa0/16 is now root port
May  7 10:56:34.421: RSTP(1): syncing port Fa0/4
May  7 10:56:34.421: RSTP(1): syncing port Fa0/6
May  7 10:56:34.421: RSTP(1): syncing port Fa0/24
May  7 10:56:34.421: RSTP(23): updt roles, root port Fa0/13 going down
May  7 10:56:34.421: RSTP(23): Fa0/16 is now root port
May  7 10:56:34.438: RSTP(1): transmitting a proposal on Fa0/4
May  7 10:56:34.438: RSTP(1): transmitting a proposal on Fa0/6
May  7 10:56:34.438: RSTP(1): transmitting a proposal on Fa0/24
May  7 10:56:35.419: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/13, changed state to down
May  7 10:56:35.578: RSTP(1): transmitting a proposal on Fa0/4
May  7 10:56:35.578: RSTP(1): transmitting a proposal on Fa0/6
May  7 10:56:35.578: RSTP(1): transmitting a proposal on Fa0/24
May  7 10:56:36.434: %LINK-3-UPDOWN: Interface FastEthernet0/13, changed state to down

It instantly fails over to the Alternate port and then starts synchronizing
the topology by sending out proposals. What if there was a passive failure?
In theory, after RPVST+ misses 3 BPDUs it should realize that it needs to
start using the alternate path. Let’s try it out.

SW1(config-if)#span bpdufilter enable
SW1(config-if)#do sh clock
11:01:12.960 UTC Tue May 7 2013
May  7 11:01:16.648: RSTP(1): Fa0/13 rcvd info expired
May  7 11:01:16.648: RSTP(1): updt roles, information on root port Fa0/13 expired
May  7 11:01:16.648: RSTP(1): Fa0/16 is now root port
May  7 11:01:16.648: RSTP(1): Fa0/13 blocked by re-root
May  7 11:01:16.648: RSTP(1): syncing port Fa0/4
May  7 11:01:16.648: RSTP(1): syncing port Fa0/6
May  7 11:01:16.648: RSTP(1): syncing port Fa0/24
May  7 11:01:16.648: RSTP(1): Fa0/13 is now designated
May  7 11:01:16.648: RSTP(23): Fa0/13 rcvd info expired
May  7 11:01:16.648: RSTP(23): updt roles, information on root port Fa0/13 expired
May  7 11:01:16.648: RSTP(23): Fa0/16 is now root port
May  7 11:01:16.648: RSTP(23): Fa0/13 blocked by re-root
May  7 11:01:16.648: RSTP(23): Fa0/13 is now designated

Around 4 seconds later the topology has already converged. It should take
at most 6 seconds (3 missed BPDUs at a hello time of 2 seconds), depending
on when the last BPDU was received before the failure.
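
Here is the same kind of check for RPVST+ as a Python sketch. The hello time is the default from "sh span" above, the three missed BPDUs come from the theory just described, and the helper clock_ms is my own.

```python
HELLO_TIME = 2    # seconds, default shown by "sh span" on SW1
MISSED_BPDUS = 3  # RPVST+ expires the information after 3 missed BPDUs
MAX_AGE = 20      # seconds, what PVST+ has to wait for instead


def clock_ms(clock):
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hms, millis = clock.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


upper_bound = MISSED_BPDUS * HELLO_TIME

filter_applied = clock_ms("11:01:12.960")  # "sh clock" on SW1
info_expired = clock_ms("11:01:16.648")    # "rcvd info expired" on SW2
measured_s = (info_expired - filter_applied) / 1000

print(f"RPVST+ upper bound: {upper_bound} s")
print(f"measured:           {measured_s:.3f} s")
print(f"PVST+ Max Age wait: {MAX_AGE} s")
```

The measured value stays within the 6 second bound and is far below the 20 seconds PVST+ spends waiting for Max Age.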

As you can see, it’s very important to detect carrier down. If you do detect it
and are running RPVST+, then convergence is almost immediate. So when designing your
network, try to avoid using fiber converters and similar devices that won’t shut down
the RJ45 side if the optical side goes down. Designing for convergence is not just
about protocols, you also need to consider the physical infrastructure.

I hope this post has given you good insight into the convergence of STP.


10 thoughts on “Spanning tree convergence”

  • May 7, 2013 at 11:15 am

    An even nicer comparison would be with standards based MST (rather than proprietary based RPVST+)

    • May 7, 2013 at 11:27 am

      MST uses RSTP though. Not just per VLAN. So in theory the convergence should be the same. I could test if there is any difference between the CIST and the ISTs though.

    • May 11, 2013 at 8:20 pm

      Thanks Jochen! This was just labbing. I just wanted to drive the point home that convergence is just not about protocols. You need to consider it from day one when you do your design. Hope you are enjoying your new digits 🙂

  • May 29, 2013 at 9:09 am

    “There is nothing to be done about the Maxage expiring but the port is
    brought up after that. So instead of 50 seconds it takes about 20 seconds.”

    what about backbone fast ? 😉 the only piece missing in a great article

    • May 29, 2013 at 9:25 am

      Yeah, backbonefast could be used for such situations. I’m preparing a lecture on STP for a group. Once it’s done and I have presented it I might make it public.

