I ran into an “exciting” bug yesterday. It was seen in a 4500-X VSS pair running 3.7.0 code. When there has been a switchover meaning that the secondary switch became active, there’s a risk that information is not properly synced between the switches. What we were seeing was that this VSS pair was “eating” the packets, essentially black holing them. Any multicast that came into the VSS pair would not be properly forwarded even though the Outgoing Interface List (OIL) had been properly built. Everything else looked normal, PIM neighbors were active, OILs were good (except no S,G), routing was there, RPF check was passing and so on.

It turns out that there is a bug called CSCus13479 which requires CCO login to view. The bug is active when Portchannels are used and PIM is run over them. To see if an interface is misbehaving, use the following command:

hrn3-4500x-vss-01#sh platfo hardware rxvlan-map-table vl 200 <<< Ingress port

Executing the command on VSS member switch role = VSS Active, id = 1


Vlan 200:
l2LookupId: 200
srcMissIgnored: 0
ipv4UnicastEn: 1
ipv4MulticastEn: 1 <<<<<
ipv6UnicastEn: 0
ipv6MulticastEn: 0
mplsUnicastEn: 0
mplsMulticastEn: 0
privateVlanMode: Normal
ipv4UcastRpfMode: None
ipv6UcastRpfMode: None
routingTableId: 1
rpSet: 0
flcIpLookupKeyType: IpForUcastAndMcast
flcOtherL3LookupKeyTypeIndex: 0
vlanFlcKeyCtrlTableIndex: 0
vlanFlcCtrl: 0


Executing the command on VSS member switch role = VSS Standby, id = 2


Vlan 200:
l2LookupId: 200
srcMissIgnored: 0
ipv4UnicastEn: 1
ipv4MulticastEn: 0 <<<<<
ipv6UnicastEn: 0
ipv6MulticastEn: 0
mplsUnicastEn: 0
mplsMulticastEn: 0
privateVlanMode: Normal
ipv4UcastRpfMode: None
ipv6UcastRpfMode: None
routingTableId: 1
rpSet: 0
flcIpLookupKeyType: IpForUcastAndMcast
flcOtherL3LookupKeyTypeIndex: 0
vlanFlcKeyCtrlTableIndex: 0
vlanFlcCtrl: 0

From the output you can see that "ipv4MulticastEn" is set to 1 on one switch and 0 to the other one. The state has not been properly synched or somehow misprogrammed which leads to this issue with black holing multicast. It was not an easy one to catch so I hope this post might help someone.

This also shows that there are always drawbacks to clustering so weigh the risk of running in systems in clusters and having bugs affecting both devices as opposed to running them stand alone. There's always a tradeoff between complexity, topologies and how a network can be designed depending on your choice.

Nasty Multicast VSS bug on Catalyst 4500-X
Tagged on:             

3 thoughts on “Nasty Multicast VSS bug on Catalyst 4500-X

  • September 27, 2015 at 1:03 pm
    Permalink

    Interesting, I already got this problem without knowing that it was a bug. What did you do to enable the multicast on the second switch without upgrading and/or reloading the box ?
    Thanks !

    Reply
    • September 27, 2015 at 1:08 pm
      Permalink

      Silly me to forget adding the fix. Remove PIM from interface and then put it back again. I’ll add this to the post.

      Reply
      • September 27, 2015 at 8:26 pm
        Permalink

        Heh, thanks for this super fast answer ! :-p

        Reply

Leave a Reply

Your email address will not be published. Required fields are marked *