When deploying PIM ASM, the Designated Router (DR) role plays a significant part in how the protocol works. The DR on a segment is responsible for registering multicast sources with the Rendezvous Point (RP) and/or sending PIM Joins for the segment. Routers with PIM-enabled interfaces send out PIM Hello messages every 30 seconds by default.
After missing three Hellos, the secondary router will take over as the DR. With the standard timer value, this can take between 60 and 90 seconds depending on when the last Hello came in. That's not really acceptable in a modern network.
The first thought is to lower the PIM query interval. This can be done, and PIM supports sending Hellos at msec level. In my particular case I needed convergence within two seconds. I tuned the PIM query interval to 500 msec, meaning that the PIM DR role should converge within 1.5 seconds. The problem, though, is that these Hellos are sent at process level. Even though my routers were barely breaking a sweat CPU-wise, I would see PIM adjacencies flapping.
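The timer arithmetic above can be sketched in a few lines of Python. This is only a simplified model based on the "three missed Hellos" rule described in this post; the function name and parameters are made up for illustration. Note that the PIM-SM specification (RFC 7761) defines the default Hello holdtime as 3.5 times the Hello period, so the real expiry on a given platform may be somewhat longer than this model suggests.

```python
from typing import Tuple


def dr_failover_window(hello_interval_s: float, missed_hellos: int = 3) -> Tuple[float, float]:
    """Return (best_case, worst_case) seconds until a dead DR is noticed.

    Simplified model from the text: the neighbor is declared dead once
    `missed_hellos` consecutive Hellos have been missed. If the DR dies just
    before its next Hello was due, almost one interval has already elapsed
    (best case). If it dies right after sending a Hello, the full window has
    to pass (worst case).
    """
    worst = missed_hellos * hello_interval_s
    best = (missed_hellos - 1) * hello_interval_s
    return best, worst


default_window = dr_failover_window(30.0)  # default 30 s Hello interval
tuned_window = dr_failover_window(0.5)     # 500 msec query interval

print("default timers:", default_window)   # 60 to 90 seconds
print("tuned timers:  ", tuned_window)     # 1.0 to 1.5 seconds
```

The tuned case meets the two-second target on paper, which is why the flapping adjacencies, not the arithmetic, were the real problem.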
The answer to my problems would be Bidirectional Forwarding Detection (BFD) for PIM, but it's only supported on a very limited set of platforms. I already have BFD running for OSPF and BGP, but unfortunately my platform does not support it for PIM. The advantage of BFD is that its Hellos are more lightweight and are sent at interrupt level instead of process level. This provides more deterministic behavior than the regular PIM Hellos.
So how did I solve my problem? I need something that detects failure, so I need BFD. Hot Standby Router Protocol (HSRP) detects failures, and HSRP has support for BFD. I could then use HSRP to detect the failure and act on the Syslog message generated by HSRP. Even though I didn't really need HSRP on that segment, it helped me move the PIM DR role, for which I wrote an Embedded Event Manager (EEM) applet. A thank you to Peter Paluch for providing this idea and support 🙂
The configuration of the interface is this:
    interface GigabitEthernet0/2.100
     description *** Receiver LAN ***
     encapsulation dot1Q 100
     ip address 10.0.100.3 255.255.255.0
     no ip redirects
     no ip unreachables
     no ip proxy-arp
     ip pim sparse-mode
     standby version 2
     standby 1 ip 10.0.100.1
     standby 1 preempt delay reload 180
     standby 1 name HSRP-1
     bfd interval 300 min_rx 300 multiplier 3
BFD sends Hellos every 300 msec with a multiplier of 3, so it will detect a failure within 900 msec. The key is then to find the Syslog message that HSRP generates when it detects a failure. These messages look like this:
    %HSRP-5-STATECHANGE: GigabitEthernet0/2.100 Grp 1 state Standby -> Active
    %HSRP-5-STATECHANGE: GigabitEthernet0/2.100 Grp 1 state Speak -> Standby
It is then possible to write an EEM applet acting on this message and setting the DR priority on the secondary router.
    event manager applet CHANGE-DR-UP-RECEIVER
     event syslog pattern "%HSRP-5-STATECHANGE: GigabitEthernet0/2.100 Grp 1 state Standby -> Active"
     action 1.0 syslog msg "Changing DR on interface Gi0/2.100 due to AR is DOWN"
     action 1.1 cli command "enable"
     action 1.2 cli command "conf t"
     action 1.3 cli command "interface gi0/2.100"
     action 1.4 cli command "ip pim dr-priority 100"
     action 1.5 cli command "end"
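Since the syslog pattern is treated as a regular expression, it can be worth sanity-checking it offline before putting it on a router. The sketch below uses Python's `re` module as a stand-in; EEM uses its own regex engine, so this is only an approximation, and the helper name `triggers` is made up for illustration. It checks that the pattern fires on the Standby -> Active message and stays quiet on the Speak -> Standby one.

```python
import re

# Pattern copied from the CHANGE-DR-UP-RECEIVER applet.
UP_PATTERN = "%HSRP-5-STATECHANGE: GigabitEthernet0/2.100 Grp 1 state Standby -> Active"

# The two HSRP messages shown earlier in the post.
messages = [
    "%HSRP-5-STATECHANGE: GigabitEthernet0/2.100 Grp 1 state Standby -> Active",
    "%HSRP-5-STATECHANGE: GigabitEthernet0/2.100 Grp 1 state Speak -> Standby",
]


def triggers(pattern: str, message: str) -> bool:
    """Return True if the (unanchored) pattern matches anywhere in the message."""
    return re.search(pattern, message) is not None


results = [triggers(UP_PATTERN, m) for m in messages]

for message, fired in zip(messages, results):
    print("fires " if fired else "silent", "|", message)
```

One detail this makes visible: the `.` in `0/2.100` is a regex wildcard rather than a literal dot. That is harmless for this interface name, but it is something to keep in mind when adapting the pattern.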
When HSRP has detected the failure, the EEM applet will trigger very quickly and set the priority.
    116072: Nov 20 13:03:04.544 UTC: %HSRP-5-STATECHANGE: GigabitEthernet0/2.100 Grp 1 state Standby -> Active
    116080: Nov 20 13:03:04.552 UTC: %HA_EM-6-LOG: CHANGE-DR-UP-RECEIVER : DEBUG(cli_lib) : : CTL : cli_open called.
    116120: Nov 20 13:03:04.604 UTC: PIM(0): Changing DR for GigabitEthernet0/2.100, from 10.0.100.2 to 10.0.100.3 (this system)
    116121: Nov 20 13:03:04.604 UTC: %PIM-5-DRCHG: DR change from neighbor 10.0.100.2 to 10.0.100.3 on interface GigabitEthernet0/2.100
It took 60 msec from HSRP detecting the failure through BFD until the DR role had converged. It’s then possible to recover from a failure within a second.
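To make the "within a second" budget explicit, here is a rough Python sketch that adds the BFD detection time to the EEM reaction time taken from the log timestamps above. It assumes both routers run the same 300 msec interval and multiplier of 3, so the BFD figure is a worst-case bound rather than a measurement; the helper name `log_time` is made up for illustration.

```python
import re
from datetime import datetime, timedelta

# BFD timers from the interface configuration (assumed identical on both peers).
BFD_INTERVAL_MS = 300
BFD_MULTIPLIER = 3

# First and last log lines from the failover shown in the post.
log_lines = [
    "116072: Nov 20 13:03:04.544 UTC: %HSRP-5-STATECHANGE: GigabitEthernet0/2.100 Grp 1 state Standby -> Active",
    "116121: Nov 20 13:03:04.604 UTC: %PIM-5-DRCHG: DR change from neighbor 10.0.100.2 to 10.0.100.3 on interface GigabitEthernet0/2.100",
]


def log_time(line: str) -> datetime:
    """Extract the HH:MM:SS.mmm timestamp from a syslog line."""
    stamp = re.search(r"\d{2}:\d{2}:\d{2}\.\d{3}", line).group(0)
    return datetime.strptime(stamp, "%H:%M:%S.%f")


# Worst-case BFD detection time: interval multiplied by the multiplier.
bfd_detect_ms = BFD_INTERVAL_MS * BFD_MULTIPLIER

# Time from the HSRP state change until PIM reports the DR change.
eem_reaction_ms = (log_time(log_lines[1]) - log_time(log_lines[0])) / timedelta(milliseconds=1)

total_ms = bfd_detect_ms + eem_reaction_ms

print("BFD detection (worst case):", bfd_detect_ms, "ms")
print("HSRP event to DR change:   ", eem_reaction_ms, "ms")
print("Total:                     ", total_ms, "ms")
```

With the values from this post, the total comes in just under one second, which leaves some headroom against the two-second requirement.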
It’s also important to set the DR priority back after the network converges. We use another applet for this:
    event manager applet CHANGE-DR-DOWN-RECEIVER
     event syslog pattern "%HSRP-5-STATECHANGE: GigabitEthernet0/2.100 Grp 1 state Speak -> Standby"
     action 1.0 syslog msg "Changing DR on interface Gi0/2.100 due to AR is UP"
     action 1.1 cli command "enable"
     action 1.2 cli command "conf t"
     action 1.3 cli command "interface gi0/2.100"
     action 1.4 cli command "no ip pim dr-priority 100"
     action 1.5 cli command "end"
This works very well. There are some considerations when running EEM. Firstly, if you are running AAA then the EEM applet will fail authorization. This can be bypassed with the following command:
event manager applet CHANGE-DR-UP-RECEIVER authorization bypass
It's also important to note that the EEM applet will use a VTY line when executing, so make sure that there are VTYs available when the applet runs.
After the PIM DR role has converged, the router will send out a PIM Join and the multicast will start flowing to the receiver.