Introduction
In today’s networks, reliability needs to be high and convergence needs to be
fast. There are several ways of detecting network failure, but not all of them
scale. This post looks at different methods of detection and discusses when
each should be used.
Routing Convergence Components
Routing convergence has four main components:
- Failure detection
- Failure propagation (flooding)
- Topology/Routing recalculation
- Update of the routing and forwarding table (RIB and FIB)
With modern networking equipment and CPUs, it’s actually the first one that
takes the most time, not the flooding or the recalculation of the topology.
Failure can be detected at different layers of the OSI model: layer 1, 2
or 3. When designing the network, it’s important to weigh complexity and cost
against the convergence gain. A more complex solution, for example one with more
redundancy, could increase the Mean Time Between Failures (MTBF) but also
increase the Mean Time To Repair (MTTR), leading to lower reliability in the end.
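As a rough illustration of that trade-off, steady-state availability is commonly approximated as MTBF / (MTBF + MTTR). The sketch below uses made-up figures (assumptions, not measurements) to show how a design with a higher MTBF can still end up less available if repairs take longer.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical figures, chosen only for illustration.
simple_design = availability(mtbf_hours=10_000, mttr_hours=1)
complex_design = availability(mtbf_hours=15_000, mttr_hours=4)

print(f"simple:  {simple_design:.6f}")
print(f"complex: {complex_design:.6f}")
```

With these numbers the more complex design fails less often but still comes out behind, because each outage lasts four times as long.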
Layer 1 Failure Detection – Ethernet
Ethernet has built-in detection of link failure. This works by sending
pulses across the link to test its integrity. This is dependent on
autonegotiation, so don’t hard code links unless you must! When
running a P2P link over a CWDM/DWDM network, make sure that link failure
detection is still operational or use higher layer methods for detecting
failure.
Carrier Delay
- Runs in software
- Filters link up and down events, notifies protocols
- Most IOS versions default to 2 seconds to suppress flapping
- Not recommended to set it to 0 on an SVI
- Router feature
Debounce Timer
- Delays link down event only
- Runs in firmware
- 100 ms default in NX-OS
- 300 ms default on copper in IOS and 10 ms for fiber
- Recommended to keep it at default
- Switch feature
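To make the filtering idea concrete, here is a minimal sketch of a debounce-style filter for link-down events. It is a toy model, not how any particular platform implements it; the event list and the function name are hypothetical, and it assumes up and down events alternate.

```python
def reported_downs(events, debounce_ms):
    """Return the times (ms) at which a link-down is reported upward.

    events: chronologically sorted (time_ms, state) pairs, state is "up" or "down".
    A down is reported only if the link stays down for the whole debounce window.
    """
    reported = []
    for i, (t, state) in enumerate(events):
        if state != "down":
            continue
        # Time of the first "up" after this "down", if there is one.
        next_up = next((t2 for t2, s2 in events[i + 1:] if s2 == "up"), None)
        if next_up is None or next_up - t >= debounce_ms:
            reported.append(t + debounce_ms)
    return reported

# Hypothetical trace: a 40 ms glitch at t=1000, then a real failure at t=5000.
events = [(0, "up"), (1000, "down"), (1040, "up"), (5000, "down")]
print(reported_downs(events, debounce_ms=100))
print(reported_downs(events, debounce_ms=10))
```

A longer timer hides the short glitch but delays reporting the real failure by the same amount, which is the trade-off to keep in mind before changing the defaults.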
IP Event Dampening
If you modify the carrier delay and/or debounce timer, look at implementing IP
event dampening. Otherwise, if the timers are too aggressive, there is a risk
that every flap of an unstable interface is passed straight on to the routing protocols.
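Dampening schemes of this kind are usually penalty based: each flap adds a penalty, the penalty decays exponentially, and the interface is suppressed while the penalty is above a threshold. The sketch below is a toy model under that assumption; the class name and all values (half-life, thresholds, per-flap penalty) are illustrative, so check your platform documentation for the real parameters.

```python
class Dampener:
    """Toy model of penalty-based interface dampening (illustrative values only)."""

    def __init__(self, half_life_s=5.0, suppress=2000.0, reuse=1000.0, flap_penalty=1000.0):
        self.half_life_s = half_life_s
        self.suppress = suppress
        self.reuse = reuse
        self.flap_penalty = flap_penalty
        self.penalty = 0.0
        self.suppressed = False
        self.last_t = 0.0

    def _decay(self, now):
        # Halve the penalty once per half-life; never go back in time.
        elapsed = max(0.0, now - self.last_t)
        self.penalty *= 0.5 ** (elapsed / self.half_life_s)
        self.last_t = max(now, self.last_t)
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False

    def flap(self, now):
        """Record one flap at time `now` (seconds)."""
        self._decay(now)
        self.penalty += self.flap_penalty
        if self.penalty > self.suppress:
            self.suppressed = True

    def is_suppressed(self, now):
        self._decay(now)
        return self.suppressed

d = Dampener()
for t in (0.0, 1.0, 2.0):   # three flaps, one second apart
    d.flap(t)
print(d.is_suppressed(2.0), d.is_suppressed(12.0))
```

With these values the third flap pushes the penalty over the suppress threshold, and after ten quiet seconds it has decayed below the reuse threshold again.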
Layer 2 Failure Detection
Some layer 2 protocols, such as Frame Relay and PPP, have their own keepalives.
This post only looks at Ethernet.
UDLD
- Detects one-way connections due to hardware failure
- Detects one-way connections due to soft failure
- Detects miswiring
- Runs on any single Ethernet link, even inside a bundle
- Typically centralized implementation
UDLD is not a fast protocol. Detecting a failure can take more than 20 seconds, so
it shouldn’t be used for fast convergence. There is a fast version of UDLD that
supports sub-second detection, but it still runs centralized, so it does not scale
well and should only be used on a select few ports.
Spanning Tree Bridge Assurance
- Turns STP into a bidirectional protocol
- Ensures spanning tree fails “closed” rather than “open”
- If the port type is “network”, BPDUs are sent regardless of state
- If a network port stops receiving BPDUs, it’s put in the BA-inconsistent state
Bridge Assurance (BA) can help protect against bridging loops where a port becomes
designated because it has stopped receiving BPDUs. This is similar to the function
of loop guard.
LACP
It’s not common knowledge that LACP has built-in mechanisms to detect failures.
This is why you should never hardcode EtherChannels between switches; always
use LACP. LACP is used to:
- Ensure configuration consistency across bundle members on both ends
- Ensure wiring consistency (bundle members between 2 chassis)
- Detect unidirectional links
- Provide a keepalive for bundle members
LACP peers negotiate the requested send rate through the use of PDUs.
If keepalives are not received, a port will be suspended from the bundle.
LACP is not a fast protocol; default timers are usually 30 seconds for keepalive
and 90 seconds for dead. The timers can be tuned, but this doesn’t scale well if you
have many links because it’s a control plane protocol. IOS XR has support for
sub-second timers for LACP.
Layer 3 Failure Detection
There are plenty of protocol timers available at layer 3: OSPF, EIGRP, IS-IS,
HSRP and so on. Tuning these from their default values is common, and many of
these protocols support sub-second timers, but because they must be processed by
the RP/CPU they don’t scale well if you have many interfaces enabled. Tuning these
timers can work well in small and controlled environments though. These are
some reasons not to tune layer 3 timers too low:
- Each interface may have several protocols like PIM, HSRP, OSPF running
- Increased supervisor CPU utilization leading to false positives
- More complex configuration and bandwidth wasted
- Might not support ISSU/SSO
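A back-of-the-envelope calculation shows why the CPU load grows quickly. The sketch below counts hello packets received per second if every interface runs every protocol; the interface count and the hello intervals are assumptions picked for illustration, not figures from any platform.

```python
def hellos_per_second(interfaces: int, *hello_intervals_s: float) -> float:
    """Hello packets received per second if every interface runs every protocol."""
    return interfaces * sum(1.0 / interval for interval in hello_intervals_s)

# Hypothetical box with 200 routed interfaces and three protocols per interface.
relaxed = hellos_per_second(200, 10.0, 3.0, 30.0)   # assumed relaxed hello intervals
tuned = hellos_per_second(200, 0.25, 0.25, 0.25)    # all three tuned to 250 ms

print(f"relaxed: {relaxed:.0f} pps, tuned: {tuned:.0f} pps")
```

The same number of packets also has to be generated and sent, so the real control plane load is higher than this receive-only estimate.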
BFD
Bidirectional Forwarding Detection (BFD) is a lightweight protocol designed to
detect liveness over links/bundles. Key properties of BFD:
- Designed for sub-second failure detection
- Any interested client (OSPF, HSRP, BGP) registers with BFD and is notified when BFD detects loss
- All registered clients benefit from uniform failure detection
- Uses UDP port 3784 for control packets and 3785 for echo packets
Because any interested protocol can register with BFD, fewer packets
go across the link, which wastes less bandwidth, and the packets
are also smaller, which reduces this even more.
Many platforms also support offloading BFD to line cards which means that the
CPU does not get increased load when BFD is enabled. It also supports ISSU/SSO.
BFD negotiates the transmit and receive intervals. If router R1
wants to transmit at a 50 ms interval but R2 can only receive at 100 ms,
then R1 has to transmit at a 100 ms interval.
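The negotiation can be sketched in a few lines. The rules below follow the asynchronous mode behaviour described in RFC 5880: a router transmits no faster than the peer is willing to receive, and the detection time is the peer's multiplier times the agreed interval. The class and function names are hypothetical, and R2's transmit interval and the multiplier of 3 are assumptions added for the example.

```python
from dataclasses import dataclass

@dataclass
class BfdSession:
    desired_min_tx_ms: int    # how fast this router would like to send
    required_min_rx_ms: int   # how fast this router is able to receive
    detect_mult: int = 3      # assumed multiplier

def tx_interval_ms(local: BfdSession, remote: BfdSession) -> int:
    """Interval at which `local` actually transmits towards `remote`."""
    return max(local.desired_min_tx_ms, remote.required_min_rx_ms)

def detection_time_ms(local: BfdSession, remote: BfdSession) -> int:
    """How long `local` waits without packets before declaring `remote` down."""
    return remote.detect_mult * max(local.required_min_rx_ms, remote.desired_min_tx_ms)

r1 = BfdSession(desired_min_tx_ms=50, required_min_rx_ms=50)
r2 = BfdSession(desired_min_tx_ms=100, required_min_rx_ms=100)

print(tx_interval_ms(r1, r2))      # R1 wants 50 ms but is held to R2's receive rate
print(detection_time_ms(r2, r1))   # how long R2 waits before declaring R1 down
```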
BFD can run in asynchronous mode or echo mode. In asynchronous mode the BFD
packets go to the control plane to detect liveness. This can also be combined
with echo mode, which sends a packet with the sending router’s own IP as both
source and destination. This way the packet is looped back at the other end,
testing the data plane. When echo mode is enabled the control plane packets
are sent at a slower pace.
Link bundles
There can be challenges running BFD over link bundles. Due to CEF polarization,
control plane and data plane packets might always be sent over the same link. This
means that not all links in the bundle can be properly tested. There is
a per-link BFD mode, but it seems to have limited support so far.
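The underlying issue is that a bundle picks a member link per flow, and a single BFD session is a single flow. The sketch below uses a CRC over the flow fields as a stand-in for the hardware hash; real platforms use different hash functions and inputs, and the interface names and addresses are made up.

```python
import zlib

def pick_member(flow: tuple, members: list) -> str:
    """Pick a bundle member for a flow by hashing its fields (stand-in for a hardware hash)."""
    key = "|".join(str(field) for field in flow).encode()
    return members[zlib.crc32(key) % len(members)]

members = ["Te1/1", "Te1/2", "Te1/3", "Te1/4"]        # hypothetical bundle members
bfd_flow = ("10.0.0.1", "10.0.0.2", "udp", 49152, 3784)  # one BFD session = one flow

# Every packet of the session carries the same fields, so it hashes to the same link.
chosen = {pick_member(bfd_flow, members) for _ in range(1000)}
print(chosen)
```

However many packets the session sends, only one member link is ever exercised, so a failure on any of the other three goes unnoticed by this session.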
Event Driven vs Polled
Generally, event driven mechanisms are both faster and scale better than polling
based mechanisms of detecting failure. Rely on event driven mechanisms if you have
the option and only use polled mechanisms when necessary.
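One way to see the difference: a polled mechanism only declares a failure after several keepalives in a row have been missed, so its detection delay is tied to the interval and multiplier no matter how quickly the link actually died. The sketch below models that under simplifying assumptions (keepalives arrive exactly on schedule, no jitter); the function name is hypothetical.

```python
def polled_detection_delay_ms(fail_ms: int, interval_ms: int, mult: int) -> int:
    """Delay between a failure and its detection by a keepalive/dead-timer scheme.

    Keepalives are assumed to arrive at 0, interval, 2*interval, ...
    The dead timer restarts on every received keepalive and expires after mult * interval.
    """
    last_received_ms = (fail_ms // interval_ms) * interval_ms
    return last_received_ms + mult * interval_ms - fail_ms

# 1 s keepalives with a multiplier of 3, failure at different points in the cycle.
for fail_ms in (1000, 1500, 1999):
    print(fail_ms, polled_detection_delay_ms(fail_ms, interval_ms=1000, mult=3))
```

The delay always lands between (mult - 1) and mult intervals, whereas an event driven mechanism such as loss of carrier reports the failure as soon as the hardware sees it, plus whatever filter timer is configured.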
Conclusion
Detecting a network failure is a very important part of network convergence. It
is generally the step that takes the most time. Which protocols to use depends
on network design and the platforms used. Don’t enable all protocols on a link
without knowing what they actually do. Don’t tune timers too low unless you
know why you are tuning them. Use BFD if you can as it is faster and uses
fewer resources. For more information refer to BRKRST-2333.
Neat summary. It’d be nice to hear a bit more elaboration on Ethernet OAM and 802.1ag. AFAIK OAM and autonegotiation may be mutually exclusive (http://www.cisco.com/en/US/products/hw/routers/ps368/module_installation_and_configuration_guides_chapter09186a0080523f3c.html)
Thanks Michael!
There will be a separate post on Ethernet OAM but I need to read up on it more first.
Good brief. When the complexity increases, MTBF would be reduced, right, as opposed to MTBR or MTBM? 🙂
Thanks. I worded it a bit badly. What I meant was that by increasing redundancy MTBF could be increased, but if we add too much redundancy then the network is not deterministic and MTTR may increase. So you are taking the lab soon, right?
Correct, adding back-to-back links decreases MTBF, while adding parallel links for redundancy (optimally 2) generally increases it. Since I need to design for five nines nowadays for the customer, I am very careful about it : ). Yes, it has been scheduled for November 22 in Chicago. What about you?
Couple of quick questions
> What about layer 1 protection mechanisms?
> How do you balance between NSF and BFD as a selection, where NSF means route through vs BFD meaning route around? Does that mean go for NSF if single-homed and BFD if dual-homed?
> There are a couple of line cards I have seen that don’t understand loss of carrier, so carrier delay doesn’t work.
> While LACP is an excellent choice at a high level, I encountered lots of issues while trying to bundle two metro links between two buildings in a campus in a cross-stack environment. The links were from different providers and one of them somehow didn’t allow LACP packets to pass, so it ended up being suspended.
> Problem with multilink at some point is if your physical interfaces have different latency, for example. So buffer depth tuning is another aspect there.
Perhaps we should try to put up a dummy case study using all these and present it to the CCDE group, and let’s see what existing CCDEs have to say? 🙂