This Tuesday our primary Internet provider had a major outage. We had recently gotten a backup connection, but it had not been implemented yet, so I had to bring it up in a hurry to get us back online. I had prepared most of the config, but not all of it had been tested. To check whether the primary connection is up, I ping an IP address that is only reachable through the primary provider. The config looks like this:
ip sla 1
icmp-echo 1.1.1.1 source-interface vlan 200
timeout 500
frequency 3
This sends an ICMP echo packet every 3 seconds. If a reply is not received within 500 ms, the probe is considered to have timed out. If you want to force the traffic to originate from a specific interface, use the source-interface option.
We need to schedule the IP SLA operation to run.
ip sla schedule 1 life forever start-time now
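Once the operation is scheduled, it is worth checking that the probe is actually running and getting replies before anything depends on it. A minimal sketch, assuming an IOS release that uses the "ip sla" command set shown above (older releases name these commands differently):

```
! Run from privileged EXEC mode, not from configuration mode
! Shows the configured target, timeout and frequency of operation 1
show ip sla configuration 1
! Shows the latest return code and the success/failure counters
show ip sla statistics 1
```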
Then we need to track the SLA operation.
track 1 rtr 1 reachability
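The "rtr" keyword is the older form of this command. On later IOS releases the same tracking object is, as far as I know, written with "ip sla" instead. The sketch below also adds an optional delay so that a single lost echo does not flap the route. The timer values are only illustrative assumptions, not something from my running config.

```
! Newer syntax for the same object (assumption: your release accepts "ip sla" here)
! delay: wait 30 s before declaring the object up, 10 s before declaring it down
track 1 ip sla 1 reachability
 delay up 30 down 10
```

The delay trades failover speed for stability, so pick values that match how long an outage you can tolerate.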
Then we need to configure our routes to be dependent on the SLA operation.
ip route 0.0.0.0 0.0.0.0 2.2.2.2 track 1
ip route 0.0.0.0 0.0.0.0 3.3.3.3 254
Traffic will be sent to 2.2.2.2 while the SLA operation is successful. If it fails, the tracked static route will be removed from the routing table and the floating static route via 3.3.3.3, which has an administrative distance of 254, will be installed instead.
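To see which of the two default routes is currently installed, the tracking object and the routing table can be checked directly. A small sketch using the object and route numbers from the config above:

```
! Run from privileged EXEC mode
! Shows whether track 1 is Up or Down and when it last changed
show track 1
! Shows which next hop the default route currently points to
show ip route 0.0.0.0
```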
What I noticed while implementing this was that when the primary came back up, the routing would not switch back. I later realized why: once the backup default route was installed, the echo followed it out the backup connection and could therefore not reach its destination. I had to configure a static /32 route for 1.1.1.1 that always exits out the primary interface, so that the primary can come back up when the echo succeeds again.
For a successful implementation of IP SLA you should probe an address that is reachable only one way and that is reliable, maybe a loopback on a router or something like that. You also want to ping as far into the net as possible; just pinging your next hop won't help if there is a problem further away from you. Anyone else running IP SLA for failover?
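The /32 route for the probe target would look roughly like this. It is a sketch that assumes 2.2.2.2, the next hop of the tracked default route above, is the primary provider's next hop. It is deliberately not tied to the track object, so the echo keeps using the primary path even while the backup default route is installed.

```
! Pin the probe target to the primary path
! (assumption: 2.2.2.2 is the primary next hop, as in the tracked default route)
ip route 1.1.1.1 255.255.255.255 2.2.2.2
```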
Please note that all IP addresses have been changed from the real ones.
I use it, mainly for HSRP deployments.
http://mellowd.co.uk/ccie/?p=827
It’s pretty awesome
Different operations may have different return-code values, so only values common to all operation types are used. The below table shows the track states as per the IP SLA return code.