This Tuesday our primary Internet provider had a major outage. We had recently acquired a backup connection, but it had not been implemented yet, so I had to implement it in a hurry to get us back online. I had prepared most of the config, but not all of it had been tested.

To check whether the primary connection is up, I ping an IP address that is only reachable through the primary provider. The config looks like this.
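As a minimal sketch rather than the exact production config, such a probe on Cisco IOS might look like the following. The operation number 1 is an arbitrary choice, 22.214.171.124 is the placeholder target address used in this post, and GigabitEthernet0/0 in the comment is a hypothetical interface name. The `threshold` line is an assumption on my part: some IOS releases refuse a timeout that is lower than the threshold, which defaults to 5000 ms. Older releases use a different syntax (`ip sla monitor`), so check your version.

```
! Sketch only. Operation number 1 is arbitrary.
! To pin the source, append for example
! "source-interface GigabitEthernet0/0" (hypothetical interface name)
! to the icmp-echo line.
ip sla 1
 icmp-echo 22.214.171.124
 threshold 500
 timeout 500
 frequency 3
```

Here `timeout` and `threshold` are given in milliseconds and `frequency` in seconds.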
This sends an ICMP echo every 3 seconds. If no reply is received within 500 ms, the probe is considered to have timed out. To force the probes to originate from a specific interface, use the source-interface option.
We need to schedule the IP SLA operation to run.
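A minimal sketch of the scheduling step, assuming the operation was created with the (arbitrary) number 1 and should start immediately and run indefinitely:

```
! Start IP SLA operation 1 now and keep it running forever.
ip sla schedule 1 life forever start-time now
```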
Then we need to track the SLA operation.
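A sketch of the tracking step, again assuming operation number 1 and using track object number 1 (both arbitrary). On older IOS releases the keyword is `rtr` instead of `ip sla`, so the exact form depends on your version.

```
! Track object 1 follows the reachability state of IP SLA operation 1.
track 1 ip sla 1 reachability
```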
Then we need to configure our routes to be dependent on the SLA operation.
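A sketch of the two default routes, assuming track object 1 from the previous step. 18.104.22.168 is the primary next hop used in this post; the backup next hop 192.0.2.1 and the administrative distance 250 are hypothetical values chosen for illustration.

```
! Primary default route, installed only while track object 1 is up.
ip route 0.0.0.0 0.0.0.0 18.104.22.168 track 1
! Floating static default route via the backup provider.
! 192.0.2.1 is a hypothetical next hop; distance 250 keeps this route
! out of the routing table while the primary route is present.
ip route 0.0.0.0 0.0.0.0 192.0.2.1 250
```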
Traffic will be sent to 18.104.22.168 while the SLA operation is successful. If it fails, the tracked static route is removed from the routing table and the floating static route is installed instead.
What I noticed while implementing this was that when the primary came back up, the routing would not switch back. I later realized why: once the backup was active, the echo followed the backup default route and could therefore not reach its destination. I had to configure a static /32 route for 22.214.171.124 that always exits via the primary interface, so that the echo succeeds and the primary route is reinstalled once the primary is back up.

For a successful IP SLA implementation you should pick a target address that is reachable only one way and that is reliable, maybe a loopback on a router or something like that. You also want to ping as far into the network as possible; just pinging your next hop won't help if there is a problem further away from you. Anyone else running IP SLA for failover?
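For reference, the host route for the probe target could be sketched like this, assuming 18.104.22.168 is the primary provider's next hop as in the rest of this post. Pointing the route at the primary interface instead of the next hop is another option, depending on the link type.

```
! Host route: probes to the SLA target always leave via the primary
! next hop, even while the backup default route is active.
ip route 22.214.171.124 255.255.255.255 18.104.22.168
```

Because this route is not tied to the track object, it stays in place during a failover, which is what lets the probe detect that the primary has recovered.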
Please note that all IP addresses have been replaced and are not the real ones.