I was contacted by some people at Noction and asked if I was interested in writing about their platform, the Intelligent Routing Platform (IRP). Since it’s a product that uses Border Gateway Protocol (BGP), it piqued my interest. First, let’s make the following things clear:
- I am not being paid to write this blog post
- My opinions can’t be bought
- I will only write about a product if it’s something that interests me
BGP is the glue of the Internet (together with DNS) and what keeps everything running. BGP is a well-designed and scalable protocol which has been around for a long time. It has grown from carrying a few hundred routes to half a million routes. However, there will always be use cases where BGP might not fit your business model.
In Noction’s white paper they define the following as the network’s major challenges:
- Meeting the customer’s demand for 100% uptime
- Facing the low latency requirement
- Achieving reliable data transmission
- Avoiding network congestion and blackouts
- Achieving consistency of throughput
- Keeping bandwidth usage below predefined commit levels
- Reducing the cost and time of network troubleshooting
The product is designed for multihomed networks running BGP. You can’t optimize network flows if you don’t have any other paths to switch to. Some of these challenges apply to all networks and some may be a bit more local. As an example, in Sweden (where I live), you usually pay a fixed amount for your bandwidth and you can use it as much as you want, without having to stay below some threshold defined by the Service Provider (SP).
So why do we have these challenges? Is it BGP’s fault? BGP has a lot of knobs but they are quite blunt tools. We need to keep in mind that BGP runs between organizations and every organization must make its own decisions on how to forward traffic. This means that there is no end-to-end policy to optimize the traffic flowing across these organizations.
If history has taught us anything, it is that protocols that try to keep too much state will eventually fail or hit scaling limitations. These protocols seem very intelligent and forward thinking at first, but as soon as they hit large scale, the burden becomes too much. One such protocol is Resource Reservation Protocol (RSVP). BGP’s design is what has kept the Internet running for decades. This would not be the case if we were to inject all kinds of metrics, such as latency and jitter, for all of the Network Layer Reachability Information (NLRI). As communities have grown more popular, there could be a use case where this information is tagged onto the NLRI as communities. The question is then, how often do we update the communities?
Does this mean that these are not real challenges or that there is no room for a product like Noction IRP? No, it means that unique forwarding decisions and intelligence need to be kept at the edge of the network, not in the core. We should keep as little state as possible in the core for networks that need high availability.
How does BGP select which routes are the best? Unless a policy such as local preference says otherwise, the deciding factor is usually the length of the AS-path: the shorter the AS-path, the better. This means that the traffic will pass through as few organizations as possible. However, it gives no consideration to how much bandwidth is available, nor does it take into account the latency, jitter or availability of the path.
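To make the AS-path comparison concrete, here is a minimal sketch in Python. It is not a BGP implementation: the `Route` class, the `best_by_as_path` helper and all addresses and AS numbers are made up for illustration (documentation ranges), and every other step of the real best path algorithm is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    """Hypothetical, stripped-down view of a BGP route."""
    prefix: str
    next_hop: str
    as_path: Tuple[int, ...]


def best_by_as_path(routes: List[Route]) -> Route:
    """Pick the route with the shortest AS-path; ties go to the first one seen."""
    if not routes:
        raise ValueError("no candidate routes")
    return min(routes, key=lambda r: len(r.as_path))


# Two paths to the same prefix learned from two different providers.
candidates = [
    Route("203.0.113.0/24", "192.0.2.1", (64500, 64510, 64520)),
    Route("203.0.113.0/24", "198.51.100.1", (64501, 64520)),
]

best = best_by_as_path(candidates)
print(best.next_hop)
```

Note that nothing in this comparison knows whether the two-hop path is congested or lossy, which is exactly the gap described above.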
How does this product work? The following picture shows the key components of IRP:
There is a Collector that passively analyzes the traffic to see which prefixes are being used the most, which endpoints the traffic is flowing between and so on. The Collector can gather this data from a mirror port or, preferably, from NetFlow/sFlow.
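As a rough idea of what "which prefixes are being used the most" could mean in practice, the sketch below sums bytes per destination prefix from a list of flow records. This is only my own illustration, not how IRP does it: the `(destination IP, bytes)` record format, the `top_prefixes` function and the choice of aggregating on /24 are all assumptions.

```python
from collections import Counter
import ipaddress


def top_prefixes(flows, prefix_len=24, n=2):
    """Sum bytes per destination prefix and return the n busiest prefixes."""
    totals = Counter()
    for dst_ip, nbytes in flows:
        # strict=False masks the host bits, so 203.0.113.10/24 -> 203.0.113.0/24
        net = ipaddress.ip_network(f"{dst_ip}/{prefix_len}", strict=False)
        totals[str(net)] += nbytes
    return totals.most_common(n)


# Hypothetical flow records: (destination IP, bytes seen)
flows = [
    ("203.0.113.10", 1500),
    ("203.0.113.77", 4000),
    ("198.51.100.5", 3000),
    ("192.0.2.9", 200),
]

top = top_prefixes(flows)
print(top)
```

Only the busiest prefixes would then be worth probing actively, which keeps the amount of probe traffic down.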
The Explorer will actively probe relevant prefixes for metrics such as latency, jitter and packet loss. This data is then sent to the Core.
Based on the data received from the Explorer, the Core calculates improvements that optimize metrics such as latency, jitter and packet loss, or that select the most cost-effective path. These improvements are sent to the BGP daemon, which advertises them as BGP Updates to the edge router(s).
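The decision the Core makes can be pictured with a small sketch. I don’t know Noction’s actual algorithm, so everything here is an assumption: the `Probe` class, the `propose_improvement` function, the provider names, the rule that loss is compared before latency, and the 10 ms threshold. Jitter and cost are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Probe:
    """Hypothetical probe result for one prefix via one provider."""
    loss_pct: float
    latency_ms: float


def propose_improvement(
    current: str,
    probes: Dict[str, Probe],
    min_latency_gain_ms: float = 10.0,
) -> Optional[str]:
    """Return a better provider than `current`, or None to leave routing alone."""
    # Rank providers by loss first, then by latency.
    best = min(probes, key=lambda p: (probes[p].loss_pct, probes[p].latency_ms))
    if best == current:
        return None
    cur, new = probes[current], probes[best]
    if new.loss_pct < cur.loss_pct:
        return best  # improvement due to loss
    if cur.latency_ms - new.latency_ms >= min_latency_gain_ms:
        return best  # improvement due to latency
    return None  # gain too small to justify moving the prefix


lossy = {"provider_a": Probe(2.0, 40.0), "provider_b": Probe(0.0, 55.0)}
close = {"provider_a": Probe(0.0, 40.0), "provider_b": Probe(0.0, 35.0)}

print(propose_improvement("provider_a", lossy))
print(propose_improvement("provider_a", close))
```

The threshold is there so that a prefix is not moved back and forth for a marginal gain. In the first example the prefix would be moved because of loss, while in the second the 5 ms difference is too small and nothing is changed.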
IRP is non-intrusive and does not sit in the data path. If IRP were to fail, traffic would fall back to its normal paths, following the shortest AS-path or any other policies defined on the edge router. IRP can also act in BGP non-intrusive mode, where it will report potential improvements without applying them.
If we pause here for a second, this sounds a lot like Performance Routing (PfR), doesn’t it? So what value would IRP add that PfR does not? I see mainly two benefits here. The first is ease of use: PfR may require a more senior network administrator to set up and administer, although PfR has been greatly simplified in later releases. The other main factor is the reporting through the frontend. PfR does not give you the monitoring platform, which is not to be expected of course.
When you log in to the IRP you get a dashboard showing the status of the system, the number of prefixes being probed and how many of those prefixes are being improved.
In the demo, there are two service providers called “SwiftWay” and “FiberRing”. There is a graph to show how many prefixes have been rerouted to one of the providers.
There is also a list that shows you which prefixes were moved, their AS number and the reason for being moved. If you do a mouseover on the flash symbol, it will show if the improvement was due to loss or latency.
There are a lot of different reports that can be generated. A nice feature is that all reports are exportable to CSV, XLS or PDF.
This report shows how loss has been improved: 75% of loss was totally avoided and 25% of loss was reduced.
There are also graphs showing top usage of traffic by AS or, as in this case, the bandwidth used per provider.
The monitoring and reports are extensive and easy to use. The IRP is certainly an interesting platform and, depending on the business case, it could be very useful. The main considerations would be: How sensitive are you to loss and latency? How much does it cost you if you are not choosing the optimal path? Do you trust a system to make these decisions for you? If you do, then certainly take a look at the Noction IRP.