The first packet to a specific destination IP is process switched. With the first packet the router adds an entry to the fast switching cache, also called the route cache. The cache contains the destination IP, the data link header information and the next hop. The next packet to the same destination will hit the cache and be fast switched.
Cisco Express Forwarding (CEF)
CEF has a construct called the Forwarding Information Base (FIB) where the best routes from the Routing Information Base (RIB) end up. The FIB is used for forwarding packets. The CEF table is designed as an mtrie, which reduces the time needed to look up a destination. CEF also uses an adjacency table with the information needed to create the data link header and trailer and to select the outgoing interface. The FIB has a pointer to the adjacency table. CEF is enabled globally with the ip cef command. To turn off CEF per interface use the no ip route-cache cef command.
Uses ARP, inverse ARP and other sources to find out layer three to layer two mappings. After the lookup has been done in the FIB, the information in the adjacency table is needed to build the header and trailer for the layer two protocol in use.
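The two-step lookup described above (FIB first, then adjacency table) can be sketched as follows. This is only an illustration: the prefixes, adjacency names, interfaces and rewrite strings are made up, and a linear scan stands in for the mtrie that real CEF uses.

```python
import ipaddress

# Hypothetical FIB: prefix -> adjacency key (all values are illustrative).
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "adj-isp",
    ipaddress.ip_network("10.0.0.0/8"): "adj-core",
    ipaddress.ip_network("10.1.2.0/24"): "adj-lan",
}

# Hypothetical adjacency table: outgoing interface and layer two rewrite data.
ADJACENCY = {
    "adj-isp": ("Serial0/0", "frame-relay DLCI 100"),
    "adj-core": ("GigabitEthernet0/1", "dst 0011.2233.4455"),
    "adj-lan": ("GigabitEthernet0/2", "dst 00aa.bbcc.ddee"),
}


def cef_lookup(destination):
    """Return (prefix, interface, rewrite) for the longest matching prefix."""
    address = ipaddress.ip_address(destination)
    # Linear scan for clarity; the default route guarantees at least one match.
    matches = [prefix for prefix in FIB if address in prefix]
    best = max(matches, key=lambda prefix: prefix.prefixlen)
    interface, rewrite = ADJACENCY[FIB[best]]
    return str(best), interface, rewrite
```

Calling cef_lookup("10.1.2.7") picks the /24 entry over the /8 and the default route, then follows the pointer into the adjacency table for the rewrite information.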
Used with Frame Relay. The Data Link Connection Identifier (DLCI) is already known, but what IP address does the other side have? This is unknown and is discovered via inverse ARP or entered statically. After receiving a PVC UP message via the Local Management Interface (LMI), each router announces its IP over the Virtual Circuit (VC). If LMI is disabled nothing will trigger the inverse ARP process. Point-to-point interfaces ignore InARP information since there is only one way the traffic can be sent on a point-to-point interface.
Performance routing (PfR)
Originally named Optimized Edge Routing (OER) but Cisco added functionality and renamed it PfR. Can take into account the following information:
- Packet loss
- Response time
- Path availability
- Traffic load distribution
PfR uses a five phase operational model:
Profile – Learn the flows of traffic that have high latency or high throughput
Measure – Passively/actively collect traffic performance metrics
Apply policy – Create low and high thresholds to define in-policy and out-of-policy (OOP) performance categories
Control – Influence traffic by manipulating routing or in conjunction with PBR
Verify – Measure OOP event performance and adjust policy to bring performance in-policy
PfR learns about network performance using IP SLA and Netflow features (one or both). Requirements for running PfR:
- CEF must be enabled
- IGP/BGP routing must be configured and working
- PfR does not support MPLS
Device roles in PfR
Master Controller (MC)
Configured using the oer master command, this device is the decision maker in the cluster of PfR routers. Learns information from the border routers and makes configuration decisions for the network based on this information.
Border Router (BR)
Configured with the oer border command. Provides information to the master and accepts commands from the MC.
It is possible for a router to hold both roles. BR and MC routers maintain communication using keepalives. If keepalives from the MC stop, the BR will remove its PfR configuration and return to its pre-PfR state. More than one MC can be used for failover purposes. PfR traffic classes can be defined by IP address, protocol, port numbers or even DSCP markings.
Generic Routing Encapsulation (GRE)
Method for tunneling data from one router to another. Can be used to tunnel multicast and other protocols. The tunnel destination address must be known over something that is not the tunnel itself like a static route.
Address Resolution Protocol, used to discover the layer two address when the IP address is already known. Uses EtherType 0x0806, compared to IPv4 which uses 0x0800. At layer two an ARP request is a broadcast since the MAC address of the destination device is unknown; this means that the destination MAC is FF:FF:FF:FF:FF:FF. In the ARP packet itself the target MAC address is set to 00:00:00:00:00:00. The device receiving the request will (hopefully) answer with a reply that carries its MAC address in the sender MAC address field of the ARP packet.
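To make the field values above concrete, here is a minimal sketch that builds the bytes of an ARP request frame. The function name and the example addresses are illustrative; the frame is only constructed, not sent, and the FCS and minimum-size padding are left out.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
BROADCAST_MAC = bytes.fromhex("ffffffffffff")
UNKNOWN_MAC = bytes(6)  # 00:00:00:00:00:00, the target MAC in a request


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Build an Ethernet frame carrying an ARP request (no FCS or padding)."""
    ethernet_header = BROADCAST_MAC + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6,        # hardware address length
        4,        # protocol address length
        1,        # opcode: request
        sender_mac,
        socket.inet_aton(sender_ip),
        UNKNOWN_MAC,
        socket.inet_aton(target_ip),
    )
    return ethernet_header + arp_payload


frame = build_arp_request(bytes.fromhex("001122334455"), "10.1.1.100", "10.1.1.1")
```

The first six bytes of the frame are the broadcast destination, bytes 12 and 13 carry the EtherType, and the target MAC inside the ARP payload is all zeroes.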
Uses the same message types as regular ARP. Can be used together with other methods to force traffic to go through a router even if hosts are in the same subnet, which is useful in an FTTH/ETTH scenario. If a host has the IP 10.1.1.100/8 with a GW of 10.1.1.1 and wants to send a packet to 10.1.2.100, they are in the same network according to the subnet mask, but the router with IP 10.1.1.1 might have /24 masks on these subnets. Unless proxy ARP is enabled the traffic will never reach its destination since the first host won’t know what MAC address to use as a destination.
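The mask mismatch in this example can be checked with a few lines. This is a sketch using the addresses from the text; the helper name on_link is made up for illustration.

```python
import ipaddress


def on_link(source_interface, destination):
    """True if the source considers the destination to be on its own subnet."""
    network = ipaddress.ip_interface(source_interface).network
    return ipaddress.ip_address(destination) in network


host_view = on_link("10.1.1.100/8", "10.1.2.100")    # host uses a /8 mask
router_view = on_link("10.1.1.1/24", "10.1.2.100")   # router uses a /24 mask
```

The host believes the destination is directly reachable and ARPs for it, while the router sees it as a different subnet. Proxy ARP bridges that gap by letting the router answer the ARP request with its own MAC.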
Uses same messages as ARP but is used by hosts to discover their IP address. The host will broadcast a RARP request with a sender IP of 0.0.0.0. A RARP server has to be present on the local subnet and it has mappings of MAC addresses to IP addresses. The server will reply with the IP address that the host should use.
Partly designed to be an improvement of RARP. Encapsulated in UDP and uses different kind of messages. With a correct configuration on a router messages can be forwarded to a centrally placed server instead of using locally placed servers. Supports more information in reply like default gateway, DNS server, subnet mask and an address to a boot (image) server. Still has the burden of mapping MAC addresses to IP addresses.
The next step in dynamic addressing. Supports sending pretty much any information needed to a host, including IP, gateway, subnet mask, DNS and custom options. Servers are most often centrally located and rely on DHCP relay. On a Cisco router DHCP relay is configured with the ip helper-address command. When the router receives a broadcast on the LAN from a host trying to find out its IP, it will change the destination IP (255.255.255.255) to the IP of the DHCP server. It will also set its own IP in the gateway IP address field (GIADDR).
Hot Standby Router Protocol (HSRP)
- Cisco proprietary
- Virtual IP and virtual MAC address active on the active router
- Default hello interval of three seconds and hold time of 10 seconds
- Highest priority will win (0-255), preempt not enabled by default
- Supports tracking
- Up to 255 groups per interface
- Uses virtual MAC of 0000.0C07.ACxx where xx is the group number in hex
Virtual Router Redundancy Protocol (VRRP)
- Open standard, very similar to HSRP
- Uses the virtual MAC 0000.5E00.01xx where xx is the group number in hex
- Uses preemption by default
- VRRP can use the interface IP as the VRRP IP which means only two addresses are needed instead of three
Gateway Load Balancing Protocol (GLBP)
GLBP is a Cisco proprietary protocol. The Active Virtual Gateway (AVG) assigns each router in the group a virtual MAC of the form 0007.B400.xxyy where xx is the group number and yy is the identifier for the router.
When a host ARPs for its default gateway the AVG will respond with the MAC of one of the virtual routers, which leads to load balancing.
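The virtual MAC formats listed for HSRP, VRRP and GLBP can be derived from the group number as in the sketch below. The function names are illustrative, and the sketch assumes group and forwarder numbers that fit in one byte, which is what the xx and yy placeholders in these notes imply.

```python
def _dotted(hex_digits):
    """Format 12 hex digits in Cisco dotted notation, e.g. 0000.0c07.ac0a."""
    return ".".join(hex_digits[i:i + 4] for i in range(0, 12, 4))


def hsrp_mac(group):
    # 0000.0C07.ACxx, xx = group number in hex
    return _dotted("00000c07ac{:02x}".format(group))


def vrrp_mac(group):
    # 0000.5E00.01xx, xx = group number in hex
    return _dotted("00005e0001{:02x}".format(group))


def glbp_mac(group, forwarder):
    # 0007.B400.xxyy, xx = group number, yy = router identifier
    return _dotted("0007b400{:02x}{:02x}".format(group, forwarder))
```

For example, hsrp_mac(10) yields 0000.0c07.ac0a, since group 10 is 0a in hex.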
Network Time Protocol
Used to synchronize time for a host/router/server. Will most often run in client mode but a router can also be an NTP server. Uses the concept of stratum to indicate how accurate a time source is; a lower stratum is better. Stratum one time sources are very accurate and most of them are atomic clocks.
Simple Network Management Protocol (SNMP)
Used to discover status and information for routers/switches/servers. Version 2c is the most commonly used. SNMP v2c is SNMP version two with version one authentication (clear text community strings). Uses UDP for transport, port 161 for requests and port 162 for SNMP traps. Cisco devices can send traps when something goes down, like an interface failing. SNMP uses Management Information Bases (MIBs) to access the information; to request information from a device the OID is specified in the request. There is also a special Remote Monitoring MIB (RMON) which is used to get interface statistics and information about flows.
SNMP version 3
Supports authentication and encryption
Uses MD5 and SHA for authentication and DES for encryption
Cisco devices do not store log messages in NVRAM by default; the logging buffered command keeps them in a memory buffer on the device instead
Uses UDP port 514 by default
Most often used to send syslog to a remote device which collects syslog from all devices
Web Cache Communication Protocol (WCCP)
Used to ease pressure on WAN links and optimize them. Redirects traffic to a content engine which has a cache. Uses UDP port 2048. Up to 32 content engines can communicate with a single router; if more than one content engine is present, the one with the lowest IP will become the lead engine. With WCCPv1 only one router can redirect the traffic for the content engines but in WCCPv2 multiple routers and content engines can be configured in a service group. WCCPv1 can only support port 80 but v2 supports other protocols as well.
- Supports TCP and UDP other than port 80, like FTP, video and telephony
- Supports multicast
- Supports multiple routers (up to 32 per cluster)
- Can use MD5 for security
- Provides load distribution
Can be used to measure delay, jitter, packet loss and other parameters. Configured with the ip sla monitor command. The type of monitor and the lifetime need to be specified.
Used to monitor traffic levels and can be used to look for DDOS. Terms used in Netflow:
Records – A set of predefined and user-defined fields like the source IP and destination IP or ports for UDP/TCP.
Flow monitors – Applied to an interface, include records, a cache and optionally a flow exporter
Flow exporters – Export the cached flow information to an outside system, like a netflow collector
Flow sampler – Reduce the load by only sampling packets, such as one in every 1000 packets
Router IP traffic Export (RITE)
Used to export IP packets to a VLAN or LAN interface for analysis, like an IDS. Can sample packets in same way as Netflow. Redirects packets to a MAC address.
Embedded Event Manager (EEM)
Designed to make life easier for administrators by tracking and classifying events that take place on a router. It also provides notification for those events. EEM can be used to:
- Monitor SNMP objects
- Monitor counters
- Screen syslog messages for a pattern match (using regexp)
- Screen CLI input (using regexp)
Actions that EEM can take:
- Send an email
- Reload router
- Generate SNMP traps
- Execute IOS command
Can be used to monitor interface usage or CPU usage. Can warn if CPU rises more than x % in 60 seconds or if CPU has gone over 80%. Can also set falling thresholds.
Secure Shell (SSH)
Requires some parameters to work:
- Hostname configured
- Domain name has been set
- Generate RSA keys
- Transport input allows SSH
Electing the root
Only one switch can be the root bridge. From the start all switches announce themselves as the root. When a switch hears a superior BPDU it stops announcing itself as the root and instead forwards the superior BPDUs. The switch with the lowest priority will be elected the root; the priority can range from 0 to 65535 where lower is better. If there is a tie in priority the lowest MAC address decides which bridge becomes the root.
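The election rule above boils down to comparing (priority, MAC) pairs and taking the lowest. A minimal sketch, assuming every MAC is written in the same zero-padded notation so that string comparison matches numeric comparison; the function name and sample values are illustrative.

```python
def elect_root(bridges):
    """bridges: list of (priority, mac) tuples. Lowest priority wins, then lowest MAC."""
    return min(bridges, key=lambda bridge: (bridge[0], bridge[1].lower()))


switches = [
    (32768, "0011.2233.4455"),
    (32768, "0011.2233.0001"),
    (36864, "0000.0000.0001"),
]
root = elect_root(switches)
```

Here two switches tie on priority 32768, so the one with the lower MAC becomes root even though the third switch has the lowest MAC overall.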
Electing a root port
The port with the lowest cost to the root will be elected the root port. The switch adds its incoming cost on an interface when receiving BPDU hellos. If there is a tie in cost these are the tie breakers.
1. Pick the lowest value of the forwarding switch’s bridge ID
2. Use the lowest port priority of the neighboring switch.
3. Use the lowest internal port number of the forwarding switch.
Note that if multiple links exist between two switches the ID will be the same and port priority may be the same but the port number will always differ.
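The root port decision, including the three tie breakers, can be expressed as one ordered comparison. The sketch below uses a hypothetical Candidate record per received hello; the field names are chosen for illustration and the root cost is assumed to already include the local incoming port cost.

```python
from collections import namedtuple

# One entry per port on which a root hello is received.
Candidate = namedtuple(
    "Candidate",
    "local_port root_cost sender_bridge_id sender_port_priority sender_port_number",
)


def elect_root_port(candidates):
    """Lowest cost wins; ties fall through bridge ID, port priority, port number."""
    best = min(
        candidates,
        key=lambda c: (
            c.root_cost,
            c.sender_bridge_id,
            c.sender_port_priority,
            c.sender_port_number,
        ),
    )
    return best.local_port


# Two parallel links to the same neighbour plus one more expensive path.
a = Candidate("Gi0/1", 19, (32768, "0011.2233.4455"), 128, 1)
b = Candidate("Gi0/2", 19, (32768, "0011.2233.4455"), 128, 2)
c = Candidate("Gi0/3", 38, (4096, "0000.0000.0001"), 128, 1)
```

With candidates a and b the cost, bridge ID and port priority are all equal, so the sender's port number is the only field left to break the tie.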
Electing a designated port
For every segment there can only be one designated port and one designated switch. The switch that sends a hello with the lowest path cost will be the designated switch, and its port on that segment becomes the designated port. If there is a tie the same tie breakers as for electing a root port will be used.
STP normal behaviour
The root switch generates hello packets every two seconds. Each non root switch receives the hello on its root port. Each switch updates and forwards the hello out of its designated ports. On each blocked port the switch will receive a copy of the hello from the designated switch on the segment. Hellos are not forwarded out blocking ports.
Changes in the topology
If there is a change in the topology the switch needs to notify the other switches about the change. It will send a TCN BPDU out its root port and repeat this message every hello time until acknowledged. The next switch receiving the BPDU will send back an acknowledgement via the next forwarded hello message by setting the Topology Change Acknowledgement (TCA) bit. Eventually the TCN will reach the root, which will then set the TC flag on its next hellos. When the switches receive hellos with TC set they know they should age out their Content Addressable Memory (CAM) tables. This takes 15 seconds by default (forward delay timer).
802.1D port states
Taking a port from blocking to forwarding takes between 30 and 50 seconds. If there is an indirect failure, max age has to expire first (20 seconds). After that the port will be listening for 15 seconds (forward delay); if it hears no BPDUs it will move to learning and stay there for 15 seconds. Finally the port will be forwarding. The switch doesn’t learn any MAC addresses until it is in the learning state.
Making a switch the root
The switch that should become the root can be configured with spanning-tree vlan vlan-id root primary. If the current root has a higher priority than 24576 the switch sets 24576 as its priority. If the current root has a lower value than 24576 the switch sets its priority to 4096 lower than the current value. Note that the value cannot be zero when using this command. If the current root has a priority of 4096 this command will fail. Setting the priority to zero manually is a safer bet to make sure the root doesn’t change. This command can also be used with the secondary option, which sets the priority to 28672. The second best switch might already have a lower priority than this but there is no way of knowing this from the show spanning-tree output.
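The priority chosen by the root primary option follows directly from the rules above. A minimal sketch with a made-up function name; the text does not say what happens when the current root sits at exactly 24576, so the sketch assumes that case is treated like a lower value and 4096 is subtracted.

```python
def root_primary_priority(current_root_priority):
    """Priority the switch would pick to take over as root, per the rules above."""
    if current_root_priority > 24576:
        return 24576
    # Assumption: a current root at exactly 24576 also needs to be undercut.
    new_priority = current_root_priority - 4096
    if new_priority <= 0:
        # The command never sets priority zero, so it fails here.
        raise ValueError("root primary cannot go below 4096")
    return new_priority
```

A current root at the default 32768 results in 24576, while a current root at 8192 results in 4096. A current root at 4096 raises the error, matching the failure case described in the text.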
Spanning tree enhancements
Portfast – Immediately transitions a port into forwarding mode. Should be set on end user ports. Make sure no switches will ever connect to this port.
Uplinkfast – Used on access layer switches with multiple uplinks to distribution/core. If RP is lost immediately switch to other port as RP and start forwarding. Also notifies other switches to flush their CAM tables.
Backbonefast – Used to detect indirect failures, usually in the core. Avoids waiting for the maxage timer to expire, queries the switch attached to its RP.
These actions are taken when enabling uplinkfast:
- Increases the bridge priority of the switch to 49152
- Increases the port costs by 3000
- Tracks alternate RPs, which are ports in which root hellos are being received.
When a failure has occurred the switch with uplinkfast sends frames with its locally learned MAC addresses as the source to the multicast destination 0100.0CCD.CDCD. This forces the upstream switches to relearn the MAC addresses.
When backbonefast is used (it should be enabled everywhere if used) and a hello goes missing, the local switch asks its upstream switch if there is a failure by using a Root Link Query (RLQ). If the upstream has a failure it can reply to the local switch, which can now converge to another port without waiting for maxage to expire.
Used for load balancing and redundancy. Multiple physical links are bundled into one logical link, and STP will see the bundle as one logical port. Can load balance on different fields like source and destination MAC address, source and destination IP and layer four port numbers. Can be hardcoded to form a portchannel or use PAgP or LACP. PAgP is Cisco proprietary and LACP is an IEEE standard (802.3ad). PAgP uses the modes auto and desirable (same as DTP) and LACP uses active and passive. To be able to form a portchannel some conditions must be met:
- Same speed and duplex on ports
- If not trunking use the same access VLAN
- If trunking use the same trunk type, allowed VLANs and native VLAN
- On a single switch port costs per VLAN must be the same
- Must not be a destination port of a SPAN session
Rapid Spanning Tree
Defined in IEEE 802.1W
Waits for only three missed hellos on RP before reacting
Fewer portstates, uses only discarding, learning and forwarding
Standardization of portfast, uplinkfast and backbonefast
Allows the use of backup RP when a switch has multiple links connected to the same shared segment.
RSTP link types
Point-to-point – Connects to another switch. Full duplex links are treated as point-to-point.
Shared – The link is shared, connected to a hub or using half duplex.
Edge – Connects a switch to a single end user device.
RSTP port roles
RP – Same as in 802.1D
DP – Same as in 802.1D
Alternate port – Same as in uplinkfast, alternate port to RP
Backup port – Backup port for DP, can take over for the DP if the DP fails
Multiple Spanning Tree
Specified in IEEE 802.1S. Allows multiple instances of spanning tree to run (like RSTP) but can have several VLANs mapped to every instance. Relies on RSTP for convergence. A group of switches that use MST is called an MST region. To be part of the same region some parameters must match:
- Globally enable MST with the spanning-tree mode mst command
- Set the name for the region with the name command
- Set a revision number with the revision command
- Map VLANs to the different instances
These parameters must be identical on the switches in the same region. MST can connect to non MST switches and to the outside world the MST region will be looked at as if it was one switch. MST uses an Internal Spanning Tree (IST) to communicate with the outside switches and ensure the link between is loop-free.
Protecting the spanning tree
There are several ways to protect the spanning tree from choosing the wrong root or from loops that form on end user ports.
BPDU guard – Enabled on ports where switches never will connect (end user ports). If a BPDU is received on the port the port is put into error-disabled state. The port will not recover until the port is shutdown and then no shutdown unless error recovery has been configured.
Root guard – Protects from choosing the wrong RP, which could happen by accident or when a rogue switch has been connected. If a superior BPDU is received the port is put in the root-inconsistent state and will recover when the superior BPDUs cease.
UDLD – UniDirectional Link Detection is used to detect unidirectional links which can lead to loops and loss of network connectivity. UDLD has a normal mode and an aggressive mode. The normal mode can detect misconnected fibre strands but cannot detect unidirectional links where interfaces are connected correctly.
Loop-guard – When BPDUs are no longer received on a port instead of going into forwarding mode the port ends up in a loop-inconsistent mode.
Commonly used in SP networks to put users in a common subnet without direct forwarding of packets between customers in the same VLAN. Enforces security by forcing traffic to go through a router instead of being switched locally. There are three different types of VLANs that can be used: primary VLAN, community VLAN and isolated VLAN. The primary VLAN can talk to all the other VLANs. Community VLANs can talk to the primary VLAN and to others in the same community VLAN. The isolated VLAN can only talk to the primary VLAN.
VLAN Trunking Protocol
VTP is used for provisioning VLANs to switches in the same VTP domain. Switches can be either servers, clients or transparent. Servers are responsible for sending the VLANs to the clients; VLANs can be created on the servers but not on the clients. The clients receive VLANs from the servers. Switches that are in transparent mode only forward VTP messages, they do not use the information contained within. Transparent switches can create VLANs locally.
VTP uses a revision number to keep track of changes in the database. When a VLAN is added, modified or deleted the revision number increases by one. A higher revision number indicates a newer database. Under the right circumstances it is possible that a client can originate an update and if it has a higher revision number than the servers all the VLAN information will be replaced. This is the major flaw with VTP and the reason why most engineers stay away from it.
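The revision number comparison that makes this overwrite possible can be sketched as below. The dictionary layout and function name are illustrative only and do not reflect the actual VTP message format.

```python
def apply_vtp_advertisement(local, advertisement):
    """Return the VLAN database a server or client keeps after an advertisement."""
    same_domain = advertisement["domain"] == local["domain"]
    newer = advertisement["revision"] > local["revision"]
    if same_domain and newer:
        # The whole database is replaced, not merged.
        return dict(advertisement)
    return local


production = {"domain": "LAB", "revision": 5, "vlans": [1, 10, 20, 30]}
old_lab_switch = {"domain": "LAB", "revision": 9, "vlans": [1]}
result = apply_vtp_advertisement(production, old_lab_switch)
```

In this example a switch with a higher revision number but only VLAN 1 wipes out VLANs 10, 20 and 30, which is the failure scenario the paragraph above warns about.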
VLANs on trunk
Allowed – VLAN is allowed, can be added or removed with switchport trunk allowed vlan add/remove
Allowed and active – Allowed on trunk and VLAN exists in configuration, if PVST+ is used STP is active for VLAN
Active and not pruned – Same as “allowed and active” but removes VTP pruned VLANs
Switchport mode trunk – Sets the interface to always trunk but DTP is still active
Switchport nonegotiate – Disable sending of DTP frames
Switchport mode dynamic desirable – Trunk if other end is set to trunk, desirable or auto
Switchport mode dynamic auto – Trunk if other end is set to trunk or desirable
Uses an eight byte header. Common method for DSL access earlier but not widely spread any longer (at least not in Sweden). Assign the outside interface to a dial pool with pppoe-client dial-pool-number 1 and use the command pppoe enable. Create the interface dialer 1 and set IP address negotiated to receive IP from ISP. Set the encapsulation to PPP and configure authentication if needed. Create the dialer pool 1 and assign dialer-group 1 to it. Use a dialer-list to specify what traffic gets to activate the dialer interface. The static default route should point to the dialer interface.
RJ 45 pinouts
10BASE-T and 100BASE-TX use pairs two and three, gigabit Ethernet uses all four pairs.
Pinout for straight cable: 1-1;2-2;3-3;6-6
Pinout for crossover cable: 1-3;2-6;3-1;6-2
A standard PC transmits on pins one and two and receives on pins three and six. A switchport is the opposite. If two alike devices are connected a crossover cable should be used, although auto-MDIX is standard today.
Cisco switches can detect the speed of a link through Fast Link Pulses (FLP) even if autonegotiation is disabled, but the duplex cannot be detected and this means that half duplex must be assumed. This is true for 10BASE-T and 100BASE-TX. Gigabit Ethernet uses all four pairs in the cable and can only use full duplex mode of operation. Also note that for gigabit Ethernet autonegotiation is mandatory although it is possible to hardcode speed and duplex.
Ethernet uses Carrier Sense Multiple Access/Collision Detection (CSMA/CD). Before a client can send a frame it listens to the wire to see that it is not busy. It sends the frame and listens to ensure a collision has not occurred. If a collision occurs all stations that sent a frame send a jamming signal to ensure that all stations recognize the collision. The senders of the original collided frames wait for a random amount of time before sending again.
Frames that were meant to be sent but were paused because frames were being received at the moment. If in half duplex sending and receiving can not occur at the same time.
Collisions that are detected while the first 64 bytes are being transmitted are called collisions and collisions detected after the first 64 bytes are called late collisions.
Provides synchronization and signal transitions to allow proper clocking of the transmitted signal. Consists of 62 alternating ones and zeroes and then ends with a pair of ones.
I/G bit and U/L bit
The I/G bit is the least significant bit of the first (most significant) byte of the MAC address, which makes it the first bit transmitted on the wire. If set to zero it is an Individual (I) address and if set to one it is a Group (G) address. IPv4 multicast at layer two always sends to an address beginning with 01.00.5E, which means that the G bit is set. The next bit of the same byte is the U/L bit, which indicates whether it is a Universally (U) administered address or a Locally (L) assigned address. If it is a MAC address set by a manufacturer this should be set to zero.
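Both bits can be read from the first byte of an address. The sketch below assumes the usual written (canonical) notation, in which the I/G bit is the low-order bit of the first byte and the U/L bit is the bit next to it; the function name is illustrative.

```python
def mac_flags(mac):
    """Return the I/G and U/L bits of a MAC written with '.', ':' or '-' separators."""
    digits = mac.replace(".", "").replace(":", "").replace("-", "")
    first_byte = int(digits[:2], 16)
    return {
        "group": bool(first_byte & 0x01),  # I/G bit: 1 = group address
        "local": bool(first_byte & 0x02),  # U/L bit: 1 = locally assigned
    }
```

An address starting with 01 such as 01:00:5e:01:02:03 has the group bit set, a burned-in address such as 0000.0c07.ac01 has neither bit set, and the broadcast address has both.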
SPAN and RSPAN
SPAN and RSPAN are used to mirror traffic. The source of traffic can be a VLAN, a switchport or a routed port. Traffic can be mirrored from both rx and tx or just one of them. SPAN sends the traffic to a local destination port, RSPAN sends the traffic to an RSPAN VLAN which is used to transfer the traffic to its destination. Note that some layer two frames are not sent by default, including CDP, VTP, DTP, BPDU and PAgP; to include these use the command encapsulation replicate. SPAN is configured with the monitor session command.
- Packets can be sent through the hardware queue without interrupting the CPU
- Always uses FIFO logic
- Cannot be affected by IOS queuing tools
Class Based Weighted Fair Queuing
- Every class (queue) gets a defined percentage/amount of bandwidth
- If a class does not use all its bandwidth this is distributed across the other classes
Max reserved bandwidth
Is by default 75%. Can be set by user. If interface has 1 Mbit, 750 kbit will be available and 250 kbit reserved. Bandwidth can be reserved in percentages with bandwidth percent and/or bandwidth remaining percent.
Low Latency Queuing, the low latency queue is a priority queue and the packets in this queue get sent first (usually voice). The LLQ has a built-in policer so the guaranteed amount of bandwidth for the queue is also the maximum amount of bandwidth. QoS as always is only active when there is congestion. If there is no congestion the LLQ can use any available bandwidth just as any other queue.
Queuing only occurs when there is congestion. IOS considers the interface congested when the TX ring is full, which might occur before the line rate of the interface is reached.
Occurs when the queue is full and has no more room for packets. The packets that come in last (tail) are dropped. Most sessions are TCP, which means that the sending rate drops when packets get dropped. Performance can be improved by dropping random packets before the queue is full. This can be done by using WRED.
Weighted Random Early Detection
When the average queue depth is below the minimum threshold no packets are dropped. When it is between the minimum threshold and the maximum threshold packets are dropped at a linearly growing rate. When the maximum threshold has been reached full drop occurs. The Mark Probability Denominator (MPD) decides how many packets are dropped just before the maximum threshold. If set to ten, one in ten packets will be dropped at that point.
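The drop behaviour can be written as a small function. This is a sketch under the assumption that the input is the average queue depth and that the drop probability ramps linearly from zero at the minimum threshold up to 1/MPD just below the maximum threshold; the function and parameter names are illustrative.

```python
def wred_drop_probability(avg_depth, min_threshold, max_threshold, mpd):
    """Drop probability for one traffic profile at a given average queue depth."""
    if avg_depth < min_threshold:
        return 0.0                      # no drops below the minimum threshold
    if avg_depth >= max_threshold:
        return 1.0                      # full drop once the maximum is reached
    ramp = (avg_depth - min_threshold) / (max_threshold - min_threshold)
    return ramp / mpd                   # linear growth towards 1/MPD
```

With thresholds of 20 and 40 packets and an MPD of 10, an average depth of 30 is halfway up the ramp, so one packet in twenty is dropped.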
Modified Deficit Round Robin
This queuing mode serves packets in a round robin way. It has support for a priority queue, which can be served in strict mode or in alternate mode. If using strict mode there is a risk of starvation of the other queues. If alternate mode is used the priority queue is served in between the other queues, which means no starvation but more jitter and latency for the prioritized packets. Uses a Quantum Value (QV) to decide how many bytes to send for each queue every cycle. If too many bytes have been taken in one round this is a deficit and fewer bytes will be sent the next round. Over time this gives every queue an accurate share of bandwidth.
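The quantum and deficit mechanism can be illustrated with a small simulation. This is a simplified sketch without the priority queue: it assumes a queue may keep sending while its deficit counter is positive, so the last packet of a round can overshoot and the overshoot is carried into the next round. Names and packet sizes are illustrative.

```python
from collections import deque


def deficit_round_robin(queues, quantums, rounds):
    """Serve queues of packet sizes (bytes); return the send order as (queue, size)."""
    deficits = [0] * len(queues)
    sent = []
    for _ in range(rounds):
        for index, queue in enumerate(queues):
            if not queue:
                deficits[index] = 0          # an empty queue keeps no credit
                continue
            deficits[index] += quantums[index]
            while queue and deficits[index] > 0:
                size = queue.popleft()
                deficits[index] -= size      # may go negative: that is the deficit
                sent.append((index, size))
    return sent


order = deficit_round_robin([deque([1500] * 4)], [1000], 3)
```

With a quantum of 1000 bytes and 1500-byte packets, the queue sends one packet in each of the first two rounds, running up a deficit of 1000 bytes, and must sit out the third round to pay it back. Over three rounds it has sent 3000 bytes, exactly three quantums.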
Catalyst 3560 queuing
Has support for both ingress and egress queuing. Two ingress queues are supported, of which one can be configured as a priority queue. Uses Shared Round Robin (SRR) to schedule the packets being sent. Bandwidth for each queue is guaranteed but not limited; if other queues are empty that bandwidth may be used.
Queue two is priority queue
Gets 10 percent of bandwidth
CoS 5 traffic gets placed into queue two
Can use shared or shaped round robin, shared can use excess bandwidth when queues are not full but shaped only uses the configured amount of bandwidth.
Resource Reservation Protocol (RSVP) is a protocol that reserves bandwidth through the entire path that the packets take. The path is unidirectional. Uses PATH messages to setup the path and RESV messages to reserve the bandwidth needed.
Interfaces can only send at line rate. To send traffic “slower”, traffic is sent during shorter periods of time. To halve the bandwidth, traffic can be sent only half of the time. Cisco uses a time interval (Tc) to define the time period. Every Tc an amount of committed burst (Bc) can be sent. Excess burst (Be) is the number of bits that can be sent in excess of Bc.
Tc = Bc / shaping rate
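A quick worked example of this relationship, as a sketch with illustrative function names. Bc is given in bits and the shaping rate in bits per second, so the result is converted to milliseconds.

```python
def shaping_interval_ms(bc_bits, shaping_rate_bps):
    """Tc in milliseconds for a given committed burst and shaping rate."""
    return bc_bits * 1000 / shaping_rate_bps


def send_time_per_interval_ms(bc_bits, line_rate_bps):
    """How long the interface actually transmits during each Tc."""
    return bc_bits * 1000 / line_rate_bps


tc = shaping_interval_ms(8000, 64000)
busy = send_time_per_interval_ms(8000, 128000)
```

Shaping to 64 kbit/s with a Bc of 8000 bits gives a Tc of 125 ms. On a 128 kbit/s line those 8000 bits take 62.5 ms to send, so the interface is active half of the time, which is how the rate gets halved.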
Traffic shaping adaptation lowers the shaping rate when there is congestion until it reaches the Minimum Information Rate (MIR), also called mincir. The shaper notices congestion if it receives a frame with BECN set or a Cisco ForeSight message. Every time a BECN or ForeSight message is received the shaper slows down the rate by 25%. If no messages have been received for 16 consecutive Tc the shaper starts increasing the rate again. The shaping rate grows by 1/16 each Tc.
Generic Traffic Shaping is used on the interface. Shapes all traffic leaving the interface by default, can be modified with an access-list. GTS can also be used to do adaptive shaping.
Can only shape on egress traffic. Configured with MQC.
Shape average vs shape peak
Shape average fills token bucket with Bc bits every Tc, shape peak fills the bucket with Bc+Be tokens every Tc which means that it can burst in every Tc.
Peak shaping rate = configured_rate * (1 + Be/Bc)
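As a worked example, the effective rate under shape peak follows from the bucket being refilled with Bc + Be bits instead of Bc bits every Tc, which gives configured_rate * (1 + Be/Bc). The function name below is illustrative.

```python
def shape_peak_rate(configured_rate_bps, bc_bits, be_bits):
    """Effective rate when Bc + Be bits may be sent in every Tc."""
    return configured_rate_bps * (1 + be_bits / bc_bits)


equal_bursts = shape_peak_rate(64000, 8000, 8000)
no_excess = shape_peak_rate(64000, 8000, 0)
```

With Be equal to Bc the shaper can send twice the configured rate, and with Be set to zero shape peak behaves like shape average.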
Frame Relay Traffic Shaping (FRTS) is only available for frame relay interfaces. Cannot classify traffic to shape a subset of traffic. FRTS can dynamically learn the Bc, Be and CIR by using Enhanced Local Management Interface (ELMI).
Policing can be done on ingress or egress. The policer meters the interface bandwidth. The difference between policing and shaping is that policing does not hold packets waiting for more tokens, it drops them or remarks them with a lower priority.
Single-rate two-color policing
Uses one bucket with Bc bits. Packets either conform to or exceed the configured rate. Does not use time intervals like shapers, replenishes tokens depending on when packets arrive in time.
Tokens replenished = (current_packet_arrival_time - previous_packet_arrival_time) * police_rate
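A single-rate two-color policer with this refill rule can be sketched as follows. The sketch assumes times in seconds, sizes in bits, and a bucket that starts full; the class and method names are illustrative.

```python
class TwoColorPolicer:
    """Single bucket of Bc bits, refilled on packet arrival rather than per Tc."""

    def __init__(self, rate_bps, bc_bits):
        self.rate_bps = rate_bps
        self.bc_bits = bc_bits
        self.tokens = bc_bits        # assumption: the bucket starts full
        self.last_arrival = 0.0

    def police(self, arrival_time, packet_bits):
        # Refill based on the time since the previous packet, capped at Bc.
        elapsed = arrival_time - self.last_arrival
        self.tokens = min(self.bc_bits, self.tokens + elapsed * self.rate_bps)
        self.last_arrival = arrival_time
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return "conform"
        return "exceed"              # no tokens are consumed for exceeding packets


policer = TwoColorPolicer(rate_bps=8000, bc_bits=8000)
results = [policer.police(t, 8000) for t in (0.0, 0.5, 1.0)]
```

The first packet empties the bucket, the second arrives after only half a bucket has been refilled and exceeds, and the third finds the bucket full again and conforms. Unlike a shaper, nothing is queued while waiting for tokens.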
The single-rate three-color policer has support for excess burst. Packets can either conform to, exceed or violate the configured rate. Uses dual buckets; tokens that are left over when the Bc bucket is full go into the Be bucket.
Two-rate three-color policer
Uses two policing rates, the lower one is the Committed Information Rate (CIR) and the higher is the Peak Information Rate (PIR). Packets that fall under the CIR conform to the rate and packets that exceed the CIR but are below the PIR exceed. Packets that exceed the PIR are violating the policy. Tokens are filled into both buckets instead of the Be bucket relying on spillage from the Bc bucket, which means bursting is always available.
- MPLS VPNs were first defined in RFC 2547, the MPLS architecture itself is specified in RFC 3031
- Originally called tag switching and was Cisco proprietary
- MPLS is the open standard
- Operates at layer 2.5 between switching and routing
Terms used in MPLS:
LER = Label Edge Router – MPLS capable, placed at edge of network.
LSR = Label Switch Router – MPLS capable, note that a LER is also a LSR.
CE = Customer Edge device, demarcation between service provider and customer, CE is often managed by provider.
PE = Provider Edge device, This is the router that the CE connects to.
P = Provider router, used in the core of the provider network.
LSP = Label Switched Path, the path taken between the edge devices, unidirectional path.
Push – The ingress LSR pushes a label onto the packet.
Swap – Swap incoming label with outgoing label.
Pop – The egress PE pops the label and forwards it according to IP routing table.
BGP free core – The core routers do not need to know routes for MPLS VPN connectivity, just need to know next-hop.
Types of VPN
Overlay VPN - Layer one or two network with point-to-point links or virtual circuits which separate customer traffic. Customer does not need to peer with ISP, customer is responsible for own routing. Generic Routing Encapsulation (GRE) can also be used to tunnel traffic.
Peer-to-peer VPN - Provider carries customer traffic but also peers with customer providing routing. Earlier to provide traffic separation, traffic filtering and access-lists had to be used, this is now solved in a much more scalable way with MPLS.
Reasons to use MPLS
- One infrastructure carrying multiple services and protocols
- BGP-free core
- Scalable VPN solutions
- Traffic engineering
- Less configuration needed in a fully meshed network than with overlay VPNs
Running MPLS to gain speed is a bogus reason: traffic is forwarded by Application Specific Integrated Circuits (ASICs), and the difference between looking up a route and looking up a label is minimal, if any.
Normally a service provider needs to run BGP on all transit routers to know how to reach external prefixes. With MPLS, BGP is not needed in the core since the core routers only need to know how to reach the BGP next-hop. This is all great in theory but is this really implemented? This would require that only MPLS is used as transport even for regular IP traffic (non VPN).
The MPLS header is four bytes or 32 bits for every label. More than one label can be added to a packet if MPLS VPNs and/or traffic engineering is used. This can add up to three labels with 12 bytes of extra information, which needs to be accounted for in the MTU on MPLS-enabled interfaces. Of the 32 bits in the header 20 bits are used for the label itself, which means that roughly one million labels are available. Labels 0-15 are reserved. There are also three experimental bits (EXP). These bits are used for Quality of Service (QoS) and aren’t really experimental at this stage. One bit is used to indicate Bottom of Stack (BoS). If this is set to one it means that this label is the final one in the stack. There is also a Time To Live (TTL) field which uses eight bits, just as in an IP header.
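The bit layout described above (20 bits label, 3 bits EXP, 1 bit BoS, 8 bits TTL) can be shown with a small Python sketch. The function names are made up for this example.

```python
import struct

def pack_label(label, exp, bos, ttl):
    """Pack one 32-bit MPLS label stack entry into 4 bytes (network order)."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | ((exp & 0x7) << 9) | ((bos & 0x1) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)

def unpack_label(data):
    """Split the first 4 bytes of data back into the four header fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,        # top 20 bits
        "exp": (word >> 9) & 0x7,   # 3 bits used for QoS
        "bos": (word >> 8) & 0x1,   # 1 = last label in the stack
        "ttl": word & 0xFF,         # 8 bits, as in the IP header
    }
```

Packing three labels this way and concatenating them gives the 12 bytes of overhead mentioned above, with only the last entry having BoS set to one.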
Forwarding Equivalence Class (FEC) is a group of packets that are forwarded along the same path and that get the same treatment. All packets belonging to a FEC use the same label, however not all packets with the same label belong to the same FEC.
Examples of FEC
- Packets with layer three destination address matching a certain prefix
- Multicast packets that belong to the same group
- Packets that have equal Diffserv markings
Label distribution modes
Downstream on Demand – LSR requests label from downstream neighbor (IP next hop) and receives one label for FEC.
Unsolicited Downstream – Each LSR distributes a remote label to its adjacent LSRs without them requesting it. DoD will produce only one label in LIB but UD can produce several. UD is default in Cisco IOS except for ATM interfaces.
Label retention modes
Liberal Label Retention (LLR) keeps all labels in LIB, even those that will not end up in LFIB. The best goes to LFIB and the others are kept in LIB in case of a routing event which forces reconvergence. The label for the other next-hop will already be in LIB, which means faster convergence.
Conservative Label Retention (CLR) keeps only label for next-hop in LIB. Default for ATM.
LSP control modes
Independent LSP control mode creates a local binding for a FEC independent of other LSRs. It will do this as soon as it recognizes a FEC, meaning it is in the routing table. This will happen even if it is not the egress LSR.
Ordered LSP control mode creates a local binding only if it is the egress LSR for the FEC or if it has received a label from the next hop for the FEC.
0 – Explicit null – Instead of popping the label at PHP, the penultimate router sets the top label to zero, which means the EXP bits are preserved.
1 – Router alert – Alerts LSR that packet needs a closer look. Can’t be forwarded in hardware, software needed.
2 – Explicit null for IPv6
3 – Implicit null – Used for PHP, penultimate router pops label and egress LSR only needs to do IP lookup (advertised for directly connected and summaries)
14 – OAM alert
Hello packets are sent to multicast address 224.0.0.2 over UDP. TCP port 646 is used to set up the session. Hello is sent every five seconds, holdtime is 15 seconds by default. These timers are used for discovery. When the session is established a keepalive packet is sent every 60 seconds and the holdtime is 180 seconds. LDP packets reset the holdtime. A local label is assigned for every IGP prefix and stored in the LIB. All these prefixes are advertised to neighbors, even if the neighbor owns the prefix (no split horizon).
neighbor ip-address as-override – Used when customer sites share the same AS number: the PE replaces the customer AS number in the AS-path with the service provider's AS so the receiving site accepts the update.
allowas-in – Loosens loop check by allowing updates with own AS number in AS path.
SOO – Site Of Origin, used to prevent loops in MPLS VPN, every site has unique SOO which is an extended community.
Outer label also called IGP label used for finding next-hop in provider network. Inner label is VPN label used to find the right VRF for egress PE. IGP label is sent via LDP, based on routing table. VPN label and VPNv4 prefixes are sent via MP-BGP.
- Uses TCP as transport, port 179
- Path vector protocol
Checks before becoming a neighbor
- The TCP connection request must come from an IP associated with a neighbor command
- The AS number must match that in the neighbor statement
- The routers can not have duplicate router IDs
- If authentication is configured it must also match
Uses a keepalive and hold timer, defaults to 60 and 180 seconds.
BGP neighbor states
Idle - BGP not initiated yet
Connect - Listening for TCP
Active - Initiate TCP
Open sent - Open sent, TCP is up
Open confirm - Open received, TCP is up
Established - Peering has been established
BGP message types
Open - Used to establish neighbor session and exchange parameters
Keepalive - Used to maintain the neighbor relationship
Update - Used to exchange routing information
Notification - Used when BGP errors occur, resets neighbor session
- Uses a sub ASN, real AS divided into smaller sections where each section has a private ASN
- The range is from 64512 to 65535
- Every sub-AS has to be fully meshed internally and uses iBGP logic
- Connections between different sub-ASes act as eBGP connections
- Confederation ASNs are not considered when deciding the AS-path length
- Painful to migrate since it requires changing the AS number in the router bgp command
- Real AS identified with bgp confederation identifier
- Peers defined with bgp confederation peers
- Confederation AS numbers in AS-path will be removed before advertising to true eBGP peer
- Removes the need for full mesh, all iBGP routers peer with route reflector
- RR responsible for reflecting routes to clients, RR is usually not in forwarding path
- No change is needed on clients to implement RR
- The RR and its clients create a cluster, it is possible to have multiple RRs in a cluster
- Route reflectors in different clusters should be fully meshed
To ensure no loops in this topology BGP needs two new attributes:
Cluster_list - Route reflectors add their cluster ID to this attribute before sending an update. Updates with same cluster ID as local RR will be discarded.
Originator_ID - The ID of the router that originated the prefix. If a router sees its own ID in this attribute it will not use or propagate this prefix.
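The two loop checks can be sketched as follows. This is a simplified illustration, not router code: updates are modeled as plain dictionaries, and the function and key names are assumptions chosen for the example.

```python
def rr_accepts(update, local_router_id, local_cluster_id=None):
    """Return True if the update passes both route reflector loop checks."""
    if update.get("originator_id") == local_router_id:
        return False  # this router originated the prefix itself
    if local_cluster_id is not None and local_cluster_id in update.get("cluster_list", []):
        return False  # the update has already passed through this cluster
    return True

def reflect(update, sender_router_id, cluster_id):
    """Return a copy of the update as a route reflector would pass it on."""
    out = dict(update)
    # Originator_ID is set once, by the first reflector, and then kept.
    out.setdefault("originator_id", sender_router_id)
    # The reflector adds its own cluster ID to the front of the list.
    out["cluster_list"] = [cluster_id] + list(update.get("cluster_list", []))
    return out
```

A reflector in cluster 10.10.10.10 that receives back an update it already reflected finds its own cluster ID in the list and discards it, and the client that originated the prefix rejects it based on the Originator_ID.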
AS_PATH - Lists ASNs through which the route has been advertised - Well known Mandatory
NEXT_HOP - Lists the next-hop IP address used to reach the NLRI - Well known Mandatory
AGGREGATOR - Lists the RID and ASN of the router that created a summary NLRI - Optional Transitive
ATOMIC_AGGREGATE - Tags a summary NLRI as being a summary - Well known Discretionary
ORIGIN - The origin of the route, igp, egp or incomplete - Well known Mandatory
ORIGINATOR_ID - The RID of the iBGP neighbor that injected a NLRI into the AS - Optional Nontransitive
CLUSTER_LIST - Used by RRs to list the RR cluster IDs in order to prevent loops - Optional Nontransitive
Injecting routes into BGP
Done via network command or redistribute from an IGP or static routes.
Injecting a default route into BGP
Use the network 0.0.0.0 command - Requires that 0.0.0.0 exists in routing table
neighbor default-originate - Always advertise default route even if not present in local routing table
default-information originate - Requires route in routing table and a redistribute command
BGP best path algorithm
0. Discard routes with invalid next-hop
1. Routes with highest weight (Cisco proprietary)
2. Routes with highest local preference
3. Routes locally injected
4. Routes with shortest AS-path
5. Routes with best origin
6. Routes with lowest Multiple Exit Discriminator (MED)
7. Prefer eBGP over iBGP (confederation eBGP treated as iBGP)
8. Routes with lowest metric to next-hop
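The steps above can be expressed as one comparison key in Python. This is a rough sketch: the Path fields and their defaults are invented for the example, and it compares MED between all paths, whereas by default MED is only compared between paths from the same neighboring AS.

```python
from dataclasses import dataclass

# Lower rank is better: igp beats egp, egp beats incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}

@dataclass
class Path:
    name: str
    next_hop_valid: bool = True
    weight: int = 0
    local_pref: int = 100
    locally_injected: bool = False
    as_path_len: int = 0
    origin: str = "igp"
    med: int = 0
    ebgp: bool = False
    igp_metric: int = 0

def best_path(paths):
    """Pick the best path following steps 0 to 8 in order."""
    candidates = [p for p in paths if p.next_hop_valid]  # step 0
    if not candidates:
        return None
    return min(candidates, key=lambda p: (
        -p.weight,                 # 1. highest weight
        -p.local_pref,             # 2. highest local preference
        not p.locally_injected,    # 3. locally injected first
        p.as_path_len,             # 4. shortest AS-path
        ORIGIN_RANK[p.origin],     # 5. best origin
        p.med,                     # 6. lowest MED
        not p.ebgp,                # 7. eBGP over iBGP
        p.igp_metric,              # 8. lowest metric to next-hop
    ))
```

Because the key is a tuple, a later step is only consulted when all earlier steps tie, which matches how the list is meant to be read.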
- Cisco proprietary
- Uses IP protocol 88 as transport
- Support for MD5 authentication (no clear text)
- Sends updates to 224.0.0.10
- Distance vector but has some link state like features
Uses a hello and a hold timer. Neighbors are discovered via the hello protocol. The hold timer is used for declaring when a neighbor is dead. EIGRP doesn’t use its own timers for keeping track of the neighbor, it uses the timers that the neighbor supplied in the hello packet. The Retransmission TimeOut (RTO) timer is used to decide when to resend an update to a neighbor. Smoothed Round Trip Time (SRTT) keeps track of latency between neighbors and the RTO is derived from the SRTT timer. SRTT is the average time in ms between sending a packet to a neighbor and receiving an ACK. The default hello timer is 5 seconds for most interfaces with a hold time of 15. NBMA interfaces with T1 or lower speeds use a 60 second hello timer and a 180 second hold time. Changing the hello timer does not automatically adjust the hold time.
Updates are sent as multicasts but resends are unicast to neighbors who didn’t ACK the update before the RTO timer expired. 16 resends using unicast will be used before declaring a neighbor dead. The multicast flow timer is used for knowing when to switch to unicast packets instead of multicast for a neighbor.
Based on cumulative delay and constraining bandwidth. Can factor in load, reliability and MTU if needed but this is not recommended by Cisco. To change which K values (constants) are used, set them with the metric weights command. To calculate the metric with the default K values use: 256*(10^7/bandwidth) + 256*delay, with bandwidth in kbps.
EIGRP measures delay in tens of microseconds, this needs to be considered when calculating the metric.
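A small worked example of the calculation with default K values. The function name is made up, and the integer truncation of both terms is an assumption about how the classic metric is computed.

```python
def eigrp_metric(min_bandwidth_kbps, total_delay_usec):
    """Classic EIGRP composite metric with default K values (sketch)."""
    # Constraining (lowest) bandwidth along the path, in kbps.
    bandwidth_term = 10**7 // min_bandwidth_kbps
    # Cumulative delay, converted from microseconds to tens of microseconds.
    delay_term = total_delay_usec // 10
    return 256 * (bandwidth_term + delay_term)

# A T1 serial link: 1544 kbps and 20000 microseconds of delay.
print(eigrp_metric(1544, 20000))   # 2169856
# A Fast Ethernet link: 100000 kbps and 100 microseconds of delay.
print(eigrp_metric(100000, 100))   # 28160
```

Forgetting to divide the delay by ten is the usual mistake, it gives a metric that is far too high.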
EIGRP uses Reported Distance (RD) and Feasible Distance (FD) for the metric. Reported distance is what the neighbor sending the update has calculated the metric to be. Feasible distance is the distance of the route with the lowest metric, it is the RD + the distance between the neighbor announcing the route and the local router. The route with the lowest metric that is entered into the routing table is called a successor route. A feasible successor route is a route that doesn’t have the lowest metric but meets the feasibility condition meaning it has a RD lower than what the current FD is.
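The feasibility condition can be illustrated with a short sketch. The function name and input format are assumptions, and simply adding the link cost to the RD is a simplification of the real composite metric.

```python
def classify_routes(routes):
    """routes maps neighbor -> (reported_distance, cost_to_neighbor).

    Returns (successor, feasible_distance, feasible_successors).
    """
    total = {n: rd + cost for n, (rd, cost) in routes.items()}
    successor = min(total, key=total.get)   # lowest total metric
    fd = total[successor]
    # Feasibility condition: the neighbor's RD must be lower than the FD.
    feasible = [n for n, (rd, _) in routes.items()
                if n != successor and rd < fd]
    return successor, fd, feasible

routes = {"R2": (5000, 5000), "R3": (8000, 7000), "R4": (12000, 1000)}
print(classify_routes(routes))   # ('R2', 10000, ['R3'])
```

R4 has a better total metric (13000) than R3 (15000) but is not a feasible successor since its RD of 12000 is higher than the FD of 10000.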
Input events and local computation
When an input event occurs EIGRP needs to react, this could be an interface failing, a neighbor failing or an update for a new prefix. When the input event has occurred EIGRP performs a local computation: EIGRP looks for a Feasible Successor (FS) route in its topology table and if it cannot find one it will actively query its neighbors for a route.
Uses the Diffusing Update Algorithm (DUAL). Functioning routes are in passive mode. A route that no longer has a successor is in active mode since the router has to query its neighbors for a FS. The term Stuck In Active (SIA) means that a route has been active for too long, the active timer has expired. The active timer is set to 180 seconds by default and can also be disabled if needed.
EIGRP allows for up to 16 equal-metric routes to be installed in the routing table, the default is four. EIGRP also has something called variance. Variance allows for non equal-metric load balancing. The route still has to meet the feasibility condition to be considered for load balancing. The variance command is a multiplier: if the FD is 10000 for the current successor and there is a FS with a RD of 5000 and a metric of 15000, variance 2 would make the router load balance between these two routes. Variance 2 means the metric of the second best route can be up to twice as high as the best.
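How variance and the feasibility condition combine can be sketched like this. The function name and tuple layout are invented for the example, the strict "less than" comparison is an assumption, and the limit on the number of installed paths is ignored.

```python
def load_balanced_paths(paths, variance=1):
    """paths is a list of (name, reported_distance, metric) tuples."""
    fd = min(metric for _, _, metric in paths)   # metric of the successor
    selected = []
    for name, rd, metric in paths:
        is_successor = metric == fd
        is_feasible = rd < fd                    # feasibility condition
        within_variance = metric < variance * fd
        if is_successor or (is_feasible and within_variance):
            selected.append(name)
    return selected

paths = [("A", 3000, 10000), ("B", 5000, 15000),
         ("C", 12000, 14000), ("D", 5000, 25000)]
print(load_balanced_paths(paths, variance=2))   # ['A', 'B']
```

C is left out even though its metric is within the variance, since its RD of 12000 fails the feasibility condition. D is a feasible successor but its metric is more than twice the FD.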
The load balancing can be done in a few different ways. Traffic-share balanced means that the traffic will be distributed according to the metric, routes with lower metrics will see more traffic on them. Traffic-share min installs multiple routes but sends traffic only on the one with the lowest metric. Traffic-share min across-interfaces chooses different outgoing interfaces for better load balancing if more than one route has the same metric. The no traffic-share command will balance evenly across routes no matter what the metric is.
EIGRP has support for MD5 authentication, clear text is not supported. The keys are entered into a key-chain. A key can have a lifetime specified or use a lifetime that is always valid. Authentication is configured per interface.
Uses auto-summary by default, turned off with no auto-summary. EIGRP has support for summarizing on every EIGRP interface compared to OSPF which can only summarize at area borders.
EIGRP is a distance vector protocol which means it uses split horizon. Split horizon means the router doesn’t send updates back out on the interface it received them on. This can cause issues in non P2P networks. Split horizon can be turned off on an interface basis with the command no ip split-horizon eigrp asn where asn is the AS-number specified.
Has support for distribute lists and offset lists. Distribute lists are used for filtering inbound or outbound routing updates and what is allowed to enter the routing table. Offset lists are used to change the metric, only adding to the metric is supported, not removing from it.
- Defined in RFC 2328
- Supports VLSM and CIDR
- Is a link state protocol
- Uses a link state database (LSDB) for topology information, identical within area
- Reliable flooding of LSAs
- Uses hello protocol to build adjacencies
- Runs directly over IP, protocol 89
- Uses the Dijkstra algorithm
OSPF uses five different packet types, do not confuse this with the different LSA types. The packet types are:
Type 1: Hello packet – The hello packet is used to discover/maintain neighbors
Type 2: Database description – Summarize database contents, sent when establishing adjacency.
Type 3: Link State Request – Database download
Type 4: Link State Update – Database update
Type 5: Link State ACK – Flooding acknowledgement
These are the most common LSAs:
LS type 1: Router-LSA
Originated by all routers. Describes the collected states of the routers interfaces to an area. Flooded throughout a single area only.
LS type 2: Network-LSA
Originated for broadcast and NBMA networks by the designated router. Contains a list of routers connected to the network. Flooded throughout a single area only.
LS type 3: Summary-LSA
Originated by area border routers. Describes a route to a destination outside the area (inter-area route) but still inside the AS.
LS type 4: Summary-LSA
Originated by area border routers. Describes routes to Autonomous System Border Routers.
LS type 5: AS-external-LSA
Originated by Autonomous System Border Routers, flooded throughout the AS. Describes routes external to the AS. Default routes for the AS can be described by this LSA.
LS type 7: NSSA-LSA
Originated by Autonomous System Border Routers. Used to flood AS external routes through a not-so-stubby area (NSSA). The ABR connected to the backbone will then convert it to a type five LSA.
Designated Router (DR)
On broadcast and NBMA networks a Designated Router (DR) is elected. The router with the highest priority will be elected the DR. The priority can range from 0 to 255 where 255 is the most preferred and where 0 is ineligible to become the DR. A Backup DR (BDR) will also be elected and it will be the router with the second highest priority. The election is not preemptive, which means that a router set up later with a higher priority will not become the DR unless the OSPF process is cleared. The DR has two main functions: it generates a network LSA that lists the set of routers connected to the network, and it is responsible for maintaining adjacencies. The DR and BDR listen on the AllDRouters address 224.0.0.6. They send updates to the AllSPFRouters address 224.0.0.5.
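The election rules can be sketched in Python as below. This is a heavily simplified model of the real election in RFC 2328: the function name and data format are assumptions, and the tie-break on highest Router-ID is added here since the text above only mentions priority.

```python
import ipaddress

def elect_dr_bdr(routers, current_dr=None, current_bdr=None):
    """routers maps router-id -> priority. Returns (dr, bdr)."""
    # Priority 0 is ineligible. Highest priority wins, then highest Router-ID.
    eligible = sorted(
        (r for r, prio in routers.items() if prio > 0),
        key=lambda r: (routers[r], int(ipaddress.IPv4Address(r))),
        reverse=True,
    )
    # Non-preemptive: a DR or BDR that is still present keeps its role.
    dr = current_dr if current_dr in eligible else None
    bdr = current_bdr if current_bdr in eligible and current_bdr != dr else None
    if dr is None and bdr is not None:
        dr, bdr = bdr, None        # the BDR is promoted when the DR is gone
    for rid in eligible:
        if dr is None:
            dr = rid
        elif bdr is None and rid != dr:
            bdr = rid
    return dr, bdr
```

With routers 1.1.1.1 and 2.2.2.2 at priority 1, 2.2.2.2 becomes DR and 1.1.1.1 BDR. Adding 4.4.4.4 with priority 255 later changes nothing, and if 2.2.2.2 then fails the result is 1.1.1.1 as DR with 4.4.4.4 as the new BDR.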
Timers used by OSPF
HelloInterval – Length in seconds between hello packets sent on interface, defaults to ten seconds on broadcast networks and thirty on NBMA.
RouterDeadInterval – Number of seconds before neighbor is declared dead, 40 on broadcast and 120 on NBMA (4x missed hello packets)
Wait Timer – Number of seconds before router leaves Wait state and elects designated router. If a router joins later than this it will not have a chance to be elected as DR. Same value as RouterDeadInterval.
RxmtInterval – Number of seconds between LSA retransmissions, also used for DBD and LSR packets.
Down – The initial state of an interface, lower level protocols have indicated that the interface is not ready for use. No protocol traffic can be sent or received and no adjacencies can form.
Loopback – The interface is looped back to the network either in hardware or in software. By default will be announced as host routes (/32). To announce with another mask on loopback interface use ip ospf network point-to-point.
Wait – Router is trying to determine the DR and BDR of the network. The router monitors the hello packets it receives. The router is not allowed to elect a DR or BDR until the wait timer has expired.
Point-to-point – In this state the interface is operational and connected to either a physical point-to-point network or to a virtual link. Upon entering this state the router attempts to form an adjacency and sends hello packets every HelloInterval.
DR Other – All routers except for DR and BDR will be in this state and will form adjacencies with the DR and BDR.
Backup – The backup designated router, will be promoted to DR if/when the DR fails. Forms adjacencies with all other routers.
DR – The designated router, forms adjacencies with all other routers. Responsible for building network LSA for attached network containing links to all routers.
Attempt – Only seen on NBMA networks. No recent information has been received from the neighbor, send hello packets every HelloInterval.
Init - A hello packet has recently been seen from the neighbor, 2-way communication has not yet been established. All neighbors in this state or higher are listed in hello packets sent from the interface.
2-way – Bidirectional communication has been assured through the hello protocol. The BDR is chosen from neighbors in state 2-way or greater.
ExStart – The first step in creating an adjacency between neighboring routers. The goal is to decide which router is the master and the initial DD sequence number.
Exchange – The router is describing its entire link state database with DBD packets. Every DBD packet has a sequence number and there can’t be more than one DBD packet outstanding unacknowledged at a time. LSR packets may also be sent requesting newer LSAs.
Loading – In this state LSR packets are sent asking the neighboring router for LSAs described in the DBD packets earlier.
Full – In this state the routers are now fully adjacent.
The hello protocol
Used to build and maintain neighbor adjacencies. Used to ensure there is bidirectional communication between neighbors. Hello packets are sent out periodically on all OSPF interfaces unless passive interface is used. On broadcast and NBMA networks OSPF elects a Designated Router (DR) and a Backup Designated Router (BDR). If there is no support for multicast, neighbors might need to be statically configured.
Synchronization of link state databases
When using link state protocols it is critical that the link state databases are synchronized. In OSPF this is done when building the adjacency by sending DataBase Description packets (DBD). The DBD packets describe the LSAs in the link state database, they are a summary only showing necessary information to request the whole LSA if needed. When exchanging LSAs there is a master/slave relationship. The router with the highest Router-ID will become the master. This is indicated through the MS bit (Master/Slave). If the DBD packet is the first in sequence it will also have the I (Init) bit set. All DBD packets except for the last one will have the M bit set (More). After describing the database with DBD packets the routers can exchange the full LSAs through LSR (Link State Request) and LSU (Link State Update) packets.
ExternalRoutingCapability – indicates if the area supports external (type five) LSAs. Also known as the E-bit. Set to one if supporting external routes. Must be set to zero in stub areas.
Identifiers used by OSPF
A 32-bit number that uniquely identifies a router in the AS. In Cisco's implementation OSPF will choose the highest IP address configured on a loopback interface as Router-ID, if no loopback is available it will pick the highest IP of the normal interfaces. Recommended to set this manually. If the Router-ID has changed, a restart of the OSPF process is necessary. This can be done with the clear ip ospf process command.
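The selection order can be sketched as follows. The function name, the interface dictionary and the way loopbacks are recognized by name are assumptions made for the example.

```python
import ipaddress

def select_router_id(interfaces, configured=None):
    """interfaces maps interface name -> IPv4 address string."""
    if configured:
        return configured                      # manually set Router-ID wins
    loopbacks = [ip for name, ip in interfaces.items()
                 if name.lower().startswith("loopback")]
    # Prefer loopback addresses, fall back to all other interfaces.
    candidates = loopbacks or list(interfaces.values())
    # Compare numerically, not as strings, so 10.0.0.1 beats 9.0.0.1.
    return str(max(ipaddress.IPv4Address(ip) for ip in candidates))

interfaces = {"Loopback0": "1.1.1.1", "GigabitEthernet0/0": "192.168.1.1"}
print(select_router_id(interfaces))   # 1.1.1.1
```

Note that the loopback wins even though the physical interface has a numerically higher address.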
A 32-bit number identifying the area. The number 0.0.0.0 is reserved for the backbone, also written as 0. All areas must connect to the backbone but note that if running a single area only this area doesn’t need to be area zero.
OSPF design and router roles
Topology divided into areas, often not necessary with modern routers, scales to hundreds of routers in one area.
Depending on where router resides it can have different roles:
Internal router: Router with interfaces in only one area.
Backbone router: Router with an interface in the backbone (area zero).
Area border router (ABR): Router with interfaces in at least two areas.
Autonomous System Boundary Router (ASBR): Router which injects routing information external to the AS. Will often do redistribution.
Route preference in OSPF
1. Intra-area routes
2. Inter-area routes
3. Type 1 external
4. Type 2 external
External routing information
Can either be of type one or type two, E1 or E2. If using an E1 metric the metric will be the external cost and the cost internally to reach the router advertising the external route (ASBR). If the external metric is 100 and the internal metric is 150 then the E1 metric will be 250 but if using E2 metric it would be 100. If a route is advertised as both E1 and E2 then E1 is preferred.
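The difference between the two metric types, using the numbers from the paragraph above, can be shown with a short sketch. The function names and the tuple layout are made up for this example.

```python
def external_cost(external_metric, cost_to_asbr, metric_type):
    """Cost of an OSPF external route as seen by the local router."""
    if metric_type == 1:
        return external_metric + cost_to_asbr   # E1 adds the internal cost
    if metric_type == 2:
        return external_metric                  # E2 uses the external cost only
    raise ValueError("metric_type must be 1 or 2")

def best_external(routes):
    """routes is a list of (name, metric_type, external_metric, cost_to_asbr)."""
    # E1 is preferred over E2, then the lowest cost wins.
    best = min(routes, key=lambda r: (r[1], external_cost(r[2], r[3], r[1])))
    return best[0]

print(external_cost(100, 150, 1))   # 250
print(external_cost(100, 150, 2))   # 100
```

Given the same prefix as both E1 and E2, best_external picks the E1 route even though its cost of 250 is higher than the E2 cost of 100.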
Area zero is called the backbone, most often written as area 0 but can also be expressed as 0.0.0.0. The backbone area must be contiguous. It doesn’t have to be physically contiguous, virtual links can be used to connect areas which are not directly connected to area zero.
Area where no external routing information is allowed (type five). To reach external routes a default route is used, the default route is sent by the ABR. The stub area can not contain an ASBR, since type five LSAs are not allowed. All routers in a stub area must agree that the area is in fact a stub. A stub area usually has only one exit point but note that a stub can have both several exit points and several ABRs in the area.