QoS Design Notes for CCDE – Daniels Networking Blog

Trying to get my CCDE studies going again. I’ve finished the End to End QoS Design book (relevant parts) and here are my notes on QoS design.

Basic QoS

Different applications require different treatment, the most important parameters are:

Delay: The time it takes from the sending endpoint to reach the receiving endpoint
Jitter: The variation in end to end delay between sequential packets
Packet loss: The number of packets sent compared to the number of received as a percentage

Characteristics of voice traffic:

Smooth
Benign
Drop sensitive
Delay sensitive
UDP priority

One-way requirements for voice:

Latency ≤ 150 ms
Jitter ≤ 30 ms
Loss ≤ 1%
Bandwidth (30-128Kbps)

Characteristics for video traffic:

Bursty
Greedy
Drop sensitive
Delay sensitive
UDP priority

One-way requirements for video:

Latency ≤ 200-400 ms
Jitter ≤ 30-50 ms
Loss ≤ 0.1-1%
Bandwidth (384Kbps-20+ Mbps)

Characteristics for data traffic:

Smooth/bursty
Benign/greedy
Drop insensitive
Delay insensitive
TCP retransmits

Quality of Service (QoS) – Managed unfairness, measured numerically in latency, jitter and packetloss

Quality of Experience (QoE) – End user perception of network performance, subjective and can’t be measured

Tools

Classification and marking tools: Session, or flows, are analyzed to determine what class the packets belong to and what treatment they should receive. Packets are marked so that analysis happens a limited number of times, usually at ingress as close to the source as possible. Reclassification and remarking is common as the packets traverse the network.

Policing, shaping and markdown tools: Different classes of traffic are alotted portions of the network resources. Traffic may be selectively dropped, delayed or remarked to avoid congestion when it exceeds the available network resources. Traffic can be dropped (policing), slowed down (shaped) or remarked (markdown) to conform.

Congestion management or scheduling tools: When there is more traffic than available network resources it will be queued. For traffic classes that don’t react well to queueing they can be denied access by a scheduling tool to avoid lowering quality of the existing flows.

Link-specific tools: Link fragmentation and interleaving fits into this category.

Packet Header

IPv4 packet has 8-bit Type of Service (ToS) field, IPv6 packet has 8-bit Traffic Class field. The first three bits are IP Precedence (IPP) bits for a total of 8 classes. The first three bits in combination with the nex three is known as DSCP for a total of 64 classes.

At layer two the most common marking is 802.1p Class of Service (CoS) or MPLS EXP bits, each using three bits for a total of 8 classes.

QoS Deployment Principles

Define business/organizational objectives of QoS deployment. This may including provisioning real-time services for voice/video traffic or guaranteeing bandwidth for critical business applications and also managing scavenger traffic. Seek executive endorsement of the business objectives to not derail the process later on.
Based on the business objectives, determine how many classes of traffic is needed. Define an end-to-end strategy how to identify the traffic and treat it across the network.
Analyze the requirements of each application class so that the proper QoS tools can be deployed to meet these requirements.
Design platform-specific QoS policies to meet the requirements with consideration for appropriate Place In the Network (PIN).
Test the QoS designs in a controlled environment.
Begin deployment with a closely monitored and evaluated pilot rollout.
The tested and pilot proven QoS designs can be deployed to the production network in phases during scheduled downtime.
Monitor service levels to make sure that the QoS objectives are being met.

The common mistake is to make it a technical process only and not research the business objectives and requirements.

QoS Feature Sequencing

Classification: The identification of each traffic stream.

Pre-queuing: Admission decisions, and dropping and marking the packet, are best applied before the packet enters a queue for egress scheduling and transmission.

Queueing: Scheduling the order of packets before transmission.

Post-queueing: Usually optional, sometimes needed to apply actions that are dependent on the transmission order of packets, such as sequence numbering(e.g. compression and encryption), which isn’t known until the QoS scheduling function dequeues the packets based on the priority rules.

Security and QoS

Trust Boundaries

A trust boundary is a network location where packet markings are not accepted and may be rewritten. Trust domains are network locations where packet markings are accepted and acted on.

Network Attacks

QoS tools can mitigate the effects of worms and DoS attacks to keep critical applications available during an attack.

Recommendations and Guidelines

Classify and mark traffic as close to the source as technically and administratively feasible
Classification and marking can be done on ingress or egress but queuing and shaping are usually done on egress
Use an end-to-end Diffserv PHB model for packet marking
Less granular fields such as CoS and MPLS EXP should be mapped to DSCP as close to the traffic source as possible
Set a trust boundary and mark or remark traffic that comes in beyond the boundary
Follow standards based Diffserv PHB markings if possible to ensure interopability with SP networks, enterprise networks or merging networks together
Set dscp and set precedence should be used to mark all IP traffic, set ip dscp and set ip precedence only marks IPv4 packets
When using tunnel interfaces, think of feature sequencing to make sure that the inner or outer packet headers (or both) are marked as intended

Policing and Shaping Tools

Policer: Checks for traffic violations against a configured rate. Does not delay packets, takes immediate action to drop or remark packet if exceeding rate.

Shaper: Traffic smoothing tool with the objective to buffer packets instead of dropping them, smoothing out any peaks of traffic arrival to not exceed configured rate.

Characteristics of a policer:

Causes TCP resends when traffic is dropped
Inflexible and inadaptable;makes instantaneous packet drop decisions
An ingress or egress interface tool
Does not add any delay or jitter to packets
Rate limiting without buffering

Characteristics of a shaper:

Typically delays rather than drops exceeding traffic, causes fewer TCP resends
Adapts to congestion by buffering exceeding traffic
Typically an egress interface tool
Adds delay and jitter if rate exceeds the shaper
Rate limiting with buffering

Placing Policers and Shapers in the Network

Policers make instantaneous decisions and should be deployed ingress, don’t transport packets if they are going to be dropped anyway. Policers can also be placed on egress to limit a traffic class at the edge of the network.

Shapers are often deployed as egress tools, commonly on enterprise to SP links to not exceed the commited rate of the SP.

Tail Drop and Random Drop

Tail drop means dropping the packet that is at the end of an queue. The TX ring is always FIFO, if a voice packet is trying to get into the TX ring but it’s full it will get dropped because it’s at the tail of the queue. Random drop via Random Early Detection (RED) or Weighted Random Early Detection (WRED) tries to keep the queues from becoming full by dropping packets from traffic classes to cause TCP slowing down.

Recommendations and Guidelines

Police as close to the source as possible, preferably on ingress.
Single rate three color policer handles bursts better than single rate two color policer resulting in fewer TCP retransmissions
Use a shaper on interfaces where speed mismatches, such as buying a lower rate than physical speed or between a remote-end access link and the aggregated head-end link
When shaping on an interface carrying real-time traffic, set the Tc value to 10 ms

Scheduling Algorithms

Strict priority: Lower priority queues are only served when higher priority queues are empty. Can potentially starve traffic in lower priority queues.

Round robin: Queues are served in a set sequence, does not starve traffic but can add unpredictable delays in real-time, delay sensitive traffic.

Weighted fair: Packets in the queue are weighted, usually by IP precedence so that some queues get served more often than others. Does not provide bandwidth guarantee, the bandwidth per flow varies based on number of flows and the weight of each flow.

WRED is a congestion avoidance tool and manages the tail of the queue. The goal is to avoid TCP synchronization where all TCP flows speed up and slow down at the same time, which leads to poor utilization of the link. WRED has little or no effect on UDP flows. WRED can be used to set the RFC 3168 IP ECN bits to indicated that it is experiencing congestion.

Recommendations and Guidelines

Critical applications like VoIP requires service guarantees regardless of network conditions. This requires to enable queueing on all nodes with a potential for congestion.
A large number of applications end up in the default class, reserve 25% for this default Best Effort class
For a link carrying a mix of voice, video and data traffic, limit the priority queue to 33% of the link bandwidth
Enable LLQ if real-time, latency sensitive traffic is present
Use WRED for congestion avoidance on TCP flows but evalute if it has any traffic on UDP flows
Use DSCP-based WRED wherever possible

Bandwidth Reservation Tools

Measurement based: Counting mechanism to only allow a limited number of calls (sessions). Normally statically configured by an administrator.

Resource based: Based on the availability of resources in the network, usually bandwidth. Uses the current status of the network to base its decision.

Resource Reservation Protocol (RSVP) is a resource based protocol, commonly used with MPLS-TE. The drawback of RSVP is that it requires a lot of state in the devices.

AC functionality is most effectively deployed at aplication level such as with Cisco Unified Communications Manager (CUCM). It works well in networks with limited complexity and where flows are of predictable bandwidth.

RSVP can be used in combination with Diffserv in an Intserv/Diffserv model where RSVP is only responsible for admission control and Diffserv for the queuing.

A RSVP proxy can be used because end devices such as phones and video endpoints usually don’t support the RSVP stack. A router closest to the endpoint is then used as a proxy together with CUCM to act as an AC mechanism.

Recommendations and Guidelines

Cisco recommends using RSVP Intserv/Diffserv model with a router-based proxy device. This allows for scaling of policies together with a dynamic network aware AC.

IPv6 and QoS

IPv6 headers are larger in size so bandwidth consumption for small packet sizes is higher. IPv4 header is normally 20 bytes but IPv6 is 40 bytes. IPv6 has a 20-bit Flow Label field and 8-bit Traffic Class field.

Medianet

Modern applications can be difficult to classify and can consists of multiple types of traffic. Webex provides text, audio, instant messaging, application sharing and desktop video conferencing through the same application. NBAR2 can be used to identify applications.

Application Visibility Control (AVC)

Consists of NBAR2, Flexible Netflow (FNF) and MQC. NBAR2 is used to identify traffic through Deep Packet Inspection (DPI), FNF reports on usage and MQC is used for the configuration.

FNF uses Netflow v9 and IPFIX to export flow record information. It can monitor L2 to L7 and identify apps by port and through NBAR2. When using NBAR2, CPU usage may increase significantly as well as memory usage. This is also true for FNF. Consider the performance impact before deploying it.

QoS Requirements and Recommendations by Application Class

Voice requirements:

One-way latency should be no more than 150 ms
One-way peak-to-peak jitter should be no more than 30 ms
Per-hop peak-to-peak jitter should be no more than 10 ms
Packet loss should be no more than 1%
A range of 20 – 320 Kbps of guaranteed priority bandwidth per call (depends on sampling rate, codec and L2 overhead)

Voice recommendations:

Mark to Expedited Forwarding (EF) / DSCP 46
Treat with EF PHB (priority queuing)
Voice should be admission controlled

May use jitter buffers to reduce the effects of jitter, however it does add delay. Voice packets are constant in size which means bandwidth can be provisioned accurately. Don’t forget to account for L2 overhead.

Broadcast video requirements:

Packet loss should be no more than 0.1%

Broadcast video recommendations:

Mark to CS5 / DSCP 40
May be treated with EF PHB (priority queuing)
Should be admission controlled

Flows are usually unidirectional and include application level buffering. Does not have strict jitter or latency requirements.

Real-time interactive video requirements:

One-way latency should be no more than 200 ms
One-way peak-to-peak jitter should be no more than 50 ms
Per-hop peak-to-peak jitter should be no more than 10 ms
Packet loss should be no more than 0.1%
Provisioned bandwidth depends on codec, resolution, frame rates, additional data components and network overhead

Real-time interactive video recommendations:

Should be marked with CS4 / DSCP 32
May be treated with an EF PHB (priority queuing)
Should be admission controlled

Multimedia conferencing requirements:

One-way latency should be no more than 200 ms
Packet loss should be no more than 1%

Multimedia conferencing recommendations:

Mark to AF4 class (AF41/AF42/AF43 or DSCP 34/36/38)
Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
Should be admission controlled

Multimedia streaming requirements:

One-way latency should be no more than 400 ms
Packet loss should be no more than 1%

Multimedia streaming recommendations:

Should be marked to AF3 class (AF31/AF32/AF33 or DSCP 26/28/30)
Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
May be admission controlled

Data applications can be divided into Transactional Data (low latency) or Bulk Data (high throughput)

Transactional data recommendations:

Should be marked to AF2 class (AF21/AF22/AF23 or DSCP 18/20/22)
Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED

This class may be subject to policing and remarking. Applications in this class can be Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM).

Bulk data recommendations:

Should be marked to AF1 class (AF11/AF12/AF13 or DSCP 10/12/14)
Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
Deployed in moderately provisioned queue to provide a degree of bandwidth constraint during congestion, to prevent long TCP session from dominating network bandwidth

Example applications are e-mail, backup operations, FTP/SFTP transfers, video and content distribution.

Best effort data recommendations:

Mark to DF (DSCP 0)
Provision in dedicated queue
May be provisioned with guaranteed bandwidth allocation and WRED/RED

Scavenger traffic recommendations:

Should be marked to CS1 (DSCP 8)
Should be assigned a minimally provisioned queue

Example traffic is Youtube, Xbox Live/360 movies, iTunes, Bittorrent.

Control plane traffic can be divided into Network Control, Signaling and Operations/Administration/Management (OAM).

Network Control recommendations:

Should be marked to CS6 (DSCP 48)
May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is EIGRP, OSPF, BGP, HSRP and IKE.

Signaling traffic recommendations:

Should be marked to CS3 (DSCP 24)
May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is SCCP, SIP and H.323.

OAM traffic recommendations:

Should be marked to CS2 (DSCP 16)
May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is SSH, SNMP, Syslog, HTTP/HTTPs.

QoS Design Recommendations:

Always enable QoS in hardware as opposed to software if possible
Classify and mark as close to the source as possible
Use DSCP markings where available
Follow standards based DSCP PHB markings
Police flows as close to source as possible
Mark down traffic according to standards based rules if possible
Enable queuing at every node that has potential for congestion
Limit LLQ to 33% of link capacity
Use AC mechanism for LLQ
Do not enable wred for LLQ
Provision at least 25% for Best Effort traffic

QoS Models:

Four-Class Model:

Voice
Control
Transactional Data
Best Effort

Eight-Class Model:

Voice
Multimedia-conferencing
Multimedia-streaming
Network Control
Signaling
Transactional Data
Best Effort
Scavenger

Twelve-Class Model:

Voice
Broadcast Video
Real-time interactive
Multimedia-conferencing
Multimedia-streaming
Network Control
Signaling
Management/OAM
Transactional Data
Bulk Data
Best Effort
Scavenger

This picture shows how different size models can be expanded or vice versa.

Campus QoS Design Considerations and Recommendations:

The primary role of QoS is campus networks is not to control latency or jitter, but to manage packet loss. Endpoints normally connect to the campus at high speeds, it may only take a few milliseconds if congestion to overrun the buffers of switches/linecards/routers.

Trust Boundaries:

Conditionally trusted endpoints: Cisco IP phones, Cisco Telepresence, Cisco IP video surveillance cameras, Cisco digital media players.

Trusted endpoints: Centrally administered PCs and endpoints, IP video conferencing units, managed APs, gateways and other similar devices.

Untrusted endpoints: Unsecure PCs, printers and similar devices.

Port-Based QoS versus VLAN-based QoS versus Per-Port/Per-VLAN QoS

Design recommendations:

Use port-based QoS when simplicity and modularity are the key design drivers
Use VLAN-based QoS when looking to scale policies for classification, trust and marking
Do not use VLAN-based QoS to scale (aggregate) policing policies
Use per-port/per-VLAN when supported and policy granularity is the key design driver

EtherChannel QoS

Load balance based on source and destination IP or what is expected to give the best distribution of traffic
Be aware that multiple real-time flows may up on the same physical link and oversubscribing the real-time queue

EtherChannel QoS will vary by platform and some policies are applied to the bundle and some to the physical interface.

Ingress QoS Models:

Design recommendations:

Deploy ingress QoS models such as trust, classification and policing on all access edge ports
Deploy ingress queuing (if supported and required)

The probability for congestion on ingress is less than on egress.

Egress QoS Models:

Design recommendations:

Deploy egress queuing policies on all switch ports
Use a 1 priority queue and 3 normal queues or better queuing structure

Enable trust on ports leading to network infrastructure and similar devices.

Trusted Endpoint:

Trust DSCP
Optional ingress marking and/or poling
Minimum 1P3Q

Untrusted Endpoint:

No trust
Optional ingress marking and/or poling
Minimum 1P3Q

Conditionally Trusted Endpoint:

Conditional trust with trust CoS
Optional ingress marking and/or poling
Minimum 1P3Q

Switch to Switch/Router Port QoS:

Trust DSCP
Minimum 1P3Q

Control Plane Policing

Can be used to harden the network infrastructure. Packets handled by main CPU typically include the following:

Routing protocols
Packets destined to the local IP of the router
Packets from management protocols such as SNMP
Interactive access protocols such as Telnet and SSH
ICMP or packets with IP options may have to be handled by CPU
Layer two packets such as BPDUs, CDP, DTP and so on

Wireless QoS

802.11e Working Group (WG) proposed QoS enhancements to the 802.11 standard in 2007. This was also revised in IEEE 802.11-2012. Wi-Fi Alliance has a compatibility standard called Wireless Multimedia (WMM).

In Wi-Fi networks only one station may transmit at a time, physical constraints that are not in place on wired networks. The Radio Frequency (RF) is shared between devices. This is similar to a hub environment. Wireless networks operate at variable speeds.

Distributed Coordination Function (DCF) is responsible for scheduling and transmitting frames onto the wireless medium.

Wirless uses Carrier Sense Multiple Access/Collision Avoidance (CSMA/CA). It actively tries to avoid collisions. A wireless client has a random period where it may send traffic to try to avoid collisions.

DCF evolved to Enhanced Distributed Channel Access (EDCA) which is a MAC layer protocol. It has the following additions compared to DCF:

Four priority queues, or access categories
Different interframe spacing for each AC as compared to a single fixed value for all traffic
Different contention window for each AC
Transmission Opportunity (TXOP)
Call admission control (TSpec)

802.11e Ethernet frame uses 3-bit field known as User Priority (UP) for traffic marking. It is analogous to 802.1p CoS. One difference is that voice is marked with UP 6 as compared to CoS 5.

Interframe spacing is a time the client needs to wait before starting to send traffic, the wait time is lower for higher priority traffic.

The contention window is used when the wireless media is not free, higher priority traffic waits a shorter period of time before trying to send again than lower priority data.

TXOP is a period of time when the client is allowed to send to not make it hog up the media for a long period of time.

TSpec is used for admission control, the client sends it requirements such as data rate, frame size to the AP and the AP only admits it if there is available bandwidth.

Upstream QoS is packets from the wireless network onto the wired network. Downstream QoS is packets from the wired network onto the wireless network.

Wireless marking may not be consistent with wired markings so mapping may have to be done to map traffic into the correct classes on the wired network.

Upstream QoS:

802.11e UP marking on upstream frame from client to AP is translated to a DSCP valued on the outside of the CAPWAP tunnel. The inner DSCP marking is preserved
After the CAPWAP packet is decapsulated at the WLC, the original IP headers DSCP value is used to derive the 802.1p CoS value

Downstream QoS:

A frame with 802.1p CoS marking arrives a WLC wired interface. DSCP value of the IP packet is used to set the DSCP of the outer CAPWAP header.
The DSCP value of the CAPWAP header is used to set the 802.11e UP value on the wireless frame

The 802.1p CoS value is not used in the above process.

Data Center QoS

Primary goal is to manage packet loss. A few milliseconds of traffic during congestion can cause buffer overruns.

Various data center designs have different QoS needs. These are a few data center architectures:

High-Performance Trading (HPT)
Big data architectures, including High-Performance Computing (HPC), High-Throughput Computing (HTC) and grid data
Virtualized Multiservice Data Center (VMDC)
Secure Multitenant Data Center (SMDC)
Massively Scalable Data center (MSDC)

High-Performance Trading:

Minimal or no QoS requirements because the goal of the architecture is to introduce as little delay as possible using low latency platforms such as the Nexus.

Big Data (HPC/HTC/Grid) Architectures

Have similar QoS needs as a campus network. The goal is to process large and complex data sets that are too difficult to handle by traditional data processing applications.

High-Performance Computing: Uses large amounts of computing power for a short period of time. Often measured in Floating-point Operations Per Second (FLOPS)

High-Throughput Computing: Also uses large amounts of computing power but for a larger period of time. More focused on operations per month or year.

Grid: A federation of computer resources from multiple locations to reach a common goal. A distributed system with noninteractive workloads that involve a large number of files. Compared to HPC, Grid is usually more heterogenous, loosely coupled and geographically dispersed.

Virtualized Multiservice Data Center (VMDC):

VMDC comes with unique requirements due to compute and storage virtualization, including provisioning a lossless Ethernet service.

Applications no longer map to physical servers (or cluster of servers)
Storage is no longer tied to a physical disk (or array)
Network infrastructure is no longer tied to hardware

Lossless compute and storage virtualization protocols such as RoCE and FCoE need to be supported as well as Live Migration/vMotion.

Secure Multitenant Data Center (SMDC):

Virtualization is leveraged to support multitenants over a common infrastructure and this affects the QoS design. SMDC has similar needs as VMDC but a different marking model.

Massively Scalable Data Center:

A framework used to build elastic data centers that host a few applications that are distributed across thousands of servers. Geographically distributed homogenous pools of compute and storage. The goal is to maximize throughput. Common to use a leaf and spine design.

Data Center Bridging Toolset

IEEE 802.1 Data Center Bridging Task Group has defined enhancements to Ethernet to support requirements of converged data center networks.

Priority flow control (IEEE 802.1Qbb)
Enhanced transmission selection (IEEE 802.1Qaz)
Congestion notification (IEEE 802.1Qau)
DCB exchange (DCBX) (IEEE 802.1Qaz combined with 802.1AB)

Priority Flow Control (802.1Qbb): PFC provides link level flow control mechanism that can be controlled independently for each 802.1p CoS priority. The goal is to provide zero frame loss due to congestion in DCB networks and mitigating Head of Line (HoL) blocking. Uses PAUSE frames.

Skid Buffers

Buffer management is critical to PFC, if transmit or receive buffers are overflowed, transmission will not be lossless. A switch needs sufficient buffers to:

Store frames sent during the time it takes to send the PAUSE frame across the network between stations
Store frames that are already in transit when the sender receives the PFC PAUSE frame

The buffers used for this are called skid buffers and usually engineered on a per port basis in hardware on ingress.

An incast flow is a flow from many senders to one receiver.

Virtual Output Queuing (VOQ)

Artifically induce congestion on ingress ports where there is an incast flow going to a host. This lessens the need for deep buffers on egress. VOQ consumes congestion at every ingress port and optimizes switch buffering capacity for incast flows. It does not consume fabric bandwidth only to be dropped on the egress port.

Enhanced Transmission Selection – IEEE 802.1Qaz

Uses a virtual lane concept on a DCB enabled NIC, also called Converged Network Adaptor (CNA). Each virtual interface queue is accountable for managing its alloted bandwidth for its traffic group. If a group is not using all its bandwidth it may be used by other groups.

ETS virtual interface queues can be serviced as follows:

Priority – a virtual lane can be assigned a strict priority service
Guaranteed bandwidth – a percentage of the physical link capacity
Best effort – the default virtual lane service

Congestion Notification IEEE 802.1Qau

Layer two traffic management system that pushes congestion to the edge of the network by instructing rate limiters to shape the traffic that is causing congestion. The congestion point such as a distribution switch connecting to several access switches can instruct these switches called reaction points to throttle the traffic by sending control frames.

Data Center Bridging Exchange (DCBX) IEEE 802.1Qaz + 802.1AB

DCB capabilities:

DCB peer discovery
Mismatched configuration detection
DCB link configuration of peers

The following DCB parameters can be exchanged by DCBX:

PFC
ETS
Congestion notification
Applications
Logical link-down
Network interface virtualization

DCBX can be used between switches and with some endpoints.

Data Center Transmission Control Protocol (DCTCP)

A goal of the data center is to maximize the goodput which is the application level throughput excluding protocol overhead. Goodput is reduced by TCP flow control and congestion avoidance, specifically TCP slow start.

DCTCP is based on two key concepts:

React in proportion to the extent of congestion, not its presence – this reduces variance in sending rates
Mark ECN base on instantaneous queue length – this enables fast feedback and corresponding window adjustments to better deal with bursts

Considerations affecting the marking model to be used in the data center include the following:

Data center applications and protocols
CoS/DSCP marking
CoS 3 overlapping considerations
Application-based marking models
Application- and tenant-based marking modelse

Data Center Applications and Protocols

Recommendations:

Consider what applications/protocols are present in the data center and may not already be reflected in the enterprise QoS model and how these may be integrated
Consider what applications/protocols may not be present or have a significantly reduced presence in the DC

Compute Virtualization Protocols:

Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE):
Supports direct memory access of one computer into another over converged Ethernet without involving either one’s operating system. Permits high-throughput low-latency networking, especially useful in massively parallel computer clusters. It’s a link layer protocol that allows communcation between any two hosts in the same broadcast domain. RoCE requires lossless service via PFC. When implemented along with FCoE, it should be assigned its own no-drop class/virtual lane, such as CoS 4. Other applications such as video using CoS 4 need to be reassigned to improve RoCE performance.

Internet Wide Area RDMA Protocol (iWARP):
Extends the reach of RDMA over IP networks. Does not require lossless service because it runs over TCP or STCP which uses reliable transport. It can be marked to unused CoS/DSCP or combined with internetwork control (CS6/CoS 6) or network control (CS7/CoS 7).

Virtual machine control and live migration protocols (VM control):
Virtual Machines (VMs) require control traffic to be passed between hypervisors. VM control is control plane traffic and should be marked to CoS 6 or CoS 7, depending on QoS model in use.

Live migration:
Protocols that support the process of moving a running VM (or application) between different phsycical machines without disconnecting the client or the application. Memory, storage and network connection are moved from original host matchine to the destination. A common example being vMotion. Can be argued to be a candidate for internetwork control (CoS 6) due to being a control plane protocol but sends too much traffic to be put in that class. Use an available marking or combine with CoS 4, CoS 2 or even CoS 1.

Storage Virtualization Protocols:

Fibre Channel over Ethernet (FCoE):
Encapsulates Fibre Channel (FC) frames over Ethernet networks, requires lossless service and is a layer two protocol that can’t be natively routed. Requires lossless service via PFC and usually marked with CoS 3 which should be dedicated for FCoE.

Internet Protocol Small Computer System Interface (iSCSI):
Encapsulates SCSI commands within IP to enable data transfers. Can be used to transmit data over LANS, WANS or even the Internet and can enable location independent data storage and retrieval. Does not require lossless service due to using TCP. Can be provisioned in dedicated class or in another class such as CoS 2 or CoS 1.

CoS/DSCP Marking:

Recommendations:

Some layer two protocols within the DC require CoS marking
CoS marking has limitations so consider a hybrid CoS/DSCP model (when supported)

CoS 3 Overlap Considerations and Tactical Options:

Recommendations:

Recognize the potential overlap of signaling (and multimedia streaming) markings with FCoE
Select a tactical option to address this overlap

Signaling is normally marked with CoS 3 but so is also FCoE. Some administrators prefer to dedicate CoS 3 to FCoE but that leaves the question what to do with signaling. Options to handle the overlap:

Hardware Isolation:
Some platforms and interface modules do not support FCoE such as Nexus 7k M-Series module but F-series do. M-series module can connect to CUCM and multimedia streaming servers and F-series modules to DCB extended fabric supporting FCoE.

Layer 2 Versus Layer 3 Classification:
Signaling and multimedia streaming can be classified by DSCP values (CS3 and AF3) to be assigned to queues and FCoE can be classified by CoS 3 to its own dedicated queue.

Asymmetrical CoS/DSCP Marking:
Asymmetrical meaning the that the three bits forming the CoS do not match the first three bits of the DSCP value. Signaling could be marked with CoS 4 but DSCP CS3.

DC/Campus DSCP Mutation:
Perform ingress and egress DSCP mutation on data center to campus links. Signaling and multimedia streams can be assigned DSCP values that map to CoS 4 (rather than CoS 3).

Coexistence:
Allow signaling and FCoE to coexist in CoS 3. The reasoning being that if the CUCM server has CNA then both signaling and FCoE will be provided a lossless service.

Data Center QoS Models:

Trusted Server Model:
Trust L2/L3 markings sent on application servers. Only approved servers should be deployed in the DC.

Untrusted Server Model:
Do not trust markings, reset markings to 0.

Single-Application Server Model:
Same as the untrusted server model but remarked to a non zero value.

Multi-Application Server Model:
Access-lists are used for classification and traffic is marked to multiple codepoints. Application server does not mark traffic at all or it marks it to different values than the enterprise QoS model.

Server Policing Model:
One or more application classes are metered via one-rate or two-rate policers, with conforming, exceeding and optionally violating traffic marked to different DSCP values.

Lossless Transport Model:
Provision lossless service to FCoE.

Trusted Server/Network Interconnect:

Trust CoS/DSCP
Ingress queuing
Egress queuing

Untrusted Server:

Set CoS/DSCP to 0
Ingress queuing
Egress queuing

Single-App Server:

Set CoS/DSCP to non zero value
Ingress queuing
Egress queuing

Multi-App Server:

Classify by ACL
Set CoS/DSCP values
Ingress queuing
Egress queuing

Policed Server:

Police flows
Remark/drop
Ingress queuing
Egress queuing

Lossless Transport:

Enable PFC
Enable ETS
Enable DCBX
Ingress queuing
Egress queuing

WAN & Branch QoS Design Considerations & Recommendations:

To manage packet loss (and jitter) by queuing policies
To enhance classification granularity by leveraging deep packet inspection engines

Packet jitter is most apparent at WAN/branch edge because of downshift in link speeds.

Latency and Jitter:
Recommendations:

Choose service provider paths to target 150 ms for one-way latency. If this target can’t be met, 200 ms is generally acceptable
Only queuing delay is managable by QoS policies

Network latency consists of:

Serialization delay (fixed)
Propagation delay (fixed)
Queuing delay (variable)

Serialization delay is the time it takes to convert a layer two frame into electrical or optical pulses onto the transmission media. The delay is fixed and a function of the line rate.

Propagation delay is also fixed and a function of the physical distance between endpoints. The gating factor is speed of light at 300 000km/s in vacuum but speed in fiber circuits is around a third of that. Propagation delay is then approximately 6.3 microseconds per km. Propagation delay is what makes up most of the network delay.

Queuing delay is variable and a function of whether a node is congested or not and if scheduling policies have been applied to resolve congestion events.

Tx-Ring:
Recommendation:

Be aware of the Tx-Ring function and depth;tune only if necessary

The Tx-Ring is the final IOS output buffer for an interface, it’s a relatively small FIFO queue that maximizes physical link bandwidth utilization by matching the outbound packet rate on the router with the physical interface rate. If the size of the Tx-Ring is too large, packets will be subject to latency and jitter while waiting to be served. If the Tx-Ring is too small the CPU will be continually interrupted, causing higher CPU usage.

LLQ:
Recommendations:

Use a dual-LLQ design when deploying voice and real-time video applications
Limit sum of all LLQs to 33% of bandwidth
Tune the burst parameter if needed

Some applications like Telepresence may be bursty by nature, the burst value may have to be adjusted to account for this.

WRED:
Recommendations:

Optionally tune WRED thresholds as required
Optionally enable ECN

To match behavior of AF PHB defined in RFC 2597 use these values:

Set minimum WRED threshold for AFx3 to 60% of queue depth
Set minimum WRED threshold for AFx2 to 70% of queue depth
Set minimum WRED threshold for AFx1 to 80% of queue depth
Set all WRED maximum thresholds to 100%

RSVP
Recommendations:

Enable RSVP for dynamic network-aware admission control requirements
Use the Intserv/Diffserv RSVP model to increase efficiency and scalability
Use application-identification RSVP policies for greater policy granularity

Ingress QoS Models
Recommendations:

DSCP is trusted by default in IOS
Enable ingress classification with NBAR2 on LAN edges, as required
Enable ingress/internal queuing, if required

Egress QoS Models
Recommendations:

Deploy egress queuing policies on all WAN edge interfaces
Egress queuing policies may not be required on LAN edge interfaces

Recommendation for queues:
LLQ:

Limit the sum of all LLQs to 33%
Use an admission control mechanism
Do not enable WRED

Multimedia/Data:

Provision guaranteed bandwidth according to application requirements
Enable fair-queuing presorters
Enable DSCP-based WRED

Control:

Provision guaranteed bandwidth according to control traffic requirements
Do not enable presorters
Do not enable WRED

Scavenger:

Provision with a minimum bandwidth allocation such as 1%
Do not enable presorters
Do not enable WRED

Default/Best effort:

Allocate at least 25% for the default/Best effort queue
Enable fair-queuing pre-sorters
Enable WRED

WAN and Branch Interface QoS Roles:

WAN aggregator LAN edge:

Ingress DSCP trust should be enabled
Ingress NBAR2 classification and marking policies may be applied
Ingress Medianet metadata classification and marking policies may be applied
Egress LLQ/CBWFQ/WRED policies may be applied (if required)

WAN aggregator WAN edge:

Ingress DSCP trust should be enabled
Egress LLQ/CBWFQ/WRED policies should be applied
RSVP policies may be applied
Additional VPN specific policies may be applied

Branch WAN edge:

Ingress DSCP trust should be enabled
Egress LLQ/CBWFQ/WRED policies should be applied
RSVP policies may be applied
Additional VPN specific policies may be applied

Branch LAN edge:

Ingress DSCP trust should be enabled
Ingress NBAR2 classification and marking policies may be applied
Ingress Medianet metadata classification and marking policies may be applied
Egress LLQ/CBWFQ/WRED policies may be applied (if required)

MPLS VPN QoS Design Considerations & Recommendations
The role of QoS over MPLS VPNs may include the following:

Shaping traffic to contracted service rates
Performing hierarchical queuing and dropping within these shaped rates
Mapping enterprise classes to the service provider classes
Policing traffic according to contracted rates
Restoring packet markings

MEF Ethernet Connectivity Services

E-line:
A service connecting two customer Ethernet ports over a WAN. It is based on point-to-point Ethernet Virtual Connection (EVC)

Ethernet Private Line(EPL):
A basic point-to-point service characterized by low frame delay, frame delay variation and frame loss ratio. Service multiplexing is not allowed. No CoS bandwidth profiling is allowed, only a Committed Information Rate (CIR).

Ethernet Virtual Private Line(EVPL):
Multiplexing of EVCs is allowed. The individual EVCs can be defined with different bandwidth profiles and layer two control processing methods.

E-LAN:
A multipoint service connecting customer endpoints and acting as a bridged Ethernet network. It is based on multipoint EVC and service multiplexing is allowed. It can be configured with a CIR, Committed Burst Size (CBS) and Excess Information Rate (EIR).

E-Tree:
A point-to-multipoint version of the E-LAN, essentialy it’s a hub and spoke topology where the spokes can only communicate with the hub but not each other. Common for franchise operations.

Sub-Line-Rate Ethernet Design Implications
Recommendations:

Sub line rate may require hierarchical shaping with nested queuing policies
Configure the CE shaper’s Committed Burst (Bc) value to be no more than half of the SP’s policer Bc

If the Bc of the shaper is set too high, packets may be dropped by the policer even though the shaper is shaping to CIR of the service.

When using sub line rate there will be no congestion on the interface, congestion is artificially induced by using a shaper and then a nested policy for the queuing. This may be referred to as Hierarchical QoS (HQoS).

QoS Paradigm Shift
Recommendation:

Enterprises and service providers most cooperate to jointly administer QoS over MPLS VPNs

MPLS VPNs offer a full mesh of connectivity between campus and branch networks. This fully meshed connectivity has implications for the QoS design. Previously WANs were usually point-to-point or hub and spoke which made the QoS design simpler. Branch to branch traffic would pass through the hub which controlled the QoS.

When using MPLS VPNs traffic from branch to branch will not pass the hub meaning that QoS needs to be deployed on all the branches as well. However, this is not enough, contending traffic may not be coming from the same site, it could be coming from any site. To overcome this the service provider needs to deploy QoS policies that are compatible with the enterprise policies on the PE routers. This is a paradigm shift in QoS administration and requires the enterprise and SP to jointly administer the QoS policies.

Service Provider Class of Service Models
Recommendations:

Fully understand the CoS models of the SP
Select the model that most closely matches your strategic end-to-end model

MPLS DiffServ Tunneling Modes
Recommendations:

Understand the different MPLS Diffserv tunneling modes and how they affect customer DSCP markings
Short pipe mode offers enterprise customers the most transparency and control of their traffic classes

Uniform Mode
Recommendation:

If provider uses uniform mode, be aware that your packets DSCP values may be remarked

Uniform mode is generally used when the customer and SP share the same Diffserv domain, which would be the case for an enterprise deploying MPLS.

Uniform mode is the default mode. The first three bits of the IP ToS field are mapped to MPLS EXP bits on the ingress PE when it adds the label. If a policer or other mechanism remarks the MPLS EXP value this value is copied to lower level labels and at the egress PE the MPLS EXP value is used to set the IPP value.

Short Pipe Mode

It is used when customer and SP are in different Diffserv domains. This mode is useful when the SP wants to enfore its own Diffserv policy but the customer wants its Diffserv information to be preserved across the MPLS VPN.

The ingress PE sets the MPLS EXP value based on the SPs policies. Any remarking will only propagate to the MPLS EXP bits of labels but not to the IPP bits of the customers IP packet. On egress the queuing is based on the IPP marking of the customers packet, giving the customer maximum control.

Pipe Mode

Pipe mode is the same as short pipe mode except for that the queuing is based on MPLS EXP bits at the egress PE and not on the customers IPP marking.

Enterprise-to-Service Provider Mapping
Recommendation:

Map the enterprise application classes to the SP CoS classes as efficiently as possible

Enterprise to service provider mapping considerations include the following:

Mapping real-time voice and video traffic
Mapping signaling and control traffic
Separating TCP-based applications from UDP-based applications (where possible)
Remarking and restoring packet markings (where required)

Mapping Real-Time Voice and Video
Recommendation:

Balance the service level requirements for real-time voice and video with the SP premium for real-time bandwidth
In either scenario, use a dual LLQ policy at CE egress edge

SPs often only a single real-time CoS, if you are deploying both real-time voice and video you will have to make a choice to put the video in the real-time class or not. Putting both voice and video into the real-time class may be costly or even cost prohibitive. You should still use a dual LLQ at the CE edge since that is under your control and that way you can protect voice from video. Downgrading video to a non real-time class may only produce slightly lower quality which could be acceptable.

Mapping Control and Signaling Traffic
Recommendation:

Avoid mixing control plane traffic with data plane traffic in a single SP CoS

Signaling should be separated from data traffic if possible since the signaling could get dropped if the class is oversubscribed and thus producing voice/video instability. If the SP does not offer enough classes to put signaling in its own, consider putting it in the real-time class since these flows are lightweight, but critical.

Separating TCP from UDP
Recommendation:

Separate TCP traffic from UDP traffic when mapping to SP CoS classes

It is generally best to not mix TCP-based traffic with UDP-based traffic (especially if the UDP traffic is streaming video such as broadcast video) within a single SP CoS. These protocols behave differently under congestion. Some UDP applications may have application-level windowing, flow control and retransmission capabilities but most UDP transmitters are oblivious to drops and don’t lower transmission rates due to dropping.

When TCP and UDP share a SP CoS and that class experiences congestion, the TCP flows continually lower their transmission rates, potentially giving up their bandwidth to UDP flows that are oblivious to drops. This is called TCP starvation/UDP dominance.

Even if enabling WRED the same behavior would be seen because WRED (primarily) manages congestion only on TCP-based flows.

Re-Marking and Restoring Markings
Recommendation:

Remark application classes on CE edge on egress (as required)
Restore markings on the CE edge on ingress via deep packet inspection policies (as required)

If packets need to be remarked to fit with the SP CoS model, do it at the CE edge on egress. This requires less of an effort than doing it in the campus.

To restore DSCP markings, traffic can be classified on ingress on the CE edge via DPI.

MPLS VPN QoS Roles

CE LAN edge:

Ingress DSCP trust should be enabled (enabled by default)
Ingress NBAR2 classification and marking policies may be applied
Ingress Medianet metadata classification and marking policies may be applied
Egress LLQ/CBWFQ/WRED policies may be applied (if required)

CE VPN edge:

Ingress DSCP trust should be enabled (enabled by default)
Ingress NBAR2 classification and marking policies may be applied (to restore markings lost in transit)
Ingress Medianet metadata classification and marking policies may be applied (to restore markings lost in transit)
RSVP policies may be applied
Egress LLQ/CBWFQ/WRED policies should be applied
Egress hierarchical shaping with nested LLQ/CBWFQ/WRED policies may be applied
Egress DSCP remarking policies may be applied (used to map application classes into specific SP CoS)

PE customer-facing edge:

Ingress DSCP trust should be enabled (enabled by default)
Ingress policing policies to meter customer traffic should be applied
Ingress MPLS tunneling mode policies may be applied
Egress MPLS tunneling mode policies may be applied
Egress LLQ/CBWFQ/WRED policies should be applied

PE core-facing edge:

Ingress DSCP trust should be enabled (enabled by default)
Ingress policing policies to meter customer traffic should be applied
Egress MPLS EXP-based LLQ/CBWFQ policies should be applied
Ergess MPLS EXP-based WRED policies may be applied

P edges:

Ingress DSCP trust should be enabled (enabled by default)
Egress MPLS EXP-based LLQ/CBWFQ policies may be applied
Egress MPLS EXP-based WRED policies may be applied

IPSEC QoS Design

Tunnel Mode

Default IPSEC mode of operation on Cisco IOS routers. The entire IP packet is protected by IPSEC, the sending VPN router encrypts the entire original IP packet and adds a new IP header to the packet. It supports multicast and routing protocols.

Transport Mode

Often used for encrypting peer-to-peer communications, does not encase the original IP packet into a new packet. Only the payload is encrypted while the original IP header is preserved, in effect being copied to outside of the new IP packet. Because the header is left intact its not possible to do multicast or routing protocols in transport mode.

IPSEC with GRE

GRE can be used to enable VPN services that connect disparate networks. It’s a key building block when using VRF Lite, a technology allowing related Virtual Routing and Forwarding (VRF) instances running on different routers to be interconnected across an IP network, while maintaining their separation from both the global routing table and other VRFs.

When using GRE as a VPN technology, it is often desirable to encrypt the GRE tunnel so that privacy and authentication of the connection can be ensured. GRE can be used with IPSEC tunnel mode or transport mode but if the tunnel transits a NAT or PAT device, tunnel mode is required.

Remote-Access VPNs

Cisco’s primary remote-access VPN client is AnyConnect Secure Mobility Client, which supports both IPSEC and Secure Sockets Layer (SSL) encryption.

Anyconnect uses Data Transport Layer Security (DTLS) to optimize real-time flows over SSL encrypted tunnel. Anyconnect connects to remote headend concentrator (such as an ASA firewall) through TCP-based SSL. All traffic from the client including voice, video and data traverses the SSL TCP connection. When TCP loses packets it pauses and waits for them to be resent, this is not good for real-time UDP based packets.

DTLS is a datagram technology, meaning it uses UDP packets instead of TCP. After Anyconnect establishes the TCP SSL tunnel it also establishes an UDP-based DTLS tunnel which is reserved for the use of real-time applications. This allows RDP voice and video packets to be sent unhindered. In case of packet loss, the session does not pause.

The decision on which tunnel to send the packets to is dynamic and made by the Anyconnect client.

QoS Classification of IPsec Packets
Recommendation:

Understand the default behavior of Cisco VPN routers to copy the ToS byte from the inner packet to the VPN packet header

Cisco routers by default copy the the ToS field from the original IP packet and write it into the new IPSEC packet header, thus allowing classification to still be accomplished by matching DSCP values. The same holds true for GRE packets as well. The IP packet is encrypted so it’s not possible to match on other fields such as IP addresses, ports, protocol and so on without using another feature.

The IOS Preclassify Feature
Recommendations:

Be aware of the limitations of QoS classification when using something other than the ToS byte
Use the IOS preclassify feature for all non ToS types of QoS classification
As a best practice, enable this feature for all VPN connections

Normally tunneling and encryption takes place before QoS classification in the order of operations, QoS preclassify reverses the order so that classification can be done on the IP header before it gets encrypted. Actually the order isn’t really reversed but the router clones the original IP header and keeps it in memory so that it can be used for QoS classification after tunneling and encryption.

This feature is only applicable on the encrypting routers outbound interface (physical or tunnel). Downstream routers can’t make decisions on the header because the packet will be encrypted at that point. Always enable the feature since tests have shown that it has very little impact on the routers performance to enable it.

MTU Considerations
Recommendations:

Be aware that MTU issues can severely impact network connectivity and the quality of user experience in VPN networks

When tunneling technologies are used there is always the risk of exceeding the MTU somewhere in the path. Unless jumbo frames are available end-to-end, MTU issues will almost always need to be addressed when dealing with any kind of VPN technology. Common symptoms when having MTU issues is that applications using small packets such as voice work but not e-mail, file server connections and many other applications.

Path MTU Discovery (PMTUD) can be used to discover what the MTU is along the path but it relies on ICMP messages which may be blocked on intermediary devices.

TCP Adjust-MSS

TCP Maximum Segment Size (MSS) is the maximum amount of payload data that a host is willing to accept in a single TCP/IP datagram. During a TCP connection setup between two hosts (TCP SYN), the MSS for each side of the connection is reported to each other. It’s the responsibility of the sending host to limit the size of the datagram to a value less than or equal to the receiving hosts MSS.

For an IP packet that is 1500 bytes and using TCP, the MSS is 1460 bytes, 20 bytes for IP and 20 bytes for TCP excluded from the 1500 byte packet.

Two hosts may not be aware they are communication through a tunnel and send a TCP SYN with MSS 1460 but the MTU may be lower. TCP Adjust-MSS can rewrite the MSS of the SYN packet so that when the receiving hosts gets it, the value is set to something lower to be able to send traffic through the tunnel without fragmentation. The receiving host will then reply with this value to the sender host. The router is acting as a middleman for the TCP session.

When using IPSEC over GRE, a MTU of 1378 bytes can be used:

Original IP packet = 1500 bytes
Subtract 20 bytes for IP header = 1480 bytes
Subtract 20 bytes for IP header = 1460 bytes
Subtract 24 bytes for GRE header = 1436 bytes
Subtract a maximum of 58 bytes for IPSEC = 1378 bytes

Adjusting MSS is a CPU intensive process. Enable it at remote sites rather than headend since it might be terminating a lot of tunnels. Adjusting MSS only needs to be done at one point in the path.

TCP Adjust-MSS only has impact on TCP packets, UDP packets are less likely to be of large size compared to TCP.

Compression Strategies Over VPN
Recommendations:

Compression can improve overall throughput, latency and user experience on VPN connections
Some compression technologies tunnel and may hide the fields used for QoS classification

TCP Optimization Using WAAS

Wide Area Application Services (WAAS) is a WAN accelerator, it uses compression technologies such as LZ compression, Date Redundancy Elimination (DRE) and specific Application Optimizers (AO). This significantly reduces the amount of data send over the WAN or VPN. For a technology like WAAS to work, the compression must take place before encryption.

Compression technologies can have a significant effect on the QoE but it works mainly for TCP traffic. Some WAN acceleration solution may break classification if the traffic is tunnel so that the original IP header is obfuscated. WAAS only compresses the data partion of the packet and keeps the header intact leaving the ToS byte available for classification.

Using Voice Codecs over a VPN Connection

To improve voice quality over bandwidth constrained VPN links, administrators may use compression codecs such as ILBC or G.729.

G.729 uses about a third of the bandwidth of G.711 but this also increases the effect of packet loss since more data is lost in every packet. To overcome this when the a packet is lost and the jitter buffer expires, the voice from the previous packet can be replayed to hide the gap, essentially tricking the listener. Through this technology, up to 5% of packet loss can be acceptable.

Internet Low Bitrate Codec (ILBC) uses 15.2 Kbit/s or 13.33 Kbit/s and performs similarly to G.729, the Mean Opinion Score (MOS) for ILBC is significantly better though when there is packet loss.

Compress Real-Time Protocol (cRTP) is not compatible with IPSEC because the packets are already encrypted when cRTP would try to compress them.

Antireplay Implications
Recommendation:

Antireplay drops may introduce in an IPSEC VPN network with QoS enabled

When ESP authentication is configured in an IPSEC transform set, every Security Association (SA) keeps a 64-packet sliding window where it checks the incoming sequence number of the encrypted packets. This is to stop from someone replaying packets and is called connectionless integrity. If packets arrive out of order due to queuing it must fit inside the window or the packet will be drop and seen as antireplay error. A data packet may get stuck behind voice in a queue so that it misses to fit inside its sliding window and then the packet would get dropped. To overcome this use a line in the ACL for every type of traffic such as voice, data, video. This will create a SA for each type of traffic.

TCP will be affected by packet loss, it will not know that the packets are dropped due to antireplay.

Antireplay drops are around 1 to 1.5% on congested VPN links with queuing enabled. A CBWFQ policy will often hold 64 packets per queue, decreasing this will lead to fewer antireplay drops as the packets are dropped before traversing the VPN but it may also increase the CPU usage.

DMVPN QoS Design

DMVPN offers some advantages regarding QoS compared to IPSEC, such as the following:

Reduction of overall hub router QoS configuration
Scalability to thousands of sites, with QoS for each tunnel on the hub router
Zero-touch QoS support on the hub router for new spokes
Flexibility of both hub and spoke and spoke to spoke (full mesh) deployment models

DMVPN Building Blocks

mGRE: Multi-point GRE allows a single tunnel interface to server a large number of remote spokes. One outbound QoS policy can be applied instead of one per tunnel as with normal GRE which is point-to-point.

Dynamic discover of IPSEC tunnel endpoints and crypto profiles: Dynamic creation of crypto maps, no need to statically build crypto map for each tunnel endpoint.

NHRP: Allows spoke to be configured with dynamically configured IP address. Also enables zero-touch deployment that makes DMVPN spokes easy to set up. Think of the hub router as a “next-hop server” rather than a traditional VPN router. NHRP is also used for per tunnel QoS feature.

The Per-Tunnel QoS for DMVPN Feature

Allows the administrator to enable QoS on a per-tunnel or per-spoke basis. QoS policy is applied to the mGRE tunnel interface. This protects spokes from each other and keeps one spoke from using all the BW so that there is none left for the others. The QoS policy at the hub is automatically generated for each tunnel when a spoke registers with the hub.

Queuing only kicks in when there is congestion, to signal to the routers QoS mechanism that there is congestion a shaper is used. Shape the traffic flows to the real VPN tunnel bandwidth to produce artificial back pressure. With per-tunnel QoS for DMVPN, a shaper is automatically applied by the system to each and every tunnel. This allows the router to implement differentiated services for the various data flows corresponding to each tunnel. This technique is called Hierarchical Queuing Framework (HQF).

Using NHRP, multiple spokes can be grouped together to use the same QoS policy.

This technique provides QoS in the egress direction of the hub towards the spokes. For QoS from the spokes to the hub, a QoS policy needs to be applied at the spokes.

At this time it is not possible to have an unique policy for traffic between spoke to spoke due to spokes not having access to the NHRP database.

GET VPN QoS Design

Group Encrypted Transport (GET) VPN is a technology to encrypt traffic between IPSEC endpoints without the use of tunnels. Packets transmitted use IPSEC tunnel mode but it is not defined by traditional IPSEC SA.

Because there are no tunnels, the QoS configuration is simplified.

GET VPN QoS Overview

DMVPN is suitable for hub and spoke VPNs over a public untrusted network such as the Internet, GET VPN is suitable for private networks such as a MPLS VPN. A MPLS VPN is private but not encrypted and GET VPN can encrypt the traffic between the MPLS sites. GET VPN has no real concept of hub and spoke, which simplifies the QoS architecture. There is not one major hub aggregating all the remote sites and being liable to massive oversubscription.

These are some of the major differences between DMVPN and GET VPN model:

Group Domain of Interpretation (GDOI)

GDOI is a technology that supports any to any IPSEC VPN without the use of tunnels. There is no concept of SA between specific routers, instead it uses a group SA which is used by all the encrypting nodes in the network. There is no per tunnel QoS needed since it does not use tunnels, QoS is simply applied egress on each GET VPN router.

GDOI control plane protocol uses UDP port 848 and ISAKMP on port UDP 500. These packets are normally marked DSCP CS6 by the router.

IP Header Preservation

Normally with IPSEC tunnel mode the ToS byte is copied to the new IP header but the original IP header is not preserved. On a public network such as the Internet it makes good sense to hide the source and destination IP addresses but GET VPN is deployed on MPLS networks which are private.

GET VPN keeps the original IP header intact which simplifies QoS, dynamic routing and multicast. The packet is still considered an ESP IPSEC packet, not TCP or UDP, so to classify based on port numbers the QoS preclassify feature will still be needed.

How and When to Use the QoS Preclassify Feature
Design principles:

If classification is based on source or destination IP, preclassify is not needed but still recommended
If classification is based on TCP or UDP port numbers, QoS preclassificy is needed
Enable the QoS preclassify feature in GET VPN deployments

A Case for Combining GET VPN and DMVPN

DMVPN has some drawbacks, spoke to hub tunnel is always up but spoke to spoke tunnels are dynamically brought up. This causes a delay which can take a second or two and may have negative impact on real-time traffic. The delay is not caused by NHRP or the packetization of the GRE tunnel but rather the exchange of ISAKMP messaging and the establishment of the IPSEC SAs between the routers.

DMVPN could then be used solely for setting up GRE tunnels and GET VPN for encryption of the packets going into the tunnel. This then allows for fast establishment of tunnels and encrypting the packets, increasing the overall user experience.

Working with Your Service Provider When Deploying GET VPN
Design principles:

Ensure that the service provider handles DSCP consistently troughout the MPLS WAN network

QoS Design Notes for CCDE

Tagged on: CCDE Data center Design DMVPN GET VPN MPLS QoS

15 thoughts on “QoS Design Notes for CCDE”

Haroon
January 18, 2015 at 12:24 pm

Nice write up Daniel thanks for sharing this useful info hope to see new findings
- reaper81Post author
  January 18, 2015 at 5:51 pm
  
  Thank you Haroon. I plan to write more as I progress through my studies.
Hamed
January 20, 2015 at 9:13 am

Daniel,

This is an amazing short note for QOS that I have not seen before . Keep it up and continue your writing . I highly appreciate it !
- reaper81Post author
  January 20, 2015 at 7:15 pm
  
  Thanks, Hamed!
Hamed
January 20, 2015 at 12:42 pm

Daniel,

I shared it in my linkedin
- reaper81Post author
  January 20, 2015 at 7:15 pm
  
  Thanks for sharing!
Waldemar Pera
January 20, 2015 at 1:56 pm

Great post !!!
- reaper81Post author
  January 20, 2015 at 7:15 pm
  
  Thank you!
Willy
January 21, 2015 at 11:47 am

Great summary of all the important points related to QOS. Best of luck for the CCDE exam
Tim Martin
March 22, 2015 at 7:49 pm

Very nice work. Thanks for sharing
John Rodgers
April 6, 2015 at 7:47 pm

Great article there Daniel .
- reaper81Post author
  April 6, 2015 at 7:51 pm
  
  Thanks, John!
Bengt J. Olsson
August 3, 2015 at 10:45 am

Very nice and comprehensive QoS overview. For a more detailed understanding of IP QoS I can recommend an article I wrote for the NAB show in Las Vegas last year: “IP QoS Objectives for Broadcast Services”: https://twitter.com/bengtxyz/status/554192638304616448
- reaper81Post author
  August 3, 2015 at 11:05 am
  
  Thanks, Bengt. I’ve downloaded it to read through it later. Nice hearing from a fellow Swede.
Srinivas
April 23, 2019 at 11:41 pm

this is one of the best document ever written on QOS(in nutshell)

15 thoughts on “QoS Design Notes for CCDE”

Leave a Reply Cancel reply