CCDE – Introduction to GET VPN and GET VPN Design Considerations

Introduction to GET VPN

GET VPN is a Cisco proprietary technology aimed at private WAN designs where there is a need to encrypt the traffic. This may be due to regulatory requirements or just a need to keep traffic private. GET VPN is commonly deployed over private WAN topologies such as MPLS VPN or VPLS.

GET VPN uses IPSec to encrypt the traffic but the main concept of GET VPN is to use group security association (SA) as opposed to the standard LAN to LAN tunnels where the SA is created in a point to point fashion.

Technologies such as DMVPN require overlaying a secondary routing infrastructure through the tunnels while GET VPN can use the underlying routing infrastructure. Traditional point to point IPSec tunneling solutions suffer from multicast replication issues because the replication must be performed before tunnel encapsulation and encryption at the router closest to the source. The provider will see all traffic as unicast due to the overlay, which means that replication cannot be performed in the provider network.

In GET VPN, all group members (GMs) share a common SA which is also known as the group SA. A GM can then decrypt traffic that was encrypted by another GM. There is no need to negotiate point to point IPSec tunnels because GET VPN is “tunnel-less”.

Group Domain of Interpretation (GDOI) is a key function of GET VPN and is what enables GET VPN to exchange the keys and policies used for encryption.

Some of the main benefits of GET VPN are:

Large scale any to any IP connectivity using a group IPSec security paradigm
It takes advantage of the underlying VPN routing infrastructure and does not require an overlay routing control plane
Multicast performs better because it does not have the multicast replication issues that are typically seen in traditional tunnel based IPSec solutions
The IP source and destination addresses are preserved during the IPSec encryption and encapsulation process. GET VPN therefore integrates well with features such as QoS and traffic engineering

GET VPN Overview

The following are the most important components that make up GET VPN.

GDOI
Key Servers (KSs)
Cooperative (COOP) KSs
GMs
IP tunnel header preservation
Group security association
Rekey mechanism
Time-based anti replay (TBAR)

GDOI

The GDOI group key management protocol is what is used to provide a set of cryptographic keys and policies to the GMs. GDOI distributes the common IPSec keys to a group of enterprise VPN gateways that must communicate securely. These keys are periodically refreshed through a process called rekey.

The GDOI protocol is protected by a Phase 1 Internet Key Exchange (IKE) SA. The participating VPN gateways authenticate themselves to the device providing keys using IKE. Authentication can be performed with a pre-shared key (PSK) or through a public key infrastructure (PKI). After the VPN gateways have been authenticated and provided with the appropriate security keys via the IKE SA, the IKE SA will expire and GDOI is then used to update the GMs in a more scalable and efficient manner.

GDOI uses two different types of encryption keys. The key that is used for securing the control plane is the Key Encryption Key (KEK) and the key used for the data plane to encrypt the traffic is called the Traffic Encryption Key (TEK).

Tunnel Header Preservation

I mentioned earlier that the IP header is preserved in GET VPN. In traditional IPSec tunnel mode, the tunnel endpoint addresses are used as the new packet source and destination. With GET VPN, the original source and destination addresses are copied to the outer header instead. The tunnel header preservation seems very similar to IPSec transport mode but the underlying mode of operation is IPSec tunnel mode. IPSec transport mode reuses the original IP header but it suffers from fragmentation and reassembly limitations and must not be used in deployments where encrypted or clear text packets may require fragmentation.

GET VPN is suitable for MPLS, L2 or an IP infrastructure with end to end IP connectivity. It’s not suitable for deploying over the Internet though, because the preserved addresses are typically private and not routable on the Internet, and NAT functions interfere with the tunnel header preservation. GET VPN can be combined with DMVPN where DMVPN is used for routing and GET VPN for encryption.

Key Servers

A key server is an IOS device that is responsible for creating and maintaining the GET VPN control plane. The KS is a centralized device that will push encryption policies, such as interesting traffic, encryption protocols, security association, and rekey timers to the GMs at registration time. This means that almost all the configuration is done on the KS.

GMs use IKE Phase 1 to authenticate to the KS and then download the encryption policies and keys required for GET VPN operation. The KS is responsible for refreshing and distributing these keys.

The interesting traffic is defined on the KS using an ACL and is downloaded to every GM, whether it owns that network or not. It is recommended to use a symmetric and summarized policy such as “permit 10.0.0.0/8 to 10.0.0.0/8” as opposed to “permit 10.1.1.0/24 to 10.1.2.0/24”. With the second permit statement another line would have to be added to permit traffic in the other direction from 10.1.2.0/24 to 10.1.1.0/24 which means that the ACL will grow a lot faster.
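To make the growth of the ACL more concrete, here is a minimal sketch that counts the permit lines needed for a per-site policy compared to a single summarized policy. The four /24 site prefixes and the helper names are hypothetical and only serve as an illustration.

```python
from ipaddress import ip_network
from itertools import permutations

def pairwise_entries(sites):
    """One permit line per ordered (source, destination) pair of site prefixes."""
    return list(permutations(sites, 2))

def summarized_entries(summary):
    """A single symmetric permit line that covers both directions."""
    return [(summary, summary)]

# Four hypothetical sites, all inside 10.0.0.0/8.
sites = [ip_network(f"10.1.{i}.0/24") for i in range(1, 5)]
summary = ip_network("10.0.0.0/8")

# The summary only works if every site prefix falls inside it.
assert all(site.subnet_of(summary) for site in sites)

print(len(pairwise_entries(sites)))      # 12 permit lines for 4 sites
print(len(summarized_entries(summary)))  # 1 permit line
```

The per-site policy needs n * (n - 1) lines for n sites, while the summarized policy stays at one line no matter how many sites are added.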

It’s also possible to use “permit ip any any” in the ACL but protocols that shouldn’t be encrypted would have to be denied before the permit statement, such as PIM, routing protocols and management traffic.
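The ordering matters because ACLs are evaluated top down and the first match wins. The sketch below is a simplified model of that behavior where an entry is just an action and a protocol name. The protocol list is an example, not a complete recommendation.

```python
def classify(acl, protocol):
    """Return the action of the first entry that matches the protocol.

    "permit" means the traffic is encrypted, "deny" means it is left alone.
    The protocol name "ip" matches every protocol.
    """
    for action, proto in acl:
        if proto == "ip" or proto == protocol:
            return action
    return "deny"  # implicit deny at the end of the ACL

# Hypothetical policy: exclude control plane protocols, encrypt the rest.
policy = [
    ("deny", "pim"),
    ("deny", "ospf"),
    ("deny", "eigrp"),
    ("permit", "ip"),
]

print(classify(policy, "pim"))  # deny, stays in clear text
print(classify(policy, "tcp"))  # permit, gets encrypted
```

If the permit line were placed first, the deny lines would never be reached and the routing protocol traffic would be encrypted as well.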

A KS cannot be a GM. The roles are mutually exclusive.

GMs

A GM is an IOS router doing the actual encryption and decryption, meaning that it is the device responsible for the GET VPN data plane. The GM only needs to be configured with IKE Phase 1 parameters. It will get the rest of the configuration from the KS after registering with it. Based on the downloaded policy, the GM decides what traffic needs to be encrypted or decrypted and what keys to use. Policies are configured at the KS but a local policy can be used at the GM to exclude traffic from encryption and decryption. This can be used if a GM has a policy that differs from the global one. One example would be if a few GMs run a different routing protocol than the rest of the GMs. The local policy on the GM can only use deny statements and not permit statements.

Group SA

GET VPN uses a group SA meaning that all the members of the GET VPN group can communicate with each other using a common encryption policy and a shared SA. This also means that there is no need to negotiate IPSec between GMs. This reduces load on the GMs.

The ACL used for interesting traffic should not have more than 100 permit entries. Each permit entry in the ACL will result in a pair of SAs which means that the number of SAs should not exceed 200.

It’s easy to get confused by the group SA concept. A group SA does not mean that only one SA is used. It means that a group of routers can use the same SA as opposed to the point to point fashion normally used by IPSec. There can still be several SAs and the number of SAs is dependent on how the ACL is configured for interesting traffic.
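The relationship between permit entries and SAs is simple arithmetic, sketched below. The limit of 100 permit entries comes from the guideline above, while the ACL representation and the function names are my own assumptions for the example.

```python
MAX_PERMIT_ENTRIES = 100  # guideline for the interesting traffic ACL

def sa_count(acl):
    """Each permit entry results in a pair of SAs; deny entries add none."""
    permits = sum(1 for action, _src, _dst in acl if action == "permit")
    return 2 * permits

def within_guideline(acl):
    """True if the ACL stays within the recommended number of permit entries."""
    return sa_count(acl) <= 2 * MAX_PERMIT_ENTRIES

# Hypothetical policy: one deny and two permits.
acl = [
    ("deny", "any", "224.0.0.0/4"),
    ("permit", "10.0.0.0/8", "10.0.0.0/8"),
    ("permit", "172.16.0.0/12", "172.16.0.0/12"),
]

print(sa_count(acl))          # 4
print(within_guideline(acl))  # True
```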

Rekey Process

The keys used for GET VPN need to be refreshed and distributed to the GMs. The rekey process can be handled by unicast or multicast.

If a GM does not get rekey information from the KS, it will try to reregister with an ordered set of KSs before the existing IPSec SAs expire. If it is successful, it will receive new SAs as part of the reregistration process and traffic in the data plane can flow without disruption.

Unicast Rekey

When using unicast, the KS will generate a rekey message and send multiple copies of this message, one copy to each GM. The GM will then ACK this rekey message to the KS. The ACK mechanism keeps the list of GMs at the KS current and ensures that the rekey message is only sent to active GMs.

A KS can be configured to retransmit rekey messages to overcome reachability issues in the network. If a GM does not send an ACK for three consecutive rekey messages, the KS will remove the GM from the active GM database and stop sending rekey messages to that GM.
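The bookkeeping on the KS can be sketched as a counter of missed ACKs per GM. This is a simplified model of the behavior described above and not how IOS implements it. The class and method names are made up for the example.

```python
class UnicastRekeyTracker:
    """Tracks which GMs are still considered active by the KS."""

    MAX_MISSED = 3  # consecutive rekeys without an ACK before removal

    def __init__(self, members):
        self.missed = {gm: 0 for gm in members}

    def active(self):
        return set(self.missed)

    def rekey(self, acked):
        """Record one rekey; 'acked' is the set of GMs that sent an ACK.

        Returns the GMs removed from the active database by this rekey.
        """
        removed = []
        for gm in list(self.missed):
            if gm in acked:
                self.missed[gm] = 0  # an ACK resets the counter
            else:
                self.missed[gm] += 1
                if self.missed[gm] >= self.MAX_MISSED:
                    del self.missed[gm]
                    removed.append(gm)
        return removed

tracker = UnicastRekeyTracker(["gm1", "gm2"])
tracker.rekey({"gm1"})
tracker.rekey({"gm1"})
print(tracker.rekey({"gm1"}))  # ['gm2'] after the third missed ACK
print(tracker.active())        # {'gm1'}
```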

Multicast Rekey

When using multicast for the rekey process, a single copy of the rekey message is sent to a multicast group. Each GM joins this multicast group at registration and will therefore receive the rekey message.

The multicast rekey process does not have an ACK mechanism and the KS does not keep a list of active GMs. The KS can be configured to retransmit rekey messages. The CPU overhead is a lot lower with multicast rekeying since only a single message needs to be sent, compared to replicating it in unicast mode to potentially hundreds of routers.

To use multicast rekey, multicast must be supported in the network where GET VPN is running, such as VPLS or MPLS VPN.

COOP KSs

Based on the information so far, we can see that the KS is a very critical part of the GET VPN concept. This means that we will want to ensure that GET VPN functions even when the KS goes away, hence multiple KSs. Cooperative KSs are used to ensure seamless fault recovery if a KS fails or becomes unreachable.

A GM is configured to join a list of KSs where the order they are listed in decides which KS to try to register with first.

At bootup COOP KSs will assume a secondary role and then hold an election to decide which KS becomes the primary one. The primary KS gets elected based on the highest priority configured. The primary KS is responsible for the creation and distribution of group policies to all the GMs. The primary KS will synchronize the COOP KSs.
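The election itself can be sketched in a few lines. This is only a model of the rule that the highest priority wins. Tie breaking and the actual COOP message exchange are not modeled, and the names and priority values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class KeyServer:
    name: str
    priority: int
    role: str = "secondary"  # every KS starts out as secondary
    alive: bool = True

def elect_primary(servers):
    """Give the primary role to the reachable KS with the highest priority."""
    candidates = [ks for ks in servers if ks.alive]
    if not candidates:
        return None
    primary = max(candidates, key=lambda ks: ks.priority)
    for ks in servers:
        ks.role = "primary" if ks is primary else "secondary"
    return primary

servers = [KeyServer("KS1", 100), KeyServer("KS2", 75), KeyServer("KS3", 50)]
print(elect_primary(servers).name)  # KS1

servers[0].alive = False            # KS1 fails or becomes unreachable
print(elect_primary(servers).name)  # KS2 takes over
```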

GMs can register to any KS but only the primary KS will send rekey messages. It is recommended to distribute the GM registration to several KSs to lessen the IKE load on the KS.

COOP KSs exchange one-way announcement messages in the direction of primary to secondary. If a secondary KS does not hear from the primary KS, it will contact the primary KS to request updated information. If the primary KS still does not respond, a COOP reelection is triggered and a new primary KS will be selected.

Up to eight KSs can be configured in the same group but since rekey messages are always sent from the primary KS, the advantage of having multiple KSs is the ability to handle the registration load in case of network failure and reregistration taking place at the same time. If PKI is used, this is more important because the CPU load when using PKI is a lot higher than with PSK.

Periodic ISAKMP keepalives should be enabled between the KSs. That way the primary KS can keep track of the secondary KSs. IKE keepalives between the KS and the GMs are not supported and should not be configured.

Time Based Anti Replay

Traditional IPSec solutions have anti replay capabilities to prevent a malicious third party from capturing IPSec packets and replaying those packets at a later time to perform an attack against the IPSec endpoints. This is normally done by having a counter based sliding window where the sender sends a packet with a sequence number and the receiver will use the sliding window to determine if the packet is acceptable or if it has arrived out of sequence and outside the window of acceptable packets.
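The counter based check can be sketched as follows. This is a simplified illustration with a small window size chosen for the example. Real implementations typically use a bitmap instead of a set.

```python
class SlidingWindow:
    """Counter based anti replay check for a single sender."""

    def __init__(self, size=64):
        self.size = size
        self.highest = 0   # highest sequence number accepted so far
        self.seen = set()  # accepted sequence numbers still inside the window

    def accept(self, seq):
        if seq > self.highest:
            # A new highest number moves the window forward.
            self.highest = seq
            self.seen.add(seq)
            self.seen = {s for s in self.seen if s > self.highest - self.size}
            return True
        if seq <= self.highest - self.size:
            return False   # too old, outside the window
        if seq in self.seen:
            return False   # already received, a replay
        self.seen.add(seq)
        return True        # late but inside the window

window = SlidingWindow(size=4)
print(window.accept(1))   # True
print(window.accept(3))   # True
print(window.accept(2))   # True, out of order but inside the window
print(window.accept(3))   # False, replayed packet
print(window.accept(10))  # True, window now covers 7 to 10
print(window.accept(5))   # False, outside the window
```

Note that the window tracks one counter from one sender, which is what makes it hard to reuse when several senders share the same SA.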

This mechanism is not useful in GET VPN because it uses a group SA, where many senders share the same SA and a single sequence counter cannot be kept consistent between them. GET VPN therefore uses a time based anti replay function where the KS uses a pseudo time clock. Because the pseudo time is maintained by the KS, there is no need to synchronize the GMs with NTP for this purpose.

The primary KS will keep this pseudo time synchronized on all GMs with rekey updates. Every GM will include its pseudo time as a time stamp in the data packets. The receiving VPN gateway will then compare the time stamp of the received packet with the GM reference pseudo time clock it maintains for the group. If the packet is too late it will be dropped.
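The receive side check can be sketched as a simple comparison. The window of 5 seconds and the function name are assumptions for the example, and only the "too late" case described above is modeled.

```python
def tbar_accept(packet_time, receiver_time, window):
    """Accept a packet unless its time stamp lags the receiver's pseudo time
    by more than the anti replay window. All values are in seconds."""
    return receiver_time - packet_time <= window

WINDOW = 5  # hypothetical anti replay window

print(tbar_accept(packet_time=100, receiver_time=103, window=WINDOW))  # True
print(tbar_accept(packet_time=100, receiver_time=110, window=WINDOW))  # False
```

Because every GM compares against the same group wide pseudo time, the check works no matter which GM encrypted the packet.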

GET VPN Solution Comparison

The following table compares EzVPN, DMVPN and GET VPN.

KS Selection

KS selection depends mostly on the required network scalability (the number of GMs supported in a group). The limiting factors for KS scalability are the registration rate and the ability of the KS to handle rekeys towards the GMs. The registration rate is the single most important factor for KS selection. Using multicast for rekeying will lessen the load on the KS.

Using PKI to register the GMs with the KS decreases the registration rate which means that PSK scales better from a performance perspective.

GM Selection

Selecting the GM is mainly based on the required throughput or packet forwarding rate. If a lot of small packets are used such as with VoIP, then the pps number becomes more important than the throughput number.
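A quick calculation shows why the packet size matters. The 100 Mbit/s rate and the packet sizes below are example values and not platform numbers.

```python
def packets_per_second(throughput_bps, packet_bytes):
    """Packets per second needed to fill a given throughput with one packet size."""
    return throughput_bps / (packet_bytes * 8)

RATE = 100_000_000  # 100 Mbit/s, example value

print(packets_per_second(RATE, 1400))  # roughly 8929 pps with large packets
print(packets_per_second(RATE, 200))   # 62500.0 pps with small, VoIP sized packets
```

The same throughput requires about seven times as many packets per second when the packets are small, so a platform may run out of pps capacity long before it reaches its rated throughput.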

Number of SAs

The number of supported SAs varies depending on the platform. This number needs to be verified against the desired number of SAs in the GET VPN network. Creating an efficient ACL for interesting traffic will decrease the number of SAs in use, which means that a cheaper platform may be used than if a lot of SAs were needed.

The following diagram shows a typical GET VPN design.

The KSs should not be located behind the GMs because then the entire GET VPN setup becomes dependent on those GMs to function properly.

Fail Open or Fail Close

Normally a GM will be able to send traffic in clear text before it has registered with the KS. This is referred to as Fail-Open. In Fail-Close mode, no traffic is allowed to be sent before the GM has registered with the KS so no clear text traffic can be sent. Fail-Close increases the security of GET VPN by enforcing that:

Prior to registration or during registration, the GM will drop any packets arriving in the clear
Failure of any step in the registration process will also lead to the GM dropping clear text packets

When the GM has registered with the KS and it is operating in Fail-Close mode it will be able to send and receive packets in clear text if they are not part of the policy defined by the KS. Packets matching the policy downloaded from the KS will be encrypted and decrypted. If the GM does not receive rekey messages and is not able to reregister it will drop traffic matching the policy but forward the clear text packets that are not part of the policy. After a GM has successfully registered it will keep the downloaded policy even if the IPSec SAs expire.

Fail-Close mode must be explicitly configured and combined with an ACL.
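The forwarding decisions described in this section can be summarized in a small decision function. This is a simplified model. The Fail-Close ACL is not modeled, and I am assuming that a GM that has registered once behaves the same in both modes.

```python
def outbound_action(registered, sa_valid, matches_policy, fail_close):
    """Decide what a GM does with a packet.

    registered:     the GM has successfully registered and downloaded the policy
    sa_valid:       the GM currently holds unexpired IPSec SAs
    matches_policy: the packet matches the policy downloaded from the KS
    fail_close:     Fail-Close mode is configured on the GM
    """
    if not registered:
        # No policy yet: Fail-Close drops, Fail-Open sends in clear text.
        return "drop" if fail_close else "clear"
    if not matches_policy:
        return "clear"  # traffic outside the policy is never encrypted
    # The policy is kept even if the SAs expire, so matching traffic is dropped.
    return "encrypt" if sa_valid else "drop"

print(outbound_action(False, False, True, fail_close=True))   # drop
print(outbound_action(False, False, True, fail_close=False))  # clear
print(outbound_action(True, True, True, fail_close=True))     # encrypt
print(outbound_action(True, False, True, fail_close=True))    # drop
print(outbound_action(True, False, False, fail_close=True))   # clear
```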

Number of KSs

Using only a single KS means that there is a single point of failure in the network. The KS is only responsible for the control plane though, so traffic will be forwarded as long as the GMs have valid keys and SAs. At least two KSs should be used. It is recommended to use as few KSs as necessary to scale the network and provide control plane resiliency.

As the number of GMs increases it may be desirable to place the KSs in geographically dispersed locations such as different data center locations. That way, if an entire location goes offline, there is still a KS available in another location. One design can be to have two KSs in one site and one KS in another site. It’s also possible to have two KSs in one site and two KSs in another site.

Priorities of the KSs can be set depending on different failure scenarios. One option is to give the two KSs in one site the highest priorities, which mostly protects against device failure. Another option is to have KS1 in location A have the highest priority and KS1 in location B have the second highest priority. Setting the priorities like this assumes that it’s more likely for a site to go offline than a device, which is debatable.

Load Balancing

Whenever a KS (primary or secondary) receives a new GM registration, it will send an announcement message with policy information for the new GM to the other KSs. This keeps the GM database synchronized on all the KSs.

Load balancing of GM registrations can be achieved in the following ways:

Load balancing using configuration
Load balancing using routing
Load balancing using server load balancing (SLB)

Load balancing using configuration is achieved by configuring the order of KSs differently on different GMs. Half of the GMs could be configured to use KS1 as the primary and the other half to use KS2 as the primary. It’s also possible to use more KSs and different schemes to balance the load.
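One simple scheme is to rotate the KS list based on a GM number, as sketched below. The GM numbering and the function name are hypothetical and any scheme that spreads the first entry evenly would work.

```python
def ks_order(gm_index, key_servers):
    """Rotate the KS list so that different GMs try a different KS first."""
    offset = gm_index % len(key_servers)
    return key_servers[offset:] + key_servers[:offset]

key_servers = ["KS1", "KS2"]

for gm_index in range(4):
    print(gm_index, ks_order(gm_index, key_servers))
# 0 ['KS1', 'KS2']
# 1 ['KS2', 'KS1']
# 2 ['KS1', 'KS2']
# 3 ['KS2', 'KS1']
```

Every GM still has all the KSs configured, so a GM can fall back to the next KS in its list if the first one is unreachable.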

Load balancing using routing is achieved by using anycast. This means that several KSs will be using the same IP address so that only one KS is configured on the GMs. The KSs still need to have a unique IP address to be able to synchronize the GM database. This concept is very similar to anycast RP and MSDP.

It’s also possible to do load balancing by using SLB. This is a concept that is sometimes used in DMVPN designs to scale the number of spokes towards the hubs. The SLB device is configured with a virtual IP (VIP) and this VIP is configured as the KS address on the GMs. The SLB device is then responsible for sending the registrations towards different KSs in the backend.

It’s important to configure all the KSs with identical policies.

It’s important that the KSs can communicate with each other. For that reason it is important to have multiple paths between the KSs to increase the resiliency of the design. If the KSs can’t communicate with each other they may start operating as split brain devices.

MTU Considerations

As in any design where IPSec is involved, there can be issues with MTU in the network. Hosts should use PMTU Discovery to find the largest MTU supported on the transit path.

A lower MTU can be set on hosts to make sure that there is no fragmentation of packets, but this is very cumbersome. A more common approach is to use the ip tcp adjust-mss command on the LAN interface of the GM. The GM will then intercept TCP SYN packets and rewrite the MSS option, which ensures that the hosts use an MSS that is supported on the transit path without fragmenting the packets.
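The MSS value has to leave room for the IP and TCP headers as well as the IPSec overhead. The sketch below shows the arithmetic. The 100 bytes of IPSec overhead is a placeholder assumption, since the real overhead depends on the transforms in use.

```python
IP_HEADER = 20   # IPv4 header without options
TCP_HEADER = 20  # TCP header without options

def tcp_mss(path_mtu, ipsec_overhead):
    """Largest TCP payload that still fits in path_mtu after IPSec encapsulation."""
    return path_mtu - ipsec_overhead - IP_HEADER - TCP_HEADER

# Hypothetical values: 1500 byte path MTU and 100 bytes of IPSec overhead.
print(tcp_mss(1500, 100))  # 1360
```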

For non TCP traffic, if the DF bit is set and the packet is too large, the GM will drop the packet and send back an ICMP message to the sender notifying it to adjust its MTU. If the sender and the application are PMTU compliant, this will result in packets that can be properly handled by the WAN.

Some hosts or applications may not be compliant with PMTU. If ip tcp adjust-mss is configured it will take care of the TCP packets. For UDP, the only workaround is to clear the DF bit if it is set and fragment the packets.

VRF Aware GET VPN

GET VPN GMs are VRF aware but KSs are not. For this reason it is recommended to deploy a distinct set of KSs per group. This means that a KS set per VRF is recommended.

It is possible to have a set of KSs serve all VRFs by allowing all the GM VRF subinterfaces to connect to a shared interface on the KS. However, special security considerations should be taken for such a design.

Receive Only SA and Passive Mode GET VPN

When deploying GET VPN, it is desirable to ensure that all GMs can decrypt traffic before starting to encrypt the traffic. The receive only SA is configured on the KS and ensures that the GMs install an SA in the inbound direction only. Outbound traffic will not be encrypted. When all GMs have been enabled for GET VPN, the receive only SA command can be removed from the KS and the GMs will start to encrypt the traffic.

Passive mode is configured on the GMs and overrides the setting from the KS. The receive only SA affects the entire topology but the passive mode only affects the local GM. This is useful if encryption is configured for the topology but a few GMs need to be tested without encryption.

It’s also possible to enable encryption on a few GMs when the KS is configured with the receive-only SA to test that encryption is working before enabling it fully in the topology.

NAT

NAT is not supported with GET VPN because GET VPN uses IP header preservation, which is not compatible with NAT.

Combining GET VPN and DMVPN

There is a use case for combining GET VPN and DMVPN. DMVPN is commonly used when there is an Internet transport, whereas GET VPN on its own is not suitable for Internet transport. When DMVPN uses IPSec, it builds SAs in a point to point fashion. If a spoke needs to communicate with another spoke it will have to negotiate IPSec parameters and build the tunnel. This adds delay to traffic passing between spokes. If the spokes use technologies such as VoIP, this may lead to poor performance of the calls initially.

By combining DMVPN with GET VPN, Internet transport can be used. DMVPN builds the routing tables and mGRE tunnels while the tunnel-less group SA of GET VPN is used for encryption. That way, there is no need to build point to point SAs, which removes the initial delay of communicating between the spokes. This should improve the performance of VoIP calls.

This post should give you a good overview of GET VPN and the design considerations involved in it.