In the previous post on MSS, MSS Clamping, PMTUD, and MTU, we learned that PMTUD works by setting the Don’t fragment flag in the IP header. A device that would need to fragment the packet instead drops it and sends an ICMP Fragmentation needed packet towards the source. In MPLS-enabled networks, it’s not always possible to send the ICMP packet straight towards the source, as the P routers have no knowledge of the customer-specific networks. RFC 3032 (MPLS Label Stack Encoding) describes such a scenario:

Suppose one is using MPLS to "tunnel" through a transit routing
domain, where the external routes are not leaked into the domain's
interior routers. For example, the interior routers may be running
OSPF, and may only know how to reach destinations within that OSPF
domain. The domain might contain several Autonomous System Border
Routers (ASBRs), which talk BGP to each other. However, in this
example the routes from BGP are not distributed into OSPF, and the
LSRs which are not ASBRs do not run BGP.

In this example, only an ASBR will know how to route to the source of
some arbitrary packet. If an interior router needs to send an ICMP
message to the source of an IP packet, it will not know how to route
the ICMP message.

Then, a scenario using private IP addresses, which would be the norm for MPLS L3 VPNs, is described:

In some cases where MPLS is used to tunnel through a routing domain,
it may not be possible to route to the source address of a fragmented
packet at all. This would be the case, for example, if the IP
addresses carried in the packet were private (i.e., not globally
unique) addresses, and MPLS were being used to tunnel those packets
through a public backbone. Default routing to an ASBR will not work
in this environment.

In this environment, in order to send an ICMP message to the source
of a packet, one can copy the label stack from the original packet to
the ICMP message, and then label switch the ICMP message. This will
cause the message to proceed in the direction of the original
packet's destination, rather than its source. Unless the message is
label switched all the way to the destination host, it will end up,
unlabeled, in a router which does know how to route to the source of
original packet, at which point the message will be sent in the
proper direction.

This technique can be very useful if the ICMP message is a "Time
Exceeded" message or a "Destination Unreachable because fragmentation
needed and DF set" message.
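
The label-copying technique in the quoted text can be modelled in a few lines of Python. This is only a minimal sketch, not router code: the `Packet` class and the `build_icmp_error` function are hypothetical names I made up for illustration, and the addresses and labels are borrowed from the lab later in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    """Toy packet: IP source/destination plus an MPLS label stack (top first)."""
    src: str
    dst: str
    labels: List[int] = field(default_factory=list)
    payload: str = ""


def build_icmp_error(original: Packet, router_ip: str, reason: str) -> Packet:
    """Build an ICMP error addressed to the source of the original packet.

    The error reuses the label stack of the offending packet, so it is label
    switched in the direction of the original destination first.
    """
    return Packet(
        src=router_ip,
        dst=original.src,              # IP destination is the original source
        labels=list(original.labels),  # copy of the original label stack
        payload=reason,
    )


# Illustrative values: transport label 18 on top, VPN label 19 below it.
original = Packet(src="10.0.1.10", dst="10.0.2.10", labels=[18, 19])
error = build_icmp_error(original, "10.0.1.1", "fragmentation needed and DF set")
print(error.dst, error.labels)
```

The point of the sketch is that the IP destination and the label stack disagree on purpose: the labels carry the message to a router that can route on the IP destination.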

To make things a bit more visual, I have created the diagram below:

The PE routers have both customer routes and internal routes, while the P routers have internal routes only. The P routers are unaware of both the customer networks and the VPN labels generated by BGP. If a P router were to generate an ICMP packet, it would not know how to get the packet to the source.

To demonstrate the behavior of sending the ICMP Fragmentation needed packet in MPLS-enabled networks, we’ll use a simple lab consisting of two PE routers, two P routers, and two hosts, as shown below:

From HostA, we’ll send a 1500-byte ICMP packet towards HostB with the Don’t fragment flag set. The 1500 bytes consist of 1472 bytes of payload, 8 bytes of ICMP header, and 20 bytes of IP header:

cisco@HostA:~$ ping 10.0.2.10 -s 1472 -M do
PING 10.0.2.10 (10.0.2.10) 1472(1500) bytes of data.
From 10.0.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1492)

An ICMP packet is received from R1, which has the IP 10.0.1.1, but the interesting part is how the packet got there. Let’s take it step by step, starting with the packet that HostA sends:

Frame 2: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits)
Ethernet II, Src: 52:54:00:0b:af:08, Dst: 52:54:00:1a:36:dd
Internet Protocol Version 4, Src: 10.0.1.10, Dst: 10.0.2.10
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 1500
    Identification: 0x0000 (0)
    010. .... = Flags: 0x2, Don't fragment
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 64
    Protocol: ICMP (1)
    Header Checksum: 0x1e0e [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.0.1.10
    Destination Address: 10.0.2.10
    [Stream index: 0]
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0x4a6b [correct]
    [Checksum Status: Good]
    Identifier (BE): 9 (0x0009)
    Identifier (LE): 2304 (0x0900)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    [No response seen]
    Timestamp from icmp data: Sep  4, 2024 15:35:15.761705000 Paris, Madrid, sommartid
    [Timestamp from icmp data (relative): 0.002196000 seconds]
    Data (1456 bytes)
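
The sizes in this capture follow directly from the ping arguments. The short Python sketch below redoes the arithmetic; the constant names are my own, and the header sizes assume an IPv4 header without options and an untagged Ethernet II frame captured without the FCS.

```python
ICMP_PAYLOAD = 1472   # value passed to ping -s
ICMP_HEADER = 8       # ICMP echo header
IPV4_HEADER = 20      # IPv4 header without options
ETHERNET_HEADER = 14  # destination MAC, source MAC, EtherType

ip_total_length = ICMP_PAYLOAD + ICMP_HEADER + IPV4_HEADER
frame_on_wire = ip_total_length + ETHERNET_HEADER

print(ip_total_length)  # 1500
print(frame_on_wire)    # 1514
```

Both values match the capture: Total Length is 1500 and the frame is 1514 bytes on the wire.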

This packet reaches R1. R1 has a route for 10.0.2.10:

R1#show ip route vrf CUST 10.0.2.10

Routing Table: CUST
Routing entry for 10.0.2.0/24
  Known via "bgp 65000", distance 200, metric 0, type internal
  Last update from 192.0.2.2 5d16h ago
  Routing Descriptor Blocks:
  * 192.0.2.2 (default), from 192.0.2.2, 5d16h ago
      Route metric is 0, traffic share count is 1
      AS Hops 0
      MPLS label: 19
      MPLS Flags: MPLS Required

The VPN label is 19 and the transport label is 18:

R1#show mpls forwarding-table 192.0.2.2
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop    
Label      Label      or Tunnel Id     Switched      interface              
18         18         192.0.2.2/32     0             Gi0/1      10.0.0.5

However, the IP MTU of the outgoing interface is 1500 bytes:

R1#show int gi0/1 | i MTU
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec, 

The IP packet that is to be transported by MPLS is 1500 bytes. With the two labels adding 4 bytes each, it would be 1508 bytes, which is more than the interface can handle. As the Don’t fragment flag is set, R1 drops the packet, generates an ICMP Fragmentation needed packet, and sends it towards MPLS1:

Frame 11: 78 bytes on wire (624 bits), 78 bytes captured (624 bits)
Ethernet II, Src: 52:54:00:10:d1:5d, Dst: 52:54:00:05:58:02
MultiProtocol Label Switching Header, Label: 18, Exp: 0, S: 0, TTL: 63
MultiProtocol Label Switching Header, Label: 19, Exp: 0, S: 1, TTL: 63
Internet Protocol Version 4, Src: 10.0.1.1, Dst: 10.0.1.10
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 4 (Fragmentation needed)
    Checksum: 0xa4b2 [correct]
    [Checksum Status: Good]
    Unused: 0000
    MTU of next hop: 1492
    Internet Protocol Version 4, Src: 10.0.1.10, Dst: 10.0.2.10
        0100 .... = Version: 4
        .... 0101 = Header Length: 20 bytes (5)
        Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        Total Length: 1500
        Identification: 0x0000 (0)
        010. .... = Flags: 0x2, Don't fragment
        ...0 0000 0000 0000 = Fragment Offset: 0
        Time to Live: 63
        Protocol: ICMP (1)
        Header Checksum: 0x1f0e [validation disabled]
        [Header checksum status: Unverified]
        Source Address: 10.0.1.10
        Destination Address: 10.0.2.10
        [Stream index: 6]
    Internet Control Message Protocol
        Type: 8 (Echo (ping) request)
        Code: 0
        Checksum: 0x4a6b [unverified] [in ICMP error packet]
        [Checksum Status: Unverified]
        Identifier (BE): 9 (0x0009)
        Identifier (LE): 2304 (0x0900)
        Sequence Number (BE): 1 (0x0001)
        Sequence Number (LE): 256 (0x0100)

Notice the two labels and that the original IP and ICMP headers are included in the packet. Also note that the TTL of the original IP packet (63) was copied to the MPLS headers.
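
The MTU of 1492 reported in the ICMP message is the interface MTU minus the label overhead. Here is a minimal Python sketch of that check. The function names are hypothetical, and it assumes the labelled packet is compared against the 1500-byte interface MTU, with no separate MPLS MTU configured.

```python
MPLS_LABEL_BYTES = 4  # one label stack entry is 32 bits


def max_ip_packet(interface_mtu: int, label_count: int) -> int:
    """Largest IP packet that still fits once label_count labels are imposed."""
    return interface_mtu - label_count * MPLS_LABEL_BYTES


def needs_fragmentation(ip_length: int, interface_mtu: int, label_count: int) -> bool:
    """True when the labelled packet would exceed the interface MTU."""
    return ip_length + label_count * MPLS_LABEL_BYTES > interface_mtu


print(needs_fragmentation(1500, 1500, 2))  # True, since 1508 > 1500
print(max_ip_packet(1500, 2))              # 1492
```

With the transport and VPN labels imposed, a 1492-byte IP packet is the largest that fits, which is the value HostA sees in the ping output.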

MPLS1 sends the packet towards MPLS2:

Frame 7: 78 bytes on wire (624 bits), 78 bytes captured (624 bits)
Ethernet II, Src: 52:54:00:1d:03:db, Dst: 52:54:00:15:2a:07
MultiProtocol Label Switching Header, Label: 18, Exp: 0, S: 0, TTL: 62
MultiProtocol Label Switching Header, Label: 19, Exp: 0, S: 1, TTL: 63
Internet Protocol Version 4, Src: 10.0.1.1, Dst: 10.0.1.10
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0xc0 (DSCP: CS6, ECN: Not-ECT)
    Total Length: 56
    Identification: 0x047d (1149)
    000. .... = Flags: 0x0
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 255
    Protocol: ICMP (1)
    Header Checksum: 0xa07d [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.0.1.1
    Destination Address: 10.0.1.10
    [Stream index: 3]
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 4 (Fragmentation needed)
    Checksum: 0xa4b2 [correct]
    [Checksum Status: Good]
    Unused: 0000
    MTU of next hop: 1492
    Internet Protocol Version 4, Src: 10.0.1.10, Dst: 10.0.2.10
        0100 .... = Version: 4
        .... 0101 = Header Length: 20 bytes (5)
        Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        Total Length: 1500
        Identification: 0x0000 (0)
        010. .... = Flags: 0x2, Don't fragment
        ...0 0000 0000 0000 = Fragment Offset: 0
        Time to Live: 63
        Protocol: ICMP (1)
        Header Checksum: 0x1f0e [validation disabled]
        [Header checksum status: Unverified]
        Source Address: 10.0.1.10
        Destination Address: 10.0.2.10
        [Stream index: 4]
    Internet Control Message Protocol
        Type: 8 (Echo (ping) request)
        Code: 0
        Checksum: 0x4a6b [unverified] [in ICMP error packet]
        [Checksum Status: Unverified]
        Identifier (BE): 9 (0x0009)
        Identifier (LE): 2304 (0x0900)
        Sequence Number (BE): 1 (0x0001)
        Sequence Number (LE): 256 (0x0100)
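
Each MPLS header in these captures is a single 32-bit label stack entry as defined in RFC 3032: 20 bits of label, 3 bits of Exp, 1 bottom-of-stack bit (S), and 8 bits of TTL. As a rough sketch, the two entries from the capture above can be packed and unpacked in Python like this. The `LabelEntry`, `encode`, and `decode` names are my own.

```python
import struct
from typing import NamedTuple


class LabelEntry(NamedTuple):
    label: int  # 20 bits
    exp: int    # 3 bits
    s: int      # 1 bit, set on the bottom of the stack
    ttl: int    # 8 bits


def encode(entry: LabelEntry) -> bytes:
    """Pack one label stack entry into 4 bytes in network byte order."""
    word = (entry.label << 12) | (entry.exp << 9) | (entry.s << 8) | entry.ttl
    return struct.pack("!I", word)


def decode(raw: bytes) -> LabelEntry:
    """Unpack 4 bytes into label, Exp, S, and TTL."""
    (word,) = struct.unpack("!I", raw)
    return LabelEntry(word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF)


# The two headers MPLS1 sends towards MPLS2: transport label first, VPN label last.
stack = [LabelEntry(18, 0, 0, 62), LabelEntry(19, 0, 1, 63)]
wire = b"".join(encode(entry) for entry in stack)
print(wire.hex())  # 0001203e0001313f
```

The 8 bytes of label stack account for the difference between this 78-byte frame and the 70-byte unlabelled frame that HostA eventually receives.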

R2 receives the following packet from MPLS2:

Frame 3: 74 bytes on wire (592 bits), 74 bytes captured (592 bits)
Ethernet II, Src: 52:54:00:00:63:30, Dst: 52:54:00:1a:cb:1f
MultiProtocol Label Switching Header, Label: 19, Exp: 0, S: 1, TTL: 61
Internet Protocol Version 4, Src: 10.0.1.1, Dst: 10.0.1.10
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0xc0 (DSCP: CS6, ECN: Not-ECT)
    Total Length: 56
    Identification: 0x047d (1149)
    000. .... = Flags: 0x0
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 255
    Protocol: ICMP (1)
    Header Checksum: 0xa07d [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.0.1.1
    Destination Address: 10.0.1.10
    [Stream index: 2]
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 4 (Fragmentation needed)
    Checksum: 0xa4b2 [correct]
    [Checksum Status: Good]
    Unused: 0000
    MTU of next hop: 1492
    Internet Protocol Version 4, Src: 10.0.1.10, Dst: 10.0.2.10
        0100 .... = Version: 4
        .... 0101 = Header Length: 20 bytes (5)
        Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        Total Length: 1500
        Identification: 0x0000 (0)
        010. .... = Flags: 0x2, Don't fragment
        ...0 0000 0000 0000 = Fragment Offset: 0
        Time to Live: 63
        Protocol: ICMP (1)
        Header Checksum: 0x1f0e [validation disabled]
        [Header checksum status: Unverified]
        Source Address: 10.0.1.10
        Destination Address: 10.0.2.10
        [Stream index: 3]
    Internet Control Message Protocol
        Type: 8 (Echo (ping) request)
        Code: 0
        Checksum: 0x4a6b [unverified] [in ICMP error packet]
        [Checksum Status: Unverified]
        Identifier (BE): 9 (0x0009)
        Identifier (LE): 2304 (0x0900)
        Sequence Number (BE): 1 (0x0001)
        Sequence Number (LE): 256 (0x0100)

As this is the end of the Label Switched Path (LSP), R2 will do a lookup of 10.0.1.10:

R2#show ip route vrf CUST 10.0.1.10

Routing Table: CUST
Routing entry for 10.0.1.0/24
  Known via "bgp 65000", distance 200, metric 0, type internal
  Last update from 192.0.2.1 5d17h ago
  Routing Descriptor Blocks:
  * 192.0.2.1 (default), from 192.0.2.1, 5d17h ago
      Route metric is 0, traffic share count is 1
      AS Hops 0
      MPLS label: 19
      MPLS Flags: MPLS Required

R2#show mpls forwarding-table 192.0.2.1
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop    
Label      Label      or Tunnel Id     Switched      interface              
16         16         192.0.2.1/32     0             Gi0/1      10.0.0.9

R2 label switches the packet towards MPLS2:

Frame 4: 78 bytes on wire (624 bits), 78 bytes captured (624 bits)
Ethernet II, Src: 52:54:00:1a:cb:1f, Dst: 52:54:00:00:63:30
MultiProtocol Label Switching Header, Label: 16, Exp: 6, S: 0, TTL: 60
MultiProtocol Label Switching Header, Label: 19, Exp: 6, S: 1, TTL: 60
Internet Protocol Version 4, Src: 10.0.1.1, Dst: 10.0.1.10
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0xc0 (DSCP: CS6, ECN: Not-ECT)
    Total Length: 56
    Identification: 0x047d (1149)
    000. .... = Flags: 0x0
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 60
    Protocol: ICMP (1)
    Header Checksum: 0x637e [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.0.1.1
    Destination Address: 10.0.1.10
    [Stream index: 2]
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 4 (Fragmentation needed)
    Checksum: 0xa4b2 [correct]
    [Checksum Status: Good]
    Unused: 0000
    MTU of next hop: 1492
    Internet Protocol Version 4, Src: 10.0.1.10, Dst: 10.0.2.10
        0100 .... = Version: 4
        .... 0101 = Header Length: 20 bytes (5)
        Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        Total Length: 1500
        Identification: 0x0000 (0)
        010. .... = Flags: 0x2, Don't fragment
        ...0 0000 0000 0000 = Fragment Offset: 0
        Time to Live: 63
        Protocol: ICMP (1)
        Header Checksum: 0x1f0e [validation disabled]
        [Header checksum status: Unverified]
        Source Address: 10.0.1.10
        Destination Address: 10.0.2.10
        [Stream index: 3]
    Internet Control Message Protocol
        Type: 8 (Echo (ping) request)
        Code: 0
        Checksum: 0x4a6b [unverified] [in ICMP error packet]
        [Checksum Status: Unverified]
        Identifier (BE): 9 (0x0009)
        Identifier (LE): 2304 (0x0900)
        Sequence Number (BE): 1 (0x0001)
        Sequence Number (LE): 256 (0x0100)

MPLS2 label switches it towards MPLS1:

Frame 8: 78 bytes on wire (624 bits), 78 bytes captured (624 bits)
Ethernet II, Src: 52:54:00:15:2a:07, Dst: 52:54:00:1d:03:db
MultiProtocol Label Switching Header, Label: 16, Exp: 6, S: 0, TTL: 59
MultiProtocol Label Switching Header, Label: 19, Exp: 6, S: 1, TTL: 60
Internet Protocol Version 4, Src: 10.0.1.1, Dst: 10.0.1.10
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0xc0 (DSCP: CS6, ECN: Not-ECT)
    Total Length: 56
    Identification: 0x047d (1149)
    000. .... = Flags: 0x0
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 60
    Protocol: ICMP (1)
    Header Checksum: 0x637e [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.0.1.1
    Destination Address: 10.0.1.10
    [Stream index: 3]
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 4 (Fragmentation needed)
    Checksum: 0xa4b2 [correct]
    [Checksum Status: Good]
    Unused: 0000
    MTU of next hop: 1492
    Internet Protocol Version 4, Src: 10.0.1.10, Dst: 10.0.2.10
        0100 .... = Version: 4
        .... 0101 = Header Length: 20 bytes (5)
        Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        Total Length: 1500
        Identification: 0x0000 (0)
        010. .... = Flags: 0x2, Don't fragment
        ...0 0000 0000 0000 = Fragment Offset: 0
        Time to Live: 63
        Protocol: ICMP (1)
        Header Checksum: 0x1f0e [validation disabled]
        [Header checksum status: Unverified]
        Source Address: 10.0.1.10
        Destination Address: 10.0.2.10
        [Stream index: 4]
    Internet Control Message Protocol
        Type: 8 (Echo (ping) request)
        Code: 0
        Checksum: 0x4a6b [unverified] [in ICMP error packet]
        [Checksum Status: Unverified]
        Identifier (BE): 9 (0x0009)
        Identifier (LE): 2304 (0x0900)
        Sequence Number (BE): 1 (0x0001)
        Sequence Number (LE): 256 (0x0100)

MPLS1 label switches it towards R1:

Frame 12: 74 bytes on wire (592 bits), 74 bytes captured (592 bits)
Ethernet II, Src: 52:54:00:05:58:02, Dst: 52:54:00:10:d1:5d
MultiProtocol Label Switching Header, Label: 19, Exp: 6, S: 1, TTL: 58
Internet Protocol Version 4, Src: 10.0.1.1, Dst: 10.0.1.10
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0xc0 (DSCP: CS6, ECN: Not-ECT)
    Total Length: 56
    Identification: 0x047d (1149)
    000. .... = Flags: 0x0
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 60
    Protocol: ICMP (1)
    Header Checksum: 0x637e [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.0.1.1
    Destination Address: 10.0.1.10
    [Stream index: 5]
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 4 (Fragmentation needed)
    Checksum: 0xa4b2 [correct]
    [Checksum Status: Good]
    Unused: 0000
    MTU of next hop: 1492
    Internet Protocol Version 4, Src: 10.0.1.10, Dst: 10.0.2.10
        0100 .... = Version: 4
        .... 0101 = Header Length: 20 bytes (5)
        Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        Total Length: 1500
        Identification: 0x0000 (0)
        010. .... = Flags: 0x2, Don't fragment
        ...0 0000 0000 0000 = Fragment Offset: 0
        Time to Live: 63
        Protocol: ICMP (1)
        Header Checksum: 0x1f0e [validation disabled]
        [Header checksum status: Unverified]
        Source Address: 10.0.1.10
        Destination Address: 10.0.2.10
        [Stream index: 6]
    Internet Control Message Protocol
        Type: 8 (Echo (ping) request)
        Code: 0
        Checksum: 0x4a6b [unverified] [in ICMP error packet]
        [Checksum Status: Unverified]
        Identifier (BE): 9 (0x0009)
        Identifier (LE): 2304 (0x0900)
        Sequence Number (BE): 1 (0x0001)
        Sequence Number (LE): 256 (0x0100)

Only the VPN label (19) remains, as the transport label was popped by MPLS1. R1 then sends the IP packet towards HostA:

Frame 3: 70 bytes on wire (560 bits), 70 bytes captured (560 bits)
Ethernet II, Src: 52:54:00:1a:36:dd, Dst: 52:54:00:0b:af:08
Internet Protocol Version 4, Src: 10.0.1.1, Dst: 10.0.1.10
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 4 (Fragmentation needed)
    Checksum: 0xa4b2 [correct]
    [Checksum Status: Good]
    Unused: 0000
    MTU of next hop: 1492
    Internet Protocol Version 4, Src: 10.0.1.10, Dst: 10.0.2.10
        0100 .... = Version: 4
        .... 0101 = Header Length: 20 bytes (5)
        Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        Total Length: 1500
        Identification: 0x0000 (0)
        010. .... = Flags: 0x2, Don't fragment
        ...0 0000 0000 0000 = Fragment Offset: 0
        Time to Live: 63
        Protocol: ICMP (1)
        Header Checksum: 0x1f0e [validation disabled]
        [Header checksum status: Unverified]
        Source Address: 10.0.1.10
        Destination Address: 10.0.2.10
        [Stream index: 0]
    Internet Control Message Protocol
        Type: 8 (Echo (ping) request)
        Code: 0
        Checksum: 0x4a6b [unverified] [in ICMP error packet]
        [Checksum Status: Unverified]
        Identifier (BE): 9 (0x0009)
        Identifier (LE): 2304 (0x0900)
        Sequence Number (BE): 1 (0x0001)
        Sequence Number (LE): 256 (0x0100)

The path of the Fragmentation needed packet is shown below:

In this post we learned that ICMP Fragmentation needed packets are always sent via the same LSP as the original packet, because the router generating them may not know how to reach the source. Note that the PEs, R1 and R2 in this scenario, know about the customer networks but still behave the same way as the P routers, most likely to achieve consistent behavior regardless of whether the MPLS-enabled router is a PE or P device.
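
To tie the captures together, the whole journey of the ICMP packet can be replayed with a small Python sketch. The tables are not taken from the routers. I reconstructed them from the label values seen in the captures, and the `LFIB`, `PE_VRF`, and `trace` names are hypothetical.

```python
# (router, incoming top label) -> ((action, outgoing label), next hop)
LFIB = {
    ("MPLS1", 18): (("swap", 18), "MPLS2"),
    ("MPLS2", 18): (("pop", None), "R2"),
    ("MPLS2", 16): (("swap", 16), "MPLS1"),
    ("MPLS1", 16): (("pop", None), "R1"),
}

# What each PE does with a packet for 10.0.1.10 that arrives carrying only
# the VPN label: the labels it imposes and the next hop.
PE_VRF = {
    "R2": ([16, 19], "MPLS2"),  # remote route: transport label + VPN label
    "R1": ([], "HostA"),        # HostA's network is local: plain IP
}


def trace(start, first_hop, labels):
    """Return the list of (sender, receiver, label stack) hops."""
    hops = [(start, first_hop, list(labels))]
    node = first_hop
    while node != "HostA":
        if len(labels) > 1:  # transport label on top: label switch
            (action, new_label), nxt = LFIB[(node, labels[0])]
            labels = labels[1:] if action == "pop" else [new_label] + labels[1:]
        else:                # only the VPN label left: PE lookup
            labels, nxt = PE_VRF[node]
        hops.append((node, nxt, list(labels)))
        node = nxt
    return hops


for sender, receiver, stack in trace("R1", "MPLS1", [18, 19]):
    print(f"{sender} -> {receiver}: {stack}")
```

The output lists seven hops, from R1 towards MPLS1 with labels 18 and 19 down to R1 delivering the unlabelled packet to HostA, in the same order as the captures.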

PMTUD in MPLS-enabled Networks