gurfin / MTU and PMTUD on L2VPNs

Created Fri, 26 Apr 2024 12:22:55 +0100 Modified Wed, 15 Jan 2025 18:47:33 +0000
I ran into an interesting issue at work today. One of our customers was having issues with a site in Gothenburg. They were using L2VPNs as circuits between their central site and the remote sites. Across this L2VPN they were running MPLS MP-eBGP peering using inter-AS option 2b to allow multiplexing of different routing-instances on the WAN. We were observing BGP flapping between the secondary ASBR router on site and their central ASBR router. The link itself showed no flapping, and pinging across the link worked fine.

The BGP peering was being established without any issues. However, no routes were being received on either side. To make a long story short, it was MTU related. But the reason why was quite interesting!

The transit interfaces on each ASBR were set to an MTU of 9000, which is not uncommon for backbone transit links. To keep TCP segments within what the path can actually carry, there is a mechanism known as Path MTU Discovery (PMTUD). The sender sets the don’t fragment (DF) flag on its IP packets and starts out assuming that the path MTU equals its local interface MTU. If a router along the path cannot forward a packet because it is larger than the MTU of the next link, it drops the packet and returns an ICMP message stating “Fragmentation needed and DF set”. Once the sender receives that ICMP message, it lowers its path MTU estimate and adjusts the TCP MSS according to the learned MTU, while accounting for network and transport headers.
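The feedback loop described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the path is a plain list of link MTUs, every hop is a router that answers with ICMP, and the names `forward_with_df`, `discover_path_mtu` and `tcp_mss` are made up for this example rather than taken from any real stack.

```python
from typing import List, Optional

IP_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options


def forward_with_df(size: int, link_mtus: List[int]) -> Optional[int]:
    """Send one DF-marked IP packet of `size` bytes along the path.

    Returns None if the packet is delivered, otherwise the MTU that the
    dropping router would report in its ICMP "Fragmentation needed" reply.
    """
    for mtu in link_mtus:
        if size > mtu:
            return mtu
    return None


def discover_path_mtu(local_mtu: int, link_mtus: List[int]) -> int:
    """Start at the local MTU and shrink on every ICMP report."""
    pmtu = local_mtu
    while True:
        reported = forward_with_df(pmtu, link_mtus)
        if reported is None:
            return pmtu
        # The reported MTU is always smaller than the packet that was
        # dropped, so the estimate strictly decreases and the loop ends.
        pmtu = reported


def tcp_mss(path_mtu: int) -> int:
    """MSS is the path MTU minus IP and TCP headers."""
    return path_mtu - IP_HEADER - TCP_HEADER


if __name__ == "__main__":
    path = [9000, 1500, 1400]  # hypothetical routed path
    pmtu = discover_path_mtu(9000, path)
    print(pmtu, tcp_mss(pmtu))
```

The important detail is that the estimate only ever shrinks when an ICMP report comes back. With no report, the sender keeps believing its local MTU is fine.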

When BGP is establishing a new neighborship it will run the PMTUD process if the following command is active towards the peer (by default, it is):

neighbor <peer-ip> transport path-mtu-discovery

Interestingly, you can also disable the PMTUD process for your neighbor by prefixing the command above with “no”.

This usually works very well to adjust the maximum segment size of the BGP neighborship. However, in this case the provider network only passed frames up to 1536 bytes. Since the L2VPN is not an IP network but an Ethernet network, it will not return an ICMP fragmentation notice to the sending device when the L2VPN MTU is exceeded; instead it will just drop the frame. With no feedback to act on, the PMTUD process left the maximum segment size for the BGP TCP session at 8896 bytes. The BGP OPEN packets are small enough to pass through the L2VPN, while the BGP UPDATE packets make the frame bigger than 1536 bytes, which results in the frame being dropped silently.
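To make the effect concrete, here is a rough sketch of which TCP segments would survive the L2VPN. The 1536-byte frame limit and the 8896-byte MSS are the values from this post, and the per-frame overheads follow the breakdown further down. The payload sizes (61 bytes for an OPEN, 4096 bytes for a batch of UPDATE data) are illustrative assumptions, as are the helper names.

```python
from typing import List

ETHERNET_OVERHEAD = 18  # bytes, as used in this post's breakdown
MPLS_LABEL = 4          # bytes, one label
IP_HEADER = 20          # bytes, IPv4 without options
TCP_HEADER = 20         # bytes, TCP without options
L2VPN_MAX_FRAME = 1536  # bytes, limit in the provider network


def segment(payload_bytes: int, mss: int) -> List[int]:
    """Split a TCP payload into segment sizes of at most `mss` bytes."""
    full, rest = divmod(payload_bytes, mss)
    return [mss] * full + ([rest] if rest else [])


def frame_size(segment_bytes: int) -> int:
    """Size on the wire of one segment including all headers."""
    return (segment_bytes + TCP_HEADER + IP_HEADER
            + MPLS_LABEL + ETHERNET_OVERHEAD)


def delivered(payload_bytes: int, mss: int) -> List[bool]:
    """For each segment, True if its frame fits through the L2VPN."""
    return [frame_size(s) <= L2VPN_MAX_FRAME
            for s in segment(payload_bytes, mss)]


if __name__ == "__main__":
    print(delivered(61, 8896))    # small OPEN-sized payload
    print(delivered(4096, 8896))  # UPDATE data sent as one big segment
    print(delivered(4096, 1474))  # same data with a smaller MSS
```

With the large MSS the small payload fits, but the 4096 bytes go out as a single segment that is far too big for the L2VPN and simply vanish. With an MSS of 1474 the same data is split into three segments that all fit.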

To solve the issue we just set the MTU on each side to the correct MTU of the L2VPN, which in our case is 1514 bytes:

  • Maximum Frame Size (MFS): 1536 bytes, minus Ethernet headers (18 bytes) and MPLS label (4 bytes) gives:
    • Maximum Transmission Unit (MTU): 1514 bytes, minus IP header (20 bytes) and TCP header (20 bytes) gives:
      • Maximum Segment Size (MSS): 1474 bytes
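The same arithmetic as a small helper, in case you want to plug in a different frame size or label stack. The function names and default values are just for this sketch; the defaults assume a single MPLS label and IPv4 and TCP headers without options.

```python
def ip_mtu(max_frame: int, ethernet: int = 18, mpls: int = 4) -> int:
    """IP MTU left after subtracting Ethernet and MPLS overhead."""
    return max_frame - ethernet - mpls


def tcp_mss(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """TCP MSS left after subtracting IP and TCP headers."""
    return mtu - ip_header - tcp_header


if __name__ == "__main__":
    mtu = ip_mtu(1536)
    print(mtu)           # 1514
    print(tcp_mss(mtu))  # 1474
```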

This issue was very interesting from my point of view, and hopefully I can find some time to experiment a bit more with it. For example, does configuring “ip tcp adjust-mss” allow the MSS to be adjusted correctly so that no packets are dropped?

Anyway, just dumping the info here in case it is useful for me or anyone else in the future. 🙂