In a traditional MPLS VPN cloud, the core is understood to be BGP-free and to hold no VPN-specific information. A core MPLS router only has reachability information for the PE and P routers within the same domain.
So how does a traceroute triggered from a CE list all the nodes within the MPLS core?
In the above MPLS VPN topology, CE1 and CE2 belong to a VRF, with PE1 and PE2 as the Provider Edge nodes. P1, P2 and P3, being core nodes, have no reachability information for CE1 or CE2. Yet when a traceroute is triggered from CE1 to CE2, and TTL propagation is not disabled on the PE routers, the trace output on CE1 lists every router along the path within the MPLS cloud.
CE1#traceroute 192.168.6.6 source 192.168.1.1 numeric
Type escape sequence to abort.
Tracing the route to 192.168.6.6
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.12.2 0 msec 0 msec 0 msec
2 10.1.23.3 [MPLS: Labels 19/22 Exp 0] 1 msec 1 msec 1 msec
3 10.1.34.4 [MPLS: Labels 19/22 Exp 0] 1 msec 1 msec 1 msec
4 192.168.56.5 [MPLS: Label 22 Exp 0] 0 msec 0 msec 0 msec
5 192.168.56.6 2 msec * 2 msec
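To make the label stacks in this output explicit, here is a small Python sketch that parses the hop lines. It assumes exactly the IOS output format shown above, and `parse_traceroute`, `Hop` and `SAMPLE` are hypothetical names used only for illustration. Hops 2 and 3 report two labels (transport label 19 over VPN label 22), while hop 4 reports only the VPN label, presumably because the transport label was already popped at the penultimate hop.

```python
import re
from typing import NamedTuple

# The CE1 output shown above, pasted verbatim.
SAMPLE = """\
CE1#traceroute 192.168.6.6 source 192.168.1.1 numeric
Type escape sequence to abort.
Tracing the route to 192.168.6.6
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.12.2 0 msec 0 msec 0 msec
2 10.1.23.3 [MPLS: Labels 19/22 Exp 0] 1 msec 1 msec 1 msec
3 10.1.34.4 [MPLS: Labels 19/22 Exp 0] 1 msec 1 msec 1 msec
4 192.168.56.5 [MPLS: Label 22 Exp 0] 0 msec 0 msec 0 msec
5 192.168.56.6 2 msec * 2 msec
"""

# A hop line: hop number, replying address, optional "[MPLS: Label(s) x/y Exp n]".
HOP_RE = re.compile(
    r"^\s*(?P<hop>\d+)\s+(?P<addr>\d+(?:\.\d+){3})"
    r"(?:\s+\[MPLS: Labels? (?P<labels>\d+(?:/\d+)*) Exp \d+\])?"
)


class Hop(NamedTuple):
    hop: int
    address: str
    labels: list[int]  # outermost (transport) label first; empty if unlabeled


def parse_traceroute(text: str) -> list[Hop]:
    """Return one Hop per numbered line, skipping the banner lines."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m is None:
            continue
        raw = m.group("labels")
        labels = [int(x) for x in raw.split("/")] if raw else []
        hops.append(Hop(int(m.group("hop")), m.group("addr"), labels))
    return hops


if __name__ == "__main__":
    for h in parse_traceroute(SAMPLE):
        print(h.hop, h.address, h.labels)
```

On the sample, this reports no labels for hops 1 and 5, [19, 22] for hops 2 and 3, and [22] for hop 4.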
So the P routers, even though they have no reachability information for the VPN prefixes, were still able to send ICMP error messages back to the CE. (Remember, the ICMP error message “TTL expired” is what makes traceroute work.)
So, how does it work?
By default, an LSR that receives a packet whose top label has TTL=1 drops the packet and sends an ICMP error message to the packet's source. But when the packet is VPN traffic (carrying more than one label), the LSR performs the following steps:
- Buffer the label stack from the incoming packet (the packet received with TTL=1).
- Generate an ICMP error message with its own address as the source and the source address of the received packet as the destination.
- Add all labels from the buffered label stack, except the top one, each with TTL=255.
- Take the top label from the buffered label stack and perform a local LFIB lookup to get the outgoing label to swap in and the associated next hop.
- Push the new label on top of the stack with TTL=255 and send the packet on.
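The steps above can be sketched in Python. This is a simplified model, not IOS code: `handle_ttl_expiry`, the dataclasses and the dictionary standing in for an LFIB are hypothetical, and the example LFIB entry for P1 (label 19 swapped to 19 towards 10.1.34.4) is only inferred from the traceroute output above.

```python
from dataclasses import dataclass


@dataclass
class LabelEntry:
    label: int
    ttl: int


@dataclass
class IcmpTimeExceeded:
    src: str  # address of the LSR generating the error
    dst: str  # source address of the packet whose TTL expired


@dataclass
class Outgoing:
    next_hop: str
    stack: list[LabelEntry]  # outermost first; empty means plain IP forwarding
    packet: IcmpTimeExceeded


def handle_ttl_expiry(stack, orig_src, own_addr, lfib, ip_next_hop):
    """Model how an LSR answers a labeled packet that arrived with TTL=1.

    stack:       label stack buffered from the received packet, outermost first
    lfib:        hypothetical LFIB, incoming label -> (outgoing label or None to pop, next hop)
    ip_next_hop: next hop an ordinary IP lookup would return for orig_src
    """
    icmp = IcmpTimeExceeded(src=own_addr, dst=orig_src)
    if len(stack) < 2:
        # Default behaviour: reply straight to the source via the IP routing table.
        return Outgoing(ip_next_hop, [], icmp)
    # Keep every label except the top one, re-stamped with TTL=255.
    inner = [LabelEntry(e.label, 255) for e in stack[1:]]
    # LFIB lookup on the original top label gives the swap label and next hop.
    out_label, next_hop = lfib[stack[0].label]
    top = [] if out_label is None else [LabelEntry(out_label, 255)]
    return Outgoing(next_hop, top + inner, icmp)


if __name__ == "__main__":
    # P1 (10.1.23.3) receives the second probe from CE1 with labels 19/22, both TTL=1.
    p1_lfib = {19: (19, "10.1.34.4")}  # inferred from the trace, not a real table dump
    received = [LabelEntry(19, 1), LabelEntry(22, 1)]
    print(handle_ttl_expiry(received, "192.168.1.1", "10.1.23.3", p1_lfib, "unused"))
```

Note that the reply is never routed by P1 on its destination address: the LFIB lookup on the original top label simply sends it further along the same LSP.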
With this approach, the ICMP error message travels from the transit LSR to the egress PE, which routes it back across the core to the ingress PE and on to the actual source in the VRF.
Below is a simple example to make this clearer.
In the above topology, when a trace is performed from CE1 (192.168.1.1) to CE2 (192.168.6.6), the first probe, with TTL=1, reaches the ingress PE, which drops it and sends the ICMP error message back directly.
The second probe, with TTL=2, reaches the ingress PE, which decrements the IP TTL and pushes the label stack 19/22 (transport label 19 on top of VPN label 22). Because TTL propagation is enabled, the TTL of both labels is set to 1.
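The imposition step, and the role of the TTL propagation mentioned at the start, can be modelled with a small hypothetical helper, `push_labels`. It is a sketch assuming the behaviour described here: with propagation enabled the decremented IP TTL is copied into every pushed label, and with propagation disabled the labels get TTL=255, so no P router ever sees the TTL expire and the core stays hidden from the trace.

```python
def push_labels(ip_ttl: int, labels: list[int], propagate_ttl: bool = True):
    """Model label imposition at the ingress PE for a probe arriving from the CE.

    Returns (IP TTL after the PE's own decrement, [(label, label TTL), ...]),
    outermost label first. Assumes ip_ttl >= 2; with TTL=1 the PE answers itself.
    """
    if ip_ttl < 2:
        raise ValueError("probe expires at the ingress PE, which replies directly")
    new_ip_ttl = ip_ttl - 1  # the PE forwards the packet, so it decrements the TTL
    # Propagation copies the IP TTL into the labels; otherwise they get 255.
    label_ttl = new_ip_ttl if propagate_ttl else 255
    return new_ip_ttl, [(label, label_ttl) for label in labels]


if __name__ == "__main__":
    print(push_labels(2, [19, 22]))                       # (1, [(19, 1), (22, 1)])
    print(push_labels(2, [19, 22], propagate_ttl=False))  # (1, [(19, 255), (22, 255)])
```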
P1, on receiving it, finds the TTL expired, drops the packet and generates an ICMP “TTL expired” message with 192.168.1.1 as the destination. The reply is forwarded using the same label stack: P1 keeps VPN label 22 at the bottom, swaps the transport label according to its local forwarding table, and sends the packet towards the remote PE router.
PE2, on receiving it, performs an IP lookup in the VRF table and forwards the reply back into the core towards 192.168.1.1. The same procedure repeats for each subsequent probe until the trace reaches 192.168.6.6.
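To tie the walkthrough together, the sketch below (hypothetical names `reply_path`, `PATH` and `CORE`) models which node answers each probe and which nodes its reply visits. It assumes, from the traceroute output, that the probes cross only P1 and P2 between the PEs, that forwarding is symmetric so the reply comes back through the same core routers, and that PE2, which holds the VRF route, answers its own probe directly.

```python
# Assumed forwarding path for the trace in this post (P3 is not on it).
PATH = ["CE1", "PE1", "P1", "P2", "PE2", "CE2"]
CORE = {"P1", "P2"}  # LSRs with no VPN routes


def reply_path(ttl: int, path: list[str], core: set[str]):
    """Return (node that answers this probe, nodes its reply visits in order)."""
    if not 1 <= ttl < len(path):
        raise ValueError("ttl out of range for this path")
    responder = path[ttl]
    egress = len(path) - 2  # index of the egress PE, just before the destination CE
    if responder in core:
        # ICMP tunneling: the reply first follows the LSP on to the egress PE...
        forward = path[ttl:egress + 1]
        # ...which looks up the source in the VRF and routes it back (symmetry assumed).
        back = path[egress - 1::-1]
        return responder, forward + back
    # The PEs and the destination can reach the source directly.
    return responder, path[ttl::-1]


if __name__ == "__main__":
    for ttl in range(1, len(PATH)):
        node, hops = reply_path(ttl, PATH, CORE)
        print(ttl, node, " -> ".join(hops))
```

For TTL=2, for example, the reply from P1 goes P1, P2, PE2 and only then back through P2, P1 and PE1 to CE1.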