Normally, a QoS policy is applied based on source IP, destination IP, a combination of both, or IP header markings such as DSCP/Precedence. With QPPB (QoS Policy Propagation via BGP), it can also be applied based on BGP attributes such as AS-PATH or community. This helps us apply a QoS policy per customer.
How to configure QPPB?
QPPB can be configured in the following simple steps (a compact skeleton is shown right after this list):
1. On the edge PE devices, configure a route-map that matches BGP attributes such as AS-PATH or community (or a prefix list) and sets IP Precedence or a qos-group.
2. Apply that route-map with a table-map under the BGP process.
3. Apply bgp-policy on the interface, specifying whether the source or the destination address of incoming traffic should be used for the QoS lookup.
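For orientation, here is a compact skeleton of the three steps on a PE router; the names, regex, AS number and interface are placeholders, and the exact keywords can vary by IOS release. The detailed walk-through with the lab's real values follows.

! Step 1: classify on a BGP attribute and mark a qos-group (placeholder names)
ip as-path access-list 1 permit <regex>
route-map QPPB_MARK permit 10
 match as-path 1
 set ip qos-group 50
!
! Step 2: propagate the marking into the RIB/FIB via the BGP table-map
router bgp <local-as>
 table-map QPPB_MARK
!
! Step 3: on the ingress interface, choose source- or destination-based lookup
interface <ingress-interface>
 bgp-policy destination ip-qos-map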
Below is a configuration example. In this example, we use AS-PATH to classify the traffic. As mentioned in Step 1, a route-map is configured to match 65004 in the AS-PATH attribute and to mark the matching prefixes with qos-group 50:
R2#sh ip bgp regexp 65004
BGP table version is 3, local router ID is 172.16.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*>i10.1.4.4/32 172.16.3.3 0 100 0 65004 i
R2#
R2#config t
R2(config)#
R2(config)#ip as-path access-list 2 permit 65004
R2(config)#
R2(config)#route-map QPPB_USING_ASPATH
R2(config-route-map)#match as-path 2
R2(config-route-map)#set ip qos-group 50
R2(config-route-map)#exit
R2(config)#
As mentioned in Step 2, now apply the policy under BGP using “table-map”, so that the marking is reflected in the RIB:
R2(config)#router bgp 65023
R2(config-router)#table-map QPPB_USING_ASPATH
R2(config-router)#end
As mentioned in Step 3, the policy is applied on the incoming interface of the traffic; here the QoS lookup is done on the destination address of the traffic:
R2(config)#int e0/0.12
R2(config-subif)#bgp-policy ?
accounting bgp based policy accounting of traffic (input on default)
destination use destination IP address for route lookup
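The capture above stops at the “?” help. Assuming we want destination-based lookup against the qos-group set by the route-map, the command would typically be completed as shown below (treat this as a sketch rather than the lab's verbatim input):

R2(config-subif)#bgp-policy destination ip-qos-map
R2(config-subif)#end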
It should be noted that the table-map does not affect prefixes that are already present in the BGP table when it is applied under the BGP process.
R2#sh ip cef 10.1.4.4 detail
10.1.4.4/32, epoch 0
recursive via 172.16.3.3
nexthop 172.16.23.3 Ethernet0/0.23
Soft clearing the BGP sessions does not help in reflecting the policy at the data plane either:
R2#clear ip bgp * in
R2#clear ip bgp * out
R2#sh ip cef 10.1.4.4 detail
10.1.4.4/32, epoch 0
recursive via 172.16.3.3
nexthop 172.16.23.3 Ethernet0/0.23
Once BGP is hard reset, the policy is applied to all required prefixes at the data plane:
R2#clear ip bgp *
R2#sh ip cef 10.1.4.4 detail
10.1.4.4/32, epoch 0
QOS: qos-group 50
recursive via 172.16.3.3
nexthop 172.16.23.3 Ethernet0/0.23
R2#
For testing purposes, I have applied rate-limiting on the egress port matching qos-group 50, which is set at the ingress port. Packet matches can now be observed in the rate-limit output.
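The policer used for the test is not shown; a minimal sketch of one way to match qos-group 50 on the egress interface, using an MQC policer instead of the legacy rate-limit command (interface name and rate are assumptions), would be:

! Classify on the qos-group set by QPPB at ingress
class-map match-all QOSGROUP50
 match qos-group 50
!
! Police that class (rate is a placeholder; exceed traffic is dropped by default)
policy-map LIMIT_QOSGROUP50
 class QOSGROUP50
  police cir 128000
!
interface Ethernet0/0.23
 service-policy output LIMIT_QOSGROUP50

Match counters can then be checked with “show policy-map interface Ethernet0/0.23”.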
When a host generates data, the packetization layer (TCP/UDP) decides the packet size based on the MTU of the outgoing interface. As the packet traverses the path to the ultimate destination, it may get fragmented if the MTU of the outgoing interface on any router is smaller than the packet size. Packet fragmentation on an intermediate router is always considered inefficient because it may result in the below:
1. The loss of one fragment forces the source to resend the entire packet.
2. It introduces a CPU/buffer burden.
Path MTU Discovery was introduced to reduce the chance of an IP packet getting fragmented along the path. The ultimate source uses this feature to identify the lowest MTU along the path to the destination and decides the packet size accordingly.
How does PMTUD work?
When the host generates a packet, it sets the packet size to the MTU of the outgoing interface and sets the DF bit.
Any intermediate device whose outgoing interface has an MTU smaller than the packet size has two choices: 1. fragment and forward if the DF bit is not set, or 2. drop the packet and send an ICMP error message with Type=3 (Destination Unreachable), Code=4 (Fragmentation needed and DF bit set).
The ICMP error message also carries the MTU of that outgoing interface in the “Next-Hop MTU” field.
On receiving the error message, the source resends the packet with the indicated MTU. This continues until the packet reaches the ultimate destination.
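This behaviour can be observed directly with an extended ping that sets the DF bit; the destination address and size below are assumed for illustration.

! A 1500-byte probe with DF set: a hop whose egress MTU is smaller must
! drop it and return ICMP Type=3/Code=4 instead of fragmenting, and the
! ping output then shows "M" (could not fragment) instead of "!".
R1#ping 150.1.5.5 size 1500 df-bit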
BGP support for Path MTU Discovery
Introducing Path MTU Discovery on a BGP session allows the BGP router to discover the best MTU size along the path to its neighbor, resulting in a more efficient exchange of BGP packets.
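On IOS this can be enabled either globally for all TCP sessions or per BGP neighbor; the AS number below is a placeholder and the neighbor address is the one seen later in this lab.

! Globally, for every TCP session the router originates
ip tcp path-mtu-discovery
!
! Or per BGP neighbor
router bgp <local-as>
 neighbor 150.1.5.5 transport path-mtu-discovery

Whether it is in effect for a session is reported in the “show ip bgp neighbors” output.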
Consider the scenario below for further reading.
The initial TCP negotiation between R1 and R5 will use an MSS value equal to (IP MTU minus 40 bytes of IP and TCP headers) with DF set. In our case, the IP MTU is 1500, which results in an MSS of 1460. As the initial negotiation packets are very small, BGP usually moves to the Established state with the MSS still at that value.
R1#sh ip bgp nei | inc Data
Datagrams (max data segment is 1460 bytes):
After the TCP negotiation, when the BGP update packets are sent with the DF bit set, R3 returns an ICMP error message with 300 as the Next-Hop MTU. The MSS is then reduced to 260 (300 minus 40 bytes of IP and TCP headers).
R1#sh ip bgp nei | inc Data
Datagrams (max data segment is 260 bytes)
R1#
Now, with the same topology, when some intermediate device is not able to forward ICMP (for example, a firewall in between), end-to-end Path MTU Discovery will not be successful. This may result in BGP session flaps.
We have configured an ACL on R2 to block ICMP messages towards R1, so the ICMP error message from R3 will not reach R1.
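The ACL itself is not shown above; a minimal sketch of one way to do it, with an assumed interface facing R1, is:

! Drop ICMP (including the Type=3/Code=4 errors) heading towards R1,
! allow everything else
access-list 101 deny icmp any any
access-list 101 permit ip any any
!
interface Ethernet0/0
 ip access-group 101 out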
As soon as BGP is configured between R1 and R5, the TCP negotiation succeeds and BGP moves to the Established state. When the BGP Update is sent to R5, it is sent with the DF bit set. While a BGP router is sending Updates to a neighbor, it does not send keepalives. R3, on receiving the Update, sends an ICMP error message to R1, which gets blocked at R2.
After the BGP session is up, R5 expects either a BGP Update or a keepalive from R1 to reset the hold-down timer. After 180 seconds it has received neither, so it sends a BGP Notification to R1 with the error message “Hold time expired”.
R1#sh ip bgp nei | inc Data
Datagrams (max data segment is 1460 bytes):
R1#
*Mar 22 15:16:23.033: %BGP-3-NOTIFICATION: received from neighbor 150.1.5.5 4/0 (hold time expired) 0 bytes
R1#
*Mar 22 15:16:23.033: %BGP-5-ADJCHANGE: neighbor 150.1.5.5 Down BGP Notification received
R1#
*Mar 22 15:16:55.621: %BGP-5-ADJCHANGE: neighbor 150.1.5.5 Up
R1#
*Mar 22 15:19:56.409: %BGP-3-NOTIFICATION: received from neighbor 150.1.5.5 4/0 (hold time expired) 0 bytes
R1#
*Mar 22 15:19:56.409: %BGP-5-ADJCHANGE: neighbor 150.1.5.5 Down BGP Notification received
R1#
*Mar 22 15:20:13.361: %BGP-5-ADJCHANGE: neighbor 150.1.5.5 Up