Big Gaming Protocol
The big bad Border Gateway Protocol (BGP) is, unlike OSPF, EIGRP and IS-IS, an Exterior Gateway Protocol (EGP) designed to connect large-scale networks together. Because BGP is built for scale, it is tuned to be more stable than the IGPs. For example, it does not allow for dynamic neighbor discovery; each neighbor must be explicitly specified. The BGP hold timers are also a lot more generous than the IGP timers, with the default hold time being 3 minutes. When you think about it, it makes sense: you don’t want the entire internet recalculating its routes because a single link flaps.
BGP is also TCP based (using destination port 179), and can thus build sessions across L3. In that sense, BGP behaves more like an application than a typical control plane protocol. Running BGP across multiple L3 hops is known as multihop and is controlled by the TTL set on the BGP session packets. iBGP sessions are multihop by default, with a TTL of 255, whereas eBGP sessions are not, with a default TTL of 1. Multihop on public networks should be used with caution, since each TTL increment increases the number of nodes you expose your BGP session to.
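To make the TTL behaviour a bit more concrete, here is a minimal Python sketch. It assumes the usual IP forwarding rule (each router that forwards a packet decrements the TTL by one and drops the packet at zero); the function and constant names are made up for illustration.

```python
EBGP_DEFAULT_TTL = 1    # default for eBGP sessions, per the text above
IBGP_DEFAULT_TTL = 255  # default for iBGP sessions, per the text above


def ttl_survives(initial_ttl: int, routers_between: int) -> bool:
    """Return True if a packet still has a TTL of at least 1 at the peer.

    Every router that forwards the packet decrements the TTL by one and
    drops the packet once the TTL reaches zero.
    """
    return initial_ttl - routers_between >= 1


# Directly connected eBGP peer: no router in between, so TTL 1 is enough.
print(ttl_survives(EBGP_DEFAULT_TTL, routers_between=0))
# eBGP peer behind one router: the default TTL of 1 expires on the way.
print(ttl_survives(EBGP_DEFAULT_TTL, routers_between=1))
# Raising the TTL to 2 lets the session cross that single router.
print(ttl_survives(2, routers_between=1))
```

This is also why a small, exact TTL is preferable to a large one on public networks: the packet cannot travel further than the hops you actually need.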
Since BGP is running on TCP, you have to account for the MTU between peers to ensure that the session can be maintained. BGP packets usually have the DF bit set to 1, which does not allow for fragmentation, instead relying on Path MTU Discovery (PMTUD) to determine a suitable packet size. This can cause issues of its own, though. If you want to learn more, check out my post on PMTUD for BGP sessions across L2VPNs.
Peering
When setting up peering with another BGP peer you must have the following:
- A router ID (RID) in IPv4 format defined. This can be statically assigned or the router will automatically pick the highest loopback IP. Please note, if the RID is changed all BGP sessions will be dropped.
- A local AS number.
- A remote AS number.
- The remote peer's peering IP.
This is a sample configuration for peering over IPv4:
R1:
interface GigabitEthernet0/0/0
 ip address 10.54.2.1 255.255.255.0
router bgp 1
 bgp router-id 172.16.1.1
 neighbor 10.54.2.2 remote-as 2
R2:
interface GigabitEthernet0/0/0
 ip address 10.54.2.2 255.255.255.0
router bgp 2
 bgp router-id 172.16.2.2
 neighbor 10.54.2.1 remote-as 1
This would trigger the BGP peering process, which should initiate a TCP session between the two interfaces.
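A quick way to reason about why a peering like this comes up (or does not) is to check that the two sides mirror each other. The Python sketch below does just that for the R1/R2 values above. The `PeerConfig` class and `peering_problems` function are hypothetical names for illustration, not part of any router or library.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerConfig:
    """The peering inputs listed earlier (all names are illustrative)."""
    name: str
    router_id: str      # must be in IPv4 format
    local_as: int
    interface_ip: str   # the address the remote side peers with
    neighbor_ip: str
    remote_as: int


def peering_problems(a: PeerConfig, b: PeerConfig) -> list[str]:
    """Return the mismatches that would stop the two sides from peering."""
    problems = []
    for local, remote in ((a, b), (b, a)):
        try:
            ipaddress.IPv4Address(local.router_id)
        except ipaddress.AddressValueError:
            problems.append(f"{local.name}: router ID is not in IPv4 format")
        if local.neighbor_ip != remote.interface_ip:
            problems.append(f"{local.name}: neighbor IP does not match {remote.name}")
        if local.remote_as != remote.local_as:
            problems.append(f"{local.name}: remote-as does not match {remote.name}'s AS")
    return problems


r1 = PeerConfig("R1", "172.16.1.1", 1, "10.54.2.1", "10.54.2.2", 2)
r2 = PeerConfig("R2", "172.16.2.2", 2, "10.54.2.2", "10.54.2.1", 1)
print(peering_problems(r1, r2))  # an empty list means the two sides agree
```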
The different types of packets used by BGP
BGP uses four types of packets to communicate between peers. These packets travel inside the TCP session, which can run over any L2 and over IPv4 and/or IPv6. A single BGP session can then carry routing information for several different protocols.
Type | Description |
---|---|
Open | These packets are used to set up the adjacency between peers. |
Update | These packets add, update and/or remove routes between peers. |
Notification | These packets notify neighbors of errors or issues. |
Keepalive | Used to ensure that the peer is up and available. BGP does not simply rely on the TCP session being up as an indication for peers being up. |
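For the curious, here is a rough Python sketch of how the message type shows up on the wire. The header layout and type codes are taken from RFC 4271 rather than from this post: a 16-byte marker of all ones, a 2-byte length and a 1-byte type. The function names are just for illustration.

```python
import struct

# BGP message header per RFC 4271: 16-byte marker, 2-byte length, 1-byte type.
HEADER = struct.Struct("!16sHB")
MARKER = b"\xff" * 16
MESSAGE_TYPES = {1: "Open", 2: "Update", 3: "Notification", 4: "Keepalive"}


def build_keepalive() -> bytes:
    """A Keepalive is nothing more than the 19-byte header with type 4."""
    return HEADER.pack(MARKER, HEADER.size, 4)


def parse_header(data: bytes) -> tuple[int, str]:
    """Return (message length, message type name) from raw bytes."""
    marker, length, msg_type = HEADER.unpack(data[:HEADER.size])
    if marker != MARKER:
        raise ValueError("not a BGP message: bad marker")
    return length, MESSAGE_TYPES[msg_type]


print(parse_header(build_keepalive()))
```

Since a Keepalive carries no payload, it is the smallest message BGP can send, which makes it cheap to use as a liveness check.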
Keepalive packets are sent every 1/3 of the hold time. The default hold time is 180 seconds, which puts the default keepalive interval (the time between each keepalive packet) at 60 seconds. During neighbor negotiation the lower hold time of the two peers wins and is used by both peers.
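The timer arithmetic can be sketched in a few lines of Python. This is a simplification that only models the rule described here (lowest hold time wins, keepalive is a third of it); `negotiate_timers` is a made-up name and real implementations have additional knobs.

```python
def negotiate_timers(local_hold: int, remote_hold: int) -> tuple[int, int]:
    """Return (hold time, keepalive interval) in seconds for a session."""
    hold = min(local_hold, remote_hold)  # the lowest hold time wins
    keepalive = hold // 3                # keepalives go out every 1/3 of it
    return hold, keepalive


print(negotiate_timers(180, 180))  # both peers on the 180 second default
print(negotiate_timers(180, 90))   # one peer configured with a lower hold time
```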
Steps of the peering
When a new peering is initiated, BGP passes through a set of steps. The starting state is Idle, meaning that no peering has been set up or initiated, while the desired state is Established. The mechanism that tracks the state of each connection is referred to as the Finite State Machine (FSM), and a separate FSM exists for each peer. The BGP peer also listens for incoming requests on TCP port 179, which also has a separate FSM.
Both peers will attempt to initiate peering with each other. This may result in something known as a connection collision. The FSM moves through the following states:
- Idle
- Connect
- Active
- OpenSent
- OpenConfirm
- Established
Idle
This is the initial state of a BGP peer. During this stage the peers may attempt to connect to each other. Even though either peer is able to initiate a connection, one of the peers will be the active owner and one will be the passive owner of the TCP session. The router with the active connection will jump to the Connect stage, while the router with the passive connection will jump straight to the Active stage.
Connect
During the Connect stage the router with the active side of the TCP session will wait for the TCP session to be fully established. If it succeeds it will send an Open message and move the FSM to the OpenSent stage.
Active
The router with the passive end of the TCP session will enter into the Active stage. Here it will wait for a TCP session to be formed. Once it has been formed it will send out an Open message and move to the OpenSent stage.
OpenSent
In this stage the router will await an Open message from the neighbor. Once an Open message is received, the FSM moves onto the OpenConfirm stage and sends a Keepalive packet to the neighbor.
OpenConfirm
During this stage the router has successfully received an Open message across the TCP session and is now waiting for a Keepalive to be received. If the Keepalive is received the FSM is complete and moves into the Established stage.
Established
During this stage the BGP peering is up and working, which will allow BGP to exchange route updates. Happy days.
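The happy path through the stages above can be sketched as a small transition table in Python. Keep in mind this is a simplified model of the description in this post, not the full FSM from RFC 4271: the event names are invented for illustration, and any unexpected event simply drops the session back to Idle.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "start_active"): State.CONNECT,    # we own the active side
    (State.IDLE, "start_passive"): State.ACTIVE,    # we own the passive side
    (State.CONNECT, "tcp_established"): State.OPEN_SENT,
    (State.ACTIVE, "tcp_established"): State.OPEN_SENT,
    (State.OPEN_SENT, "open_received"): State.OPEN_CONFIRM,
    (State.OPEN_CONFIRM, "keepalive_received"): State.ESTABLISHED,
}


def step(state: State, event: str) -> State:
    """Advance the FSM by one event; anything unexpected resets to Idle."""
    return TRANSITIONS.get((state, event), State.IDLE)


def run(events: list[str]) -> State:
    """Feed a sequence of events into a fresh FSM and return the final state."""
    state = State.IDLE
    for event in events:
        state = step(state, event)
    return state


print(run(["start_active", "tcp_established", "open_received", "keepalive_received"]))
```

A separate instance of this state would exist per peer, matching the "one FSM per peer" point made earlier.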
The protocols BGP supports
Initially BGP was made for IPv4, but with the introduction of Multi-protocol BGP (MP-BGP/MBGP) there is support for a lot of different protocols like IPv4, IPv6 and MPLS L3VPNs.
BGP divides each protocol into an address family (AF). These are identified using an address family identifier (AFI), such as IPv4 or IPv6, and a subsequent address family identifier (SAFI), such as unicast or multicast. This means that the “underlay” used for the actual TCP session is decoupled from the protocols whose routes the session carries.
AFI | SAFI | Protocol |
---|---|---|
1 | 1 | IPv4 + unicast |
2 | 1 | IPv6 + unicast |
1 | 2 | IPv4 + multicast |
2 | 2 | IPv6 + multicast |
The route information is communicated between routers as Network Layer Reachability Information (NLRI) in an Update packet.
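If it helps, the AFI/SAFI pairs from the table can be thought of as a simple lookup key. Here is a small Python sketch of that idea; it only covers the four combinations listed above, and `describe_address_family` is a made-up helper name.

```python
# (AFI, SAFI) -> protocol, limited to the combinations from the table above.
ADDRESS_FAMILIES = {
    (1, 1): "IPv4 unicast",
    (2, 1): "IPv6 unicast",
    (1, 2): "IPv4 multicast",
    (2, 2): "IPv6 multicast",
}


def describe_address_family(afi: int, safi: int) -> str:
    """Translate an AFI/SAFI pair into a readable address family name."""
    return ADDRESS_FAMILIES.get((afi, safi), f"unknown (AFI {afi}, SAFI {safi})")


print(describe_address_family(2, 1))
print(describe_address_family(1, 2))
```

Note that the key says nothing about whether the TCP session itself runs over IPv4 or IPv6, which is exactly the decoupling described above.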
Incoming and outgoing routes
Once a route is received, the receiving router puts the route into the Adj-RIB-In table. This table is used for all incoming routes, before any sort of filtering or validity checks occur. After this, the router will run its checks on each route, ensuring the route's validity as well as applying any prefix-lists and/or route-maps.
The newly vetted routes are then placed into the Loc-RIB, which is commonly referred to as the BGP table. These routes are the ones that are presented to the router's RIB, where the router can choose which routes should be selected.
As for the outgoing routes, these will be sourced from the Loc-RIB and will pass through outgoing checks (route validity, prefix-lists and route-maps) before being placed in the Adj-RIB-Out. The routes in the Adj-RIB-Out are then advertised to the neighboring peer.
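The flow from Adj-RIB-In through Loc-RIB to Adj-RIB-Out can be sketched roughly as below. This is a toy Python model under a few assumptions of my own: the only validity check is the AS-Path loop check, a prefix-list is modelled as a plain predicate function, and the peer is assumed to be an eBGP peer so the local AS is prepended on the way out. All names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple[int, ...]


Policy = Callable[[Route], bool]  # stands in for a prefix-list or route-map


def inbound(adj_rib_in: list[Route], local_as: int, permit: Policy) -> list[Route]:
    """Adj-RIB-In -> Loc-RIB: keep only routes that are valid and permitted."""
    loc_rib = []
    for route in adj_rib_in:
        if local_as in route.as_path:
            continue  # validity check: our own AS in the path means a loop
        if not permit(route):
            continue  # dropped by the inbound policy
        loc_rib.append(route)
    return loc_rib


def outbound(loc_rib: list[Route], local_as: int, permit: Policy) -> list[Route]:
    """Loc-RIB -> Adj-RIB-Out: filter, then prepend our AS (eBGP assumed)."""
    return [Route(r.prefix, (local_as,) + r.as_path) for r in loc_rib if permit(r)]


def inside_10_slash_8(route: Route) -> bool:
    """Example policy: only permit prefixes within 10.0.0.0/8."""
    return ipaddress.ip_network(route.prefix).subnet_of(ipaddress.ip_network("10.0.0.0/8"))


received = [
    Route("10.24.1.0/24", (2, 3)),
    Route("10.24.2.0/24", (2, 1)),   # contains AS 1, our own AS
    Route("192.168.0.0/24", (2,)),   # outside the permitted range
]
loc_rib = inbound(received, local_as=1, permit=inside_10_slash_8)
adj_rib_out = outbound(loc_rib, local_as=1, permit=lambda route: True)
print(loc_rib)
print(adj_rib_out)
```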
Aggregating routes
Aggregating routes in BGP is a necessity for keeping the size of the internet routing table within reason. In order to aggregate routes you use the command aggregate-address <prefix> <subnet mask> under the relevant address family. Here is an example:
router bgp 1
 address-family ipv4
  aggregate-address 10.24.0.0 255.255.0.0
This will inject a summary address into the BGP Loc-RIB table, which can then be advertised to the neighboring peers. Note that the component routes will still be advertised alongside the summary with this method. If you wish to suppress the component routes you can use this configuration:
router bgp 1
 address-family ipv4
  aggregate-address 10.24.0.0 255.255.0.0 summary-only
The summary route does not inherit the Path Attributes of its component routes, such as the AS-Path. If you wish to preserve the AS-Path information, you can instead use this command:
router bgp 1
 address-family ipv4
  aggregate-address 10.24.0.0 255.255.0.0 summary-only as-set
This will cause the AS numbers from the component routes' AS-Paths to be added to the summary route within curly braces, as an AS-set. This allows the routers that receive the summary route to correctly understand which ASes the route has crossed.
Here is an example of how a summary-only and as-set aggregate looks in the BGP-table:
R1# show ip bgp
BGP table version is 2, local router ID is 10.10.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Network Next Hop Metric LocPrf Weight Path
*> 10.24.0.0/16 10.10.0.1 0 1 {200,100} i
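To tie the three variants together, here is a rough Python model of what ends up being advertised. It is a sketch under my own assumptions rather than how any router implements it: the BGP table is a plain dict of prefix to AS-Path, the summary is only created when at least one component route exists, and as-set is simplified to a single set of every AS seen on the components. The `aggregate` function is a made-up name.

```python
import ipaddress


def aggregate(summary, routes, summary_only=False, as_set=False):
    """Return the prefixes that would be advertised, mapped to their AS-Path.

    `routes` maps each prefix in the BGP table to its AS-Path (a tuple).
    """
    summary_net = ipaddress.ip_network(summary)
    components = {
        prefix for prefix in routes
        if ipaddress.ip_network(prefix) != summary_net
        and ipaddress.ip_network(prefix).subnet_of(summary_net)
    }
    if not components:
        # Assumption: with no component routes there is nothing to summarise.
        return dict(routes)

    if as_set:
        # Simplification: collect every AS seen on the components into one set.
        summary_path = frozenset(asn for p in components for asn in routes[p])
    else:
        summary_path = ()  # the AS-Path information of the components is lost

    advertised = {summary: summary_path}
    for prefix, path in routes.items():
        if summary_only and prefix in components:
            continue  # component routes are suppressed
        advertised[prefix] = path
    return advertised


table = {
    "10.24.1.0/24": (100,),
    "10.24.2.0/24": (200,),
    "172.16.0.0/24": (300,),
}
print(aggregate("10.24.0.0/16", table))
print(aggregate("10.24.0.0/16", table, summary_only=True))
print(aggregate("10.24.0.0/16", table, summary_only=True, as_set=True))
```

With both options set, the only 10.24.x.x entry left is the /16 carrying the set of AS 100 and AS 200, which lines up with the `{200,100}` shown in the BGP table output above.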