Like other very successful protocols such as HTTP and DNS, BGP has been given more and more jobs to do over the years. In this blog post, we'll look at the new features and use cases that have been added to BGP along the way, including various applications of BGP in corporate networks and data centers.
BGP for Internet routing
The world looked completely different in 1989, when the specification for the first version of BGP was published. Before BGP, there was the Exterior Gateway Protocol (EGP), but EGP was designed around a central backbone to which other networks were connected (similar to how OSPF's Area 0 connects other areas). As regional networks began to connect to each other directly and the first commercial network providers emerged, a more flexible routing protocol was required to handle packet routing between the various networks that make up the Internet. Global inter-domain routing is of course still the most notable use of BGP.
In those early days of BGP, the Internet Protocol (IP) was just one protocol among many: large vendors usually had their own network protocols, such as IBM's SNA, Novell's IPX, Apple's AppleTalk, Digital's DECnet, and Microsoft's NetBIOS. Over the course of the 1990s, these protocols quickly became less relevant: IP was needed to communicate with the Internet anyway, so it was easier to run internal applications over IP as well.
BGP in corporate networks
The result was that BGP quickly took on a role as an internal routing protocol in large corporate networks. The reason is that, unlike internal routing protocols such as OSPF and IS-IS, BGP allows policies to be applied, so that routing between parts of a company can be controlled as required. For inter-domain (Internet) routing, BGP is of course used with public IP addresses and public Autonomous System (AS) numbers. These are assigned by the five Regional Internet Registries (RIRs), such as ARIN in North America. In corporate networks, the private address ranges 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 are often used. Perhaps surprisingly, corporate networks may also use public addresses, because coordinating the use of the private ranges very quickly becomes complex in large organizations, especially after mergers.
Corporate networks frequently use private AS numbers, as the RIRs have become more restrictive about assigning public AS numbers over the years, despite the fact that we switched from 16-bit to 32-bit AS numbers about ten years ago, so there is absolutely no shortage. The original range for private 16-bit AS numbers runs from 64512 to 65534, which allows for 1023 private AS numbers. This is not much in a large network, so it is not uncommon for the same AS number to be used in different parts of the network. The private 32-bit range extends from 4200000000 to 4294967294 and provides almost 95 million additional private AS numbers. So clashing AS numbers shouldn't be a problem as long as people pick them more or less randomly and don't all start at 4200000001.
Aside from these considerations, the structure of corporate networks is quite similar to the structure of the Internet in general, although BGP security is perhaps less of a concern.
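To see why coordinating the private ranges gets difficult after a merger, here is a minimal Python sketch that checks two address plans for overlapping prefixes. The two company address plans and the helper names (`is_rfc1918`, `find_overlaps`) are made up for illustration; only the three private ranges come from the text above.

```python
import ipaddress

# The three private IPv4 ranges named in the text.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(prefix: str) -> bool:
    """Return True if the prefix lies entirely inside one private range."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(private) for private in PRIVATE_RANGES)


def find_overlaps(site_a: list[str], site_b: list[str]) -> list[tuple[str, str]]:
    """Return every pair of prefixes (one from each plan) that overlap."""
    return [
        (a, b)
        for a in site_a
        for b in site_b
        if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))
    ]


# Hypothetical address plans of two companies that are about to merge.
company_a = ["10.0.0.0/16", "10.20.0.0/16", "192.168.1.0/24"]
company_b = ["10.20.128.0/17", "172.16.5.0/24", "192.168.1.0/24"]

conflicts = find_overlaps(company_a, company_b)
print(conflicts)
```

Every pair in `conflicts` is a prefix that at least one of the two networks would have to renumber (or hide behind NAT) before the networks can be joined.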
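The arithmetic behind those numbers is easy to check. The following short Python sketch uses the range boundaries given above; the function names are just illustrative, and picking a random 32-bit private AS number is one simple way to follow the advice about not starting at 4200000001.

```python
import random

# Private AS number ranges as given in the text (both ends inclusive).
PRIVATE_ASN_16 = range(64512, 65534 + 1)
PRIVATE_ASN_32 = range(4200000000, 4294967294 + 1)


def is_private_asn(asn: int) -> bool:
    """Return True if the AS number falls in either private range."""
    return asn in PRIVATE_ASN_16 or asn in PRIVATE_ASN_32


def pick_random_private_asn() -> int:
    """Pick a 32-bit private AS number at random to make clashes unlikely."""
    return random.randrange(PRIVATE_ASN_32.start, PRIVATE_ASN_32.stop)


print(len(PRIVATE_ASN_16))  # 1023
print(len(PRIVATE_ASN_32))  # 94967295
```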
BGP in the data center underlay
It's different in the data center. BGP has three different roles in large data centers. First, there is the "underlay", the physical network that allows packets to be moved from any server in any rack in the data center to any other server in a different location, as well as to the firewalls and routers that manage external connectivity. Second, there is often an "overlay" that creates a logical structure on top of the physical structure of the underlay. And third, BGP can be used by physical servers to forward packets to and from virtual machines (VMs) running on those servers.
It is estimated that up to 930 kilobytes of internal (east-west) traffic is generated within the data center for each kilobyte of external (north-south) traffic. This means that the amount of traffic moved around in a large data center with many thousands of (physical) servers is absolutely massive. Internet-like hierarchical network topologies cannot support this. Therefore, data centers typically use leaf-spine topologies, as explained in our blog post BGP in large data centers.
5-stage Clos network topology with clusters
Even more than in corporate networks, BGP is used in the data center underlay as an internal routing protocol. This means that it would be useful for BGP to behave more like an internal routing protocol and automatically detect neighboring routers instead of requiring neighbor relationships to be configured explicitly. Work on this started a few years ago at the IETF, as explained in our blog post BGP LLDP Peer Discovery. However, this work has not yet been published as an RFC.
Another problem with such extensive use of BGP is that a significant amount of address space is required to number each side of each BGP connection. An interesting way to get around this is to use BGP unnumbered. The use of BGP unnumbered with the FRRouting package (a fork of the open source routing software Quagga) is described in detail in this book chapter at O’Reilly.
The idea is to set up BGP sessions towards the IPv6 link-local addresses of neighboring routers and then exchange IPv4 prefixes over these BGP sessions. An IPv6 next hop address is used for these prefixes, as specified in RFC 5549. This doesn't seem to make any sense at first: how can a router forward IPv4 packets to an IPv6 address? However, the next hop address does not do anything directly. Normally, the function of the next hop address is to be the input for ARP, which determines the MAC address of the next hop. The same MAC address can of course be obtained from the IPv6 next hop address using IPv6 Neighbor Discovery. In this way, BGP routers can forward packets to one another without BGP having to consume a large number of IPv4 addresses.
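To get a feel for how much address space numbered links consume, here is a rough Python estimate. It assumes a two-tier leaf-spine fabric in which every leaf connects to every spine, with each link numbered from its own /31 or /30; the fabric size and the function name are hypothetical.

```python
def numbered_link_addresses(leaves: int, spines: int, prefix_len: int = 31) -> int:
    """IPv4 addresses consumed when every leaf-spine link gets its own subnet."""
    links = leaves * spines                      # each leaf connects to each spine
    addresses_per_link = 2 ** (32 - prefix_len)  # /31 -> 2 addresses, /30 -> 4
    return links * addresses_per_link


# Hypothetical fabric with 64 leaf switches and 8 spine switches.
print(numbered_link_addresses(64, 8))      # 1024 addresses with /31 links
print(numbered_link_addresses(64, 8, 30))  # 2048 addresses with /30 links
```

On top of the raw address count, every one of those subnets has to be planned, configured on both ends, and kept free of typos.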
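The forwarding decision described here can be modeled in a few lines of Python. This is only a toy model, not how any particular router implements it: the routes, interface names, and MAC addresses are invented, and the neighbor cache stands in for what IPv6 Neighbor Discovery would learn.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network    # IPv4 prefix learned over BGP
    next_hop: ipaddress.IPv6Address  # link-local address of the BGP neighbor
    interface: str                   # link-local addresses only make sense per interface


# Hypothetical IPv4 routes with IPv6 next hops.
ROUTES = [
    Route(ipaddress.ip_network("10.1.0.0/16"), ipaddress.ip_address("fe80::1"), "swp1"),
    Route(ipaddress.ip_network("10.1.2.0/24"), ipaddress.ip_address("fe80::2"), "swp2"),
]

# Neighbor cache as IPv6 Neighbor Discovery would populate it (made-up MACs).
NEIGHBORS = {
    ("swp1", ipaddress.ip_address("fe80::1")): "02:00:00:00:00:01",
    ("swp2", ipaddress.ip_address("fe80::2")): "02:00:00:00:00:02",
}


def resolve(destination: str):
    """Return (interface, next-hop MAC) for an IPv4 destination, or None."""
    dst = ipaddress.ip_address(destination)
    candidates = [route for route in ROUTES if dst in route.prefix]
    if not candidates:
        return None
    # Longest-prefix match selects the most specific route.
    best = max(candidates, key=lambda route: route.prefix.prefixlen)
    mac = NEIGHBORS.get((best.interface, best.next_hop))
    if mac is None:
        return None  # a real router would start neighbor resolution here
    return best.interface, mac


print(resolve("10.1.2.7"))
print(resolve("10.1.9.9"))
```

The point of the model is that the IPv6 address never appears in the forwarded packet: it is only the key used to find the MAC address for the Ethernet frame that carries the IPv4 packet.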
BGP EVPN in the data center overlay
Many data center tenants have their own network requirements that go beyond the simple IP-based leaf-spine model. Therefore, they implement an overlay network on top of the data center underlay network by means of tunneling. These can be IP-based tunnels that use a protocol such as GRE, or layer 2 running over the layer 3 underlay, often in the form of VXLAN.
VXLAN is typically implemented in the hypervisor on a physical server. In this way, VMs running on the same or on different physical servers can be networked with one another as required, even with layer 2 virtual networks. However, running a layer 2 network over a layer 3 network comes with a particular challenge: BUM traffic. BUM stands for broadcast, unknown unicast and multicast. These are the types of traffic that a switch typically floods to all ports.
To avoid this, VTEPs (VXLAN tunnel endpoints) / NVEs (Network Virtualization Edges) can use BGP to communicate which IP addresses and which MAC addresses are used where, along with other parameters, so that the need to distribute BUM traffic is largely eliminated. The VTEPs implement "ARP suppression", which allows them to respond to ARP requests for remote addresses locally, so that ARP broadcasts do not have to be flooded by replicating them to all remote VTEPs.
RFC 7432 specifies "BGP MPLS-based Ethernet VPN" and is largely reused for VXLAN. Originally, BGP could only be used for IPv4 routing, but the multiprotocol extensions (also used for IPv6 BGP) enable BGP to be used to communicate EVPN information between VTEPs. Each VTEP injects the MAC and IP addresses it knows about into BGP, and thus advertises them to all other VTEPs. The remote VTEPs can then tunnel traffic towards these addresses to the next hop address that is included in the BGP update. In contrast to BGP in the underlay, which normally uses eBGP, EVPN information is transported via iBGP: all VTEPs / NVEs are part of the same AS. If there are many of them, it is helpful to use route reflectors to avoid an excessive number of iBGP sessions.
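As a rough illustration of ARP suppression, the Python sketch below models a VTEP that answers ARP requests from a table filled by BGP advertisements and only falls back to flooding for unknown addresses. The class, method names, and addresses are hypothetical and do not correspond to any real implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Vtep:
    name: str
    # IP address -> (MAC address, address of the remote VTEP), learned via BGP.
    remote_hosts: dict[str, tuple[str, str]] = field(default_factory=dict)

    def learn_from_bgp(self, ip: str, mac: str, remote_vtep: str) -> None:
        """Record a MAC/IP binding that another VTEP advertised in BGP."""
        self.remote_hosts[ip] = (mac, remote_vtep)

    def handle_arp_request(self, target_ip: str) -> tuple[str, str]:
        """Answer locally if the binding is known, otherwise flood."""
        entry = self.remote_hosts.get(target_ip)
        if entry is not None:
            mac, _remote_vtep = entry
            return ("reply_locally", mac)
        # Unknown target: the request still has to be replicated to remote VTEPs.
        return ("flood", "")


vtep = Vtep("leaf1")
vtep.learn_from_bgp("10.0.0.20", "02:00:00:00:00:20", "192.0.2.2")
print(vtep.handle_arp_request("10.0.0.20"))  # answered from the local table
print(vtep.handle_arp_request("10.0.0.99"))  # not known, has to be flooded
```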
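The effect of route reflectors on the number of iBGP sessions is simple arithmetic. The sketch below assumes dedicated route reflectors that each peer with every VTEP and with each other; the figure of 200 VTEPs is made up for the example.

```python
def full_mesh_sessions(vtep_count: int) -> int:
    """iBGP sessions when every VTEP peers with every other VTEP."""
    return vtep_count * (vtep_count - 1) // 2


def route_reflector_sessions(vtep_count: int, reflector_count: int) -> int:
    """iBGP sessions with dedicated route reflectors (an assumed design):
    each VTEP peers with each reflector, and the reflectors peer with each other."""
    client_sessions = vtep_count * reflector_count
    reflector_mesh = reflector_count * (reflector_count - 1) // 2
    return client_sessions + reflector_mesh


print(full_mesh_sessions(200))           # 19900
print(route_reflector_sessions(200, 2))  # 401
```

The full mesh grows quadratically with the number of VTEPs, while the route reflector design grows linearly.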
BGP and pods
In addition to the underlay and the overlay, there is a third routing layer that is becoming increasingly relevant in the data center: the routing between "pods" on (virtual) hosts. Systems like Docker allow applications to be packaged in lightweight containers for easy deployment. A container is a self-contained system that contains the correct versions of libraries and other dependencies. Unlike virtual machines, which each run a separate copy of an entire operating system in their own virtualized environment, multiple containers run side by side under a single copy of an operating system. Containers can be networked in many different ways, but usually they simply run as a service on a TCP or UDP port number on the (virtual) host's IP address.
However, this sharing of the host's IP address makes it difficult to deploy multiple instances of the same application or service, as these tend to expect a well-known port number to be used. Kubernetes addresses this issue by grouping containers together in a pod. Pods are relatively short-lived, and the idea is to run multiple pods on the same (virtual) machine. Containers in a pod share an IP address and a TCP/UDP port range. Within a Kubernetes deployment, all pods can communicate with each other without using NAT.
A pod's network interface(s) can be connected to the host's network interface(s) at layer 2 and thus connect directly to the layer 2 or layer 3 service provided by an overlay network. However, this is not the most scalable solution. The alternative is to have the host operating system route between its network interfaces and the pods running on that host. In this situation, pods are usually provided with IP address space from, for example, etcd. In order for other pods and the rest of the world to reach a pod, these addresses must be advertised in a routing protocol. Here, too, BGP has the advantage due to its flexibility, as shown in this blog post at Cloud Native Labs and in this blog post at Flockport.
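One way to picture what ends up in the routing protocol: if each host is given its own slice of a cluster-wide pod address range (an assumption made for this sketch, not something specified above), then each host only has to advertise one prefix with itself as the next hop. The address ranges, host addresses, and the `pod_routes` helper are all hypothetical.

```python
import ipaddress


def pod_routes(pod_cidr: str, host_ips: list[str], per_host_prefix: int = 24) -> dict[str, str]:
    """Carve pod_cidr into per-host subnets and map each subnet to its host.

    The result is the set of (prefix -> next hop) announcements the hosts
    would make in BGP so that pods on other hosts become reachable.
    """
    subnets = ipaddress.ip_network(pod_cidr).subnets(new_prefix=per_host_prefix)
    return {str(subnet): host for subnet, host in zip(subnets, host_ips)}


# Hypothetical cluster: one pod range and three hosts.
routes = pod_routes("10.244.0.0/16", ["192.0.2.11", "192.0.2.12", "192.0.2.13"])
for prefix, next_hop in routes.items():
    print(f"announce {prefix} via {next_hop}")
```

Advertising one aggregate per host rather than one route per pod keeps the number of BGP updates low even though individual pods come and go quickly.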
BGP is undoubtedly one of the most advanced IP routing protocols in use on the Internet today. Its complexity is mainly due to its focus on routing policy. The generic statement that BGP is only used for routing between two autonomous systems is rather misleading. There are several scenarios in which the protocol can be used or is even required. BGP remains the right tool for so many jobs!