Kubernetes Cilium Service Announcement: In-Depth L2 and BGP Comparison
Imagine you’ve successfully set up your Kubernetes environment: everything’s running smoothly, your applications are deployed, and you’re ready to roll. Then you run into the challenge that often follows this initial triumph: enabling external access to your Kubernetes services.
In this article, we’ll delve into the solutions to this predicament and explore the underlying technologies that empower us to make our Kubernetes services accessible to the world: Layer 2 announcement and BGP load balancing, each offering unique advantages in achieving this goal.
Our Current Kubernetes Environment
- A fully operational Kubernetes cluster with Cilium as the Container Network Interface (CNI) plugin, running inside a Proxmox Virtual Environment (Proxmox VE). Refer to the official documentation for installation details.
- A virtual router (which I will refer to simply as the router) that serves as the gateway for traffic into our Kubernetes cluster, so that all the necessary configuration resides within Proxmox VE. The virtual router is not mandatory; a physical router works as an alternative.
- Kubernetes services (SVC), each with its own external IP.
However, those IP addresses can only be accessed from the nodes. External traffic and even the virtual router remain unaware of how to reach the services.
The Objectives
Our primary aim is to make services accessible from the router.
So, What’s the Solution?
It’s simple: we need to inform the router that services are reachable from the nodes.
In the illustration above, the crucial step is the first one. Once the nodes have advertised the services, the router knows to forward any packet destined for a service to one of the nodes first.
It’s worth noting that the router can choose any node as the packet destination, a concept we’ll explore further below.
Currently, Cilium offers two well-established solutions to achieve this: Layer 2 announcement and BGP load balancing.
Layer 2 announcement
Let’s begin by examining the foundation of this solution, which is the ARP protocol.
Analogous Scenario
Imagine two people, Alice and Bob. If Alice intends to send a gift to Bob, she must first ascertain Bob’s residence.
How can Alice find out Bob’s address? It’s simple: she asks!
- Alice inquires about Bob’s address.
- Bob responds with his address.
- Alice records Bob’s address.
- Alice can now send the gift!
We can apply this same concept to computer communication.
ARP protocol
ARP (Address Resolution Protocol) is used to obtain the MAC address that corresponds to an IP address.
The scenario above maps onto it almost exactly; the only difference is that IP addresses act as the identifiers and MAC addresses act as the actual addresses.
Communication begins with an ARP request that asks which MAC address owns a given IP address. The owner answers with an ARP reply containing its MAC address, and the requester stores the resulting IP and MAC pair in its ARP table.
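To make the request, reply, and table steps concrete, here is a minimal sketch of an ARP table in Python. It is a toy model, not how any operating system implements ARP; the `ArpCache` class, the `resolve` helper, and the addresses are all made up for illustration.

```python
from typing import Dict, Optional


class ArpCache:
    """Toy model of an ARP table: one MAC address per IP address."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def learn(self, ip: str, mac: str) -> None:
        # Any ARP reply stores (or overwrites) the entry for that IP.
        self.table[ip] = mac

    def lookup(self, ip: str) -> Optional[str]:
        return self.table.get(ip)


def resolve(cache: ArpCache, hosts: Dict[str, str], ip: str) -> Optional[str]:
    """Return the MAC for `ip`, simulating an ARP request on a cache miss."""
    mac = cache.lookup(ip)
    if mac is None:
        # ARP request: "who has <ip>?" Only the owner of that IP replies.
        mac = hosts.get(ip)
        if mac is not None:
            cache.learn(ip, mac)  # the ARP reply is recorded in the table
    return mac


# Hypothetical hosts on the local network (IP -> MAC).
hosts = {"192.168.1.20": "aa:bb:cc:00:00:20"}
cache = ArpCache()

print(resolve(cache, hosts, "192.168.1.20"))  # aa:bb:cc:00:00:20
print(resolve(cache, hosts, "192.168.1.99"))  # None, nobody answered
```

The second lookup is the interesting one: if nobody answers for an IP address, the table never gets an entry and no packet can be sent.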
What About Our Kubernetes Case?
In our topology, the router won’t initiate the communication with an ARP request, as shown in the illustration above. This means the exchange never starts, and the services remain inaccessible from the router and from external traffic.
The solution involves bypassing the request phase and proceeding directly to the reply.
So, Do Nodes Send ARP Replies Without Prior Router Requests?
Precisely. An ARP reply sent without any preceding ARP request is called a gratuitous ARP reply. The nodes use it to notify the router that the SVC is reachable through them, even though the router never asked.
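For the curious, here is a rough sketch of what such a frame looks like on the wire, built with Python’s `struct` module. It follows the standard ARP packet layout, with the sender and target IP fields both set to the announced address. It is not taken from Cilium’s implementation, the function names and addresses are hypothetical, and using broadcast as the target hardware address is one common convention rather than the only one.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_gratuitous_arp_reply(service_ip: str, node_mac: str) -> bytes:
    """Build an Ethernet frame carrying an unsolicited ARP reply."""
    broadcast = b"\xff" * 6
    sender_mac = mac_to_bytes(node_mac)
    ip = socket.inet_aton(service_ip)

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth_header = broadcast + sender_mac + struct.pack("!H", 0x0806)

    # ARP payload: Ethernet (1), IPv4 (0x0800), address lengths 6 and 4,
    # opcode 2 (reply).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += sender_mac + ip  # sender: "the service IP is at my MAC"
    arp += broadcast + ip   # target IP equals sender IP in a gratuitous ARP
    return eth_header + arp


# Hypothetical service IP and node MAC address.
frame = build_gratuitous_arp_reply("192.168.1.200", "aa:bb:cc:00:00:01")
print(len(frame))  # 42 bytes: 14 for Ethernet plus 28 for ARP
print(frame.hex())
```

Actually sending the frame would need a raw socket and elevated privileges, which is out of scope for this sketch.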
Reviewing our goal illustration, the outcome remains the same. We’ve achieved our objective of informing the router of the SVC’s location, enabling the router to access the SVC.
Failover, Not Load Balancing
Due to how Layer 2 ARP works, all traffic to a service is handled by only one node while the others remain in standby mode.
The screenshot above displays an example ARP table on a Linux machine, generated by the arp command. Each IP address corresponds to a single MAC address, which means that for a given SVC IP address, the ARP table stores only one node’s MAC address.
For this reason, the router reaches the SVC through only one node. The process of selecting which node communicates with the router is called an election, and the winner obtains a lease for the SVC.
You can check which node has been elected leader for each service by running:
kubectl get leases --all-namespaces
Here I’m using k9s to display the leases.
In my example, Master-1 is consistently selected as the leader for all services, creating an imbalance despite having three master nodes.
This is why L2 announcement behaves like failover rather than load balancing. It also means that one node may handle far more traffic than the others.
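The election and failover behavior can be sketched with a tiny lease model. This is only an illustration of the general lease idea, assuming a fixed lease duration and simulated timestamps; the `Lease` class and node names are hypothetical and do not reflect how Kubernetes or Cilium implement leases internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Toy lease: at most one holder at a time, valid until `expires_at`."""

    holder: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, node: str, now: float, duration: float) -> bool:
        """Acquire or renew the lease; fail if another node holds a live one."""
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + duration
            return True
        return False


lease = Lease()

# t=0: every node tries to become leader; the first one to succeed wins.
for node in ["master-1", "master-2", "master-3"]:
    lease.try_acquire(node, now=0.0, duration=15.0)
first_leader = lease.holder

# t=10: the leader renews, so the lease is now valid until t=25.
lease.try_acquire("master-1", now=10.0, duration=15.0)

# master-1 stops renewing. At t=20 the lease is still valid, so master-2 fails.
early_takeover = lease.try_acquire("master-2", now=20.0, duration=15.0)

# At t=26 the lease has expired and master-2 takes over: that is the failover.
late_takeover = lease.try_acquire("master-2", now=26.0, duration=15.0)

print(first_leader, early_takeover, late_takeover, lease.holder)
```

Notice that nothing in this model spreads leadership across nodes: as long as the current holder keeps renewing, the standby nodes never receive any traffic for that service.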
L2 summary
- + Relatively easy to set up, requiring only advertising from Kubernetes without router configuration.
- - Limited network expansion options, as network advertising relies on ARP, not a routing protocol (explained further in the BGP section).
- - As explained above, it primarily functions as failover rather than load balancing.
BGP load balancing
BGP (Border Gateway Protocol) is a dynamic routing protocol used extensively in global networking. Chances are, you’re accessing this article via the internet, which most likely relies on BGP somewhere along the path.
Dynamic routing
Suppose Alice knows Bob and Bob knows Cindy.
Without dynamic routing, Alice and Cindy won’t know that they can communicate with each other by using Bob as the intermediary.
However, dynamic routing rectifies this issue by informing both Alice and Cindy that they can communicate through Bob.
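Here is a minimal sketch of that idea in Python: every peer repeatedly tells its neighbors which destinations it can reach until nobody learns anything new. This is a generic illustration of route propagation, not BGP itself (there are no attributes, no path selection, and no sessions), and the `propagate_routes` function is a name I made up for this example.

```python
from typing import Dict, Set


def propagate_routes(links: Dict[str, Set[str]]) -> Dict[str, Dict[str, str]]:
    """Return, for each peer, a map of destination -> next hop."""
    # Initially every peer only knows its directly connected neighbors.
    routes = {peer: {n: n for n in neighbors} for peer, neighbors in links.items()}

    changed = True
    while changed:
        changed = False
        for peer, neighbors in links.items():
            for neighbor in neighbors:
                # The neighbor advertises every destination it can reach.
                for destination in list(routes[neighbor]):
                    if destination != peer and destination not in routes[peer]:
                        routes[peer][destination] = neighbor
                        changed = True
    return routes


# Alice knows Bob, and Bob knows Cindy.
links = {"alice": {"bob"}, "bob": {"alice", "cindy"}, "cindy": {"bob"}}
routes = propagate_routes(links)

print(routes["alice"]["cindy"])  # bob
print(routes["cindy"]["alice"])  # bob
```

After propagation, Alice has learned that Cindy is reachable via Bob, and Cindy has learned the same about Alice, without anyone configuring those routes by hand.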
How Does It Work?
In BGP routing, we refer to each node and router as peers. For a peering session to come up, both sides must register each other as neighbors. These connections carry many attributes that help BGP determine which route will be considered the best route, including weight, distance, router ID, and many more.
You can verify BGP connectivity using the Cilium CLI command cilium bgp peers. A successful BGP peering connection will display an established session state.
You may also encounter other states, including:
- active: BGP is actively trying to connect to its peer. This usually occurs when the router has not been set up with the correct BGP settings and neighbors.
- idle: BGP has stopped looking for peers. The last time I ran into this, the cause was a wrong IP configuration on a node, which led to an IP address conflict.
Load Balancing, Not Just Failover
In this Kubernetes environment, the connection between the router and each node shares the same configuration, including distance, autonomous system number, and other attributes. Furthermore, each of these nodes can direct traffic to the same SVC.
When several equally good routes lead to the same destination, the router can use a feature known as ECMP (Equal-Cost Multi-Path). This feature spreads traffic to the SVC across all the nodes that advertise it over BGP, which is the core of how load balancing works in BGP-based advertisement.
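A common way to picture ECMP is as a hash over each connection’s addresses and ports that selects one of the equal-cost next hops. The sketch below illustrates that idea only; real routers use their own hash functions and header fields, and the `pick_next_hop` function and all addresses here are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List, Tuple

# A flow is identified by (source IP, source port, destination IP, destination port).
Flow = Tuple[str, int, str, int]


def pick_next_hop(flow: Flow, next_hops: List[str]) -> str:
    """Map a flow to one next hop; the same flow always maps to the same node."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical node IPs that all advertise the same service IP.
nodes = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

# 300 client connections to the same service, differing only in source port.
flows = [("203.0.113.5", 40000 + i, "10.0.10.1", 80) for i in range(300)]

spread = Counter(pick_next_hop(flow, nodes) for flow in flows)
for node in nodes:
    print(node, spread[node])
```

Hashing per flow keeps all packets of one connection on the same node while different connections land on different nodes, so the split is roughly even across many flows rather than exact.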
Scalable
With BGP advertisements, we can interconnect the dynamic routes within Kubernetes with external dynamic routing, not only other BGP networks but also other dynamic routing protocols such as OSPF.
BGP summary
- + Supports real load balancing.
- + Offers scalability by connecting BGP to external traffic for network expansion.
- - Can be relatively complex for small-scale projects and may require more extensive setup knowledge.
Conclusion
Throughout this article, we have discussed how Layer 2 announcement and BGP load balancing work and compared their functionality. However, we’ve only scratched the surface without diving into the nitty-gritty details and real-world examples.
What’s Next on the Horizon?
The journey doesn’t end here. Stay on the lookout for the forthcoming detailed configuration guides for both Layer 2 and BGP. I’m eager to provide you with the practical insights you need to navigate the realm of Kubernetes connectivity effectively.