eBPF, XDP, Cilium…

User-space networking achieves high-speed performance by moving packet processing out of the kernel’s realm into user space. XDP does in fact the opposite: it moves user-space networking programs (filters, mappers, routing, etc.) into the kernel’s realm. XDP allows us to execute our network function as soon as a packet hits the NIC, and before it starts moving upwards into the kernel’s networking subsystem.

http://blogs.igalia.com/dpino/2019/01/07/a-brief-introduction-to-xdp-and-ebpf/

The typical workflow is that BPF programs are written in C, compiled by LLVM into object / ELF files, which are parsed by user space BPF ELF loaders (such as iproute2 or others), and pushed into the kernel through the BPF system call. The kernel verifies the BPF instructions and JITs them, returning a new file descriptor for the program, which then can be attached to a subsystem (e.g. networking). If supported, the subsystem could then further offload the BPF program to hardware (e.g. NIC).

clang -O2 -Wall -target bpf -c xdp-example.c -o xdp-example.o

ip link set dev em1 xdp obj xdp-example.o

https://docs.cilium.io/en/stable/bpf/
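For reference, a minimal xdp-example.c of the kind the commands above could build and attach might look like the following. This is only a sketch, not the exact program from the Cilium docs; older iproute2 loaders look for an ELF section named "prog" by default, which is why that section name is used here.

#include <linux/bpf.h>

#ifndef __section
# define __section(NAME) __attribute__((section(NAME), used))
#endif

/* Pass every packet up the stack; swap XDP_PASS for XDP_DROP to drop instead. */
__section("prog")
int xdp_example(struct xdp_md *ctx)
{
    return XDP_PASS;
}

char __license[] __section("license") = "GPL";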

JIT compilers speed up execution of the BPF program significantly since they reduce the per instruction cost compared to the interpreter. Often instructions can be mapped 1:1 with native instructions of the underlying architecture.

Maps are efficient key / value stores that reside in kernel space. They can be accessed from a BPF program in order to keep state among multiple BPF program invocations. They can also be accessed through file descriptors from user space and can be arbitrarily shared with other BPF programs or user space.

https://docs.cilium.io/en/stable/bpf/
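As a sketch of how that looks from the BPF side (assuming libbpf’s bpf/bpf_helpers.h and a libbpf-based loader; the map name rx_count is made up), a program can declare a map and update it on every invocation:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Single-slot array map holding a packet counter; the state persists across
 * program invocations and is visible to user space through the map's fd. */
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} rx_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&rx_count, &key);

    if (val)
        __sync_fetch_and_add(val, 1);
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

From user space, the same map can then be read through its file descriptor, for example with bpftool map dump.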

XDP allows you to attach an eBPF program to a lower-level hook inside the kernel. Such a hook is implemented by the network device driver, inside the ingress traffic processing function.

Not all network device drivers implement the XDP hook. In such a case, you may fall back to the generic XDP hook, implemented by the core kernel.

Since such a hook takes place later in the networking stack, the performance observed there is much lower.

https://developers.redhat.com/blog/2018/12/06/achieving-high-performance-low-latency-networking-with-xdp-part-1
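With a recent iproute2, the two hooks can be requested explicitly when attaching (a sketch; exact keyword support depends on the iproute2 version and the driver):

ip link set dev em1 xdpdrv obj xdp-example.o       # native hook in the driver
ip link set dev em1 xdpgeneric obj xdp-example.o   # generic fallback in the core kernel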

XDP runs in the kernel network driver: it can read the Ethernet frames from the RX ring of the NIC and take action immediately. XDP plugs into the eBPF infrastructure through an RX hook implemented in the driver.

https://developer.nvidia.com/blog/accelerating-with-xdp-over-mellanox-connectx-nics/
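To make that concrete, here is a small sketch (assuming libbpf’s helper headers) that reads the Ethernet header straight from the frame XDP receives and decides its fate before the kernel stack ever sees it:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_non_ipv4(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* The verifier insists on an explicit bounds check before any access. */
    if ((void *)(eth + 1) > data_end)
        return XDP_ABORTED;

    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_DROP;   /* dropped right at the driver, no skb allocated */

    return XDP_PASS;       /* hand the frame on to the networking stack */
}

char _license[] SEC("license") = "GPL";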

Networking programs in BPF, in particular for tc and XDP, do have an offload interface to hardware in the kernel in order to execute BPF code directly on the NIC.

Currently, the nfp driver from Netronome has support for offloading BPF through a JIT compiler which translates BPF instructions to an instruction set implemented against the NIC. This includes offloading of BPF maps to the NIC as well, thus the offloaded BPF program can perform map lookups, updates and deletions.

https://docs.cilium.io/en/stable/bpf/
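On offload-capable hardware (e.g. an nfp-driven NIC), iproute2 can request the hardware hook directly; a sketch, assuming the driver supports BPF offload:

ip link set dev em1 xdpoffload obj xdp-example.o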

We present a solution to run Linux’s eXpress Data Path programs written in eBPF on FPGAs, using only a fraction of the available hardware resources while matching the performance of high-end CPUs.

https://fosdem.org/2021/schedule/event/sdn_hxdp_fpga/

LLVM (formerly Low Level Virtual Machine) is a compiler infrastructure designed to optimize the compile time, link time and run time of programs written in various programming languages.

The LLVM project began in 2000 at the University of Illinois at Urbana-Champaign under the direction of Vikram Adve and Chris Lattner. In 2005, Apple Inc. hired Lattner and formed a team to work on the LLVM system for various uses in Apple’s development systems.

https://en.terminalroot.com.br/gcc-vs-llvm-which-is-the-best-compiler/

When no configuration is provided, Cilium automatically runs in [encapsulation] mode as it is the mode with the fewest requirements on the underlying networking infrastructure.

In this mode, all cluster nodes form a mesh of tunnels using the UDP-based encapsulation protocols VXLAN or Geneve. All traffic between Cilium nodes is encapsulated.

In native routing mode, Cilium will delegate all packets which are not addressed to another local endpoint to the routing subsystem of the Linux kernel. This means that the packet will be routed as if a local process had emitted it.

https://docs.cilium.io/en/v1.8/concepts/networking/routing/
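One way to see which mode a running cluster uses is to inspect the cilium-config ConfigMap (the key names vary across Cilium versions; tunnel is the 1.x-era key and is an assumption here):

kubectl -n kube-system get configmap cilium-config -o yaml | grep tunnel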

Replacing iptables with eBPF in Kubernetes with Cilium

https://archive.fosdem.org/2020/schedule/event/replacing_iptables_with_ebpf/attachments/slides/3622/export/events/attachments/replacing_iptables_with_ebpf/slides/3622/Cilium_FOSDEM_2020.pdf

Main actions [outgoing from pod], sketched in code after the list:

1. Service load balancing: select a proper Pod from backend list, we assume POD4 on NODE2 is selected.

2. Create or update connection tracking (CT or conntrack) record.

3. Perform DNAT: replace ServiceIP with POD4_IP in the dst_ip field of the IP header.

4. Perform egress network policy checking.

5. Perform encapsulation if in tunnel mode, or pass the packet to kernel stack if in direct routing mode.

https://arthurchiao.github.io/blog/cilium-life-of-a-packet-pod-to-service/
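A purely illustrative C sketch of those five steps; all names, types and addresses are made up, and this is not Cilium’s actual datapath (bpf_lxc) code:

#include <stdint.h>
#include <stdio.h>

struct packet { uint32_t dst_ip; uint16_t dst_port; };

static uint32_t select_backend(uint32_t service_ip)        /* 1. service LB   */
{
    (void)service_ip;
    return 0x0A000204;                  /* pretend POD4_IP is 10.0.2.4 */
}

static void ct_update(const struct packet *p)              /* 2. conntrack    */
{
    printf("CT record created/updated for dst %08x\n", (unsigned)p->dst_ip);
}

static void dnat(struct packet *p, uint32_t backend_ip)    /* 3. DNAT         */
{
    p->dst_ip = backend_ip;             /* ServiceIP -> POD4_IP */
}

static int egress_policy_allows(const struct packet *p)    /* 4. policy check */
{
    (void)p;
    return 1;
}

static void encap_or_route(const struct packet *p)         /* 5. encap/route  */
{
    printf("forward %08x via VXLAN encap or the kernel routing table\n",
           (unsigned)p->dst_ip);
}

int main(void)
{
    struct packet pkt = { .dst_ip = 0x0A600001 /* ServiceIP */, .dst_port = 80 };
    uint32_t backend = select_backend(pkt.dst_ip);

    ct_update(&pkt);
    dnat(&pkt, backend);
    if (!egress_policy_allows(&pkt))
        return 0;                       /* packet would be dropped here */
    encap_or_route(&pkt);
    return 0;
}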
