Revolutionising System Monitoring and Observability: The Game-Changing Impact of eBPF
Written by Lucian Herciu (DevOps Engineer), Renata Muresan (DevOps Engineer) and Andrei Grigoriu (DevOps Engineer), in the July 2023 issue of Today Software Magazine.
Beneath the surface of the operating system lies a fascinating realm called the kernel: the heart of your computer system, which manages resources, executes tasks, and keeps everything running smoothly.
But have you ever wondered how you can gain insight into this mysterious world?
eBPF is a technology that lets you look inside the kernel and reach new levels of observability.
This article explores this fascinating world, the role of eBPF in observability, and how it has changed the way we understand and optimise systems.
For system monitoring and observability, eBPF is a game-changer, even for people who never touch kernel code: tools built on it report on the health, performance, and behaviour of your systems without requiring changes to the kernel or to the applications being observed.
Because data is collected and aggregated inside the kernel in near real time, these tools help you make informed decisions, troubleshoot issues, and keep your systems running well.
What is eBPF?
BPF (Berkeley Packet Filter) emerged in the early 1990s as a method for filtering and analysing network packets within the kernel. In 2014, extended BPF (eBPF) was introduced to the Linux kernel to overcome the limitations of its predecessor.
eBPF allows custom programs to run dynamically inside the kernel without changing the kernel source code or loading kernel modules.
Originally conceived as a packet filtering mechanism, it has evolved into a versatile tool for observability.
It introduces a safe and efficient virtual machine that executes custom programs directly in the kernel, capturing and analysing events, data, and metrics in real time.
At the heart of all this lies the eBPF virtual machine, a sandboxed environment within the kernel where custom programs can run. These programs, typically written in a restricted subset of the C programming language and compiled to eBPF bytecode, are loaded dynamically into the kernel, checked by the in-kernel verifier, and attached to specific events or hooks.
When triggered, the eBPF programs execute and can perform a range of operations, including capturing data, modifying behaviour, or generating telemetry for observability purposes.
Because the verifier rejects any program it cannot prove safe (for example, one with out-of-bounds memory access or an unbounded loop), we gain deep visibility into system behaviour without compromising stability or security.
An eBPF program must be able to store its state and share collected data. eBPF maps let programs store and retrieve information in a range of data structures.
eBPF programs access maps through helper functions, while user-space applications access them through the bpf() system call. Map types include hash tables, arrays, ring buffers, stack traces, least-recently-used (LRU) hashes, longest-prefix-match tries, and more.
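To make the relationship between hooks, programs and maps more concrete, here is a minimal sketch in plain Python. It is only a user-space model: no kernel is involved, and every name in it (HookRegistry, count_opens, the "sys_enter_openat" hook label, the sample process IDs) is an assumption chosen for illustration rather than part of any real eBPF API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Pure-Python model: no kernel is involved and all names are hypothetical.
Event = Dict[str, Any]
Program = Callable[[Event], None]


class HookRegistry:
    """Stands in for the kernel hook points that programs attach to."""

    def __init__(self) -> None:
        self._programs: Dict[str, List[Program]] = defaultdict(list)

    def attach(self, hook: str, program: Program) -> None:
        self._programs[hook].append(program)

    def fire(self, hook: str, event: Event) -> None:
        # Every program attached to the hook runs once per event.
        for program in self._programs[hook]:
            program(event)


# Stands in for a hash map shared between the program and user space.
open_counts: Dict[int, int] = defaultdict(int)


def count_opens(event: Event) -> None:
    """The 'program': bump a per-process counter and return immediately."""
    open_counts[event["pid"]] += 1


hooks = HookRegistry()
hooks.attach("sys_enter_openat", count_opens)

# Simulated events; on a real system the kernel would produce these.
for pid in (101, 101, 202):
    hooks.fire("sys_enter_openat", {"pid": pid, "filename": "/etc/hosts"})

# The 'user-space' side reads the aggregated state out of the map.
snapshot = dict(open_counts)
print(snapshot)  # {101: 2, 202: 1}
```

The point of the design is that the program does very little per event and leaves aggregated state in the map, so the reader on the user-space side only has to fetch a small summary instead of every raw event.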
The Game-Changing Impact of eBPF
eBPF provides powerful observability and monitoring capabilities in the Linux kernel, allowing developers and system administrators to dynamically instrument the kernel, collect detailed data, and understand system behaviour without modifying the kernel itself.
Here's how eBPF works in the context of observability and some applications that utilise it:
Tracing: eBPF enables dynamic tracing of various kernel events, such as system calls, function calls, and kernel internals. By attaching eBPF programs to tracepoints or specific functions, developers can collect fine-grained information about the system's execution flow.
Monitoring: eBPF programs can be used to monitor and collect metrics about system resources, network traffic, disk I/O, memory usage, and more. By leveraging eBPF's low-level access to kernel data structures, developers can efficiently extract relevant statistics without incurring significant overhead.
Network analysis: by attaching eBPF programs to networking hooks, such as XDP (eXpress Data Path) or socket filters, developers can perform advanced packet filtering, implement custom network protocols, perform load balancing, or enforce security policies at close to line rate.
Security: eBPF has gained popularity in the security domain. It can be used to implement security monitoring and enforcement mechanisms at the kernel level. For example, eBPF programs can detect and prevent malicious activities, such as system call abuse, kernel exploits, or unauthorised access attempts. By leveraging its dynamic nature and low-level access, security solutions can react quickly to emerging threats and adapt to changing attack patterns.
Performance analysis: eBPF enables deep insights into system performance by allowing developers to trace and analyse critical paths, latency, and resource consumption. This helps in fine-tuning system configurations, identifying inefficiencies, and improving overall system performance.
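As an illustration of the performance-analysis case, tracing tools commonly summarise latencies as a power-of-two histogram (bpftrace's hist() function does this, for instance) rather than exporting every sample. The sketch below models only that aggregation step in plain Python; the function names and the sample latencies are assumptions made up for the example, and no kernel tracing takes place.

```python
from collections import Counter
from typing import Dict, Iterable


def log2_bucket(value_us: int) -> int:
    """Return the lower bound of the power-of-two bucket holding value_us."""
    if value_us <= 0:
        return 0
    return 1 << (value_us.bit_length() - 1)


def latency_histogram(latencies_us: Iterable[int]) -> Dict[int, int]:
    """Count samples per bucket, ordered by bucket lower bound."""
    counts = Counter(log2_bucket(v) for v in latencies_us)
    return dict(sorted(counts.items()))


# Hypothetical latencies in microseconds; a tracer would measure these.
samples_us = [3, 5, 6, 9, 120, 130, 4000]
hist = latency_histogram(samples_us)

for lower, count in hist.items():
    upper = lower * 2 if lower else 1
    print(f"[{lower:>5}, {upper:>5})  {'#' * count}")
```

Bucketing like this keeps the stored state small and bounded however many events occur, which is one reason such summaries can be maintained at low overhead.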
Several applications and tools leverage eBPF for observability and monitoring purposes, including:
bpftrace: a high-level tracing language that simplifies the development of eBPF-based tracing tools.
Falco: an open-source runtime security tool that can use an eBPF probe to capture system calls for real-time threat detection in containerised environments.
Cilium: a networking and security project that uses eBPF to provide enhanced network visibility, security, and load balancing for container environments.
Prometheus: a popular monitoring and alerting toolkit. Prometheus does not use eBPF itself, but exporters such as Cloudflare's ebpf_exporter can expose eBPF-collected low-level system metrics for it to scrape.
Sysdig: a container intelligence platform that utilises eBPF for monitoring and troubleshooting containerised environments.
tcpdump, Wireshark and Kubeshark: network analysis tools. tcpdump and Wireshark apply libpcap capture filters, which compile to classic BPF (the predecessor of eBPF), while Kubeshark can use eBPF to capture traffic inside Kubernetes clusters.
These are just a few examples of how eBPF is used for observability and monitoring.
The versatility of this technology allows developers to create custom solutions tailored to their specific monitoring and analysis needs, making it a powerful tool in the field of system observability.
Real-World Examples
The impact of eBPF on observability becomes clear when we explore its real-world applications.
Netflix: Netflix uses eBPF to monitor network performance and detect latency issues. It also open-sourced FlameScope, a tool that visualises performance profiles as heat maps and flame graphs; FlameScope analyses profiler output and is not itself an eBPF program.
Facebook: Facebook (now Meta) uses eBPF for load balancing and traffic management in front of its services. It developed and open-sourced Katran, a layer 4 load balancer built on XDP and eBPF.
Google: Google uses eBPF to improve the performance and visibility of its network infrastructure, for example in GKE Dataplane V2, which is built on Cilium. gVisor, Google's sandboxed container runtime, is sometimes mentioned in this context, but it is a user-space application kernel rather than an eBPF tool.
Cloudflare: Cloudflare uses eBPF to improve the performance and security of its network infrastructure. Its distributed denial-of-service (DDoS) protection drops attack traffic with eBPF programs attached at the XDP hook, which rely on eBPF maps to share state with user space.
Red Hat: Red Hat ships eBPF-based tooling in its operating system to monitor system calls and kernel events. This includes BCC (BPF Compiler Collection), a toolkit that originated in the open-source IO Visor project, for analysing and troubleshooting performance issues.
Exploring the Potential of Cilium and Hubble: A Hands-On Test
To understand the capabilities of the latest technologies, we decided to dive into the world of Cilium, an open-source solution designed for cloud-native environments.
With Cilium, you can seamlessly enhance network connectivity, security, and observability without needing to modify your application code.
To see it in action, we installed Cilium on a testing Kubernetes cluster hosted on Google Cloud Platform.
One of the standout features of Cilium is its integration with Hubble, a distributed networking and security observability platform. Built on top of Cilium and eBPF, Hubble provides deep visibility into service communication and network infrastructure.
Together, they offer control over traffic at various layers of the OSI model, allowing you to monitor TCP connections, DNS queries, and HTTP requests across clusters.
To put Cilium and Hubble to the test, we deployed a "Star Wars" themed application with Kubernetes Pods named after iconic spaceships. Implementing a CiliumNetworkPolicy, we carefully regulated access to the Deathstar spaceship. We then conducted tests to examine the effectiveness of this policy, and the results were fascinating. Some pods had direct access to the Deathstar, while others were restricted.
Allowed: [screenshot of an allowed request]
Denied: [screenshot of a denied request]
Clicking on an entry shows more information about it, including Source, Destination, Source IP, Destination IP, Protocol, and L7 Info.
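The allow and deny decisions in this test come from matching pod labels against the policy. The sketch below models that idea in plain Python. It is not Cilium's implementation or its policy syntax: the pod names and labels (tiefighter, xwing, org=empire, org=alliance) are assumptions borrowed from the public Cilium Star Wars demo rather than details stated in this article, and the single rule on TCP port 80 is likewise illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    """A selector matches when all of its key/value pairs appear in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


@dataclass
class IngressRule:
    endpoint_selector: Labels  # which pods the rule protects
    from_endpoints: Labels     # which peers may connect
    port: Tuple[int, str]      # (port number, protocol)


def allowed(rule: IngressRule, src: Labels, dst: Labels,
            port: Tuple[int, str]) -> bool:
    # Toy model with a single rule: unselected pods stay open to all peers.
    if not matches(rule.endpoint_selector, dst):
        return True
    # Selected pods accept only matching peers on the listed port.
    return matches(rule.from_endpoints, src) and port == rule.port


# Hypothetical labels, modelled on the public Cilium demo application.
deathstar = {"org": "empire", "class": "deathstar"}
tiefighter = {"org": "empire", "class": "tiefighter"}
xwing = {"org": "alliance", "class": "xwing"}

rule = IngressRule(
    endpoint_selector={"org": "empire", "class": "deathstar"},
    from_endpoints={"org": "empire"},
    port=(80, "TCP"),
)

results = {
    "tiefighter": allowed(rule, tiefighter, deathstar, (80, "TCP")),
    "xwing": allowed(rule, xwing, deathstar, (80, "TCP")),
}
print(results)  # {'tiefighter': True, 'xwing': False}
```

Selecting by label rather than by IP address means the rule keeps applying as pods are rescheduled and their addresses change.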
Conclusion
eBPF has opened a gateway to the kernel, enabling us to peer inside and unravel its mysteries. With eBPF, we can achieve unparalleled observability, optimise system performance, and enhance the reliability of our applications. As eBPF continues to evolve, the possibilities for its applications will only grow, making it an indispensable tool for both general audiences and DevOps professionals.