Engineering teams building high-throughput systems know that scaling infrastructure is never just about speed; it’s about resilience. At extreme volumes, even widely adopted tools can become bottlenecks. That’s exactly what we ran into while developing a high-volume surveillance platform designed to handle sudden spikes of over 300,000 requests per second.
Our team was developing a platform for a VIP surveillance service. The goal was to ensure that busy clients — like Wall Street executives — weren’t bombarded with 101 notifications just because a squirrel ran past their house a few times a day. We had to build a system from scratch that could handle massive request volumes, process them quickly, and route each request to the right place.
On a typical day, the system handled a substantial load. During certain events, such as Halloween, when many people were triggering motion sensors, traffic spiked to 300,000 requests per second. Under that load, a single query to Redis could take 20 to 50 milliseconds or more. To address this, we implemented an additional caching layer in front of Redis.
That’s where eBPF came in.
How Did We Find It?
When we were exploring new cache solutions, there weren’t many real-world implementations with eBPF. What we did come across was a research paper describing an in-kernel caching approach built on eBPF.
Unfortunately, we didn’t have PhD-level engineers on our team. But we wanted to make this work, and make it work fast. So we dug into the paper, broke down its key insights, and figured out how to apply them to our own system.
It wasn’t easy, but once we successfully implemented it, the results spoke for themselves. It ended up changing how we approached caching overall.
So, What Exactly Is eBPF and Why Did We Use It?
eBPF is a Linux technology that allows small, high-performance programs to run directly inside the kernel. Normally, application code runs in user space, while the Linux kernel handles system operations like networking, disk access, and process management in kernel space. The problem is that switching between these two spaces — sending a request from an application to the kernel and waiting for a response — creates overhead.
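To make that concrete, here is a minimal sketch of the kind of program eBPF runs. This is not our production code, just the classic introductory example: an XDP hook that counts incoming packets in a BPF map, entirely inside the kernel, with no per-packet switch to user space.

```c
// count.bpf.c: a minimal, illustrative eBPF program (libbpf-style C).
// It attaches to a network interface via XDP and counts packets in a
// per-CPU BPF map, running entirely in kernel space.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx)
{
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&pkt_count, &key);
    if (count)
        *count += 1; /* per-CPU map, so no atomics are needed */
    return XDP_PASS; /* let the packet continue up the normal stack */
}

char LICENSE[] SEC("license") = "GPL";
```

User space can read the counter out of the map whenever it likes; the point is that the per-packet work never leaves the kernel.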
eBPF allowed us to execute custom code directly in kernel space, avoiding the back-and-forth delays between user space and the kernel. This meant we could store the hottest cache lookups inside eBPF itself and eliminate unnecessary Redis queries altogether.
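At the heart of that idea is a kernel-resident map of hot keys. The sketch below shows the general shape under some assumptions: libbpf-style C, an XDP attach point, and illustrative map names and key/value sizes rather than our production layout. A real hit path would also parse the cache request out of the packet and rewrite it into a response before sending it back, which is omitted here.

```c
// hot_cache.bpf.c: a simplified sketch of an in-kernel hot-key cache.
// The map name, sizes, and attach point are assumptions for illustration.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define KEY_LEN 32
#define VAL_LEN 64

struct cache_key { char data[KEY_LEN]; };
struct cache_val { char data[VAL_LEN]; __u32 len; };

/* LRU hash: the kernel evicts the least recently used keys on its own,
 * so the hot set maintains itself as traffic patterns shift. */
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 100000);
    __type(key, struct cache_key);
    __type(value, struct cache_val);
} hot_keys SEC(".maps");

SEC("xdp")
int serve_hot_keys(struct xdp_md *ctx)
{
    /* A real program would parse the Ethernet/IP/UDP headers plus the
     * cache protocol here to build the key from the incoming request. */
    struct cache_key key = {0};

    struct cache_val *val = bpf_map_lookup_elem(&hot_keys, &key);
    if (val) {
        /* Hit: the data is already in the kernel. A full implementation
         * would rewrite the packet into a response and return XDP_TX,
         * so the request never reaches user space or Redis. */
        return XDP_PASS;
    }
    /* Miss: pass the request through to the regular backend path. */
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

Choosing an LRU map means the kernel itself keeps only the hottest keys around; the application only has to populate entries after a Redis miss.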
How Exactly Did a Cache in Front of the Cache Work?
So, instead of hitting Redis for every request, we used small eBPF-based caches to store our most frequently accessed keys. These caches sat in front of Redis, catching the hottest data before it ever needed to reach our main caching layer.
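The kernel-side program can answer the hottest lookups on its own; what is left for the application is the miss path. Here is a rough user-space sketch in C, using libbpf and hiredis purely for illustration. Our actual service was written in Go (and later Rust), and the pinned map path, key format, and sizes here are assumptions, not our real layout.

```c
// lookup.c: hypothetical user-space fallback path (not our production code).
// Check the pinned eBPF hot-key map first; only go to Redis on a miss.
#include <stdio.h>
#include <string.h>
#include <bpf/bpf.h>          /* bpf_obj_get, bpf_map_lookup_elem */
#include <hiredis/hiredis.h>  /* Redis fallback */

#define KEY_LEN 32
#define VAL_LEN 64

struct cache_key { char data[KEY_LEN]; };
struct cache_val { char data[VAL_LEN]; unsigned int len; };

static int get_value(int map_fd, redisContext *redis, const char *k,
                     char *out, size_t out_sz)
{
    struct cache_key key = {0};
    struct cache_val val = {0};
    strncpy(key.data, k, KEY_LEN - 1);

    /* 1. Hot path: the LRU map shared with the in-kernel program. */
    if (bpf_map_lookup_elem(map_fd, &key, &val) == 0) {
        snprintf(out, out_sz, "%.*s", (int)val.len, val.data);
        return 0;
    }

    /* 2. Miss: fall back to Redis. */
    redisReply *reply = redisCommand(redis, "GET %s", k);
    if (!reply || reply->type != REDIS_REPLY_STRING) {
        if (reply)
            freeReplyObject(reply);
        return -1;
    }
    snprintf(out, out_sz, "%s", reply->str);

    /* 3. Repopulate the hot-key map so the next lookup stays in the kernel. */
    val.len = reply->len < VAL_LEN ? (unsigned int)reply->len : VAL_LEN;
    memcpy(val.data, reply->str, val.len);
    bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
    freeReplyObject(reply);
    return 0;
}

int main(void)
{
    /* The loader is assumed to have pinned the map at this path. */
    int map_fd = bpf_obj_get("/sys/fs/bpf/hot_keys");
    redisContext *redis = redisConnect("127.0.0.1", 6379);
    char buf[VAL_LEN + 1];

    if (map_fd >= 0 && redis && !redis->err &&
        get_value(map_fd, redis, "camera:42:alerts", buf, sizeof(buf)) == 0)
        printf("value: %s\n", buf);
    return 0;
}
```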
Thanks to this, the 99th percentile latency came down to 800 ms, and the 95th percentile response time was also reduced. The result was lower latency, reduced infrastructure usage, and a more consistent experience for clients.
But There Was Another Challenge
Even with the caching optimization in place, we still had latency spikes at the 99th percentile. Our backend was written in Go, which, despite being performant, has a garbage collector that can introduce unpredictable pauses. We were monitoring everything with VictoriaMetrics and had detailed Grafana dashboards tracking the latency distribution, so the impact was obvious: whenever the garbage collector kicked in, we saw latency spikes.
Since optimizing around garbage collection didn't get us very far, we made the call to rewrite one of the most latency-sensitive services in Rust. Rust has no garbage collector, which made it a better fit for ultra-low-latency workloads. We initially rolled out the Rust-based service as a canary release, then gradually increased its share of traffic while monitoring for performance improvements. Once we confirmed that the latency spikes were gone, we fully replaced the Go service with the Rust one.
To Sum Up
This changed how we structure our caching. Redis is no longer the first destination for every cache request; it now functions as a fallback layer. If a response is available through eBPF, the request never reaches Redis, which reduces Redis load and avoids repeated lookups.
The use of eBPF for caching goes way beyond our specific use case. As more companies run into Redis scaling challenges, I wouldn’t be surprised to see more teams adopting this approach — or even someone turning it into an open-source project.
In this case, the implementation reduced latency and infrastructure usage. The solution did not require additional servers, Redis instances, or engineering capacity. It just reused existing data with a different processing approach.