- Edge anomaly detection on ARM hardware is possible without cloud dependencies, GPUs, or heavy ML runtimes.
- A developer built a working edge anomaly detection tool called Cerberus using eBPF and applied statistics.
- The project evolved through five model versions, from basic statistical thresholds to tiny autoencoders running offline.
- Explainability — not raw accuracy — turned out to be the most critical design requirement for operators in the field.
When Cloud-Native Assumptions Hit a Brick Wall
Edge anomaly detection isn’t supposed to be this hard. But drop a cloud-native observability stack onto an ARM gateway at a remote industrial site and its assumptions collapse fast. That’s what happened to developer Mohamed Zrouga, who documented the problem and his solution in a post on Dev.to. His project, Cerberus, started not as an ambition but as a necessity.
The tooling he tried kept failing in the same ways. Prometheus needed a LAN that never goes offline. ML inference endpoints sat in cloud regions the device couldn’t reach. Collectors competed with the monitored workload for CPU cycles. The gear he was deploying on had maybe 512MB of RAM. That’s not a software bug. It’s a fundamental mismatch between where modern observability tooling was designed to run and where the actual devices are.
This is a problem that’s only getting more urgent. The number of connected industrial and IoT edge devices is growing fast — Statista projects over 29 billion IoT devices by 2030 — and the assumption that they’ll all have a reliable cloud backhaul is increasingly untenable. Security monitoring at the edge is one of those problems the industry keeps kicking down the road. Effective edge anomaly detection at this scale demands a fundamentally different approach.
Reframing the Question: What Does Edge Observability Actually Need?
Zrouga’s real breakthrough wasn’t technical — it was conceptual. He stopped asking “how do I run this tool on constrained hardware?” and started asking “what does edge observability actually need to answer?” The answer, stripped of enterprise tooling bloat, turned out to be surprisingly small.
Did traffic behavior change? Is something probing unusual ports? Are protocol patterns different from yesterday? Is there unexplained traffic acceleration? Which specific device changed? That’s the full list. Not distributed tracing. Not full packet capture. Just behavioral signals.
Once edge anomaly detection is framed that way, the architecture gets dramatically simpler. You don’t need to capture everything; you need to capture the right things, cheaply. That is what makes the problem tractable on hardware that would choke under a traditional observability stack.
eBPF: The Kernel Already Sees Everything
Cerberus is built on eBPF, the Linux kernel technology that has been quietly reshaping systems observability. Cloudflare, Meta, and Netflix, among others, run eBPF in production for networking and security use cases. But those deployments typically assume powerful x86 servers. Zrouga’s contribution is applying the same approach to ARM edge hardware with real resource constraints.
The key insight is that the kernel already sees every packet, every connection, and every TCP flag before any userspace process touches them. eBPF lets you attach small programs directly to the network stack using TC (Traffic Control) or XDP (eXpress Data Path) hooks. Instead of running a packet capture tool through a pipe, or copying full payloads into userspace for analysis, you write a kernel-side filter that extracts only the metadata you care about and delivers it via a ring buffer.
For Cerberus, that’s roughly 208 bytes per network event: source IP, destination IP, ports, TCP flags, event type (ARP, TCP, UDP, DNS, TLS, HTTP, ICMP), and the first 128 bytes of the L7 payload for application-layer inspection. The kernel filters; the ring buffer delivers. Userspace gets a compact event stream at low overhead, with no full payload copies and no heavyweight collector competing with the monitored workload for CPU. On an ARM device where every CPU cycle matters, that efficiency is what makes continuous edge anomaly detection viable without draining the device’s limited resources.
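To make the event stream concrete, here is a minimal Python sketch of the userspace side: decoding fixed-size records after they have been read from the ring buffer (for example through libbpf’s ring buffer API). The record layout, the `EVENT_TYPES` codes, and the `decode_event` helper are hypothetical. The article gives only the approximate 208-byte size and the field list, so this layout comes out at 152 bytes and does not match Cerberus byte for byte.

```python
import ipaddress
import struct
from dataclasses import dataclass

# HYPOTHETICAL record layout, not Cerberus's real struct (the article does not publish it).
# Little-endian, no padding: timestamp, IPv4 src/dst in network byte order,
# ports assumed already converted to host order by the kernel program, TCP flags,
# event-type code, captured payload length, and a fixed 128-byte payload buffer.
EVENT_FORMAT = "<Q4s4sHHBBH128s"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 152 bytes for this layout

# Hypothetical numeric codes for the event types listed in the article.
EVENT_TYPES = {1: "ARP", 2: "TCP", 3: "UDP", 4: "DNS", 5: "TLS", 6: "HTTP", 7: "ICMP"}


@dataclass
class NetEvent:
    ts_ns: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    tcp_flags: int
    event_type: str
    payload: bytes


def decode_event(raw: bytes) -> NetEvent:
    """Decode one fixed-size ring-buffer record (exactly EVENT_SIZE bytes)."""
    ts, src, dst, sport, dport, flags, etype, plen, payload = struct.unpack(EVENT_FORMAT, raw)
    return NetEvent(
        ts_ns=ts,
        src_ip=str(ipaddress.IPv4Address(src)),
        dst_ip=str(ipaddress.IPv4Address(dst)),
        src_port=sport,
        dst_port=dport,
        tcp_flags=flags,
        event_type=EVENT_TYPES.get(etype, "UNKNOWN"),
        payload=payload[: min(plen, len(payload))],  # keep only the captured bytes
    )
```

The point of a fixed-size record is that userspace never parses packets: it unpacks a small struct the kernel has already filled in, which is cheap enough to do for every event.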
Edge Anomaly Detection Without a Neural Network
Here’s where it gets interesting. Zrouga is upfront that he’s not an ML engineer. What he built, he calls “ML-Lite” — applied statistics with some online learning on top. That’s not false modesty; it’s a deliberate design philosophy with real engineering justification.
The instinct when building anomaly detection is to reach for a neural network. On constrained hardware that’s a dead end for two reasons: resource cost and explainability. An operator getting paged at 2am doesn’t want a 0.87 confidence score. They want to know what changed and why. That requirement shapes every technical decision downstream.
The system compresses the event stream into a feature vector every 30 seconds, with features including packet rate, DNS rate, TLS rate, SYN rate, traffic entropy, and unusual port counts. Each 30-second window becomes a compact behavioral snapshot. As snapshots accumulate, the system learns what normal looks like using three statistical tools.
First, Median Absolute Deviation (MAD) rather than standard deviation. MAD is resistant to outliers in a way mean and stddev aren’t — a single traffic spike won’t shift the entire baseline. Second, Exponentially Weighted Moving Average (EWMA) gives recent windows more influence than old ones, so the baseline adapts slowly without being thrown off by short anomalies. Third, centroid distance tracks how far the current feature vector sits from the center of historical observations.
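MAD and centroid distance translate directly into a few lines of Python, and scoring each feature separately with MAD also gives the operator-facing “what changed” answer the article emphasizes. This is a sketch under stated assumptions: `mad_score`, `centroid_distance`, and `explain` are hypothetical helpers, the 1.4826 constant is the usual scaling that makes MAD comparable to a standard deviation for normally distributed data, and the centroid here is a plain mean of past vectors. EWMA is left out of this sketch.

```python
import math
from statistics import median

MAD_SCALE = 1.4826  # makes MAD comparable to a standard deviation for Gaussian data


def mad_score(history, value, eps=1e-9):
    """Robust z-score: distance from the historical median in units of scaled MAD.

    eps avoids division by zero when the history is constant; such a feature then
    scores very high for any change, which a real system would need to handle.
    """
    med = median(history)
    mad = median(abs(x - med) for x in history)
    return abs(value - med) / (MAD_SCALE * mad + eps)


def centroid_distance(history_vectors, current):
    """Euclidean distance from `current` to the mean of past feature vectors."""
    n = len(history_vectors)
    centroid = [sum(v[i] for v in history_vectors) / n for i in range(len(current))]
    return math.dist(centroid, current)


def explain(history_windows, current, top=3):
    """Rank features by how unusual they are, so an alert can say what changed."""
    scores = {
        name: mad_score([w[name] for w in history_windows], value)
        for name, value in current.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

A single spike in the history barely moves the median or the MAD, which is the robustness property described above. Because raw Euclidean distance is dominated by whichever feature has the largest scale, the vectors would normally be standardized (for example by their MAD) before computing centroid distance.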
Entropy is particularly clever. Normal traffic hits the same handful of ports repeatedly — HTTP on 80, HTTPS on 443, SSH on 22 — so port entropy stays low. A port scan touching dozens of ports in sequence causes entropy to spike. The math is straightforward Shannon entropy, but applied to destination port distributions it becomes a sensitive and explainable signal for reconnaissance activity. This is edge anomaly detection through statistical insight rather than brute-force computation.
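The 30-second windowing step and the port-entropy signal can be sketched together in Python. This is a minimal illustration, not Cerberus’s code: the simplified `Event` fields, the `COMMON_PORTS` set used to define an “unusual” port, and the SYN test (SYN set, ACK clear) are assumptions, packet rate is approximated by counting events, and the sketch produces six features rather than the nine the autoencoder later consumes.

```python
import math
from collections import Counter
from dataclasses import dataclass

WINDOW_SECONDS = 30.0
# HYPOTHETICAL allow-list of "expected" ports; a real deployment would configure or learn this.
COMMON_PORTS = {22, 53, 80, 123, 443}
TCP_SYN, TCP_ACK = 0x02, 0x10  # standard TCP flag bits


@dataclass
class Event:
    event_type: str  # e.g. "TCP", "UDP", "DNS", "TLS" (hypothetical simplified event)
    dst_port: int
    tcp_flags: int = 0


def port_entropy(ports):
    """Shannon entropy, in bits, of the destination-port distribution."""
    counts = Counter(ports)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values()) if total else 0.0


def window_features(events, window=WINDOW_SECONDS):
    """Compress one window of events into a small behavioral feature vector."""
    # Connection attempts: SYN set without ACK (excludes SYN-ACK replies).
    syns = sum(1 for e in events if (e.tcp_flags & TCP_SYN) and not (e.tcp_flags & TCP_ACK))
    distinct_ports = {e.dst_port for e in events}
    return {
        "packet_rate": len(events) / window,  # events per second, used as a proxy for packets
        "dns_rate": sum(e.event_type == "DNS" for e in events) / window,
        "tls_rate": sum(e.event_type == "TLS" for e in events) / window,
        "syn_rate": syns / window,
        "port_entropy": port_entropy(e.dst_port for e in events),
        "unusual_ports": len(distinct_ports - COMMON_PORTS),
    }
```

Under these assumptions, a window in which every event goes to port 443 has an entropy of 0 bits, while 64 SYNs spread evenly across 64 ports score log2 64 = 6 bits, which is the spike the paragraph above describes.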
Five Versions, One Constraint
The Cerberus detection model went through five iterations. Each one added capability without discarding what came before — and critically, none of them introduced a cloud dependency.
Version one was pure statistical detection: medians, MAD, thresholds, entropy. It worked, but generated noise on IoT networks where traffic patterns are genuinely chaotic. Version two introduced adaptive learning — rolling baselines and per-device profiles using EWMA. False positives dropped significantly once the baseline had enough history to work with.
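Per-device adaptive baselines like those in version two can be sketched as one EWMA mean and variance per device, keyed by an identifier such as a MAC address. The smoothing factor, the 20-window warm-up before any score is reported, the z-score-style output, and the `DeviceBaseline` name are all assumptions; the warm-up reflects the article’s point that false positives dropped only once the baseline had enough history. The variance update is the standard incremental exponentially weighted form.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DeviceBaseline:
    """EWMA mean/variance baseline for one feature of one device (hypothetical helper)."""
    alpha: float = 0.1  # HYPOTHETICAL smoothing factor: higher adapts faster
    warmup: int = 20    # HYPOTHETICAL number of windows to observe before scoring
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def update(self, x: float):
        """Score x against the current baseline, then fold x into it.

        Returns None during warm-up (the cold-start period), else |x - mean| / std.
        """
        score = None
        if self.count >= self.warmup:
            score = abs(x - self.mean) / (self.var ** 0.5 + 1e-9)
        if self.count == 0:
            self.mean = x  # seed the baseline with the first observation
        else:
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            # Incremental exponentially weighted variance.
            self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        self.count += 1
        return score


# One baseline per device, created on first sight (keyed here by a MAC address string).
profiles = defaultdict(DeviceBaseline)
```

Scoring before updating keeps an anomalous window from diluting its own score; a small alpha then limits how far any single spike can shift the baseline.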
Version three added Isolation Forest, an unsupervised machine learning algorithm that doesn’t need labeled attack data. It isolates outliers by randomly partitioning the feature space; anomalies sit in sparse regions, so they are isolated in fewer splits than normal points. That makes it effective at flagging genuinely novel patterns the statistical rules hadn’t seen, and it moved Cerberus closer to something reliable in the wild.
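The isolation idea can be shown with a small pure-Python Isolation Forest following the standard recipe: pick a random feature, split at a random value between that feature’s minimum and maximum, cap tree depth at ceil(log2(sample size)), and score a point as 2^(-E[h]/c(n)), where E[h] is its average path length and c(n) normalizes for sample size. The `TinyIsolationForest` name and its defaults (100 trees, 256-point subsamples) are illustrative assumptions; a production system would more likely use a library implementation such as scikit-learn’s `IsolationForest`.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """c(n): expected path length of an unsuccessful search in a random binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    # Stop when a point is isolated, the depth limit is hit, or all points are identical.
    if len(points) <= 1 or depth >= max_depth:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(node, x, depth=0):
    if node[0] == "leaf":
        # Unresolved leaves get the expected remaining depth for their size.
        return depth + average_path_length(node[1])
    _, dim, split, left, right = node
    return path_length(left if x[dim] < split else right, x, depth + 1)


class TinyIsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed
        self.trees = []

    def fit(self, data):
        if len(data) < 2:
            raise ValueError("need at least two feature vectors")
        rng = random.Random(self.seed)
        psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(psi))
        self.trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        self.norm = average_path_length(psi)
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values well above 0.5 suggest an outlier."""
        mean_depth = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / self.norm)
```

No labels are involved: the forest is fit on whatever windows the device has seen, and a point that consistently ends up in shallow leaves gets a high score.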
Version four introduced tiny autoencoders — neural networks so small they run comfortably on ARM CPUs. The architecture Zrouga settled on is 9→16→4→16→9: nine input features compressed through a bottleneck of just four dimensions, then reconstructed back to nine. The bottleneck forces the network to learn a compact representation of normal behavior. When a current window can’t be reconstructed accurately, that reconstruction error is the anomaly signal. It’s a technique borrowed from much larger ML systems, miniaturized here to run offline on a device with limited RAM.
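A tiny autoencoder with the 9→16→4→16→9 shape can be sketched with NumPy to show how reconstruction error becomes the anomaly signal. The activations (tanh on hidden layers, linear output), the mean-squared-error loss, plain full-batch gradient descent, and the learning rate and epoch count are illustrative assumptions, not details from the article; inputs are assumed to be standardized feature vectors.

```python
import numpy as np


class TinyAutoencoder:
    """Dense autoencoder: tanh hidden layers, linear output, full-batch gradient descent."""

    def __init__(self, sizes=(9, 16, 4, 16, 9), seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
                        for m, n in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def _forward(self, X):
        # Return every layer's activation so backpropagation can reuse them.
        activations = [X]
        last = len(self.weights) - 1
        for i, (W, b) in enumerate(zip(self.weights, self.biases)):
            z = activations[-1] @ W + b
            activations.append(z if i == last else np.tanh(z))
        return activations

    def reconstruction_error(self, X):
        """Per-row mean squared error between each input and its reconstruction."""
        return np.mean((self._forward(X)[-1] - X) ** 2, axis=1)

    def fit(self, X, epochs=2000, lr=0.2):
        n, d = X.shape
        for _ in range(epochs):
            acts = self._forward(X)
            # Gradient of mean((output - X)^2) over all n*d entries, w.r.t. the output.
            grad = 2.0 * (acts[-1] - X) / (n * d)
            for i in reversed(range(len(self.weights))):
                grad_W = acts[i].T @ grad
                grad_b = grad.sum(axis=0)
                if i > 0:
                    # Backpropagate through layer i's weights and the tanh that produced acts[i].
                    grad = (grad @ self.weights[i].T) * (1.0 - acts[i] ** 2)
                self.weights[i] -= lr * grad_W
                self.biases[i] -= lr * grad_b
        return self
```

One plausible way to turn the error into an alert is a high percentile of errors on known-normal windows, such as `np.percentile(ae.reconstruction_error(X_train), 99)`; that threshold rule is also an assumption. The whole model is 9·16 + 16·4 + 4·16 + 16·9 = 416 weights plus 45 biases, which is why it fits comfortably on an ARM CPU.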
Version five is still in progress: temporal graph ML, modeling relationships between devices over time rather than analyzing each device in isolation. That’s a meaningful conceptual jump, from “this device is behaving strangely” to “this device is behaving strangely relative to what the rest of the network is doing around it.” That kind of context could catch lateral movement that per-device baselines miss entirely, which makes the network-wide perspective the next frontier for the project.
The Tradeoffs Nobody Talks About
Zrouga is unusually candid about where Cerberus falls short, which makes the project more credible, not less. Three problems don’t have clean solutions yet.
Baseline drift is the quiet killer. Normal behavior changes — a device starts a new scheduled task at 3am, a firmware update changes its communication patterns. Until the baseline catches up, you get false positives. EWMA helps, but there’s an inherent tension between adapting fast enough to stop the noise and adapting slowly enough to actually catch anomalies. This is one of the hardest open problems in edge anomaly detection generally, not just in Cerberus.
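The tension can be made concrete with EWMA’s half-life. With smoothing factor α, an observation from k windows ago carries weight α(1−α)^k in the current baseline, so its influence halves every k½ windows. The α values below are hypothetical (the article does not state which smoothing factor Cerberus uses); the 30-second window length comes from the feature-extraction step.

```latex
\begin{aligned}
\mu_t &= \alpha\,x_t + (1-\alpha)\,\mu_{t-1}
  \;\Rightarrow\; \mu_t \approx \sum_{k \ge 0} \alpha(1-\alpha)^k\, x_{t-k} \\
k_{1/2} &= \frac{\ln 0.5}{\ln(1-\alpha)} \\
\alpha = 0.1 &: \; k_{1/2} \approx 6.6 \text{ windows} \approx 3.3 \text{ min} \\
\alpha = 0.01 &: \; k_{1/2} \approx 69 \text{ windows} \approx 34.5 \text{ min}
\end{aligned}
```

A large α clears false positives from a legitimate change within minutes but could let a slow, deliberate change drag the baseline along with it; a small α resists that drift but keeps alerting on benign changes for half an hour or more.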
Encrypted traffic creates inspection limits. TLS SNI is visible in the handshake and gives you the destination hostname, unless the client uses Encrypted Client Hello (ECH), which hides the real one. Payload content, however, is opaque. The entropy signals on port distributions still work, but deep application-layer inspection has a hard ceiling imposed by encryption. That is actually a good thing for privacy, even if it complicates detection.
Noisy IoT environments remain the hardest problem. Some IoT devices have traffic patterns that are genuinely irregular by design — polling at random intervals, broadcasting to unusual addresses, speaking obscure protocols. Per-device behavioral profiles help, but they need enough history to be meaningful, which means new devices have a cold-start problem.
Why This Matters Beyond One Developer’s Project
The broader implication here isn’t about Cerberus specifically. It’s about a gap that the security industry has mostly ignored: edge anomaly detection in resource-constrained environments is an unsolved problem at scale, and the current answer — “use the cloud” — doesn’t work for industrial sites, remote infrastructure, air-gapped networks, or anywhere latency and connectivity are unreliable.
What Zrouga has done is demonstrate that a meaningful edge anomaly detection capability can be built from eBPF plus applied statistics plus carefully chosen lightweight ML, all running offline on an ARM device. It’s not a finished enterprise product. But it’s a proof of concept that challenges the assumption that serious network security monitoring requires serious hardware. As edge computing continues to push workloads further from the data center, that challenge is going to matter more, not less.
Source: https://dev.to/zrouga/im-not-an-ml-engineer-i-built-one-anyway-3feb



