- Bijou64 is a variable-length integer encoding that guarantees exactly one representation per number, eliminating a whole class of security bugs.
- Unlike LEB128, this variable-length integer encoding makes canonical form a structural property, not a runtime check developers can accidentally skip.
- The first byte alone gives the full encoded length, so a decoder knows up front how many bytes to read instead of scanning byte by byte as LEB128 requires.
- Canonicality attacks on formats like ASN.1 and PKCS#1 show what happens when encoding validation is an afterthought — Bijou64 takes a different path.
The Security Problem Hidden Inside Variable-Length Integer Encoding
A new variable-length integer encoding called Bijou64, published by the researchers at Ink & Switch, started life as a security project and accidentally became a performance story too. The engineers weren’t trying to beat LEB128 at its own game. They were trying to fix a structural flaw that affects nearly every popular varint format in production today — and the fix turned out to be faster as well.
The core problem is deceptively simple. Binary protocols frequently need to encode integers that are usually small but occasionally large. You don’t want to burn 8 bytes on every integer when most of them are tiny. Variable-length encodings solve the storage problem elegantly. But most implementations of variable-length integer encoding introduce a different problem: the same number can be written down in multiple ways, and that ambiguity has a long history of being weaponised.
Why LEB128’s Non-Canonical Forms Are a Real Threat
LEB128 is the most widely deployed variable-length integer encoding in the world. It shows up in WebAssembly, DWARF debug info, Protocol Buffers, and dozens of other formats. The basic idea is clean enough: encode a number as a sequence of 7-bit segments, using the high bit of each byte as a “more data follows” flag. Small numbers fit in one byte. Larger ones spill into two, three, or more.
But here’s where it gets uncomfortable. The number zero can be encoded as the single byte 0x00. It can also be encoded as 0x80 0x00. Or 0x80 0x80 0x00. The byte 0x80 is 1 0000000 in binary — it sets the continuation bit but contributes zero value bits. You can stack as many of them as you like before the final zero byte, and every decoder that doesn’t explicitly check for this will happily hand you back zero each time.
This isn’t just a quirk of zero. Nearly every number in LEB128 has multiple valid encodings. For most applications, that’s tolerable. For anything involving cryptographic signatures, it’s a potential disaster. When you sign a message, you sign specific bytes. If two different byte strings decode to the same integer, an attacker who knows this can potentially swap one for the other after signing. The signature still verifies. The data has changed.
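To make the ambiguity concrete, here is a minimal Python sketch of a lenient unsigned LEB128 decoder, meaning one with no canonicality check. The function name `leb128_decode_lenient` is illustrative and not taken from any particular library. It shows two different byte strings decoding to the same integer while hashing to different digests, which is the mismatch a signature or content hash is exposed to.

```python
import hashlib


def leb128_decode_lenient(data: bytes) -> int:
    """Decode an unsigned LEB128 value with no canonicality check."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:             # high bit clear: this was the last byte
            return value
        shift += 7
    raise ValueError("truncated LEB128 input")


minimal = bytes([0x05])
padded = bytes([0x85, 0x00])  # same payload, plus an empty trailing group

# Both byte strings decode to the same integer...
assert leb128_decode_lenient(minimal) == leb128_decode_lenient(padded) == 5
# ...but anything computed over the raw bytes sees two different inputs.
assert hashlib.sha256(minimal).digest() != hashlib.sha256(padded).digest()
```

Any component that compares decoded values will treat the two inputs as identical, while any component that compares bytes or hashes will not.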
The standard fix is canonicality enforcement: define one “correct” encoding for each number, and reject everything else at the decoder. This works in theory. In practice, it generates a category of security vulnerability that the industry has seen repeatedly — the canonical check gets separated from the core parsing logic, quietly removed during an optimisation pass, or simply never ported when the library gets reimplemented in a new language. It doesn’t break round-trip tests. It doesn’t break benchmarks. It only breaks under adversarial input, which rarely makes it into test suites.
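As a rough illustration of how small that check is, the sketch below adds minimal-length enforcement to an unsigned LEB128 decoder in Python. The name `leb128_decode_strict` is hypothetical, and the sketch assumes the input holds exactly one value with no 64-bit cap. Deleting the two marked lines leaves a decoder that still passes every round-trip test built from honestly encoded values.

```python
def leb128_decode_strict(data: bytes) -> int:
    """Decode unsigned LEB128, rejecting padded (non-minimal) encodings."""
    value = 0
    shift = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            # --- canonicality check: a multi-byte encoding must not end
            # --- in an all-zero group, since that group adds nothing.
            if index > 0 and byte == 0x00:
                raise ValueError("non-canonical LEB128: padded encoding")
            return value
        shift += 7
    raise ValueError("truncated LEB128 input")
```

For example, `leb128_decode_strict(bytes([0x05]))` returns 5, while `leb128_decode_strict(bytes([0x85, 0x00]))` raises `ValueError`.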
The Ink & Switch team point to ASN.1 as the textbook example. The abstract syntax notation that underpins X.509 certificates, LDAP, and much of the internet’s security infrastructure has been plagued by canonicality-adjacent attacks for decades. PKCS#1 signature forgery attacks follow the same pattern: the spec says reject non-canonical input, one implementation misses the check, and suddenly you have a practical attack. Bijou64 is designed to make this entire bug class structurally impossible. Any variable-length integer encoding that relies solely on runtime checks to enforce canonicality faces the same underlying fragility.
How Bijou64 Makes Canonicality a Property of the Format Itself
The insight driving Bijou64 is that you don’t need a canonicality check if the format physically can’t produce non-canonical encodings. The goal is a variable-length integer encoding where each number has exactly one valid byte representation, the way everyday decimal notation works once leading zeros are ruled out: five is written 5, never 005.
Bijou64 achieves this through two interlocking tricks.
First Byte as Length Tag
The first byte of any Bijou64-encoded integer does double duty. First-byte values from 0 to 247 are self-contained: the byte is the number, no further reading required. First-byte values from 248 to 255 switch into a different mode entirely and act as a tag, indicating how many additional data bytes follow. Tag 248 (0xF8) means one more byte. Tag 249 (0xF9) means two more. This continues all the way up to tag 255 (0xFF), which signals eight additional bytes for the largest 64-bit values.
This is already a meaningful improvement over LEB128 from a pure performance standpoint. LEB128 decoders have to keep scanning bytes until they see one without the continuation bit set, so the length of an encoding is only known after every one of its bytes has been inspected. With Bijou64, you know the complete length after reading a single byte, which means the decoder can size its read or buffer in one constant-time step. The decoder can be implemented as a simple lookup table indexed on that first byte, which is exactly what the Ink & Switch team did. This structural choice is what separates Bijou64 from a conventional variable-length integer encoding that pushes length discovery into the decode loop.
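The difference in length discovery can be sketched in a few lines of Python. The table layout and the names `TOTAL_LENGTH`, `bijou64_length`, and `leb128_length` are assumptions made for illustration; they follow the tag rule described above and are not the Ink & Switch implementation.

```python
# Total encoded length (first byte included), indexed by the first byte.
# First bytes 0..247 are complete one-byte values; 248..255 announce
# 1..8 additional data bytes.
TOTAL_LENGTH = [1] * 248 + [1 + extra for extra in range(1, 9)]


def bijou64_length(first_byte: int) -> int:
    """One table lookup gives the full length of the encoding."""
    return TOTAL_LENGTH[first_byte]


def leb128_length(data: bytes) -> int:
    """LEB128 must inspect each byte until the continuation bit clears."""
    for index, byte in enumerate(data):
        if byte & 0x80 == 0:
            return index + 1
    raise ValueError("truncated LEB128 input")
```

With this table, `bijou64_length(0xF8)` is 2 and `bijou64_length(0xFF)` is 9, each found without looking at any byte past the first.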
Offsets That Seal Off Duplicate Representations
The tag system alone doesn’t solve canonicality. Without something more, both 0xF8 0x00 (tag for one extra byte, followed by zero) and plain 0x00 would map to the value zero. That’s the same problem LEB128 has, just in a different costume.
Bijou64 closes this gap with offsets. Every multi-byte encoding is shifted upward by the count of values already covered by shorter encodings. The one-byte range handles 0–247, so any two-byte encoding must represent values starting at 248. The byte sequence 0xF8 0x00 doesn’t decode to zero; it decodes to 248, because zero is already spoken for. Likewise 0xF8 0xFF decodes to 503, so three-byte encodings start at 504 (248 plus the 256 values covered by two-byte encodings). The offset for each length tier follows a predictable pattern that falls out naturally from the structure, making the lookup table trivial to generate and verify.
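The offset rule can be sketched as a small Python encoder and decoder. This is an illustration under stated assumptions, not the reference implementation: the data bytes are written big-endian here, which the text above does not specify, and the names `OFFSETS`, `encode`, and `decode` are invented for the example.

```python
U64_MAX = 2**64 - 1

# OFFSETS[n] is the smallest value written with n data bytes, i.e. the
# count of values already covered by all shorter encodings.
OFFSETS = [0, 248]
for n in range(2, 9):
    OFFSETS.append(OFFSETS[-1] + 256 ** (n - 1))


def encode(value: int) -> bytes:
    if not 0 <= value <= U64_MAX:
        raise ValueError("value out of u64 range")
    if value < 248:
        return bytes([value])  # the byte is the number
    # Pick the longest tier whose starting offset does not exceed the value.
    n = max(k for k in range(1, 9) if OFFSETS[k] <= value)
    payload = value - OFFSETS[n]
    # Assumption: big-endian data bytes after the tag.
    return bytes([247 + n]) + payload.to_bytes(n, "big")


def decode(data: bytes) -> int:
    if not data:
        raise ValueError("empty input")
    first = data[0]
    n = 0 if first < 248 else first - 247  # number of data bytes
    if len(data) != 1 + n:
        raise ValueError("input length does not match the tag")
    if n == 0:
        return first
    value = OFFSETS[n] + int.from_bytes(data[1:], "big")
    if value > U64_MAX:  # range cap, only reachable with tag 0xFF
        raise ValueError("value exceeds the u64 range")
    return value
```

Because each tier starts where the previous one ends, no byte string in a longer tier can decode to a value owned by a shorter one: `decode(bytes([0xF8, 0x00]))` gives 248, and `encode(504)` gives the three bytes 0xF9 0x00 0x00.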
The one genuine edge case is at the very top. The nine-byte encoding (tag plus eight data bytes) has enough headroom to represent numbers larger than the maximum unsigned 64-bit integer, thanks to the offset arithmetic. Bijou64 targets u64, so the decoder applies a single bounds check when it encounters tag 255. Crucially, this isn’t a canonicality check — every in-range value still has exactly one valid encoding. It’s just a range cap, the kind of bounds check that belongs in any decoder regardless of the encoding scheme.
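The headroom is easy to check with Python’s arbitrary-precision integers. The sketch below assumes the cumulative offset rule described earlier (each tier starts after all values covered by shorter tiers); the names `NINE_BYTE_OFFSET`, `max_payload`, and `check_nine_byte_payload` are hypothetical.

```python
U64_MAX = 2**64 - 1

# Start of the nine-byte tier: every value covered by the shorter tiers
# (248 one-byte values, then 256**k values for k = 1..7 data bytes).
NINE_BYTE_OFFSET = 248 + sum(256**k for k in range(1, 8))

# Largest value the nine-byte layout could express if nothing capped it.
uncapped_max = NINE_BYTE_OFFSET + (256**8 - 1)

# Largest eight-byte payload a decoder should accept under tag 0xFF.
max_payload = U64_MAX - NINE_BYTE_OFFSET


def check_nine_byte_payload(payload: int) -> int:
    """Return the decoded value, or raise if it falls outside u64."""
    if payload > max_payload:
        raise ValueError("tag 0xFF payload exceeds the u64 range")
    return NINE_BYTE_OFFSET + payload


# The layout overshoots u64 by exactly the size of the offset.
assert uncapped_max - U64_MAX == NINE_BYTE_OFFSET
```

The payloads rejected by this check never correspond to a second spelling of an in-range number; they simply name values that a u64 cannot hold.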
Variable-Length Integer Encoding Performance: What the Numbers Actually Show
The Ink & Switch team are careful not to oversell the performance story. Bijou64 does more work than reading a plain 64-bit integer — there’s tag parsing, offset arithmetic, and the lookup table — so it’s naturally slower than fixed-width encoding for workloads where integers are always large. That’s expected and unavoidable.
Where it gets interesting is the comparison against LEB128 on realistic data. Because LEB128 works in 7-bit chunks and needs a continuation-bit scan, it pays a cost on both encoding and decoding that Bijou64 avoids. The first-byte length prefix means Bijou64 can skip directly to the right branch of its decode logic without scanning. On typical distributions of small integers — which is precisely the use case variable-length integer encoding is designed for — Bijou64 holds up well.
For security-focused binary protocols in particular, the performance trade-off inverts in Bijou64’s favour when you account for the cost that goes away: the canonicality validation that a responsible LEB128 implementation must perform on every decode. That check isn’t free. Bijou64 doesn’t need it at all.
What This Means for Protocol Designers
Bijou64 isn’t pitching itself as a universal replacement for LEB128. The Ink & Switch team are explicit that LEB128 is a sensible choice for many projects, and Bijou64 exists to solve a specific problem in a specific context — signed binary protocols where non-canonical encodings represent a real attack surface. Choosing the right variable-length integer encoding for your protocol is ultimately a question of what invariants you can afford to enforce structurally versus at runtime.
But the design philosophy is worth paying attention to more broadly. The security industry has spent years learning that runtime checks for structural invariants are fragile. They get stripped by optimisers. They get omitted in ports. They get skipped under time pressure. The more robust approach — when it’s achievable — is to make the invalid state physically unrepresentable in the format itself. Bijou64 is a clean example of what that looks like in practice for variable-length integer encoding.
As more infrastructure moves toward content-addressed storage, cryptographically signed data, and local-first applications where data syncs between untrusted peers, the pressure to get encoding canonicality right is only going to grow. The question protocol designers will increasingly face isn’t whether to enforce canonical forms — it’s whether to do it with a check that can be deleted, or with a structure that makes deletion irrelevant. A well-designed variable-length integer encoding should make the safe path the only path.




