- Apple’s PICO image codec cuts bitrate by 2.3–3× compared to AV1, AV2, VVC, ECM, and JPEG-AI at equivalent perceptual quality, as judged in large-scale user studies.
- The PICO image codec encodes a 12MP photo in 230ms and decodes it in 150ms on an iPhone 17 Pro Max.
- Unlike most learned codecs, PICO ships with cross-platform robustness guarantees — a rare and practically important distinction.
- Apple searched over millions of model configurations to jointly optimise perceptual quality and on-device runtime.
Apple’s PICO Image Codec Takes Aim at the Compression Status Quo
Apple has quietly published research that could reshape how images are compressed and stored on devices. The PICO image codec — short for Perceptual Image Codec — is a machine-learning-based compression system that the company’s researchers claim beats every major traditional and learned codec on perceptual quality metrics, while still being fast enough to run in real time on a phone. That’s a combination the field has been chasing for years.
The work comes from a team inside Apple Machine Learning, led by researchers including Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, and Oren Rippel, and is published as arXiv preprint 2605.05148. The paper’s full title, What Matters in Practical Learned Image Compression, signals that this isn’t just a benchmark-chasing exercise. Apple is explicitly asking a harder question: what does it take for a learned codec to actually ship?
What Makes the PICO Image Codec Different
Learned image codecs aren’t new. Researchers have been publishing neural compression systems for the better part of a decade, and several — including models from Google, Meta, and various academic groups — have demonstrated impressive rate-distortion performance on standard benchmarks. The problem is that almost none of them have made it into production. They’re too slow, too power-hungry, platform-specific, or optimised for metrics that don’t actually reflect what human eyes care about.
The PICO image codec takes a different approach from the ground up. Rather than optimising for PSNR or MS-SSIM — the mathematical distortion measures that dominate academic compression research — the team optimised directly for perceptual quality as measured by actual human observers. The results were validated through large-scale subjective user studies, the kind of methodology you’d typically associate with streaming video research at Netflix or YouTube, not image codec papers.
The headline numbers are striking. Apple’s team reports 2.3–3× bitrate savings against AV1, AV2, VVC, ECM, and JPEG-AI at equivalent perceptual quality. Against the best existing learned codec alternatives, the PICO image codec still delivers 20–40% bitrate savings. Those aren’t marginal gains. A 2.3–3× reduction means a photo needs roughly a third to 43% of the bytes a conventional codec would spend for the same perceived quality, which more than halves the bandwidth needed to deliver high-quality images over a mobile connection. Even the 20–40% saving over learned rivals is roughly the difference between storing two photos or three on the same amount of flash storage.
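The two kinds of savings figure quoted above ("N× smaller" and "N% saved") are easy to mix up. The short sketch below converts between them using only the numbers reported in the article; the function names are illustrative and do not come from the paper.

```python
def size_fraction(reduction_factor: float) -> float:
    """Fraction of the original bytes left after an N-fold bitrate reduction."""
    return 1.0 / reduction_factor


def reduction_factor(percent_saving: float) -> float:
    """N-fold reduction equivalent to a given percentage bitrate saving."""
    return 1.0 / (1.0 - percent_saving / 100.0)


if __name__ == "__main__":
    # Reported savings against AV1, AV2, VVC, ECM, and JPEG-AI
    for factor in (2.3, 3.0):
        print(f"{factor}x smaller -> {size_fraction(factor):.0%} of the original size")
    # Reported savings against other learned codecs
    for saving in (20, 40):
        print(f"{saving}% saving -> {reduction_factor(saving):.2f}x smaller")
```

Read this way, a 2.3–3× reduction leaves about 43% to 33% of the original bytes, while a 20–40% saving corresponds to files about 1.25× to 1.67× smaller.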
Speed That Actually Matters in the Real World
Raw compression efficiency is only half the story. The PICO image codec encodes a 12-megapixel image in approximately 230 milliseconds and decodes it in around 150 milliseconds on an iPhone 17 Pro Max. Those numbers deserve some context: most top-performing ML-based codecs can’t hit that speed even when running on an NVIDIA V100 GPU, a data centre card that costs thousands of dollars and can draw up to 300 watts. The PICO image codec is doing it on a phone, on battery.
That’s the kind of gap that makes a technology transition feel real rather than theoretical. If Apple can decode a compressed image in 150ms on the A-series silicon inside an iPhone, the door opens to using this codec in actual camera pipelines, photo libraries, and content delivery — not just in research demos.
The encoding figure of 230ms is also worth dwelling on. Camera capture workflows are latency-sensitive; users notice when the shutter-to-preview delay creeps above a few hundred milliseconds. Fitting a neural codec into that window, at 12MP resolution, while running other compute tasks in parallel, is a genuine engineering achievement. It suggests Apple has done serious work on model architecture, quantisation, and hardware-aware optimisation — not just trained a big network and reported benchmark numbers.
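To put the timing figures quoted above on a common scale, they can be converted into pixel throughput. This is a back-of-the-envelope sketch based only on the article's numbers (12 MP, 230 ms, 150 ms); the constant and function names are illustrative.

```python
MEGAPIXELS = 12.0   # image size quoted in the article
ENCODE_MS = 230.0   # reported encode time on an iPhone 17 Pro Max
DECODE_MS = 150.0   # reported decode time on the same device


def megapixels_per_second(megapixels: float, millis: float) -> float:
    """Throughput implied by processing `megapixels` in `millis` milliseconds."""
    return megapixels / (millis / 1000.0)


encode_rate = megapixels_per_second(MEGAPIXELS, ENCODE_MS)
decode_rate = megapixels_per_second(MEGAPIXELS, DECODE_MS)

print(f"encode: {encode_rate:.1f} MP/s")
print(f"decode: {decode_rate:.1f} MP/s")
```

That works out to roughly 52 MP/s for encoding and 80 MP/s for decoding.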
The Practicality Problem in Learned Compression
There’s a broader story here about why learned image compression has taken so long to move from research to deployment. The academic community has produced impressive work — CompressAI, the models from Ballé et al. at Google, and more recent transformer-based approaches have all pushed rate-distortion curves in interesting directions. But production deployment requires more than good numbers on Kodak or CLIC benchmark datasets.
Real-world codecs need to behave identically across hardware. A file encoded on an iPhone has to decode to the same image on a Mac, a Windows PC, or a Linux server. This cross-platform robustness is something traditional codecs like JPEG, HEIC, and AV1 deliver as a matter of course, because it’s baked into their specifications. Most learned codecs simply don’t guarantee it: floating-point arithmetic behaves differently across GPU architectures and even across driver versions, and when the network on the decoding side computes even slightly different probabilities from the ones used at encode time, the entropy decoder can lose sync and the image comes out corrupted.
Apple explicitly calls this out as a distinguishing feature of the PICO image codec. Cross-platform robustness guarantees are built in — which suggests the team has either constrained the model architecture to avoid non-deterministic operations, or implemented a bitstream verification layer that catches divergence. Either way, it’s the kind of detail that separates a research prototype from something that could land in iOS or macOS.
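The article does not describe how PICO achieves its guarantees, so the following is only a toy illustration of the underlying problem, not Apple's method. It shows that floating-point sums depend on the order of accumulation (which can differ between devices), while the same sums in integer fixed-point arithmetic do not. `SCALE` and `to_fixed` are hypothetical names chosen for this sketch.

```python
SCALE = 1 << 16  # hypothetical fixed-point scale: 16 fractional bits


def to_fixed(x: float) -> int:
    """Quantise a float to an integer fixed-point value."""
    return round(x * SCALE)


def sum_left_to_right(xs):
    """Accumulate from the first element to the last."""
    total = xs[0]
    for x in xs[1:]:
        total = total + x
    return total


def sum_right_to_left(xs):
    """Accumulate from the last element to the first."""
    total = xs[-1]
    for x in reversed(xs[:-1]):
        total = x + total
    return total


values = [0.1, 0.2, 0.3]

# Floating point: addition is not associative, so order changes the result.
float_a = sum_left_to_right(values)
float_b = sum_right_to_left(values)
print(float_a == float_b)  # False

# Integers: addition is exact, so every order gives the same answer.
fixed = [to_fixed(v) for v in values]
fixed_a = sum_left_to_right(fixed)
fixed_b = sum_right_to_left(fixed)
print(fixed_a == fixed_b)  # True
```

A difference in the last bit is harmless for most numerical work, but an entropy decoder needs exactly the probabilities the encoder used, which is why integer-only or otherwise order-independent arithmetic is one commonly discussed route to bit-exact learned codecs.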
How Apple Found the Right Model Configuration
One of the more interesting methodological claims in the paper is the scale of the architecture search. The team reports searching over millions of model configurations to jointly optimise perceptual quality and on-device runtime. That’s not a grid search over a handful of hyperparameters — it implies a systematic neural architecture search (NAS) process, likely running on Apple’s internal compute infrastructure, specifically targeting the performance characteristics of Apple Silicon.
This approach — co-designing the model and the hardware target — is something Apple has done successfully in other domains. The company’s custom chips, from the M-series to the Neural Engine in A-series processors, are designed with specific inference patterns in mind. Tailoring the PICO image codec’s architecture to exploit those units efficiently, rather than porting a general-purpose network and hoping for the best, is exactly the kind of vertical integration that gives Apple an advantage competitors running on commodity hardware can’t easily replicate.
The decision to search jointly over quality and runtime, rather than quality alone, is also telling. It reflects a pragmatic engineering philosophy: a codec that scores fractionally better on a perceptual metric but takes 2 seconds to decode isn’t useful. The Pareto frontier between quality and speed is where real products live, and Apple apparently mapped it exhaustively.
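For readers unfamiliar with the term, a Pareto frontier is the set of candidates that no other candidate beats on both axes at once. The sketch below extracts such a frontier from a handful of made-up configurations and then picks the best one within a latency budget. The candidate names, timings, and quality scores are invented for illustration and are not taken from the paper, whose actual search procedure is not described in this article.

```python
from typing import List, NamedTuple, Optional


class Config(NamedTuple):
    name: str
    decode_ms: float  # lower is better
    quality: float    # higher is better (hypothetical perceptual score)


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configs that no other config beats on both speed and quality."""
    front = []
    best_quality = float("-inf")
    # Visit fastest first; among equal runtimes, highest quality first.
    for c in sorted(configs, key=lambda c: (c.decode_ms, -c.quality)):
        if c.quality > best_quality:  # strictly better than anything faster
            front.append(c)
            best_quality = c.quality
    return front


def best_within_budget(front: List[Config], budget_ms: float) -> Optional[Config]:
    """Highest-quality frontier config that meets the latency budget, if any."""
    eligible = [c for c in front if c.decode_ms <= budget_ms]
    return max(eligible, key=lambda c: c.quality) if eligible else None


# Invented candidates, purely for illustration.
CANDIDATES = [
    Config("a", 90.0, 0.71),
    Config("b", 150.0, 0.84),
    Config("c", 160.0, 0.80),    # slower and worse than b: dominated
    Config("d", 400.0, 0.90),
    Config("e", 2000.0, 0.91),   # on the frontier, but far too slow to ship
    Config("f", 400.0, 0.88),    # same speed as d, lower quality: dominated
]

front = pareto_front(CANDIDATES)
print([c.name for c in front])
print(best_within_budget(front, budget_ms=200.0))
```

Note that the slow, slightly better candidate `e` is still on the frontier; it is the latency budget, not the frontier itself, that rules it out for an on-device product.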
What This Means for the Industry
If Apple ships the PICO image codec, or something derived from it, into a future version of iOS, the implications ripple outward quickly. Industry estimates put the number of photos taken worldwide each year at well over a trillion, the vast majority on smartphones, with iPhones accounting for a substantial share. Even a modest reduction in average image file size translates to meaningful savings in iCloud storage costs, cellular data consumption, and backup times for hundreds of millions of users.
For the broader compression ecosystem, Apple publishing this research publicly is a signal worth watching. The company rarely shares this kind of work unless it’s either already in a product or heading there soon. The fact that the paper includes specific benchmark numbers tied to a named consumer device, the iPhone 17 Pro Max, rather than to a generic GPU suggests the PICO image codec is at least deep in the development pipeline, if not already integrated into internal builds.
Other platform owners will be paying attention. Google has its own learned codec research through teams at DeepMind and Google Research. Meta has published work on neural image and video compression. Microsoft has invested in codec research through Azure Media. None of them have announced anything with this combination of perceptual quality benchmarks, on-device speed, and cross-platform deployment guarantees. Apple, characteristically, appears to have done the work quietly and then shown up with receipts.
The learned compression era has been perpetually five years away from production. The PICO image codec looks like the clearest evidence yet that the wait might finally be over.
Source: https://apple.github.io/ml-pico/

