Apple's PICO AI Codec Shrinks Images to One-Third the Size

For more than three decades, image compression technology has followed a familiar path: reduce file size while preserving mathematical measures of image fidelity. From JPEG to HEVC, AV1, and VVC, generations of codecs have focused on optimizing numerical metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM).

Yet a fundamental problem has persisted throughout this evolution: the human eye does not perceive images the way mathematical formulas do.

A photo can achieve an excellent PSNR score while appearing flat, blurry, or unnatural to viewers. Conversely, another image with lower objective metrics may look richer, sharper, and more realistic. This disconnect has long been one of the most difficult challenges in image compression research.
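The disconnect is easy to reproduce. The sketch below uses Python with NumPy, and the "image" is random test data rather than a photograph. It applies two distortions with exactly the same mean squared error. One is a uniform brightness shift, which is usually hard to notice. The other is per-pixel grain, which is plainly visible. PSNR cannot tell them apart.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB; higher means numerically closer."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

rng = np.random.default_rng(0)
original = rng.integers(20, 236, size=(64, 64)).astype(np.float64)

# Distortion A: brighten every pixel by 5 levels (a global shift).
shifted = original + 5.0

# Distortion B: add +5 or -5 levels with a random sign per pixel (visible grain).
signs = rng.choice([-1.0, 1.0], size=original.shape)
noisy = original + 5.0 * signs

print(f"shifted: {psnr(original, shifted):.2f} dB")
print(f"noisy:   {psnr(original, noisy):.2f} dB")
```

Both distortions have a mean squared error of exactly 25, so both score about 34.15 dB, even though they would look very different.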

Apple’s latest research project, PICO (Perceptual Image Codec), directly tackles this problem. Rather than optimizing for traditional metrics, PICO is designed around a different question:

What if image compression were optimized for human perception instead of pixel accuracy?

The results suggest a major shift may be underway in how future images are compressed, stored, and transmitted.

📸 The Industry’s Shift Toward AI-Based Compression

In February 2025, the Joint Photographic Experts Group (JPEG) officially released JPEG AI, the first international image compression standard built around end-to-end machine learning.

The announcement marked a significant milestone.

For decades, image codecs were largely handcrafted systems built from:

Transform coding
Quantization
Entropy coding
Carefully engineered heuristics

JPEG AI represented the industry’s acknowledgment that neural networks had matured enough to become part of a standardized compression framework.

However, even JPEG AI remains heavily influenced by traditional optimization goals and evaluation methodologies.

Apple’s researchers argue that the next frontier is not simply AI-powered compression—it is perceptual compression, where the ultimate judge is human vision itself.

🧠 Why Human Perception Is Harder Than Mathematical Accuracy

At its core, image compression is a process of selective forgetting.

Every compression algorithm must decide:

Which information to preserve
Which information to discard
How to make losses as invisible as possible

Traditional codecs minimize measurable pixel differences.

Human vision, however, operates differently.

People are especially sensitive to:

Text readability
Fine textures
Object boundaries
Facial features
Visual consistency

A mathematically accurate image can still feel “wrong” if these perceptual cues are degraded.

This mismatch explains why optimizing for PSNR often fails to maximize perceived image quality.

🚧 The Challenges Facing Learned Compression

Neural compression systems have been studied for years, but practical deployment has remained difficult.

Researchers have repeatedly encountered three major obstacles:

Performance Bottlenecks

Many high-quality neural codecs rely on computationally expensive entropy coding mechanisms.

Hallucinated Details

Perceptual models sometimes invent textures or structures that never existed in the original image.

Processing Artifacts

Tiled processing methods often introduce visible seams and inconsistencies.

Apple’s PICO project was designed specifically to address these challenges.

⚙️ Innovation #1: One-Shot Context Modeling

One of the most difficult aspects of image compression is entropy coding.

To compress efficiently, the codec must estimate the probability distribution of image data with high accuracy.
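Ideal code lengths show why the probability estimates matter. An entropy coder such as an arithmetic coder spends close to -log2(p) bits on a symbol that the model assigns probability p. The toy symbol stream and both models below are illustrative assumptions, not PICO's entropy model.

```python
import math
from collections import Counter

def ideal_bits(symbols, model):
    """Total ideal code length in bits: the sum of -log2 p(s) under the model."""
    return sum(-math.log2(model[s]) for s in symbols)

# A toy stream of quantized values where 0 dominates, as is typical of
# quantized latents or transform coefficients.
symbols = [0] * 70 + [1] * 15 + [-1] * 15

counts = Counter(symbols)
accurate = {s: c / len(symbols) for s, c in counts.items()}  # matches the data
uniform = {s: 1 / 3 for s in counts}                        # ignores the statistics

print(f"accurate model: {ideal_bits(symbols, accurate):.1f} bits")
print(f"uniform model:  {ideal_bits(symbols, uniform):.1f} bits")
```

The model that matches the data needs about 118 bits and the uniform one about 158. Any mismatch between the predicted and true distributions is paid for in extra bits.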

Traditional autoregressive approaches perform this estimation sequentially:

Analyze one region
Predict the next
Repeat continuously

While effective, this process is inherently slow.

Apple’s solution is a One-Shot Context Model.

Instead of repeatedly calculating contextual information, PICO predicts the most important entropy parameters in a single forward pass through the network.

This approach offers several benefits:

Near-autoregressive accuracy
Significantly faster execution
Better scalability on mobile hardware
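A toy example can show the difference in dependency structure. This is a generic illustration of autoregressive prediction versus single-pass (hyperprior-style) parameter prediction. It is not Apple's One-Shot Context Model, whose internals the article does not describe. The left-neighbour predictor and the block-average side information are hypothetical stand-ins for learned networks.

```python
import numpy as np

def autoregressive_means(latent: np.ndarray) -> tuple[np.ndarray, int]:
    """Predict each position's mean from its already-decoded left neighbour.

    The decoder only knows latent[i, j - 1] after decoding it, so positions
    must be visited one at a time: H * W dependent steps.
    """
    h, w = latent.shape
    means = np.zeros((h, w))
    steps = 0
    for i in range(h):
        for j in range(w):
            means[i, j] = latent[i, j - 1] if j > 0 else 0.0
            steps += 1
    return means, steps

def one_shot_means(side_info: np.ndarray, block: int) -> tuple[np.ndarray, int]:
    """Predict every mean at once from side information the decoder already has.

    Hypothetical side information: a coarse grid of block averages. Upsampling
    it is one vectorised operation, regardless of image size.
    """
    return np.kron(side_info, np.ones((block, block))), 1

rng = np.random.default_rng(0)
block = 4
coarse = rng.normal(0.0, 3.0, size=(4, 4))
latent = np.round(np.kron(coarse, np.ones((block, block))) + rng.normal(size=(16, 16)))
side_info = latent.reshape(4, block, 4, block).mean(axis=(1, 3))  # sent to the decoder

ar_means, ar_steps = autoregressive_means(latent)
os_means, os_steps = one_shot_means(side_info, block)
print(ar_steps, os_steps)
```

The autoregressive decoder needs 256 dependent steps for this 16 × 16 latent, and that count grows with image size. The one-shot predictor needs a single step, at the cost of transmitting the side information.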

According to Apple’s experiments, removing this component reduces overall performance by more than 10%, demonstrating its critical role in the system.

🔤 Innovation #2: Protecting Text Fidelity

One of the most common failures of perceptual image generation systems involves text.

Humans are exceptionally sensitive to textual distortions.

A tiny alteration in a single letter can immediately attract attention and make an image appear incorrect.

Generative models often struggle with this requirement because they prioritize visual realism rather than exact reconstruction.

To solve this issue, Apple introduced TextFidelityLoss.

How It Works

The system:

Detects text regions automatically.
Applies stricter reconstruction constraints.
Limits GAN-based hallucinations.
Preserves textual accuracy.
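The steps above can be sketched as a reweighting of the training loss. The function names, the text weight of 4, and the 0.1 scale on the adversarial term are hypothetical choices for illustration, not the published TextFidelityLoss. The text mask is assumed to come from a separate text detector.

```python
import numpy as np

def text_weighted_l1(original, reconstruction, text_mask, text_weight=4.0):
    """Hypothetical text-aware reconstruction loss.

    Pixels inside detected text (text_mask == 1) are penalised text_weight
    times more heavily than other pixels.
    """
    error = np.abs(original.astype(np.float64) - reconstruction.astype(np.float64))
    weights = 1.0 + (text_weight - 1.0) * text_mask
    return float(np.sum(weights * error) / np.sum(weights))

def adversarial_weight_map(text_mask, text_scale=0.1):
    """Down-weight a per-pixel GAN loss inside text regions to limit hallucinations."""
    return np.where(text_mask > 0, text_scale, 1.0)

original = np.zeros((8, 8))
mask = np.zeros((8, 8))
mask[:, :4] = 1.0  # pretend the left half contains text

err_in_text = original.copy()
err_in_text[:, :4] = 1.0        # same error magnitude, placed on the text
err_in_background = original.copy()
err_in_background[:, 4:] = 1.0  # same error magnitude, placed on the background

print(text_weighted_l1(original, err_in_text, mask))
print(text_weighted_l1(original, err_in_background, mask))
```

An identical error costs four times as much when it lands on text (0.8 versus 0.2 here). This pushes a model trained with such a term toward exact reconstruction where text appears.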

This targeted approach reduces reconstruction errors in text-heavy regions: in Apple’s experiments, text reconstruction errors fell by approximately 50%.

🧩 Innovation #3: Eliminating Tile Boundary Artifacts

Mobile devices must process images efficiently.

To achieve this, PICO divides images into tiles measuring approximately 504 × 504 pixels.
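Splitting an image into fixed-size tiles is straightforward. The sketch below assumes non-overlapping tiles, and it assumes edge-replication padding when an image is not a multiple of the tile size. The article does not say how PICO handles either detail. A typical 12MP, 4:3 photo of 4032 × 3024 pixels divides into exactly 8 × 6 = 48 tiles of 504 × 504.

```python
import math
import numpy as np

TILE = 504  # tile edge length reported for PICO

def split_into_tiles(image: np.ndarray, tile: int = TILE) -> list[np.ndarray]:
    """Pad an H x W (x C) image up to a multiple of `tile`, then cut it into tiles."""
    h, w = image.shape[:2]
    rows, cols = math.ceil(h / tile), math.ceil(w / tile)
    pad = [(0, rows * tile - h), (0, cols * tile - w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="edge")  # assumption: replicate border pixels
    return [
        padded[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
        for r in range(rows)
        for c in range(cols)
    ]

photo = np.zeros((3024, 4032, 3), dtype=np.uint8)  # 12MP, 4:3, RGB
tiles = split_into_tiles(photo)
print(len(tiles), tiles[0].shape)
```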

Tiling improves computational efficiency but introduces a new problem.

When tiles are reconstructed independently, visible seams may appear where adjacent regions meet.

These artifacts often manifest as:

Color inconsistencies
Brightness shifts
Boundary discontinuities

Apple addressed this challenge through a custom loss function called TilingArtifactLoss.

By enforcing consistency across multiple spatial frequencies, the system learns to maintain smooth transitions between neighboring tiles.
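One way to penalize seams at several spatial frequencies is to low-pass filter the images at a few scales. At each scale, the jump across every tile boundary in the reconstruction is compared with the jump the original has at the same place. This is a hypothetical sketch of the idea, not Apple’s TilingArtifactLoss. It handles grayscale images only and assumes the tile size is divisible by every downsampling factor (504 is divisible by 2, 4, and 8).

```python
import numpy as np

def box_downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks (a crude low-pass filter)."""
    h, w = img.shape[0] // factor * factor, img.shape[1] // factor * factor
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def seam_jumps(img: np.ndarray, tile: int) -> np.ndarray:
    """Pixel differences across every vertical and horizontal tile boundary."""
    cols = [img[:, k] - img[:, k - 1] for k in range(tile, img.shape[1], tile)]
    rows = [img[k, :] - img[k - 1, :] for k in range(tile, img.shape[0], tile)]
    return np.concatenate(cols + rows) if cols or rows else np.zeros(0)

def tiling_seam_loss(original, reconstruction, tile, factors=(1, 2, 4)):
    """Hypothetical multi-scale seam penalty (0 means no seam-specific error).

    Coarse scales expose low-frequency colour and brightness shifts between
    tiles; scale 1 exposes sharp discontinuities.
    """
    total = 0.0
    for f in factors:
        o, r = box_downsample(original, f), box_downsample(reconstruction, f)
        t = tile // f  # tile edge at this scale
        diff = seam_jumps(r, t) - seam_jumps(o, t)
        total += float(np.mean(np.abs(diff))) if diff.size else 0.0
    return total / len(factors)

rng = np.random.default_rng(0)
original = rng.random((32, 32))
seamless = original.copy()
seamed = original.copy()
seamed[:, 16:24] += 0.1  # brightness shift confined to one column of tiles

print(tiling_seam_loss(original, seamless, tile=8))
print(tiling_seam_loss(original, seamed, tile=8))
```

A brightness shift applied to the whole image leaves the seam jumps unchanged and is not penalized. The same shift confined to one column of tiles is penalized at every scale.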

The result is a significant reduction in visible stitching artifacts.

📊 Measuring What Humans Actually Prefer

One of the most interesting aspects of the PICO project is its evaluation methodology.

Rather than relying exclusively on traditional benchmarks, Apple conducted extensive human preference testing.

Large-Scale Human Evaluation

The study involved:

610 screened participants
Color vision verification
Compression artifact recognition testing
Blind pairwise comparisons

Participants viewed reconstructed images from different codecs without knowing which codec produced which result.

The evaluation generated:

74,925 image comparisons
Bayesian Elo-style rankings
Human-centered quality assessments
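An Elo-style rating can turn pairwise votes into a ranking. The sketch below uses the classic online Elo update, a simpler stand-in for the Bayesian model the study used. The codec names, votes, and K-factor of 16 are made up for illustration and are not data from the study.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(ratings: dict[str, float], winner: str, loser: str, k: float = 16.0) -> None:
    """Apply one classic Elo update after a single blind pairwise comparison."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain  # zero-sum: total rating is conserved

# Hypothetical codec labels and votes (winner, loser).
ratings = {"codec_a": 1500.0, "codec_b": 1500.0, "codec_c": 1500.0}
votes = [("codec_a", "codec_b"), ("codec_a", "codec_c"),
         ("codec_b", "codec_c"), ("codec_a", "codec_b")]
for winner, loser in votes:
    update(ratings, winner, loser)

print(sorted(ratings, key=ratings.get, reverse=True))
```

An upset against a higher-rated codec moves ratings more than an expected win does. A Bayesian variant additionally tracks the uncertainty of each rating.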

This approach directly measures what matters most: how images are perceived by real people.

🏆 Compression Results

The results are particularly striking.

For equivalent perceived image quality, PICO reportedly requires only:

30% to 43% of the bitrate used by leading competing codecs
Roughly one-third to just under one-half of the storage space required by competing standards

Compared against:

AV1
AV2
VVC
ECM
JPEG AI

PICO consistently achieved substantially lower file sizes while maintaining comparable visual quality.

Even when compared with state-of-the-art learned perceptual codecs such as HiFiC and MRIC, PICO reportedly achieved an additional 20% to 40% reduction in bitrate requirements.
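The bitrate figures translate directly into storage. The 1.5 MB baseline below is a hypothetical value for a 4032 × 3024 photo, not a number from the study. Only the 30% and 43% ratios come from the reported results.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of bits saved when a codec needs `ratio` of a baseline's bitrate."""
    return 1.0 - ratio

def bits_per_pixel(file_bytes: float, width: int, height: int) -> float:
    """Average number of bits spent on each pixel."""
    return file_bytes * 8 / (width * height)

baseline_bytes = 1.5e6  # hypothetical baseline file size
for ratio in (0.30, 0.43):
    pico_bytes = baseline_bytes * ratio
    print(f"{ratio:.0%} of baseline: {pico_bytes / 1e6:.2f} MB, "
          f"{bits_per_pixel(pico_bytes, 4032, 3024):.3f} bpp, "
          f"{savings_from_ratio(ratio):.0%} saved")
```

The same arithmetic applies to the learned perceptual codecs: an additional 20% to 40% reduction relative to HiFiC or MRIC means PICO needs 60% to 80% of their bitrate.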

📱 Real-Time Performance on Smartphones

Advanced neural compression systems often perform well in research environments but struggle in practical deployment.

Apple specifically focused on real-world usability.

On an iPhone 17 Pro Max, PICO reportedly achieves:

Encoding a 12MP image: approximately 230 milliseconds
Decoding: approximately 150 milliseconds

These numbers stand out because many competing machine-learning codecs need server-grade GPUs to reach similar speeds.

The results demonstrate that perceptual neural compression is becoming practical for consumer devices.
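The reported timings can be restated as throughput. This assumes the 12MP test image is 4032 × 3024 and that decoding was timed on the same image. The per-tile figure also assumes the 48 tiles are processed one after another, which the article does not state.

```python
PIXELS_12MP = 4032 * 3024  # assumption: a typical 12MP, 4:3 photo

def megapixels_per_second(pixels: int, milliseconds: float) -> float:
    """Convert a per-image latency into throughput."""
    return pixels / 1e6 / (milliseconds / 1000.0)

encode_ms, decode_ms = 230.0, 150.0  # reported iPhone 17 Pro Max timings
print(f"encode: {megapixels_per_second(PIXELS_12MP, encode_ms):.0f} MP/s")
print(f"decode: {megapixels_per_second(PIXELS_12MP, decode_ms):.0f} MP/s")

# Only meaningful if the 48 tiles of 504 x 504 run sequentially (an assumption).
print(f"{encode_ms / 48:.1f} ms per tile to encode, {decode_ms / 48:.1f} ms to decode")
```

Under these assumptions, encoding runs at roughly 53 megapixels per second and decoding at roughly 81.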

📉 The Surprising Trade-Off: Lower PSNR

Perhaps the most revealing result is that PICO does not excel at traditional benchmarks.

In terms of PSNR, several competing codecs outperform it.

At first glance, this appears contradictory.

However, it actually reinforces the paper’s central thesis:

Optimizing for mathematical accuracy and optimizing for human perception are fundamentally different objectives.

PICO deliberately sacrifices some pixel-level fidelity to preserve visual characteristics that humans care about more.

The research suggests that future compression systems may increasingly prioritize perceptual quality over conventional numerical metrics.

🎨 Not Perfect for Every Image Type

Apple’s researchers openly acknowledge the limitations of the system.

PICO performs less effectively on highly structured synthetic content, including:

Cartoons
Diagrams
Technical illustrations
Computer-generated graphics

These images often benefit from traditional rule-based compression methods because their structure is highly predictable.

Perceptual generation techniques are most effective when handling natural photographs and visually complex scenes.
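Even a general-purpose lossless compressor can exploit the predictability of synthetic content. In the sketch below, zlib's DEFLATE (the same algorithm PNG uses) stands in for rule-based coding. Both images are synthetic, and the grainy texture is only a rough stand-in for natural photographic detail.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# A "diagram": flat background, a filled rectangle, and a grid of one-pixel lines.
diagram = np.full((256, 256), 255, dtype=np.uint8)
diagram[64:192, 64:192] = 40
diagram[::32, :] = 0
diagram[:, ::32] = 0

# A rough stand-in for natural texture: smooth shading plus random grain.
yy, xx = np.mgrid[0:256, 0:256]
texture = np.clip(128 + 60 * np.sin(xx / 17.0) * np.cos(yy / 23.0)
                  + rng.normal(0, 12, size=(256, 256)), 0, 255).astype(np.uint8)

for name, img in (("diagram", diagram), ("texture", texture)):
    raw = img.tobytes()
    packed = zlib.compress(raw, 9)
    print(f"{name}: {len(raw)} -> {len(packed)} bytes")
```

The diagram shrinks to a small fraction of its raw size, while the textured image compresses far less. Repetitive, rule-like structure is exactly what hand-designed coding tools exploit well.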

🌊 From WaveOne to Apple

The corresponding author of the paper, Oren Rippel, is a familiar name within the compression research community.

His work first gained significant attention through WaveOne, a startup focused on neural image and video compression.

WaveOne’s early research demonstrated that machine learning could outperform traditional codecs while maintaining practical performance.

Subsequent projects included:

Real-Time Adaptive Image Compression
ELF-VC neural video compression
High-efficiency learned codecs

Apple later acquired WaveOne’s core team, bringing years of compression expertise into its machine learning organization.

PICO represents one of the first major public outcomes of that effort.

🔮 A Glimpse Into the Future of Image Compression

The significance of PICO extends beyond a single codec.

For decades, image compression research has largely focused on making benchmark numbers look better. Improvements were measured through PSNR curves, bitrate reductions, and increasingly sophisticated engineering techniques.

PICO represents a shift in philosophy.

Instead of asking:

“How closely does the reconstructed image match the original pixels?”

it asks:

“How closely does the reconstructed image match what humans perceive as visually authentic?”

That distinction may ultimately prove more important than any compression ratio.

As smartphones continue capturing larger images and cloud services handle ever-growing volumes of visual data, perceptual compression could become one of the most important enabling technologies of the AI era.

Most users may never know whether a future photo was compressed using JPEG, JPEG AI, or PICO. Yet behind every shared image, uploaded photo, and cloud backup, increasingly sophisticated AI systems may be making human-centered decisions about which details deserve to be remembered—and which can quietly disappear without anyone noticing.