
One Bit Quantization Technology

Expanding the Potential of LLMs with 1-Bit Quantization: The Cutting Edge of Speed and Memory Efficiency

Hello, this is Sakai from the Artificial Intelligence Laboratory at Fujitsu Research. In this blog, I'll introduce 1-bit quantization in an easy-to-understand way. The background of this technology lies in the growing size of generative AI models and the accompanying strain on computational resources. Our AI research team has developed a groundbreaking solution, 1-bit quantization, and has released it as open-source software (OSS). This article explains the background and the technology in simple terms.


Why Is Quantization Important?

Generative AI models, especially large language models (LLMs), now have hundreds of billions to trillions of parameters. At this scale, the memory and computational cost for inference and training become enormous. This is where quantization comes in.

Quantization is a technique that reduces memory usage and computational load by converting model weights and operations into lower precision. Typically, 8-bit or 4-bit quantization is used, but today’s topic is an astonishing 1-bit. You might think, “1-bit? That’s almost no information!”—but here’s where the cleverness lies.
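To make the idea concrete, here is a minimal sketch of ordinary quantization: weights are mapped onto a small integer grid and scaled back at use time. This is a generic symmetric int8 scheme written purely for illustration; the variable names and toy matrix are my own, not from Fujitsu's software:

```python
import numpy as np

# Toy symmetric 8-bit quantization of a weight matrix (illustration only;
# production quantizers are considerably more sophisticated).
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)    # original fp32 weights

scale = np.abs(w).max() / 127.0                   # map the value range onto int8
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale              # dequantized approximation

print("max abs error:", float(np.abs(w - w_hat).max()))
```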


What Is 1-Bit Quantization?

In 1-bit quantization, only the sign of the weight (positive or negative) is retained. In other words, weights are rounded to either +1 or -1. This dramatically reduces memory usage. For example, converting weights from 16-bit floating point to 1-bit theoretically reduces size to 1/16. This is impactful enough to make it possible to run massive LLMs on a laptop instead of a large server!
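As a rough illustration of the 1-bit case, the sketch below keeps only the sign of each weight plus one floating-point scale per row. This is a classic scaled-sign scheme written for intuition; Fujitsu's actual storage format may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float16)    # fp16 weights: 16 bits each

sign = np.where(w >= 0, 1.0, -1.0)                # keep only the sign: 1 bit each
alpha = np.abs(w).mean(axis=1, keepdims=True)     # one fp16 scale per output row
w_hat = (alpha * sign).astype(np.float16)         # reconstructed weights

# 16 bits -> 1 bit per weight is the ~1/16 memory reduction mentioned above;
# the per-row scales add negligible overhead for large layers.
print("reconstruction MSE:", float(((w - w_hat) ** 2).mean()))
```

Because every weight is just +1 or -1 times a shared scale, the expensive multiplications in a matrix multiply reduce to additions and subtractions, which is where the inference speedup comes from.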

Benefits

  • Memory Reduction: Enables running ultra-large models on smaller devices.
  • Speed Improvement: Simplified operations lead to faster inference.

Challenges

  • Accuracy Degradation: Extreme reduction in information can degrade model performance.
  • Training Difficulty: Learning with 1-bit weights is highly challenging.

Fujitsu’s Approach

Despite its clear potential, 1-bit quantization has long been considered a difficult goal due to these challenges. Fujitsu overcame these hurdles by developing new algorithms and announced this breakthrough in a press release on September 8, 2025. The key lies in two proprietary algorithms: QEP and QQA.

  • QEP: A novel quantization algorithm, based on theoretical insights, that propagates quantization error across layers to prevent error amplification (accepted at NeurIPS 2025); a rough conceptual sketch follows this list.
  • QQA: Quasi-Quantum Annealing, inspired by quantum mechanics’ interplay between continuous and discrete states, leveraging Fujitsu’s world-class optimization technology (accepted at ICLR 2025).
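QEP and QQA themselves are described in the papers and the press release linked below. Purely as intuition for QEP's "propagate error across layers" idea, here is a minimal conceptual sketch: each layer is calibrated against the full-precision output while receiving the already-quantized upstream activations, so upstream rounding error is compensated rather than amplified. All function names and the least-squares correction are my own assumptions, not Fujitsu's algorithm:

```python
import numpy as np

def quantize_sign(w):
    # Scaled-sign (1-bit) quantization, as in the earlier sketch.
    alpha = np.abs(w).mean(axis=1, keepdims=True)
    return alpha * np.where(w >= 0, 1.0, -1.0)

def quantize_network(weights, x_calib):
    """Quantize layers one by one, feeding each layer the activations of the
    already-quantized upstream layers so calibration can compensate for their
    error instead of letting it compound."""
    x_fp, x_q, out = x_calib, x_calib, []
    for w in weights:                              # w: (out_dim, in_dim)
        target = x_fp @ w.T                        # full-precision layer output
        w_q = quantize_sign(w)
        y_q = x_q @ w_q.T                          # output on the quantized path
        # Hypothetical per-unit least-squares gain matching y_q to the
        # full-precision target on the calibration batch.
        gain = (y_q * target).sum(0) / ((y_q ** 2).sum(0) + 1e-8)
        w_q = w_q * gain[:, None]
        out.append(w_q)
        x_fp = np.maximum(target, 0)               # ReLU, full-precision path
        x_q = np.maximum(x_q @ w_q.T, 0)           # ReLU, quantized path
    return out

rng = np.random.default_rng(0)
layers = [rng.normal(size=(8, 16)), rng.normal(size=(4, 8))]
quantized = quantize_network(layers, rng.normal(size=(32, 16)))
```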

Results

The table below presents results showing how the proposed method, QEP, outperforms conventional layer-wise PTQ and existing correction techniques in low-bit quantization. We evaluated multiple large language models at extremely low bit widths: 2-bit, 3-bit, and 4-bit. With traditional methods, accuracy degraded significantly as the bit width decreased, and even existing correction techniques offered only limited improvement. In contrast, QEP, with its mechanism for compensating errors across layers, achieved the best results for all models and bit widths, delivering a substantial accuracy boost at 2-bit compared to conventional approaches.

(Figure: QEP improvement)

Furthermore, by leveraging QQA, we achieved world-leading performance in 1-bit quantization.

(Figure: 1-bit quantization accuracy)


The Future Enabled by 1-Bit Quantization

1-bit quantization is not just a “compression technique”—it has the potential to transform next-generation AI infrastructure.
We see breakthroughs in areas such as:

  • Edge AI Applications: Running massive models on smartphones and IoT devices.
  • Democratization of Large Models: Powerful models accessible even in resource-constrained environments.
  • Hardware Synergy: The prospect of ultra-fast processing on dedicated chips.

Summary

  • 1-bit quantization pushes memory reduction and speed optimization to the extreme.
  • Fujitsu’s new approach overcomes the biggest challenge: maintaining accuracy.
  • Rapid progress is expected in edge AI and widespread adoption of large-scale models.

In the era of giant language models, low-precision techniques like this are becoming increasingly important.
The next AI revolution might just start with “1-bit.”
And yes, this technology is available as OSS—so go ahead and start quantizing!


References

  • Press Release (September 8, 2025)
  • Fujitsu Research Portal