Optimizing AI performance on Synaptics’ Astra™ platform with extreme low-bit quantization
As AI continues to move from the cloud into everyday devices, the ability to run models efficiently on the Edge is becoming increasingly important. Whether it’s voice interfaces or real-time data processing, Edge AI promises a wide range of capabilities. Delivering those capabilities within the constraints of embedded systems, however, remains a challenge.
ENERZAi has partnered with Synaptics to address these challenges. Known for its advanced Edge processing platforms, Synaptics provides a foundation for deploying optimized AI models. Together, we’re focused on making high-performance AI practical for real-world Edge applications.
Making AI Inference Lighter and More Efficient
ENERZAi is focused on improving inference performance through model compression and optimization. Our software engine, Optimium, is designed to run trained models on devices with limited compute, memory, and power. A key part of this approach is extreme low-bit quantization. While many AI systems use 8-bit or 4-bit quantization to reduce model size, our method compresses weights to just 1.58 bits, the information content of a three-valued (ternary) weight, since log2(3) ≈ 1.58. This allows for significantly smaller models and faster inference.
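ENERZAi’s exact quantization scheme is proprietary, but the idea behind 1.58-bit (ternary) weights can be illustrated with a common absmean-style ternarization, as popularized by BitNet-style models. The function names and per-tensor scaling below are illustrative assumptions, not Optimium’s actual implementation:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Map a float weight tensor to codes in {-1, 0, +1} plus one scale.

    Each weight carries log2(3) ~ 1.58 bits of information, hence
    "1.58-bit" quantization. This absmean scheme is a hypothetical
    stand-in for ENERZAi's proprietary method.
    """
    scale = np.mean(np.abs(w)) + 1e-8        # per-tensor scale (assumption)
    q = np.clip(np.round(w / scale), -1, 1)  # ternary codes
    return q.astype(np.int8), float(scale)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from codes and scale."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = ternary_quantize(w)
assert set(np.unique(q)).issubset({-1, 0, 1})
```

In practice the ternary codes are bit-packed (e.g., several weights per byte) and matrix multiplies reduce to additions and subtractions, which is where the memory and latency savings come from.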
Deploying Whisper on the Synaptics Astra SL1680 Platform
In our work with Synaptics, we applied 1.58-bit quantization to OpenAI’s Whisper small model and deployed it on the Astra SL1680 processor. With its quad-core 2.1 GHz Arm® Cortex®-A73 CPU, Astra provides the right balance of compute and efficiency for Edge AI applications.
The results highlighted how optimized inference and advanced quantization can work together:
- A Word Error Rate (WER) of 6.38 percent for the quantized model, versus 5.99 percent for the FP16 baseline
- A 4x reduction in peak memory usage compared to FP16
- A 2x reduction in inference latency for a 9-second audio input compared to the full-precision version
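WER, the accuracy metric cited above, is the standard word-level edit distance between the model’s transcript and a reference transcript. A minimal reference implementation (not the evaluation harness used in this work) looks like:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[-1][-1] / max(len(ref), 1)

# One inserted word against a 3-word reference -> WER of 1/3
print(word_error_rate("the cat sat", "the cat sat down"))
```

A 6.38 percent WER means roughly 6 word errors per 100 reference words, so the quantized model gives up only about 0.4 points of accuracy against the FP16 baseline.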
These gains are significant for real-world Edge applications, enhancing system stability and user experience, especially in environments where multiple AI workloads need to run in parallel.
Partnering to Advance AI at the Edge
The Synaptics and ENERZAi partnership advances Edge AI by combining ENERZAi’s compression technology and Optimium inference engine with the versatile CPU, GPU, and NPU subsystems of the Astra SL1680, making Edge AI more responsive, efficient, and deployable across a range of applications.
For more details, read the full solution brief, Running Extreme Low-Bit Models on IoT Edge Devices: Running_Extreme_Low-Bit_Models_on_IoT_Edge_Devices_4.pdf