Meta Encodec uses AI to compress audio files significantly more than MP3

[11:20 Mon,21.November 2022 by Rudi Schmidts]

This sounds exciting: Meta / Facebook Research has presented an AI-based audio codec called Encodec, which (at least on paper) is really something. Compared to MP3, the codec is said to compress about ten times more strongly at comparable quality, especially at very low data rates.

If this holds up in practice, audio that is stored as MP3 today could take up roughly a tenth of the space at comparable quality, at least at low bit rates. This in turn would have an enormous impact on offline storage and audio streaming.
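To put the factor of 10 into numbers, here is a small back-of-the-envelope calculation. It is only a sketch: the four-minute track length is a made-up example, a constant bit rate is assumed, and the factor of 10 is simply Meta's claim taken at face value.

```python
def size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Size of a constant-bitrate audio stream in bytes (1 kb = 1000 bits)."""
    return bitrate_kbps * 1000 * seconds / 8


# Hypothetical example values, not taken from Meta's paper.
track_seconds = 4 * 60          # a four-minute track
mp3_kbps = 64                   # low-bit-rate MP3 used as the baseline
claimed_factor = 10             # compression gain claimed for Encodec

mp3_size = size_bytes(mp3_kbps, track_seconds)
encodec_size = size_bytes(mp3_kbps / claimed_factor, track_seconds)

print(f"MP3 at {mp3_kbps} kb/s: {mp3_size / 1e6:.2f} MB")
print(f"Encodec at {mp3_kbps / claimed_factor} kb/s: {encodec_size / 1e6:.3f} MB")
```

Under these assumptions, the track shrinks from just under 2 MB to just under 0.2 MB.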

The structure of Encodec is strongly reminiscent of typical VAEs/GANs. The compression model generates reconstructed samples, which a downstream discriminator classifies as real or reconstructed. The compression model is then adjusted until the discriminator considers all samples to be real. At the same time, the discriminator learns to distinguish "real" from "reconstructed" more and more reliably. During training, this interplay steadily improves the audio quality the model can achieve from a minimal amount of data.

The structure of Encodec (image source: Meta)
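The tug-of-war between compression model and discriminator can be illustrated with two loss functions. The following is a minimal sketch using hinge losses, one common choice in GAN training. Whether Encodec uses exactly this form is an assumption here, and the function names and score values are purely illustrative.

```python
def discriminator_hinge_loss(real_scores, fake_scores):
    """Small when real audio scores >= 1 and reconstructions score <= -1."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_hinge_loss(fake_scores):
    """Small when the discriminator rates reconstructions as real (>= 1)."""
    return sum(max(0.0, 1.0 - s) for s in fake_scores) / len(fake_scores)


# Hypothetical discriminator outputs: positive means "looks real".
real_scores = [1.5, 0.5]    # scores for original audio
fake_scores = [-2.0, 0.0]   # scores for reconstructed audio

d_loss = discriminator_hinge_loss(real_scores, fake_scores)
g_loss = generator_hinge_loss(fake_scores)
print(d_loss, g_loss)
```

In training, the discriminator would be updated to lower the first loss and the compression model to lower the second, so each side keeps pushing the other to improve.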

According to Meta, Encodec can thus match the quality of a 64 kb/s MP3 at roughly a tenth of that bit rate and still has potential for further improvement. For example, the researchers also trained a transformer-based language model that could save up to another 40 per cent of bandwidth at the same quality when latency is not as crucial as it is in streaming. In other words, if the encoder does not have to work in real time, even greater compression gains are possible.

Interestingly, no special hardware is needed to use it either. A single CPU core is said to be sufficient for encoding and decoding with the new method in real time.

And that is not all. Meta has announced that it also intends to use AI to compress video more effectively in an upcoming research project.
