MQA explained: Everything you need to know about high-res audio | Ars Technica

MQA explained: Everything you need to know about high-res audio

How has MQA stuffed 24-bit/96kHz music into CD audio file sizes? And does it even matter?

Keeping file sizes manageable

Despite faster pipes and cheap storage, squeezing down streamed and downloaded digital audio is still the norm today, because it reduces network bandwidth and storage size. Lossy compression codecs such as MP3, OGG, WMA, and AAC satisfy a need for tiny files, typically between one-tenth and one-fifth the size of CD audio, but sound quality is irreversibly degraded in the destructive encoding process.

Despite faster internet and cheap storage, file size still matters.
Manfred Rutz / Contributor

Meanwhile, lossless reduction is available with FLAC and Apple Lossless (ALAC), which use prediction and a redundancy-removal technique loosely similar to zip files. This approach typically halves the uncompressed size and bitrate while preserving the original quality exactly. Using FLAC compression, genuine CD quality is possible with files taking closer to 5MB than 10MB per minute; or, put another way, a streaming bitrate of around 700kbps.

Turning to MQA, the transmission of high-resolution audio is made possible at a bitrate of around 1.5Mbps when coupled with additional FLAC compression. That’s around half the bitrate of 24/96 FLAC, or one-third of 24/192 FLAC (circa 3Mbps and 5Mbps respectively).
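The figures above can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming two-channel (stereo) audio; the helper names `pcm_bitrate_kbps` and `mb_per_minute` are invented for this illustration. It computes only the raw, uncompressed PCM rates, so the FLAC and MQA numbers quoted in the text remain the article's own estimates.

```python
def pcm_bitrate_kbps(bit_depth, sample_rate_hz, channels=2):
    """Raw (uncompressed) linear PCM bitrate in kilobits per second."""
    return bit_depth * sample_rate_hz * channels / 1000


def mb_per_minute(bitrate_kbps):
    """Size of one minute of audio in megabytes (10**6 bytes)."""
    return bitrate_kbps * 1000 * 60 / 8 / 1_000_000


# Assumption: stereo throughout.
cd = pcm_bitrate_kbps(16, 44_100)       # CD audio
hi96 = pcm_bitrate_kbps(24, 96_000)     # 24/96 high-resolution audio
hi192 = pcm_bitrate_kbps(24, 192_000)   # 24/192 high-resolution audio

for name, rate in (("CD", cd), ("24/96", hi96), ("24/192", hi192)):
    print(f"{name}: {rate:.1f} kbps raw, {mb_per_minute(rate):.1f} MB per minute")
```

Raw CD audio comes out at a little over 1.4Mbps, or roughly 10.6MB per minute, so halving it with lossless compression lands near the 700kbps and 5MB-per-minute figures quoted above.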

How MQA works in theory

MQA claims to have taken a fresh approach to encapsulating and distributing audio. It's a hierarchical system, scalable across a range of quality standards, with more than one technique applied to music files depending on the original recording and its format. It’s not lossy in the usual sense, and doesn't rely on the same tricks of data reduction as familiar lossy codecs such as MPEG. And unlike the paradigm-shifting DSD format—which effectively tore up the digital rulebook—it's still based around PCM coding. How does MQA achieve such high compression without loss?

Several different strategies are employed to fold down a high-resolution recording into a stream recognised by legacy equipment as either 24/44.1 or 24/48 linear PCM (depending on whether the source file was based on a multiple of the former or the latter sampling rate). Taken as a whole, the process has been dubbed "audio origami."

Bob Stuart describes the process of "Music Origami" used by MQA to fold the high-resolution detail of a recording into a CD-quality-sized audio file.

As an example, let's look at one MQA method to fold a 24/192 digital file into 24/48 space. The original recording is first down-sampled to 24/96, using an IIR filter carefully chosen to have a more compact impulse response, in particular avoiding pre-echoes, albeit now with a slow roll-off.

Some alias artefacts are introduced in the process due to this filter's less-than-brickwall performance. These are folded into the 0-48kHz audio band, but at a level that MQA's designers assert is audibly inconsequential. One safeguard is mandated though: there must be at least 32dB of effective filtering of aliases that would otherwise appear in the crucial band from 0 to 7kHz, where the ear is more sensitive, especially to such non-harmonic distortion.
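To see which ultrasonic content ends up in that sensitive band, here is a minimal sketch of standard alias folding, assuming an ideal resampling from 192kHz to 96kHz. It is textbook sampling theory rather than MQA's own code, and the function names are invented for this illustration.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a tone at f_hz appears once sampled at fs_hz."""
    f = f_hz % fs_hz
    # Anything above the Nyquist frequency (fs/2) mirrors back below it.
    return fs_hz - f if f > fs_hz / 2 else f


def db_to_amplitude_ratio(db):
    """Convert a level change in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


fs_new = 96_000  # sampling rate after the first down-sampling stage
for f in (50_000, 70_000, 90_000, 95_000):
    print(f"{f} Hz folds to {alias_frequency(f, fs_new)} Hz")

# 32 dB of filtering leaves about 2.5 percent of the original amplitude.
print(f"-32 dB is an amplitude ratio of {db_to_amplitude_ratio(-32):.4f}")
```

Under these assumptions, only content between 89kHz and 96kHz in the original recording would fold down into the 0 to 7kHz band, which is where the 32dB requirement applies.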

Optionally, another filter also notches out ultrasonics at the new Nyquist frequency (48kHz), to further minimise their intrusion after aliasing. This has the added benefit of reducing ringing from the filter, at the point where it is most liable to be excited and thereby induce ripples.

The next stage is perhaps the cleverest: a folding technique that maintains playback compatibility with existing PCM hardware. The new 24/96 file is first requantised from 24-bit to typically 16-bit, with dither and noise shaping applied to preserve something similar to 20-bit resolution.
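The bit-depth reduction with dither can be illustrated with a generic textbook approach. The sketch below uses plain triangular (TPDF) dither and omits noise shaping entirely; it is not MQA's encoder, and the function name `requantize_24_to_16` is invented for this example.

```python
import random


def requantize_24_to_16(samples_24, rng=None):
    """Reduce signed 24-bit integer samples to signed 16-bit using TPDF dither."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    step = 1 << 8  # one 16-bit LSB, expressed in 24-bit units
    out = []
    for s in samples_24:
        # Triangular dither: the sum of two uniform values, spanning +/- 1 LSB.
        dither = rng.uniform(-step / 2, step / 2) + rng.uniform(-step / 2, step / 2)
        q = round((s + dither) / step)
        out.append(max(-32768, min(32767, q)))  # clamp to the 16-bit range
    return out


# A constant signal a quarter of a 16-bit LSB high would vanish if simply
# rounded; with dither it survives as the average of the output.
quiet = [64] * 20_000
result = requantize_24_to_16(quiet)
print(sum(result) / len(result))
```

The printed average sits close to 0.25, showing how dither lets detail below the 16-bit step size survive as a statistical average. Noise shaping, not shown here, additionally moves the added noise towards frequencies where the ear is less sensitive.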

The 16/96 stream in this example then passes through a frequency band splitter, creating two separate audio streams of 0-24kHz and 24-48kHz. The 24-48kHz band undergoes lossy compression with touch-up*, which allows it to fit into the now-vacant 8-bit space below the 16-bit level (bits 17 to 24). The resulting file now presents itself as a 24/48 PCM file. A legacy DAC sees the top 16 bits and decodes these as normal, while what now inhabits the remaining space below the 16-bit noise floor is decoded as just that: noise, but at a very low level.

[*One MQA patent cites MPEG-SLS as a technique using feedback touch-up, to upgrade to lossless operation from a lossy codec.]
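The idea of hiding extra data beneath the 16-bit level can be sketched as simple bit packing. This is an illustration of the principle only: the actual MQA stream layout is not described here, and `pack_word` and `unpack_word` are hypothetical helpers.

```python
def pack_word(audio16, payload8):
    """Combine a signed 16-bit sample and 8 payload bits into a signed 24-bit word."""
    if not -32768 <= audio16 <= 32767:
        raise ValueError("audio16 must fit in a signed 16-bit integer")
    if not 0 <= payload8 <= 0xFF:
        raise ValueError("payload8 must fit in 8 bits")
    # The audio occupies the top 16 bits; the payload sits in the bottom 8.
    return audio16 * 256 + payload8


def unpack_word(word24):
    """Split a 24-bit word back into (audio16, payload8)."""
    return word24 >> 8, word24 & 0xFF


word = pack_word(-1234, 0xA7)
print(unpack_word(word))

# A legacy DAC plays the whole word as one 24-bit sample. The payload shifts
# the value by less than one 16-bit step, so it is heard as low-level noise.
print(word / 256 - (-1234))
```

A decoder that knows the layout recovers both parts exactly, while ordinary equipment simply plays the full word.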

An MQA decoder like the Explorer 2 DAC from Meridian is required to hear the format at its best.

In some implementations of the technology, the listener without an MQA decoder will be left with as few as the top 13 bits from which to create a CD-like rendition. The MQA inventors rely on the power of dithering to preserve sound quality that approaches that of a 16-bit channel.
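For a rough sense of what those bit counts mean, the sketch below applies the standard textbook estimate of signal-to-noise ratio for a full-scale sine wave in an ideal N-bit quantiser (about 6dB per bit). It ignores dither and noise shaping, so it is only a baseline rather than a measurement of MQA; the function name is invented for this illustration.

```python
import math


def ideal_snr_db(bits):
    """Theoretical SNR of a full-scale sine in an ideal N-bit quantiser."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


for bits in (13, 16, 20):
    print(f"{bits}-bit: about {ideal_snr_db(bits):.1f} dB")
```

On this baseline, a 13-bit channel has roughly 18dB less signal-to-noise ratio than a 16-bit one, which is the gap that dither and noise shaping are being relied upon to narrow perceptually.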

To complete the MQA encoding, a reversible lossless digital watermark is embedded all the way through the resulting file. Crucially, this includes instructions telling the decoder which of several decoding methodologies should be used to unfold and play the file. MQA Ltd requires licensees of its technology to use an HSM (Hyper-Security Module) that issues encrypted signatures contained within each file.

This watermark also includes unique ISRC tags describing the song, composer, copyright, etc. When the watermark has been successfully recognised and decrypted, a little green or blue LED will light on the user's MQA decoder to signify authenticated playback.

Key technologies in the MQA system are described in detail in several patents awarded to Bob Stuart, mathematician Peter Craven, and other collaborators. Doubly compatible lossless audio bandwidth extension discusses the legacy compatibility achieved through noise-floor hiding. Digital encapsulation of audio signals describes convolving a non-minimum phase IIR filter for the downsampling process, followed by a minimum-phase IIR filter when later up-sampling, in order to "deblurr" the smear of digital filters. Versatile music distribution explains a DRM scheme to distribute dual-purpose music files, such that licensed key holders gain access to the original hi-res quality, while unlicensed users only hear a lo-res version.

The latter patent highlights various ideas for encrypting song keys with user and device keys, enabling online servers to provide downloads and streams only to authorised customers. It explains how to deliberately degrade PCM audio by adding noise, for example, although MQA in its final release does not do this. Instead, it allows downsampling and requantisation to create a universal playback version that’s audibly inferior to the original hi-res audio. A related patent, Lossless buried data, outlines how to hide data in music in a reversible process that can only be read by presenting a cryptographic key, such as an encrypted certificate.
