Prof. M.O.J. Hawksford
University of Essex, Centre for Audio Research and Engineering
The Acoustic Renaissance for Audio (ARA) have proposed a multi-channel, high-resolution audio encoding format for use with the next generation of compact disc, with further introductory discussion openly published [2, 3, 4]. The ARA proposal document has already been widely circulated to the audio industry, and the following text assumes familiarity with it.
This report is prepared in response to a proposal to import bitstream code directly onto high-density optical discs. Although bitstream offers certain philosophical and economic merits, we believe that there are fundamental flaws and significant system limitations in using bitstream technology for audio data storage. Specifically, bitstream fails to address the future technical aspirations of the audio industry, where advanced digital processing will be used to improve accuracy in electrical-to-pressure transduction and in three-dimensional sound reproduction. We therefore present a discussion of the reasons for preferring a system based upon PCM rather than bitstream coding.
Implicit in the ARA proposal is the use of uniform sampling and uniform amplitude quantization with dither, a process designated linear PCM, where specifications up to 24 bit at a sampling rate of 96 kHz are supported. It is a fundamental premise of our proposal that there will be no form of lossy perceptual coding employed and that each channel of the system will be bit-transparent from input to output. The only legitimate concession to psychoacoustics is made in the limitation in bandwidth and in dynamic range, together with the option of using psychoacoustically motivated noise shaping to enhance the subjective resolution of low-level noise, a process considered by the ARA committee to be completely linear.
We maintain that correctly implemented linear PCM implies only distortion attributable to band limitation and non-correlated random noise. In making this statement it is understood that a uniform quantizer with optimal dither is a completely linear (but noisy) process, and that the use of an error feedback loop that encapsulates the pre-quantizer dither sequence to achieve spectral shaping of the noise is also a completely linear process. In such systems there is no correlation between signal and spectrally shaped noise, and as such the system fully meets the aspirations of the audiophile community.
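As a rough illustration of the dither claim above, the following Python sketch compares an undithered uniform quantizer with one preceded by triangular (TPDF) dither of 2 LSB peak-to-peak. The choice of TPDF dither, the function names and the test levels are assumptions made for this example and are not taken from the ARA proposal.

```python
import random

LSB = 1.0  # quantizer step size (arbitrary units, assumed for illustration)

def quantize(x):
    """Plain uniform quantizer: round to the nearest multiple of LSB."""
    return LSB * round(x / LSB)

def quantize_tpdf(x, rng):
    """Uniform quantizer preceded by TPDF dither spanning +/- 1 LSB."""
    # The difference of two independent uniform variables is triangular.
    dither = (rng.random() - rng.random()) * LSB
    return quantize(x + dither)

def error_stats(level, n=100_000, seed=1):
    """Mean and mean-square of the total error for a constant input level."""
    rng = random.Random(seed)
    errors = [quantize_tpdf(level, rng) - level for _ in range(n)]
    mean = sum(errors) / n
    power = sum(e * e for e in errors) / n
    return mean, power

if __name__ == "__main__":
    for level in (0.1, 0.25, 0.4):
        undithered_error = quantize(level) - level  # fixed, depends on the input
        mean, power = error_stats(level)
        print(level, undithered_error, round(mean, 4), round(power, 4))
```

Without dither the error is a fixed function of the input level. With TPDF dither the mean error stays close to zero and the error power stays close to LSB²/4 whatever the level, which is the sense in which the dithered quantizer behaves as a linear but noisy channel.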
The ARA proposal strongly supports the use of lossless or transparent data compression, a process also termed lossless packing. Such techniques employ a predictive algorithm to encode the PCM data stream efficiently and to offer bit transparency across encoder and decoder. Algorithms exist which can track both short-term average and peak bit demands and can offer efficiencies in the order of a 2:1 data saving. It is also a characteristic of data compression that there is reduced correlation between bit patterns and audio data, which should facilitate reduced levels of correlated jitter [6, 7], a critical factor in high-resolution digital audio systems.
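The principle of predictive lossless packing can be sketched in a few lines of Python. This is a minimal illustration only: the fixed "previous sample" predictor stands in for the adaptive predictors and entropy coding that a practical packer would use, and the function names and the test signal are hypothetical.

```python
import math

def encode_residuals(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0
    residuals = []
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals

def decode_residuals(residuals):
    """Invert encode_residuals exactly by running summation."""
    previous = 0
    samples = []
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples

def bits_needed(values):
    """Word length of the smallest two's-complement word holding every value."""
    peak = max(max(values), -min(values) - 1, 0)
    return peak.bit_length() + 1

if __name__ == "__main__":
    # A slowly varying tone held in 16-bit words (assumed test signal).
    pcm = [round(20000 * math.sin(2 * math.pi * k / 200)) for k in range(400)]
    packed = encode_residuals(pcm)
    print(decode_residuals(packed) == pcm)         # bit-transparent round trip
    print(bits_needed(pcm), bits_needed(packed))   # residuals need fewer bits
```

Because the decoder inverts the predictor exactly, the round trip is bit-transparent; the saving comes from the residuals being smaller than the samples whenever the signal is predictable.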
The ARA is in favour of extended bandwidth in digital audio and supports the work by Pioneer through their experiments in 96 kHz sampling. It can be shown that when such over-sampling is employed, transparent data compression gains in efficiency; we estimate that the actual data rate can increase by a factor of as little as 1.3 compared with uncompressed audio data sampled at 48 kHz. This extension in audio bandwidth mirrors one of the principal advantages of bitstream coding. However, by reducing the audio band below 48 kHz there can be gains in compression efficiency; this does not occur with bitstream.
To summarise, a linear PCM system encapsulates the following processes:
Alternative coding strategies should be benchmarked against the above attributes, which, when correctly implemented, result in a linear communications channel.
Bitstream coders require high over-sampling ratios in order to achieve an acceptable performance with a one-bit code. Conservative estimates suggest a minimum of 64 * Nyquist with a 5th-order architecture in the encoder, that is 64 * 48000 = 3.072 Mbit/s. For a high-resolution system the performance should compare with at least 20-bit PCM; in practice this implies a sampling rate significantly above 64 * Nyquist. A single high-resolution PCM channel with a 96 kHz sampling rate and using transparent data compression requires a bit rate of approximately 20 * 48000 * 1.3 = 1.25 Mbit/s. The PCM code is therefore substantially more data efficient and also offers linearity, together with a bandwidth extension normally considered to be a principal attribute of bitstream.
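The arithmetic above can be checked with a short Python sketch. The figures are those quoted in the text; the variable names are ours, and the 1.3 factor is the estimate given earlier for 96 kHz sampling with transparent data compression.

```python
# Per-channel bit rates using the figures quoted in the text.
BASE_RATE_HZ = 48_000          # reference sampling rate
OVERSAMPLING_RATIO = 64        # conservative minimum for a one-bit code
PCM_WORD_LENGTH_BITS = 20      # resolution the bitstream code should match
PACKED_96K_FACTOR = 1.3        # estimated rate increase for packed 96 kHz PCM

bitstream_rate = OVERSAMPLING_RATIO * BASE_RATE_HZ * 1   # one bit per sample
pcm_rate = PCM_WORD_LENGTH_BITS * BASE_RATE_HZ * PACKED_96K_FACTOR

if __name__ == "__main__":
    print(f"bitstream: {bitstream_rate / 1e6:.3f} Mbit/s")
    print(f"packed PCM: {pcm_rate / 1e6:.3f} Mbit/s")
    print(f"ratio: {bitstream_rate / pcm_rate:.2f}")
```

On these figures the one-bit code needs roughly two and a half times the bit rate of the packed PCM channel, before allowing for the higher over-sampling ratio that 20-bit equivalence would require.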
There is no opportunity to use transparent data compression with bitstream codes, as there is with PCM when conveying typical music signals. This contributes directly to code inefficiency, and the ability to reduce correlation between digital data and audio information is lost. Indeed, it is a primary attribute of bitstream that there is high correlation between bit pattern and audio data, as signal recovery is implemented by processing the bitstream directly with a low-pass filter.
The discussions in 3.1 and 3.2 indicate that bitstream is not an efficient code and is therefore extremely wasteful of disc storage. Although bitstream may be appropriate for a simple two-channel system employed with a high-capacity disc, the capacity limitation becomes unacceptable when the needs of multi-channel are included. The ARA document should be consulted at this juncture to gain familiarity with the comprehensive multi-channel + 2-channel format that is proposed for the new high-density optical disc. This is especially relevant when the need for compatibility with DVD is considered. We believe a disc that offers no DVD-compatible attributes, and that does not support three-dimensional sound reproduction, will have a limited and short-lived appeal in the next millennium.
There is a fundamental problem in guaranteeing exact linearity using bitstream coders based upon delta-sigma modulation, although we recognise the advances currently being made by chaotic architectures and the inclusion of dither. The problem arises because a 2-level quantizer, even with dither, cannot be considered to be linear, unlike the multi-level quantizer with dither. Consequently, linearization must depend on the use of negative feedback (i.e. noise-shaping feedback) to achieve an acceptable performance. Even then performance cannot be guaranteed for all signals: at low level, correlated distortions (idle-channel sequences) can exist, although they may be below the system noise level. At higher signal levels, there can be signal-dependent stability constraints, which are a particular problem in higher-order coders. In such schemes, because of correlation, it is difficult to eliminate modulation noise completely. We accept that excellent performance is achieved by bitstream techniques, although in practical systems, as we shall discuss, there would be the need for multiple cascading of bitstream converters, with the potential for a build-up in distortion compared with PCM.
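The idle-channel behaviour mentioned above is easiest to see in the simplest possible coder. The Python sketch below is a first-order, undithered delta-sigma modulator, chosen purely for illustration; it is far cruder than the higher-order and dithered coders under discussion, which suppress such patterns much more effectively. The function name and the test input are assumptions.

```python
def delta_sigma_first_order(samples):
    """First-order 1-bit delta-sigma modulator; inputs must lie within +/- 1."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback                       # accumulate the coding error
        feedback = 1.0 if integrator >= 0.0 else -1.0    # 2-level quantizer
        bits.append(int(feedback))
    return bits

if __name__ == "__main__":
    bits = delta_sigma_first_order([0.5] * 4000)
    print(bits[:8])                 # the repeating pattern locked to the input level
    print(sum(bits) / len(bits))    # crude low-pass (average) recovers the input
```

For a constant input the output settles into a short repeating pattern whose average equals the input level. The signal is recovered by low-pass filtering, but the pattern itself is a deterministic function of the input rather than uncorrelated noise.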
An alternative bitstream converter has been reported that uses a combination of a linear quantizer with dither to guarantee linearity, together with 4th-order noise shaping. Conversion to a 1-bit code (typically from a 4-bit code) is then performed by an open-loop, optimal code conversion table which minimises spectral modulation. Potentially the system produces high resolution with no low-level correlated distortion. However, the bit rate is again very much greater than that of PCM, and although the system solves the problems of correlated and idle-channel distortion, it is too bit-inefficient.
Although there is a certain elegance to the bitstream approach which is attractive for a simple recording chain of a back-to-back ADC and DAC, this elegance is lost when the needs of the recording studio are considered, even when this is a relatively 'direct' audiophile process. Bitstream signals do not match the needs of signal processing operations such as addition, gain change and convolution. At the heart of all these mathematical operations is the need to convert signals at some point into a multi-bit format. The multi-bit signals then have to be re-translated to a bitstream code using a further noise shaper configured around a 2-level comparator. It is highly probable that such cascades of essentially non-linear processing will have audible consequences that will not withstand the scrutiny of the audiophile fraternity. It is, therefore, inconceivable that the recording industry would re-equip with processors and signal distribution systems that employ a bitstream format.
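The point that basic studio operations force a return to multi-bit words can be shown with a toy Python example. The two short bit sequences and the helper names are made up for illustration; any pair of 1-bit streams behaves the same way.

```python
def is_one_bit(stream):
    """True if every value is one of the two levels of a 1-bit code."""
    return all(v in (-1, 1) for v in stream)

def mix(a, b):
    """Sample-by-sample addition of two streams."""
    return [x + y for x, y in zip(a, b)]

def apply_gain(stream, gain):
    """Scale every sample by a constant gain."""
    return [gain * x for x in stream]

if __name__ == "__main__":
    a = [1, -1, 1, 1, -1, 1, -1, -1]    # hypothetical 1-bit stream
    b = [1, 1, -1, 1, -1, -1, 1, -1]    # hypothetical 1-bit stream
    print(is_one_bit(a), is_one_bit(b))
    print(mix(a, b), is_one_bit(mix(a, b)))                    # three levels appear
    print(apply_gain(a, 0.5), is_one_bit(apply_gain(a, 0.5)))  # levels no longer +/- 1
```

Both results fall outside the two-level alphabet, so each would need a further noise-shaped re-quantization stage before it could be stored or distributed as bitstream again.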
A PCM-based system is more efficient for signal processing: high word lengths can be maintained (24 bit) and, where re-quantization is employed, optimal dither and noise shaping can be used to guarantee linearity, a fundamental requirement of a high-resolution system. The lower bit rates inherent with PCM are also welcome for efficient signal distribution, as is the greater ease of multiplexing and frame synchronisation.
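A minimal Python sketch of such a re-quantization stage follows, assuming integer samples, TPDF dither and a first-order error feedback loop that encloses the dither. The 8-bit word-length reduction, the function name and the test level are illustrative choices, not parameters from the ARA proposal; practical noise shapers generally use higher-order, psychoacoustically weighted filters.

```python
import random

def requantize(samples, shift=8, seed=0):
    """Drop `shift` low-order bits using TPDF dither and first-order error feedback."""
    rng = random.Random(seed)
    step = 1 << shift              # one output LSB expressed in input LSBs
    error = 0                      # total error of the previous output sample
    out = []
    for s in samples:
        v = s - error              # subtract the previous error (noise shaping)
        # Integer TPDF dither spanning about +/- 1 output LSB.
        dither = rng.randrange(step) + rng.randrange(step) - (step - 1)
        q = (v + dither + step // 2) // step   # round to the nearest output word
        out.append(q)
        error = q * step - v       # error taken around dither and quantizer together
    return out

if __name__ == "__main__":
    level = 1000                   # a level between two output codes (1000 / 256)
    words = requantize([level] * 10_000)
    print(sorted(set(words)))                    # only a few neighbouring codes are used
    print(256 * sum(words) / len(words))         # long-term average stays at the input level
```

Because each output error is cancelled in the following sample, the accumulated error never grows: the long-term average of the shortened words tracks the original level to well within one input LSB, even though that level lies between output codes.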
As the majority of recordings require some degree of signal processing, the simplicity of the bitstream approach is lost, as effectively the bitstream code would be computed at the output of a multi-bit recording complex.
The modern approach to sound reproduction is to use digital signal processing to enhance the performance and obtain greater accuracy in the reproduction chain. Processes such as digital equalization of linear loudspeaker errors have been reported, and the use of DSP in the implementation of loudspeaker crossover networks has already reached the marketplace, as pioneered by Meridian Audio UK in their range of audiophile and home-cinema products. To implement such systems based upon bitstream code is difficult and inefficient, and is again at variance with the simple philosophy of bitstream.
It would appear that the bitstream approach only caters for an analogue world, where the cost of the player can be minimised as the need for over-sampling filters is removed and the actual DAC is very basic. This incompatibility is seen as a severe limitation: the bitstream audio data would be compatible neither with advanced digital replay equipment nor with the broader application to three-dimensional sound reproduction, which will also require digital processing to optimise performance. Such an approach does not match the ethos of an advanced high-resolution audio system that should be designed to match the future needs of the industry.
There is also the question of compatibility with other systems such as DAB, LaserDisc and satellite, all of which would require format conversion to/from PCM. Possibly the most significant of all is the incompatibility with computer systems, which are set to dominate the entertainment market, where the digital accessing of audio data from high-density optical discs will be required.
In investigating the requirements of a high-resolution audio system, it is evident that jitter performance is paramount - an area already given wide discussion in the technical literature. A bitstream code that simultaneously contains high-amplitude and high-frequency noise is susceptible to jitter - where intermodulation with timing jitter can fold signal energy into the audio band and thus compromise performance. In this area, it is believed that bitstream is inherently more jitter-susceptible than multi-bit systems.
The use of bitstream suggests it would not be appropriate to use multi-bit ADCs and DACs, as these would require signal format conversion. Consequently, we believe the Sony proposal supports only a bitstream world of converters; although excellent results are achievable, this neglects the performance advantage offered by modern multi-bit converters that now find favour in many advanced audiophile products.
In addition, external digital interfacing in a bitstream system - especially where say 6 channels of data are needed - requires new interfaces that operate at substantially higher sampling rates. It is evident that in the area of interfacing, the more code-efficient PCM format offers a clear advantage.
This document has reviewed some of the salient features that we believe should be considered when deciding whether to use a bitstream or a PCM digital format. We believe the inherent advantages of a linear PCM system to be overwhelming both in the guaranteed performance parameters and in the convenience by which signal processing can be performed.
We recognise the advantage of bitstream in a basic system and its natural extension to ultrasonic bandwidth; however, this is easily lost in post-processing, where there is a danger of intermodulation between high-frequency audio and out-of-band shaped noise. Moreover, the use of transparent data compression enables an efficient extension of the audible bandwidth, which moves the argument back in favour of PCM. We of course accept that bitstream converters have an important role in PCM-based digital audio.
Fundamentally, the next generation of audio disc should embody what we call the 'third paradigm' of audio, namely three-dimensional sound. We must also ensure a performance envelope that ideally extends beyond what is theoretically necessary or what today's technology can achieve. An advanced system should handle multi-channel information and set performance goals to which designers can aspire. It may be several years before the full potential of the system is realised in terms of both sound quality and the use of three-dimensional sound.
We believe the ARA proposal sets the guidelines for a clear evolutionary future, one that gives compatibility with Red Book CD and DVD, with high-resolution 2-channel audio and finally with high-resolution three-dimensional sound encoded into a linear format uncompromised by present-day assumptions of perception.
PCM with transparent data compression offers an uncompromised efficient code that is fully compatible with advanced recording and replay processes.
We consider that these advantages of PCM far outweigh the basic advantages of bitstream, and we therefore recommend a losslessly packed linear PCM system to you for formal adoption.
1 'A Proposal for the High-Quality Audio Application of High-Density CD Carriers' Acoustic Renaissance for Audio, April 1995
2 'Digital Frontiers', HiFi News, vol. 40, no. 2, pp 58-59 & 106, February 1995
3 'High Definition Audio', Stereo Sound, Japan, April 1995
4 'Extended-definition Digital Audio Systems for High-capacity CD', Hawksford, M.O.J., IEE colloquium on Audio Engineering, pp 2-1 to 2-12, 1st May 1995, digest no. 1995/089
5 'Digital Dither: Signal Processing with Resolution Far Below the Least Significant Bit', Vanderkooy, J. and Lipshitz, S.P., AES 7th International Conference - Audio in Digital Times, Toronto 1989, pp 87-96
6 'Is the AES/EBU/SPDIF Digital Audio Interface Flawed?', Dunn, C. and Hawksford, M.O.J., 93rd AES Convention, San Francisco, preprint 3360, October 1992
7 'Digital-to-analogue converter with low inter-sample transition distortion and low sensitivity to sample jitter and trans-resistance amplifier slew rate', Hawksford, M.O.J., JAES, vol. 42, no. 11, pp 901-917, November 1994
8 'A Comparison of Two-stage 4th-order and Single-stage 2nd-order Delta-Sigma Modulation in Digital-to-Analogue Conversion', Hawksford, M.O.J., IEE Conference on Analogue to Digital and Digital to Analogue Conversion, Conference publication 343, pp 148-152, Swansea, September 1991
9 'The one-bit alternative to Audio processing and Mastering', Angus, J., Proceedings AES Conference 'Managing the bit budget', London May 1994
10 'Efficient filter Design for Loudspeaker Equalization', Greenfield, R. and Hawksford, M.O.J. JAES, vol. 39, no. 10, pp 739-751, November 1991
See ARA document for further detailed references.