Confidentiality. Version 1.2, 23 June 1995. This document has been prepared by the Technical Subcommittee of Acoustic Renaissance for Audio. An earlier version (1.1, released 12 April 1995) was circulated to members of Acoustic Renaissance for Audio and to members of the syndicates developing high-density CD and related items. With this revised version, the circulation is extended to other members of the audio community. This circulation amounts to public-domain distribution. In order that the issue be correctly reported, we request prior notification or consultation if any of this material is to be discussed in the Press.
© 1995 J. R. Stuart for Acoustic Renaissance for Audio
This document has been prepared as a serious proposal to all concerned in the future of high-density CD formats applied to pure audio.
Its authors present here some consensus views and arguments leading to three important outcomes.
The first is a proposal for a new data type for the compression layer in the MPEG system stream encoded on the high-density discs. This data type carries losslessly compressed linear PCM (packed) audio, and is quite distinct from the currently recognised streams carrying lossy-compressed (data-reduced) channels such as MUSICAM, MPEG Audio, PASC or AC-3.
The second is a working proposal for a disc format, which in its essentials is:
Our proposed format allows from two to eight channels of high-quality linear PCM audio, each with up to 24-bit precision at a 48kHz sampling rate or 20-bit precision at 96kHz, or a combination of both.
The proposed format gives the disc producer the flexibility to make trade-offs between the number of active channels, their bandwidth, their precision, and overall playing time. This flexibility is illustrated in section 11.1.
Compatibility for two-speaker stereo is provided by always assigning two of the eight channels to a two-channel mix.
For more advanced applications, the remaining six channels can be used for surround-sound mixes - including those carrying Ambisonics information. In this way record producers can simultaneously issue their preferred mixes for two to six channels.
We recognise that there are several benefits to including the lossy-compressed version of the audio on the disc alongside the high-quality audio versions. In particular, this approach allows minimum changes in the player architecture and gives very good compatibility for budget systems.
Some suggestions are made about using the high-quality packed (losslessly compressed) audio with MPEG2 video for some applications - such as high-quality music-videos - or with other data such as text and graphics. In essence, we see no reason why the disc format should not permit four types of data simultaneously in the compression layer, namely high-quality audio, lossy-compressed audio, MPEG1/2 video, and associated data.
The final outcome is an expression throughout the document of concerns, items we considered and items we recommend be incorporated into any resulting standard.
A glossary is included in section 15.
In January 1995 two syndicates released information about high-density CD formats under development. Both the Sony/Philips and Toshiba/WEA syndicates propose a CD carrier for movie delivery based on MPEG2 variable-rate picture coding and a multichannel/multilingual compressed audio code, using either MUSICAM (MPEG) or Dolby AC-3.
Both syndicates indicated a willingness to discuss other applications of the disc with interested parties; these applications include broadcast, multimedia and pure audio.
This document has been written to appeal to both syndicates on the subject of a high-quality pure-audio application of high-density CD, and to raise awareness in the audio community of the issues involved. We refer to these formats as HQAD (High Quality Audio Disc) to distinguish them from the current Red Book CD-DA standard.
The purpose of this document is to present the consensus opinion of a sector of the audio industry on suitable format(s) for a high-density CD intended to deliver sound only (HQAD).
The following were important starting points in our thinking:
This document is a consensus proposal from the Technical Subcommittee of Acoustic Renaissance for Audio, a free body dedicated to advancing audio quality.
The members of this committee, their advisors and all affiliations are appended in section 13.
In particular, opinions have been obtained from these industry sectors:
Indications from the audio-recording community are:
Indications from the player-manufacturing community are:
The audio community is putting considerable effort into pushing the limits of resolution of the current CD-DA channel, using techniques such as noise-shaping, pre-emphasis and buried-data techniques.         
An overwhelming requirement of the audio community is a carrier that does not use data-reduction methods, such as MPEG, PASC or AC-3, which essentially throw away those parts of the audio data that are argued to be inaudible. The HQAD should use linear PCM encoding as a basis.
There is a general consensus that 16-bit 44.1kHz linear PCM is inadequate. The question is how to balance the different ways of pushing the envelope outwards: greater precision, higher sampling rates and more channels.
It seems reasonable to design the channel around the capabilities of the human receiver; an obvious set of parameters would accurately encode the listener's entire auditory range, namely:
Here are some relevant factors.
There is a requirement to sample at a higher rate than that used in the CD-DA, justified as follows:
In addition, we have to remember that the major investment in recording, production and broadcast is in machinery sampling at 48kHz.
While there may be some disagreement about the desirable frequency or dynamic ranges of an audio carrier, no-one can be in any doubt that there are considerable and immediately obvious benefits to reproducing sound from more than just the frontal horizontal quadrant.
There is an increasing awareness of, and demand for, the greater realism that results from taking 'stereo' beyond two speakers. There is currently considerable interest in surround-sound techniques, fuelled by the better sources now available through CD-DA and DSP.
One of the aims of this proposal is to provide a full-dynamic-range option for those consumers who are in a position to exploit it. However there are circumstances in which, due to high levels of background noise or restricted loudness, the dynamic range may usefully be reduced. Under these circumstances the decoding equipment could control the dynamic range in a manner sympathetic to the programme material.
It has been shown in  and  that such a control can be provided by analysing the audio during the production process. The full dynamic range is conveyed on the disc, along with control data which can be used to apply dynamic-range reduction when required. This broadcast technique also has advantages in the HQAD application:
Such a process has already been developed and included in the specification of the EU147 Digital Audio Broadcasting (DAB) system.
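The replay side of this scheme can be sketched briefly. Assumptions, none of them taken from the DAB specification: the control data is one gain value in dB per fixed block of samples, computed during production; the listener chooses how much of it to apply, from 0 (full dynamic range, output identical to the disc) to 1 (full reduction). The block size, the gain format and the function name are hypothetical, and a practical decoder would also smooth gain changes between blocks, which this sketch omits.

```python
from typing import List


def apply_drc(samples: List[float], gains_db: List[float],
              block_size: int, amount: float) -> List[float]:
    """Apply producer-supplied dynamic-range-control gains at replay.

    samples:    full-dynamic-range audio as read from the disc
    gains_db:   hypothetical control data, one gain (dB) per block
    block_size: hypothetical number of samples covered by each gain
    amount:     listener's choice, 0.0 = no reduction, 1.0 = full
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must lie between 0 and 1")
    out = []
    for i, x in enumerate(samples):
        # Scale the stored gain by the listener's chosen amount.
        g_db = gains_db[i // block_size] * amount
        out.append(x * 10.0 ** (g_db / 20.0))
    return out
```

With `amount=0.0` every gain is 0 dB, so the player reproduces the full dynamic range carried on the disc; the reduction exists only as side information.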
The following DVD audio features are assumed:
The pure audio disc (HQAD) needs to be able to work simply with these speaker layouts. We suggest that five full-bandwidth channels are the minimum required.
We strongly recommend that music should not be recorded to layouts with a mono subwoofer, since single-subwoofer replay is very inferior in terms of energy response and spaciousness. The subwoofer feed should be generated by the end-user's equipment in the equivalent of the surround decoder function. This function will be present in equipment capable of decoding Dolby Surround and should be user-selectable according to customer preference or the capabilities of the loudspeakers.
We have determined that it is possible to take an Ambisonic W, X, Y (and Z if necessary) set and to 'decode' these to provide signals for recording onto the HQAD, and therefore, for reproduction via a standardised five-speaker arrangement.
By this method a soundfield recording or mix can be played simply on a standard 5.1 speaker layout.
For more advanced or higher-performance installations, the five feeds can be converted back to W, X, Y (and Z) and re-decoded for other speaker layouts.
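The decode-and-re-encode route can be illustrated with a deliberately simple first-order, horizontal-only decoder. Assumptions, none taken from this proposal: speaker azimuths of 0, ±30 and ±110 degrees for C, L, R, Ls, Rs; each feed is a virtual cardioid aimed at its speaker; W carries the source at 1/√2; Z is omitted. Gerzon's work (refs 8 and 12) describes more carefully optimised decoders. Because the 5×3 decoding matrix has full column rank, a least-squares solve recovers W, X and Y exactly from unaltered feeds.

```python
import math
from typing import List, Sequence

# Assumed speaker azimuths in degrees (anticlockwise from front):
# C, L, R, Ls, Rs. Illustrative only.
AZIMUTHS_DEG = [0.0, 30.0, -30.0, 110.0, -110.0]


def decode_matrix(azimuths_deg: Sequence[float]) -> List[List[float]]:
    """One row per speaker: a virtual cardioid aimed at that speaker,
    feed = 0.5 * (sqrt(2)*W + cos(a)*X + sin(a)*Y)."""
    rows = []
    for a_deg in azimuths_deg:
        a = math.radians(a_deg)
        rows.append([0.5 * math.sqrt(2.0), 0.5 * math.cos(a), 0.5 * math.sin(a)])
    return rows


def decode(wxy: Sequence[float], d: List[List[float]]) -> List[float]:
    """B-format (W, X, Y) sample -> one sample per speaker feed."""
    return [sum(r[j] * wxy[j] for j in range(3)) for r in d]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def reencode(feeds: Sequence[float], d: List[List[float]]) -> List[float]:
    """Least-squares recovery of (W, X, Y): solve (D^T D) b = D^T f."""
    dtd = [[sum(row[i] * row[j] for row in d) for j in range(3)] for i in range(3)]
    dtf = [sum(row[i] * f for row, f in zip(d, feeds)) for i in range(3)]
    return _solve3(dtd, dtf)


def encode_source(s: float, azimuth_deg: float) -> List[float]:
    """First-order horizontal B-format encoding of a point source."""
    a = math.radians(azimuth_deg)
    return [s / math.sqrt(2.0), s * math.cos(a), s * math.sin(a)]
```

A source encoded at 0 degrees reaches the centre feed at full level; passing the five feeds through `reencode` returns the original W, X, Y to within rounding, which is what allows re-decoding for other layouts.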
In the current cinema 5.1 systems, five full-bandwidth channels are augmented by a 0.1 (200Hz bandwidth) bass-effects channel. We suggest that this be used as a channel which adds low-frequency power-handling for special contemporary or experimental material. (It is not required for normal acoustic recordings.)
This sixth channel should be defined as a full-bandwidth channel. It could then be flagged in associated data/subcode as an Effects channel, to be used:
In the event that this channel is unused, or carries only low-frequency information, the data rate on the disc will automatically be reduced by the packing method proposed. Therefore, on material not requiring height information, longer playing times or higher resolution can be chosen at the producer's discretion.
In all audio systems there is an issue of down-compatibility.
In considering how two-speaker systems will play surround recordings, we see a number of options and pointers. One less desirable option for two-speaker listening is to use the CD-DA version of the recording. We feel that this option penalises the two-speaker listener, who does not gain access to the highest sound quality. In any event, producers will need to provide a Lt, Rt mixdown for the CD-DA release, and we have examined alternative methods for making this available along with the high-quality surround mix.
We have considered the following options for providing a two-speaker feed:
We are firmly in favour of option 3, so long as packing (lossless compression) is used and the options for full surround are not ignored. Although many sophisticated schemes can be considered for downmixing matrices operating in the players, this strategy will always lead to more working difficulties in production and in replay-hardware design.
However, option 2 remains useful when minimising audio data rate is paramount, as in the case of sending three or five high-quality audio channels with MPEG-2 video.
It would be very advantageous to evolve to a single inventory disc, by releasing HQAD discs with high-density information on one layer and a conventional CD-DA on the other.
Both the MMCD and SD proposals permit such mixed-mode discs.
In both cases the CD-DA would be placed on the back layer (furthest from the objective) while the HQAD would be on the nearer layer of a two-layer disc.
Although on the face of it 55kHz is the minimum sampling rate necessary to encode all audible sounds, we recommend a specification that is based on multiples of the 48kHz rate found to be standard in professional audio and in AC-3. We see an advantage in permitting the development of quality improvements that higher sampling rates would bring, and suggest that 48kHz and 96kHz are the only options required.
Although at first sight 96kHz may appear to be grossly wasteful of data rate, this is not the case if packing (lossless compression) is used. In section 10.2 we point out that the packed data rate for 96kHz sampling may typically be only 30% greater than that for the same material sampled at 48kHz, and can be less if the full 40kHz bandwidth offered by 96kHz sampling is reduced.
Having carefully considered all the factors, and assuming lossless packing to be used, we conclude that it is not necessary to cater for compromise sample rates such as 60kHz, 66.15kHz, 72kHz or 88.2kHz.
We further propose that MPEG packed-audio streams be defined for 48kHz and 96kHz and that both are always present on the disc. In most cases one or the other will carry null data, but there are circumstances in which both may be required, and this structure allows a more flexible evolution to 96kHz operation on the part of hardware and software providers.
In view of the known dynamic range of human hearing, recording spaces and analogue electronics, we feel comfortable in recommending channels that can obtain the audible equivalent of 21.5-bit precision when noise-shaping or a combination of noise-shaping with pre- and de-emphasis is used.
As a guideline, this would imply a requirement of:
Because pre-emphasis is not helpful to packed audio channels, and because a 14-bit specification is unlikely to find favour, 16 bits at 96kHz is the minimum practical alternative - one which fits in well with existing machinery and interfaces.
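For scale, the standard rule for an ideal N-bit quantiser (a full-scale sine wave against uniform, undithered quantisation noise) converts word length into dynamic range. It is a textbook approximation rather than part of this proposal; dither and noise-shaping alter the audible figure, which is why the text speaks of an 'audible equivalent'.

```latex
\mathrm{SNR} \approx 6.02\,N + 1.76~\mathrm{dB}
\qquad
N = 21.5:\; 6.02 \times 21.5 + 1.76 \approx 131~\mathrm{dB},
\qquad
N = 16:\; 6.02 \times 16 + 1.76 \approx 98~\mathrm{dB}.
```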
Decisions about numbers of channels, precision and sampling rate converge on a 'bit budget'.
Linear and psychoacoustically correct coding methods are known which can improve the performance of linear-PCM channels. The two principal methods are noise-shaping and pre-/de-emphasis.
Noise-shaping can be made open-choice at the discretion of the recording producer.
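Noise-shaping itself is simple to illustrate. The sketch below requantises integer PCM to a shorter word using TPDF dither and first-order error feedback, so the requantisation error is shaped by (1 - z^-1) and pushed towards high frequencies while its DC content cancels. It is a textbook first-order shaper chosen for brevity, not the psychoacoustically optimised filters discussed in the references; the function name and parameters are illustrative.

```python
import random
from typing import List, Optional


def requantise_noise_shaped(samples: List[int], in_bits: int, out_bits: int,
                            seed: Optional[int] = 0) -> List[int]:
    """Reduce word length from in_bits to out_bits with TPDF dither and
    first-order error feedback (noise transfer function 1 - z^-1).

    Input samples are signed integers at in_bits resolution; the result
    is in units of the out_bits LSB, clipped to the output word range.
    """
    step = 1 << (in_bits - out_bits)        # one output LSB in input units
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    rng = random.Random(seed)
    err = 0.0                               # total error of previous sample
    out = []
    for x in samples:
        v = x - err                         # feed back last error (shaping)
        d = (rng.random() - rng.random()) * step  # TPDF dither, +/-1 LSB
        q = max(lo, min(hi, round((v + d) / step)))
        err = q * step - v                  # error made on this sample
        out.append(q)
    return out
```

Because each sample's error is subtracted from the next input, the output error is e[n] - e[n-1]: it averages out at low frequencies and is concentrated near the top of the band, where hearing is less sensitive.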
Pre-/de-emphasis requires standardisation. Although there are better choices, the standard CD-DA 50/15µs characteristic will have to be provided by any player capable of playing CD-DA as well as HQAD, and so should not be ruled out as an option.
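For reference, the 50/15µs characteristic is a first-order shelf set by two time constants. The sketch evaluates the pre-emphasis magnitude of H(s) = (1 + s·50µs)/(1 + s·15µs) at s = j2πf; de-emphasis is the reciprocal, so its gain in dB is simply the negative. The function names are illustrative.

```python
import math

TAU_ZERO = 50e-6   # 50 us time constant: the boost begins
TAU_POLE = 15e-6   # 15 us time constant: the boost levels off


def preemphasis_gain_db(f_hz: float) -> float:
    """|H(j*2*pi*f)| in dB for H(s) = (1 + s*50us) / (1 + s*15us)."""
    w = 2.0 * math.pi * f_hz
    num = math.hypot(1.0, w * TAU_ZERO)
    den = math.hypot(1.0, w * TAU_POLE)
    return 20.0 * math.log10(num / den)


def deemphasis_gain_db(f_hz: float) -> float:
    """De-emphasis is the exact reciprocal of pre-emphasis."""
    return -preemphasis_gain_db(f_hz)
```

The boost starts near 1/(2π·50µs) ≈ 3.2kHz, levels off above 1/(2π·15µs) ≈ 10.6kHz, and approaches 20·log10(50/15) ≈ 10.5dB; at 20kHz it is about 9.5dB.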
We strongly recommend that the high-quality audio channels be losslessly coded (packed). Signal processing has advanced to the state where the data-reduction benefits of such coding are too good to pass by. Unlike perceptual or lossy data reduction, lossless coding does not alter the final decoded transmitted signal in any way, but merely 'packs' the audio data more efficiently into a smaller data rate.
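To make 'packing' concrete, here is a minimal lossless coder under simple assumptions: a fixed first-order predictor (each sample predicted by the previous one) and Rice coding of the prediction residuals, a generic approach of the kind used in some existing lossless audio coders. It is not the packing method this proposal has in mind, which is optimised for peak data rate, and all names are illustrative. It demonstrates that decoding returns the input bit-for-bit, and that smoother, lower-bandwidth signals leave smaller residuals and so pack into fewer bits.

```python
from typing import List, Tuple


def _zigzag(v: int) -> int:
    """Map signed residuals to non-negative ints: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * v if v >= 0 else -2 * v - 1


def _unzigzag(u: int) -> int:
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def pack(samples: List[int]) -> Tuple[int, str]:
    """Losslessly pack integer PCM: first-order prediction + Rice coding.

    Returns (k, bits): the Rice parameter and a '0'/'1' string.
    The first sample is predicted from zero.
    """
    residuals, prev = [], 0
    for x in samples:
        residuals.append(_zigzag(x - prev))
        prev = x
    mean = sum(residuals) / max(len(residuals), 1)
    k = max(0, int(mean).bit_length() - 1)   # rough choice: k ~ log2(mean)
    out = []
    for u in residuals:
        out.append("1" * (u >> k) + "0")     # quotient in unary
        if k:
            out.append(format(u & ((1 << k) - 1), "0{}b".format(k)))
    return k, "".join(out)


def unpack(k: int, bits: str, n: int) -> List[int]:
    """Invert pack(): decode n samples bit-exactly."""
    samples, pos, prev = [], 0, 0
    for _ in range(n):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                              # skip the terminating '0'
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        prev += _unzigzag((q << k) | r)
        samples.append(prev)
    return samples
```

A silent channel costs one bit per sample in this sketch (a lone '0' per residual); a practical scheme would flag silent blocks so that unused channels occupy virtually no data rate.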
Existing lossless audio data compression systems are optimised for reducing average data rate, but not for reduction of peak data rate or for optimum results at high sampling rates such as 96kHz. We have determined simple-to-decode methods optimised for these latter requirements.
The process of packing PCM becomes more efficient as sampling rate is increased. For example, packed 96kHz audio does not double the data rate of packed 48kHz as you would expect; the increase is more like 30%.
Packing offers the opportunity to make a much better product. It allows us to convey more precision on more channels, but also gives a lot of open-ended flexibility to the user - as can be seen in some of the examples quoted in Table 3.
We are aware of relatively simple-to-decode packing and unpacking techniques that should allow the lossless data compression shown in the table below for five or more associated channels. We anticipate that higher compression rates can be obtained with development over a relatively short period. (Compression is shown as the saving in bits per sample per channel.)
Table 1 Data-rate reduction (saving in bits/sample/channel)
Sampling rate (kHz)   Peak   Average
48                    0      6
96                    5      8
Different musical material compresses by different amounts with lossless packing, with material having narrow dynamic range and high treble energies compressing less well. Lossless coding algorithms can be chosen with less compressible material in mind, giving an overall improvement in degree of data-rate reduction. The degree of data-rate reduction will be greater for highly compressible audio material such as most classical music - for which absolute disc duration is most critical.
We have determined that, for packed channels, the use of pre- and de-emphasis gives no advantage in coded-data rate for a given noise performance. Therefore, pre- and de-emphasis are only of benefit when used with linear PCM channels that are not losslessly coded, and should not be used with losslessly coded channels.
Psychoacoustic noise-shaping of the PCM audio channels may be used, along with lossless coding, to create a packed channel with perceptual improvements of about 3 bits at 48kHz and 5.5 bits at 96kHz.
For 96kHz recordings some noise-shaping is encouraged, in order to optimise the subjective dynamic range and overall data rate.
It is possible to design the lossless-coding specification in such a way that at the mastering stage the record producer can make a personal trade-off between playing time, frequency range, number of active channels and precision. The packed channel can convey this choice implicitly in its control data, and the system operation will be transparent to the user.
This scenario has the following benefits:
By way of examples:
The technical standard for lossless coding can specify the maximum input and output wordlength, possibly as either 20 or 24 bit. In addition, the standard can be arranged so that choices regarding input wordlength, number of active channels and bandwidth are automatically handled by the coding, without manual intervention by producer or end-user.
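One way the coding could handle input wordlength automatically is sketched below: find the low-order bits that are zero in every sample of a block, shift them out before packing, and carry the shift count in the control data so the decoder restores the words exactly. This 'wasted bits' technique is an illustration under our own assumptions (block-wise detection, a 24-bit container); the function names, and the control-data field for the shift, are hypothetical.

```python
from typing import List, Tuple


def wasted_bits(samples: List[int], container_bits: int = 24) -> int:
    """Count trailing zero bits common to every sample in the block.

    An all-zero block reports container_bits (nothing needs coding).
    """
    acc = 0
    for x in samples:
        acc |= x
    if acc == 0:
        return container_bits
    return (acc & -acc).bit_length() - 1     # index of lowest set bit


def strip_wasted_bits(samples: List[int],
                      container_bits: int = 24) -> Tuple[int, List[int]]:
    """Return (shift, reduced samples). The shift would travel in the
    packed stream's control data (hypothetical field) so the decoder can
    restore each word exactly with a left shift."""
    shift = wasted_bits(samples, container_bits)
    if shift >= container_bits:
        return shift, [0] * len(samples)
    return shift, [x >> shift for x in samples]
```

For example, 20-bit material delivered in a 24-bit container reports a shift of 4, and only 20-bit words reach the packer, with no manual setting by producer or end-user.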
Consideration should also be given to using the proposed high-quality packed audio alongside MPEG2 video data and the compressed audio AC-3/MUSICAM.
It seems likely that there are a number of applications that will benefit from different compromises between audio quality, video quality and playing time from those made in the movie versions of the discs.
For example, slowly-moving or graphical video data could accompany high-quality surround recordings. Alternatively, for some types of music, such as opera, MPEG-2 pictures could accompany a high-quality sound track using two channels of packed audio (e.g. Lt and Rt at 48kHz and 20-bit nominal).
Within a common standard, producers could choose between a number of viable high-quality options.
Figures 2 and 3 show outline player architectures.
Table 2 estimates the bit budget for HQAD and compares it to CD-DA. Within the constraints of close to 74 minutes' playing time and 9Mb/s (11Mb/s) peak data rate for SD (MMCD), several options exist, including those shown.
We have allocated eight channels to audio, 384kb/s to a parallel lossy-compressed AC-3 or MUSICAM channel, and 176kb/s to a parallel data or subcode channel.
In the table above there are three columns describing data rate in Mb/s. The first, labelled 'Input', is the worst-case rate of data in the uncompressed recording being fed to the mastering process. The second column, labelled 'Ch Peak', gives the expected maximum data rate in the packed channel - i.e. on the disc. The last column shows the average disc data rate and is used to compute playing time.
These figures assume a lossless packing scheme optimised for peak-rate reduction. They also assume 48kHz 20-bit or 96kHz 16-bit signals of moderate compressibility, and the compression ratios given in Table 1.
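The arithmetic behind the 'Input', 'Ch Peak' and average columns can be sketched from Table 1. Assumptions: the Table 1 savings apply uniformly to every channel; the 384kb/s lossy channel and 176kb/s data channel are added to the disc rates but not to 'Input'; capacity is given in bytes with 1GB = 10^9 bytes. This sketches the method and is not a reproduction of Table 2; the names are illustrative.

```python
from dataclasses import dataclass

# Table 1 savings in bits/sample/channel: (peak, average)
TABLE1_SAVING = {48_000: (0, 6), 96_000: (5, 8)}


@dataclass
class Budget:
    input_mbps: float    # uncompressed PCM fed to mastering
    peak_mbps: float     # expected maximum packed rate on disc
    average_mbps: float  # average packed rate on disc


def bit_budget(channels: int, fs: int, bits: int,
               lossy_kbps: float = 384.0, data_kbps: float = 176.0) -> Budget:
    """Rates for packed PCM channels plus the parallel lossy and
    data/subcode channels allocated in the text."""
    peak_save, avg_save = TABLE1_SAVING[fs]
    extra = (lossy_kbps + data_kbps) * 1e3   # bits/s of side channels
    pcm = channels * fs                      # samples/s over all channels
    return Budget(
        input_mbps=pcm * bits / 1e6,
        peak_mbps=(pcm * (bits - peak_save) + extra) / 1e6,
        average_mbps=(pcm * (bits - avg_save) + extra) / 1e6,
    )


def playing_minutes(capacity_bytes: float, average_mbps: float) -> float:
    """Playing time if the average rate fills the given capacity."""
    return capacity_bytes * 8 / (average_mbps * 1e6) / 60
```

For eight channels at 48kHz and 20 bits this gives 7.68Mb/s input, 8.24Mb/s peak and 5.936Mb/s average; `playing_minutes` then divides a capacity (for instance the 3.7GB-per-layer figure in the disc specifications) by the average rate.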
If one or more of the channels is unmodulated, e.g. if the C channel or the E channel or the Ls and Rs channels are not used, or are modulated with a highly compressible signal such as a bass-effects channel, then the data rates will be smaller than those shown, and playing times will be longer.
Table 2 shows nominal capacity for the proposed HQAD. However, the use of lossless 'packing' gives a very flexible structure to the disc. By specifying a mastering system which can accept:
we effectively construct a carrier in which the producer can make the trade-off between numbers of channels, frequency range, precision and playing time.
The mastering process can embed precision information in the data stream. This has the added benefit that the standardisation process does not need to anticipate all the options, nor is subcode required to control the replay process.
The packing process can effectively provide a continuum of sampling rates between 48kHz and 96kHz, provided the input to the compression process is effectively low-pass filtered: the less information there is at high frequencies, the higher the compression ratio becomes.
In the table above, we illustrate some extremes of this flexible use. The duration is calculated on the basis of MMCD capacity.
The basic data shall be, at the disc producer's discretion:
These options shall be recognisable in co-temporal subcode and/or in a header at the start of the disc. (There are significant advantages to having both.)
Two disc-maker options permitted:
Normally 20 bits would be used at 48kHz and perhaps only 16 bits at 96kHz; however, the producer has the option to use more data with material that compresses well, or when some channels are not used (limited, of course, by the maximum data rate allowed in the packed channel). See the examples in Table 3.
It is recommended that the MPEG-compatible packed audio stream should have, in all eight channels, separate fields for 48kHz-sampled and 96kHz-sampled signal components. These fields may be set to zero or to null status if not used, and will then occupy virtually no data rate in packed form. In the basic use, as described in section 0, one or the other of the fields would be null.
Not permitted for packed channels.
By provision of Lt and Rt at high precision.
By provision of a lossy-compressed AC-3, MUSICAM or similar mix.
Several requirements are highlighted:
To allow a simple standard method of downmixing the surrounds (Ls and Rs) into the three front speakers, additional support is suggested in the data channels or in the header of section 12.10, as follows:
The header information of section 12.10 should include an indicator of the reproduced sound-pressure level by defining the acoustic gain required in the playback system. A code should be present that indicates 'not known'.
It is possible that players could use this information to 'level' loudness on successive recordings.
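A player's use of the acoustic-gain indicator can be sketched as a simple volume trim. The text does not define how the indicator is coded, so this assumes (hypothetically) that it states the sound-pressure level that digital full scale should produce, with `None` standing in for the 'not known' code, and that the listener's system has been calibrated to a known full-scale SPL. The 12 dB cap on boost is an arbitrary safety assumption, and the names are illustrative.

```python
from typing import Optional

NOT_KNOWN = None  # hypothetical representation of the 'not known' code


def playback_trim_db(indicated_fs_spl_db: Optional[float],
                     system_fs_spl_db: float,
                     max_boost_db: float = 12.0) -> float:
    """Volume trim so a disc replays at the level its header indicates.

    indicated_fs_spl_db: hypothetical header value, the SPL (dB) that
                         digital full scale should produce, or None
                         when the disc carries the 'not known' code
    system_fs_spl_db:    SPL the calibrated system produces at full
                         scale with 0 dB trim
    max_boost_db:        safety cap on any positive trim (assumption)
    """
    if indicated_fs_spl_db is NOT_KNOWN:
        return 0.0                      # leave the user's volume alone
    trim = indicated_fs_spl_db - system_fs_spl_db
    return min(trim, max_boost_db)
```

Applying each disc's trim in turn brings successive recordings to their intended levels, while discs marked 'not known' are played at the user's own setting.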
Discussion-group members, with their relevant affiliations, are as follows.
Tony Griffiths. Technical Director Decca Recording Company. Fellow Audio Engineering Society, Member Acoustic Renaissance for Audio, Chairman Technical Subcommittee National Sound Archive, Member IEE, Member Royal Television Society.
Professor Malcolm Hawksford. University of Essex. Fellow Audio Engineering Society, Fellow Institute of Acoustics, Fellow IEE, Member Acoustic Renaissance for Audio.
David Meares. R&D Manager (Audio & Acoustics), BBC Research & Development Department. Fellow Institute of Acoustics, Member Acoustic Renaissance for Audio, Member IEE.
Bob Stuart. Chairman and Technical Director, Meridian Audio Ltd. Visiting Fellow Essex University, Fellow Audio Engineering Society, Member Acoustical Society of America, Chairman Acoustic Renaissance for Audio, Member XtraBits, Member Technical Subcommittee National Sound Archive, Member IEE and IEEE.
Peter Craven. Consultant. Member Audio Engineering Society, Member XtraBits.
Michael Gerzon. Consultant. Gold Medallist and Fellow Audio Engineering Society, Member XtraBits, Member Acoustic Renaissance for Audio.
Hiro Negishi. Director D&D Centre, Canon Inc. Member Audio Engineering Society, Member Institute of Acoustics, Founder Acoustic Renaissance for Audio.
Francis Rumsey. University of Guildford. Member Audio Engineering Society, Member Acoustic Renaissance for Audio.
Chris Travis. Division Ltd. Member Audio Engineering Society.
1 Akune, M., Heddle, R.M. and Akagiri, K. 'Super Bit Mapping: Psychoacoustically Optimized Digital Recording', AES 93rd Convention San Francisco, preprint 3371 (1992)
2 Craven, P.G. and Gerzon, M.A. 'Compatible Improvement of 16-Bit Systems Using Subtractive Dither' AES 93rd Convention San Francisco, preprint 3356 (1992)
3 Dunn, J. 'High Dynamic Range Audio Applications for Digital Signal Processing', AES 93rd Convention, San Francisco, preprint 3434 (Oct. 1992)
4 Fielder, L. 'Dynamic Range Issues in the Modern Digital Audio Environment' Proceedings AES UK Conference 'Managing the Bit Budget', 3-19 (May 1994)
5 Hawksford, M.O. 'Digital Frontiers', HiFi News, 40, no.2, 58-59 and 106 (Feb. 1995)
6 Gerzon, M.A. 'Periphony: With-Height Sound Reproduction' J. Audio Eng. Soc., 21, 2-10 (Jan/Feb 1973)
7 Gerzon, M.A. 'Ambisonics in Multichannel Broadcasting and Video' J. Audio Eng. Soc., 33, 859-871 (Nov 1985)
8 Gerzon, M.A. 'Optimal Reproduction Matrices for Multispeaker Stereo' J. Audio Eng. Soc., 40, 571-589 (July/Aug. 1992)
9 Gerzon, M.A. 'Problems of error-masking in audio data compression systems' AES 90th Convention, preprint 3013 (Feb. 1991)
10 Gerzon, M.A., Craven, P.G., Stuart, J.R. and Wilson, R.J. 'Psychoacoustic Noise Shaped Improvements in CD and Other Linear Digital Media' AES 94th Convention, Berlin, preprint 3501 (March 1993)
11 Gerzon, M.A. 'Hierarchical System of Surround Sound Transmission for HDTV', AES 92nd Convention, Vienna, preprint 3339 (March 1992)
12 Gerzon, M.A. and Barton, G.J. 'Ambisonic Decoders for HDTV', AES 92nd Convention, Vienna, preprint 3345 (March 1992)
13 Gilchrist, N.H.C. 'DRACULA: dynamic range control for broadcasting and other applications' 18, 36-47, Tonmeistertagung, Karlsruhe (article in English) (1994).
14 Hoeg, W., Gilchrist, N., Twietmeyer, H. and Juenger, H. 'Dynamic Range Control (DRC) and music/speech control (MSC)' EBU Technical Review, 56-70 (Autumn 1994)
15 Komamura, M. 'Wideband and wide dynamic-range recording and reproduction of digital audio' AES 96th Convention, Amsterdam, preprint 3844 (1994)
16 Meares, D.J. 'Perceptual Attributes of Multichannel Sound' Proceedings of AES 12th International Conference 'The Perception of Reproduced Sound', 171-179 (June 1993)
17 Meares, D.J. 'High definition sound for high definition television', Proceedings of the AES 9th International Conference 'Television Sound, Today and Tomorrow', Detroit, 187-215 (Feb. 1991)
18 Meares, D.J. and Stoll, G. 'Sound systems in Digital Television', Technical Module of DVB, Document 1172, EBU (1993)
19 Ohashi, T., Nishina, E., Kawai, N., Fuwamoto, Y. and Imai, H. 'High Frequency Sound Above the Audible Range Affects Brain Electrical Activity and Sound Perception', AES 91st Convention, New York, preprint 3207 (Oct. 1991)
20 Ohashi, T., Nishina, E., Fuwamoto, Y. and Kawai, N. 'On the Mechanism of "Hypersonic Effect"', Proceedings Int'l Computer Music Conference, Tokyo, 432-434 (1993)
21 Oomen, A.W.J., Groenwegen, R.G., van der Waal, R.G. and Veldhuis, R.N.J. 'A Variable-Bit-Rate Buried-data Channel for Compact Disc' J. Audio Eng. Soc., 43, 23-28 (Jan./Feb. 1995)
22 Stuart, J.R. and Wilson, R.J. 'A search for efficient dither for DSP applications' AES 92nd Convention, Vienna, preprint 3334 (1992)
23 Stuart, J.R. 'Noise: Methods for Estimating Detectability and Threshold' J. Audio Eng. Soc., 42, 124-140 (March 1994)
24 Stuart, J.R. and Wilson, R.J. 'Dynamic Range Enhancement Using Noise-shaped Dither Applied to Signals with and without Pre-emphasis' AES 96th Convention, Amsterdam, preprint 3871 (1994)
25 Stuart, J.R. 'Auditory modelling related to the bit budget' Proceedings of AES UK Conference 'Managing the Bit Budget', 167-178 (1994)
26 Stuart, J.R. 'Perceptual issues in Multichannel environments' AES 97th Convention, San Francisco (1994)
27 Vanderkooy, J. and Lipshitz, S.P. 'Digital Dither: Signal Processing with Resolution Far Below the Least Significant Bit' AES 7th International Conference - Audio in Digital Times, Toronto, 87-96 (1989)
Dolby, Dolby Surround, Pro Logic and AC-3 are trademarks of Dolby Licensing Inc.
MUSICAM, PASC, MPEG and DVD are registered trademarks.
HDCD is a trademark of Pacific Microsonics.
Ambisonics is a registered trademark of Nimbus Records Ltd.
MMCD is a registered trademark of Philips/Sony licensing.
SD is a registered trademark.
Codename SD, TAZ
Disc diameter 120mm
Disc thickness 1.2mm (two 0.6mm bonded)
Memory capacity 5GB per side
Track pitch 0.725 micrometer
Numerical aperture (NA) 0.6
Error correction RS-PC
Modulation Not 8-14 (4 to 9)
Picture code MPEG-2 variable rate 1-9Mb/s
Audio code AC-3. Min. 3 languages, 4 subtitle channels. Up to 8 languages + 32 subtitle channels
Other features Multiple aspect ratio, parental lock-out, 2-layer(sided) format
Disc diameter 120mm
Disc thickness 1.2mm
Memory capacity 3.7GB per layer
Track pitch 0.84 micrometer
Error correction RS-PC 'CIRC Plus'
Modulation EFM Plus
Sector size 2048 bytes
Program area 23/58 mm
Picture code MPEG-2 variable rate 1-11.2Mb/s, average 3.7Mb/s
Audio code MPEG-2 Layer 2
Other features 2-layer single-sided format
Following the CD decoder we expect to see a stock audio MUSICAM or MPEG audio decoder which outputs two channels to on-board Lt and Rt DACs running at 48kHz.
Original version released 12 April 1995.
The embargo is released and text updated.