A Proposal for the High-Quality Audio Application of High-Density CD Carriers

Technical Subcommittee
Acoustic Renaissance for Audio

Contents

Summary
Part 1
Background Introduction
Purpose of this document
Extent of discussion and authorship
Requirements of software producers
Requirements of HQAD player makers
Requirements of the audio community
Part 2
Suggestions and Possible Strategies
Number of channels
Sampling frequency
Precision
Channel coding
Part 3
Proposals
The HQAD bit budget
Specification keypoints for High-Quality Audio Disc
Part 4
Supporting information
Who is taking part in this discussion?
Bibliography and references
Glossary
Acknowledgements
Diagrams
Appendix
High-density formats, facts and assumptions
Document Revision History

Confidentiality. Version 1.2, 23 June 1995. This document has been prepared by the Technical Subcommittee of Acoustic Renaissance for Audio. An earlier version (1.1, released 12 April 1995) was circulated to members of Acoustic Renaissance for Audio and to members of the syndicates developing high-density CD and related items. With this revised version, the circulation is extended to other members of the audio community. This circulation amounts to public-domain distribution. In order that the issue be correctly reported, we request prior notification or consultation if any of this material is to be discussed in the Press.


1995 J. R. Stuart for Acoustic Renaissance for Audio


0 Summary

This document has been prepared as a serious proposal to all concerned in the future of high-density CD formats applied to pure audio.

Its authors present here some consensus views and arguments leading to three important outcomes.

The first is a proposal for a new data type for the compression layer in the MPEG system stream encoded on the high-density discs. This data type carries losslessly compressed linear PCM (packed) audio, and is quite distinct from the currently recognised streams carrying lossy-compressed (data-reduced) channels such as MUSICAM, MPEG Audio, PASC or AC-3.

The second is a working proposal for a disc format, which in its essentials is:

Our proposed format allows from two to eight channels of high-quality linear PCM audio, each of up to 24-bit at 48kHz or 20-bit at 96kHz sampling rates or a combination of both.

The proposed format gives the disc producer the flexibility to make trade-offs between the number of active channels, their bandwidth, their precision, and overall playing time. This flexibility is illustrated in section 11.1.

Compatibility for two-speaker stereo is provided by always assigning two of the eight channels to a two-channel mix.

For more advanced applications, the remaining six channels can be used for surround-sound mixes - including those carrying Ambisonics information. In this way record producers can simultaneously issue their preferred mixes for two to six channels.

We recognise that there are several benefits to including the lossy-compressed version of the audio on the disc alongside the high-quality audio versions. In particular, this approach allows minimum changes in the player architecture and gives very good compatibility for budget systems.

Some suggestions are made about using the high-quality packed (losslessly compressed) audio with MPEG-2 video for some applications, such as high-quality music videos, or with other data such as text and graphics. In essence, we see no reason why the disc format should not permit four types of data simultaneously in the compression layer, namely high-quality audio, lossy-compressed audio, MPEG-1/2 video, and associated data.

The final outcome is an expression throughout the document of concerns, items we considered and items we recommend be incorporated into any resulting standard.

A glossary is included in section 15.

Part 1: Background

1 Introduction

In January 1995 two syndicates released information about high-density CD formats under development. Both the Sony/Philips and Toshiba/WEA syndicates propose a CD carrier for movie delivery based on MPEG-2 variable-rate picture coding and a multichannel/multilingual compressed audio code, using either MUSICAM (MPEG) or Dolby AC-3.

Both syndicates indicated a willingness to discuss other applications of the disc with interested parties; these applications include broadcast, multimedia and pure audio.

This document has been written to appeal to both syndicates on the subject of a high-quality pure-audio application of high-density CD, and to raise awareness in the audio community of the issues involved. We refer to these formats as HQAD (High Quality Audio Disc) to distinguish them from the current Red Book CD-DA standard.

2 Purpose of this document

The purpose of this document is to present the consensus opinion of a sector of the audio industry on suitable format(s) for a high-density CD intended to deliver sound only (HQAD).

The following were important starting points in our thinking:

3 Extent of discussion and authorship

This document is a consensus proposal from the Technical Subcommittee of Acoustic Renaissance for Audio, a free body dedicated to advancing audio quality.

The members of this committee, their advisors and all affiliations are appended in section 13.

In particular, opinions have been obtained from these industry sectors:

4 Requirements of software producers

Indications from the audio-recording community are:

5 Requirements of HQAD player makers

Indications from the player-manufacturing community are:

6 Requirements of the audio community

The audio community is putting considerable effort into pushing the limits of resolution of the current CD-DA channel, using techniques such as noise-shaping, pre-emphasis and buried-data techniques. [10] [2] [15] [21] [1] [24] [25] [22] [27]

An overwhelming requirement of the audio community is to have a carrier that does not use data-reduction methods which essentially throw away those parts of the audio data that are argued to be inaudible, such as MPEG, PASC or AC-3. The HQAD should use linear PCM encoding as a basis.

There is a general consensus that 16-bit 44.1kHz linear PCM is inadequate. The question is how to balance the directions in which the envelope can be pushed outwards.

It would seem to be reasonable to design the channel according to the capabilities of the human receiver, and an obvious set of parameters would accurately encode the entire auditory range of the listener, namely:

  1. Full spherical recording
  2. Frequency range from near DC to at least 25kHz in air
  3. Dynamic range from inaudible to 120dB SPL.
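
As a rough guide to what a 120dB range implies for wordlength, the sketch below applies the textbook rule of about 6.02dB per bit plus 1.76dB for a full-scale sine in an ideal linear PCM quantiser. The rule is an assumption brought in for illustration and is not taken from this proposal; it ignores dither and noise-shaping, which later sections rely on.

```python
import math

def pcm_dynamic_range_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantiser for a full-scale sine."""
    return 6.02 * bits + 1.76

def bits_for_range(range_db: float) -> int:
    """Smallest whole wordlength whose theoretical range covers range_db."""
    return math.ceil((range_db - 1.76) / 6.02)

for bits in (16, 20, 24):
    print(bits, "bit:", round(pcm_dynamic_range_db(bits), 1), "dB")
print("bits for 120 dB:", bits_for_range(120.0))
```

On this simple measure, 16 bits falls well short of 120dB, while a wordlength of about 20 bits covers it.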

6.1 Precision

Here are some relevant factors.

6.2 Frequency range

There is a requirement to sample at a higher rate than that used in the CD-DA, justified as follows:

In addition, we have to remember that the major investment in recording, production and broadcast is in machinery sampling at 48kHz.

6.3 Full spherical recording

While there may be some disagreement about the desirable frequency or dynamic ranges of an audio carrier, no-one can be in any doubt that there are considerable and immediately obvious benefits to reproducing sound from more than just the frontal horizontal quadrant.

There is an increasing awareness of and demand for the increased realism that results from taking 'stereo' beyond two speakers. There is currently considerable interest in surround-sound techniques, and this is fuelled by the better source in CD-DA and DSP. [26]

6.4 Dynamic Range Control

One of the aims of this proposal is to provide a full-dynamic-range option for those consumers who are in a position to exploit it. However, there are circumstances in which, due to high levels of background noise or restricted loudness, the dynamic range may usefully be reduced. Under these circumstances the decoding equipment could control the dynamic range in a manner sympathetic to the programme material.

It has been shown in [13] and [14] that such a control can be provided by analysing the audio during the production process. The full dynamic range is conveyed on the disc, along with control data which can be used to apply dynamic-range reduction when required. This broadcast technique also has advantages in the HQAD application:

Such a process has already been developed and included in the specification of the EU147 Digital Audio Broadcasting (DAB) system.
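
The idea of carrying the full dynamic range together with control data can be illustrated with a minimal sketch. Everything in it is hypothetical: the per-block gain values in dB, the block size and the listener-selected `strength` setting are illustrative names, not fields from the DAB specification or from this proposal.

```python
def apply_drc(samples, gains_db, block_size, strength=1.0):
    """Apply per-block gain control data to full-range samples.

    samples    : list of float samples at full dynamic range
    gains_db   : one gain value (dB) per block, read from the control data
    block_size : number of samples covered by each gain value (hypothetical)
    strength   : 0.0 leaves the audio untouched, 1.0 applies full control
    """
    out = []
    for i, x in enumerate(samples):
        gain_db = gains_db[i // block_size] * strength
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out

quiet_then_loud = [0.01, 0.01, 0.8, 0.8]
control = [12.0, -6.0]   # lift the quiet block, pull down the loud one
print(apply_drc(quiet_then_loud, control, block_size=2, strength=0.0))
print(apply_drc(quiet_then_loud, control, block_size=2, strength=1.0))
```

With `strength` at zero the decoder passes the recording through unchanged, so the listener who can exploit the full range loses nothing. A practical decoder would also smooth the gain between blocks, which this sketch omits.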


Part 2: Suggestions and Possible Strategies

7 Number of channels

The following DVD audio features are assumed:

7.1 Minimum requirement

The pure audio disc (HQAD) needs to be able to work simply with these speaker layouts. We suggest that five full-bandwidth channels are the minimum required.

7.2 Subwoofer feeds

We strongly recommend that music should not be recorded to layouts with a mono subwoofer, since single-subwoofer replay is markedly inferior in terms of energy response and spaciousness. The subwoofer feed should be generated by the end-user's equipment in the equivalent of the surround decoder function. This function will be present in equipment capable of decoding Dolby Surround and should be user-selectable according to customer preference or the capabilities of the loudspeakers.

7.3 Ambisonic process

We have determined that it is possible to take an Ambisonic W, X, Y (and Z if necessary) set and to 'decode' these to provide signals for recording onto the HQAD, and therefore, for reproduction via a standardised five-speaker arrangement.

By this method a soundfield recording or mix can be played simply on a standard 5.1 speaker layout.

For more advanced or higher-performance installations, the five feeds can be decoded back to W, X, Y (and Z) for re-operation into other layouts.

7.4 Effects channel

In the current cinema 5.1 systems, five full-bandwidth channels are augmented by a 0.1 (200Hz bandwidth) bass-effects channel. We suggest that this be used as a channel which adds low-frequency power-handling for special contemporary or experimental material. (It is not required for normal acoustic recordings.)

This sixth channel should be defined as a full-bandwidth channel. It could then be flagged in associated data/subcode as an Effects channel, to be used:

In the event that this channel is unused, or carries only low-frequency information, the data rate on the disc will automatically be reduced by the packing method proposed. Therefore, on material not requiring height information, longer playing times or higher resolution can be chosen at the producer's discretion.

7.5 Two-channel compatibility

In all audio systems there is an issue of down-compatibility.

In considering how two-speaker systems will play surround recordings, we see a number of options and pointers. One less desirable option for two-speaker listening is to use the CD-DA version of the recording. We feel that this option penalises the two-speaker listener, who does not gain access to the highest sound quality. In any event, producers will need to provide an Lt, Rt mixdown for the CD-DA release, and we have examined alternative methods for making this available along with the high-quality surround mix.

We have considered the following options for providing a two-speaker feed:

  1. Convey six channels as L, R, C, E, Ls, Rs. Provide DSP downmixing in all players to matrix to two channels in a manner analogous to the AC-3 downmix of 5.1 to 2. This method is not favoured, as it would add signal-processing to all players and disadvantage the two-channel downmix from an artistic point of view.
  2. Arrange a matrix such that six channels conveyed on the disc are not speaker feeds as L, R, C, E, Ls, Rs, but Lt, Rt, C, E, Ls, Rs. The Lt and Rt would be downmixed 5 to 2 before mastering in such a way that a sophisticated decoder could extract the original L and R. This method requires either known matrices or a subcode to describe the matrices. If the latter approach is taken, then there is a problem designing and fitting out new downmixing equipment that is able to write the associated mix data. A further problem with this scheme arises in processing and re-issuing original two-channel mixes for surround. [11] [8]
  3. Convey eight channels as L, R, C, E, Ls, Rs, with Lt and Rt in addition. This method has the considerable advantage of simplicity for recording companies, mastering houses and hardware makers. There is no real requirement for the Lt and Rt channels to be time-aligned with the surround mix, although that may prove to be beneficial to compression ratios.
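
The matrixing behind options 1 and 2 can be sketched as follows. The -3dB coefficient is an assumed, commonly used value and not one specified in this proposal; the E channel is left out of the mix for brevity, and the function names are illustrative.

```python
import math

G = math.sqrt(0.5)  # -3 dB; an assumed coefficient, not taken from the proposal

def downmix(l, r, c, ls, rs, g=G):
    """Fold one sample of L, R, C, Ls, Rs into a two-channel Lt, Rt pair."""
    lt = l + g * c + g * ls
    rt = r + g * c + g * rs
    return lt, rt

def recover_lr(lt, rt, c, ls, rs, g=G):
    """Option 2: with C, Ls, Rs on the disc and a known matrix, undo the mix."""
    return lt - g * c - g * ls, rt - g * c - g * rs

l, r, c, ls, rs = 0.5, -0.25, 0.1, 0.05, -0.02
lt, rt = downmix(l, r, c, ls, rs)
l2, r2 = recover_lr(lt, rt, c, ls, rs)
print(lt, rt)
print(abs(l2 - l) < 1e-12, abs(r2 - r) < 1e-12)
```

The sketch works in floating point, so recovery is exact only to rounding error. In an integer PCM system the matrix would have to be designed so that rounding is reversible, and the decoder must know the coefficients, which is the reason option 2 needs known matrices or descriptive subcode.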

We are firmly in favour of option 3, so long as packing (lossless compression) is used and the options for full surround are not ignored. Although many sophisticated schemes can be considered for downmixing matrices operating in the players, this strategy will always lead to more working difficulties in production and in replay-hardware design.

However, option 2 remains useful when minimising audio data rate is paramount, as in the case of sending three or five high-quality audio channels with MPEG-2 video.

7.6 Red Book compatibility

It would be very advantageous to evolve to a single inventory disc, by releasing HQAD discs with high-density information on one layer and a conventional CD-DA on the other.

Both the MMCD and SD proposals permit such mixed-mode discs.

In both cases the CD-DA would be placed on the back layer (furthest from the objective) while the HQAD would be on the nearer layer of a two-layer disc.

8 Sampling frequency

Although on the face of it 55kHz is the minimum sampling rate necessary to encode all audible sounds, we recommend a specification that is based on multiples of the 48kHz rate found to be standard in professional audio and in AC-3. We see an advantage in permitting the development of quality improvements that higher sampling rates would bring, and suggest that 48kHz and 96kHz are the only options required.

Although at first sight 96kHz may appear to be grossly wasteful of data rate, this is not the case if packing (lossless compression) is used. In section 10.2 we point out that the packed data rate for 96kHz sampling may typically be only 30% greater than that for the same material sampled at 48kHz, and can be less if the full 40kHz bandwidth offered by 96kHz sampling is reduced.

Having carefully considered all the factors, and assuming lossless packing to be used, we conclude that it is not necessary to cater for compromise sample rates such as 60kHz, 66.15kHz, 72kHz or 88.2kHz.

We further propose that MPEG packed-audio streams be defined for 48kHz and 96kHz and that these are both always present on the disc. In the majority of cases, one or the other will be sent null data, but there are circumstances in which both may be required and this structure allows for more flexible evolution to 96kHz operation on the part of hardware and software providers.

9 Precision

In view of the known dynamic range of human hearing, recording spaces and analogue electronics, we feel comfortable in recommending channels that can obtain the audible equivalent of 21.5-bit precision when noise-shaping or a combination of noise-shaping with pre- and de-emphasis is used.

As a guideline, this would imply a requirement of:

Because pre-emphasis is not helpful to packed audio channels, and because a 14-bit specification is unlikely to find favour, 16 bits at 96kHz is the minimum practical alternative - one which fits in well with existing machinery and interfaces.

10 Channel coding

Decisions about numbers of channels, precision and sampling rate converge on a 'bit budget'.

10.1 Signal processing

Linear and psychoacoustically correct coding methods are known which can improve the performance of linear-PCM channels. The two principal methods are noise-shaping and pre-/de-emphasis. [10] [2] [1] [24]

Noise-shaping can be made open-choice at the discretion of the recording producer.

Pre-/de-emphasis requires standardisation. Although there are better choices, the standard CD-DA 50/15µs characteristic will have to be provided by any player capable of playing CD-DA as well as HQAD, and so should not be ruled out as an option.

10.2 Lossless coding

We strongly recommend that the high-quality audio channels be losslessly coded (packed). Signal processing has advanced to the state where the data-reduction benefits of such coding are too good to pass by. Unlike perceptual or lossy data reduction, lossless coding does not alter the final decoded transmitted signal in any way, but merely 'packs' the audio data more efficiently into a smaller data rate.

Existing lossless audio data compression systems are optimised for reducing average data rate, but not for reduction of peak data rate or for optimum results at high sampling rates such as 96kHz. We have determined simple-to-decode methods optimised for these latter requirements.

The process of packing PCM becomes more efficient as sampling rate is increased. For example, packed 96kHz audio does not double the data rate of packed 48kHz audio, as one might expect; the increase is more like 30%.
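
A toy example shows why packing is exact and why it tends to gain from a higher sampling rate. The sketch below uses a generic second-order fixed predictor with a fixed-width residual field. This is a textbook scheme chosen for illustration and is not the packing method referred to in this proposal; a pure 1kHz tone is also far more predictable than music, so the numbers it prints say nothing about real programme material.

```python
import math

def sine_pcm(freq_hz, fs_hz, n, amplitude=30000):
    """Integer PCM samples of a sine wave (a stand-in for smooth audio)."""
    return [round(amplitude * math.sin(2 * math.pi * freq_hz * i / fs_hz))
            for i in range(n)]

def pack(x):
    """Second-order fixed prediction: keep two warm-up samples, then residuals."""
    residual = [x[i] - 2 * x[i - 1] + x[i - 2] for i in range(2, len(x))]
    return x[:2], residual

def unpack(warmup, residual):
    """Exact inverse of pack(): integer arithmetic, so nothing is lost."""
    x = list(warmup)
    for e in residual:
        x.append(e + 2 * x[-1] - x[-2])
    return x

def residual_bits(residual):
    """Bits per sample for a fixed-width signed field holding every residual."""
    return max(abs(e) for e in residual).bit_length() + 1

rates = {}
for fs in (48000, 96000):
    x = sine_pcm(1000, fs, fs // 10)      # 0.1 s of a 1kHz tone
    warmup, residual = pack(x)
    assert unpack(warmup, residual) == x   # bit-exact round trip
    bits = residual_bits(residual)
    rates[fs] = fs * bits
    print(fs, "Hz:", bits, "bits/sample,", rates[fs], "bit/s")
print("rate ratio 96k/48k:", round(rates[96000] / rates[48000], 2))
```

Because adjacent samples lie closer together at 96kHz, the prediction residual is smaller and needs a narrower field, so the packed rate rises by less than a factor of two even in this crude scheme.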

Packing offers the opportunity to make a much better product. It allows us to convey more precision on more channels, but also gives a lot of open-ended flexibility to the user - as can be seen in some of the examples quoted in Table 3.

10.3 Lossless coding guidelines

We are aware of relatively simple-to-decode packing and unpacking techniques that should allow the lossless data compression shown in the table below for five or more associated channels. We anticipate that higher compression rates can be obtained with development over a relatively short period. (Compression is shown as the saving in bits per sample per channel.)


Table 1: Data-rate reduction (bits per sample per channel)

Sampling rate (kHz)      Peak      Average
48                       0         6
96                       5         8



Different musical material compresses by different amounts with lossless packing, with material having narrow dynamic range and high treble energies compressing less well. Lossless coding algorithms can be chosen with less compressible material in mind, giving an overall improvement in degree of data-rate reduction. The degree of data-rate reduction will be greater for highly compressible audio material such as most classical music - for which absolute disc duration is most critical.

10.4 Pre-emphasis and lossless coding

We have determined that, for packed channels, the use of pre- and de-emphasis gives no advantage in coded-data rate for a given noise performance. Therefore, pre- and de-emphasis are only of benefit when used with linear PCM channels that are not losslessly coded, and should not be used with losslessly coded channels.

10.5 Noise-shaping and lossless coding

Psychoacoustic noise-shaping of the PCM audio channels may be used, along with lossless coding, to create a packed channel with perceptual improvements of about 3 bits at 48kHz and 5.5 bits at 96kHz.

For 96kHz recordings some noise-shaping is encouraged, in order to optimise the subjective dynamic range and overall data rate.
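
The figures above can be related to the 21.5-bit target of section 9 by simple addition. The sketch assumes that the perceptual gain adds directly to the stored wordlength, which is a simplification made here for illustration.

```python
# Perceptual gain from noise-shaping, in bits, as quoted in section 10.5
NOISE_SHAPING_GAIN_BITS = {48000: 3.0, 96000: 5.5}

def effective_bits(wordlength, fs_hz):
    """Audible-equivalent precision of a noise-shaped channel."""
    return wordlength + NOISE_SHAPING_GAIN_BITS[fs_hz]

for fs, bits in ((48000, 20), (96000, 16)):
    print(f"{bits}-bit at {fs // 1000}kHz -> about {effective_bits(bits, fs)} bits")
```

On this reckoning a noise-shaped 16-bit channel at 96kHz reaches the 21.5-bit equivalent, and a 20-bit channel at 48kHz exceeds it.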

10.6 Lossless coding for flexible wordlength

It is possible to design the lossless-coding specification in such a way that at the mastering stage the record producer can make a personal trade-off between playing time, frequency range, number of active channels and precision. The packed channel can convey this choice implicitly in its control data, and the system operation will be transparent to the user.

This scenario has the following benefits:

By way of examples:

  1. playing time or precision may be extended by pre-filtering information above 30kHz
  2. playing time or precision may be extended by only supplying a 2, 3 or 4-channel mix.

The technical standard for lossless coding can specify the maximum input and output wordlength, possibly as either 20 or 24 bit. In addition, the standard can be arranged so that choices regarding input wordlength, number of active channels and bandwidth are automatically handled by the coding, without manual intervention by producer or end-user.

10.7 High-quality audio with MPEG-2 video

Consideration should also be given to using the proposed high-quality packed audio alongside MPEG-2 video data and the compressed audio AC-3/MUSICAM.

It seems likely that there are a number of applications that will benefit from different compromises between audio quality, video quality and playing time from those made in the movie versions of the discs.

For example, slowly-moving or graphical video data could accompany high-quality surround recordings. Alternatively, for some types of music, such as opera, MPEG-2 pictures could accompany a high-quality sound track using two channels of packed audio (e.g. Lt and Rt at 48kHz and 20-bit nominal).

Within a common standard, producers could choose between a number of viable high-quality options.

10.8 HQAD player concept

Figures 2 and 3 show outline player architectures.

Part 3: Proposals

11 The HQAD bit budget

Table 2 estimates the bit budget for HQAD and compares it to CD-DA. Within the constraints of close to 74 minutes' playing time and 9Mb/s (11Mb/s) peak data rate for SD (MMCD), several options exist, including those shown.

We have allocated eight channels to audio, 384kb/s to a parallel lossy-compressed AC-3 or MUSICAM channel, and a 176kb/s channel to a parallel data or subcode channel.

In the table above there are three columns describing data rate in Mb/s. The first, labelled 'Input', is the worst-case rate of data in the uncompressed recording being fed to the mastering process. The second column, labelled 'Ch Peak', gives the expected maximum data rate in the packed channel - i.e. on the disc. The last column shows the average disc data rate and is used to compute playing time.

These figures assume a lossless packing scheme optimised for peak rate reduction. They also assume 48kHz 20-bit or 96kHz 16-bit signals having relatively moderate compressibility and the compression ratios given in Table 1.

If one or more of the channels is unmodulated, e.g. if the C channel or the E channel or the Ls and Rs channels are not used, or are modulated with a highly compressible signal such as a bass-effects channel, then the data rates will be smaller than those shown, and playing times will be longer.
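
The arithmetic behind such a budget can be sketched from the numbers quoted in this section and in Table 1. The sketch makes its own assumptions: the 'Input' rate counts the eight PCM channels only, while the lossy and data channels are added to the peak and average rates. Its results may therefore differ from the entries in Table 2.

```python
CHANNELS = 8
LOSSY_BPS = 384_000      # parallel AC-3 or MUSICAM stream
DATA_BPS = 176_000       # parallel data or subcode channel
SAVINGS = {48000: (0, 6), 96000: (5, 8)}   # Table 1: (peak, average) bits saved

def budget(fs_hz, wordlength):
    """Return (input, channel peak, average) total rates in bit/s."""
    peak_saving, avg_saving = SAVINGS[fs_hz]
    overhead = LOSSY_BPS + DATA_BPS        # assumed to ride alongside the audio
    input_rate = CHANNELS * fs_hz * wordlength
    peak = CHANNELS * fs_hz * (wordlength - peak_saving) + overhead
    average = CHANNELS * fs_hz * (wordlength - avg_saving) + overhead
    return input_rate, peak, average

def capacity_needed_bytes(average_bps, minutes=74):
    """Disc capacity required to play for the given time at the average rate."""
    return average_bps * minutes * 60 // 8

for fs, bits in ((48000, 20), (96000, 16)):
    inp, peak, avg = budget(fs, bits)
    print(f"{fs // 1000}kHz/{bits}-bit: input {inp / 1e6:.2f}, "
          f"peak {peak / 1e6:.2f}, average {avg / 1e6:.2f} Mb/s, "
          f"{capacity_needed_bytes(avg) / 1e9:.2f} GB for 74 min")
```

Setting a channel's wordlength to zero, or reducing `CHANNELS`, shows how unused channels lengthen the playing time for a fixed capacity.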

11.1 Examples illustrating the flexible disc capacity

Table 2 shows nominal capacity for the proposed HQAD. However, the use of lossless 'packing' gives a very flexible structure to the disc. By specifying a mastering system which can accept:

we effectively construct a carrier in which the producer can make the trade-off between numbers of channels, frequency range, precision and playing time.

The mastering process can embed precision information in the data stream, which has the added benefit that the standardisation process does not need to anticipate all the options - neither is subcode required to control the replay process.

The packing process can effectively provide a continuum of sampling rates between 48kHz and 96kHz, providing the input to the compression process is effectively low-pass filtered - the less information there is at high frequencies, the higher the compression ratio becomes.


In the table above, we illustrate some extremes of this flexible use. The duration is calculated on the basis of MMCD capacity.

12 Specification keypoints for High-Quality Audio Disc

12.1 Mandatory

12.2 Channels

  1. Up to eight full-bandwidth (i.e. DC to the Nyquist frequency, half the sampling rate) channels of high-quality sound.

The basic data shall be, at the disc producer's discretion:

  1. Two speaker feeds Lt, Rt, of any origin but including a mixdown from the surround, and either:

These options shall be recognisable in co-temporal subcode and/or in a header at the start of the disc. (There are significant advantages to having both.)

12.3 Channel coding

  1. Linear PCM
  2. Lossless compression (packing) applied to all eight channels

12.4 Precision and sampling frequency

Two disc-maker options permitted:

  1. 48kHz, 24-bit maximum, or
  2. 96kHz, 24-bit maximum

Normally 20 bits would be used at 48kHz and perhaps only 16 bits at 96kHz; however, the producer has the option to use more data with material that compresses well, or when some channels are not used, subject to the maximum allowed data rate in the packed channel. See the examples in Table 3.

12.5 MPEG type audio data stream

It is recommended that the MPEG-compatible packed audio stream should have, in all eight channels, separate fields for 48kHz-sampled and 96kHz-sampled signal components. These fields may be set to zero or to null status if not used, and will then occupy virtually no data rate in packed form. In the basic use, as described in section 0, either one or other of the fields would be null.

12.6 Pre- and de-emphasis

Not permitted for packed channels.

12.7 Two-channel compatibility

By provision of Lt and Rt at high precision.

12.8 Budget player compatibility

By provision of a lossy-compressed AC-3, MUSICAM or similar mix.

12.9 Digital outputs

Several requirements are highlighted:

12.10 Additional data channels

12.11 3-speaker support

To allow a simple standard method of downmixing the surrounds (Ls and Rs) into three front speakers, additional support is suggested in the data channels or header of 12.10 as follows:

12.12 Absolute sound level datum

The header information of section 12.10 should include an indicator of the reproduced sound-pressure level by defining the acoustic gain required in the playback system. A code should be present that indicates 'not known'.

It is possible that players could use this information to 'level' loudness on successive recordings.

Part 4: Supporting information

13 Who is taking part in this discussion?

Discussion-group members, with their relevant affiliations, are as follows.

13.1 Technical Subcommittee, Acoustic Renaissance for Audio

Tony Griffiths. Technical Director Decca Recording Company. Fellow Audio Engineering Society, Member Acoustic Renaissance for Audio, Chairman Technical Subcommittee National Sound Archive, Member IEE, Member Royal Television Society.

Professor Malcolm Hawksford. University of Essex. Fellow Audio Engineering Society, Fellow Institute of Acoustics, Fellow IEE, Member Acoustic Renaissance for Audio.

David Meares. R&D Manager (Audio & Acoustics), BBC Research & Development Department. Fellow Institute of Acoustics, Member Acoustic Renaissance for Audio, Member IEE.

Bob Stuart. Chairman and Technical Director, Meridian Audio Ltd. Visiting Fellow Essex University, Fellow Audio Engineering Society, Member Acoustical Society of America, Chairman Acoustic Renaissance for Audio, Member XtraBits, Member Technical Subcommittee National Sound Archive, Member IEE and IEEE.

13.2 Advisors

Peter Craven. Consultant. Member Audio Engineering Society, Member XtraBits.

Michael Gerzon. Consultant. Gold Medallist and Fellow Audio Engineering Society, Member XtraBits, Member Acoustic Renaissance for Audio.

Hiro Negishi. Director D&D Centre, Canon Inc. Member Audio Engineering Society, Member Institute of Acoustics, Founder Acoustic Renaissance for Audio.

Francis Rumsey. University of Guildford. Member Audio Engineering Society, Member Acoustic Renaissance for Audio.

Chris Travis. Division Ltd. Member Audio Engineering Society.

14 Bibliography and references

1 Akune, M., Heddle, R.M. and Akagiri, K. 'Super Bit Mapping: Psychoacoustically Optimized Digital Recording', AES 93rd Convention San Francisco, preprint 3371 (1992)

2 Craven, P.G. and Gerzon, M.A. 'Compatible Improvement of 16-Bit Systems Using Subtractive Dither' AES 93rd Convention San Francisco, preprint 3356 (1992)

3 Dunn, J. 'High Dynamic Range Audio Applications for Digital Signal Processing', 93rd AES Convention, San Francisco, preprint 3434 (Oct. 1992)

4 Fielder, L. 'Dynamic Range Issues in the Modern Digital Audio Environment' Proceedings AES UK Conference 'Managing the Bit Budget', 3-19 (May 1994)

5 Hawksford, M.O. 'Digital Frontiers', HiFi News, 40, no.2, 58-59 and 106 (Feb. 1995)

6 Gerzon, M.A. 'Periphony: With-Height Sound Reproduction' J. Audio Eng. Soc., 21, 2-10 (Jan/Feb 1973)

7 Gerzon, M.A. 'Ambisonics in Multichannel Broadcasting and Video' J. Audio Eng. Soc., 33, 859-871 (Nov 1985)

8 Gerzon, M.A. 'Optimal Reproduction Matrices for Multispeaker Stereo' J. Audio Eng. Soc., 40, 571-589 (July/Aug. 1992)

9 Gerzon, M.A. 'Problems of error-masking in audio data compression systems' AES 90th Convention, preprint 3013 (Feb. 1991)

10 Gerzon, M.A., Craven, P.G., Stuart, J.R. and Wilson, R.J. 'Psychoacoustic Noise Shaped Improvements in CD and Other Linear Digital Media' 94th AES Convention, Berlin preprint 3501 (March 1993)

11 Gerzon, M.A. 'Hierarchical System of Surround Sound Transmission for HDTV', 92nd AES Convention, Vienna, preprint 3339 (March 1992)

12 Gerzon, M.A. and Barton, G.J. 'Ambisonic Decoders for HDTV', 92nd AES Convention, Vienna, preprint 3345 (March 1992)

13 Gilchrist, N.H.C. 'DRACULA: dynamic range control for broadcasting and other applications' 18, 36-47, Tonmeistertagung, Karlsruhe (article in English) (1994).

14 Hoeg, W., Gilchrist, N., Twietmeyer, H. and Juenger, H. 'Dynamic Range Control (DRC) and music/speech control (MSC)' EBU Technical Review, 56-70 (Autumn 1994)

15 Komamura, M. 'Wideband and wide dynamic-range recording and reproduction of digital audio' AES 96th Convention, Amsterdam, preprint 3844 (1994)

16 Meares, D.J. 'Perceptual Attributes of Multichannel Sound' Proceedings of AES 12th International Conference 'The Perception of Reproduced Sound', 171-179 (June 1993)

17 Meares, D.J. 'High definition sound for high definition television', Proceedings of the AES 9th International Conference 'Television Sound, Today and Tomorrow', Detroit, 187-215 (Feb. 1991)

18 Meares, D.J. and Stoll, G. 'Sound systems in Digital Television', Technical Module of DVB, Document 1172, EBU (1993)

19 Ohashi, T., Nishina, E., Kawai, N., Fuwamoto, Y. and Imai, H. 'High Frequency Sound Above the Audible Range Affects Brain Electrical Activity and Sound Perception', 91st AES Convention, New York, preprint 3207 (Oct. 1991)

20 Ohashi, T., Nishina, E., Fuwamoto, Y. and Kawai, N. 'On the Mechanism of "Hypersonic Effect"', Proceedings Int'l Computer Music Conference, Tokyo, 432-434 (1993)

21 Oomen, A.W.J., Groenwegen, R.G., van der Waal, R.G. and Veldhuis, R.N.J. 'A Variable-Bit-Rate Buried-data Channel for Compact Disc' J. Audio Eng. Soc., 43, 23-28 (Jan./Feb. 1995)

22 Stuart, J.R. and Wilson, R.J. 'A search for efficient dither for DSP applications' AES 92nd Convention, Vienna, preprint 3334 (1992)

23 Stuart, J.R. 'Noise: Methods for Estimating Detectability and Threshold' J. Audio Eng. Soc., 42, 124-140 (March 1994)

24 Stuart, J.R. and Wilson, R.J. 'Dynamic Range Enhancement Using Noise-shaped Dither Applied to Signals with and without Pre-emphasis' AES 96th Convention, Amsterdam, preprint 3871 (1994)

25 Stuart, J.R. 'Auditory modelling related to the bit budget' Proceedings of AES UK Conference 'Managing the Bit Budget', 167-178 (1994)

26 Stuart, J.R. 'Perceptual issues in Multichannel environments' 97th AES Convention, San Francisco (1994)

27 Vanderkooy, J. and Lipshitz, S.P. 'Digital Dither: Signal Processing with Resolution Far Below the Least Significant Bit' AES 7th International Conference - Audio in Digital Times, Toronto, 87-96 (1989)

15 Glossary

AC-3
A system for perceptually encoding at a reduced data rate both two-channel stereo and 5.1-channel surround sound. AC-3 has been developed by Dolby Laboratories; it is used in many motion-picture films and LaserDisc releases, and has been selected for television broadcast in the USA.
Ambisonics
A method of recording and playing back directional sound over all horizontal directions, or the full sphere of directions including height, based on transmitting directional components of the sound field rather than loudspeaker feeds, and of reproducing the sound field by deriving signals psychoacoustically optimised for the user's specific loudspeaker layout.
CD-DA
Red Book CD for Digital Audio.
DVD (Digital Video Disc)
A collective term for high-density CDs carrying MPEG-2 encoded variable-rate video with lossy-compressed audio.
DSP
Digital signal processing.
HQAD (High-Quality Audio Disc)
New format high-density CD applied to audio, as proposed in this document.
Lossless compression
A process by which the data of a PCM audio signal can be packed more efficiently into a channel. Computer users will be familiar with algorithms such as ZIP and LZW, which allow more efficient use of disc storage; lossless compression of audio does not work in the same way, but it has the same effect. Less space is used on the disc and, importantly, the data rate is reduced. Unlike lossy compression, a lossless system returns the input data exactly at the decoder output. For clarity, in this document losslessly compressed PCM is referred to as 'packed audio'.
Lossy compression
A process by which an audio signal is examined from a human-psychoacoustic viewpoint. An algorithm attempts to estimate and remove the inaudible components of the signal. The remaining 'audible' component is efficiently coded in the output channel. Lossy compression schemes include MPEG audio, PASC, AC-3 and MUSICAM. The data recovered from a matched decoder is not identical to the original input, although it may sound very similar.
MMCD
The trademark for the Philips/Sony high-density disc described in the 'Gold Book'.
MUSICAM
A system for perceptually encoding at a reduced data rate both two-channel stereo and multichannel surround sound using the MUSICAM Surround version. The two-channel version of MUSICAM forms layer 2 of MPEG-1.
MPEG
'Moving Picture Experts Group' refers to standards for perceptual coding at a reduced data rate of video and sound signals. MPEG-1 and MPEG-2 are respectively video-coding standards for medium and high-quality use, and MPEG-1 layers 1, 2 and 3 are systems for perceptually encoding two-channel stereo sound.
Packed audio
The data resulting when a linear PCM audio stream is losslessly compressed.
Packing
The process of losslessly compressing linear PCM audio.
PASC
A system for perceptually coding two-channel stereo sound at a reduced data rate, developed by Philips and used in the Digital Compact Cassette. PASC is closely related to MPEG layer 1.
PCM
Pulse code modulation. A method of coding whereby a signal is represented by a series of samples taken at discrete instants, each quantised to a digital value.
SD
The code abbreviation for the Toshiba/WEA 'Super-Density' disc.
Unpacking
The process of decoding losslessly compressed (packed) audio back into the original linear PCM full-rate data.
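
The distinction between packing and lossy coding drawn in the entries above can be illustrated with a minimal Python sketch. It is not the packing scheme proposed in this document: zlib (a general-purpose compressor of the ZIP family) is used purely as a stand-in, and the test tone, the function names and the 16-bit little-endian sample layout are assumptions made for the illustration. The point shown is only the defining property of lossless coding, namely that unpacking returns the PCM data bit for bit.

```python
import math
import struct
import zlib


def make_pcm16(n_samples=4800, freq_hz=440.0, rate_hz=48000, amplitude=8000):
    """Return a test tone as little-endian 16-bit linear PCM bytes (assumed layout)."""
    samples = [
        int(round(amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)))
        for i in range(n_samples)
    ]
    return struct.pack("<%dh" % n_samples, *samples)


def pack(pcm_bytes):
    """'Packing': losslessly compress PCM. zlib is only a stand-in codec here."""
    return zlib.compress(pcm_bytes, 9)


def unpack(packed_bytes):
    """'Unpacking': recover the original full-rate PCM data."""
    return zlib.decompress(packed_bytes)


pcm = make_pcm16()
packed = pack(pcm)
restored = unpack(packed)

print(len(pcm), "bytes of PCM ->", len(packed), "bytes packed")
print("bit-exact round trip:", restored == pcm)
```

A lossy coder would fail the final comparison by design, since its decoder output is not identical to the input even when it sounds very similar.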

16 Acknowledgements

Dolby, Dolby Surround, Pro Logic and AC-3 are trademarks of Dolby Licensing Inc.

MUSICAM, PASC, MPEG and DVD are registered trademarks.

HDCD is a trademark of Pacific Microsonics.

Ambisonics is a registered trademark of Nimbus Records Ltd.

MMCD is a registered trademark of Philips/Sony licensing.

SD is a registered trademark.

17 Diagrams



18 Appendix: High-density formats, facts and assumptions

18.1 Toshiba/WEA

Codename: SD, TAZ

Disc diameter: 120mm

Disc thickness: 1.2mm (two 0.6mm bonded)

Memory capacity: 5GB per side

Track pitch: 0.725 micrometer

Laser: 650nm

Numerical aperture (N.A.): 0.6

Error correction: RS-PC

Modulation: Not 8-14 (4 to 9)

Play time:

  1. Movies: 135 min/layer @ 4.94Mb/s average
  2. Broadcast: 74 min/layer @ 9Mb/s average

Picture code: MPEG-2 variable rate 1-9Mb/s

Audio code: AC-3. Minimum 3 language and 4 subtitle channels; up to 8 language and 32 subtitle channels

Other features: Multiple aspect ratio, parental lock-out, 2-layer (sided) format

18.2 Sony/Philips

Codename: MMCD

Disc diameter: 120mm

Disc thickness: 1.2mm

Memory capacity: 3.7GB per layer

Track pitch: 0.84 micrometer

Laser: 635nm

Numerical aperture (N.A.): 0.52

Error correction: RS-PC 'CIRC Plus'

Modulation: EFM Plus

Sector size: 2048 bytes

Program area: 23/58 mm

Play time:

  1. Movies: 135 min
  2. Broadcast: Not known

Picture code: MPEG-2 variable rate 1-11.2Mb/s, average 3.7Mb/s

Audio code: MPEG-2 Layer 2

Other features: 2-layer single-sided format
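
The play times quoted in 18.1 and 18.2 can be cross-checked against the stated capacities and average data rates. The following Python sketch does the arithmetic, assuming that 'GB' means 10^9 bytes and 'Mb/s' means 10^6 bits per second (neither convention is stated in the figures above); the function names are illustrative only.

```python
GB = 1e9  # assumption: decimal gigabytes
MBPS = 1e6  # assumption: 1 Mb/s = 10**6 bits per second


def play_time_minutes(capacity_bytes, rate_bits_per_s):
    """Minutes of programme that fit in capacity_bytes at a given average rate."""
    return capacity_bytes * 8 / rate_bits_per_s / 60


def average_rate_mbps(capacity_bytes, minutes):
    """Average rate (Mb/s) that exactly fills capacity_bytes in the given time."""
    return capacity_bytes * 8 / (minutes * 60) / MBPS


# Toshiba/WEA SD: 5GB per side
sd_movie_min = play_time_minutes(5 * GB, 4.94 * MBPS)
sd_broadcast_min = play_time_minutes(5 * GB, 9 * MBPS)

# Sony/Philips MMCD: 3.7GB per layer, 135-minute movie
mmcd_avg_mbps = average_rate_mbps(3.7 * GB, 135)

print("SD movie:     %.1f min" % sd_movie_min)
print("SD broadcast: %.1f min" % sd_broadcast_min)
print("MMCD average: %.2f Mb/s" % mmcd_avg_mbps)
```

Under these assumptions the SD figures come out at about 135 and 74 minutes, in agreement with the quoted play times, and filling the MMCD layer in 135 minutes implies an overall average of roughly 3.65Mb/s, close to the 3.7Mb/s average quoted for the picture code.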

18.3 Conjecture on player architecture

18.4 Assumptions on Toshiba/WEA SD

18.5 Assumptions on Philips/Sony MMCD

Following the CD decoder we expect to see a stock audio MUSICAM or MPEG audio decoder which outputs two channels to on-board Lt and Rt DACs running at 48kHz.

19 Document revision history

19.1 Version 1.1

Original version released 12 April 1995.

19.2 Version 1.2

The embargo was lifted and the text updated.

