This is an archived Web site from the Library of Congress

http://www.m4if.org/resources/profiles/audio.php

Archived: 01/08/2010 at 15:22:36

Levels for Audio Profiles

Compiled by Fernando Pereira

This document lists the profiles and levels specified in both Version 1 and Version 2 of MPEG-4 Audio [MPEG4-3]. As of October 2001, the MPEG-4 Audio standard defines 23 object types and 8 profiles, as explained in Chapter 13.

Four audio profiles have been defined in MPEG-4 Audio Version 1: Main, Scalable, Speech, and Synthetic. Four additional profiles were specified in MPEG-4 Audio Version 2: High Quality Audio, Low Delay, Natural Audio, and Mobile Audio Internetworking (MAUI).

Chapter 13 presents tables showing which audio coding tools each audio object type includes and which audio object types each audio profile includes.

B.1 COMPLEXITY UNITS

Complexity units give an approximation of the decoder complexity, in terms of the processing power and RAM usage required to process MPEG-4 Audio bitstreams, as a function of specific parameters.

The approximated processing power is given in Processor Complexity Units (PCU), specified in integer numbers of MOPS (Millions of Operations Per Second). The approximated RAM usage is given in RAM Complexity Units (RCU), specified in (mostly) integer numbers of kWords (1000 Words). The RCU numbers do not include working buffers that can be shared between different objects and/or channels.

If a level for a profile is specified by the maximum number of complexity units, then a flexible configuration of the decoder handling different types of objects is allowed under the constraint that both values for the total decoding complexity and sampling rate conversion (if needed) do not exceed this limit.
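As a rough illustration of such a flexible configuration, the sketch below adds up per-channel PCU and RCU values for a small set of objects. The figures are copied from Table B.1; the names `AudioObject`, `TABLE_B1` and `total_complexity` are hypothetical, and linear scaling of PCU with the sampling frequency is an assumption based on note 1 of that table (RCU is left unscaled).

```python
from dataclasses import dataclass

# A few rows of Table B.1:
# (PCU per channel, RCU per channel, reference fs in kHz, PCU scales with fs?)
TABLE_B1 = {
    "AAC Main": (5.0, 5.0, 48.0, True),
    "AAC LC": (3.0, 3.0, 48.0, True),
    "CELP 8 kHz": (1.0, 1.0, 8.0, False),
}

# Sampling rate conversion row of Table B.1 (per channel).
SRC_PCU, SRC_RCU = 2.0, 0.5


@dataclass
class AudioObject:
    object_type: str
    channels: int
    fs_khz: float
    needs_src: bool = False  # True if sampling rate conversion is required


def total_complexity(objects):
    """Return (total PCU, total RCU) for a list of AudioObject instances."""
    pcu = 0.0
    rcu = 0.0
    for obj in objects:
        ref_pcu, ref_rcu, ref_fs, scales = TABLE_B1[obj.object_type]
        # Assumption: PCU scales linearly with fs where note 1 applies.
        scale = obj.fs_khz / ref_fs if scales else 1.0
        pcu += obj.channels * ref_pcu * scale
        rcu += obj.channels * ref_rcu
        if obj.needs_src:
            pcu += obj.channels * SRC_PCU
            rcu += obj.channels * SRC_RCU
    return pcu, rcu


scene = [
    AudioObject("AAC LC", channels=2, fs_khz=48.0),
    AudioObject("CELP 8 kHz", channels=1, fs_khz=8.0, needs_src=True),
]
print(total_complexity(scene))
```

The totals can then be compared with the PCU and RCU limits of the level in question.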

Table B.1 gives complexity estimates for the different audio object types:

Table B.1 Complexity of audio object types

Object type | Parameters | PCU (MOPS per channel) | RCU (kWords per channel)
AAC Main (1) | fs = 48 kHz | 5 | 5
AAC LC (1) | fs = 48 kHz | 3 | 3
AAC SSR (1) | fs = 48 kHz | 4 | 3
AAC LTP (1) | fs = 48 kHz | 4 | 4
AAC Scalable (1, 2) | fs = 48 kHz | 5 | 4
TwinVQ (1) | fs = 24 kHz | 2 | 3
CELP | fs = 8 kHz | 1 | 1
CELP | fs = 16 kHz | 2 | 1
CELP | fs = 8/16 kHz (bandwidth scalable) | 3 | 1
HVXC | fs = 8 kHz | 2 | 1
General MIDI | - | 4 | 1
TTSI (4) | - | - | -
Wavetable Synthesis | fs = 22.05 kHz | Depends on bitstreams (3) | Depends on bitstreams (3)
Main Synthetic | - | Depends on bitstreams (3) | Depends on bitstreams (3)
Algorithmic Synthesis and AudioFX | - | Depends on bitstreams (3) | Depends on bitstreams (3)
Sampling Rate Conversion | rf = 2, 3, 4, 6 | 2 | 0.5
ER AAC LC (1) | fs = 48 kHz | 3 | 3
ER AAC SSR (1) | fs = 48 kHz | 4 | 3
ER AAC LTP (1) | fs = 48 kHz | 4 | 4
ER AAC Scalable (1, 2) | fs = 48 kHz | 5 | 4
ER TwinVQ (1) | fs = 24 kHz | 2 | 3
ER BSAC (1) | fs = 48 kHz (input buffer size = 26000 bits) | 4 | 4
ER BSAC (1) | fs = 48 kHz (input buffer size = 106000 bits) | 4 | 8
ER AAC LD (1) | fs = 48 kHz | 3 | 2
ER CELP | fs = 8 kHz | 2 | 1
ER CELP | fs = 16 kHz | 3 | 1
ER HVXC | fs = 8 kHz | 2 | 1
ER HILN (2) | fs = 16 kHz, ns = 93 | 15 | 2
ER HILN (2) | fs = 16 kHz, ns = 47 | 8 | 2
ER Parametric (5, 6) | fs = 8 kHz, ns = 47 | 4 | 2

fs - sampling frequency; rf - ratio of sampling rates; ns - maximum number of sinusoids being synthesized. Numbers in parentheses refer to the notes below.

Notes:

  1. PCU proportional to sampling frequency.
  2. Includes core decoder.
  3. See MPEG-4 Part 4: Conformance Testing [MPEG4-4].
  4. The complexity for speech synthesis is not taken into account.
  5. Parametric coder in HILN mode; for HVXC mode, see ER HVXC.
  6. PCU depends on fs and ns, see next paragraph.

The computational complexity of HILN depends on the sampling frequency fs and the maximum number of sinusoids ns to be synthesized simultaneously. The value of ns for a frame is the total number of harmonic and individual lines synthesized in that frame, i.e. the number of starting plus continued plus ending lines. For fs in kHz, the PCU in MOPS is calculated as follows:

PCU = (1 + 0.15*ns) * fs / 16

The typical maximum values of ns are 47 for 6 kbit/s and 93 for 16 kbit/s HILN bitstreams.
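The formula can be checked against the HILN and Parametric rows of Table B.1 with a few lines of code. This is only a sketch; the function name `hiln_pcu` is hypothetical.

```python
def hiln_pcu(fs_khz: float, ns: int) -> float:
    """PCU in MOPS for HILN: (1 + 0.15 * ns) * fs / 16, with fs in kHz."""
    return (1 + 0.15 * ns) * fs_khz / 16


# Parameter pairs (fs in kHz, ns) taken from Table B.1.
for fs_khz, ns in [(16, 93), (16, 47), (8, 47)]:
    print(f"fs = {fs_khz} kHz, ns = {ns}: PCU = {hiln_pcu(fs_khz, ns):.3f} MOPS")
```

Rounded to the nearest integer, the three results are 15, 8 and 4 MOPS, matching the integer PCU values listed in the table.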

B.2 DEFINITION OF LEVELS FOR AUDIO PROFILES

The levels for the audio profiles are specified using the complexity units defined above. A number of 0 stages of interleaving for the EP (error protection) tool indicates that error protection is not used for that particular level. The notation used to specify the number of audio channels indicates the number of full bandwidth channels and the number of low-frequency enhancement channels. For example, 5.1 indicates 5 full bandwidth channels and one low-frequency enhancement channel.

B.2.1 Main Profile

Since the Main Audio profile includes all natural and synthetic object types, levels are defined as a combination of the two different types of levels using the two different metrics defined for natural audio tools (computation-based metrics) and synthetic audio tools (macro-oriented metrics).

Four levels are defined for the Main Audio profile as a combination of constraints to the set of natural (corresponding to the object types not included in the Synthetic profile) and synthetic objects (corresponding to the object types in the Synthetic profile) in the scene.

For the set of natural audio objects in the scene, the following constraints are defined using the complexity units (PCU, RCU):

  • Natural Audio 1: PCU < 40, RCU < 20
  • Natural Audio 2: PCU < 80, RCU < 64
  • Natural Audio 3: PCU < 160, RCU < 128
  • Natural Audio 4: PCU < 320, RCU < 256

For the set of synthetic audio objects in the scene, the constraints correspond to the three levels defined in section B.2.4 for the Synthetic profile.

In conclusion, the four levels for the Main Audio profile are defined by the combination of the constraints on the natural and synthetic objects as:

  • Level 1 - Natural Audio 1 + Synthetic Audio Level 1
  • Level 2 - Natural Audio 2 + Synthetic Audio Level 1
  • Level 3 - Natural Audio 3 + Synthetic Audio Level 2
  • Level 4 - Natural Audio 4 + Synthetic Audio Level 3
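The combination rule can be sketched as a small lookup. The limits and pairings below are copied from the lists above; the function names are hypothetical, and the code assumes that a higher level also admits content that fits a lower one, which the text does not state explicitly.

```python
# (PCU limit, RCU limit) for Natural Audio 1..4; both limits are strict ("<").
NATURAL_LIMITS = [(40, 20), (80, 64), (160, 128), (320, 256)]

# Main profile level -> (Natural Audio constraint, Synthetic profile level).
MAIN_LEVELS = {1: (1, 1), 2: (2, 1), 3: (3, 2), 4: (4, 3)}


def natural_audio_class(pcu, rcu):
    """Return the lowest Natural Audio constraint (1..4) that fits, or None."""
    for index, (max_pcu, max_rcu) in enumerate(NATURAL_LIMITS, start=1):
        if pcu < max_pcu and rcu < max_rcu:
            return index
    return None


def main_profile_level(pcu, rcu, synthetic_level):
    """Return the lowest Main profile level covering both parts, or None."""
    natural = natural_audio_class(pcu, rcu)
    if natural is None:
        return None
    for level, (max_natural, max_synthetic) in sorted(MAIN_LEVELS.items()):
        # Assumption: levels are nested, so lower requirements always fit.
        if natural <= max_natural and synthetic_level <= max_synthetic:
            return level
    return None


print(main_profile_level(pcu=30, rcu=10, synthetic_level=2))
```

In this example the natural objects alone would fit Natural Audio 1, but Synthetic Audio Level 2 is only available from Main profile Level 3 upwards.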

B.2.2 Scalable Profile

Four levels are defined by configuration; complexity units define the fourth level:

  • Level 1 - Maximum sampling rate of 24 kHz, one mono object (all object types).
  • Level 2 - Maximum sampling rate of 24 kHz, one stereo object or two mono objects (all object types).
  • Level 3 - Maximum sampling rate of 48 kHz, one stereo object or two mono objects (all object types).
  • Level 4 - Maximum sampling rate of 48 kHz, one 5.1 channels object or multiple objects with, at maximum, one integer factor sampling rate conversion for a maximum of two channels. Flexible configuration is allowed with PCU < 30 and RCU < 19.

B.2.3 Speech Profile

Two levels are defined in terms of the number of objects:

  • Level 1 - One speech object.
  • Level 2 - Up to 20 speech objects.

B.2.4 Synthetic Profile

Three levels are defined for this profile as follows:

Level 1
  1. Low processing (exact numbers in MPEG-4 Part 4: Conformance Testing [MPEG4-4])
  2. Only core sample rates may be used
  3. No more than one TTSI object
Level 2
  1. Medium processing (exact numbers in MPEG-4 Part 4: Conformance Testing [MPEG4-4])
  2. Only core sample rates may be used
  3. No more than four TTSI objects
Level 3
  1. High processing (exact numbers in MPEG-4 Part 4: Conformance Testing [MPEG4-4])
  2. No more than twelve TTSI objects

In scalable coding schemes, only the first instantiation of each object type is counted when determining the number of objects relevant to the level definition and the complexity metric. For example, in a scalable coder consisting of a CELP core coder and two enhancement layers implemented with AAC LC scalable objects, one CELP object and one AAC LC scalable object, together with their associated complexity metrics, are counted, since the second (and any further) generic audio enhancement layer adds almost no overhead.

B.2.5 High Quality Audio Profile

Eight levels are defined for this profile as shown in Table B.2:

Table B.2 Levels for the High Quality Audio profile

Level | Max. channels / object | Max. sampling rate [kHz] | Max. PCU (2) | Max. RCU (2) | EP-Tool: Max. redundancy by class FEC (1) | EP-Tool: Max. number of stages of interleaving per object
1 | 2 | 22.05 | 5 | 8 | 0 % | 0
2 | 2 | 48 | 10 | 8 | 0 % | 0
3 | 5.1 | 48 | 25 | 12 (3) | 0 % | 0
4 | 5.1 | 48 | 100 | 42 (3) | 0 % | 0
5 | 2 | 22.05 | 5 | 8 | 20 % | 9
6 | 2 | 48 | 10 | 8 | 20 % | 9
7 | 5.1 | 48 | 25 | 12 (3) | 20 % | 22
8 | 5.1 | 48 | 100 | 42 (3) | 20 % | 22

Numbers in parentheses refer to the notes below.

Notes:

  1. This number does not cover FEC (Forward Error Correction) for the EP header, i.e. FEC for the EP header is always permitted. In the case of several audio objects, the limit applies independently to each audio object. This value is the maximum redundancy for the audio object with the longest frame length, for each profile@level.
  2. Levels 5 to 8 do not include RAM and computational complexity for the EP tool.
  3. Sharing of work buffers between multiple objects or channel pair elements is assumed.
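A decoder or authoring tool might use Table B.2 as a lookup to find which levels admit a given stream. The sketch below is one hedged way to do that. The names `Level`, `HQ_LEVELS` and `admissible_levels` are hypothetical. The code reads the RCU entries for levels 3, 4, 7 and 8 as 12 and 42 followed by a reference to note 3, which is an interpretation of the flattened table, and it assumes that a 5.1 limit also admits objects with fewer channels.

```python
from typing import NamedTuple


class Level(NamedTuple):
    level: int
    full_channels: int      # max. full-bandwidth channels per object
    lfe_channels: int       # max. low-frequency enhancement channels
    max_fs_khz: float
    max_pcu: float
    max_rcu: float          # assumption: "123"/"423" read as 12 and 42 plus note 3
    max_fec_percent: float
    max_interleave_stages: int


HQ_LEVELS = [
    Level(1, 2, 0, 22.05, 5, 8, 0, 0),
    Level(2, 2, 0, 48, 10, 8, 0, 0),
    Level(3, 5, 1, 48, 25, 12, 0, 0),
    Level(4, 5, 1, 48, 100, 42, 0, 0),
    Level(5, 2, 0, 22.05, 5, 8, 20, 9),
    Level(6, 2, 0, 48, 10, 8, 20, 9),
    Level(7, 5, 1, 48, 25, 12, 20, 22),
    Level(8, 5, 1, 48, 100, 42, 20, 22),
]


def admissible_levels(full_channels, lfe_channels, fs_khz, pcu, rcu,
                      fec_percent=0, interleave_stages=0):
    """Return the level numbers whose limits are all met by the stream."""
    return [
        lv.level
        for lv in HQ_LEVELS
        if full_channels <= lv.full_channels
        and lfe_channels <= lv.lfe_channels
        and fs_khz <= lv.max_fs_khz
        and pcu <= lv.max_pcu
        and rcu <= lv.max_rcu
        and fec_percent <= lv.max_fec_percent
        and interleave_stages <= lv.max_interleave_stages
    ]


# Stereo object at 44.1 kHz without error protection.
print(admissible_levels(2, 0, 44.1, pcu=6, rcu=6))
```

The function returns every matching level rather than a single "lowest" one, because levels 5 to 8 differ from levels 1 to 4 only in the error protection limits and are not simply ordered by capability.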

B.2.6 Low Delay Profile

Eight levels are defined for this profile as shown in Table B.3:

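Table B.3 Levels for the Low Delay audio profile

Level | Max. channels / object | Max. sampling rate [kHz] | Max. PCU (2) | Max. RCU (2) | EP-Tool: Max. redundancy by class FEC (1) | EP-Tool: Max. number of stages of interleaving per object
1 | 1 | 8 | 2 | 1 | 0 % | 0
2 | 1 | 16 | 3 | 1 | 0 % | 0
3 | 1 | 48 | 3 | 2 | 0 % | 0
4 | 2 | 48 | 24 | 12 (3) | 0 % | 0
5 | 1 | 8 | 2 | 1 | 100 % | 5
6 | 1 | 16 | 3 | 1 | 100 % | 5
7 | 1 | 48 | 3 | 2 | 20 % | 5
8 | 2 | 48 | 24 | 12 (3) | 20 % | 9

Numbers in parentheses refer to the notes below.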

Notes:

  1. This number does not cover FEC for the EP header, i.e. FEC for the EP header is always permitted. In the case of several audio objects, the limit applies independently to each audio object. This value is the maximum redundancy for the audio object with the longest frame length, for each profile@level.
  2. Levels 5 to 8 do not include RAM and computational complexity for the EP tool.
  3. Sharing of work buffers between multiple objects or channel pair elements is assumed.

B.2.7 Natural Audio Profile

Four levels are defined for this profile as shown in Table B.4:

Table B.4 Levels for the Natural Audio profile

Level | Max. sampling rate [kHz] | Max. PCU (2) | EP-Tool: Max. redundancy by class FEC (1) | EP-Tool: Max. number of stages of interleaving per object
1 | 48 | 20 | 0 % | 0
2 | 96 | 100 | 0 % | 0
3 | 48 | 20 | 20 % | 9
4 | 96 | 100 | 20 % | 22

Numbers in parentheses refer to the notes below.

Notes:

  1. This number does not cover FEC for the EP header, i.e. FEC for the EP header is always permitted. In the case of several audio objects, the limit applies independently to each audio object. This value is the maximum redundancy for the audio object with the longest frame length, for each profile@level.
  2. Levels 3 and 4 do not include computational complexity for the EP tool.
No RCU limitations are specified for this profile.

B.2.8 Mobile Audio Internetworking Profile

Six levels are defined for this profile as shown in Table B.5:

Table B.5 Levels for the Mobile Audio Internetworking profile

Level | Max. channels / object | Max. sampling rate [kHz] | Max. PCU (3) | Max. RCU (2, 3) | Max. number of audio objects | EP-Tool: Max. redundancy by class FEC (1) | EP-Tool: Max. number of stages of interleaving per object
1 | 1 | 24 | 2.5 | 4 | 1 | 0 % | 0
2 | 2 | 48 | 10 | 8 | 2 | 0 % | 0
3 | 5.1 | 48 | 25 | 12 (4) | - | 0 % | 0
4 | 1 | 24 | 2.5 | 4 | 1 | 20 % | 5
5 | 2 | 48 | 10 | 8 | 2 | 20 % | 9
6 | 5.1 | 48 | 25 | 12 (4) | - | 20 % | 22

Numbers in parentheses refer to the notes below.

Notes:

  1. This number does not cover FEC for the EP header, i.e. FEC for the EP header is always permitted. In the case of several audio objects, the limit applies independently to each audio object. This value is the maximum redundancy for the audio object with the longest frame length, for each profile@level.
  2. The maximum RCU for one channel in any object in this profile is 4. For ER BSAC, this limits the input buffer size. The maximum possible input buffer size in bits for this case is given with the PCU/RCU values in Table B.1.
  3. Levels 4 to 6 do not include RAM and computational complexity for the EP tool.
  4. Sharing of work buffers between multiple objects or channel pair elements is assumed.

References
[MPEG4-3] ISO/IEC 14496-3:2001, "Coding of Audiovisual Objects - Part 3: Audio", 2nd Edition, 2001
[MPEG4-4] ISO/IEC 14496-4:2001, "Coding of Audiovisual Objects - Part 4: Conformance Testing", 2nd Edition, 2001

©Copyright 2007 MPEG Industry Forum