Levels for Audio Profiles

This document covers the profiles and levels specified in MPEG-4 Audio Version 1 and Version 2 [MPEG4-3]. As of October 2001, the MPEG-4 Audio standard defines 23 object types and 8 profiles, as explained in Chapter 13. Four audio profiles were defined in MPEG-4 Audio Version 1: Main, Scalable, Speech, and Synthetic. Four additional profiles were specified in MPEG-4 Audio Version 2: High Quality Audio, Low Delay, Natural Audio, and Mobile Audio Internetworking (MAUI). Chapter 13 presents tables showing which audio coding tools each audio object type includes and which audio object types each audio profile includes.

B.1 COMPLEXITY UNITS

Complexity units approximate the decoder complexity, in terms of processing power and RAM usage, required to process MPEG-4 Audio bitstreams as a function of specific parameters. The approximated processing power is given in Processor Complexity Units (PCU), specified in integer numbers of MOPS (Millions of Operations Per Second). The approximated RAM usage is given in RAM Complexity Units (RCU), specified in (mostly) integer numbers of kWords (1000 Words). The RCU numbers do not include working buffers that can be shared between different objects and/or channels.

If a level for a profile is specified by the maximum number of complexity units, then a flexible configuration of the decoder handling different types of objects is allowed, under the constraint that both values (PCU and RCU) for the total decoding complexity, including sampling rate conversion if needed, do not exceed this limit.

Table B.1 gives complexity estimates for the different audio object types:

Table B.1 Complexity of audio object types
Notes:
The computational complexity of HILN depends on the sampling frequency fs and the maximum number of sinusoids ns to be synthesized simultaneously. The value of ns for a frame is the total number of harmonic and individual lines synthesized in that frame, i.e. the number of starting plus continued plus ending lines. For fs in kHz, the PCU in MOPS is calculated as follows:

PCU = (1 + 0.15 * ns) * fs / 16

The typical maximum values of ns are 47 for 6 kbit/s and 93 for 16 kbit/s HILN bitstreams.

B.2 DEFINITION OF LEVELS FOR AUDIO PROFILES

The levels for the audio profiles are specified using the complexity units defined above. A value of 0 for the number of interleaving stages of the EP (error protection) tool indicates that error protection is not used for that particular level. The notation used to specify the number of audio channels indicates the number of full bandwidth channels and the number of low-frequency enhancement channels. For example, 5.1 indicates 5 full bandwidth channels and one low-frequency enhancement channel.

B.2.1 Main Profile

Since the Main Audio profile includes all natural and synthetic object types, its levels are defined as a combination of two different types of levels, using the two different metrics defined for natural audio tools (computation-based metrics) and synthetic audio tools (macro-oriented metrics).

Four levels are defined for the Main Audio profile as a combination of constraints on the set of natural objects (corresponding to the object types not included in the Synthetic profile) and the set of synthetic objects (corresponding to the object types in the Synthetic profile) in the scene. For the set of natural audio objects in the scene, the following constraints are defined using the complexity units (PCU, RCU):
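Because these constraints are expressed as PCU and RCU budgets, a minimal Python sketch may help show how such a budget could be checked. The function `hiln_pcu` implements the HILN formula from the notes to Table B.1. The names `AudioObject` and `fits_level`, the 16 kHz sampling rate, and every RCU value or level limit used below are hypothetical placeholders, not values from the standard. The sketch also assumes that the cost of sampling rate conversion is simply added to the decoding totals.

```python
from dataclasses import dataclass


def hiln_pcu(ns: int, fs_khz: float) -> float:
    """PCU in MOPS for HILN: (1 + 0.15 * ns) * fs / 16, with fs in kHz."""
    return (1 + 0.15 * ns) * fs_khz / 16


@dataclass
class AudioObject:
    """Hypothetical record of one decoded object's complexity."""
    name: str
    pcu: float  # processing power in MOPS
    rcu: float  # RAM usage in kWords (shared working buffers excluded)


def fits_level(objects, max_pcu, max_rcu, src_pcu=0.0, src_rcu=0.0):
    """Return True if total PCU and total RCU both stay within the limits.

    src_pcu / src_rcu model sampling rate conversion, if needed
    (assumption: its cost is added to the decoding totals).
    """
    total_pcu = sum(o.pcu for o in objects) + src_pcu
    total_rcu = sum(o.rcu for o in objects) + src_rcu
    return total_pcu <= max_pcu and total_rcu <= max_rcu


# Example: one HILN object at the typical maximum ns for 16 kbit/s.
# fs = 16 kHz, rcu = 3.0 and the limits 20 / 16 are placeholders.
hiln = AudioObject("HILN 16 kbit/s", hiln_pcu(93, 16.0), rcu=3.0)
within_budget = fits_level([hiln], max_pcu=20, max_rcu=16)
```

Both totals must pass for a configuration to fit, which is what allows a decoder to trade one complex object against several simpler ones within the same level.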
For the set of synthetic audio objects in the scene, the constraints correspond to the three levels defined in section B.2.4 for the Synthetic profile. In conclusion, the four levels for the Main Audio profile are defined by the combination of the constraints on the natural and synthetic objects as:
B.2.2 Scalable Profile

Four levels are defined by configuration; complexity units define the fourth level:
B.2.3 Speech Profile

Two levels are defined in terms of the number of objects:
B.2.4 Synthetic Profile

Three levels are defined for this profile as follows:

Level 1
For scalable coding schemes, only the first instantiation of each object type is counted when determining the number of objects relevant to the level definition and complexity metric. For example, in a scalable coder consisting of a CELP core coder and two enhancement layers implemented by means of AAC LC scalable objects, one CELP object and one AAC LC scalable object and their associated complexity metrics are counted, since there is almost no overhead associated with the second (and any further) generic audio enhancement layer.

B.2.5 High Quality Audio Profile

Eight levels are defined for this profile as shown in Table B.2:

Table B.2 Levels for the High Quality Audio profile
Notes:
B.2.6 Low Delay Profile

Eight levels are defined for this profile as shown in Table B.3:

Table B.3 Levels for the Low Delay audio profile
Notes:
B.2.7 Natural Audio Profile

Four levels are defined for this profile as shown in Table B.4:

Table B.4 Levels for the Natural Audio profile
Notes:
B.2.8 Mobile Audio Internetworking Profile

Six levels are defined for this profile as shown in Table B.5:

Table B.5 Levels for the Mobile Audio Internetworking profile
Notes:
References