The State of the Art in Procedural Audio

Dimitris Menexopoulos1, Pedro D. Pestana2, and Joshua D. Reiss1

1 Centre for Digital Music, Queen Mary University of London
2 Science and Technology Department, The Open University of Portugal (UAb)


Abstract


Procedural audio may be defined as real-time sound generation according to programmatic rules and live input. It is often considered a subset of sound synthesis and is especially applicable to nonlinear media, such as video games, virtual reality experiences and interactive audiovisual installations. However, there is resistance to widespread adoption of procedural audio because there is little awareness of the state of the art, including the diversity of sounds that may be generated, the controllability of procedural audio models and the quality of the sounds that they produce. We address all of these aspects in this review paper, while attempting a large-scale categorisation of sounds that have been approached through procedural audio techniques. We also discuss the role of recent advancements in neural audio synthesis, its current implementations and its potential future applications in the field.

Figures


Fig. 1: Flowchart highlighting the study selection process and the number of articles selected at each step.

Fig. 2: Number of articles published since 1993 including a practical algorithm in procedural audio, broken down into our main taxonomies.

Fig. 3: An analysis of how the number of citations for each paper is spread across our main taxonomy, highlighting what seems to be considered more relevant. Towards the left is the distribution of taxonomic themes among articles that have been cited fewer times. Towards the right, articles that have been cited over 100 times are mainly distributed between two taxa (Machine and Contact Sounds).

Fig. 4: Relationship between the design strategies used to build the model (y-axis) and the synthesis type used (x-axis) for the articles in Table 1. Several articles use more than one methodology and synthesis type, so this heatmap does not have a one-to-one relationship with the articles (i.e. it contains many more items than the total number of articles analyzed).

Fig. 5: Relationship between the design strategies used to build the model (x-axis) and the sound type characterization (y-axis) for the articles in Table 1.

Tables


Table 1: A summary of the state of the art in the field of procedural audio. Numbers in each cell correspond to reviewed references in which a sound class is connected with a specific synthesis technique. n=x unique occurrences in the bibliography are provided for both the broad and narrow taxonomies, as well as for the synthesis techniques.

Table 2: State of the art in procedural audio evaluation. The references from Table 1 are classified by evaluation type. The last row gives papers from either subjective evaluation category (comparative or non-comparative) that also contain objective evaluation. The last column gives the average publication year of all papers in a category.

Citation


Published in the Journal of the Audio Engineering Society (December 2023 issue, Volume 71, Number 12).


    @article{menexopoulos2023the,
        title = {The State of the Art in Procedural Audio},
        author = {Menexopoulos, Dimitris and Pestana, Pedro and Reiss, Joshua},
        journal = {J. Audio Eng. Soc.},
        volume = {71},
        number = {12},
        pages = {826--848},
        year = {2023},
        url = {http://www.aes.org/e-lib/browse.cfm?elib=22346}}
   	 

Supported by the EPSRC Centre for Doctoral Training in Intelligent Games and Game Intelligence (iGGi) [EP/S022325/1].