PRiSM SampleRNN: An Unconditional End-to-End Neural Audio Generation Model, for TensorFlow 2.

Christopher Melen (Developer), Sam Salem (Producer), Emily Howard (Producer), David De Roure (Consultant / Advisor), Marcus du Sautoy (Consultant / Advisor)

    Research output: Non-textual form › Software

    Abstract

    PRiSM SampleRNN is a project centred on the development of prism-samplernn, an open-source neural audio synthesis tool released on GitHub in June 2020 as part of PRiSM Future Music #2.

    The software is PRiSM’s first major contribution to the field of Machine Learning. Built upon the SampleRNN model, which addresses unconditional audio generation in the raw acoustic domain, it is able to generate new audio by ‘learning’ the characteristics of any existing corpus of sound or music. Artists are invited to collate their own datasets (or rather, sound libraries), curated for their unique creative contexts. Changing how these datasets are organised, as well as the parameters of the algorithm during generation, significantly alters the resulting output, making these choices part of the creative process. The audio generated can be used directly in a composition, or to inform notated work to be played by an instrumentalist.
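
    To make this concrete, below is a minimal, single-tier sketch (in TensorFlow 2, the framework the software targets) of the temperature-controlled, sample-by-sample generation loop that SampleRNN-style models are built around. It is illustrative only, not the PRiSM implementation: the real model is hierarchical (frame-level and sample-level tiers), and the layer sizes, quantisation depth and temperature value here are assumptions.

        # Illustrative sketch, not the PRiSM implementation: generates raw audio
        # one quantised sample at a time from a small recurrent model.
        import tensorflow as tf

        Q = 256                                    # 8-bit quantisation of amplitudes
        embed = tf.keras.layers.Embedding(Q, 64)   # map each amplitude class to a vector
        cell = tf.keras.layers.GRUCell(128)        # sample-level recurrent model
        to_logits = tf.keras.layers.Dense(Q)       # distribution over the next sample

        def generate(num_samples, temperature=0.95):
            # Temperature is one of the generation-time parameters described above:
            # lower values keep the output closer to the training corpus, higher
            # values make it more surprising.
            state = [tf.zeros([1, 128])]
            sample = tf.constant([Q // 2])         # start at mid-scale ("silence")
            out = []
            for _ in range(num_samples):
                h, state = cell(embed(sample), state)
                logits = to_logits(h) / temperature
                sample = tf.random.categorical(logits, num_samples=1)[:, 0]
                out.append(sample)
            return tf.stack(out, axis=1)           # (1, num_samples) amplitude classes

        waveform = generate(16000)                 # one second of audio at 16 kHz

    In practice the model's weights come from training on the artist's own dataset, so dataset curation and sampling parameters such as temperature become the creative levers described above.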

    The software was developed in response to work by Dr Sam Salem (PRiSM Senior Lecturer in Composition). For his piece Midlands (2019), Salem made field recordings whilst walking 120km of the River Derwent. These materials were used to synthesise new sounds with WaveNet, one of the earlier deep-learning algorithms for audio generation, but the slowness of the workflow made it difficult to explore the full possibilities of the technique (documented in the PRiSM blog post A Psychogeography of Latent Space). An alternative, SampleRNN, represented an opportunity to generate sound more quickly, but the original code was broken. Dr Christopher Melen, PRiSM Research Software Engineer (2019-2023), therefore undertook a completely new implementation of the SampleRNN architecture¹, replacing broken dependencies and targeting the latest versions of Python and TensorFlow (documented in the PRiSM blog post A Short History of Neural Synthesis).

    The release of PRiSM SampleRNN was accompanied by a model developed by Salem using data from the RNCM’s world-class archive of choral music. Since then the software has developed into one of PRiSM’s major projects, bringing together practitioners across a diverse range of disciplines and fields of study, and illustrating PRiSM’s core research concerns of collaborative and interdisciplinary effort. It is currently being used in projects by composers, musicians and technologists across the globe. A free and open-source project, it is compatible with most mainstream computing platforms (including devices with Apple Silicon chips), and its latest release can be downloaded from GitHub. The technique has been made explicit through a number of online resources and performances demonstrating this creative application of AI, informing technology researchers, other arts practitioners, educators and the general public.
    Original language: English
    Media of output: Online
    Publication status: Published - 2020
    Event: RNCM PRiSM Future Music #2 - Online
    Duration: 15 Jun 2020 – 15 Jun 2020
    https://www.rncm.ac.uk/research/research-activity/research-centres-rncm/prism/prism-news-and-events/future-music-2/

    Related research outputs
    • Learning to Learn: A Reflexive Case Study of PRiSM SampleRNN

      Ma, B., Sargen, E., De Roure, D. & Howard, E., 2024, International Conference on AI and Musical Creativity (AIMC).

      Research output: Chapter in Book/Report/Conference proceeding › Conference contribution, peer-reviewed (Open Access)
    • DEVIANCE

      Howard, E. (Composer), Ma, B. (Sound Designer), Kanga, Z. (Performer) & Gustafsson, E. (Video / Graphic Designer), 2023

      Research output: Non-textual form › Digital or Visual Products

    • DEVIANCE

      Howard, E. (Composer), Ma, B. (Sound Designer), Gustafsson, E. (Video / Graphic Designer) & Neuhaus, C. (Other), 2023

      Research output: Non-textual form › Composition
