On this page, you can find audio samples of performances synthesized by our DDSP-based piano synthesizer.

Overview

Model architecture

Full architecture of the proposed piano sound synthesizer. The blue boxes represent the trained modules for the control of the synthesizers. The differentiable synthesizers from DDSP are represented by yellow boxes (Additive, Filtered Noise and Reverberation).

Synthesis examples

The MIDI performances are taken from the test set of the MAESTRO dataset.

Benchmark systems include:

Original recording: the real audio recording of the performance.
Fluidsynth: a concatenative sampling-based algorithm.
Pianoteq: a physical-modeling-based synthesis software.
TTS: a neural-based synthesis model inspired by text-to-speech techniques.

Different configurations of our DDSP-based model are also compared:

Default: the default configuration illustrated previously.
No Fine-tuning: without applying the fine-tuned parameters of the detuner and inharmonicity sub-models.
Deep-Inharmonicity: replacing the explicit inharmonicity sub-model by a deep neural network.
Reduced-Context: removing the polyphonic information from the context computation.
2009-only: training the model solely on the 2009 piano model.

A. Scriabin - Etude, Op.42 No.4

Piano year: 2009

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

C. Debussy - Etude, No.7 “Study in Chromatic Steps”

Piano year: 2004

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

D. Scarlatti - Sonata in D Major, K.118

Piano year: 2014

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

F. Mendelssohn - Fantasy in F-sharp minor, Op.28

Piano year: 2017

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

F. Liszt - Hungarian Rhapsody No.9 in E-Flat Major, S.244

Piano year: 2015

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

F. Schubert - Impromptu Op.142 No.4, in F minor, D935

Piano year: 2011

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

F. Chopin - Nocturne in B Major, Op.9 No.3

Piano year: 2009

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

J.S. Bach - Prelude & Fugue in G-Sharp Minor, WTC I BWV.863

Piano year: 2013

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

L. van Beethoven - Rondo a Capriccioso “Rage over a Lost Penny”, Op.129

Piano year: 2018

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

S. Rachmaninoff - Etudes-Tableaux, Op.39 No.9

Piano year: 2006

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

W.A. Mozart - Sonata in B-Flat Major (1st movement), K333

Piano year: 2008

Model	Audio sample
Original recording
Fluidsynth
Pianoteq
TTS
Default
No Fine-tuning
Deep-Inharmonicity
Reduced-Context
2009-only

Bonus: Disentanglement of sound components

As in the original DDSP experiments, the audio output of the differentiable DSP layers can be heard. The audio without reverb can be extracted before applying the reverberation layer:

Wet audio	Reverb removed

Audio can also be decomposed into its pure harmonic components and the residual noise, by listening only to the outputs of the additive and filtered noise synthesizers:

Additive Synth	Filtered Noise Synth
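The decomposition above can be illustrated with a minimal sketch of the signal chain. It assumes the dry signal is the sum of the additive and noise outputs and that the reverberation is a convolution with an impulse response. All function names, the toy impulse response and the parameter values are hypothetical, and unfiltered white noise stands in for the filtered noise synthesizer; this is not the code of the model.

```python
import math
import random


def additive(f0, amps, n, sr):
    """Sum of sinusoidal partials at integer multiples of f0 (toy additive synth)."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t / sr)
            for k, a in enumerate(amps))
        for t in range(n)
    ]


def noise(n, gain, seed=0):
    """White noise standing in for the filtered noise synthesizer (assumption)."""
    rng = random.Random(seed)
    return [gain * rng.uniform(-1.0, 1.0) for _ in range(n)]


def convolve(x, ir):
    """Direct-form convolution of a signal with an impulse response."""
    y = [0.0] * (len(x) + len(ir) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(ir):
            y[i + j] += xi * hj
    return y


sr, n = 8000, 400                                   # illustrative values
harmonic = additive(220.0, [0.5, 0.25, 0.125], n, sr)
residual = noise(n, 0.01)

# "Reverb removed" signal: sum of the two synthesizer outputs.
dry = [h + r for h, r in zip(harmonic, residual)]

# Toy impulse response: direct path plus an exponentially decaying tail.
ir = [1.0] + [0.3 * math.exp(-i / 50.0) for i in range(1, 100)]

# "Wet" signal: the dry signal passed through the reverberation.
wet = convolve(dry, ir)
```

Because each stage is kept as a separate signal, `harmonic`, `residual`, `dry` and `wet` can each be written to a file and auditioned independently, which is what the tables above do with the outputs of the trained model.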

The “initial impact” issue

In the manuscript, an “initial impact” issue was reported, where listeners could hear a noise at the beginning of the excerpts synthesized by DDSP-Piano, as if the piano soundboard were excited before any control input was provided.

The audio excerpts presented on this website have this issue removed, by means of audio fade-ins or an RNN warm-up at inference time. Otherwise, it would sound like the following:

Bonus #2: An underfitted model

We briefly investigated whether the model can reproduce a piano from a reduced amount of training data.

A DDSP-Piano model was trained on a performance of Debussy’s “Images I: Reflets dans l’Eau” available in the training set of MAESTRO, which amounts to 15 minutes of MIDI-audio training data.

The resynthesis gives:

Original recording	DDSP-Piano

Clearly, the model underfits and fails to reproduce the target piano model, but it yields interesting tones that may find artistic uses…

Bonus #3: Other polyphonic instruments

The polyphonic conditioning proposed for DDSP-Piano extends the DDSP approach to handle polyphonic MIDI inputs and audio outputs, not just for the piano instrument.

Here, the training setting of the Mel2Mel model has been reproduced, in order to train a DDSP-Piano model on other instruments:

Ground-truth MIDI data are taken from the MUS subset of the MAPS dataset.
Ground-truth audio data are synthesized using Fluidsynth and its default multi-instrument soundfont.
The inharmonicity module of DDSP-Piano is set to produce only pure harmonics of the fundamental frequency.
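To make the last point concrete, here is a minimal sketch of what "pure harmonics" means in terms of partial frequencies. It uses the standard stiff-string formula f_k = k · f0 · sqrt(1 + B · k²), where B is the inharmonicity coefficient; whether the inharmonicity module is parameterized exactly this way is an assumption here, and the function name and the value of B are purely illustrative.

```python
import math


def partial_frequencies(f0, n_partials, inharmonicity=0.0):
    """Frequencies (Hz) of the first n_partials partials of a stiff string.

    Uses f_k = k * f0 * sqrt(1 + B * k^2). With B = 0 the partials are exact
    integer multiples of f0, i.e. pure harmonics.
    """
    return [
        k * f0 * math.sqrt(1.0 + inharmonicity * k * k)
        for k in range(1, n_partials + 1)
    ]


# Pure harmonics, as used for the non-piano instruments.
print(partial_frequencies(440.0, 4))

# Stretched partials for a piano-like string (B chosen only for illustration).
print(partial_frequencies(440.0, 4, inharmonicity=1e-3))
```

Setting the coefficient to zero removes the stretching of the upper partials, which suits instruments such as the organ or the trumpet, whose partials are close to harmonic.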

Instrument	Scriabin - Op.42 No.4	Rachmaninoff - Op.39 No.9
Acoustic Guitar
Church Organ
Electric Piano
Grand Piano
Orchestral Harp
Pizzicato Strings
String Ensemble
Synth Lead
Trumpet
Vibraphone