Overview

Model architecture

New architecture of DDSP-Piano. The main differences from the first version are:

  • Audio sampling rate increased from 16kHz to 24kHz.
  • Inharmonicity and detuning parameters are estimated from isolated single piano notes, using a signal-based extraction method.
  • A new differentiable reverberation module based on Feedback-Delay Networks.
  • Slightly deeper context and monophonic networks.
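As a rough illustration of what the inharmonicity and detuning parameters control, the sketch below computes partial frequencies with the textbook stiff-string relation f_k = k * f0 * sqrt(1 + B * k^2). This is an assumption for illustration and not necessarily the exact parameterization used in DDSP-Piano; the function names and the coefficient value 4e-4 are hypothetical. The last helper only keeps partials below the Nyquist frequency, which is 12 kHz at the 24 kHz sampling rate.

```python
import math


def midi_to_hz(pitch: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered fundamental frequency of a MIDI pitch (A4 = MIDI 69)."""
    return a4_hz * 2.0 ** ((pitch - 69) / 12.0)


def partial_frequencies(f0_hz, inharmonicity, n_partials, detune_cents=0.0):
    """Frequencies of the first n_partials of a stiff string.

    Textbook model: f_k = k * f0 * sqrt(1 + B * k^2), where B is the
    inharmonicity coefficient and f0 is first shifted by detune_cents.
    """
    f0 = f0_hz * 2.0 ** (detune_cents / 1200.0)
    return [
        k * f0 * math.sqrt(1.0 + inharmonicity * k * k)
        for k in range(1, n_partials + 1)
    ]


def below_nyquist(freqs_hz, sample_rate=24000):
    """Keep only the partials that can be synthesized without aliasing."""
    return [f for f in freqs_hz if f < sample_rate / 2.0]


# Hypothetical coefficient, chosen only to make the stretching visible.
stretched = partial_frequencies(midi_to_hz(69), inharmonicity=4e-4, n_partials=30)
playable = below_nyquist(stretched)
```

With B = 0 the partials fall back to exact integer multiples of the fundamental; any positive B pushes the upper partials progressively sharp.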

Synthesis examples

In addition to the presented DDSP-Piano v2, the following systems are included for comparison:

  • Original recording: the real audio recording of the performance.
  • Piano-TTS v2: the improved neural-based synthesis model inspired by text-to-speech techniques.
  • DDSP-Piano v1: the first version of DDSP-Piano, retrained for synthesis with a 24kHz sampling rate.
  • DDSP-Piano v2 Regularized: an ablated variant of the v2 version, with a regularization on the early reflections of the reverberation module.

A. Scriabin - Etude, Op.42 No.4

Piano year: 2009

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

C. Debussy - Etude, No.7 “Study in Chromatic Steps”

Piano year: 2004

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

D. Scarlatti - Sonata in D Major, K.118

Piano year: 2014

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

F. Mendelssohn - Fantasy in F-sharp minor, Op.28

Piano year: 2017

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

F. Liszt - Hungarian Rhapsody No.9 in E-Flat Major, S.244

Piano year: 2015

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

F. Schubert - Impromptu Op.142 No.4, in F minor, D935

Piano year: 2011

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

F. Chopin - Nocturne in B Major, Op.9 No.3

Piano year: 2009

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

J.S. Bach - Prelude & Fugue in G-Sharp Minor, WTC I BWV.863

Piano year: 2013

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

L. van Beethoven - Rondo a Capriccioso “Rage over a Lost Penny”, Op.129

Piano year: 2018

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

S. Rachmaninoff - Etudes-Tableaux, Op.39 No.9

Piano year: 2006

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

W.A. Mozart - Sonata in B-Flat Major (1st movement), K333

Piano year: 2008

Model Audio sample
Original recording
Piano-TTS v2
DDSP-Piano v1
DDSP-Piano v2 Regularized
DDSP-Piano v2

Disentanglement of sound components

Here, we compare the outputs of the individual DDSP synthesizers of the presented models.

Note that the audio samples are not set to a common loudness, mainly to highlight the differences when the early reflections of the FDN reverb are regularized.

Component | DDSP-Piano v1 | DDSP-Piano v2 Regularized | DDSP-Piano v2
Additive
Residual Noise
Dry Output
Reverb      
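
To give an idea of how a Feedback-Delay Network turns a dry signal into a reverberated one, here is a minimal, non-differentiable sketch with four delay lines and a fixed Hadamard feedback matrix. It is not the trained module of DDSP-Piano v2: the delay lengths, feedback gain and wet/dry mix are hypothetical values, whereas in the presented models the reverberation parameters are learned.

```python
def fdn_reverb(dry, delays=(149, 211, 263, 293), feedback_gain=0.8, mix=0.3):
    """Run a mono signal through a 4-line feedback delay network.

    All parameter values are illustrative assumptions, not model values.
    """
    assert len(delays) == 4, "this sketch hardcodes a 4x4 feedback matrix"
    n = 4
    # Hadamard matrix scaled by 1/2: orthogonal, so the feedback loop is
    # lossless until feedback_gain (< 1) makes it decay.
    hadamard = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    mixing = [[0.5 * h for h in row] for row in hadamard]

    buffers = [[0.0] * d for d in delays]  # one circular buffer per line
    positions = [0] * n
    out = []
    for x in dry:
        # Read the oldest sample of each delay line.
        taps = [buffers[i][positions[i]] for i in range(n)]
        wet = sum(taps) / n
        # Mix the taps through the feedback matrix, attenuate them,
        # add the input, and write the result back into each line.
        for i in range(n):
            fb = sum(mixing[i][j] * taps[j] for j in range(n))
            buffers[i][positions[i]] = x + feedback_gain * fb
            positions[i] = (positions[i] + 1) % delays[i]
        out.append((1.0 - mix) * x + mix * wet)
    return out


# Impulse response over one second at 24 kHz.
impulse = [1.0] + [0.0] * 23999
response = fdn_reverb(impulse)
```

In this sketch the first echoes appear after the shortest delay line has been traversed, which is the part of the response that corresponds to early reflections; the later, denser tail comes from the repeated mixing in the feedback loop.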

Bonus: Training on the MAPS dataset

MAPS is a piano dataset older than MAESTRO that also provides aligned pairs of MIDI and audio performances. Its ENSTDkCl (ENST Disklavier Close) subset contains 2 hours of MIDI/audio recordings of an upright Disklavier piano.

We trained a DDSP-Piano v2 model on this dataset. Below are some examples on MAESTRO performances, compared with the best-sounding model trained on MAESTRO (year 2015):

Model | Mendelssohn - Fantasy | Schubert - Impromptu | Liszt - Hungarian Rhapsody
ENSTDkCl
MAESTRO-2015