Cross-Modal Extension of Unpaired Performance Rendering
This page accompanies the last section of Chapter 5, on the expressive performance rendering model and its cross-modal extension.
Overview
Training pipeline for the proposed cross-modal performance rendering model. The performance rendering model R still transforms the note properties of a given score (timing, duration and velocity) into performance-like note features.
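To make the role of R concrete, here is a minimal sketch of what "modifying note properties" can look like on a note list. Everything in it is an assumption made for illustration: the `Note` container, the `apply_deviations` helper, and the parametrization (additive onset shift, multiplicative duration scale, absolute velocity) are hypothetical and not taken from the thesis. In the actual model these per-note values would be predicted by the trained network rather than passed in by hand.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Note:
    """Hypothetical note container (not the thesis' data format)."""
    pitch: int       # MIDI pitch number
    onset: float     # onset time in seconds
    duration: float  # duration in seconds
    velocity: int    # MIDI velocity


def apply_deviations(
    score: Sequence[Note],
    onset_shifts: Sequence[float],
    duration_scales: Sequence[float],
    velocities: Sequence[float],
) -> List[Note]:
    """Apply per-note expressive deviations to a deadpan score.

    Pitches are left untouched; only timing, duration and velocity change.
    The parametrization is an illustrative assumption.
    """
    rendered = []
    for note, shift, scale, vel in zip(
        score, onset_shifts, duration_scales, velocities
    ):
        rendered.append(
            replace(
                note,
                onset=max(0.0, note.onset + shift),        # keep onsets non-negative
                duration=max(0.0, note.duration * scale),  # keep durations non-negative
                velocity=min(127, max(1, int(round(vel)))),  # clamp to valid note-on range
            )
        )
    return rendered


# Usage: two deadpan notes with uniform velocity, then hand-picked deviations.
score = [Note(60, 0.0, 0.5, 64), Note(64, 0.5, 0.5, 64)]
performance = apply_deviations(score, [0.02, -0.01], [0.9, 1.1], [70, 55])
```

The point of the sketch is only that the output keeps the score's pitches and note count while changing when, how long and how loud each note is played.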
In this setting, both MIDI performances and audio performances serve as real data in the adversarial training. To account for both the symbolic and audio modalities, real and fake MIDI performances are all synthesized into audio with a smaller DDSP-Piano model, after converting them from the note-wise to the frame-wise encoding of MIDI performances.
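The note-wise to frame-wise conversion mentioned above can be sketched as a simple piano-roll rasterization. This is an assumption-laden illustration, not the encoding used by DDSP-Piano: the function name `notes_to_frames`, the tuple layout, and the default frame rate are all hypothetical, and the real frame-wise input may carry different information (for instance separate onset and sustain channels).

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical note-wise layout: (pitch, onset_s, duration_s, velocity)
NoteTuple = Tuple[int, float, float, int]


def notes_to_frames(
    notes: Sequence[NoteTuple],
    frame_rate: float = 100.0,  # frames per second; arbitrary choice for this sketch
    n_pitches: int = 128,
) -> List[List[int]]:
    """Rasterize a note list into a [n_frames][n_pitches] velocity grid.

    Each cell holds the velocity of the note sounding at that frame and
    pitch, or 0 when the pitch is silent.
    """
    if not notes:
        return []
    end_time = max(onset + dur for _, onset, dur, _ in notes)
    n_frames = int(math.ceil(end_time * frame_rate))
    roll = [[0] * n_pitches for _ in range(n_frames)]
    for pitch, onset, dur, vel in notes:
        start = int(round(onset * frame_rate))
        # Every note occupies at least one frame, even if very short.
        stop = max(start + 1, int(round((onset + dur) * frame_rate)))
        for t in range(start, min(stop, n_frames)):
            roll[t][pitch] = vel
    return roll


# Usage: two overlapping notes at a coarse 4 frames per second.
roll = notes_to_frames([(60, 0.0, 0.5, 80), (64, 0.25, 0.5, 60)], frame_rate=4.0)
```

With this coarse frame rate the grid has three frames: pitch 60 is active in frames 0 and 1, pitch 64 in frames 1 and 2. A fixed-rate grid like this is what lets a frame-based synthesizer consume symbolic input, at the cost of quantizing onsets to the frame period.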
Real audio performances, real MIDI performances synthesized into audio, and fake audio performances rendered by the model are all fed into a multi-scale audio discriminator.
The unpaired datasets used for training:
- “Can I Play It?” provides the compositions.
- ASAP provides real MIDI performances recorded with Disklaviers.
- ATEPP provides real audio performances, gathered from YouTube.
Rendering examples
This section presents some early examples rendered by the full rendering pipeline, at different stages:
- Deadpan: the plain MIDI score, synthesized into audio with DDSP-Piano.
- Proposed: the performance output by the rendering model and synthesized with DDSP-Piano.
- Human: a real human performance recorded in MIDI, synthesized with DDSP-Piano.
J.S. Bach - Fugue BWV 873
| Output | Audio sample |
|---|---|
| Deadpan S(X) | |
| Proposed S(R(X)) | |
| Human S(Y) | |
L. van Beethoven - Sonata N°18, 1st Movement
| Output | Audio sample |
|---|---|
| Deadpan S(X) | |
| Proposed S(R(X)) | |
| Human S(Y) | |
L. van Beethoven - Sonata N°8, 3rd Movement
| Output | Audio sample |
|---|---|
| Deadpan S(X) | |
| Proposed S(R(X)) | |
| Human S(Y) | |
F. Chopin - Etude Op.10 N°2
| Output | Audio sample |
|---|---|
| Deadpan S(X) | |
| Proposed S(R(X)) | |
| Human S(Y) | |
F. Liszt - Paganini Etude N°6
| Output | Audio sample |
|---|---|
| Deadpan S(X) | |
| Proposed S(R(X)) | |
| Human S(Y) | |
F. Schubert - Sonata N°13, 1st Movement
| Output | Audio sample |
|---|---|
| Deadpan S(X) | |
| Proposed S(R(X)) | |
| Human S(Y) | |