Vietnamese Voice Conversion
Mar 9, 2024
admin
Overview
This thesis develops a voice conversion model for Vietnamese based on the Phoneme Hallucinator model, with two adaptations: (1) a Text2SSL module is added to obtain more context information before the KNN algorithm is applied; (2) to create a more diverse dataset, we apply the spectrogram-resize (SR) data augmentation idea from the FreeVC model, which distorts speaker information without changing content information, to generate more "speakers".
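To make the SR augmentation idea more concrete, here is a minimal sketch, not the code used in the thesis. It assumes a mel-spectrogram stored as a NumPy array of shape `(n_bins, n_frames)`. The function name `spectrogram_resize`, the linear interpolation, and the padding strategy (repeating the top frequency bin) are all illustrative assumptions. The sketch stretches or squeezes the frequency axis by a ratio, then crops or pads back to the original number of bins, so the timing of the content is untouched while the spectral shape that carries speaker identity is distorted.

```python
import numpy as np


def spectrogram_resize(mel: np.ndarray, ratio: float) -> np.ndarray:
    """Resize a mel-spectrogram along the frequency axis, keeping its shape.

    mel:   array of shape (n_bins, n_frames)
    ratio: > 1 stretches the frequency axis, < 1 squeezes it
    """
    n_bins, n_frames = mel.shape
    new_bins = max(1, int(round(n_bins * ratio)))

    # Sample the original frequency axis at `new_bins` evenly spaced positions.
    src_pos = np.linspace(0, n_bins - 1, new_bins)
    orig_pos = np.arange(n_bins)
    resized = np.stack(
        [np.interp(src_pos, orig_pos, mel[:, t]) for t in range(n_frames)],
        axis=1,
    )  # shape: (new_bins, n_frames)

    if new_bins >= n_bins:
        # Stretched: drop the extra bins at the top of the frequency axis.
        return resized[:n_bins]

    # Squeezed: pad the top by repeating the highest bin
    # (an assumption made here to keep the sketch simple).
    pad = np.repeat(resized[-1:], n_bins - new_bins, axis=0)
    return np.concatenate([resized, pad], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.random((80, 120))  # stand-in for a real mel-spectrogram
    for ratio in (0.85, 1.0, 1.15):
        augmented = spectrogram_resize(mel, ratio)
        print(ratio, augmented.shape)
```

Each ratio yields a differently distorted copy of the same utterance, which can be treated as coming from an additional "speaker" during training.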
The proposed model

Comparing different methods
This section compares the baseline model with the proposed model.
| Conversion | Source | Target | Baseline Model | Proposed Model |
|---|---|---|---|---|
| [trangntt] Female to Female | | | | |
| [trangntt] Male to Female | | | | |
| [nguyenlm] Male to Male | | | | |
| [nguyenlm] Female to Male | | | | |
| [thanhpv] Male to Male | | | | |
| [thanhpv] Female to Male | | | | |