Vietnamese Voice Conversion
Overview
This thesis develops a voice conversion model for Vietnamese based on the Phoneme Hallucinator model, with two adaptations: (1) a Text2SSL module is added to obtain more contextual information before the k-nearest-neighbors (kNN) matching step, and (2) to create a more diverse dataset, the spectrogram-resize (SR) data augmentation idea from the FreeVC model is applied. SR distorts speaker information without changing content information, so it can be used to generate additional "speakers".
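The two mechanisms named above can be illustrated with a minimal NumPy sketch. This is not the thesis implementation. `knn_match` shows a kNN-VC-style matching step: each source frame's self-supervised (SSL) feature vector is replaced by the mean of its k most cosine-similar target-speaker frames. `spectrogram_resize` shows the SR augmentation: a mel-spectrogram is stretched or squeezed along the frequency axis by a ratio, then padded or cropped back to its original height. The function names, k = 4, the ratio range, the random arrays standing in for real SSL features and mel-spectrograms, and padding with the spectrogram minimum are all assumptions for illustration. The Text2SSL module and Phoneme Hallucinator's feature hallucination step are not modeled.

```python
import numpy as np


def knn_match(source_feats: np.ndarray, target_feats: np.ndarray, k: int = 4) -> np.ndarray:
    """Replace each source frame (T_src, D) with the mean of its k nearest
    target frames (N_tgt, D), using cosine similarity. Sketch only."""
    # Normalize rows so that dot products equal cosine similarities.
    src = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    tgt = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    sims = src @ tgt.T  # shape (T_src, N_tgt)
    k = min(k, target_feats.shape[0])
    # Indices of the k most similar target frames for every source frame.
    top_k = np.argsort(-sims, axis=1)[:, :k]
    # Average the original (unnormalized) target features: (T_src, k, D) -> (T_src, D).
    return target_feats[top_k].mean(axis=1)


def spectrogram_resize(mel: np.ndarray, ratio: float) -> np.ndarray:
    """Resize a (n_mels, T) spectrogram along the frequency axis by `ratio`,
    then pad or crop back to n_mels bins. Sketch only."""
    n_mels, n_frames = mel.shape
    new_h = max(1, int(round(n_mels * ratio)))
    # Positions on the original frequency grid sampled by each resized bin.
    src_pos = np.linspace(0, n_mels - 1, new_h)
    freq_grid = np.arange(n_mels)
    resized = np.stack(
        [np.interp(src_pos, freq_grid, mel[:, t]) for t in range(n_frames)], axis=1
    )
    if new_h < n_mels:
        # Assumption: fill the missing high-frequency bins with the spectrogram minimum.
        pad = np.full((n_mels - new_h, n_frames), mel.min())
        return np.concatenate([resized, pad], axis=0)
    # Stretched (or unchanged): crop the extra high-frequency bins.
    return resized[:n_mels]


# Usage with random stand-ins (hypothetical shapes, not real features).
rng = np.random.default_rng(0)
source = rng.normal(size=(50, 1024))
target = rng.normal(size=(200, 1024))
print(knn_match(source, target, k=4).shape)  # (50, 1024)

mel = rng.normal(size=(80, 120))
print(spectrogram_resize(mel, rng.uniform(0.85, 1.15)).shape)  # (80, 120)
```

Cosine similarity is used here because kNN-VC matches frames by cosine distance; averaging over k > 1 neighbors smooths the converted features compared with copying a single nearest frame.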

Comparing different methods
This section compares the conversion results of the baseline model and the proposed model.
| Conversion | Source | Target | Baseline Model | Proposed Model |
|---|---|---|---|---|
| [trangntt] Female to Female Conversion | | | | |
| [trangntt] Male to Female Conversion | | | | |
| [nguyenlm] Male to Male Conversion | | | | |
| [nguyenlm] Female to Male Conversion | | | | |
| [thanhpv] Male to Male Conversion | | | | |
| [thanhpv] Female to Male Conversion | | | | |