Vietnamese Voice Conversion
Overview
This thesis develops a voice conversion model for Vietnamese based on the Phoneme Hallucinator model, with two adaptations: (1) a Text2SSL module is added to provide more context information before the k-nearest-neighbors (kNN) matching step; (2) to create a more diverse training dataset, we apply the spectrogram-resize (SR) data augmentation idea from the Free-VC model, which distorts speaker information without changing the content information and thereby generates additional "speakers".
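The kNN matching step that the Text2SSL module feeds into can be sketched as follows. This is a minimal sketch assuming kNN-VC-style matching, the framework Phoneme Hallucinator builds on: each source frame's self-supervised (SSL) feature vector is replaced by the average of its k most cosine-similar frames in the target speaker's matching set. The function name `knn_match`, the choice k=4, and the random stand-in features are illustrative assumptions. The Text2SSL module and Phoneme Hallucinator's matching-set expansion are not shown.

```python
import numpy as np


def knn_match(source_feats: np.ndarray, target_feats: np.ndarray, k: int = 4) -> np.ndarray:
    """Hypothetical helper: replace each source frame with the mean of its
    k nearest target frames under cosine similarity (kNN-VC-style matching).

    source_feats: (T_src, D) SSL features of the source utterance
    target_feats: (T_tgt, D) matching set built from the target speaker
    """
    # L2-normalise rows so that a dot product equals cosine similarity
    src = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    tgt = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    sim = src @ tgt.T  # (T_src, T_tgt) cosine similarities
    # indices of the k most similar target frames for every source frame
    nn_idx = np.argsort(-sim, axis=1)[:, :k]
    # average the original (unnormalised) target features: (T_src, k, D) -> (T_src, D)
    return target_feats[nn_idx].mean(axis=1)


# Usage with random stand-in features (NOT real SSL output);
# 1024 is chosen only to resemble the WavLM-Large hidden size.
rng = np.random.default_rng(0)
source = rng.standard_normal((50, 1024))
matching_set = rng.standard_normal((300, 1024))
converted = knn_match(source, matching_set, k=4)
print(converted.shape)  # (50, 1024)
```

Because every output frame is an average of real target-speaker frames, the converted features carry the target's voice, while the frame order, and therefore the content, follows the source. A richer or more context-aware source representation, which is what the Text2SSL module is meant to contribute, changes which neighbors are selected.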
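The spectrogram-resize augmentation can be illustrated with a short sketch. It assumes an 80-bin mel spectrogram and a resize ratio drawn from roughly 0.85 to 1.15 (the range reported for Free-VC). It also assumes that a squeezed spectrogram is padded back to 80 bins with copies of its top row plus small noise; treat this padding scheme as an assumption. The function names `resize_freq` and `spectrogram_resize` are hypothetical, and linear interpolation along the frequency axis stands in for an image-resize library call. In Free-VC the distorted spectrogram is turned back into audio with a vocoder before SSL features are extracted. That step is omitted here.

```python
import numpy as np


def resize_freq(mel: np.ndarray, new_bins: int) -> np.ndarray:
    """Hypothetical helper: linearly interpolate a (n_mels, frames)
    spectrogram to (new_bins, frames) along the frequency axis only."""
    n_mels = mel.shape[0]
    old_pos = np.linspace(0.0, 1.0, n_mels)
    new_pos = np.linspace(0.0, 1.0, new_bins)
    # interpolate each frame (column) independently; time axis is untouched
    return np.stack([np.interp(new_pos, old_pos, col) for col in mel.T], axis=1)


def spectrogram_resize(mel: np.ndarray, ratio: float, rng=None) -> np.ndarray:
    """Hypothetical SR augmentation: stretch or squeeze the frequency axis by
    `ratio`, then crop or pad back to the original number of mel bins."""
    if rng is None:
        rng = np.random.default_rng()
    n_mels = mel.shape[0]
    new_bins = int(round(n_mels * ratio))
    resized = resize_freq(mel, new_bins)
    if new_bins >= n_mels:
        # stretched: content moves to higher bins; drop bins above the original range
        return resized[:n_mels]
    # squeezed (assumed padding): repeat the top row and add N(0, 0.1^2) noise
    pad = np.repeat(resized[-1:], n_mels - new_bins, axis=0)
    pad = pad + rng.standard_normal(pad.shape) / 10
    return np.concatenate([resized, pad], axis=0)


# Usage with a random stand-in for an 80-bin log-mel spectrogram
rng = np.random.default_rng(0)
mel = rng.standard_normal((80, 120))
ratio = rng.uniform(0.85, 1.15)
augmented = spectrogram_resize(mel, ratio, rng)
print(augmented.shape)  # (80, 120)
```

Stretching (ratio above 1) moves spectral content to higher mel bins and squeezing moves it lower. This shifts the formants and so changes the perceived speaker, while the time axis, and therefore the spoken content, is left untouched. Sampling a different ratio per utterance yields many pseudo-speakers from one recording.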
Comparing different methods
This section compares conversion samples from the baseline model and the proposed model.
| Conversion | Source | Target | Baseline Model | Proposed Model |
|---|---|---|---|---|
| [trangntt] Female to Female | | | | |
| [trangntt] Male to Female | | | | |
| [nguyenlm] Male to Male | | | | |
| [nguyenlm] Female to Male | | | | |
| [thanhpv] Male to Male | | | | |
| [thanhpv] Female to Male | | | | |