Vietnamese Voice Conversion

Minh Nguyen Le

Mar 9, 2022 speech-synthesis, voice-conversion

Overview

This thesis develops a voice conversion model for Vietnamese based on the Phoneme Hallucinator model, with two adaptations: (1) a Text2SSL module is added to obtain more context information before the k-nearest-neighbors (kNN) algorithm is run; (2) to create a more diverse dataset, we apply the spectrogram-resize (SR) data augmentation idea from the FreeVC model, which distorts speaker information without changing content information, to generate more "speakers".
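The two building blocks named above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names `knn_convert` and `spectrogram_resize`, the defaults `k=4` and `noise_std=0.01`, and the use of cosine similarity are assumptions made for illustration. The sketch covers only the kNN matching step and the vertical spectrogram resize; the Text2SSL module and the hallucination step are not shown. In FreeVC the resized spectrogram is additionally converted back to a waveform with a vocoder before content features are extracted, which is also omitted here.

```python
import numpy as np


def knn_convert(source_feats, target_feats, k=4, eps=1e-8):
    """Replace each source frame by the mean of its k nearest target frames.

    source_feats: (Ts, D) SSL features of the source utterance.
    target_feats: (Tt, D) SSL features collected from the target speaker.
    Returns an array of shape (Ts, D). Cosine similarity is an assumption.
    """
    src = source_feats / (np.linalg.norm(source_feats, axis=1, keepdims=True) + eps)
    tgt = target_feats / (np.linalg.norm(target_feats, axis=1, keepdims=True) + eps)
    sim = src @ tgt.T  # (Ts, Tt) cosine similarities
    k = min(k, target_feats.shape[0])
    # Indices of the k most similar target frames for every source frame.
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    return target_feats[idx].mean(axis=1)


def spectrogram_resize(mel, ratio, rng=None, noise_std=0.01):
    """Vertically stretch or squeeze a mel-spectrogram, keeping its shape.

    mel: (n_mels, T), with bin 0 as the lowest frequency.
    ratio > 1 stretches the frequency axis and cuts the excess top bins;
    ratio < 1 squeezes it and pads the top with the highest bin plus noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_mels, n_frames = mel.shape
    new_bins = max(1, int(round(n_mels * ratio)))
    old_pos = np.arange(n_mels)
    new_pos = np.linspace(0, n_mels - 1, new_bins)
    # Linear interpolation along the frequency axis, one frame at a time.
    resized = np.stack(
        [np.interp(new_pos, old_pos, mel[:, t]) for t in range(n_frames)], axis=1
    )
    if new_bins >= n_mels:
        return resized[:n_mels]
    pad = np.repeat(resized[-1:], n_mels - new_bins, axis=0)
    pad = pad + rng.normal(0.0, noise_std, size=pad.shape)
    return np.concatenate([resized, pad], axis=0)
```

Sampling a different `ratio` for each training utterance (slightly below or above 1) yields several speaker-distorted variants of the same content, which is the sense in which the augmentation generates more "speakers".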

Comparing different methods

This section compares the baseline model with the proposed model.

Source	Target	Baseline Model	Proposed Model
[trangntt] Female to Female Conversion

[trangntt] Male to Female Conversion

[nguyenlm] Male to Male Conversion

[nguyenlm] Female to Male Conversion

[thanhpv] Male to Male Conversion

[thanhpv] Female to Male Conversion