Vietnamese Voice Conversion

Mar 9, 2024 · admin

Overview

This thesis develops a voice conversion model for Vietnamese based on the Phoneme Hallucinator model, with two adaptations: (1) a Text2SSL module is added to supply more context information before the KNN algorithm is applied, and (2) to create a more diverse dataset, we apply the spectrogram-resize (SR) data augmentation idea from the FreeVC model, which distorts speaker information without changing content information, to generate more "speakers".
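To make the SR augmentation idea more concrete, here is a minimal sketch in Python with NumPy. It is not the thesis or FreeVC implementation. The function name `spectrogram_resize`, the `ratio` parameter, the linear interpolation, and the edge-padding strategy are all assumptions made for illustration. The idea shown is that the mel-spectrogram is stretched or squeezed along the frequency axis only, which shifts speaker-related cues such as formant positions, while the number of frames, and so the timing of the content, stays the same.

```python
import numpy as np


def spectrogram_resize(mel: np.ndarray, ratio: float) -> np.ndarray:
    """Stretch (ratio > 1) or squeeze (ratio < 1) a mel-spectrogram of shape
    (n_mels, n_frames) along the frequency axis, keeping the output shape.

    Hypothetical helper for illustration only.
    """
    n_mels, _ = mel.shape
    new_bins = max(1, int(round(n_mels * ratio)))

    # Fractional positions, in original bin coordinates, of each resized bin.
    src_pos = np.linspace(0.0, n_mels - 1, num=new_bins)
    lo = np.floor(src_pos).astype(int)
    hi = np.minimum(lo + 1, n_mels - 1)
    w = (src_pos - lo)[:, None]

    # Linear interpolation between neighbouring frequency bins.
    resized = (1.0 - w) * mel[lo] + w * mel[hi]

    if new_bins >= n_mels:
        # Stretched: crop the extra high-frequency bins.
        return resized[:n_mels]

    # Squeezed: pad the top by repeating the highest bin (an assumption;
    # other padding schemes are possible).
    pad = np.repeat(resized[-1:], n_mels - new_bins, axis=0)
    return np.concatenate([resized, pad], axis=0)


# Usage: a random array stands in for a real 80-bin mel-spectrogram.
rng = np.random.default_rng(0)
mel = rng.random((80, 120))
augmented = [spectrogram_resize(mel, r) for r in (0.85, 1.0, 1.15)]
print([a.shape for a in augmented])
```

Each resized spectrogram can then be treated as coming from a new pseudo-speaker that says the same content. How the augmented spectrograms are fed back into training is outside the scope of this sketch.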

The proposed model

Figure: The proposed model
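The KNN step named in the overview can be sketched as follows. This is a simplified illustration, not the thesis code. It assumes frame-level self-supervised (SSL) feature matrices for the source utterance and for the target speaker, cosine similarity as the matching criterion, and uniform averaging of the `k` nearest target frames. The function name `knn_convert` and the value of `k` are hypothetical, and the Text2SSL module and the feature-hallucination stage are not modelled here.

```python
import numpy as np


def knn_convert(source_feats: np.ndarray, target_pool: np.ndarray, k: int = 4) -> np.ndarray:
    """Replace every source frame with the mean of its k most similar
    target-speaker frames (cosine similarity).

    source_feats: (n_src, dim) frame features of the source utterance.
    target_pool:  (n_tgt, dim) frame features of the target speaker, n_tgt >= k.
    """
    eps = 1e-8  # guards against division by zero for all-zero frames
    src = source_feats / (np.linalg.norm(source_feats, axis=1, keepdims=True) + eps)
    tgt = target_pool / (np.linalg.norm(target_pool, axis=1, keepdims=True) + eps)

    sim = src @ tgt.T  # (n_src, n_tgt) cosine similarities

    # Indices of the k most similar target frames for each source frame.
    idx = np.argpartition(-sim, kth=k - 1, axis=1)[:, :k]

    # Average the selected (unnormalised) target frames.
    return target_pool[idx].mean(axis=1)


# Usage: random arrays stand in for real SSL features.
rng = np.random.default_rng(0)
source = rng.standard_normal((50, 16))
target = rng.standard_normal((300, 16))
converted = knn_convert(source, target, k=4)
print(converted.shape)
```

Because every output frame is built only from target-speaker frames, the converted sequence carries the target's voice characteristics while following the frame order of the source. A vocoder would then turn the converted features back into audio.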

Comparing different methods

This section compares the baseline and the proposed model.

Source | Target | Baseline Model | Proposed Model
[trangntt] Female to Female Conversion
[trangntt] Male to Female Conversion
[nguyenlm] Male to Male Conversion
[nguyenlm] Female to Male Conversion
[thanhpv] Male to Male Conversion
[thanhpv] Female to Male Conversion