Posts

Vietnamese Voice Conversion

Overview

This thesis develops a voice conversion model for Vietnamese based on the Phoneme Hallucinator model, with two adaptations: (1) a Text2SSL module is added to provide more context information before the kNN step; (2) to build a more diverse dataset, we apply the spectrogram-resize (SR) data augmentation idea from FreeVC, which distorts speaker information without changing content information, to generate more "speakers".

The proposed model

Postnet Layer

Generally speaking, the postnet layer receives a mel-spectrogram and predicts a refined mel-spectrogram with additional information. This makes the output mel-spectrogram more detailed and hence improves the quality of the synthesized audio.
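To make the idea concrete, here is a toy sketch in plain Python. It assumes the common residual design (as in Tacotron 2), where the postnet predicts a correction that is added back to the input mel-spectrogram; the excerpt above does not say which design the post uses. The names `conv1d_same` and `postnet` are made up for illustration, and a real postnet would stack several learned convolution layers rather than use one fixed kernel.

```python
def conv1d_same(frames, kernel):
    """Slide `kernel` along the time axis, separately for each mel bin.

    `frames` is a list of time steps, each a list of mel-bin values.
    Zero padding keeps the number of frames unchanged. Like most deep
    learning "convolutions", this is technically a cross-correlation.
    """
    half = len(kernel) // 2
    n_frames, n_bins = len(frames), len(frames[0])
    out = []
    for t in range(n_frames):
        row = []
        for b in range(n_bins):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = t + k - half
                if 0 <= src < n_frames:  # positions outside the signal count as zero
                    acc += w * frames[src][b]
            row.append(acc)
        out.append(row)
    return out


def postnet(mel, kernel):
    """Toy postnet (assumption: residual design): output = mel + predicted residual."""
    residual = conv1d_same(mel, kernel)
    return [
        [m + r for m, r in zip(mel_row, res_row)]
        for mel_row, res_row in zip(mel, residual)
    ]


if __name__ == "__main__":
    mel = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames x 2 mel bins
    refined = postnet(mel, [0.25, 0.0, -0.25])  # arbitrary kernel, for illustration only
    print(len(refined), len(refined[0]))        # same shape as the input
```

With an all-zero kernel the residual vanishes and the output equals the input, which is the sense in which the postnet only adds detail on top of the mel-spectrogram it receives.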

KNN-VC vs Phoneme Hallucinator [23/03/2024]

Overview

Comparing different methods

This section compares kNN-VC and Phoneme Hallucinator.

Source | Target | Phoneme Hallucinator | Phoneme Hallucinator + Text2SSL

KNN-VC vs Phoneme Hallucinator [09/03/2024]

Overview

Comparing different methods

This section compares kNN-VC and Phoneme Hallucinator.

Source | Target | kNN-VC | Phoneme Hallucinator

Fix "[Errno 32] Broken pipe" in Python

One day, I tried to run a Python script that used multiprocessing, and after a while the program crashed with the [Errno 32] Broken pipe error…

Comparing batch vs layer normalization

This post aims to explain the key difference between two well-known normalization techniques: batch normalization and layer normalization.
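As a quick illustration, the difference comes down to which axis the statistics are computed over. The sketch below is a simplified plain-Python version for a 2-D input (a batch of samples, each a list of features). It leaves out the learnable scale and shift parameters and the running statistics that real batch normalization layers keep for inference, and the function names are made up for this example.

```python
from statistics import fmean, pvariance


def _normalize(values, eps=1e-5):
    """Shift to zero mean and scale to (roughly) unit variance."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def batch_norm(x, eps=1e-5):
    """Normalize each feature (column) using statistics across the batch."""
    columns = [_normalize(list(col), eps) for col in zip(*x)]
    return [list(row) for row in zip(*columns)]  # transpose back to samples x features


def layer_norm(x, eps=1e-5):
    """Normalize each sample (row) using statistics across its own features."""
    return [_normalize(row, eps) for row in x]


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0],
         [3.0, 6.0, 9.0]]  # 2 samples x 3 features
    print(batch_norm(x))   # every column has mean ~0
    print(layer_norm(x))   # every row has mean ~0
```

Because layer normalization only looks at one sample at a time, its result does not depend on the batch size or on the other samples in the batch, whereas batch normalization does.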