Speech

AI Speech Engineer Roadmap: From Zero to Production in 18 Months

A curated 18-month learning roadmap for becoming an AI Speech Engineer — covering foundations, core technologies (ASR, TTS, Speaker Verification, Diarization, Voice Conversion), …

Mar 11, 2026 • 7 min read

Deep Learning

Speaker Diarization: From Traditional Methods to the Modern Models

Speaker Diarization answers "Who spoke when?", covering core concepts, traditional and modern end-to-end approaches, and the latest Sortformer model for speaker segmentation.

Apr 28, 2025 • 6 min read

Machine Learning

Why Entropy Matters in Machine Learning?

Understanding entropy and why it's a core concept in decision trees, neural networks, and loss functions like cross-entropy.

Apr 4, 2025 • 3 min read

Deep Learning

LoRA-Whisper: A Scalable and Efficient Solution for Multilingual ASR

Exploring LoRA-Whisper, a scalable and efficient approach for multilingual ASR using Low-Rank Adaptation to fine-tune OpenAI's Whisper model while avoiding catastrophic forgetting …

Mar 15, 2025 • 2 min read

Speech-Synthesis

Vietnamese Voice Conversion

Overview: This thesis develops a voice conversion model for Vietnamese based on the Phoneme Hallucinator model with two adaptations: (1) Add a Text2SSL module to get more context …

admin • Mar 9, 2024 • 1 min read

TTS

Postnet Layer

Generally speaking, the postnet layer receives a mel-spectrogram and predicts another mel-spectrogram with additional information. That makes the output mel-spectrogram more …