Exploring LoRA-Whisper, a scalable and efficient approach for multilingual ASR using Low-Rank Adaptation to fine-tune OpenAI’s Whisper model while avoiding catastrophic forgetting across languages.
FlashAttention is a groundbreaking optimization technique for computing attention in Transformer models, drastically improving GPU memory efficiency through inner vs outer loop restructuring.
An overview of adversarial attacks on large language models (LLMs) — how manipulated inputs can deceive models into generating harmful or incorrect outputs, covering key attack types, implications, and defense strategies.
A detailed summary of the GLiNER paper, introducing a lightweight, scalable, and highly effective model for open-type named entity recognition using bidirectional transformers with zero-shot generalization.
A curated collection of bash functions, troubleshooting commands, and performance tweaks that I often use in my daily workflow.
The proposed model
Generally speaking, the postnet layer receives a mel-spectrogram and predicts a refined mel-spectrogram with additional information. This makes the output mel-spectrogram more detailed, and hence improves the quality of the synthesized audio.
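A minimal sketch of this residual-refinement idea, loosely following the Tacotron 2 style of postnet (the layer sizes and names here are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class PostNet(nn.Module):
    """Predicts a residual that refines a coarse mel-spectrogram."""

    def __init__(self, n_mels=80, channels=512, kernel=5, n_layers=5):
        super().__init__()
        layers = []
        dims = [n_mels] + [channels] * (n_layers - 1)
        for i in range(n_layers - 1):
            # 1-D convolutions over the time axis, "same" padding
            layers += [
                nn.Conv1d(dims[i], dims[i + 1], kernel, padding=kernel // 2),
                nn.BatchNorm1d(dims[i + 1]),
                nn.Tanh(),
            ]
        layers += [nn.Conv1d(channels, n_mels, kernel, padding=kernel // 2)]
        self.net = nn.Sequential(*layers)

    def forward(self, mel):            # mel: (batch, n_mels, frames)
        # Output = input + predicted residual detail
        return mel + self.net(mel)
```

The key design choice is the residual connection: the postnet only has to learn the fine detail missing from the coarse prediction, not the whole spectrogram.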

This section compares Phoneme Hallucinator and Phoneme Hallucinator + Text2SSL.
| Source | Target | Phoneme Hallucinator | Phoneme Hallucinator + Text2SSL |
|---|---|---|---|

This section compares kNN-VC and Phoneme Hallucinator.
| Source | Target | kNN-VC | Phoneme Hallucinator |
|---|---|---|---|
A trick for killing zombie processes that are still holding GPU memory in Linux 😃.
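One common version of this trick uses `fuser` on the NVIDIA device files (this assumes an NVIDIA GPU whose device nodes live under `/dev/nvidia*`; the function name is just an illustration):

```shell
# Zombie or orphaned processes (e.g. from a crashed training job) can keep
# holding GPU memory even though they no longer show up in nvidia-smi.

# 1. See which PIDs still hold the NVIDIA device files:
#    fuser -v /dev/nvidia*

# 2. Kill them (use with care -- this kills EVERY process using the GPU):
#    sudo fuser -k /dev/nvidia*

# Wrapped as a reusable function:
kill_gpu_zombies() {
    sudo fuser -k /dev/nvidia* 2>/dev/null
}
```

Because `fuser -k` targets every holder of the device files, it is best run only when nothing legitimate is using the GPU.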