Exploring LoRA-Whisper, a scalable and efficient approach for multilingual ASR using Low-Rank Adaptation to fine-tune OpenAI’s Whisper model while avoiding catastrophic forgetting across languages.
FlashAttention is a groundbreaking optimization technique for computing attention in Transformer models, drastically improving GPU memory efficiency through inner vs outer loop restructuring.
An overview of adversarial attacks on large language models (LLMs) — how manipulated inputs can deceive models into generating harmful or incorrect outputs, covering key attack types, implications, and defense strategies.
A detailed summary of the GLiNER paper, introducing a lightweight, scalable, and highly effective model for open-type named entity recognition using bidirectional transformers with zero-shot generalization.
A curated collection of bash functions, troubleshooting commands, and performance tweaks that I often use in my daily workflow.
The proposed model
Generally speaking, the postnet layer receives a mel-spectrogram and predicts a refined mel-spectrogram with additional information. This makes the output mel-spectrogram more detailed, and hence improves the quality of the synthesized audio.
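A minimal sketch of this residual-refinement idea, loosely following the Tacotron 2 style of postnet (the layer sizes and names here are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class PostNet(nn.Module):
    """Predicts a residual that refines a coarse mel-spectrogram."""

    def __init__(self, n_mels=80, channels=512, kernel=5, n_layers=5):
        super().__init__()
        layers = []
        dims = [n_mels] + [channels] * (n_layers - 1)
        for i in range(n_layers - 1):
            # 1-D convolutions over the time axis, "same" padding
            layers += [
                nn.Conv1d(dims[i], dims[i + 1], kernel, padding=kernel // 2),
                nn.BatchNorm1d(dims[i + 1]),
                nn.Tanh(),
            ]
        layers += [nn.Conv1d(channels, n_mels, kernel, padding=kernel // 2)]
        self.net = nn.Sequential(*layers)

    def forward(self, mel):            # mel: (batch, n_mels, frames)
        # Output = input + predicted residual detail
        return mel + self.net(mel)
```

The key design choice is the residual connection: the postnet only has to learn the fine detail missing from the coarse prediction, not the whole spectrogram.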

This section compares Phoneme Hallucinator and Phoneme Hallucinator + Text2SSL.
| Source | Target | Phoneme Hallucinator | Phoneme Hallucinator + Text2SSL |
|---|---|---|---|

This section compares kNN-VC and Phoneme Hallucinator.
| Source | Target | kNN-VC | Phoneme Hallucinator |
|---|---|---|---|
A trick for killing zombie processes that are still holding GPU memory in Linux 😃.
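One common version of this trick uses `fuser` on the NVIDIA device files (this assumes an NVIDIA GPU whose device nodes live under `/dev/nvidia*`; the function name is just an illustration):

```shell
# Zombie or orphaned processes (e.g. from a crashed training job) can keep
# holding GPU memory even though they no longer show up in nvidia-smi.

# 1. See which PIDs still hold the NVIDIA device files:
#    fuser -v /dev/nvidia*

# 2. Kill them (use with care -- this kills EVERY process using the GPU):
#    sudo fuser -k /dev/nvidia*

# Wrapped as a reusable function:
kill_gpu_zombies() {
    sudo fuser -k /dev/nvidia* 2>/dev/null
}
```

Because `fuser -k` targets every holder of the device files, it is best run only when nothing legitimate is using the GPU.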