Understanding FlashAttention: Inner vs Outer Loop Optimization

FlashAttention is a groundbreaking optimization technique for computing attention in Transformer models. It drastically improves performance by restructuring the attention computation to reduce memory bottlenecks: instead of materializing the full attention matrix in the GPU's slow high-bandwidth memory (HBM), it tiles the computation so that intermediate results stay in fast on-chip SRAM.
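To make the loop structure concrete before diving in, here is a minimal NumPy sketch of tiled attention with an online softmax. This is illustrative only, not the actual CUDA kernel: the function name, block sizes, and helper variables are made up for this post, and the nesting shown (query blocks in the outer loop, key/value blocks in the inner loop) is just one of the two orderings the title refers to.

```python
import numpy as np

def tiled_attention_sketch(Q, K, V, block_q=64, block_k=64):
    """Illustrative tiled attention forward pass (hypothetical helper, not FlashAttention's kernel).

    Outer loop: query blocks (held in fast on-chip memory in a real kernel).
    Inner loop: key/value blocks streamed in and folded into an online softmax,
    so the full N x N score matrix is never materialized.
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)

    for qs in range(0, N, block_q):                  # outer loop over Q blocks
        Qi = Q[qs:qs + block_q]
        m = np.full(Qi.shape[0], -np.inf)            # running row-wise max
        l = np.zeros(Qi.shape[0])                    # running softmax denominator
        acc = np.zeros_like(Qi)                      # running weighted sum of V

        for ks in range(0, N, block_k):              # inner loop over K/V blocks
            Kj = K[ks:ks + block_k]
            Vj = V[ks:ks + block_k]
            S = Qi @ Kj.T * scale                    # one tile of attention scores
            m_new = np.maximum(m, S.max(axis=1))
            p = np.exp(S - m_new[:, None])           # stabilized exponentials
            corr = np.exp(m - m_new)                 # rescale earlier partial sums
            l = l * corr + p.sum(axis=1)
            acc = acc * corr[:, None] + p @ Vj
            m = m_new

        O[qs:qs + block_q] = acc / l[:, None]        # finalize this query block
    return O
```

Checking it against a naive softmax(QKᵀ/√d)V computed with full matrices should give matching outputs up to floating-point error; the point of the sketch is only to show where the two loops sit, which is what the rest of this post examines.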