Understanding FlashAttention: Inner vs Outer Loop Optimization
FlashAttention is a groundbreaking optimization technique for computing attention in Transformer models, drastically improving GPU memory efficiency through inner vs outer loop optimization of the tiled attention computation.
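To make the loop structure concrete, here is a minimal NumPy sketch of blockwise attention with an online softmax, in the spirit of FlashAttention. The function names, block sizes, and the choice of tiling queries in the outer loop and key/value blocks in the inner loop are illustrative assumptions (the loop ordering differs between FlashAttention versions, and the real kernels run these loops on-chip in GPU SRAM):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference: softmax(Q K^T / sqrt(d)) V, materializing the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def blockwise_attention(Q, K, V, block_q=16, block_k=16):
    # Blockwise attention with an online (streaming) softmax: the full score
    # matrix is never materialized, only small (block_q x block_k) tiles.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)
    for qs in range(0, N, block_q):            # outer loop: query tiles
        Qi = Q[qs:qs + block_q]
        m = np.full(Qi.shape[0], -np.inf)      # running row-wise max
        l = np.zeros(Qi.shape[0])              # running softmax denominator
        acc = np.zeros_like(Qi)                # unnormalized output accumulator
        for ks in range(0, N, block_k):        # inner loop: key/value tiles
            Kj, Vj = K[ks:ks + block_k], V[ks:ks + block_k]
            S = Qi @ Kj.T * scale
            m_new = np.maximum(m, S.max(axis=-1))
            alpha = np.exp(m - m_new)          # rescale old partial sums to the new max
            P = np.exp(S - m_new[:, None])
            l = l * alpha + P.sum(axis=-1)
            acc = acc * alpha[:, None] + P @ Vj
            m = m_new
        O[qs:qs + block_q] = acc / l[:, None]
    return O
```

Because the inner loop rescales its running statistics, the tiled result matches the naive computation exactly (up to floating-point error) while peak memory per query tile is only O(block_q * block_k) instead of O(N^2).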
