Juyoung Suk

https://juyoung.site/blog/
Research notes and technical writing on foundation models, evaluation, and training systems.
Tue, 05 May 2026 00:00:00 GMT

Fused Linear Cross-Entropy
https://juyoung.site/blog/fused-lce/
Tue, 05 May 2026 00:00:00 GMT
Why fusing the LM head projection with cross-entropy is the single biggest memory win for training LLMs at long context.
Tags: training, kernels, memory