tag

memory

1 post tagged memory.

Fused Linear Cross-Entropy: Why fusing the LM head projection with cross-entropy is the single biggest memory win for training LLMs at long context.