Tag

training

1 post tagged training.

Fused Linear Cross-Entropy: Why fusing the LM head projection with cross-entropy is the single biggest memory win for training LLMs at long context.