Why does NVIDIA Megatron use the sbh (sequence-first) data layout rather than bsh (batch-first)? - 有趣的事

2023-09-10, 12 views

Updated on December 26, 2025