Good quality Llama 3.1 8B and 70B in torch_xla_models
The goal of this milestone is to replace the hard-to-understand Llama reference implementation in https://github.com/pytorch-tpu/transformers/tree/flash_attention. That branch of the Hugging Face fork is not ideal for engineering work or for demonstrating to interested users.