Good quality Llama 3.1 8B and 70B in torch_xla_models
The goal of this milestone is to replace the hard-to-understand Llama reference implementation in https://github.com/pytorch-tpu/transformers/tree/flash_attention. That branch of the Hugging Face fork is not ideal for engineering work or for demonstrating to interested users.