Skip to content

Commit

Permalink
Cleanup & Add contribution guide (#42)
Browse files Browse the repository at this point in the history
* chore: moving to folders

* fix: broken links

* add: contributing guide

* diagram to newline

* fix readme

* Update .github/CONTRIBUTING.md

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Update .github/CONTRIBUTING.md

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

---------

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
  • Loading branch information
ariG23498 and osanseviero authored Sep 30, 2024
1 parent 2e1c199 commit 4e67168
Showing 26 changed files with 68 additions and 23 deletions.
45 changes: 45 additions & 0 deletions .github/CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# Welcome to the contribution guide 🤗

We are excited to invite the community to contribute to the repository! We appreciate all contributions, big or small. Your efforts help make this repository a valuable resource for everyone working with Llama models.

Thank you for your time and happy coding!

## 🚀 How to Contribute

1. **Fork the Repository**
- Click on the "Fork" button at the top right corner of this page to create your own copy of the repository.

![fork button](../assets/Fork.png)

2. **Create a Branch**
- In your forked repository, create a new branch for your contribution:
```bash
git checkout -b feature/your-feature-name
```

3. **Make Your Changes**
- Add your scripts, notebooks, or any relevant files.
- **Don't forget to update the `README.md`** to include your example,
so others can easily find and use it.
4. **Commit and Push**
- Commit your changes with a meaningful commit message:
```bash
git commit -m "Add feature: your feature name"
```
- Push the changes to your forked repository:
```bash
git push origin feature/your-feature-name
```
5. **Open a Pull Request**
- Navigate to the original repository and click on "New Pull Request"
- Compare across forks and select your branch.
![pull request](../assets/PR.png)
- Provide a clear description of your contribution.
## 💡 Need Help?
If you have any questions or need guidance, feel free to open an issue or draft PR. We're here to help!
45 changes: 22 additions & 23 deletions README.md → .github/README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Hugging Face Llama Recipes

![thumbnail for repository](./assets/hf-llama-recepies.png)
![thumbnail for repository](../assets/hf-llama-recepies.png)

🤗🦙Welcome! This repository contains *minimal* recipes to get started quickly
with **Llama 3.x** models, including **Llama 3.1** and **Llama 3.2**.
@@ -71,8 +71,6 @@ So do we! The memory requirements depend on the model size and the
precision of the weights. Here's a table showing the approximate
memory needed for different configurations:

### Llama 3.1

| Model Size | Llama Variant | BF16/FP16 | FP8 | INT4(AWQ/GPTQ/bnb) |
| :--: | :--: | :--: | :--: | :--: |
| 1B | 3.2 | 2.5 GB | 1.25GB | 0.75GB |
@@ -88,15 +86,15 @@ implementation details and optimizations.

Working with the capable Llama 3.1 8B models:

* [Run Llama 3.1 8B in 4-bits with bitsandbytes](./4bit_bnb.ipynb)
* [Run Llama 3.1 8B in 8-bits with bitsandbytes](./8bit_bnb.ipynb)
* [Run Llama 3.1 8B with AWQ & fused ops](./awq.ipynb)
* [Run Llama 3.1 8B in 4-bits with bitsandbytes](../local_inference/4bit_bnb.ipynb)
* [Run Llama 3.1 8B in 8-bits with bitsandbytes](../local_inference/8bit_bnb.ipynb)
* [Run Llama 3.1 8B with AWQ & fused ops](../local_inference/awq.ipynb)

Working on the 🐘 big Llama 3.1 405B model:

* [Run Llama 3.1 405B FP8](./fp8-405B.ipynb)
* [Run Llama 3.1 405B quantized to INT4 with AWQ](./awq_generation.py)
* [Run Llama 3.1 405B quantized to INT4 with GPTQ](./gptq_generation.py)
* [Run Llama 3.1 405B FP8](../local_inference/fp8-405B.ipynb)
* [Run Llama 3.1 405B quantized to INT4 with AWQ](../local_inference/awq_generation.py)
* [Run Llama 3.1 405B quantized to INT4 with GPTQ](../local_inference/gptq_generation.py)

## Model Fine Tuning:

@@ -106,43 +104,44 @@ custom dataset. Here are some scripts showing
how to fine-tune the models.

Fine tune models on your custom dataset:
* [Fine tune Llama 3.2 Vision on a custom dataset](./Llama-Vision%20FT.ipynb)
* [Supervised Fine Tuning on Llama 3.2 Vision with TRL](./sft_vlm.py)
* [How to fine-tune Llama 3.1 8B on consumer GPU with PEFT and QLoRA with bitsandbytes](./peft_finetuning.py)
* [Execute a distributed fine tuning job for the Llama 3.1 405B model on a SLURM-managed computing cluster](./qlora_405B.slurm)
* [Fine tune Llama 3.2 Vision on a custom dataset](../fine_tune/Llama-Vision%20FT.ipynb)
* [Supervised Fine Tuning on Llama 3.2 Vision with TRL](../fine_tune/sft_vlm.py)
* [How to fine-tune Llama 3.1 8B on consumer GPU with PEFT and QLoRA with bitsandbytes](../fine_tune/peft_finetuning.py)
* [Execute a distributed fine tuning job for the Llama 3.1 405B model on a SLURM-managed computing cluster](../fine_tune/qlora_405B.slurm)

## Assisted Decoding Techniques

Do you want to use the smaller Llama 3.2 models to speedup text generation
of bigger models? These notebooks showcase assisted decoding (speculative decoding), which gives you upto 2x speedups for text generation on Llama 3.1 70B (with greedy decoding).

* [Run assisted decoding with 🐘 Llama 3.1 70B and 🤏 Llama 3.2 3B](./assisted_decoding_70B_3B.ipynb)
* [Run assisted decoding with Llama 3.1 8B and Llama 3.2 1B](./assisted_decoding_8B_1B.ipynb)
* [Assisted Decoding with 405B model](./assisted_decoding.py)
* [Run assisted decoding with 🐘 Llama 3.1 70B and 🤏 Llama 3.2 3B](../assisted_decoding/assisted_decoding_70B_3B.ipynb)
* [Run assisted decoding with Llama 3.1 8B and Llama 3.2 1B](../assisted_decoding/assisted_decoding_8B_1B.ipynb)
* [Assisted Decoding with 405B model](../assisted_decoding/assisted_decoding.py)

## Performance Optimization

Let us optimize performace shall we?

* [Accelerate your inference using torch.compile](./torch_compile.py)
* [Accelerate your inference using torch.compile and 4-bit quantization with torchao](./torch_compile_with_torchao.ipynb)
* [Quantize KV Cache to lower memory requirements](./quantized_cache.py)
* [How to reuse prompts with dynamic caching](./prompt_reuse.py)
* [Accelerate your inference using torch.compile](../performance_optimization/torch_compile.py)
* [Accelerate your inference using torch.compile and 4-bit quantization with torchao](../performance_optimization/torch_compile_with_torchao.ipynb)
* [Quantize KV Cache to lower memory requirements](../performance_optimization/quantized_cache.py)
* [How to reuse prompts with dynamic caching](../performance_optimization/prompt_reuse.py)
* [How to setup distributed training utilizing DeepSpeed with mixed-precision and Zero-3 optimization](../performance_optimization/deepspeed_zero3.yaml)

## API inference

Are these models too large for you to run at home? Would you like to experiment with Llama 70B? Try out the following examples!

* [Use the Inference API for PRO users](./inference-api.ipynb)
* [Use the Inference API for PRO users](../api_inference/inference-api.ipynb)

## Llama Guard and Prompt Guard

In addition to the generative models, Meta released two new models: Llama Guard 3 and Prompt Guard. Prompt Guard is a small classifier that detects jailbreaks and prompt injections. Llama Guard 3 is a safeguard model that can classify LLM inputs and generations. Learn how to use them as done in the following notebooks:

* [Detecting jailbreaks and prompt injection with Prompt Guard](./prompt_guard.ipynb)
* [Detecting jailbreaks and prompt injection with Prompt Guard](../llama_guard/prompt_guard.ipynb)

## Synthetic Data Generation
With the ever hungry models, the need for synthetic data generation is
on the rise. Here we show you how to build your very own synthetic dataset.

* [Generate synthetic data with `distilabel`](./synthetic-data-with-llama.ipynb)
* [Generate synthetic data with `distilabel`](../synthetic_data_gen/synthetic-data-with-llama.ipynb)
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
.DS_Store
File renamed without changes.
Binary file added assets/Fork.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added assets/PR.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

0 comments on commit 4e67168

Please sign in to comment.