Cleanup & Add contribution guide (#42)

* chore: moving to folders * fix: broken links * add: contributing guide * diagram to newline * fix readme * Update .github/CONTRIBUTING.md Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Update .github/CONTRIBUTING.md Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> --------- Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
huggingface · Sep 30, 2024 · 4e67168 · 4e67168
1 parent 2e1c199
commit 4e67168
Showing 26 changed files with 68 additions and 23 deletions.
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
@@ -0,0 +1,45 @@
+# Welcome to the contribution guide 🤗
+
+We are excited to invite the community to contribute to the repository! We appreciate all contributions, big or small. Your efforts help make this repository a valuable resource for everyone working with Llama models.
+
+Thank you for your time and happy coding!
+
+## 🚀 How to Contribute
+
+1. **Fork the Repository**
+    - Click on the "Fork" button at the top right corner of this page to create your own copy of the repository.
+
+    ![fork button](../assets/Fork.png)
+
+2. **Create a Branch**
+    - In your forked repository, create a new branch for your contribution:
+        ```bash
+        git checkout -b feature/your-feature-name
+        ```
+
+3. **Make Your Changes**
+    - Add your scripts, notebooks, or any relevant files.
+    - **Don't forget to update the `README.md`** to include your example, 
+        so others can easily find and use it.
+
+4. **Commit and Push**
+    - Commit your changes with a meaningful commit message:
+        ```bash
+        git commit -m "Add feature: your feature name"
+        ```
+    - Push the changes to your forked repository:
+        ```bash
+        git push origin feature/your-feature-name
+        ```
+
+5. **Open a Pull Request**
+    - Navigate to the original repository and click on "New Pull Request"
+    - Compare across forks and select your branch.
+    
+    ![pull request](../assets/PR.png)
+    - Provide a clear description of your contribution.
+   
+
+## 💡 Need Help?
+
+If you have any questions or need guidance, feel free to open an issue or draft PR. We're here to help!
diff --git a/README.md → .github/README.md b/README.md → .github/README.md
@@ -1,6 +1,6 @@
 # Hugging Face Llama Recipes
 
-![thumbnail for repository](./assets/hf-llama-recepies.png)
+![thumbnail for repository](../assets/hf-llama-recepies.png)
 
 🤗🦙Welcome! This repository contains *minimal* recipes to get started quickly
 with **Llama 3.x** models, including **Llama 3.1** and **Llama 3.2**.
@@ -71,8 +71,6 @@ So do we! The memory requirements depend on the model size and the
 precision of the weights. Here's a table showing the approximate
 memory needed for different configurations:
 
-### Llama 3.1
-
 | Model Size | Llama Variant | BF16/FP16 | FP8 | INT4(AWQ/GPTQ/bnb) |
 | :--: | :--: | :--: | :--: | :--: |
 | 1B | 3.2 | 2.5 GB | 1.25GB | 0.75GB |
@@ -88,15 +86,15 @@ implementation details and optimizations.
 
 Working with the capable Llama 3.1 8B models:
 
-* [Run Llama 3.1 8B in 4-bits with bitsandbytes](./4bit_bnb.ipynb)
-* [Run Llama 3.1 8B in 8-bits with bitsandbytes](./8bit_bnb.ipynb)
-* [Run Llama 3.1 8B with AWQ & fused ops](./awq.ipynb)
+* [Run Llama 3.1 8B in 4-bits with bitsandbytes](../local_inference/4bit_bnb.ipynb)
+* [Run Llama 3.1 8B in 8-bits with bitsandbytes](../local_inference/8bit_bnb.ipynb)
+* [Run Llama 3.1 8B with AWQ & fused ops](../local_inference/awq.ipynb)
 
 Working on the 🐘 big Llama 3.1 405B model:
 
-* [Run Llama 3.1 405B FP8](./fp8-405B.ipynb)
-* [Run Llama 3.1 405B quantized to INT4 with AWQ](./awq_generation.py)
-* [Run Llama 3.1 405B quantized to INT4 with GPTQ](./gptq_generation.py)
+* [Run Llama 3.1 405B FP8](../local_inference/fp8-405B.ipynb)
+* [Run Llama 3.1 405B quantized to INT4 with AWQ](../local_inference/awq_generation.py)
+* [Run Llama 3.1 405B quantized to INT4 with GPTQ](../local_inference/gptq_generation.py)
 
 ## Model Fine Tuning:
 
@@ -106,43 +104,44 @@ custom dataset. Here are some scripts showing
 how to fine-tune the models.
 
 Fine tune models on your custom dataset:
-* [Fine tune Llama 3.2 Vision on a custom dataset](./Llama-Vision%20FT.ipynb)
-* [Supervised Fine Tuning on Llama 3.2 Vision with TRL](./sft_vlm.py)
-* [How to fine-tune Llama 3.1 8B on consumer GPU with PEFT and QLoRA with bitsandbytes](./peft_finetuning.py)
-* [Execute a distributed fine tuning job for the Llama 3.1 405B model on a SLURM-managed computing cluster](./qlora_405B.slurm)
+* [Fine tune Llama 3.2 Vision on a custom dataset](../fine_tune/Llama-Vision%20FT.ipynb)
+* [Supervised Fine Tuning on Llama 3.2 Vision with TRL](../fine_tune/sft_vlm.py)
+* [How to fine-tune Llama 3.1 8B on consumer GPU with PEFT and QLoRA with bitsandbytes](../fine_tune/peft_finetuning.py)
+* [Execute a distributed fine tuning job for the Llama 3.1 405B model on a SLURM-managed computing cluster](../fine_tune/qlora_405B.slurm)
 
 ## Assisted Decoding Techniques
 
 Do you want to use the smaller Llama 3.2 models to speedup text generation
 of bigger models? These notebooks showcase assisted decoding (speculative decoding), which gives you upto 2x speedups for text generation on Llama 3.1 70B (with greedy decoding).
 
-* [Run assisted decoding with 🐘 Llama 3.1 70B and 🤏 Llama 3.2 3B](./assisted_decoding_70B_3B.ipynb)
-* [Run assisted decoding with Llama 3.1 8B and Llama 3.2 1B](./assisted_decoding_8B_1B.ipynb)
-* [Assisted Decoding with 405B model](./assisted_decoding.py)
+* [Run assisted decoding with 🐘 Llama 3.1 70B and 🤏 Llama 3.2 3B](../assisted_decoding/assisted_decoding_70B_3B.ipynb)
+* [Run assisted decoding with Llama 3.1 8B and Llama 3.2 1B](../assisted_decoding/assisted_decoding_8B_1B.ipynb)
+* [Assisted Decoding with 405B model](../assisted_decoding/assisted_decoding.py)
 
 ## Performance Optimization
 
 Let us optimize performace shall we?
 
-* [Accelerate your inference using torch.compile](./torch_compile.py)
-* [Accelerate your inference using torch.compile and 4-bit quantization with torchao](./torch_compile_with_torchao.ipynb)
-* [Quantize KV Cache to lower memory requirements](./quantized_cache.py)
-* [How to reuse prompts with dynamic caching](./prompt_reuse.py)
+* [Accelerate your inference using torch.compile](../performance_optimization/torch_compile.py)
+* [Accelerate your inference using torch.compile and 4-bit quantization with torchao](../performance_optimization/torch_compile_with_torchao.ipynb)
+* [Quantize KV Cache to lower memory requirements](../performance_optimization/quantized_cache.py)
+* [How to reuse prompts with dynamic caching](../performance_optimization/prompt_reuse.py)
+* [How to setup distributed training utilizing DeepSpeed with mixed-precision and Zero-3 optimization](../performance_optimization/deepspeed_zero3.yaml)
 
 ## API inference
 
 Are these models too large for you to run at home? Would you like to experiment with Llama 70B? Try out the following examples!
 
-* [Use the Inference API for PRO users](./inference-api.ipynb)
+* [Use the Inference API for PRO users](../api_inference/inference-api.ipynb)
 
 ## Llama Guard and Prompt Guard
 
 In addition to the generative models, Meta released two new models: Llama Guard 3 and Prompt Guard. Prompt Guard is a small classifier that detects jailbreaks and prompt injections. Llama Guard 3 is a safeguard model that can classify LLM inputs and generations. Learn how to use them as done in the following notebooks:
 
-* [Detecting jailbreaks and prompt injection with Prompt Guard](./prompt_guard.ipynb)
+* [Detecting jailbreaks and prompt injection with Prompt Guard](../llama_guard/prompt_guard.ipynb)
 
 ## Synthetic Data Generation
 With the ever hungry models, the need for synthetic data generation is
 on the rise. Here we show you how to build your very own synthetic dataset.
 
-* [Generate synthetic data with `distilabel`](./synthetic-data-with-llama.ipynb)
+* [Generate synthetic data with `distilabel`](../synthetic_data_gen/synthetic-data-with-llama.ipynb)
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+.DS_Store
diff --git a/inference-api.ipynb → api_inference/inference-api.ipynb b/inference-api.ipynb → api_inference/inference-api.ipynb
diff --git a/assets/Fork.png b/assets/Fork.png
diff --git a/assets/PR.png b/assets/PR.png
diff --git a/assisted_decoding.py → assisted_decoding/assisted_decoding.py b/assisted_decoding.py → assisted_decoding/assisted_decoding.py
diff --git a/assisted_decoding_70B_3B.ipynb → ...d_decoding/assisted_decoding_70B_3B.ipynb b/assisted_decoding_70B_3B.ipynb → ...d_decoding/assisted_decoding_70B_3B.ipynb
diff --git a/assisted_decoding_8B_1B.ipynb → ...ed_decoding/assisted_decoding_8B_1B.ipynb b/assisted_decoding_8B_1B.ipynb → ...ed_decoding/assisted_decoding_8B_1B.ipynb
diff --git a/Llama-Vision FT.ipynb → fine_tune/Llama-Vision FT.ipynb b/Llama-Vision FT.ipynb → fine_tune/Llama-Vision FT.ipynb
diff --git a/peft_finetuning.py → fine_tune/peft_finetuning.py b/peft_finetuning.py → fine_tune/peft_finetuning.py
diff --git a/qlora_405B.slurm → fine_tune/qlora_405B.slurm b/qlora_405B.slurm → fine_tune/qlora_405B.slurm
diff --git a/sft_vlm.py → fine_tune/sft_vlm.py b/sft_vlm.py → fine_tune/sft_vlm.py
diff --git a/prompt_guard.ipynb → llama_guard/prompt_guard.ipynb b/prompt_guard.ipynb → llama_guard/prompt_guard.ipynb
diff --git a/4bit_bnb.ipynb → local_inference/4bit_bnb.ipynb b/4bit_bnb.ipynb → local_inference/4bit_bnb.ipynb
diff --git a/8bit_bnb.ipynb → local_inference/8bit_bnb.ipynb b/8bit_bnb.ipynb → local_inference/8bit_bnb.ipynb
diff --git a/awq.ipynb → local_inference/awq.ipynb b/awq.ipynb → local_inference/awq.ipynb
diff --git a/awq_generation.py → local_inference/awq_generation.py b/awq_generation.py → local_inference/awq_generation.py
diff --git a/fp8-405B.ipynb → local_inference/fp8-405B.ipynb b/fp8-405B.ipynb → local_inference/fp8-405B.ipynb
diff --git a/gptq_generation.py → local_inference/gptq_generation.py b/gptq_generation.py → local_inference/gptq_generation.py
diff --git a/deepspeed_zero3.yaml → ...ormance_optimization/deepspeed_zero3.yaml b/deepspeed_zero3.yaml → ...ormance_optimization/deepspeed_zero3.yaml
diff --git a/prompt_reuse.py → performance_optimization/prompt_reuse.py b/prompt_reuse.py → performance_optimization/prompt_reuse.py
diff --git a/quantized_cache.py → performance_optimization/quantized_cache.py b/quantized_cache.py → performance_optimization/quantized_cache.py
diff --git a/torch_compile.py → performance_optimization/torch_compile.py b/torch_compile.py → performance_optimization/torch_compile.py
diff --git a/torch_compile_with_torchao.ipynb → ...mization/torch_compile_with_torchao.ipynb b/torch_compile_with_torchao.ipynb → ...mization/torch_compile_with_torchao.ipynb
diff --git a/synthetic-data-with-llama.ipynb → ..._data_gen/synthetic-data-with-llama.ipynb b/synthetic-data-with-llama.ipynb → ..._data_gen/synthetic-data-with-llama.ipynb