diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md new file mode 100644 index 0000000..c5746a2 --- /dev/null +++ b/.github/CONTRIBUTING.md @@ -0,0 +1,45 @@ +# Welcome to the contribution guide 🀗 + +We are excited to invite the community to contribute to the repository! We appreciate all contributions, big or small. Your efforts help make this repository a valuable resource for everyone working with Llama models. + +Thank you for your time and happy coding! + +## πŸš€ How to Contribute + +1. **Fork the Repository** + - Click on the "Fork" button at the top right corner of this page to create your own copy of the repository. + + ![fork button](../assets/Fork.png) + +2. **Create a Branch** + - In your forked repository, create a new branch for your contribution: + ```bash + git checkout -b feature/your-feature-name + ``` + +3. **Make Your Changes** + - Add your scripts, notebooks, or any relevant files. + - **Don't forget to update the `README.md`** to include your example, + so others can easily find and use it. + +4. **Commit and Push** + - Commit your changes with a meaningful commit message: + ```bash + git commit -m "Add feature: your feature name" + ``` + - Push the changes to your forked repository: + ```bash + git push origin feature/your-feature-name + ``` + +5. **Open a Pull Request** + - Navigate to the original repository and click on "New Pull Request". + - Compare across forks and select your branch. + + ![pull request](../assets/PR.png) + - Provide a clear description of your contribution. + + +## πŸ’‘ Need Help? + +If you have any questions or need guidance, feel free to open an issue or draft PR. We're here to help! diff --git a/README.md b/.github/README.md similarity index 72% rename from README.md rename to .github/README.md index 36ffbd2..8ddc1e4 100644 --- a/README.md +++ b/.github/README.md @@ -1,6 +1,6 @@ # Hugging Face Llama Recipes -![thumbnail for repository](./assets/hf-llama-recepies.png) +![thumbnail for repository](../assets/hf-llama-recepies.png) 🀗🦙 Welcome! This repository contains *minimal* recipes to get started quickly with **Llama 3.x** models, including **Llama 3.1** and **Llama 3.2**. @@ -71,8 +71,6 @@ So do we! The memory requirements depend on the model size and the precision of the weights. Here's a table showing the approximate memory needed for different configurations: -### Llama 3.1 - | Model Size | Llama Variant | BF16/FP16 | FP8 | INT4 (AWQ/GPTQ/bnb) | | :--: | :--: | :--: | :--: | :--: | | 1B | 3.2 | 2.5 GB | 1.25 GB | 0.75 GB | @@ -88,15 +86,15 @@ implementation details and optimizations.
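The figures in the table above follow from simple arithmetic: roughly 2 bytes per parameter in BF16/FP16, 1 byte in FP8 and half a byte in INT4, plus some headroom for activations and the KV cache. Here is a rough back-of-the-envelope sketch; the 1.25x overhead factor is an assumption, not a measured value, and real INT4 checkpoints come out slightly higher because some layers usually stay unquantized.

```python
# Back-of-the-envelope weight-memory estimate behind the table above.
# The 1.25x overhead factor (activations, KV cache) is an assumption, not a measurement.
def estimate_memory_gb(params_billions: float, bits_per_param: int, overhead: float = 1.25) -> float:
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param * overhead

for bits in (16, 8, 4):  # BF16/FP16, FP8, INT4
    print(f"Llama 3.2 1B at {bits}-bit: ~{estimate_memory_gb(1, bits):.2f} GB")
```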
Working with the capable Llama 3.1 8B models: -* [Run Llama 3.1 8B in 4-bits with bitsandbytes](./4bit_bnb.ipynb) -* [Run Llama 3.1 8B in 8-bits with bitsandbytes](./8bit_bnb.ipynb) -* [Run Llama 3.1 8B with AWQ & fused ops](./awq.ipynb) +* [Run Llama 3.1 8B in 4-bits with bitsandbytes](../local_inference/4bit_bnb.ipynb) +* [Run Llama 3.1 8B in 8-bits with bitsandbytes](../local_inference/8bit_bnb.ipynb) +* [Run Llama 3.1 8B with AWQ & fused ops](../local_inference/awq.ipynb) Working on the 🐘 big Llama 3.1 405B model: -* [Run Llama 3.1 405B FP8](./fp8-405B.ipynb) -* [Run Llama 3.1 405B quantized to INT4 with AWQ](./awq_generation.py) -* [Run Llama 3.1 405B quantized to INT4 with GPTQ](./gptq_generation.py) +* [Run Llama 3.1 405B FP8](../local_inference/fp8-405B.ipynb) +* [Run Llama 3.1 405B quantized to INT4 with AWQ](../local_inference/awq_generation.py) +* [Run Llama 3.1 405B quantized to INT4 with GPTQ](../local_inference/gptq_generation.py) ## Model Fine Tuning: @@ -106,43 +104,44 @@ custom dataset. Here are some scripts showing how to fine-tune the models. Fine tune models on your custom dataset: -* [Fine tune Llama 3.2 Vision on a custom dataset](./Llama-Vision%20FT.ipynb) -* [Supervised Fine Tuning on Llama 3.2 Vision with TRL](./sft_vlm.py) -* [How to fine-tune Llama 3.1 8B on consumer GPU with PEFT and QLoRA with bitsandbytes](./peft_finetuning.py) -* [Execute a distributed fine tuning job for the Llama 3.1 405B model on a SLURM-managed computing cluster](./qlora_405B.slurm) +* [Fine tune Llama 3.2 Vision on a custom dataset](../fine_tune/Llama-Vision%20FT.ipynb) +* [Supervised Fine Tuning on Llama 3.2 Vision with TRL](../fine_tune/sft_vlm.py) +* [How to fine-tune Llama 3.1 8B on consumer GPU with PEFT and QLoRA with bitsandbytes](../fine_tune/peft_finetuning.py) +* [Execute a distributed fine tuning job for the Llama 3.1 405B model on a SLURM-managed computing cluster](../fine_tune/qlora_405B.slurm) ## Assisted Decoding Techniques Do you want to use the smaller Llama 3.2 models to speed up text generation of bigger models? These notebooks showcase assisted decoding (speculative decoding), which gives you up to 2x speedups for text generation on Llama 3.1 70B (with greedy decoding). -* [Run assisted decoding with 🐘 Llama 3.1 70B and 🀏 Llama 3.2 3B](./assisted_decoding_70B_3B.ipynb) -* [Run assisted decoding with Llama 3.1 8B and Llama 3.2 1B](./assisted_decoding_8B_1B.ipynb) -* [Assisted Decoding with 405B model](./assisted_decoding.py) +* [Run assisted decoding with 🐘 Llama 3.1 70B and 🀏 Llama 3.2 3B](../assisted_decoding/assisted_decoding_70B_3B.ipynb) +* [Run assisted decoding with Llama 3.1 8B and Llama 3.2 1B](../assisted_decoding/assisted_decoding_8B_1B.ipynb) +* [Assisted Decoding with 405B model](../assisted_decoding/assisted_decoding.py) ## Performance Optimization Let us optimize performance, shall we?
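Before diving into the linked scripts, here is a minimal sketch of the `torch.compile` pattern they build on; the checkpoint, prompt, and generation settings below are placeholders rather than the exact configuration used in the scripts.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; the linked scripts target Llama 3.x checkpoints.
ckpt = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16, device_map="auto")

# A static KV cache gives torch.compile fixed tensor shapes to specialize on.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
# The first call is slow (graph compilation); later calls reuse the compiled graph.
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```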
-* [Accelerate your inference using torch.compile](./torch_compile.py) -* [Accelerate your inference using torch.compile and 4-bit quantization with torchao](./torch_compile_with_torchao.ipynb) -* [Quantize KV Cache to lower memory requirements](./quantized_cache.py) -* [How to reuse prompts with dynamic caching](./prompt_reuse.py) +* [Accelerate your inference using torch.compile](../performance_optimization/torch_compile.py) +* [Accelerate your inference using torch.compile and 4-bit quantization with torchao](../performance_optimization/torch_compile_with_torchao.ipynb) +* [Quantize KV Cache to lower memory requirements](../performance_optimization/quantized_cache.py) +* [How to reuse prompts with dynamic caching](../performance_optimization/prompt_reuse.py) +* [How to set up distributed training with DeepSpeed, mixed precision, and ZeRO-3 optimization](../performance_optimization/deepspeed_zero3.yaml) ## API inference Are these models too large for you to run at home? Would you like to experiment with Llama 70B? Try out the following examples! -* [Use the Inference API for PRO users](./inference-api.ipynb) +* [Use the Inference API for PRO users](../api_inference/inference-api.ipynb) ## Llama Guard and Prompt Guard In addition to the generative models, Meta released two new models: Llama Guard 3 and Prompt Guard. Prompt Guard is a small classifier that detects jailbreaks and prompt injections. Llama Guard 3 is a safeguard model that can classify LLM inputs and generations. Learn how to use them in the following notebooks: -* [Detecting jailbreaks and prompt injection with Prompt Guard](./prompt_guard.ipynb) +* [Detecting jailbreaks and prompt injection with Prompt Guard](../llama_guard/prompt_guard.ipynb) ## Synthetic Data Generation With today's data-hungry models, the need for synthetic data generation is on the rise. Here we show you how to build your very own synthetic dataset.
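The linked notebook builds this with `distilabel`; purely as an illustration of the idea, here is a minimal plain-`transformers` sketch that prompts an instruct model for question/answer pairs. The checkpoint, topics, and prompt are placeholders, not the workflow used in the notebook.

```python
from transformers import pipeline

# Placeholder checkpoint; any Llama 3.x instruct model can be swapped in.
generator = pipeline("text-generation", model="meta-llama/Llama-3.2-3B-Instruct", device_map="auto")

seed_topics = ["KV cache quantization", "speculative decoding"]
synthetic_rows = []
for topic in seed_topics:
    messages = [{"role": "user", "content": f"Write one question and a concise answer about {topic}."}]
    out = generator(messages, max_new_tokens=128)
    # With chat-style input the pipeline returns the full conversation;
    # the last message is the generated assistant reply.
    synthetic_rows.append({"topic": topic, "text": out[0]["generated_text"][-1]["content"]})

print(synthetic_rows)
```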
-* [Generate synthetic data with `distilabel`](./synthetic-data-with-llama.ipynb) +* [Generate synthetic data with `distilabel`](../synthetic_data_gen/synthetic-data-with-llama.ipynb) diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..496ee2c --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +.DS_Store \ No newline at end of file diff --git a/inference-api.ipynb b/api_inference/inference-api.ipynb similarity index 100% rename from inference-api.ipynb rename to api_inference/inference-api.ipynb diff --git a/assets/Fork.png b/assets/Fork.png new file mode 100644 index 0000000..819338a Binary files /dev/null and b/assets/Fork.png differ diff --git a/assets/PR.png b/assets/PR.png new file mode 100644 index 0000000..cc2de15 Binary files /dev/null and b/assets/PR.png differ diff --git a/assisted_decoding.py b/assisted_decoding/assisted_decoding.py similarity index 100% rename from assisted_decoding.py rename to assisted_decoding/assisted_decoding.py diff --git a/assisted_decoding_70B_3B.ipynb b/assisted_decoding/assisted_decoding_70B_3B.ipynb similarity index 100% rename from assisted_decoding_70B_3B.ipynb rename to assisted_decoding/assisted_decoding_70B_3B.ipynb diff --git a/assisted_decoding_8B_1B.ipynb b/assisted_decoding/assisted_decoding_8B_1B.ipynb similarity index 100% rename from assisted_decoding_8B_1B.ipynb rename to assisted_decoding/assisted_decoding_8B_1B.ipynb diff --git a/Llama-Vision FT.ipynb b/fine_tune/Llama-Vision FT.ipynb similarity index 100% rename from Llama-Vision FT.ipynb rename to fine_tune/Llama-Vision FT.ipynb diff --git a/peft_finetuning.py b/fine_tune/peft_finetuning.py similarity index 100% rename from peft_finetuning.py rename to fine_tune/peft_finetuning.py diff --git a/qlora_405B.slurm b/fine_tune/qlora_405B.slurm similarity index 100% rename from qlora_405B.slurm rename to fine_tune/qlora_405B.slurm diff --git a/sft_vlm.py b/fine_tune/sft_vlm.py similarity index 100% rename from sft_vlm.py rename to fine_tune/sft_vlm.py diff --git a/prompt_guard.ipynb b/llama_guard/prompt_guard.ipynb similarity index 100% rename from prompt_guard.ipynb rename to llama_guard/prompt_guard.ipynb diff --git a/4bit_bnb.ipynb b/local_inference/4bit_bnb.ipynb similarity index 100% rename from 4bit_bnb.ipynb rename to local_inference/4bit_bnb.ipynb diff --git a/8bit_bnb.ipynb b/local_inference/8bit_bnb.ipynb similarity index 100% rename from 8bit_bnb.ipynb rename to local_inference/8bit_bnb.ipynb diff --git a/awq.ipynb b/local_inference/awq.ipynb similarity index 100% rename from awq.ipynb rename to local_inference/awq.ipynb diff --git a/awq_generation.py b/local_inference/awq_generation.py similarity index 100% rename from awq_generation.py rename to local_inference/awq_generation.py diff --git a/fp8-405B.ipynb b/local_inference/fp8-405B.ipynb similarity index 100% rename from fp8-405B.ipynb rename to local_inference/fp8-405B.ipynb diff --git a/gptq_generation.py b/local_inference/gptq_generation.py similarity index 100% rename from gptq_generation.py rename to local_inference/gptq_generation.py diff --git a/deepspeed_zero3.yaml b/performance_optimization/deepspeed_zero3.yaml similarity index 100% rename from deepspeed_zero3.yaml rename to performance_optimization/deepspeed_zero3.yaml diff --git a/prompt_reuse.py b/performance_optimization/prompt_reuse.py similarity index 100% rename from prompt_reuse.py rename to performance_optimization/prompt_reuse.py diff --git a/quantized_cache.py b/performance_optimization/quantized_cache.py similarity index 100% rename from 
quantized_cache.py rename to performance_optimization/quantized_cache.py diff --git a/torch_compile.py b/performance_optimization/torch_compile.py similarity index 100% rename from torch_compile.py rename to performance_optimization/torch_compile.py diff --git a/torch_compile_with_torchao.ipynb b/performance_optimization/torch_compile_with_torchao.ipynb similarity index 100% rename from torch_compile_with_torchao.ipynb rename to performance_optimization/torch_compile_with_torchao.ipynb diff --git a/synthetic-data-with-llama.ipynb b/synthetic_data_gen/synthetic-data-with-llama.ipynb similarity index 100% rename from synthetic-data-with-llama.ipynb rename to synthetic_data_gen/synthetic-data-with-llama.ipynb