Commit d15328a

feat(readme): Update README with examples and roadmap
feat(github): Add workflows for testing and linting

johnsutor committed Oct 6, 2024
1 parent b28e76b commit d15328a

Showing 4 changed files with 146 additions and 1 deletion.
25 changes: 25 additions & 0 deletions .github/workflows/style.yaml
@@ -0,0 +1,25 @@
name: Lint

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install ruff
      - name: Run Ruff
        run: ruff check .
      - name: Run Ruff Format
        run: ruff format . --check
21 changes: 21 additions & 0 deletions .github/workflows/test.yaml
@@ -0,0 +1,21 @@
name: Test

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest
      - name: Run pytest
        run: pytest
101 changes: 100 additions & 1 deletion README.md
@@ -1,4 +1,103 @@
# Llama-Jarvis
![Lint Status](https://github.com/johnsutor/llama-jarvis/workflows/Lint/badge.svg)
![Tests Status](https://github.com/johnsutor/llama-jarvis/workflows/Test/badge.svg)
![contributions welcome](https://img.shields.io/badge/contributions-welcome-blue.svg?style=flat)

![alt text](image.png)
Train a speech-to-speech model using your own language model. It is currently based on the [Seamless Model](https://huggingface.co/collections/facebook/seamless-communication-6568d486ef451c6ba62c7724), with plans to support more models in the future.

This model is based on speech-to-speech models such as [Llama-Omni](https://github.com/ictnlp/LLaMA-Omni). However, it aims to take advantage of the joint speech-text embeddings of the Seamless Model.

This code is very much a work in progress. Any and all contributions are welcome!

## Examples
**NOTE**: For some of the examples below, you may first have to [log in to Hugging Face](https://huggingface.co/docs/huggingface_hub/main/package_reference/authentication) to gain access to gated models (especially the Llama models).

### Running Locally
This code is not yet available on PyPI (I am hesitant to release it without thorough testing). To try it locally, run
```shell
git clone https://github.com/johnsutor/llama-jarvis
cd llama-jarvis
pip install -e .
```

### Phase One Loss
The example code below returns the phase-one loss (i.e., the loss used when training the first phase of Llama-Omni):
```py
from llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor

BASE_LLM = "meta-llama/Llama-3.2-1B"
SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"
LANGUAGE = "eng"

jarvis_config = JarvisConfig(BASE_LLM, SEAMLESS_MODEL)
jarvis_model = JarvisModel(jarvis_config)
jarvis_processor = JarvisProcessor(BASE_LLM, SEAMLESS_MODEL)

inputs = jarvis_processor(
    instruction=["You are a language model who should respond to my speech"],
    text=["What is two plus two?"],
    label=["Two plus two is four"],
    src_lang=LANGUAGE,
    return_tensors="pt",
    padding=True,
)

outputs = jarvis_model(**inputs, tgt_lang=LANGUAGE)

print(outputs.loss)
```
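Once the model returns a loss, a training step follows the standard PyTorch pattern. Below is a minimal sketch using a stand-in `nn.Linear` model in place of the Jarvis model (the optimizer choice and learning rate are assumptions for illustration, not part of llama-jarvis):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the Jarvis model: any nn.Module whose forward yields a scalar loss
model = nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

# Dummy batch playing the role of the processed inputs
x = torch.randn(8, 4)
target = torch.randint(0, 2, (8,))

# This cross-entropy plays the role of outputs.loss from the example above
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()        # accumulate gradients
optimizer.step()       # update parameters
optimizer.zero_grad()  # clear gradients for the next step
```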

### Phase Two Loss
The example code below returns the phase-two loss (i.e., the loss used when training the second phase of Llama-Omni):
```py
from llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor

BASE_LLM = "meta-llama/Llama-3.2-1B"
SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"
LANGUAGE = "eng"

jarvis_config = JarvisConfig(BASE_LLM, SEAMLESS_MODEL)
jarvis_model = JarvisModel(jarvis_config)
jarvis_processor = JarvisProcessor(BASE_LLM, SEAMLESS_MODEL)

inputs = jarvis_processor(
    instruction=["You are a language model who should respond to my speech"],
    text=["What is two plus two?"],
    label=["Two plus two is four"],
    src_lang=LANGUAGE,
    return_tensors="pt",
    padding=True,
)

outputs = jarvis_model(**inputs, tgt_lang=LANGUAGE, train_phase=2)

print(outputs.loss)
```

## Roadmap
- [ ] Train a baseline model using Llama 3.2 1B and Seamless Medium
- [ ] Provide training example code
- [ ] Fully document the code
- [ ] Create an inference script for the model
- [ ] Write thorough tests for the code, and test with a multitude of open-source models
- [ ] Release the code on PyPI
Binary file added assets/llama.png