diff --git a/.github/workflows/style.yaml b/.github/workflows/style.yaml
new file mode 100644
index 0000000..f67caaf
--- /dev/null
+++ b/.github/workflows/style.yaml
@@ -0,0 +1,25 @@
+name: Lint
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.x'
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install ruff
+      - name: Run Ruff
+        run: ruff check .
+      - name: Run Ruff Format
+        run: ruff format . --check
\ No newline at end of file
diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml
new file mode 100644
index 0000000..e06868d
--- /dev/null
+++ b/.github/workflows/test.yaml
@@ -0,0 +1,21 @@
+name: Test
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.x'
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install pytest
\ No newline at end of file
diff --git a/README.md b/README.md
index f63e107..672bf9b 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,103 @@
 # Llama-Jarvis
+![Lint Status](https://github.com/johnsutor/llama-jarvis/workflows/Lint/badge.svg)
+![Tests Status](https://github.com/johnsutor/llama-jarvis/workflows/Test/badge.svg)
+![contributions welcome](https://img.shields.io/badge/contributions-welcome-blue.svg?style=flat)
+
+![Llama-Jarvis](assets/llama.png)
 
 Train a speech-to-speech model using your own language model. Currently based on the [Seamless Model](https://huggingface.co/collections/facebook/seamless-communication-6568d486ef451c6ba62c7724), but plan to support more models in the future.
-This model is based on speech-to-speech models such as [Llama-Omni](https://github.com/ictnlp/LLaMA-Omni).
+This model is based on speech-to-speech models such as [Llama-Omni](https://github.com/ictnlp/LLaMA-Omni). However, it aims to take advantage of the joint speech-text embeddings of the Seamless Model.
+
+This code is very much a work in progress. Any and all contributions are welcome!
+
+## Examples
+**NOTE** For some of the examples below, you may have to first [log in to Hugging Face](https://huggingface.co/docs/huggingface_hub/main/package_reference/authentication) to gain access to the gated models (especially the Llama models).
+
+### Running Locally
+This code is not yet available via PyPI (I am hesitant to release it without thoroughly testing the code). Thus, to try it locally, please run
+```shell
+git clone https://github.com/johnsutor/llama-jarvis
+cd llama-jarvis
+pip install -e .
+```
+
+### Phase One Loss
+The example code below will return the phase one loss (i.e., the loss used when training the first phase of Llama-Omni):
+```py
+from llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor
+
+BASE_LLM = "meta-llama/Llama-3.2-1B"
+SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"
+LANGUAGE = "eng"
+
+jarvis_config = JarvisConfig(
+    BASE_LLM,
+    SEAMLESS_MODEL
+)
+jarvis_model = JarvisModel(jarvis_config)
+jarvis_processor = JarvisProcessor(
+    BASE_LLM,
+    SEAMLESS_MODEL
+)
+
+inputs = jarvis_processor(
+    instruction=["You are a language model who should respond to my speech"],
+    text=["What is two plus two?"],
+    label=["Two plus two is four"],
+    src_lang=LANGUAGE,
+    return_tensors="pt",
+    padding=True
+)
+
+outputs = jarvis_model.forward(
+    **inputs,
+    tgt_lang=LANGUAGE
+)
+
+print(outputs.loss)
+```
+
+### Phase Two Loss
+The example code below will return the phase two loss (i.e., the loss used when training the second phase of Llama-Omni):
+```py
+from llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor
+
+BASE_LLM = "meta-llama/Llama-3.2-1B"
+SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"
+LANGUAGE = "eng"
+
+jarvis_config = JarvisConfig(
+    BASE_LLM,
+    SEAMLESS_MODEL
+)
+jarvis_model = JarvisModel(jarvis_config)
+jarvis_processor = JarvisProcessor(
+    BASE_LLM,
+    SEAMLESS_MODEL
+)
+
+inputs = jarvis_processor(
+    instruction=["You are a language model who should respond to my speech"],
+    text=["What is two plus two?"],
+    label=["Two plus two is four"],
+    src_lang=LANGUAGE,
+    return_tensors="pt",
+    padding=True
+)
+
+outputs = jarvis_model.forward(
+    **inputs,
+    tgt_lang=LANGUAGE,
+    train_phase=2
+)
+
+print(outputs.loss)
+```
+
+## Roadmap
+- [ ] Train a baseline model using Llama 3.2 1B and Seamless Medium
+- [ ] Provide training example code
+- [ ] Fully document the code
+- [ ] Create an inference script for the model
+- [ ] Write thorough tests for the code, and test with a multitude of open-source models
+- [ ] Release the code on PyPI
\ No newline at end of file
diff --git a/assets/llama.png b/assets/llama.png
new file mode 100644
index 0000000..e92bb26
Binary files a/assets/llama.png and b/assets/llama.png differ