Commit d15328a

feat(readme): Update README with examples and roadmap
feat(github): Add workflows for testing and linting

johnsutor committed Oct 6, 2024
1 parent b28e76b commit d15328a

Showing 4 changed files with 146 additions and 1 deletion.
25 changes: 25 additions & 0 deletions .github/workflows/style.yaml
@@ -0,0 +1,25 @@
name: Lint

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install ruff
      - name: Run Ruff
        run: ruff check .
      - name: Run Ruff Format
        run: ruff format . --check
21 changes: 21 additions & 0 deletions .github/workflows/test.yaml
@@ -0,0 +1,21 @@
name: Test

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest
      - name: Run pytest
        run: pytest
101 changes: 100 additions & 1 deletion README.md
@@ -1,4 +1,103 @@
# Llama-Jarvis
![Lint Status](https://github.com/johnsutor/llama-jarvis/workflows/Lint/badge.svg)
![Tests Status](https://github.com/johnsutor/llama-jarvis/workflows/Test/badge.svg)
![contributions welcome](https://img.shields.io/badge/contributions-welcome-blue.svg?style=flat)

![alt text](image.png)
Train a speech-to-speech model using your own language model. It is currently based on the [Seamless Model](https://huggingface.co/collections/facebook/seamless-communication-6568d486ef451c6ba62c7724), with plans to support more models in the future.

This model is based on speech-to-speech models such as [Llama-Omni](https://github.com/ictnlp/LLaMA-Omni). However, it aims to take advantage of the joint speech-text embeddings of the Seamless Model.

This code is very much a work in progress. Any and all contributions are welcome!

## Examples
**NOTE**: For some of the examples below, you may first have to [log in to Hugging Face](https://huggingface.co/docs/huggingface_hub/main/package_reference/authentication) to gain access to gated models (especially the Llama models).

### Running Locally
This code is not yet available on PyPI (I am hesitant to release it without thorough testing). To try it locally, run
```shell
git clone https://github.com/johnsutor/llama-jarvis
cd llama-jarvis
pip install -e .
```

### Phase One Loss
The example code below returns the phase-one loss (i.e., the loss used when training the first phase of Llama-Omni):
```py
from llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor

BASE_LLM = "meta-llama/Llama-3.2-1B"
SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"
LANGUAGE = "eng"

jarvis_config = JarvisConfig(BASE_LLM, SEAMLESS_MODEL)
jarvis_model = JarvisModel(jarvis_config)
jarvis_processor = JarvisProcessor(BASE_LLM, SEAMLESS_MODEL)

inputs = jarvis_processor(
    instruction=["You are a language model who should respond to my speech"],
    text=["What is two plus two?"],
    label=["Two plus two is four"],
    src_lang=LANGUAGE,
    return_tensors="pt",
    padding=True,
)

outputs = jarvis_model(**inputs, tgt_lang=LANGUAGE)

print(outputs.loss)
```
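Once the model returns a loss, a training step follows the standard PyTorch pattern. Below is a minimal sketch using a stand-in `nn.Linear` model in place of the Jarvis model (the optimizer choice and learning rate are assumptions for illustration, not part of llama-jarvis):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the Jarvis model: any nn.Module whose forward yields a scalar loss
model = nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

# Dummy batch playing the role of the processed inputs
x = torch.randn(8, 4)
target = torch.randint(0, 2, (8,))

# This cross-entropy plays the role of outputs.loss from the example above
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()        # accumulate gradients
optimizer.step()       # update parameters
optimizer.zero_grad()  # clear gradients for the next step
```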

### Phase Two Loss
The example code below returns the phase-two loss (i.e., the loss used when training the second phase of Llama-Omni):
```py
from llama_jarvis.model import JarvisModel, JarvisConfig, JarvisProcessor

BASE_LLM = "meta-llama/Llama-3.2-1B"
SEAMLESS_MODEL = "facebook/hf-seamless-m4t-medium"
LANGUAGE = "eng"

jarvis_config = JarvisConfig(BASE_LLM, SEAMLESS_MODEL)
jarvis_model = JarvisModel(jarvis_config)
jarvis_processor = JarvisProcessor(BASE_LLM, SEAMLESS_MODEL)

inputs = jarvis_processor(
    instruction=["You are a language model who should respond to my speech"],
    text=["What is two plus two?"],
    label=["Two plus two is four"],
    src_lang=LANGUAGE,
    return_tensors="pt",
    padding=True,
)

outputs = jarvis_model(**inputs, tgt_lang=LANGUAGE, train_phase=2)

print(outputs.loss)
```

## Roadmap
- [ ] Train a baseline model using Llama 3.2 1B and Seamless Medium
- [ ] Provide training example code
- [ ] Fully document the code
- [ ] Create an inference script for the model
- [ ] Write thorough tests for the code, and test with a multitude of open-source models
- [ ] Release the code on PyPI
Binary file added assets/llama.png