This repository contains code and data for LLMs that translate terms used in China into the equivalent terms used in Taiwan. The main technique is instruction fine-tuning.
Example:
- Input:
這個軟件的質量真高啊
- Output:
這個軟體的品質真高啊
😍😍 See the model card and try it out 😍😍
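To make the instruction fine-tuning setup more concrete, here is a minimal sketch of how one sentence pair could be turned into an instruction-style training record. The prompt wording, the field names (`instruction`, `input`, `output`), and the helpers `build_record` and `to_prompt` are assumptions for illustration only; the template actually used in this repository may differ.

```python
# Hypothetical instruction template; the wording used by this repository may differ.
INSTRUCTION = (
    "Rewrite the following sentence, replacing terms used in China "
    "with the equivalent terms used in Taiwan."
)


def build_record(source: str, target: str) -> dict:
    """Pack one (China wording, Taiwan wording) pair into an instruction-style record."""
    return {"instruction": INSTRUCTION, "input": source, "output": target}


def to_prompt(record: dict) -> str:
    """Render the record as a single training text for a causal language model."""
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Input:\n{record['input']}\n\n"
        f"### Response:\n{record['output']}"
    )


# The example pair from this README.
record = build_record("這個軟件的質量真高啊", "這個軟體的品質真高啊")
print(to_prompt(record))
```

During fine-tuning, the model would typically see the instruction and input and learn to produce the response part.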
- Install Miniconda or Anaconda.
- Create a Conda environment named tw_word.
conda create --name tw_word python=3.10
- Activate the environment.
conda activate tw_word
- Install PyTorch-related packages.
# GPU
pip install torch==2.2.0 torchvision==0.17.0 --index-url https://download.pytorch.org/whl/cu118
# or, CPU-only (This may be very slow)
pip install torch==2.2.0 torchvision==0.17.0
- Install required packages.
pip install -r requirements.txt
- (Optional) Set up your OpenAI API key if you want to use the OpenAI-related functions.
export OPENAI_API_KEY=${YOUR_OPENAI_API_KEY}
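Before using the OpenAI translator, it can help to confirm that the key is actually visible to Python. The snippet below is a small sketch using only the standard library; the helper name `openai_key_configured` is hypothetical and not part of this repository.

```python
import os


def openai_key_configured(env=None) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value.

    `env` defaults to the real process environment; passing a plain dict
    makes the check easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not openai_key_configured():
    print("OPENAI_API_KEY is not set; the OpenAI translator will not be usable.")
```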
To run a translation with the Llama translator, type the following command in your terminal:
python inf.py "這個軟件的質量真高啊" llama --model "feabries/TaiwanWordTranslator-v0.1"
For the OpenAI translator:
python inf.py "這個軟件的質量真高啊" openai
To evaluate the Llama translator on the test set:
python eval.py llama --model "feabries/TaiwanWordTranslator-v0.1"
For the OpenAI translator:
python eval.py openai
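This README does not state which metric eval.py reports. As one plausible way to score translations against reference sentences, the sketch below computes exact-match accuracy. The function `exact_match_accuracy` and the sample sentences are assumptions for illustration, not the repository's actual evaluation code.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after trimming whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)


# Illustrative data: the first prediction is fully converted, the second still contains 軟件.
preds = ["這個軟體的品質真高啊", "這個軟件很好"]
refs = ["這個軟體的品質真高啊", "這個軟體很好"]
print(exact_match_accuracy(preds, refs))  # 0.5
```

Exact match is strict: a sentence with a single unconverted term counts as wrong, which suits a task where the output should otherwise be identical to the input.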
To train the Llama model on the training set:
python train.py
The current dataset is collected from MBZUAI/Bactrian-X and automatically labeled by 繁化姬 (Fanhuaji).