How to Use

Step 1: Install Python Requirements

Before running any of the Python scripts below, install the required Python packages:

pip install -r requirements.txt
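
If a later step fails with an import error, it can help to confirm that the install actually succeeded. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and it only handles plain `name==version` style lines, so URL or editable requirements may be reported as missing.

```python
import re
from importlib import metadata


def missing_packages(requirement_lines):
    """Return the package names from requirement lines that are not installed.

    Hypothetical helper for illustration; handles simple requirement lines only.
    """
    missing = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as "-r other.txt"
            continue
        # Keep only the distribution name: cut at version specifiers, extras, or markers.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_packages(open("requirements.txt"))` returns an empty list when every listed package is installed in the current environment.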

Step 2: Download and Tokenize a Dataset

Use the tinyshakespeare dataset for a quick setup. This dataset is the fastest to download and tokenize. Run the following command to download and prepare the dataset:

python prepro_tinyshakespeare.py

(All Python scripts in this repo are taken from Andrej Karpathy's llm.c repository.)

Alternatively, download and tokenize the larger TinyStories dataset with the following command:

python prepro_tinystory.py
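
Before moving on, you may want to confirm that the preprocessing script produced non-empty output files. The sketch below is an illustration only: the helper `missing_files` is hypothetical, and the file names in `ASSUMED_DATASET_FILES` are an assumption based on the llm.c scripts, so check the script's own output for the actual paths.

```python
from pathlib import Path


def missing_files(paths, root="."):
    """Return the paths (relative to root) that are absent or empty."""
    root = Path(root)
    return [
        p for p in paths
        if not (root / p).is_file() or (root / p).stat().st_size == 0
    ]


# Assumed output names, not confirmed by this document; adjust to what
# the preprocessing script reports when it finishes.
ASSUMED_DATASET_FILES = [
    "data/tiny_shakespeare_train.bin",
    "data/tiny_shakespeare_val.bin",
]
```

Calling `missing_files(ASSUMED_DATASET_FILES)` from the repository root returns an empty list if both files exist and contain data.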

Step 3: Download the Weights

Next, download the GPT-2 weights and save them as a checkpoint that can be loaded in Mojo, using the following command:

python train_gpt2.py
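
Steps 1 to 3 are all plain Python invocations, so they can be chained in a small driver script. This is a sketch under stated assumptions rather than something shipped with the repository: the function names `setup_commands` and `run_setup` are hypothetical, `sys.executable` stands in for `python` and `pip`, and the script names are copied as written in the steps above. Step 4 is left out because `magic shell` opens an interactive shell.

```python
import subprocess
import sys


def setup_commands(dataset="tinyshakespeare"):
    """Build the commands for Steps 1-3 without running them."""
    # Script names as given in Step 2 of this document.
    prepro = {
        "tinyshakespeare": "prepro_tinyshakespeare.py",
        "tinystories": "prepro_tinystory.py",
    }[dataset]
    return [
        [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
        [sys.executable, prepro],
        [sys.executable, "train_gpt2.py"],
    ]


def run_setup(dataset="tinyshakespeare"):
    """Run Steps 1-3 in order, stopping at the first failure."""
    for cmd in setup_commands(dataset):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Running `run_setup()` from the repository root performs the quick tinyshakespeare setup; pass `"tinystories"` to prepare the larger dataset instead.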

Step 4: Train the Model

Ensure that the Magic command-line tool is installed by following the Modular Docs.

Train your model by running:

magic shell
mojo train_gpt2.mojo

The first command opens a shell inside the project environment, and the second starts training on the prepared data. The first time you run the magic command, it automatically installs all necessary dependencies.