A Rust implementation for experimenting with the Whisper speech recognition model using the Candle framework.

Requirements:
- Mac with a GPU that supports Metal
- Hugging Face CLI
- MP3-to-WAV converter, e.g. FFmpeg

Setup:
- Clone the repository:
git clone https://github.com/corybuecker/whisper-experiments.git
cd whisper-experiments
- Install the Hugging Face CLI if you haven't already, then log in:
brew install huggingface-cli
huggingface-cli login
- Run the model download script:
./download_models.sh
- Create the Mel spectrogram filter bank with Python. Install librosa with pip, then run the Python lines that follow to write melfilter.bytes:
pip install librosa
import numpy as np
from librosa import filters
filters.mel(sr=16000, n_fft=400, n_mels=128, dtype=np.float32).flatten().tofile("melfilter.bytes")
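
As a quick sanity check on this step, the size of melfilter.bytes can be worked out from the parameters: `filters.mel` returns an array of shape `(n_mels, 1 + n_fft // 2)`, so the values above give 128 × 201 float32 values. The following is a minimal sketch using only the standard library; the helper names are hypothetical and not part of this repository.

```python
import os

FLOAT32_BYTES = 4  # each filter weight is stored as a 32-bit float


def expected_filter_bytes(n_fft: int = 400, n_mels: int = 128) -> int:
    """Size in bytes of a flattened float32 mel filter bank."""
    n_freq_bins = 1 + n_fft // 2  # frequency bins of a real-input FFT
    return n_mels * n_freq_bins * FLOAT32_BYTES


def check_filter_file(path: str, n_fft: int = 400, n_mels: int = 128) -> bool:
    """Return True if the file at `path` has exactly the expected size."""
    return os.path.getsize(path) == expected_filter_bytes(n_fft, n_mels)
```

With the parameters used above, `expected_filter_bytes()` is 102912, so `check_filter_file("melfilter.bytes")` should return `True` for a correctly generated file.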
- Convert your MP3 file to the required format:
  - Mono (single channel)
  - 16-bit depth
  - 16 kHz sample rate
  - WAV format
ffmpeg -i input.mp3 -ac 1 -acodec pcm_s16le -ar 16000 -map_metadata -1 output.wav
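
To confirm that the converted file meets the format requirements before running the model, the header can be inspected with Python's standard `wave` module. This is a minimal sketch, not part of the repository; the function name is hypothetical.

```python
import wave


def wav_format_problems(path: str) -> list[str]:
    """Return a list of mismatches against mono, 16-bit, 16 kHz PCM."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected 1 channel, found {wav.getnchannels()}")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(
                f"expected 16-bit samples, found {8 * wav.getsampwidth()}-bit"
            )
        if wav.getframerate() != 16000:
            problems.append(f"expected 16000 Hz, found {wav.getframerate()} Hz")
    return problems
```

An empty list from `wav_format_problems("output.wav")` means the file matches all three requirements; otherwise each entry names one mismatch.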
- Run the project:
cargo run --release --features metal -- --file output.wav
When transcription completes, the program writes a JSON file containing the full text and each parsed segment with its timestamps.
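
The exact field names in the output are not listed here, so one way to explore the file is to print its top-level keys and the type of each value. The sketch below assumes only that the output is valid JSON; the helper name is hypothetical, and the path is whatever file the run produced.

```python
import json


def summarize_json(path: str) -> dict[str, str]:
    """Map each top-level key to the type name of its value."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        # The root might be a list or scalar; report its type instead.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}
```

Calling `summarize_json` on the output file shows which keys hold the text and which hold the segment list, without assuming their names in advance.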
This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements:
- Candle framework and examples
- Hugging Face