This Python script uses Google's speech-to-text API to transcribe annotations in a specified tier of an Elan project.
- Clone the repo & cd into elan-asr/.
- Make a python env and activate
  python -m venv env
  source env/bin/activate
- Install dependencies
  pip install -r requirements.txt
- (Optionally) Set an alias or add to PATH environment.
Use
Create an Elan project and delimit speech on a given tier by creating annotations. In my experience, annotations of 30 seconds or longer return errors from the API, so limit annotations to single utterances.

Run the script. Specify the Elan file with -e | --elan-file, or a list of Elan files with -E | --list-elan, and the language to be recognized with -l | --language. Specify a tier by name with -t | --tier and/or an associated media file with -m | --media-index (otherwise, the script takes the first media file and tier it encounters in the Elan file).

Print the language options with -L | --language-options. If you're not sure of the order of the linked media, print their indexes with -M | --media-indexes.
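Since long annotations tend to fail, it can help to check annotation durations before running the script. The following is a minimal sketch, not part of elan-asr itself: it reads the standard EAF XML elements (TIME_SLOT, TIER, ALIGNABLE_ANNOTATION) and lists annotations at or over a limit. The function name `long_annotations`, the 30-second threshold (taken from the observation above, not from any documented API limit), and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Threshold from the author's observation above; not an official API limit.
MAX_SECONDS = 30.0


def long_annotations(eaf_xml, tier_id, max_seconds=MAX_SECONDS):
    """Return (annotation_id, seconds) for annotations at or over max_seconds."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot ID to its value in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    too_long = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # time slot without a value; nothing to measure
            seconds = (end - start) / 1000.0
            if seconds >= max_seconds:
                too_long.append((ann.get("ANNOTATION_ID"), seconds))
    return too_long


# Hypothetical, heavily trimmed EAF document used only for this example.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="4500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="5000"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="40000"/>
  </TIME_ORDER>
  <TIER TIER_ID="speech" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE></ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE></ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

print(long_annotations(SAMPLE_EAF, "speech"))  # [('a2', 35.0)]
```

For a real project, read the .eaf file from disk and pass its contents in; any annotation reported here is a candidate for splitting into shorter utterances.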
Running the script generates a temporary folder in the same location as the Elan file being processed. It contains (a) the sliced media (ffmpeg output: a generated .wav file for each annotation) and (b) the full return from the ASR API (a JSON file with potential alternative text values and the confidence score for the highest-ranked alternative). Keep these temporary files with -k | --keep-tmp.
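If you keep the temporary files, the saved JSON can be inspected afterwards. The sketch below assumes the file follows the usual Google Speech-to-Text response shape (`results` containing ranked `alternatives`, each with a `transcript` and, for the top one, a `confidence`); the exact layout the script writes is not documented here, so treat the field names and the helper `best_transcript` as assumptions.

```python
import json


def best_transcript(response):
    """Join the top-ranked transcript of each result; return (text, lowest confidence)."""
    texts = []
    confidences = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        top = alternatives[0]  # alternatives are ranked, most probable first
        texts.append(top.get("transcript", "").strip())
        if "confidence" in top:
            confidences.append(top["confidence"])
    text = " ".join(t for t in texts if t)
    # Report the weakest confidence so low-quality segments stand out.
    confidence = min(confidences) if confidences else None
    return text, confidence


# Hypothetical response, shaped like a Speech-to-Text result for one annotation.
SAMPLE_RESPONSE = json.loads("""
{"results": [
  {"alternatives": [
    {"transcript": "hello world", "confidence": 0.92},
    {"transcript": "hello word"}
  ]}
]}
""")

print(best_transcript(SAMPLE_RESPONSE))  # ('hello world', 0.92)
```

Reporting the minimum confidence is a design choice for review purposes: it makes it easy to sort annotations and re-check the least certain ones by hand.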
usage: elan-asr.py [-h] [-e ELAN_FILE] [-E LIST_ELAN] [-t TIER] [-l LANGUAGE] [-L] [-m MEDIA_INDEX]
                   [-M] [-k]

Automatically run asr on a specified tier of an elan project. This program will read the tier make copies of media segments corresponding to annotation values, send that fragment to an asr API, then populate the return text value in that tier / annotation. Requires FFMPEG on system PATH environment.

optional arguments:
  -h, --help            show this help message and exit
  -e ELAN_FILE, --elan-file ELAN_FILE
                        Elan file (.eaf).
  -E LIST_ELAN, --list-elan LIST_ELAN
                        List of Elan files (.eaf).
  -t TIER, --tier TIER  Exact name (case sensitive) of tier to operate on (if tiername contains spaces, wrap arg in quotes). If unset, script will operate on the first/top-level tier.
  -l LANGUAGE, --language LANGUAGE
                        Language to ASR (Use BCP-47 code).
  -L, --language-options
                        Print ASR language options.
  -m MEDIA_INDEX, --media-index MEDIA_INDEX
                        Select media file to work with. Use only in cases where there are multiple media files associated with the selected elan file. HINT: user `-M` to find media indexes.
  -M, --media-indexes   Print associated media indexes.
  -k, --keep-tmp        Don't delete temporary files generated by the script (txt files and sliced media files).
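To see how these options combine, here is a rough argparse sketch that mirrors the help text above. It is a reconstruction for illustration, not the script's actual source: the integer type for the media index and the sample file name, tier name, and language code are assumptions, and the expected format of the -E value is not specified in this README.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="elan-asr.py",
        description="Automatically run asr on a specified tier of an elan project.",
    )
    parser.add_argument("-e", "--elan-file", help="Elan file (.eaf).")
    # The README does not say how the list is passed; kept as a single string here.
    parser.add_argument("-E", "--list-elan", help="List of Elan files (.eaf).")
    parser.add_argument("-t", "--tier", help="Exact, case-sensitive tier name.")
    parser.add_argument("-l", "--language", help="Language to ASR (BCP-47 code).")
    parser.add_argument("-L", "--language-options", action="store_true")
    # Assumption: media indexes are integers.
    parser.add_argument("-m", "--media-index", type=int)
    parser.add_argument("-M", "--media-indexes", action="store_true")
    parser.add_argument("-k", "--keep-tmp", action="store_true")
    return parser


# Hypothetical invocation: one file, a tier name containing a space, US English,
# and temporary files kept for inspection.
args = build_parser().parse_args(
    ["-e", "session.eaf", "-t", "Speaker A", "-l", "en-US", "-k"]
)
print(args.elan_file, args.tier, args.language, args.keep_tmp)
```

On a shell command line the same tier name would need quotes, as the help text notes, because it contains a space.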
Aside from the Python modules in the requirements file, ffmpeg must be installed and available on the user's PATH environment.
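A quick way to confirm that ffmpeg is reachable, and to see roughly what slicing one annotation involves, is sketched below. The ffmpeg options used (`-ss`, `-t`, `-ac`, `-ar`, `-y`) are standard, but the mono 16 kHz output and the helper names are assumptions; the script's actual ffmpeg arguments may differ.

```python
import shutil


def ffmpeg_available():
    """Return True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def slice_command(media_path, start_ms, end_ms, out_path):
    """Build an ffmpeg command that cuts [start_ms, end_ms) into a .wav file."""
    if end_ms <= start_ms:
        raise ValueError("end_ms must be greater than start_ms")
    start = f"{start_ms / 1000:.3f}"
    duration = f"{(end_ms - start_ms) / 1000:.3f}"
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it exists
        "-ss", start,      # seek to the annotation start (seconds)
        "-i", media_path,
        "-t", duration,    # keep only the annotation's duration
        "-ac", "1",        # assumption: mono audio
        "-ar", "16000",    # assumption: 16 kHz sample rate
        out_path,
    ]


print("ffmpeg on PATH:", ffmpeg_available())
# Hypothetical file names; times are in milliseconds, as in an Elan file.
print(" ".join(slice_command("recording.mp4", 5000, 9500, "a2.wav")))
```

The command is returned as a list so it can be passed to `subprocess.run` without shell quoting issues.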
CC BY
- So far, this only works on alignable tier types. If there's any demand, I can add other tier types.
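To check in advance which tiers of a project are alignable, something like the following sketch can be used. It relies on the standard EAF attributes LINGUISTIC_TYPE_REF (on TIER) and TIME_ALIGNABLE (on LINGUISTIC_TYPE); the function name and the sample document are made up for illustration, and this is not necessarily how the script itself decides.

```python
import xml.etree.ElementTree as ET


def alignable_tiers(eaf_xml):
    """Return the IDs of tiers whose linguistic type is time-alignable."""
    root = ET.fromstring(eaf_xml)
    # Collect the linguistic types flagged as time-alignable.
    alignable_types = {
        lt.get("LINGUISTIC_TYPE_ID")
        for lt in root.iter("LINGUISTIC_TYPE")
        if lt.get("TIME_ALIGNABLE") == "true"
    }
    return [
        tier.get("TIER_ID")
        for tier in root.iter("TIER")
        if tier.get("LINGUISTIC_TYPE_REF") in alignable_types
    ]


# Hypothetical, trimmed EAF document with one alignable and one dependent tier.
SAMPLE_TIERS = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="speech" LINGUISTIC_TYPE_REF="default-lt"/>
  <TIER TIER_ID="gloss" LINGUISTIC_TYPE_REF="gloss-lt" PARENT_REF="speech"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="default-lt" TIME_ALIGNABLE="true"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="gloss-lt" TIME_ALIGNABLE="false"
                   CONSTRAINTS="Symbolic_Association"/>
</ANNOTATION_DOCUMENT>"""

print(alignable_tiers(SAMPLE_TIERS))  # ['speech']
```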
The research leading to these results has received funding from the Norwegian Financial Mechanism 2014-2021: 2020/37/K/HS2/02779