Skip to content

Transcribe your elan projects with speech-to-text

Notifications You must be signed in to change notification settings

BobBorges/elan-asr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 

Repository files navigation

ELAN-ASR

This is a Python script to use Google's speech-to-text API to transcribe annotations in a specified tier of an Elan project.

Installation

  1. Clone the repo & cd into elan-asr/.

  2. Make a python env and activate

    python -m venv env

    source env/bin/activate

  3. Install dependencies

    pip install -r requirements.txt

  4. (Optionally) Set an alias or add to PATH environment.

  5. Use

Usage

Create an Elan project. Delimit speech on a given tier by creating annotations. In my experience annotations 30 seconds or longer return errors from the API, so limit annotations to single utterances. Run the script. Specify the Elan file with -e | --elan-file or a list of Elan files with -E | --list-elan and the language to be speech-recognized with -l | --language. Specify a tier by name with -t | --tier and / or an associated media file with -m | --media-index (otherwise, the script will take the first media / tier it encounters in the Elan file).

Print language options using -L | --language-options or if you're not sure of the order of linked media, print their indexes with -M | --media-indexes.

Running the script generates a temporary folder in the same location as the Elan file operated on, which will contain (a) the sliced media (ffmpeg output: generated .wav files for each annotation), and (b) the full return from the ASR API (a json file with potential alternative text values and the confidence score for the highest ranked alternative) -- keep these temporary files with -k | --keep-tmp.

usage: elan-asr.py 

[-h] [-e ELAN_FILE] [-E LIST_ELAN] [-t TIER] [-l LANGUAGE] [-L] [-m MEDIA_INDEX]

[-M] [-k]

Automatically run asr on a specified tier of an elan project. This program will read the tier make copies of media segments corresponding to annotation values, send that fragment to an asr API, then populate the return text value in that tier / annotation. Requires FFMPEG on system PATH environment.

optional arguments:

-h, --help            show this help message and exit
-e ELAN_FILE, --elan-file ELAN_FILE
			Elan file (.eaf).
-E LIST_ELAN, --list-elan LIST_ELAN
			List of Elan files (.eaf).
-t TIER, --tier TIER  
  			Exact name (case sensitive) of tier to operate on (if tiername contains spaces, wrap arg in quotes). If unset, script will operate on the first/top-level tier.
-l LANGUAGE, --language LANGUAGE
			Language to ASR (Use BCP-47 code).
-L, --language-options
  			Print ASR language options.
-m MEDIA_INDEX, --media-index MEDIA_INDEX
			Select media file to work with. Use only in cases where there are multiple media files associated with the selected elan file. HINT: user `-M` to find media indexes.
-M, --media-indexes   Print associated media indexes.
-k, --keep-tmp        Don't delete temporary files generated by the script (txt files and sliced media files).

Requirements

Aside from the python modules in the requirements file, ffmpeg must be installed on the users PATH environment.

Licence

CC BY

Caveats

  1. So far, this only works on alignable tier types. If there's any demand, I can add other tier types.

Funding acknowledgement

The research leading to these results has received funding from the Norwegian Financial Mechanism 2014-2021: 2020/37/K/HS2/02779

About

Transcribe your elan projects with speech-to-text

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages