doc: add README with simple usage example

shahruk10 · Jul 13, 2022 · 0356e89 · 0356e89
1 parent 8d1a49f
commit 0356e89
Showing 1 changed file with 137 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -0,0 +1,137 @@
+# Go Wrapper for SCTK
+
+[![CI](https://github.com/shahruk10/go-sctk/actions/workflows/ci.yml/badge.svg)](https://github.com/shahruk10/go-sctk/actions/workflows/ci.yml)
+
+- [SCTK](https://github.com/usnistgov/SCTK) is a toolkit made by NIST that can be used for evaluating the output of automatic speech recognition systems (ASR). It can be used to:
+
+  - Calculate Word Error Rate (WER) and Character Error Rate (CER)
+
+  - Analyze different types of errors made by ASR systems: Substitutions, Insertions and Deletions.
+
+  - Generate alignments between multiple sets of transcripts (reference and hypotheses)
+
+  - Use statistical tests to evaluate the significance in performance delta between ASR systems.
+
+- This repository offers a single binary with a command line interface (CLI) that wraps around the all different tools in SCTK; the CLI has a simple and easy to use interface.
+
+- This repo is a work-in-progress.
+
+---
+
+## Usage Examples
+
+### Evaluating CER
+
+- This example evaluates the character error rate (CER) between reference transcripts and hypothesis transcript generated by a ASR system. The example uses Bengali text but SCTK supports most languages since it expects text with UTF-8 encoding.
+
+```sh
+# Creating dummy reference transcript file in CSV format.
+cat << EOF > reference.csv
+utterance_id,transcript
+spk01-utt01,এর মূল্য বার্ষিক দশ লক্ষ ইউরো।
+spk02-utt02,খেলাটি চার টেস্ট সিরিজের চূড়ান্ত ছিল।
+EOF
+
+# Creating dummy hypothesis transcript from an ASR system, in CSV format.
+cat << EOF > hypothesis.csv
+utterance_id,transcript
+spk01-utt01,এর মূল্য বার্ দশ লক ইউর।
+spk02-utt02,খেলা ছার টেস্ট শিরিজের চূড়ান্ত ছিল।
+EOF
+
+# Getting the sctk CLI tool from this repository and giving it executable permissions.
+version=v0.1.0
+wget -O sctk https://github.com/shahruk10/go-sctk/releases/download/${version}/sctk
+chmod +x sctk
+
+# Using sctk CLI to evaluate CER and check errors.
+#
+# Setting `--ignore-first=true` to ignore header row.
+# Check `sctk score --help` for documentation of each argument.
+sctk score \
+  --ignore-first=true \
+  --delimiter="," \
+  --col-id=0 \
+  --col-trn=1 \
+  --case-sensitive=false \
+  --normalize-unicode=true \
+  --cer=true \
+  --out=./cer \
+  --ref=reference.csv \
+  --hyp=hypothesis.csv
+```
+
+- Now we can check generated reports in the `./cer` directory.
+
+```
+  cer/
+  ├── hyp1.trn
+  ├── hyp1.trn.dtl
+  ├── hyp1.trn.pra
+  ├── hyp1.trn.raw
+  ├── hyp1.trn.sgml
+  ├── hyp1.trn.sys
+  └── ref.trn
+```
+
+- The `*.sys` file contains a table showing a breakdown of the different types of errors.
+
+  - The results are aggregated for each speaker; `Corr`, `Sub`, `Del` and `Ins` stands for
+    the percentage of characters that were correctly decoded, substituted, deleted and inserted
+    in the hypothesis respectively.
+
+  ```
+                   SYSTEM SUMMARY PERCENTAGES by SPEAKER                      
+
+     ,----------------------------------------------------------------.
+     |                              hyp1                              |
+     |----------------------------------------------------------------|
+     | SPKR   | # Snt # Chr | Corr    Sub    Del    Ins    Err  S.Err |
+     |--------+-------------+-----------------------------------------|
+     | spk01  |    1     25 | 76.0    0.0   24.0    0.0   24.0  100.0 |
+     |--------+-------------+-----------------------------------------|
+     | spk02  |    1     33 | 87.9    6.1    6.1    0.0   12.1  100.0 |
+     |================================================================|
+     | Sum/Avg|    2     58 | 82.8    3.4   13.8    0.0   17.2  100.0 |
+     |================================================================|
+     |  Mean  |  1.0   29.0 | 81.9    3.0   15.0    0.0   18.1  100.0 |
+     |  S.D.  |  0.0    5.7 |  8.4    4.3   12.7    0.0    8.4    0.0 |
+     | Median |  1.0   29.0 | 81.9    3.0   15.0    0.0   18.1  100.0 |
+     `----------------------------------------------------------------'
+  ```
+
+- The `*.dtl` file shows further details of each type of error. This can reveal systematic
+  errors and patterns in how the ASR system is transcribing the audio.
+
+  ```
+  ... (other useful stuff)
+
+  CONFUSION PAIRS                  Total                 (2)
+                                   With >=  1 occurrences (2)
+     1:    1  ->  চ ==> ছ
+     2:    1  ->  স ==> শ
+       -------
+           2
+
+  ... (other useful stuff)
+
+  DELETIONS                        Total                 (6)
+                                   With >=  1 occurrences (6)
+     1:    2  ->  ষ
+     2:    2  ->  ি
+     3:    1  ->  ক
+     4:    1  ->  ট
+     5:    1  ->  ো
+     6:    1  ->  ্
+       -------
+           8
+  ```
+
+- The `*.pra` file shows alignments between the reference and hypothesis text, which
+  makes it easy to see errors in context.
+
+---
+
+## License
+
+[Apache License 2.0](LICENSE)