
Normalize UTF8 diacritics

nvolk edited this page May 2, 2023 · 1 revision

How to normalize UTF-8 diacritics, depending on the script of the text:

Latin-only text: precompose å, ä and ö, and remove all other diacritics.

Non-Latin scripts: use the existing unicode-decomposition.js validator/fixer. We presumably don't want to decompose characters in other scripts (open question: do we?), because we don't understand those scripts well enough.
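
The Latin-only rule could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the names `isLatinOnly`, `normalizeLatinDiacritics` and `KEEP` are hypothetical, and it assumes that "Latin-only" means every character belongs to the Unicode scripts Latin, Common or Inherited, and that the uppercase forms Å, Ä and Ö are kept as well. Text containing any other script is returned untouched, on the assumption that unicode-decomposition.js handles it separately.

```javascript
// Hypothetical sketch; names and the uppercase handling are assumptions.

// Decomposed (NFD) sequences that should survive as precomposed letters.
const KEEP = new Set([
  'a\u030A', 'A\u030A', // å, Å (combining ring above)
  'a\u0308', 'A\u0308', // ä, Ä (combining diaeresis)
  'o\u0308', 'O\u0308', // ö, Ö (combining diaeresis)
]);

// Matches any character that is not Latin, Common (digits, punctuation,
// spaces) or Inherited (combining marks).
const NON_LATIN = /[^\p{Script=Latin}\p{Script=Common}\p{Script=Inherited}]/u;

function isLatinOnly(text) {
  return !NON_LATIN.test(text);
}

function normalizeLatinDiacritics(text) {
  // Non-Latin text is left alone here; it is assumed to go through the
  // separate unicode-decomposition.js validator/fixer instead.
  if (!isLatinOnly(text)) return text;

  // Decompose, then handle each base character together with its marks:
  // kept sequences are precomposed (NFC), everything else loses its marks.
  return text
    .normalize('NFD')
    .replace(/(\P{M})(\p{M}+)/gu, (match, base) =>
      KEEP.has(match) ? match.normalize('NFC') : base);
}
```

With this sketch, a decomposed `a` + U+0308 comes back as the single code point ä, while é becomes plain e. Letters that have no canonical decomposition, such as ø or ł, pass through unchanged, and a base letter carrying a kept mark plus any additional mark loses all of its marks. Whether either of those behaviours is wanted is a separate decision.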