updated converting HTML entities to 'normal' UTF-8 in bash.md
alifeee committed Feb 10, 2025
1 parent aa7bed3 commit 1529be2
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions notes/converting HTML entities to 'normal' UTF-8 in bash.md
@@ -76,3 +76,18 @@ cat file.txt | decodeHTML
```

instead of the massive `php -r 'while ($f = fgets(STDIN)){ echo html_entity_decode($f); }'`.

## Python

(2025-02-10 edit) I have also found a nice way to do this using `unescape` from Python's built-in `html` module (which also has `escape` for going the other way). I used Python because that's what I had installed in [my workflow](https://github.com/alifeee/alifeee.github.io/tree/main/.github/workflows) and I couldn't be bothered to install PHP:
```bash
$ cat file.txt | python3 -c 'import sys;from html import unescape;print(unescape(sys.stdin.read()),end="")'
Children's event,
Wildlife & Nature,
peddler-market-nº-88,
Artists’ Circle,
surface – Breaking
woodland walk. (nbsp)
Justin Adams & Mauro Durante
```
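
For a closer look at what `unescape` does outside of a one-liner, here is a minimal sketch. The sample strings are made up for illustration (they are not the real contents of `file.txt`), and the variable names are my own.

```python
from html import escape, unescape

# Hypothetical sample lines, assumed for illustration only.
samples = [
    "Children&#039;s event,",
    "Wildlife &amp; Nature,",
    "Artists&rsquo; Circle,",
    "surface &ndash; Breaking",
    "woodland walk.&nbsp;",
]

# unescape handles numeric references (&#039;) and named ones (&amp;, &rsquo;, ...).
decoded = [unescape(line) for line in samples]
for line in decoded:
    print(line)

# &nbsp; becomes U+00A0 (non-breaking space), which looks like a normal space when printed.
assert decoded[-1].endswith("\u00a0")

# escape goes the other way, but only for &, <, > and (by default) quotes.
assert escape("Wildlife & Nature") == "Wildlife &amp; Nature"
```

Note that `escape` is not a full inverse: it leaves characters like `’` and `–` alone, so a decode followed by an encode will not necessarily give back the original text.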
