-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TOCs of a couple of discs that cause whipper to fail #183
Labels
Bug
Generic bug: can be used together with more specific labels
Needed: discussion
More discussion needed before anything can be done (or still no agreement has been reached)
On Hold
Waiting for other actions
Priority: high
High priority
Comments
I have one more with the following:
...leading to:
It's actually full of stuff like this, more or less supported:
(Language: chinese) |
Closed
JoeLametta
added
the
Bug
Generic bug: can be used together with more specific labels
label
Apr 6, 2018
JoeLametta
added
Priority: high
High priority
On Hold
Waiting for other actions
Needed: discussion
More discussion needed before anything can be done (or still no agreement has been reached)
labels
Nov 13, 2018
Related to #169. |
ntrrgc
added a commit
to ntrrgc/whipper
that referenced
this issue
Aug 27, 2024
This patch replaces the previous broken approach to TOC string decoding that used `.encode().decode('unicode_escape')` with proper parsing of the escape sequences cdrdao is known to generate. The new parser is also lenient with invalid escape sequences, that can occur due to improper escaping in cdrdao. See: cdrdao/cdrdao#32 Latin-1: This new parsing method should work for Latin-1 strings for both old and new versions of cdrdao, as long as those strings don't trigger the improper escaping issues in upstream cdrdao. This has been verified with the album Diorama from the Danish black metal band MØL. MS-JIS: This new parsing method should also work for MS-JIS strings as long as the .toc file was generated by cdrdao 1.2.5+ and the strings don't trigger improper escaping issues in upstream cdrdao. Unfortunately, I don't have any CD with CD-Text in MS-JIS, so I could not verify this. cdrdao versions before 1.2.5 will still cause whipper to produce mojibake (garbled characters) when reading MS-JIS CD-Text, as those versions do not encode strings in UTF-8. Other encodings: As far as I know, CD-Text only supports officially ASCII, Latin-1 and MS-JIS, but I wouldn't be surprised if there are unofficial encodings out there, given the strange strings I've seen in some bug reports. If you have a CD with garbled CD-Text, please submit a bug report indicating the performer, album name, language and attach the .toc file so that the produced strings can be compared to the expected text. Fixes whipper-team#169 Fixes whipper-team#183 Signed-off-by: Alicia Boya García <ntrrgc@gmail.com>
ntrrgc
added a commit
to ntrrgc/whipper
that referenced
this issue
Aug 27, 2024
This patch replaces the previous broken approach to TOC string decoding that used `.encode().decode('unicode_escape')` with proper parsing of the escape sequences cdrdao is known to generate. The new parser is also lenient with invalid escape sequences, that can occur due to improper escaping in cdrdao. See: cdrdao/cdrdao#32 Latin-1: This new parsing method should work for Latin-1 strings for both old and new versions of cdrdao, as long as those strings don't trigger the improper escaping issues in upstream cdrdao. This has been verified with the album Diorama from the Danish black metal band MØL. MS-JIS: This new parsing method should also work for MS-JIS strings as long as the .toc file was generated by cdrdao 1.2.5+ and the strings don't trigger improper escaping issues in upstream cdrdao. Unfortunately, I don't have any CD with CD-Text in MS-JIS, so I could not verify this. cdrdao versions before 1.2.5 will still cause whipper to produce mojibake (garbled characters) when reading MS-JIS CD-Text, as those versions do not encode strings in UTF-8. Other encodings: As far as I know, CD-Text only supports officially ASCII, Latin-1 and MS-JIS, but I wouldn't be surprised if there are unofficial encodings out there, given the strange strings I've seen in some bug reports. If you have a CD with garbled CD-Text, please submit a bug report indicating the performer, album name, language and attach the .toc file so that the produced strings can be compared to the expected text. Fixes whipper-team#169 Fixes whipper-team#183 Signed-off-by: Alicia Boya García <ntrrgc@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Bug
Generic bug: can be used together with more specific labels
Needed: discussion
More discussion needed before anything can be done (or still no agreement has been reached)
On Hold
Waiting for other actions
Priority: high
High priority
I've created a gist, https://gist.github.com/BonkaBonka/884166f337f7d31d9146df0b909031e2, with the TOC files from two discs I have that cause whipper to fail as mentioned in #174.
The text was updated successfully, but these errors were encountered: