
French translation (po files) #6469

Merged
merged 10 commits into from
Sep 21, 2024

Conversation

phgrosjean
Contributor

Here are the fr.po and R-fr.po files for {data.table}.

@MichaelChirico MichaelChirico added the translation issues/PRs related to message translation projects label Sep 5, 2024
@MichaelChirico
Member

There we go, finally :)

Please copy over the compiled .mo files to this branch as well

@MichaelChirico
Member

MichaelChirico commented Sep 18, 2024

@philippechataignon PTAL at the code-quality CI miss here -- is it based on using a different .pot file to generate your .po? No

It looks like there are indeed extra messages in your .mo file:

msgunfmt -o xx.po inst/po/fr/LC_MESSAGES/data.table.mo
grep -Fr "but the input only has" xx.po
# msgid "skip=%llu but the input only has %llu line%s"
# msgid "skip=%lu but the input only has %lu line%s"
# msgid "skip=%<PRIu64> but the input only has %<PRIu64> line%s"

grep -Fr "but the input only has" src/*fr.po
# msgid "skip=%<PRIu64> but the input only has %<PRIu64> line%s"

Any idea where those extras came from? I can just re-generate the .mo file, but don't necessarily want to lose any work if that's meaningful, WDYT?

@aitap
Contributor

aitap commented Sep 19, 2024

# msgid "skip=%llu but the input only has %llu line%s"
# msgid "skip=%lu but the input only has %lu line%s"
# msgid "skip=%<PRIu64> but the input only has %<PRIu64> line%s"

I think msgfmt generates these in order to make PRI<x>64 work in a portable manner (i.e. compatible with non-GNU gettext, because the glibc function will expand them by itself). They come up with all possible printf specifiers that could be substituted in place of PRI<x>64 and generate a translation for each:

The PO file will contain the string "The amount is %0<PRId64>\n". The translators will provide a translation containing "%0<PRId64>" as well, and at runtime the gettext function’s result will contain the appropriate constant string, "d" or "ld" or "lld".

https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-tools/src/write-mo.c;h=73e6a9ea761b34bdff59d9e949ed1410e855c020;hb=HEAD#l158

https://sourceware.org/git/?p=glibc.git;a=blob;f=intl/loadmsgcat.c;h=c4747bf67749a20648aa7c09ad5e3a0af71c325a;hb=HEAD#l931
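A minimal sketch of the expansion described above, in Python for illustration only. The function name `expand_sysdep` and the `PLATFORMS` table are hypothetical, and the table lists only the `lu`/`llu` (and analogous `ld`/`lld`) expansions seen in the msgunfmt output earlier in this thread; the set msgfmt actually uses may be larger.

```python
import re

# Assumption: only the expansions visible in the msgunfmt output above.
# The real list inside msgfmt may contain more candidates.
PLATFORMS = {
    "long": {"PRIu64": "lu", "PRId64": "ld"},
    "long long": {"PRIu64": "llu", "PRId64": "lld"},
}

# Matches placeholders such as <PRIu64> or <PRId64> inside a msgid.
_PLACEHOLDER = re.compile(r"<(PRI[a-zA-Z]\d+)>")


def expand_sysdep(msgid):
    """Return the msgid itself plus one expanded variant per platform table."""
    if not _PLACEHOLDER.search(msgid):
        return [msgid]
    variants = [msgid]
    for table in PLATFORMS.values():
        # Unknown macros are left untouched rather than guessed.
        variants.append(
            _PLACEHOLDER.sub(lambda m: table.get(m.group(1), m.group(0)), msgid)
        )
    return variants


for line in expand_sysdep("skip=%<PRIu64> but the input only has %<PRIu64> line%s"):
    print(line)
```

Run on the `skip=` message, this lists the same three msgids that msgunfmt reported for the French .mo file.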

@MichaelChirico
Member

@aitap interesting/nice spot... but why has that only happened for this particular .mo file?

@aitap
Contributor

aitap commented Sep 19, 2024

It's a relatively new feature (June 2023). An msgfmt that doesn't have the --no-redundancy flag will not be expanding the specifiers.

@MichaelChirico
Member

thanks! asking since I'm afk -- is there a corresponding option for msgunfmt?

I'm thinking of how to adjust our CI check for .mo-match-.po to account for the new feature:

  1. only require "all translated messages in .po are found in .mo"
  2. use msgfmt and compare the resulting .mo (currently we do the reverse)
  3. use a msgunfmt option to better match the files
  4. ...
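Option 1 could look roughly like the sketch below. It assumes both catalogs have already been parsed into plain `msgid -> msgstr` dictionaries (for the .mo side, for example, from msgunfmt output); the parsing step and the function name `missing_from_mo` are hypothetical and not part of the existing CI.

```python
def missing_from_mo(po_messages, mo_messages):
    """Return msgids translated in the .po that are absent or differ in the .mo.

    Extra entries in the .mo (such as expanded %lu / %llu variants) are
    tolerated, because only the .po side is iterated.
    """
    return sorted(
        msgid
        for msgid, msgstr in po_messages.items()
        if msgstr and mo_messages.get(msgid) != msgstr  # skip untranslated entries
    )


# Hypothetical catalogs for illustration; the strings are made up.
po = {"read %<PRIu64> rows": "lu %<PRIu64> lignes", "not yet translated": ""}
mo = {
    "read %<PRIu64> rows": "lu %<PRIu64> lignes",
    "read %lu rows": "lu %lu lignes",
    "read %llu rows": "lu %llu lignes",
}
print(missing_from_mo(po, mo))
```

A one-directional check like this accepts the redundant entries written by newer msgfmt versions, at the cost of no longer flagging stale messages that exist only in the .mo.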

@philippechataignon
Contributor

Hi. I did not manually edit the .po or .mo files, so you can overwrite them. aitap's comments are interesting; the right printf specifier is not obvious in fwrite because some variable types come from zlib.

@aitap
Contributor

aitap commented Sep 19, 2024

@MichaelChirico looks like (3) is not an option. On libera.chat's #gnu, (2) was recommended to me as the simplest option, with the caveat that we keep track of msgfmt's options --alignment, --endianness, and --no-hash, which may result in semantically equivalent (and equally portable) but different binaries. Unless someone goes out of their way to translate on a big-endian machine (or we get another update to deal with non-GNU gettext functions), it looks like the *.mo binaries should be stable. --no-hash seems to save 173 kB in total for es, fr, pt_BR, zh_CN, but I didn't try to measure any slowdowns from binary search vs hash lookup.
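To see which of these options a given binary was built with, the fixed-size header of a .mo file can be inspected directly. The sketch below follows the GNU MO layout as I understand it (seven 32-bit fields: magic, revision, string count, two table offsets, hash table size, hash table offset); the function name `read_mo_header` is hypothetical, and the expectation that --no-hash shows up as a hash size of 0 is an assumption worth verifying against a real file.

```python
import struct

MO_MAGIC = 0x950412DE  # stored in the byte order of the producing msgfmt run


def read_mo_header(data):
    """Decode the first 28 bytes of a .mo file into a small dict."""
    if len(data) < 28:
        raise ValueError("too short to be a .mo file")
    if struct.unpack("<I", data[:4])[0] == MO_MAGIC:
        order = "<"
    elif struct.unpack(">I", data[:4])[0] == MO_MAGIC:
        order = ">"
    else:
        raise ValueError("bad magic number, not a .mo file")
    _, revision, nstrings, orig_off, trans_off, hash_size, hash_off = struct.unpack(
        order + "7I", data[:28]
    )
    return {
        "endianness": "little" if order == "<" else "big",
        "revision": (revision >> 16, revision & 0xFFFF),  # (major, minor)
        "nstrings": nstrings,
        "hash_size": hash_size,  # assumed to be 0 when built with --no-hash
    }


# Synthetic header for illustration; a real check would read the file bytes.
sample = struct.pack("<7I", MO_MAGIC, 0, 3, 28, 52, 0, 76)
print(read_mo_header(sample))
```

Comparing these fields before comparing whole binaries would make a CI failure say whether the mismatch comes from build options or from the translations themselves.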

@MichaelChirico
Member

Thanks for tracking down all the nuance here... maybe option 4, "translators don't contribute .mo at all, leave that to data.table maintainers", is the right approach. We could also look into having GHA do that automatically.

@MichaelChirico
Member

(also just realized I tagged @philippechataignon by mistake, very sorry for the stray ping!! I meant @phgrosjean 😬)

@phgrosjean
Contributor Author

@MichaelChirico "translators don't contribute .mo files" is the best option for me. My .mo file was built with poEdit 3.5 (the latest version at that time). I guess that .mo files may differ slightly depending on the software used to build them.

If it is OK, I will eliminate the .mo files from the pull request.

@MichaelChirico
Member

CI currently requires the .mo files -- leave them in for now. But you don't need to worry about fixing the issue yourself.

@MichaelChirico
Member

Thanks @phgrosjean!! I'll deal separately with setting up the GHA that ensures .po matches .mo.

@MichaelChirico MichaelChirico merged commit f4a3d92 into master Sep 21, 2024
7 checks passed
@MichaelChirico MichaelChirico deleted the french_translation_po branch September 21, 2024 15:36
@aitap
Contributor

aitap commented Sep 24, 2024

This pull request may have needed a CODEOWNERS change:

data.table/CODEOWNERS

Lines 42 to 50 in 745160d

# translations
/inst/po/ @michaelchirico
/po/ @michaelchirico
/R/translation.R @michaelchirico
/src/po.h @michaelchirico
/po/*.pot @Rdatatable/translators
/po/*zh_CN.po @Rdatatable/chinese
/po/*pt_BR.po @Rdatatable/brazil
/po/*es.po @Rdatatable/spanish
