Leverage Fluent and Unicode CLDR for localization #675

alerque · 2019-10-12T16:50:19Z

Early work in relation to #665.

alerque · 2019-10-12T19:42:48Z

This won't start passing CI until I do another release of fluent-lua, because I wrote this against my git HEAD version and tweaked a couple of things along the way to make the downstream implementation easier. But the gist of this works already.

en.ftl:

hello = Hello { $name }!

tr.ftl:

hello = Merhaba { $name }!

fluent.sil:

\language[main=en]
\fluent[name=World]{hello}
\language[main=tr]
\fluent[name=Dünya]{hello}

fluent.pdf

Noticeably missing at this stage are locale fallbacks for missing messages, but that's coming.

alerque · 2022-03-11T00:01:52Z

@Omikhleia / @OlivierNicole / @simoncozens

Have any of you by chance taken a peek at this? I know it's not passing tests yet; the issue there is what to do when inserting a translation that doesn't exist. The addition of an empty node in the current scheme is changing some tests, but I can fix that.

What I'm really interested in is whether the overall usage / API looks ergonomic as a SILE user or even makes a bean of sense to somebody who hasn't been mucking around waist-deep in localization files for the better part of a couple years.

I have some more SILE stuff coming up that could really benefit from having this merged and I'm interested in finalizing the direction soon.

Omikhleia · 2022-03-11T01:36:25Z

I haven't looked in detail yet, I have to admit.

For the interface of SILE itself (error messages, etc.), it could make some sense, though it's certainly a lot of work to do properly (and not necessarily easy to maintain when updating/fixing messages).

For strings output to the rendered document, I have to admit that while I admire the effort, I am not really convinced of the general usefulness. I tend to say I do not believe much in localized strings in these cases... The only cases currently with the default class and packages (if I am not mistaken) mostly concern the TOC header and the chaptering.

Regarding the former, if I may quote what I wrote in my own revisited TOC package:

such a package should only do one thing well: typesetting the table of contents, period. Any title (if one is even desired) should be left to the sole decision of the user, e.g. explicitly defined with a \chapter command or any other appropriate sectioning command, and with whatever additional content one may want in between. Even if LaTeX has a default title for the table of contents, there is no strong reason to do the same. It cannot be general: One could want “Table of Contents”, “Contents”, “Summary”, “Topics”, etc. depending on the type of book. It feels wrong and cumbersome to always get a default title and have to override it, while it is so simple to just add a consistently-styled section above the table…

As for chapters, parts, tables and figures (which I have in my book class)... My line of thinking is:

I have no idea what a “part” would be in your book. “Part I”, are you sure? What if you typeset, say, The Lord of the Rings, wouldn’t that rather be “Book I”?

And a chapter could as well be a "Canto" if typesetting Dante, etc. Likewise, say, for "Figure". Could as well be "Illustration" in some book context, etc.

In other terms, I somewhat feel I will always have to redefine these things in a specific book context, defeating the purpose of a general translation file... And I never liked having to dabble into Babel's files in LaTeX to know/guess what to override.

(As an aside regarding the more general i18n logic in pattern-based generic tools such as Fluent, I am not sure it addresses in an easy way many language-specific features. Take for instance elision and contraction rules for articles that many Latin-derived languages have, and which may depend on grammatical gender or etymology... The rules to get "The {season}, the {tool}" (where season and tool would be the replaceable strings) are quite impractical. -- e.g. in French, winter = l'hiver (elided), axe = la hache (not elided)... Something one cannot get right without human intervention, or a good bit of NLP if one wants to automate it. Of course, we are hardly in such a case in our context - It's just to say my opinion on such tools is somewhat mixed.)

alerque · 2022-03-11T07:08:09Z

@Omikhleia Well good morning wet blanket! (Don't worry I'm not offended, I asked for feedback.)

I get some of your points, but I guess I have different needs. I suppose I should rephrase my question to ask whether you think this is actively worse than the current ad hoc localization, which works by redefining SILE commands that output content. It brings 2 more Lua dependencies into the mix; that's one downside, but I don't think it's a blocker.

On the other hand I do think it's easier to use than what we have now. Let's say you want to override the default TOC title from Lua:

-- old way
SILE.registerCommand("tableofcontents:title", function (_, _)
  SILE.process({ "Konu Haritası" })
end)
SILE.call("tableofcontents")

-- new way
SILE.fluent:add_messages("toc-title = Konu Haritası")
SILE.call("tableofcontents")

Of course Lua is verbose, but it isn't much different from SILE syntax:

% old way
\define[command=tableofcontents:title]{Konu Haritası}
\tableofcontents

% new way
\ftl{toc-title = Konu Haritası}
\tableofcontents

Yup that's it.

For the interface of SILE itself (error messages, etc.), it could make some sense, though it's certainly a lot of work to do properly (and not necessarily easy to maintain when updating/fixing messages).

True story. Having this tooling would certainly enable one to localize the SILE interface itself, but that wasn't my goal. I wouldn't object if somebody wants to contribute localizations, but the upkeep on those would be up to the user base. I would note that we've had some extraordinary efforts in this direction already with a full Japanese translation of the manual, but again this wasn't my main objective.

For strings output to the rendered document, I have to admit that while I admire the effort, I am not really convinced of the general usefulness. I tend to say I do not believe much in localized strings in these cases.

I totally see your point here and would be happy to make SILE easy to use as a platform in this mode, but I don't think we're ever going to get away from some people wanting ready-to-use templates including things like TOC headers. Again, the question is whether this way is actively worse than what we have. And is providing tooling that other packages could use for their own use cases a bad thing, compared with making them implement their own content functions or Lua routines?

And a chapter could as well be a "Canto" if typesetting Dante, etc. Likewise, say, for "Figure". Could as well be "Illustration" in some book context, etc.

Sure. So is providing an easy way to use a template that includes strings (as opposed to always rolling your own) while providing your own strings a bad thing? My books don't just have a TOC, they also have a TOV (Table of Verses) and a TOR (Table of References) and a TOM (Table of Maps) and a BALROG and so forth and so on. Being able to define all the strings in one place rather than having to have them inline with the content (which is all auto-generated and can change order, etc.) is a lot easier for me.

\begin{ftl}
toc-title = Outline of Topics
tom-title = Maps of the Lands
toi-title = Illustrations and Illuminations
\end{ftl}

In other terms, I somewhat feel I will always have to redefine these things in a specific book context, defeating the purpose of a general translation file..

I don't think having a "general purpose translation file" is really the primary goal here. Sure it works for a few strings we may want to support across a bunch of languages out of the box, but the real goal is making the tooling available for more specialized use.

Let's talk about a real use case: one of the projects I'm working on with SILE is typesetting some workbook material in a whole host of languages. There are a lot of boilerplate strings, often longer than a couple of words. Managing the content and the typesetting template separately is the only way that makes sense; having the template strings appear over and over in the content would be hugely repetitive and make managing the actual content harder. Of course I could write a library of commands with localized outputs for each language, but this tooling allows me to write 1 template, then write one localization file for each language, then typeset the content in any language.

Also, just in case this was missed, this currently has to be in SILE core, not a package, because enabling raw pass-through commands in the parser (for parsing FTL content in SILE files) is not yet supported from a package.

Take for instance elision and contraction rules for articles that many Latin-derived languages have, and which may depend on grammatical gender or etymology... The rules to get "The {season}, the {tool}" (where season and tool would be the replaceable strings) are quite impractical.

I'm fully aware that localization systems are hard and some languages are much harder to do well than others. That being said, I have used a lot of systems and Fluent is far and away the most versatile, with enough features to make complex replacements possible in natural contexts. Replacements are not just variable expansions but can include metadata about gender or other grammatical forms so that localizers can cover all the possible scenarios. There are a couple of examples on the Project Fluent site, but those are just the tip of the iceberg. Handily, the code end doesn't need to understand a given complexity; it is really up to the translator to provide the localization. This helps sidestep issues like programmers assuming 1 is singular and 2+ is plural where a language might actually have a few/many break at 5 instead. The localization defines this, not the code, so as a translator you don't get stuck as often in some situation where you can't make a natural translation because some coder didn't think about your language's special issues.
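To make the "localization defines this, not the code" point concrete, here is a minimal sketch in plain Lua. It is not the fluent-lua API: the `catalogs` table, the `maps-count` message, and the `format` helper are all hypothetical names invented for illustration, and the plural rules are hand-written stand-ins for what CLDR data would supply. The calling code only passes a number; each locale decides which categories exist.

```lua
-- Hypothetical, hand-rolled illustration; this is NOT how fluent-lua is implemented.
local catalogs = {
  en = {
    -- English distinguishes "one" from everything else.
    plural = function (n) return n == 1 and "one" or "other" end,
    ["maps-count"] = {
      one = "%d map",
      other = "%d maps",
    },
  },
  tr = {
    -- Turkish nouns stay singular after a numeral, so one variant is enough here.
    plural = function (_) return "other" end,
    ["maps-count"] = {
      other = "%d harita",
    },
  },
}

-- The caller never inspects n; the locale's own rule picks the variant.
local function format (locale, id, n)
  local catalog = catalogs[locale]
  local variants = catalog[id]
  local category = catalog.plural(n)
  local pattern = variants[category] or variants.other
  return string.format(pattern, n)
end
```

Calling `format("en", "maps-count", 3)` and `format("tr", "maps-count", 3)` goes through exactly the same code path; only the catalog differs.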

Again, do you think it makes SILE worse for providing an advanced (but in my view easy to use) way to localize strings? I'm not saying we have to bake a lot of strings in, and we can make the TOC title output optional (for example), but if we do output it, do you have an objection to the commands shown above for doing it?

Omikhleia · 2022-03-11T08:36:34Z

Well good morning wet blanket! (Don't worry I'm not offended, I asked for feedback.)

I feel apologetic, I had no intention of being offensive at all :( - I just tried to express my concern that there might not be a general solution to the problem.

Regarding the syntax,

% 1. "Old" command redefinition hook
\define[command=book:chapter:pre:ja]{第}
\define[command=book:chapter:post:ja]{章}
% 2. Fluent way (if I got it correctly!)
\ftl{book-chapter-title-pre = 第}
\ftl{book-chapter-title-post = 章}
% 3. My "styling" approach, kind of.
% (eventually - the current progress is not at this point yet and low-level syntax is currently a tad more verbose than that...)
\style:extend[name=sectioning:chapter:number]{\numbering[before=第, after=章]}

All do seem more or less equivalent to me. (Well, I could advocate that styling also covers the font selection, which is done currently via another command, and all spacing, skips, alignments etc. with a loose inheritance system).

This said, I don't have strong objections either.

But you make good points too.

this currently has to be in SILE core, not a package, because enabling raw pass-through commands in the parser (for parsing FTL content in SILE files) is not yet supported from a package.

We might actually need more flexible ways for extending the parser, to be able to have packages declare some hooks - true verbatim, inline scripts with syntax highlighting, inline SVG or any other text-based graphics (e.g. PlantUML and charts), etc. come to mind.

alerque · 2022-03-11T08:55:32Z

Well good morning wet blanket! (Don't worry I'm not offended, I asked for feedback.)

I feel apologetic, I had no intention of being offensive at all :( - I just tried to express my concern that there might not be a general solution to the problem.

No no you didn't offend at all, that was good feedback and I appreciate it. Sorry if my pre-coffee joke came off wrong.

(Well, I could advocate that styling also covers the font selection, which is done currently via another command, and all spacing, skips, alignments etc. with a loose inheritance system).

This is actually a coupling I'm trying to get away from. Not that it isn't good for some scenarios, but the mix of content and presentation style is bad news for my projects. I have different people typesetting vs. translating, and having Lua commands that carry both presentation styles (like fonts and spacing) and the related content strings is just a disaster. I can code my way out of it with Lua, but adding Fluent libraries to SILE core takes care of that abstraction for me.

We might actually need more flexible ways for extending the parser

True, but as of yet I haven't figured out how to do it with a one-pass LPEG parser. By the time we get to the package load commands in the document the parsing grammar is already defined and it is too late for a callback to change it. I haven't wanted to go down the multi-pass rabbit hole yet.

to be able to have packages declare some hooks - true verbatim, inline scripts with syntax highlighting, inline SVG or any other text-based graphics (e.g. PlantUML and charts), etc. come to mind.

Yes, but at this point the only way we have to do that is with a whitelist of tags that are processed as raw content instead of SILE syntax. This is already how we deal with \script and \math, both of which allow full embedded languages.
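As a rough illustration of that whitelist idea (a simplified sketch, not the actual code in core/inputs-texlike.lua), the snippet below uses plain Lua string patterns instead of LPEG. The `passthrough` table and the `classify` function are hypothetical names; the point is only that a tag's body is handed over untouched when the tag is on the list, and treated as SILE markup otherwise.

```lua
-- Hypothetical whitelist of environments whose body is not SILE syntax.
local passthrough = { script = true, math = true, ftl = true }

-- Split a single \begin{tag}...\end{tag} environment into its tag and body,
-- and report whether the body should be kept raw or parsed as SILE markup.
local function classify (source)
  -- %1 is a back-reference: the \end tag must match the \begin tag.
  local tag, body = source:match("^\\begin{(%w+)}\n?(.-)\n?\\end{%1}")
  if not tag then
    return nil
  end
  local mode = passthrough[tag] and "raw" or "parsed"
  return tag, mode, body
end
```

With this sketch, an `ftl` environment comes back as raw text ready for a Fluent parser, while a `center` environment is flagged for normal processing.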

PlantUML (I can't find the issue, maybe this was in my head) and SVG (#1091) have come up before, as have MathJax (#220), Lilypond (#435), ImageMagick (#436), and TikZ (#437).

The only alternatives I have found to that problem are to require packages to be loaded on the command line before the document is loaded, or to do multi-pass parsing, both of which have huge downsides. See #1092.

core/inputs-texlike.lua

alerque · 2022-03-12T06:42:12Z

I completely agree the whitelist of passthrough commands is less than ideal, but I'm also willing to let it stand until such a time as we agree on a better replacement. As such I think I'm going to move ahead with this even knowing we might refactor it later if we change the parser mechanism.

alerque · 2022-04-16T14:14:41Z

Gah, the manual is a language disaster. With localization updates in place it becomes clear we're leaking language settings all over the place. Our code samples set all following content to und, chapter 6 leaves the typesetter in Turkish, chapter 8 leaves it in Japanese, etc.

alerque · 2022-04-16T23:19:44Z

In case I forget by the next time I pick this up, the current test failures are because of load paths. We have a lot of things stuffed in the package.path for loading Lua, but we need to add a loader for FTL files so the path handling is similarly robust. The current code works when installed, or works uninstalled if configured with the install location as the SILE source directory, but not otherwise. Even our CI setup falls into the "otherwise" case.

Review loadkit for how to add custom loaders for an extension using the same path handling...
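The path-handling idea can be sketched without loadkit. The following is a minimal stand-in, assuming FTL resources are searched along a `package.path`-style template string in which `?` marks where the resource name goes; `candidates`, `find_ftl`, and the example paths are hypothetical and are not SILE's or loadkit's actual API.

```lua
-- Expand each ";"-separated template by substituting the resource name for "?".
local function candidates (name, path)
  local files = {}
  for template in path:gmatch("[^;]+") do
    -- A function replacement keeps any "%" in the name from being read as an escape.
    files[#files + 1] = (template:gsub("%?", function () return name end))
  end
  return files
end

-- Return the first candidate that can be opened for reading,
-- or nil plus the list of paths that were tried.
local function find_ftl (name, path)
  local tried = candidates(name, path)
  for _, filename in ipairs(tried) do
    local handle = io.open(filename, "r")
    if handle then
      handle:close()
      return filename
    end
  end
  return nil, tried
end
```

For example, `candidates("i18n/tr", "./?.ftl;/usr/share/sile/?.ftl")` lists the source-tree location first and the installed location second, so an uninstalled checkout would be found before a system copy.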

alerque self-assigned this Oct 12, 2019

alerque added the enhancement Software improvement or feature request label Oct 12, 2019

alerque added this to the v0.11.0 milestone Oct 12, 2019

alerque force-pushed the fluent branch 2 times, most recently from 8f26b12 to cb02eaf Compare October 12, 2019 19:32

alerque mentioned this pull request Oct 17, 2019

Extricate char-def library to ... somewhere #686

Open

alerque force-pushed the fluent branch from 2699a95 to f21d84f Compare October 30, 2019 07:25

alerque mentioned this pull request Apr 17, 2020

fix(languages): Localize TOC title functions #849

Merged

alerque force-pushed the master branch from 4c31fcf to 8172455 Compare July 18, 2020 15:57

alerque modified the milestones: v0.11.0, v0.12.0 Sep 1, 2021

alerque mentioned this pull request Sep 7, 2021

Default language string functions not loaded #1157

Closed

alerque added 2 commits September 9, 2021 02:02

chore(deps): Add dependencies on lua-fluent and lua-cldr

5175c6f

chore(core): Add access to external tools from global SILE object

54ca405

alerque force-pushed the fluent branch from f21d84f to 516d56d Compare September 8, 2021 23:08

alerque force-pushed the fluent branch from 516d56d to 08609ab Compare March 10, 2022 23:52

alerque marked this pull request as ready for review March 10, 2022 23:56

alerque requested review from a team and simoncozens as code owners March 10, 2022 23:56

alerque requested a review from OlivierNicole March 10, 2022 23:57

Omikhleia reviewed Mar 11, 2022

core/inputs-texlike.lua

Omikhleia mentioned this pull request Mar 11, 2022

Provide packages with a way to handle raw content #1092

Closed

chore(classes): Restore all stub functions for styling headings

958da3c

alerque force-pushed the fluent branch from 7fe8d50 to 6de1b43 Compare April 16, 2022 14:23

This was referenced Apr 16, 2022

Bable strikes back — chaos in language settings #1367

Open

You put what in my canon(ical language)? #1368

Open

alerque added 16 commits April 17, 2022 00:38

chore(languages): Pass current locale to fluent functions

c879316

chore(languages): Add debug output and more one step at a time

58a925a

chore(deps): Bump cldr-lua dependency

0c9bffd

ci(cirrus): Install new dependencies in BSD test containers

a1fc024

refactor(language): Debug after variable set not before

ea49337

test(languages): Update expected test output

e56be31

docs(manual): Typeset Arabic glyphs in source code samples

3fb7876

docs(manual): Copy-edit introduction to language handling

d5cecdf

refactor(settings): Move document language setting out of font support into main settings

5b55f07

refactor(utilities): Move common content paradigm to utilities

20ae0d9

chore(deps): Bump fluent-lua dependency

64b1033

chore(languages): Make use of more robust upstream fluent resource loader

3a1f499

chore(build): Add new Lua requirements check to autoconf

a5f39c2

chore(languages): Add mostly empty localization for 'und' to suppress warnings

48e619c

chore(tooling): Remove obselete linguist stuff from git attributes

34d5ad1

docs(manual): Document \ftl and \fluent command usage

68e8de6

alerque force-pushed the fluent branch from 6de1b43 to 68e8de6 Compare April 16, 2022 22:24

chore(packages): Move strings output by bibliography to Fluent localization

deada94

alerque added 2 commits April 17, 2022 11:05

chore(deps): Add dependency on LuaRock loadkit

b54f6b7

chore(languages): Use loadkit to find and load FTL resources

d01fb9e

This was referenced Apr 17, 2022

Make src= paths relative to file, not CWD #1042

Open

0.10.x breaks -I input for *.sil classes #798

Closed

alerque merged commit bca0f36 into sile-typesetter:master Apr 18, 2022

alerque deleted the fluent branch April 18, 2022 11:17
