Leverage Fluent and Unicode CLDR for localization #675
This won't start passing CI until I do another release of

en.ftl:

    hello = Hello { $name }!

tr.ftl:

    hello = Merhaba { $name }!

fluent.sil:

    \language[main=en]
    \fluent[name=World]{hello}
    \language[main=tr]
    \fluent[name=Dünya]{hello}

Noticeably missing at this stage are locale fallbacks for missing messages, but that's coming.
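For readers wondering what a locale fallback might look like, here is a minimal sketch in plain Lua. It is not the implementation planned for this PR: the `messages` table layout, the `lookup` helper, and the `bye` message are all made up for illustration, and the fallback order is simply whatever list the caller passes in.

```lua
-- Hypothetical message store: locale -> message id -> pattern.
-- The "bye" message is invented so that one locale has a gap.
local messages = {
  en = { hello = "Hello { $name }!", bye = "Goodbye { $name }!" },
  tr = { hello = "Merhaba { $name }!" },
}

-- Try each locale in order; return the first pattern found and the
-- locale that supplied it, or nil when no locale has the message.
local function lookup(id, locales)
  for _, locale in ipairs(locales) do
    local bundle = messages[locale]
    if bundle and bundle[id] then
      return bundle[id], locale
    end
  end
  return nil
end
```

With the list `{ "tr", "en" }`, `lookup("hello", ...)` is answered by the Turkish table, while `lookup("bye", ...)` falls through to English. A message missing everywhere returns `nil`, which leaves open the question of what the typesetter should then insert.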
@Omikhleia / @OlivierNicole / @simoncozens Have any of you by chance taken a peek at this? I know it's not passing tests yet; the issue there is what to do when inserting a translation that doesn't exist. The addition of an empty node in the current scheme is changing some tests, but I can fix that. What I'm really interested in is whether the overall usage / API looks ergonomic as a SILE user, or even makes a bean of sense to somebody who hasn't been mucking around waist-deep in localization files for the better part of a couple of years. I have some more SILE stuff coming up that could really benefit from having this merged, and I'm interested in finalizing the direction soon.
I haven't looked in detail yet, I have to admit. For the interface of SILE itself (error messages, etc.), it could make some sense, though it's certainly a lot of work to do properly (and not necessarily easy to maintain when updating/fixing messages). For strings output to the rendered document, I have to admit that while I admire the effort, I am not very convinced of the general usefulness. I tend to say I do not believe much in localized strings in these cases... The only cases currently with the default class and packages (if I am not mistaken) mostly concern the TOC header and the chaptering.
In other terms, I somewhat feel I will always have to redefine these things in a specific book context, defeating the purpose of a general translation file... And I never liked having to dabble in Babel's files in LaTeX to know/guess what to override. (As an aside regarding the more general i18n logic in pattern-based generic tools such as Fluent, I am not sure it addresses many language-specific features in an easy way. Take for instance the elision and contraction rules for articles that many Latin-derived languages have, and which may depend on grammatical gender or etymology... The rules to get "The {season}, the {tool}" (where season and tool would be the replaceable strings) are quite impractical -- e.g. in French, winter = l'hiver (elided), axe = la hache (not elided)... Something one cannot get right without human intervention, or a good bit of NLP if one wants to automate it. Of course, we are hardly in such a case in our context - It's just to say my opinion on such tools is somewhat mixed.)
@Omikhleia Well good morning wet blanket! (Don't worry I'm not offended, I asked for feedback.) I get some of your points, but I guess I have different needs. I suppose I should rephrase my question to ask whether you think this is actively worse than the current ad hoc localization, which works by redefining SILE commands that output content. It brings 2 more Lua dependencies into the mix, which is one downside, but I don't think it's a blocker. On the other hand I do think it's easier to use than what we have now. Let's say you want to override the default TOC title from Lua:

    -- old way
    SILE.registerCommand("tableofcontents:title", function (_, _)
      SILE.process({ "Konu Haritası" })
    end)
    SILE.call("tableofcontents")

    -- new way
    SILE.fluent:add_messages("toc-title = Konu Haritası")
    SILE.call("tableofcontents")

Of course Lua is verbose, but it isn't much different from SILE syntax:
Yup that's it.
True story. Having this tooling would certainly enable one to localize the SILE interface itself, but that wasn't my goal. I wouldn't object if somebody wants to contribute localizations, but the upkeep on those would be up to the user base. I would note that we've had some extraordinary efforts in this direction already with a full Japanese translation of the manual, but again this wasn't my main objective.
I totally see your point here and would be happy to make SILE easy to use as a platform in this mode, but I don't think we're ever going to get away from some people wanting ready-to-use templates including things like TOC headers. Again the question is: is this way actively worse than what we have? And is providing tooling that other packages could use for their own use cases a bad thing, compared with making them implement their own content functions or Lua routines?
Sure. So is providing an easy way to use templates that include strings (as opposed to always rolling your own) while providing your own strings a bad thing? My books don't just have a TOC, they also have a TOV (Table of Verses) and a TOR (Table of References) and a TOM (Table of Maps) and a BALROG and so forth and so on. Being able to define all the strings in one place rather than having to have them inline with the content (which is all auto generated and can change order, etc.) is a lot easier for me.
I don't think having a "general purpose translation file" is really the primary goal here. Sure it works for a few strings we may want to support across a bunch of languages out of the box, but the real goal is making the tooling available for more specialized use. Let's talk about a real use case: one of the projects I'm working on with SILE is typesetting some workbook material in a whole host of languages. There are a lot of boilerplate strings, often longer than a couple of words. Managing the content and the typesetting template separately is the only way that makes sense; having the template strings appear over and over in the content would be hugely repetitive and make managing the actual content harder. Of course I could write a library of commands with localized outputs for each language, but this tooling allows me to write 1 template, then write one localization file for each language, then typeset the content in any language. Also just in case this was missed, this currently has to be in SILE core, not a package, because enabling raw pass-through commands in the parser (for parsing FTL content in SILE files) is not yet supported from a package.
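The "1 template, one localization file per language" workflow can be sketched in plain Lua as below. This is a toy stand-in and not Fluent or the API added in this PR: the `l10n` table, `format_message`, and `greeting_page` are hypothetical names, and only simple `{ $var }` placeables are handled. The two `hello` messages are the ones from the `en.ftl` / `tr.ftl` example earlier in the thread.

```lua
-- Hypothetical stand-in for one localization file per language.
local l10n = {
  en = { hello = "Hello { $name }!" },
  tr = { hello = "Merhaba { $name }!" },
}

-- Replace each "{ $var }" placeable with the matching argument.
local function format_message(locale, id, args)
  local pattern = l10n[locale][id]
  -- Parentheses drop gsub's second return value (the match count).
  return (pattern:gsub("{%s*%$([%w_]+)%s*}", function (var)
    return tostring(args[var])
  end))
end

-- The template is written once and holds no natural-language text;
-- it refers to messages by id only.
local function greeting_page(locale, name)
  return format_message(locale, "hello", { name = name })
end
```

Adding a language then means adding one more entry to `l10n`; `greeting_page` itself does not change.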
I'm fully aware that localization systems are hard and some languages are much harder to do well than others. That being said I have used a lot of systems and Fluent is far and away the most versatile, with enough features to make complex replacements possible in natural contexts. Replacements are not just variable expansions but can include metadata about gender or other grammatical forms so that localizers can cover all the possible scenarios. There are a couple of examples on the Project Fluent site, but those are just the tip of the iceberg. Handily the code end doesn't need to understand a given complexity; it is really up to the translator to provide the localization. This helps sidestep issues like programmers assuming 1 is singular and 2+ is plural where a language might actually have a few/many break at 5 instead. The localization defines this, not the code, so you don't get stuck as often as a translator in some situation where you can't make a natural translation because some coder didn't think about your language's special issues. Again, do you think it makes SILE worse for providing an advanced (but in my view easy to use) way to localize strings? I'm not saying we have to bake a lot of strings in, and we can make the TOC title output optional (for example), but if we do output it do you have an objection to the commands shown above for doing it?
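The point that the localization, not the calling code, decides plural forms can be illustrated with a small Lua sketch. Everything here is an assumption for illustration: `plural_rules`, `variants`, and `count_maps` are hypothetical names, the English rule is simplified to integers, and `xx` is an invented locale (with invented words) whose few/many break sits at 5 as in the example above. Fluent and CLDR express this through plural categories and selectors, not through this table layout.

```lua
-- Each locale supplies its own rule mapping a count to a category.
local plural_rules = {
  en = function (n) return n == 1 and "one" or "other" end,
  -- Invented locale: singular at 1, "few" below 5, "many" from 5 up.
  xx = function (n)
    if n == 1 then return "one" end
    if n < 5 then return "few" end
    return "many"
  end,
}

-- Each locale also supplies one variant per category it uses.
local variants = {
  en = { one = "%d map", other = "%d maps" },
  xx = { one = "%d mapo", few = "%d mapi", many = "%d mapu" },
}

-- The caller passes only the count; it never tests for "plural".
local function count_maps(locale, n)
  local category = plural_rules[locale](n)
  return string.format(variants[locale][category], n)
end
```

The calling code is identical for both locales. Supporting a language with more categories means adding data and a rule, and leaves `count_maps` untouched.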
I feel apologetic, I had no intention to be offensive at all :( - I just tried to express my concerns that there might not be a general solution to the problem. Regarding the syntax,
All do seem more or less equivalent to me. (Well, I could advocate that styling also covers the font selection, which is done currently via another command, and all spacing, skips, alignments etc. with a loose inheritance system). This said, I don't have strong objections either. But you make good points too.
We might actually need more flexible ways of extending the parser, so that packages can declare some hooks - true verbatim, inline scripts with syntax highlighting, inline SVG or any other text-based graphics (e.g. PlantUML and charts), etc. come to mind.
No no you didn't offend at all, that was good feedback and I appreciate it. Sorry if my pre-coffee joke came off wrong.
This is actually a coupling I'm trying to get away from. Not that it isn't good for some scenarios, but the mix of content and presentation style is bad news for my projects. I have different people typesetting vs. translating, and having Lua commands that hold both presentation styles like fonts and spacing and the related content strings is just a disaster. I can code my way out of it with Lua, but adding Fluent libraries to SILE core takes care of that abstraction for me.
True, but as of yet I haven't figured out how to do it with a one-pass LPEG parser. By the time we get to the package load commands in the document the parsing grammar is already defined and it is too late for a callback to change it. I haven't wanted to go down the multi-pass rabbit hole yet.
Yes, but at this point the only way we have to do that is with a whitelist of tags that are processed as raw content instead of SILE syntax. This is already how we deal with PlantUML (I can't find the issue, maybe this was in my head) and SVG (#1091) have come up before, as have MathJax (#220), Lilypond (#435), ImageMagick (#436), and TikZ (#437). The only alternatives I found to that problem are to require package loading on the command line before loading the document or to do multi-pass parsing, both of which have huge downsides. See #1092.
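The whitelist constraint can be sketched as a single scan over the document in plain Lua. This is not SILE's parser (which is an LPEG grammar): it uses plain string matching, only handles top-level `\begin{tag}...\end{tag}` environments, and the tag names in `passthrough` are placeholders. The `scan` function and the node layout are hypothetical.

```lua
-- Placeholder whitelist; it must be complete before scanning starts.
local passthrough = { ftl = true, script = true }

-- Scan `doc` once and return a list of { tag, raw, body } nodes.
-- Bodies of whitelisted tags are kept verbatim; other bodies would
-- be handed on to the normal SILE-syntax parser (not shown here).
local function scan(doc)
  local nodes, pos = {}, 1
  while true do
    local s, e, tag = doc:find("\\begin{(%w+)}", pos)
    if not s then break end
    -- Plain find (4th argument true): the closing tag is a literal.
    local cs, ce = doc:find("\\end{" .. tag .. "}", e + 1, true)
    if not cs then error("unclosed environment: " .. tag) end
    nodes[#nodes + 1] = {
      tag = tag,
      raw = passthrough[tag] == true,
      body = doc:sub(e + 1, cs - 1),
    }
    pos = ce + 1
  end
  return nodes
end
```

Whether a body is raw is decided at the moment its opening tag is seen, from a table that already exists. A package loaded from inside the document would register its tag too late to influence a one-pass scan, which is the limitation described above.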
I completely agree the whitelist of passthrough commands is less than ideal, but I'm also willing to let it stand until such a time as we agree on a better replacement. As such I think I'm going to move ahead with this even knowing we might refactor it later if we change the parser mechanism.
Gah, the manual is a language disaster. With localization updates in place it becomes clear we're leaking language settings all over the place. Our code samples set all following content to
…t into main settings
In case I forget by the next time I pick this up, the current test failures are because of load paths. We have a lot of things stuffed in the Review loadkit for how to add custom loaders for an extension using the same path handling...
Early work in relation to #665.