Internationalisation (i18n) and Localisation (l10n)
Source:vignettes/articles/internationalisation.Rmd
internationalisation.Rmd
NOTE: This is currently a WIP and several things ahve changed since it’s initial writing
Internationalization of the lessons have always been a priority for The Carpentries given the fact that we are a global organization. There are three levels at which translations can be added:
- The structural elements for the page (e.g. dropdown menu headings and navigation tooltips)
- References and definitions
- Page content
Structural Elements (l10n)
The first issue is common for all websites and has several available solutions that exits in several languages.
At the moment, The Carpentries
website supports l10n in a low-rent manner by including a yaml
dictionary in the _data
directory and switching languages
via the site.data.language
variable (as
shown in this example that translates “This content is open
source”). Though, how exactly this is accessible via the main site
is not clear.
To make sure that the translations are compatible no matter what
tooling we use (R, Python, JavaScript, PhP), we should store the
translations in the *.po
(portable object) so that each
language can use its own gettext()
utility to swap out the
translations.
Because it will be associated with the lesson template itself, the
*po
files will live in the {varnish} package and be
used from R to translate messages when the website is being
generated.
References and Definitions
References for definitions is achieved via the {glosario} project where the glossary is formatted as a yaml file and there are python, and R libraries that can be used to extract specific translations for these glossaries.
Page Content (i18n)
This is a topic that is currently not well addressed and is quite
hard to do because translating prose is much harder than translating
individual messages because the context of an individual paragraph in a
section is important. David Pérez-Suárez has proposed to use a
{gettext}
solution because this is a standard for
translating messages in several computer programs. He found a python +
BASH project called po4gitbook
that will convert markdown content to po files for translation and back
again. However, he’s finding that it breaks down a lot with parsing
markdown elements like lists and R chunks. I’m thinking that a solution
is to use parse the markdown with the commonmark XML spec and then use
that to extract the paragraph elements, recast them into markdown and
use those for basis of the translated messages. This way, parsing won’t
be an issue. The big challenge is that the library has to be re-written
for that to happen.