On the state of i18n in Perl

The following text describes the situation I encountered when I came to the Perl world last December. I’ve done some translating for the Debian project, and I was a bit shocked by the state of Perl’s i18n. I have to admit I’m still an inexperienced hacker, but I wanted to write this article to raise awareness of the issues described if I’m right, and to learn something new if I’m wrong. I tried to keep the article constructive, and it’s still just my opinion, so please comment accordingly.

Disclaimer: I’m essentially talking about l10n, but most people know it as i18n, so I’m keeping “i18n” in the text.

The i18n problem

When it comes to making your application translatable in Perl, there are two schools of thought: Maketext and GNU gettext. GNU gettext is the best-known software translation tool and is used in most open-source projects, while Maketext is a child of the Perl world. And the bad thing is: Maketext is currently more popular, but if you are using Maketext to make your application translatable, you are doing it wrong!

Let’s look at how Maketext works according to its documentation, and contrast that with the gettext way.

The Maketext manual defines the process as follows (quoting freely):

  • Decide what system you’ll use for lexicon keys (i.e. base language)
  • Create a class for your localization project
  • Create a class for the language your internal keys are in
  • Go and write your program
  • Once the program is otherwise done, and once its localization for the first language works right (via the data and methods in Projname::L10N::en_us), you can get together the data for translation.
  • Submit all messages/phrases/etc. to translators
    • Translators may request clarification of the situation in which a particular phrase is found
    • Each translator should make clear what dependencies the number causes in the sentence
    • Remind the translators to consider the case where N is 0
    • Remember to ask your translators about numeral formatting in their language
    • The basic quant method that Locale::Maketext provides should be good for many languages. […] For the particularly problematic Slavic languages, what you may need is a method which you provide with the number, the citation form of the noun to quantify, and the case and gender that the sentence’s syntax projects onto that noun slot.
  • Once you’ve localized your program/site/etc. for all desired languages, be sure to show the result (whether live, or via screenshots) to the translators.
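
To make the class setup from the steps above concrete, here is a minimal sketch using the core Locale::Maketext module. The class names MyApp::L10N, MyApp::L10N::en and MyApp::L10N::de are made up for illustration, and the German lexicon entry is my own; treat it as a sketch of the documented process, not as code from any real project.

```perl
use strict;
use warnings;

# Project-wide localization class (hypothetical name).
package MyApp::L10N;
use parent 'Locale::Maketext';

# Class for the language the internal keys are written in.
package MyApp::L10N::en;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = (
    _AUTO => 1,    # unknown keys are used as their own English text
);

# A German lexicon: whoever writes it has to know the bracket
# notation and supply the forms that quant() expects.
package MyApp::L10N::de;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = (
    'You have [quant,_1,piece] of mail.'
        => 'Sie haben [quant,_1,Poststueck,Poststuecke].',
);

package main;

for my $lang (qw(en de)) {
    my $lh = MyApp::L10N->get_handle($lang)
        or die "no language handle for $lang";
    print $lh->maketext('You have [quant,_1,piece] of mail.', $_), "\n"
        for 1, 3;
}
```

Note that the default quant only distinguishes "exactly 1" from "everything else" (plus an optional zero form), which is why anything beyond that ends up as custom code in the language class.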

There is a lot of sense in this, and it was certainly valid back in 1999, but a lot of the work in this process is left unspecified. For example, the translation process itself is questionable:

  • How do you “Submit all messages/phrases/etc. to translators”?
  • How do you integrate translations back from translators?
  • How do you resubmit translation strings if they change?
  • How do you communicate “situation in which a particular phrase is found” (i.e. context)?
  • What happens if one phrase has to be translated differently depending on context? How does one implement that in a module properly?
  • How does the translator “make clear what dependencies the number causes”? To what extent does that happen? Will the developer even understand the explanation?
  • Does the programmer really have to understand all the implications of each language implemented? Should every programmer on the team understand them?
  • Who actually implements that “quant” method? How? What about languages with exceptions?

One basic but fatal mistake Maketext makes is off-loading a lot of linguistic work onto the programmer.

  • One particularly important point is plural forms support (‘1 apple’, ‘2 apples’), which matters for many languages outside the USA and Western Europe. Maketext requires you to write a quant function that gets a string and a number as parameters and does some voodoo to produce the right string. The voodoo is undefined. In gettext it is defined: a formula for producing plural forms selects one of the provided plural phrases.
  • No translator in their right mind will ever write a Perl module for a language (they aren’t programmers, remember?). The programmer will have to do it, and will also have to understand the implications.
  • The quant notation ("Your search matched [quant,_1,document]!") foolishly assumes word order is the same in all languages. Implementing a quant method properly would require passing the whole sentence into the function and doing a complete linguistic transformation, which is highly non-trivial and better done by a human.
  • Most of those linguistic “conventions”, like number formatting or plural forms, do not change over time and can be compiled in one place. One such place is Unicode’s CLDR project, which also includes plural form building and number/date formatting among other country- and language-dependent data.
  • It can’t even be assumed that the translators actually know all of these conventions! They might assume they know them, but a translator is not necessarily doing translations for a living; they might be a volunteer, as in most open-source projects. Imagine what happens when an amateur translator explains the inner workings of their native language to a programmer.
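
For comparison, here is roughly what a gettext-style plural formula amounts to, written out in Perl. The formula is the one the GNU gettext manual lists for Russian; the function names and the three sample strings are assumptions made for this sketch, and a real catalog would carry the formula in its Plural-Forms header instead of in code.

```perl
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)');

# Plural rule for Russian as listed in the GNU gettext manual:
#   nplurals=3; plural=n%10==1 && n%100!=11 ? 0
#     : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
# Returns the index of the plural form to use for $n.
sub plural_index_ru {
    my ($n) = @_;
    return ($n % 10 == 1 && $n % 100 != 11) ? 0
         : ($n % 10 >= 2 && $n % 10 <= 4
            && ($n % 100 < 10 || $n % 100 >= 20)) ? 1
         : 2;
}

# What the translator supplies as msgstr[0], msgstr[1], msgstr[2].
my @apple_forms = ('%d яблоко', '%d яблока', '%d яблок');

# Hypothetical helper: pick the form by index, then fill in the number.
sub apples_ru {
    my ($n) = @_;
    return sprintf $apple_forms[ plural_index_ru($n) ], $n;
}

print apples_ru($_), "\n" for 1, 2, 5, 11, 21, 22;
```

The point is the division of labour: the formula is written once per language, the translator only supplies the phrases, and the programmer never touches either.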

Compared to this, gettext has a saner, more practical approach: it provides a standardized translation string format, handles updates of message catalogs cleanly, provides all the necessary tools for message extraction, requires no additional modules, works in a mostly language-agnostic way, and supports contexts and translators’ comments. Even the formulae for calculating plural forms are explicitly listed in the manual. It also emphasizes asynchronous translation: translation strings can be extracted and imported at any time in the lifecycle of a project. A developer essentially has to do the following:

  • Set up gettext in the project (how depends on the programming language used)
  • Mark extractable strings
  • Run extraction and merging scripts (mostly shipped with gettext)
  • Submit translation files to translators
  • Copy received translations back into the project

gettext is of course not perfect. It lacks several important features, like proper gender support (e.g. “He was born” and “She was born” are different in Russian). But it generally follows the “It mostly works” principle, making the features needed 95% of the time available. Workflow tools make using gettext a snap. Compared to Maketext, it is also easier for the programmer to support and easier for the translator to produce translations with. The dreaded quant function actually makes using Maketext properly for translations impossible.

Apart from those technical shortcomings, there is a bigger threat.

Community separation

Remember TPJ13? TPJ13 is an excellent summary of i18n problems, which every developer, even a non-Perl one, should read. Its solution part, however, is hopelessly out of date: don’t forget, TPJ13 turns ten this year. Back in 1999 gettext had no plural forms support and lacked many other features, so the authors’ point was valid at the time. However, gettext implemented its support for plurals rather quickly, and at that point Maketext should have been retired. Sadly, this has not happened.

That misunderstanding haunts us to this day. Every novice Perl hacker is introduced to TPJ13 and tends to believe Maketext is the way to go. Failing to see its shortcomings, however, results in well-meant but still failed creations like Locale::Maketext::Lexicon, which tries hard to bring the world of gettext to Maketext-infected minds. What we get is crazy stuff like this (verbatim from the POD):

#: Hello.pm:11
msgid "You have %quant(%1,piece) of mail."
msgstr "Sie haben %quant(%1,Poststueck,Poststuecken)."

instead of a proper (German spelling corrected a bit):

#: Hello.pm:11
msgid "You have 1 piece of mail."
msgid_plural "You have %d pieces of mail."
msgstr[0] "Sie haben 1 Poststueck."
msgstr[1] "Sie haben %d Poststuecke."

The former has virtually no tool support (not even from xgettext, gettext’s own extraction tool); extraction is only supported by the home-grown xgettext.pl (notice the .pl suffix). And there we have some fatal stuff going on:

  • Locale::Maketext::Lexicon is considered the solution for using Maketext with .po files.
  • Neither Locale::Maketext::Lexicon nor xgettext.pl has any notion of proper gettext plurals.
  • .po files created by xgettext.pl are not fully supported by translation tools like PoEdit, KBabel, Launchpad Rosetta, 99translations.com etc.
  • Catalyst::Plugin::I18N, the only i18n plugin for the extremely popular Catalyst web framework, is based on Locale::Maketext::Lexicon
  • xgettext.pl has support for Template-Toolkit templates, YAML, FormFu and Mason. Original gettext’s xgettext does not.

So there we have it: Perl hackers mostly use tools which are unsuitable and incompatible with the rest of the world, without knowing it. The right tools actually can’t help them become “sane”, since xgettext can’t extract from all those formats which xgettext.pl can, and I don’t think that will change any time soon.

Alternatives

Luckily, some hackers have produced libintl-perl, a library which basically re-implements GNU gettext in Perl. It contains a pure Perl implementation of message catalogs called Locale::gettext_pp, an XS version called Locale::gettext_xs (warning: this one has some problems with mod_perl2!), a Perl wrapper around those (Locale::Messages) and, building upon that, Locale::TextDomain, an excellent Perl-y implementation of the framework. These tools are worth your time.
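
As a rough idea of what this looks like in practice, here is a minimal sketch using Locale::TextDomain. It assumes libintl-perl is installed (it is not a core module); the text domain name 'myapp' and the helper mail_notice are made up for this example, and no catalog for that domain needs to exist.

```perl
use strict;
use warnings;

# Locale::TextDomain ships with libintl-perl.
# 'myapp' is a hypothetical text domain name.
use Locale::TextDomain ('myapp');

# Hypothetical helper wrapping one translatable plural message.
sub mail_notice {
    my ($count) = @_;
    # __nx picks the singular or plural message according to $count
    # and then fills in the {placeholders}.
    return __nx(
        'You have one piece of mail.',
        'You have {count} pieces of mail.',
        $count,
        count => $count,
    );
}

print mail_notice($_), "\n" for 1, 5;
```

With no translation catalog installed for the domain, the calls should simply fall back to the English strings; once a translator provides a .po file with a Plural-Forms header, the same code picks the right form for that language without any change on the programmer’s side.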

Even though we have Locale::TextDomain, what should be done to amend the whole Maketext situation? I’d propose several possible actions:

  • Read the GNU gettext Manual to fully understand what these tools can do for you
  • Educate your colleagues, tell them about this article and explain the differences
  • If you can, port your current code to Locale::TextDomain
  • Don’t use Maketext for any new code
  • Update important code that uses Maketext, like the Catalyst plugin mentioned above, to support gettext
  • Update TPJ13 to reflect the situation
  • Port extraction routines from xgettext.pl to xgettext

This and general awareness of the issue should bring Perl’s i18n back on track. Thank you for reading!

  1. Schwern
    December 14th, 2011 at 17:13 | #1

    I think there’s a big reason why Catalyst chose to use Maketext to implement their i18n plugin: libintl-perl uses POSIX::setlocale to determine what the program’s current locale is (and setlocale isn’t thread safe). So, if you’re running a web application in a multi-threaded web server, you could run into the scenario where someone gets a page back in the wrong language, because the locale was switched as part of someone else’s request.

  2. Igor Zinovyev
    November 30th, 2012 at 20:03 | #2

    You wrote that “XS version called Locale::gettext_xs (Warning: this one has some problems with mod_perl2!).” Can you elaborate on this please? I’m working on a project that uses mod_perl2, and I’m really curious about what kind of problems there are.

    And thanks for a great article!
