<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rassie&#039;s Doghouse &#187; gettext</title>
	<atom:link href="http://rassie.org/archives/tag/gettext/feed" rel="self" type="application/rss+xml" />
	<link>http://rassie.org</link>
	<description>Barking at technology</description>
	<lastBuildDate>Sun, 08 May 2011 13:49:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>On the state of i18n in Perl</title>
		<link>http://rassie.org/archives/247</link>
		<comments>http://rassie.org/archives/247#comments</comments>
		<pubDate>Sun, 26 Apr 2009 19:39:22 +0000</pubDate>
		<dc:creator>rassie</dc:creator>
				<category><![CDATA[Main]]></category>
		<category><![CDATA[gettext]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[maketext]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">http://rassie.org/archives/247</guid>
		<description><![CDATA[The following text is an attempt to describe the situation I encountered when I came to the Perl world last December. I had done some translating for the Debian project, and I was a bit shocked by the state of Perl&#8217;s i18n. I have to admit I&#8217;m still an inexperienced hacker, but I wanted to write [...]]]></description>
			<content:encoded><![CDATA[<p>The following text is an attempt to describe the situation I encountered when I came to the Perl world last December. I had done some translating for the Debian project, and I was a bit shocked by the state of Perl&#8217;s i18n. I have to admit I&#8217;m still an inexperienced hacker, but I wanted to write this article to raise awareness of the issues described if I&#8217;m right, and to learn something new if I&#8217;m wrong. Anyway, I tried to keep this article constructive, and it&#8217;s still just my opinion, so please comment accordingly.</p>

<p>Disclaimer: I&#8217;m essentially talking about l10n (localization), but most people know it as i18n (internationalization), so I&#8217;m sticking with &#8220;i18n&#8221; in the text.</p>

<h2>The i18n problem</h2>

<p>When it comes to making your application translatable in Perl, there are two schools of thought: <a href="http://search.cpan.org/dist/Locale-Maketext/lib/Locale/Maketext.pod">Maketext</a> and <a href="http://www.gnu.org/software/gettext/gettext.html">GNU gettext</a>. <code>GNU gettext</code> is the best-known software translation tool and is used in most open-source projects, while <code>Maketext</code> is a child of the Perl world. And here is the bad news: <code>Maketext</code> is currently the more popular of the two, but if you are using <code>Maketext</code> to make your application translatable, you are doing it wrong!</p>

<p>Let&#8217;s look at how <code>Maketext</code> works according to its documentation, and contrast that with the <code>gettext</code> way.</p>

<p>The <code>Maketext</code> manual defines the process as follows (quoting loosely):</p>

<ul>
<li>Decide what system you&#8217;ll use for lexicon keys (i.e. base language)</li>
<li>Create a class for your localization project</li>
<li>Create a class for the language your internal keys are in</li>
<li>Go and write your program</li>
<li>Once the program is otherwise done, and once its localization for the first language works right (via the data and methods in Projname::L10N::en_us), you can get together the data for translation.</li>
<li>Submit all messages/phrases/etc. to translators

<ul>
<li>Translators may request clarification of the situation in which a particular phrase is found</li>
<li>Each translator should make clear what dependencies the number causes in the sentence</li>
<li>Remind the translators to consider the case where N is 0</li>
<li>Remember to ask your translators about numeral formatting in their language</li>
<li>The basic quant method that Locale::Maketext provides should be good for many languages. [&#8230;] For the particularly problematic Slavic languages, what you may need is a method which you provide with the number, the citation form of the noun to quantify, and the case and gender that the sentence&#8217;s syntax projects onto that noun slot.</li>
</ul></li>
<li>Once you&#8217;ve localized your program/site/etc. for all desired languages, be sure to show the result (whether live, or via screenshots) to the translators.</li>
</ul>
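
<p>As a rough illustration of the steps above, here is a minimal single-file sketch of the class layout that the <code>Maketext</code> manual describes. The project name <code>MyApp::L10N</code>, the phrase and its German translation are made up. Only <code>Locale::Maketext</code> itself (a core module), its <code>%Lexicon</code>/<code>_AUTO</code> convention and its built-in <em>quant</em> bracket method are real.</p>

<pre><code>#!/usr/bin/perl
use strict;
use warnings;

# Step 2: a base class for the (hypothetical) localization project.
{
    package MyApp::L10N;
    use parent 'Locale::Maketext';
}

# Step 3: a class for the language the lexicon keys are written in.
# _AUTO makes every unknown key serve as its own English translation.
{
    package MyApp::L10N::en;
    use parent -norequire, 'MyApp::L10N';
    our %Lexicon = (_AUTO => 1);
}

# Later: a class filled in from a (hypothetical) German translator's work.
{
    package MyApp::L10N::de;
    use parent -norequire, 'MyApp::L10N';
    our %Lexicon = (
        'Your search matched [quant,_1,document].'
            => 'Ihre Suche ergab [quant,_1,Dokument,Dokumente].',
    );
}

# Step 4: the program fetches a language handle and looks phrases up.
sub search_message {
    my ($lang, $count) = @_;
    my $lh = MyApp::L10N->get_handle($lang)
        or die "No language handle for '$lang'\n";
    return $lh->maketext('Your search matched [quant,_1,document].', $count);
}

print search_message('en', 1), "\n";   # Your search matched 1 document.
print search_message('en', 3), "\n";   # Your search matched 3 documents.
print search_message('de', 3), "\n";   # Ihre Suche ergab 3 Dokumente.
</code></pre>

<p>The translator has to understand the bracket notation and the meaning of each <em>quant</em> argument. The translation lives inside Perl code that someone has to write and maintain.</p>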

<p>There is a lot of sense in this process, and it was certainly valid back in 1999, but much of the work it involves is left unspecified. The translation process itself, for example, raises questions:</p>

<ul>
<li>How do you &#8220;Submit all messages/phrases/etc. to translators&#8221;?</li>
<li>How do you integrate translations back from translators?</li>
<li>How do you resubmit translation strings if they change?</li>
<li>How do you communicate &#8220;situation in which a particular phrase is found&#8221; (i.e. context)?</li>
<li>What happens if one phrase has to be translated differently depending on context? How does one implement that in a module properly?</li>
<li>How does the translator &#8220;make clear what dependencies the number causes&#8221;? To what extent? Will the developer even understand them?</li>
<li>Does the programmer really have to understand all the implications of each supported language? Should every programmer on the team understand them?</li>
<li>Who actually implements that &#8220;quant&#8221; method? How? What about languages with exceptions?</li>
</ul>

<p>One basic but fatal mistake <code>Maketext</code> makes is off-loading a lot of linguistic work onto the programmer.</p>

<ul>
<li>One particularly important point is plural-form support (&#8216;1 apple&#8217;, &#8216;2 apples&#8217;), which matters for many languages outside the USA and Western Europe, some of which have three or more plural forms. For such languages, <code>Maketext</code> requires you to write your own <em>quant</em> function that gets a string and a number as parameters and does some voodoo to produce the right string. That voodoo is undefined. In <code>gettext</code> it is defined: each catalog carries a formula that computes, from the number, which of the provided plural phrases to select.</li>
<li>No translator in their right mind will ever write a Perl module for a language (they aren&#8217;t programmers, remember?), so the programmer will have to do it and will also have to understand the implications.</li>
<li>The <em>quant</em> notation (<code>"Your search matched [quant,_1,document]!"</code>) foolishly assumes that word order is the same in all languages. Implementing a <em>quant</em> method properly would require passing the whole sentence into the function and performing a complete linguistic transformation, which is highly non-trivial and better done by a human.</li>
<li>Most of those linguistic &#8220;conventions&#8221;, like number formatting or plural forms, do not change over time and can be compiled in one place. One such place is Unicode&#8217;s <a href="http://cldr.unicode.org">CLDR</a> project, which includes plural-form rules and number/date formatting, among other country- and language-dependent data.</li>
<li>It can&#8217;t even be assumed that the translators actually know all of these conventions! They might think they know them, but a translator does not necessarily translate for a living; they might be a volunteer, as in most open-source projects. Imagine what happens when an amateur translator explains the inner workings of their native language to a programmer!</li>
</ul>
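
<p>For comparison, a <code>gettext</code> catalog carries its plural rule in the <code>Plural-Forms</code> header. The runtime evaluates that rule to pick <code>msgstr[0]</code>, <code>msgstr[1]</code> or <code>msgstr[2]</code>. The sketch below hand-transcribes the rule that the gettext manual gives for Russian into Perl, only to make that arithmetic visible. A real <code>gettext</code> runtime evaluates the header string itself, so application code never needs a function like this. The transliterated word forms are illustrative placeholders.</p>

<p><code>Maketext</code>&#8217;s default <em>quant</em>, in contrast, only distinguishes 1 from other numbers. Given <code>[quant,_1,fajl,fajla,fajlov]</code>, it would print &#8220;5 fajla&#8221; for 5, because the third form is only used as a special zero case.</p>

<pre><code>#!/usr/bin/perl
use strict;
use warnings;

# Russian rule, as listed in the gettext manual:
#   nplurals=3; plural=n%10==1 &amp;&amp; n%100!=11 ? 0 :
#     n%10>=2 &amp;&amp; n%10&lt;=4 &amp;&amp; (n%100&lt;10 || n%100>=20) ? 1 : 2;
sub russian_plural_index {
    my ($n) = @_;
    my $mod10  = $n % 10;
    my $mod100 = $n % 100;
    return 0 if $mod10 == 1 and $mod100 != 11;
    return 1 if $mod10 >= 2 and 4 >= $mod10
             and (10 > $mod100 or $mod100 >= 20);
    return 2;
}

# msgstr[0..2] for "N file(s)", transliterated placeholders.
my @forms = ('%d fajl', '%d fajla', '%d fajlov');

# Select the translation by index, then format the number into it.
sub russian_files {
    my ($n) = @_;
    return sprintf($forms[ russian_plural_index($n) ], $n);
}

# 1 fajl, 2 fajla, 5 fajlov, 11 fajlov, 21 fajl, 22 fajla, 111 fajlov
print russian_files($_), "\n" for 1, 2, 5, 11, 21, 22, 111;
</code></pre>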

<p>Compared to <code>Maketext</code>, <code>gettext</code> takes a saner, more practical approach:</p>

<ul>
<li>it provides a standardized translation string format;</li>
<li>it handles updates of message catalogs cleanly;</li>
<li>it provides all the tools needed for message extraction;</li>
<li>it doesn&#8217;t require any additional modules;</li>
<li>it works in a mostly language-agnostic way;</li>
<li>it supports contexts and translators&#8217; comments;</li>
<li>even the plural-form formulae are listed explicitly in the manual.</li>
</ul>

<p>It also emphasizes asynchronous translation: translation strings can be extracted and imported at any point in a project&#8217;s lifecycle. A developer essentially has to do the following:</p>

<ul>
<li>Integrate <code>gettext</code> into the project (how depends on the programming language used)</li>
<li>Mark extractable strings</li>
<li>Run extraction and merging scripts (mostly shipped with <code>gettext</code>)</li>
<li>Submit translation files to translators</li>
<li>Copy received translations back into the project</li>
</ul>
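
<p>In Perl, the first two steps might look like the following sketch, which uses the <code>__</code>, <code>__x</code> and <code>__nx</code> functions of <code>Locale::TextDomain</code> from the <code>libintl-perl</code> distribution discussed below. The text domain <code>com.example.myapp</code>, the <code>./LocaleData</code> directory and the messages are made-up examples. The remaining steps use the standard GNU tools; the commands in the trailing comment are one typical invocation, not the only one. Without a compiled catalog, the functions fall back to the English strings.</p>

<pre><code>#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical text domain; catalogs would be looked up as
# ./LocaleData/LANG/LC_MESSAGES/com.example.myapp.mo
use Locale::TextDomain ('com.example.myapp', './LocaleData');

# Plain message, picked up by xgettext via the __ keyword.
sub save_label { return __("Save") }

# Named placeholders let translators reorder the sentence freely.
sub greeting {
    my ($name) = @_;
    return __x("Hello, {name}!", name => $name);
}

# Singular and plural are both complete sentences; the catalog's
# Plural-Forms formula decides which translation is used.
sub mail_count {
    my ($count) = @_;
    return __nx("You have {count} piece of mail.",
                "You have {count} pieces of mail.",
                $count, count => $count);
}

print save_label(), "\n", greeting('World'), "\n", mail_count(3), "\n";

# Remaining steps, run from the shell (typical invocation):
#   xgettext --language=Perl --keyword=__ --keyword=__x \
#            --keyword=__nx:1,2 -o po/com.example.myapp.pot myapp.pl
#   msgmerge --update po/de.po po/com.example.myapp.pot
#   msgfmt -o LocaleData/de/LC_MESSAGES/com.example.myapp.mo po/de.po
</code></pre>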

<p>Of course, <code>gettext</code> is not perfect. It lacks some important features, such as proper gender support (e.g. in Russian, &#8220;%s was born&#8221; needs a different verb form depending on whether <code>%s</code> is a man or a woman). But it generally follows the &#8220;it mostly works&#8221; principle, making the features needed 95% of the time available. Workflow tools make using <code>gettext</code> a snap. Compared to <code>Maketext</code>, it is also easier for the programmer to support and easier for the translator to produce translations with. The dreaded <em>quant</em> function actually makes using <code>Maketext</code> properly for translation impossible.</p>

<p>Apart from those technical shortcomings, there is a bigger threat.</p>

<h2>Community separation</h2>

<p>Remember <a href="http://search.cpan.org/~ferreira/Locale-Maketext-1.13/lib/Locale/Maketext/TPJ13.pod">TPJ13</a>?
TPJ13 is an excellent summary of i18n problems, which every developer, even a non-Perl one, should read. Its solution part, however, is hopelessly out of date: don&#8217;t forget, TPJ13 turns ten this year. Back in 1999, <code>gettext</code> had no plural-form support and lacked many other features, so the authors&#8217; point was valid at the time. However, <code>gettext</code> implemented plural support fairly quickly, and at that point <code>Maketext</code> should have been retired. Sadly, this has not happened.</p>

<p>That misunderstanding haunts us to this day. Every novice Perl hacker is introduced to TPJ13 and tends to believe that <code>Maketext</code> is the way to go. Failing to see its shortcomings, however, results in well-meant but ultimately failed creations like <a href="http://search.cpan.org/dist/Locale-Maketext-Lexicon/lib/Locale/Maketext/Lexicon.pm">Locale::Maketext::Lexicon</a>,
which tries hard to bring the world of <code>gettext</code> to <code>Maketext</code>-infected minds. What we get is crazy stuff like this (verbatim from the POD):</p>

<pre><code>#: Hello.pm:11
msgid "You have %quant(%1,piece) of mail."
msgstr "Sie haben %quant(%1,Poststueck,Poststuecken)."
</code></pre>

<p>instead of a proper <code>gettext</code> plural entry (with the German spelling corrected a bit):</p>

<pre><code>#: Hello.pm:11
msgid "You have %d piece of mail."
msgid_plural "You have %d pieces of mail."
msgstr[0] "Sie haben %d Poststueck."
msgstr[1] "Sie haben %d Poststuecke."
</code></pre>
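
<p>How a runtime uses such an entry can be sketched in plain Perl. The sketch assumes the standard German header <code>Plural-Forms: nplurals=2; plural=(n != 1);</code>. It also assumes an entry that keeps <code>%d</code> in both forms, which the gettext manual recommends so that languages using the singular for numbers like 21 still work. The tiny parser only understands single-line entries like this one. It is not a general <code>.po</code> reader; real programs should use compiled <code>.mo</code> catalogs through <code>libintl-perl</code> instead.</p>

<pre><code>#!/usr/bin/perl
use strict;
use warnings;

# One .po entry, as a translator would deliver it.
my $po_entry = join "\n",
    '#: Hello.pm:11',
    'msgid "You have %d piece of mail."',
    'msgid_plural "You have %d pieces of mail."',
    'msgstr[0] "Sie haben %d Poststueck."',
    'msgstr[1] "Sie haben %d Poststuecke."';

# Collect msgstr[N] lines into an array indexed by N.
# Only handles single-line strings without escapes, like the ones above.
my @msgstr;
for my $line (split /\n/, $po_entry) {
    if ($line =~ /^msgstr\[(\d+)\] "(.*)"$/) {
        $msgstr[$1] = $2;
    }
}

# German (like English) uses: nplurals=2; plural=(n != 1);
sub german_plural_index { my ($n) = @_; return $n != 1 ? 1 : 0 }

# Pick the full translated sentence by index, then insert the number.
sub mail_message {
    my ($n) = @_;
    return sprintf($msgstr[ german_plural_index($n) ], $n);
}

print mail_message($_), "\n" for 0, 1, 2;
</code></pre>

<p>The translator only ever writes whole sentences. Choosing between them is a fixed formula per language rather than per-project code.</p>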

<p>The Lexicon-style <code>%quant</code> entry has virtually no tool support (not even from <code>gettext</code>&#8217;s own extraction tool, <code>xgettext</code>); extraction is only supported by the home-grown <code>xgettext.pl</code> (note the <code>.pl</code> suffix). And this leads to some fatal consequences:</p>

<ul>
<li><code>Locale::Maketext::Lexicon</code> is considered <strong>the</strong> solution for using <code>Maketext</code> with <code>.po</code> files.</li>
<li>Neither <code>Locale::Maketext::Lexicon</code> nor <code>xgettext.pl</code> has any notion of proper <code>gettext</code> plurals</li>
<li><code>.po</code> files created by <code>xgettext.pl</code> are not fully supported by translation tools like PoEdit, KBabel, Launchpad Rosetta, 99translations.com etc.</li>
<li><a href="http://search.cpan.org/~mramberg/Catalyst-Plugin-I18N-0.09/lib/Catalyst/Plugin/I18N.pm">Catalyst::Plugin::I18N</a>, the only i18n plugin for the extremely popular <a href="http://catalyst.perl.org">Catalyst</a> web framework, is based on <code>Locale::Maketext::Lexicon</code></li>
<li><code>xgettext.pl</code> supports <a href="http://www.template-toolkit.org">Template-Toolkit</a> templates, YAML, FormFu and Mason. The original <code>xgettext</code> from <code>gettext</code> does not.</li>
</ul>

<p>So there we have it: most Perl hackers use tools that are unsuitable for, and incompatible with, the rest of the world, without even knowing it. Nor can the right tools make them &#8220;sane&#8221;, since <code>xgettext</code> can&#8217;t extract all the formats that <code>xgettext.pl</code> can, and I don&#8217;t think that will change any time soon.</p>

<h2>Alternatives</h2>

<p>Luckily, some hackers have produced the <a href="http://search.cpan.org/dist/libintl-perl/"><code>libintl-perl</code></a> library, which essentially re-implements <code>GNU gettext</code> in Perl. It contains a pure-Perl implementation of message catalog lookup called <code>Locale::gettext_pp</code>, an XS version called <code>Locale::gettext_xs</code> (warning: this one has some problems with <code>mod_perl2</code>!), a Perl wrapper around both (<code>Locale::Messages</code>), and, built on top of that, <code>Locale::TextDomain</code>, an excellent Perl-ish interface to the whole framework. These tools are worth your time.</p>
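
<p>A short sketch can show this layering. <code>Locale::Messages</code> exposes the classic C-style calls such as <code>ngettext</code>. <code>Locale::TextDomain</code> wraps them in shorter, extraction-friendly functions, including <code>__p</code> for message contexts (available in more recent <code>libintl-perl</code> releases). The domain name, directory and strings are hypothetical. Without an installed catalog, both layers return the untranslated English text.</p>

<pre><code>#!/usr/bin/perl
use strict;
use warnings;

# Lower layer: the C-style gettext API exported by Locale::Messages.
use Locale::Messages qw(textdomain bindtextdomain ngettext);
# Upper layer: the Perl-ish interface built on top of it.
use Locale::TextDomain ('com.example.myapp');

# Hypothetical domain and directory; they do not need to exist.
bindtextdomain('com.example.myapp', './LocaleData');
textdomain('com.example.myapp');

# C-style plural lookup: the caller formats the number itself.
sub files_low_level {
    my ($n) = @_;
    return sprintf(ngettext('%d file', '%d files', $n), $n);
}

# Same English word, two contexts (msgctxt), so translators can
# translate the menu entry and the status label differently.
sub open_menu_item { return __p('menu', 'Open') }
sub open_status    { return __p('status', 'Open') }

print files_low_level(1), "\n", files_low_level(4), "\n";
print open_menu_item(), ' / ', open_status(), "\n";
</code></pre>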

<p>Even with <code>Locale::TextDomain</code> available, what should be done to fix the whole <code>Maketext</code> situation? I&#8217;d propose several actions:</p>

<ul>
<li>Read the <a href="http://www.gnu.org/software/gettext/manual/gettext.html">GNU gettext Manual</a> to fully understand what these tools can do for you</li>
<li>Educate your colleagues, tell them about this article and explain the differences</li>
<li>If you can, port your current code to <code>Locale::TextDomain</code></li>
<li>Don&#8217;t use <code>Maketext</code> for any new code</li>
<li>Update important code that uses <code>Maketext</code>, such as the Catalyst plugin mentioned above, to support <code>gettext</code></li>
<li>Update TPJ13 to reflect the situation</li>
<li>Port extraction routines from <code>xgettext.pl</code> to <code>xgettext</code></li>
</ul>

<p>These steps, together with general awareness of the issue, should bring Perl&#8217;s i18n back on track. Thank you for reading!</p>
]]></content:encoded>
			<wfw:commentRss>http://rassie.org/archives/247/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

