
Posts Tagged ‘perl’

If you do a corporate CPAN, please do it properly!

May 25th, 2009 1 comment

Coming from the “bad implementations of good ideas” department: Corporate CPAN.

It’s a good idea, but a completely wrong solution. If someone is going to implement this, the proper way is not to create a PAUSE and CPAN mirror or anything like it. Corporate requirements are different. What you really need is a revival of http://debian.pkgs.cpan.org/ plus complete repositories for RHEL5, Debian, Ubuntu LTS, SLES9/10 etc. Those can be mirrored at will with the usual, well-tested tools.

Thank you.

Update: Removed last sentence, it’s been a “thinko” on my side :(

PBP 2nd ed.? Just open it up!

May 22nd, 2009 4 comments

There are some long-overdue calls out there for a second edition of “Perl Best Practices”. And more often than not, people with good ideas don’t realize they are proposing a dreadful solution.

I do consider PBP a good collection of recommendations. But sometimes, I loathe the impact it has on the community. It’s the Perl bible, Part II (just after the Camel book) and many people just go on believing in it. What we get as a result is a free community’s dependency on a non-free book.

At my place of work, we run Perl::Critic on SVN commits with the PBP rule set. Every time it finds something, it tells me to “Look at page XX of PBP for details”. So much for online help… Yes, I do have a copy of PBP in the office, even on my desk. Sadly, it’s a German translation which has slightly different page numbering.

Why does a code analyzer cite some particular book edition? Imagine a second edition of PBP coming out. What do we get then, a command line parameter for the book version? For a translation thereof? How many modules will we have to update? Usually a book gets updated when code changes, not the other way around.

An even more heretical question: what happens if I don’t actually own a copy of PBP? Am I doomed to stay ignorant of best practices because I’m only starting to learn and can’t or don’t want to shell out money for a book?

Other language communities are different. Both Ruby and Python give you extensive online documentation and also some dead-tree docs if you need them. But you don’t have to buy a book just to learn some best practices; those are readily available in blogs, wikis and whatnot. Perl’s community seems to trust in holy cows (camels or dogs for that matter) and just keeps insisting on buying books. “Modern” is something very different, though.

I know I can read PBP on Google Books, with several dozen pages hidden from the preview. But PBP should have been online a long time ago. It should have been a community work from the beginning, since best practices are among the first things a newbie needs.

There is only one way forward for PBP: O’Reilly and Damian should open it up, just as “Higher Order Perl” has been opened up. Make it downloadable at first, make it a community-driven project later. O’Reilly could even release a dead-tree edition every now and then, but the first step would be to free Damian from all the “please update PBP” e-mails. It’s quite likely that he doesn’t have time to do so and even more likely that he has no personal interest in bringing a second edition out. In that case, maybe he should raise his voice and work with O’Reilly on making that vastly important book a community project.

The CPAN’s new clothes

May 13th, 2009 11 comments

I must admit, I’m a bit underwhelmed by Enlightened Perl’s Iron Man competition. They’ve essentially replaced Planet Perl because every blogger from the Planet now also gets syndicated to the Iron Man (could you please work together, guys, and kill one of the planets?). However, the average quality of the blog posts hasn’t changed at all, and neither have the subjects. It’s still the same: some “aren’t you using Perl 6 already? 10 reasons why you should!”, some “all hail Moose!”, some “new Padre released, it’s just as powerful as Emacs, but only for Perl stuff”, and also some “Did you know CPAN rocked?” That last bit of sensationalism is getting on my nerves.

Yes, I know, CPAN is great. I even agree. CPAN is great because of the sheer amount of data collected. But it’s a complete disaster otherwise. I might be a bloody newbie in the Perl world, but every time I’m confronted with CPAN, I’m lost and confused. There is a major flaw in CPAN causing that feeling: every module on CPAN is essentially an open-source project, but nothing at CPAN works under this assumption. It’s full of closed-off silos.

Let’s start with a simple example: toying with CPANPLUS::Dist::RPM (or maybe it’s this link, who knows which is the canonical one) at work, I’ve noticed it sometimes hangs, consuming 100% of the CPU while doing nothing useful. Let’s now assume I’d like to investigate this problem, but I don’t know whether this is a bug or a mistake on my side.

So I go to the CPAN page of the package. Oh, there is a discussion forum, let’s click on that! Too bad, it’s broken. Bug reports? Oh yeah, there are a whole three of them, none of which is my problem as far as I can see. And I can actually barely see, since the visual component of that bug tracker makes the Bugzilla of 1998 look good. But I’m still not sure it’s a bug, so I wouldn’t file one. What’s next? Maybe there is something new and relevant in the development code in the revision control system? Oh wait, there isn’t any. CVS, SVN, Git, Mercurial, anything? Nope, no such thing on CPAN. Only release tarballs and some weird release diff tool. No revision control for an open source hosting site in 2009, am I seeing this right?! The only way to ask something is to e-mail the author? What about collaboration, patches, an interactive community process for single modules?

Dear Github guys, if you happen to read this, please host the CPAN for us! Revision control, bug tracking, code review, documentation parser: if you could add some discussion forums, you’d be a perfect CPAN host!

So CPAN is so far: EPIC FAIL in discussion forums, somewhat FAIL in bug reports, EPIC FAIL in encouraging open development. Those are basic open source functionality nowadays, you know. And those points don’t even scratch the surface of the criticism.

Every other page on CPAN is different in design and interaction, there is no common and concise web interface, and there are many different docs/search/rating mirrors which ultimately produce a lot of Google spam. An awful lot of cruft, a lot of broken modules which pop up prominently as the first search result, no clear indication of whether a module is abandoned or actively developed. Even the most potentially useful features like dependency resolution are crippled: dependencies work only in one direction, so whenever I’d like to know how people actually use some module, I’m lost again. This is the CPAN of today, confusing and rusty. CPAN is naked and it seems nobody wants to point that out. I do not want to think that nobody actually notices.

The situation with CPAN is symptomatic of the whole Perl community. Whether it’s Perl.com, Perl.org, the Perl Mongers site, Perl Monks, use Perl or CPAN, it’s always the same: unreadable and misaligned content, incomprehensible navigation and straining colors, self-representation on the web coming straight from 1999 [1]. All the good code in the world and the power of the language won’t help anyone as long as people are alienated by ugly tools, visually and technically. Why can’t CPAN have the visual docs design from http://perldoc.perl.org, which at least features a syntax highlighter? Some CPAN mirror I land on every now and then from Google is even uglier than the one at http://search.cpan.org. Do we care at all about how those sites look? Do we care about fellow Perlers, about how hard they have to look for information? Why isn’t there some central site for Perl information? Why is every Perl project so independent that things like Perl Iron Man happen without cooperation with Planet Perl? [2]

The Perl community has so many possibilities, but most of them stay unused. Most people are probably content with what they have and wouldn’t want to change anything. That’s fine, Perl’s way certainly supports that, but then we can forget about a Perl revival. It’d be a shame, but we’d have only ourselves to blame, not some superstitions about Perl being a “write-only language” or “ASCII soup”. First impressions count, and many newbies might not make it to the code at all: they’ll struggle with finding tutorials first. They won’t find out why Perl is great and will leave for other, probably inferior, languages, because they’ll be reading some ugly outdated quickstart documentation from 1997. They won’t find the shiny things, but they should be able to, as their first Google search result.

  1. Let’s not forget the sheer number of sites a Perler might need to visit to get all the information
  2. Actually there is an easy explanation: at CPAN, if you have a proposal or a patch, you can’t actually do anything more useful than fork and upload your own package to CPAN. The same goes for the Planets: open-source style cooperation seems mostly unknown to the Perl 5 community. This changes with Perl 6, but it needs to change for Perl 5 too.

On the state of i18n in Perl

April 26th, 2009 No comments

The following text is an effort to describe the situation I encountered when I came to the Perl world last December. I’ve done some translating for the Debian project and I was a bit shocked by the state of Perl’s i18n. I have to admit, I’m still an inexperienced hacker, but I wanted to write this article to raise some awareness of the issues described if I’m right, and to learn something new if I’m wrong. Anyway, I tried to keep this article constructive and it’s still just my opinion, so please comment appropriately.

Disclaimer: I’m essentially talking about l10n, but most people know it as i18n, so I’m keeping “i18n” in text.

The i18n problem

When it comes to making your application translatable in Perl, there are actually two schools of thought: Maketext and GNU gettext. GNU gettext is the best-known software translation tool, used in most open-source projects, while Maketext is a child of the Perl world. And the bad thing is: Maketext is currently more popular, but if you are using Maketext for making your application translatable, you are doing it wrong!

Let’s look at how Maketext works, according to its documentation and contrast that with the gettext way.

The Maketext manual defines the process as follows (quoting freely):

  • Decide what system you’ll use for lexicon keys (i.e. base language)
  • Create a class for your localization project
  • Create a class for the language your internal keys are in
  • Go and write your program
  • Once the program is otherwise done, and once its localization for the first language works right (via the data and methods in Projname::L10N::en_us), you can get together the data for translation.
  • Submit all messages/phrases/etc. to translators
    • Translators may request clarification of the situation in which a particular phrase is found
    • Each translator should make clear what dependencies the number causes in the sentence
    • Remind the translators to consider the case where N is 0
    • Remember to ask your translators about numeral formatting in their language
    • The basic quant method that Locale::Maketext provides should be good for many languages. […] For the particularly problematic Slavic languages, what you may need is a method which you provide with the number, the citation form of the noun to quantify, and the case and gender that the sentence’s syntax projects onto that noun slot.
  • Once you’ve localized your program/site/etc. for all desired languages, be sure to show the result (whether live, or via screenshots) to the translators.

There is a lot of sense in this, and it was certainly valid back in 1999, but a lot of the work in this process is left unspecified. For example, the translation process itself raises questions:

  • How do you “Submit all messages/phrases/etc. to translators”?
  • How do you integrate translations back from translators?
  • How do you resubmit translation strings if they change?
  • How do you communicate “situation in which a particular phrase is found” (i.e. context)?
  • What happens if one phrase has to be translated differently depending on context? How does one implement that in a module properly?
  • How does the translator “make clear what dependencies the number causes”? To what extent does that happen? Will the developer even understand the translator at all?
  • Does the programmer really have to understand all the implications of each language implemented? Should every programmer on the team understand them?
  • Who actually implements that “quant” method? How? What about languages with exceptions?

One basic, but fatal, mistake Maketext makes is off-loading a lot of linguistic work onto the programmer.

  • One particularly important point is plural forms support (‘1 apple’, ‘2 apples’), which matters for many languages outside of the USA and Western Europe. Maketext requires you to write a quant function that gets a string and a number as parameters and does some voodoo to produce the right string. The voodoo is undefined. In gettext it is defined: a formula for each language selects one of the provided plural phrases.
  • No translator in their right mind will ever write a Perl module for a language (they aren’t programmers, remember?), so the programmer will have to do it and will also have to understand the implications.
  • The quant notation ("Your search matched [quant,_1,document]!") foolishly assumes word order is the same in all languages. Implementing a quant method properly would require passing the whole sentence into the function and doing a complete linguistic transformation, which is highly non-trivial and better done by a human.
  • Most of those linguistic “conventions” like number formatting or plural forms do not change over time and can be compiled in one place. One such place is Unicode’s CLDR project, which also includes plural form building and number/date formatting among other country- and language-dependent data.
  • It can’t even be assumed that the translators actually know all of these conventions! They might assume they know them, but a translator is not necessarily doing translations for a living; he might be a volunteer, as in most open source projects. Imagine what happens when an amateur translator explains the inner workings of his native language to a programmer.
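To make the plural formula point concrete: gettext’s formula is just a C-style expression kept in the Plural-Forms header of a catalog, so it can even be evaluated with plain shell arithmetic. The following is a minimal sketch, assuming the Russian formula as listed in the GNU gettext manual; the function names plural_index_ru and apples_ru are made up for this illustration and are not part of gettext.

```shell
#!/bin/sh
# Russian has three plural forms (nplurals=3). The expression below is the
# "plural=" part of the Plural-Forms header given in the gettext manual.
plural_index_ru() {
    n=$1
    echo $(( n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2 ))
}

# The translator only supplies the three phrases; the formula picks one.
apples_ru() {
    case $(plural_index_ru "$1") in
        0) echo "$1 яблоко" ;;
        1) echo "$1 яблока" ;;
        2) echo "$1 яблок" ;;
    esac
}
```

Calling apples_ru with 1, 2 and 5 selects the first, second and third form respectively, while 11 falls into the third form and 21 back into the first. Neither the programmer nor the translator has to write any of that logic by hand.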

Compared to this, gettext has a saner, more practical approach. It provides a standardized translation string format, handles updates of message catalogs cleanly, provides all necessary tools for message extraction, doesn’t require any additional modules, works mostly language-agnostically, and provides contexts and translators’ comments; even the plural form formulae are explicitly listed in the manual. It also emphasizes asynchronous translation: translation strings can be extracted and imported at any time in the lifecycle of a project. A developer essentially has to do the following:

  • Integrate gettext into the project (how depends on the programming language used)
  • Mark extractable strings
  • Run extraction and merging scripts (most of them ship with gettext)
  • Submit translation files to translators
  • Copy received translations back into the project
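The steps above can be sketched with the standard GNU gettext command-line tools. This is only a sketch under assumptions: the text domain myapp, the po/ and locale/ directories and the function names are hypothetical, and the --keyword options assume strings are marked with __() and __n(), as Locale::TextDomain does.

```shell
#!/bin/sh
# Hypothetical text domain; catalogs are kept in po/, compiled files in locale/.
DOMAIN=myapp

# Extract all marked strings from the given source files into a template.
extract_strings() {
    mkdir -p po
    xgettext --language=Perl --from-code=UTF-8 \
        --keyword=__ --keyword=__n:1,2 \
        --output="po/$DOMAIN.pot" "$@"
}

# Create the catalog for a language, or merge new strings into an existing one.
update_catalog() {
    lang=$1
    if [ -f "po/$lang.po" ]; then
        msgmerge --update "po/$lang.po" "po/$DOMAIN.pot"
    else
        msginit --no-translator --locale="$lang" \
            --input="po/$DOMAIN.pot" --output-file="po/$lang.po"
    fi
}

# Compile a translated catalog into the binary form read at runtime.
compile_catalog() {
    lang=$1
    mkdir -p "locale/$lang/LC_MESSAGES"
    msgfmt --check --output-file="locale/$lang/LC_MESSAGES/$DOMAIN.mo" "po/$lang.po"
}
```

A typical round trip would be extract_strings lib/*.pm, then update_catalog de, sending po/de.po to the translator, and compile_catalog de once it comes back. Because msgmerge keeps existing translations, this can be repeated at any point in the project.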

gettext of course is not perfect. It lacks several vastly important features, like proper gender support (e.g. “He was born” and “She was born” are translated differently in Russian). But it generally follows the “It mostly works” principle, making the features needed 95% of the time available. Workflow tools make using gettext a snap. Compared to Maketext, it is also easier for the programmer to support and easier for the translator to work with. The dreaded quant function actually makes using Maketext properly for translations impossible.

Apart from those technical shortcomings, there is a bigger threat.

Community separation

Remember TPJ13? TPJ13 is an excellent summary of i18n problems, which every developer, even a non-Perl one, should read. Its solution part, however, is hopelessly out of date: don’t forget, TPJ13 turns ten years old this year. Back in 1999, gettext had no plural forms support and also lacked many other features, so the authors’ point was valid at the time. However, gettext implemented its support for plurals rather quickly, and at that point Maketext should have been retired immediately. Sadly, this has not happened.

That misunderstanding haunts us to this day. Every novice Perl hacker is introduced to TPJ13 and tends to believe Maketext is the way to go. Failing to see its shortcomings, however, results in well-meant but still failed creations like Locale::Maketext::Lexicon, which tries hard to bring the world of gettext to Maketext-infected minds. What we get is crazy stuff like this (verbatim from the POD):

#: Hello.pm:11
msgid "You have %quant(%1,piece) of mail."
msgstr "Sie haben %quant(%1,Poststueck,Poststuecken)."

instead of a proper (German spelling corrected a bit):

#: Hello.pm:11
msgid "You have 1 piece of mail."
msgid_plural "You have %d pieces of mail"
msgstr[0] "Sie haben 1 Poststueck"
msgstr[1] "Sie haben %d Poststuecke"

The former has virtually no tool support (not even gettext’s extraction routine xgettext); extraction is only supported by the home-grown xgettext.pl (notice the .pl suffix). And there we have some fatal stuff going on:

  • Locale::Maketext::Lexicon is considered the solution for using Maketext with .po files.
  • Neither Locale::Maketext::Lexicon nor xgettext.pl has any notion of proper gettext plurals
  • .po files created by xgettext.pl are not fully supported by translation tools like PoEdit, KBabel, Launchpad Rosetta, 99translations.com etc.
  • Catalyst::Plugin::I18N, the only i18n plugin for the extremely popular Catalyst web framework, is based on Locale::Maketext::Lexicon
  • xgettext.pl has support for Template-Toolkit templates, YAML, FormFu and Mason. Original gettext’s xgettext does not.

So there we have it: Perl hackers mostly use tools which are unsuitable and incompatible with the rest of the world, without knowing it. The right tools actually can’t help them become “sane”, since xgettext can’t extract all those formats which xgettext.pl can, and I don’t think that’ll change anytime soon.

Alternatives

Luckily, some hackers have produced the libintl-perl library, which basically re-implements GNU gettext in Perl. There is a pure Perl implementation of message catalogs called Locale::gettext_pp, an XS version called Locale::gettext_xs (Warning: this one has some problems with mod_perl2!), a Perl wrapper around these (Locale::Messages) and, building upon that, an excellent Perl-y implementation of the framework: Locale::TextDomain. These tools are worth your time.

Even though we have Locale::TextDomain, what should be done to remedy the whole Maketext situation? I’d propose several possible actions:

  • Read the GNU gettext Manual to fully understand what these tools can do for you
  • Educate your colleagues, tell them about this article and explain the differences
  • If you can, port your current code to Locale::TextDomain
  • Don’t use Maketext for any new code
  • Update important code using Maketext like the Catalyst plugin mentioned above to support gettext
  • Update TPJ13 to reflect the situation
  • Port extraction routines from xgettext.pl to xgettext

This and general awareness of the issue should bring Perl’s i18n back on track. Thank you for reading!

Taming cpan2dist on Ubuntu 8.10

January 4th, 2009 1 comment

cpan2dist is great. Easily one of the reasons why people at Debian (and its derivatives) absolutely love Perl and CPAN. However, it requires some fiddling to get right, and since I’ve just done that today, I’d like to write this stuff down for generations to come.

A quick introduction for those who are unfamiliar with cpan2dist: it fetches a Perl package from CPAN and installs it into your system using the distribution’s tools. This makes distribution upgrades painless, since every Perl package also shows up in your package management. It’s part of the Perl 5.10 distribution and makes native package installation easy.

There are, however, some catches, and the first hurdle comes at the very beginning: distribution-specific packaging is done with plugins, and sadly Intrepid does not include the Debian plugin for cpan2dist, CPANPLUS::Dist::Deb. Since you can’t install that one from packages yet (due to the lack of CPANPLUS::Dist::Deb, welcome to recursion), you have to install it manually. The good thing is, it will be the only package installed manually, and it can be replaced with a packaged one after bootstrapping the whole system. Just run

cpanp install CPANPLUS::Dist::Deb

and the first step is done!

Now cpan2dist can be used. Let’s reinstall CPANPLUS::Dist::Deb, this time as a Debian package!

cpan2dist --format CPANPLUS::Dist::Deb --buildprereq --install --skiptest CPANPLUS::Dist::Deb

It should give you a sudo prompt at some point, because it already installs the resulting Debian package, so you should be done in a couple of seconds.

The options mean, in this order: the desired package format, all prerequisites should be built, the built packages should be installed, tests should not be run before the build (a time saver), and finally the module to build, CPANPLUS::Dist::Deb.

However, we are not done yet. You’ll notice that cpan2dist tries to resolve some dependencies which are already provided by the basic Perl packages perl, perl-base and perl-modules. These could be downloaded and built from CPAN, but in most cases should not be. Luckily, cpan2dist can be given an “ignore list”; the downside is that the default one is tiny and thus mostly useless. We have to recreate this list from our system:

dpkg -L perl perl-base perl-modules | grep '\.pm$' | sed 's/^\/usr\/\(share\|lib\)\/perl\/[.0123456789]*\/\(.*\)\.pm$/^\2$/g' | sed 's/\//::/g' > ignorelist

This is my recipe. I get the list of all *.pm files from perl, perl-base and perl-modules and re-create the module name from its path. Every entry becomes a pattern (e.g. ^Test::Simple$) so that module names wouldn’t match as substrings and the whole list is dumped to ignorelist. Now we have to add an --ignorelist ignorelist option to the cpan2dist command line.
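To see what the two sed steps do without touching the package database, the same transformation can be run on a couple of sample paths. This is a sketch: paths_to_patterns is a made-up helper name, and it uses sed -E with [0-9.]+ instead of the escaped basic-regex form above, which should match the same paths.

```shell
#!/bin/sh
# Turn installed *.pm paths into anchored module-name patterns:
# strip the /usr/{share,lib}/perl/<version>/ prefix and the .pm suffix,
# wrap the rest in ^...$, then replace path separators with "::".
paths_to_patterns() {
    sed -E -e 's#^/usr/(share|lib)/perl/[0-9.]+/(.*)\.pm$#^\2$#' -e 's#/#::#g'
}

printf '%s\n' \
    /usr/share/perl/5.10.0/Test/Simple.pm \
    /usr/lib/perl/5.10.0/File/Spec/Unix.pm | paths_to_patterns
# prints:
# ^Test::Simple$
# ^File::Spec::Unix$
```

The anchors are what keep an entry such as ^Test::Simple$ from also matching other modules whose names merely contain Test::Simple.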

That should wrap it up. If you are lucky, everything goes well and you end up with the module installed as a Debian package. If you have some bad karma on a particular day, you’ll end up with some errors, all of which can be solved with some manual package installation and careful reading.

One problem remains though: cpan2dist doesn’t seem to check whether a particular package is available from the archive. But that’s only a small nuisance in an otherwise really useful package.

Private feeds in Google Reader

December 29th, 2008 2 comments

Probably the most requested feature in Google Reader has been support for private feeds. Google probably thinks every good web application should support private links (the ones with a random ID attached) so that users can keep control of their credentials, which is a very sensible thing to assume. Private links can be reset any time you need to, and thus you should be safe. Google also keeps such feeds private; you’d need to get the URL into the main Google index to make it appear somewhere. Also, Google needn’t store any passwords for other services, thus reducing possible criticism of itself.

However, not every website supports private links and there we have a problem: I’d like to use a couple of private feeds with Google Reader but can’t. Or better couldn’t, since I’ve made myself a late Christmas present: a Google Reader private feeds workaround!

It’s a small Perl script intended to run somewhere on your own hosting. What does it do? Not much: given a GUID, it fetches the associated website and returns it to the requester. If the GUID is not found, nothing happens. If an error occurs, nothing is returned either. So simple. You still have to take care of the feed list though.
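The behaviour described above can be sketched roughly as follows. This is a shell illustration of the idea, not the actual Perl script; the feed list format (one “GUID URL” pair per line), the file name feeds.txt and the function names are all assumptions.

```shell
#!/bin/sh
# Hypothetical feed list: one "GUID URL" pair per line.
FEED_LIST=${FEED_LIST:-feeds.txt}

# Print the URL registered for a GUID; print nothing if the GUID is unknown.
lookup_feed() {
    awk -v id="$1" '$1 == id { print $2; exit }' "$FEED_LIST" 2>/dev/null
}

# Fetch the page behind a GUID and pass it on to the requester.
# Unknown GUIDs and fetch errors both produce no output at all.
serve_feed() {
    url=$(lookup_feed "$1")
    [ -n "$url" ] || return 0
    curl --silent --fail "$url" 2>/dev/null || true
}
```

Since the GUID is the only secret the requester ever sees, resetting a leaked link amounts to replacing one line in the feed list.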

I will upload the code and some instructions to the newly created Google Code project shortly. Stay tuned!
