Upgrading and deprecating

May 8th, 2011 13 comments

Sad things are happening in the Perl world. Sebastian Riedel has deprecated Perl 5.8.x support in Mojolicious without even saying why this was necessary. Much as I admire Sebastian for all the work he does and for his artistic skills with alternative Perl logos, I despise this step: it is extremely ill-advised and damaging to the whole Perl community.

In general, every piece of software used by other people needs a deprecation policy. If you throw a feature away, you have to announce it and do everything in your power to ease the migration pain for your users. A lot of sensible software employs a two-major-versions deprecation policy: if you deprecate some functionality in major version X, you add a warning and a migration recommendation in major version X+1 and remove the feature completely in major version X+2.
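
As a rough illustration of the X+1 stage of such a policy, a deprecated function can keep working while it warns and points to its replacement. This is only a sketch: `old_name`, `new_name` and the version number in the message are hypothetical, and `carp` from the core Carp module is just one way to emit the warning.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical replacement, introduced when old_name() was deprecated (version X).
sub new_name {
    my ($value) = @_;
    return $value * 2;
}

# Version X+1: still works, but warns and names the migration path.
# Version X+2 would delete this sub entirely.
sub old_name {
    carp 'old_name() is deprecated and will be removed in 3.0; use new_name() instead';
    return new_name(@_);
}

print old_name(21), "\n";
```

Because `carp` reports the warning from the caller's point of view, users see which line of their own code still needs migrating.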

Unlike other languages, Perl the language and Perl the community have been extremely backwards-compatible for a long time now. To quote chromatic from “Modern Perl”:

Perl 5’s design process in 1993 and 1994 tried to anticipate new
directions for the language, but it’s impossible to predict the
future. Perl 5 added many great new features, but it also kept
compatibility with the previous seven years of Perl 1 through Perl
4. Sixteen years later, the best way to write clean, maintainable,
powerful, and succinct Perl 5 code is very different from Perl
5.000. The default behaviors sometimes get in the way;
fortunately, better behaviors are available.

Anyone remember the Date::Manip disaster? I do, and it was an extremely bad move. It’s all the same for Mojolicious: it’s the fate of libraries to support the lowest common denominator of all available platforms, and a web framework is just such a library. The only question is: what do we consider an “available platform”?

The obvious reply is: take current distributions and look at the available options. That includes Linux distributions, both desktop and enterprise, Windows and a whole lot of other operating systems and hardware platforms on which Perl can run. We’d probably arrive at 5.8 and 5.10 sharing the lead and 5.6 and 5.12 somewhat in the backseat.

A lot of Perl code runs in closed corporate environments and thus has to obey some rules, like using special software distributions, so unsurprisingly, a big chunk of those 5.8ers are enterprise Linux distributions like RHEL or SLES. However, just like on Hacker News, where seemingly everyone assumes you are a start-up founder and thus free to go with all the current bells and whistles, the Perl community has started actively ignoring those distributions, proposing to “just upgrade your Perl / install perlbrew” when confronted with a problem instead of handling it in a sensible way. This is obviously wrong, since each developer is in a different situation.

My previous Perl project was in a medium-sized department at a big electronics and IT company. It consisted of big chunks of legacy Perl code deployed as part of a bigger internal website on RHEL5 servers. Developers had their freedoms, as long as certain constraints were fulfilled, like not breaking the code of others (e.g. the big framework that had been wrapped around our code) and deploying with RPM packages. Thus, we could happily innovate, and so I introduced Moose and many other modern Perl modules which helped to clean that mess up.

Now, fast forward to today. Even then, I would have been happy to introduce at least some kind of web framework instead of the CGI-based mess we had then. Catalyst has obviously been on my plate for some time and today I would probably have considered Mojolicious for the task. However: being able to run it on Perl 5.8.x would have been a MUST, not a CAN feature.

Now consider this: companies buy enterprise products to get support. Red Hat provides seven years of support. RHEL3 (Perl 5.8) was supported until October 31st 2010. RHEL4 (Perl 5.8) will be supported until February 29th 2012. End of March 2014 is when RHEL5 (Perl 5.8) will be officially dead. By comparison: Windows XP, first out in 2001, will be supported until April 2014. RHEL6 (Perl 5.10) will be supported until 2017. Which version of Perl will Mojolicious support then? Considering yearly Perl releases, probably only 5.20 upwards, since 5.24 will be current then.

So are the “evil big corporations” at fault here? Not at all. Perl 5.8 was released in 2002 and, as you see, RHEL3, released a year later, caught up and delivered 5.8. RHEL4 and RHEL5 saw minor Perl updates, since no major updates for Perl came out in that time. Perl 5.10 appeared half a year after RHEL5 was released, so it wasn’t included. Perl 5.12 appeared at roughly the same time as the RHEL6 beta, so only 5.10 could have been shipped. So why does the community bitch so much about enterprise distributions if Perl itself doesn’t release often enough to get current software into them?

We, the programmers, always say “Don’t change a running system”. In that project I worked in, nobody in their right mind would have allowed me to upgrade RHEL5 to 6 even if it had been available at the time. Even RHEL5 was rather fresh; many of the servers still ran RHEL4 or even RHEL3. Nobody in their right mind would have allowed perlbrew on production servers. Nobody in their right mind would have allowed me to allocate hours and days of developers’ and testers’ time to retest everything with the newest Perl version (you don’t seriously assume the legacy code had any reasonable amount of regression tests?).

The corporate world, which either still is or has been extremely Perl-friendly in the past, often needs to have some kind of long-term strategy, which for many has been Perl for years or even decades. Instead of supporting and cherishing that valuable connection, parts of the Perl community have just decided to say a big “fuck you!” to all of those who try to create great software in restricted environments, those who try to persuade their bosses not to move to Java or .NET because Perl makes them much more productive, to all those people who have considered using Mojolicious, to all of those who actually have. And all of those who complain get ignored, with some muttering about moving more stuff to 5.10 just to piss off all the 5.8 users.

Perl as a language is dependent on those walled gardens. Perl as a community is perfectly capable of taking care of those stuck with an older version. But obviously some people couldn’t care less and think that everything runs with HTML5 on websockets in the cloud nowadays and that everyone can and will upgrade whenever a fresh version of whatever software comes out.

It just doesn’t work this way. And it’s sad that some people in the community think it could.

On ducks and ducklings

July 28th, 2010 2 comments

It seems there is a new search engine in town: DuckDuckGo. While in itself it’s a good thing (if anything, Google needs some competition), it’s disturbing how much uncritical hype it produces in the Perl community.

I generally disapprove of the “because it’s not Google” argument. Just because someone is not Google doesn’t mean they are good by definition, and neither is Google evil by definition. You need to rate each contender fairly and base the ratings on facts: features, existing problems, evaluations etc. On this scale, the technical arguments for Duck1 have not been compelling enough for me to switch.

But one argument shouldn’t be compelling for anyone: DuckDuckGo is written in Perl. Because it doesn’t matter at all.

You remember Frederick Brooks’ “No Silver Bullet” essay? In a nutshell, it states that there is no single technology that’s best for every task. A solution should use whatever is most suited for the task, not the next best hype. If you do embedded programming, you’d probably go with C, doing much mathematical stuff brings you to Haskell and in complex networking you might be better off with Erlang. Normally you wouldn’t be doing web development in Bash, but you’d take whatever is best suited for the task instead. If you have several options then you get to choose whatever you’d be more efficient with. Like, for example, Perl.

But nobody should care about which language you chose just as nobody should care what DuckDuckGo is written in. It’s a nice bonus if your favourite search engine is made with technologies you like, but it shouldn’t matter — what matters is that it gets stuff done.

If someone were to run a survey asking geeks what programming language Google’s search engine is written in, he’d probably get a hundred different answers, depending on what each respondent has heard on different occasions from different sources. And probably every one of them will be right to a certain degree: Google uses many languages and many technologies, whatever it takes to get stuff done. A solution is always a mixture, even in IT.

The mere fact that DuckDuckGo is written in Perl will be good publicity and a good showcase for Perl, but only if or when DuckDuckGo takes off big time. Geeks won’t be the critical mass for Duck’s takeoff, instead “normal” people will — and Duck will need a lot of them. They’ll matter a lot: if they like Duck, Google will be getting some competition, if not, well, Duck will be offline pretty soon then. Either way, its success or failure will be based on features (or lack thereof), speed (or lack thereof) or maybe free gas coupons for every millionth visitor (or lack thereof).

But not on Perl being the programming language.

  1. Noticed how I called this thing “Duck”? I’m certainly not calling it DDG or DuckDuckGo, since it’s just too difficult to pronounce. And “Duck” is nowhere near “Google” — “I’ve ducked you on the internet…” just sounds weird

On generating buzz

July 12th, 2010 6 comments

As it seems, my last article has left some things to be explained in more detail. In particular, Gabor has left the following comment:

I wish people would stop blaming Perl 6 or its name on the lack of buzz around Perl. I wish people were spending that energy and time in creating more buzz.

I would like to reply to that quote with this post, since I feel that the importance of the problem has not been fully understood.

The key misunderstanding here is: You can only generate buzz for something new. Let’s just make a small practical example, shall we? Let’s go to programming reddit and pick something up from the front page. How about Little known C# feature, Conditional attributes.? What do you feel when you read that article?

I can tell you what I feel: I don’t care at all. It’s nice to know that C# has that kind of attributes, but I still don’t care because of one single fact: I’m not going to program in C#. That language is out of my scope, because an immediate connection from C# to .NET to Microsoft to Windows is established and I’m a Linux/FOSS guy. Same goes for Java: however nice the newest Hibernate or JSF or jBPM or whatever might be, I’m not interested, since I try to avoid Java, because the word that I associate with Java is “restriction”1.

Same goes for Perl[56]. You can generate as much buzz about Perl as you want, but “Perl”, as I explained previously, is a brand with definite associations, namely “ugly” and “incomprehensible”. It doesn’t even matter which version, Perl is Perl, right? So every article about some cool new technology in Perl, be it Catalyst or DBIx::Class or Net::Twitter, will be dismissed with a comment like “Yeah, it’s nice, but who’d want to code Perl nowadays?”

It’s the same problem Steve Yegge pointed out in his talk which I linked to from my last article: “Java is my father’s language, I won’t use that”. Same goes for Perl — it’s an old language and usually an old language is considered crufty and inflexible, even though it’s untrue for at least Perl and Common Lisp.

So basically, buzz for old stuff doesn’t matter. New stuff can be efficiently buzzed: look at all the attention Cassandra and all the other NoSQL database engines are getting. Buzzing only matters when the stuff is new, at least from one’s point of view. And here lies a problem: Perl is extremely well-known. Just like COBOL, Perl is known by name, even though most people can’t usually tell anything useful about the language itself. Only that they wouldn’t want to program in either of these languages, since they’ve heard that they are awful. So barely anyone would consider Perl a new development, and therefore a common perception exists which makes all buzzing about Perl moot.

But how can we revive Perl and show the masses that it’s still alive and kicking? We’ve got several options. Educating is probably the most time-consuming and probably also the most useless method: for one reeducated person you get a couple of hundred who’ve learned Perl is ugly. Rebranding might be a good way. Writing articles for popular magazines would probably help a lot, especially if someone were to write an article introducing a totally new and powerful programming language, revealing only at the end that it has actually been Perl all along. Perl desperately needs new books (I’m awaiting “Modern Perl Book” eagerly) and also a lot of showcases.

But there is really no surefire recipe: Perl currently sits in a self-inflicted branding trap and it’ll be a hard ride to get out2.

And Perl 6 is not helping it at all.

  1. and also “Eclipse”, but that’s another story
  2. Maybe TPF should hire a marketing/branding consultant?

Perl’s visibility and branding

July 11th, 2010 6 comments

In recent days, several bloggers (including chromatic) have talked about Perl’s visibility in the world. Here are some thoughts of mine on Perl’s marketing and connected topics.

Self-referential promotion

I feel that most Perl marketing strategies currently in use are misguided: they appeal to people already using Perl. First and foremost, the Ironman blogging competition. Its original aim was “to promote the Perl language, and encourage more people within the Perl community to promote the language to the world outside the Perl echo-chamber”. I honestly still don’t understand how blogging about Perl in a self-contained blogging system has any influence on the people outside. Yes, we might actually get higher Google rankings, but does it help Perl itself in any way? Only people who are actively searching for Perl might find something interesting, but they won’t search until the thought of actually using Perl arises. Instead, we should be writing articles and sending them to (online) magazines about general IT or general programming, not dedicated Perl magazines. Where’s that “Ironman Article of the Month” prize I keep hearing about? We could use those articles for promoting Perl outside of its own ecosystem.

The other proposal is placing “Made with Perl” banners all over your website. Seriously, do I even need to mention why this is bad? Normal people don’t give a damn about what technology your site is made with. Do you see any “.NET FTW!” banners on stackoverflow.com? Any “Proudly made with TextMate and Rails” buttons on GitHub? They are not there, because they don’t matter to the visitors.

If we want more people to choose Perl for their projects, we need to showcase Perl. Projects using Perl, be it DuckDuckGo or BBC or anything else1, should be referenced from one single place, and this place should be the front page of perl.org. Ruby on Rails’ front page prominently presents sites using Rails, Django has the same thing on the front page, PostgreSQL has a “Featured User” front page column. Catalyst, on the other hand, just publishes some testimonials by people without a referenced project or company, and perl.org has a huge “20,000 CPAN modules” heading, which makes me feel like “20k of what?”. Underneath is the line “That’s why we love Perl”. Yeah, right, we do, but does the visitor? Such introductory websites should become more embracing and welcoming to newbies and not look like some kind of bizarre cult to them. This part of self-promotion is extremely important: first impressions count.

Talking to others

When you talk to people about stuff you make in Perl, you mostly get pitying looks and the question “Couldn’t you take Python or Ruby?” Many other languages (such as Python or Ruby) have become popular with the tagline “Obsoleting Perl”, because they all thought a replacement was desperately needed. It doesn’t even matter that current Perl is more flexible or better or faster or cleaner than Python or Ruby or, for that matter, Perl itself from 2000. Perl has a reputation; it has become a brand name, a brand name that has negative connotations.

Have you ever watched Steve Yegge’s excellent branding keynote? If not, you should do so immediately. Even though he mentions Perl several times, another quote should stay in your mind:

    Who remembers GTE? [...] GTE was for some time the most
    reviled brand. Why? [...] their infrastructure sucked and
    their service sucked and everybody hated them. The word for
    GTE in consumer's mind was "suck". [...] They've spent a
    billion dollars upgrading their infrastructure and service
    until they were actually the best in the United States. And
    ... the people still thought they sucked, of course. [...] The
    problem is: [a brand] is a const identifier. [...] GTE was
    like "we are screwed, our brand is in the toilet". So they
    went to this marketing agency and they asked [...] "If your
    service is bad today and perfect tomorrow, how long does it
    take to change people's mind about your brand?" [...] And the
    answer came back after much study that it takes *one
    generation*. And the GTE was like "Screw that!" and changed
    their name to Verizon.

This is exactly the situation Perl is currently in. Most people associate Perl with “scripting” or “system administration” if they mean well, or with “ugly” or “write-only” otherwise. And since Perl has been on the market for a long time, it’ll probably take a bit more than a generation of new programmers to get away from that image. The bad news is: very few new programmers will ever get a chance to see Perl’s bright side, since Java and PHP are just excellent at recruiting newbies.

So we have to educate people about Perl as early as possible, show them the good sides and compare it to Java and PHP and Python and Lua etc. But in my opinion, there is one big problem with that: Perl 6.

Perl 6

Perl 6 actually warrants a separate rant about its whole structure and design, but one thing is certain: it will have the same branding problems as Perl 5. They should have called it “Rakudo” or “Parrot” or anything else, just not “Perl” and it might have been (or maybe it still will be) a successful language.

When you tell people there is a Perl 6 coming, which will be somewhat incompatible with current Perl 5 and won’t have any of the libraries or frameworks, you lose them immediately, since it’s not feasible to build something new on an obsolete technology. The Perl community can’t afford not getting more people involved and it can’t afford waiting for Perl 6. It should rather use all the power it can to continue improving Modern Perl. A question still remains whether Perl 6 will be needed at all if Modern Perl implements all of its tasty features.

In a way, the Perl 6 drama reminds me of another language which is “worth learning [tm]”: JavaScript2. The currently implemented version of the standard is ECMAScript 3. The most recent standard version is ECMAScript 5, released in December 2009. ECMAScript 4 was never released; there was only a draft of a completely incompatible overhaul of the standard, which promised to clean up the whole mess of browser scripting. Needless to say, it failed miserably, in part because it required a complete reimplementation, and it was replaced with the backwards-compatible ECMAScript 5, which corrected the most obvious stupidities of the original standard and made people happy enough. To me, ECMAScript 4 has a lot of similarities to the aims of Perl 6, while ECMAScript 5 is more like Modern Perl.

Marketing Modern Perl

In my opinion, Modern Perl is the only one that should be marketed. Remember that GTE/Verizon example? While I think a new name for Perl 6 is needed, it’s possible to create a new brand for modern Perl 5 without completely sacrificing the name. How about Modern Perl for an official name for Perl 5?

5.12 could be the last official Classic Perl version and 5.14 would be Modern Perl 1.0. New filename extensions .mpl and .mpm could be introduced, which would enable warnings and strictness and Perl::Critic warnings and other new features by default, while .pl and .pm would still trigger Classic Perl mode. Disturbing features could be deprecated in the Modern Perl while retained in Classic. New features could be tested out as CPAN modules and officially introduced to the core in Modern mode.
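
For comparison, much of what such a Modern mode would switch on by default can already be requested by hand at the top of a file. A minimal sketch using only core pragmas; the `greeting` sub is made up for illustration, and Perl::Critic checks are a separate tool that is not shown here:

```perl
use 5.012;       # requires perl 5.12+, enables strict and the say/state features
use warnings;    # warnings still have to be enabled separately

# Hypothetical example sub, only here to exercise the enabled features.
sub greeting {
    my ($name) = @_;
    state $calls = 0;    # 'state' comes with the feature bundle loaded above
    $calls++;
    return "Hello, $name (call $calls)";
}

say greeting('Modern Perl');
```

The proposal above would amount to making these two lines implicit for .mpl and .mpm files, so that forgetting them is no longer possible.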

I know it’s difficult on the compiler side. However, it’s a lot better than creating a completely new language over a decade and then creating a whole new set of libraries over another decade. We don’t actually need a perfect language which has all the bells and whistles and fixes all of the mistakes of the previous iteration; I think the 80%/20% rule should also apply to Perl. Continuous improvement has done Perl 5 a lot of good in the last years and I think this process should be continued. With a bit of luck, Perl will then be able to live on and innovate for the upcoming decades.

  1. Oh, while we’re at it: did you know that Perl, Moose and Catalyst are used as a backend for a website with more traffic and CPU load than Google and YouTube? It’s called YouPorn, google it. It’s huge, it’s using the most modern Perl you can get. Also, booking.com and xing.com are huge sites running on Perl.
  2. About 99% of programmers still believe JavaScript is a toy language, much like VBA, something you learn because you can’t program a web application otherwise. Only few people really know what JavaScript is capable of and even fewer people know how to program JavaScript correctly and cleanly. Those people have to fight ignorance every day and prefer to talk about ECMAScript 5, just like Perl hackers prefer to talk about “Modern Perl”.

Rant #213, in which a 1200+ pages book is judged and dismissed by its table of contents

July 2nd, 2010 14 comments

DISCLAIMER This post is highly subjective, biased and mostly unfounded. Thank you for understanding.

A new book on Perl has been published. No, it’s not “Modern Perl” by chromatic (at least not yet) and also not the “Effective Perl Programming 2nd edition”. What I’m talking about is “Der Perl-Programmierer” (that’s German for “The Perl Programmer”) by Jürgen Plate. Now, I must admit I know nothing about Jürgen, and he is probably a good professor, teacher and maybe even a good Perl hacker and I don’t want to say anything bad about him. I didn’t even read his book — I’ve only seen the table of contents. Despite that, I wouldn’t recommend it, I’ll probably even recommend strongly against it and I would even recommend that everyone in the Perl community makes sure this book doesn’t get a wide reading.

My main problem with this book is that it’s totally unhelpful for Perl at large. It’s a 2010 work, but it seems very unmodern, at least not “modern” as in “modern Perl”. “Perl Best Practices” has done a tremendous job separating Perl’s functionality into “use it” and “use it only when you know what you are doing”, and my personal plan for teaching Perl would be teaching newbies the “good” stuff only. This book seems to fail miserably at this.

Here is what I don’t like in detail:

  • In the chapter about control structures, the goto statement gets a separate subchapter. Call me a purist, but I doubt that the only useful form of goto, the one with a subroutine reference, is the one described, especially since subroutine references are explained some 70 pages later. The other goto variants should definitely be off the radar for newbies.

  • The eval function is getting one single page of attention or maybe even less. I wonder whether the $@ localizing problem is explained somewhere in the book?

  • A whole five pages are dedicated to the “Perl 6 outlook”. How much useful information about Perl 6 can you actually put in that space without alienating people? I would have understood something like “There is Perl 6 around, read all about it in our other book” in the introduction, but a separate chapter of five pages?

  • Chapter 2 is “Debugging”, including “Perl Debugger”. Way before the chapters on packages, modules and even complex data structures. Needless to say, that whole chapter is ten pages long, probably including the infamous “This page has been intentionally left blank”.

  • I’m not entirely convinced that a chapter on documentation should go before the one on packages, yet it does.

  • The chapter on regular expressions is available for free reading. It’s horrible, from what I can see.

    First, we are told that Kleene created them and that they were “adapted for the information technology”. Fine, you could think, never mind the theory, not everyone needs to know about the Chomsky hierarchy. However, a paragraph later you read “Regular expressions in Perl are based on an NFA (non-deterministic finite automata), which does the following….” Besides that, the author tells us that Perl only has matching (// or m//) and replacing (s//) regexps, only to introduce tr// later on.

    Great examples for replacement are $string =~ s/ä/&auml;/g; for HTML, which is still common in Germany, but utter bullshit even there. Looking for something in a file? “For this special case (and for searching in arrays) there is a function called grep which we’ll discuss later, since we are solving the problem with simple pattern matching”. Yes, it’s solved with a while loop, and we can consider the readers lucky, since the author only wants the matching lines printed. Otherwise we’d probably get the infamous for-push or while-push pattern. Oh and by the way, tr// gets almost three pages of treatment, almost as much as m// and s// together.

    Pure evil follows. Words can’t describe my feelings about the ignorance of Unicode in 2010. Probably the same words that go for Whitesmiths indentation of the code in that whole chapter.

    Whether `/.+/` accepts umlauts or not depends on your setup. Therefore don’t forget to put the following lines at the beginning of your program:

        use POSIX;
        use locale;
        # In case it's not yet set:
        setlocale(LC_CTYPE, "de_DE.ISO-8859-1");

  • Following the “Regexps” chapter are the ones called “Program configuration” (either with Perl code or with key-value pairs, no idea what they tell you there), “System information” (stuff like getpwuid) and “Processes and signals” (fork and waitpid). I consider all of this advanced material, which shouldn’t come this early in the book. The author probably has a strong system administration background, so I guess that would explain it.

  • A whole ten pages on “Internationalization”, which seems to only cover locales. I’m happy I don’t have the book here, since I’d probably go killing people. Again, i18n is a must-know topic in 2010, but covering it in ten pages won’t do. Even the Unicode article by Joel Spolsky is probably a bit longer than that. No mention of Locale::Maketext (thank God!) but also none of any other translation frameworks. Not important I guess…

  • The “Modules” chapter begins with a “Using modules” subchapter, which makes me wonder — how did the author manage to recommend many CPAN modules without explaining how to use them? And he did recommend a lot in the earlier chapters, starting with List::Util, Regexp::Helper and Data::Dumper.

  • A whole 34 pages on object oriented Perl in a 1200+ pages book. Classic object oriented Perl. No Moose, no Class::Std or any other bless-free framework, as far as I can see. This chapter is the one I’d want to look into: I can’t believe you can write a book on Perl in 2010 and not mention Moose.

  • All of the following chapters deal with “Practical Perl”, which is more or less 800 pages of “look what I think is cool”. If he had taught people to use CPAN properly, there would be no need for that. Topic selection is more or less random: while I find “Text manipulation” something that’s probably worth it, taking almost 100 pages for “GUIs with Tk” is a waste of time and paper. Same goes for calling a chapter “MySQL Databases”, introducing MySQL in general for about 60 pages, probably mixing up MySQL and generic SQL while at it, and then trying to fire up DBI with MySQL over 30 pages. “Socket programming”, “E-Mail with Perl” (including writing a small MTA), “CGI and HTML” (yes, CGI as a Perl module too, no Catalyst, Dancer, CGI::Application or anything else of the sort, but instead including a sub-chapter on CAPTCHAs), “Math with Perl” and even “Hardware Programming with Perl” (first sub-chapter of which is called “you must be root”) are all interesting topics, but are they really needed in a Perl book intended for newbies?
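
For readers unfamiliar with the `$@` problem raised in the eval bullet above: on the Perl versions current at the time (before 5.14), `$@` could be reset while the stack unwound, for example by a destructor running its own eval, so an error could silently vanish. A commonly recommended defensive idiom checks eval’s return value rather than `$@` alone. A minimal sketch, with `risky` and `try_risky` as made-up names:

```perl
use strict;
use warnings;

# Hypothetical operation that fails.
sub risky { die "boom\n" }

sub try_risky {
    # The block returns 1 only if risky() did not die, so detecting the
    # failure no longer depends on $@ still holding the error afterwards.
    my $ok = eval { risky(); 1 };
    return 'ok' if $ok;

    my $err = $@ || 'unknown error';    # $@ may have been clobbered
    chomp $err;
    return "failed: $err";
}

print try_risky(), "\n";
```

The other half of the fix is on the clobbering side: code that calls eval inside a DESTROY method can write `local $@;` first so it does not overwrite the caller’s error.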

As I went through the TOC, my WTF per second ratio went almost through the roof. This book could have been really good in 2003 or even 2005. In 2010, however, it is harmful for Perl the language, for Perl the community and ultimately for the newbies themselves, who will be easily alienated by this book. It’s not as if there were no alternatives to Perl around. It’s not 1998, right?

Maybe we should forget about writing a Perl 6 book for a while and start by writing a Perl 5 one? I for one still hope for some good mixture of PBP, HOP and chromatic’s new one. One I can then whole-heartedly recommend, since both the Camel and Llama books seem a bit dusty right now.

The time has come…

July 10th, 2009 No comments

… and although it’s a bit rough around the edges, today’s Chromium build for Ubuntu (3.0.194.0~svn20090710r20374-0ubuntu1~ucd1) has Flash support! Gotta love those ads all around the pages :)

If you do a corporate CPAN, please do it properly!

May 25th, 2009 2 comments

Coming from the “bad implementations of good ideas” department: Corporate CPAN.

It’s a good idea, but a completely wrong solution. If someone is going to implement this, the proper way is not creating a PAUSE and CPAN mirror or anything like it. Corporate requirements are different. What you really need is a revival of http://debian.pkgs.cpan.org/ and also complete repositories for RHEL5, Debian, Ubuntu LTS, SLES9/10 etc. Those can be mirrored at will by usual and well-tested tools.

Thank you.

Update: Removed last sentence, it’s been a “thinko” on my side :(

PBP 2nd ed.? Just open it up!

May 22nd, 2009 4 comments

There are some long-overdue calls out there for a second edition of “Perl Best Practices”. And more often than not, people having good ideas don’t realize they are proposing a dreadful solution.

I do consider PBP a good collection of recommendations. But sometimes, I loathe the impact it has on the community. It’s the Perl bible, Part II (just after the Camel book) and many people just go on believing in it. What we get as a result is a free community’s dependency on a non-free book.

At my place of work, we run Perl::Critic on SVN commits with the PBP rule set. Every time it finds something, it tells me to “Look at page XX of PBP for details”. So much for online help… Yes, I do have a copy of PBP in the office, even on my table. Sadly, it’s a German translation which has slightly different page numbering.

Why does a code analyzer cite some particular book edition? Imagine a second edition of PBP coming out. What do we get then, a command line parameter for the book version? For a translation thereof? How many modules will we have to update? Usually a book gets updated when code changes, not the other way around.

An even more heretical question: what happens if I don’t actually own a copy of PBP? Am I doomed to stay ignorant of best practices because I’m just starting to learn and can’t or don’t want to shell out money for a book?

Other language communities are different. Both Ruby and Python give you extensive online documentation and also some dead-tree docs if you need them. But you don’t have to buy a book just to learn some best practices, those are readily available in blogs, wikis and what not. Perl’s community seems to trust in holy cows (camels or dogs for that matter) and just keeps insisting on buying books. “Modern” is something very different, though.

I know I can read PBP on Google Books, with several dozen invisible pages. But PBP should have been online a long time ago. It should have been a community work from the beginning, since best practices are among the first things a newbie needs.

There is only one way forward for PBP: O’Reilly and Damian should open it up, just like “Higher Order Perl” has done. Make it downloadable at first, make it a community-driven project later. O’Reilly could even release a dead-tree edition every now and then, but the first step would be to free Damian from all the “please update PBP” e-mails: it’s quite probable that he doesn’t have the time to do so and even more probable that he has no personal interest in bringing a second edition out. In that case, maybe he should raise his voice and work with O’Reilly on making that vastly important book a community project.

The CPAN’s new clothes

May 13th, 2009 13 comments

I must admit, I’m a bit underwhelmed by Enlightened Perl’s Iron Man competition. They’ve essentially replaced Planet Perl because every blogger from the Planet now also gets syndicated to the Iron Man (could you please work together guys and kill one of the planets?) However, the average quality of the blog posts hasn’t changed at all, and neither have the subjects. It’s still the same: some “aren’t you using Perl 6 already? 10 reasons why you should!”, some “all hail Moose!”, some “new Padre released, it’s just as powerful as Emacs, but only for Perl stuff”, and also some “Did you know CPAN rocked?” That last bit of sensationalism is getting on my nerves.

Yes, I know, CPAN is great. I even agree. CPAN is great because of the sheer amount of data collected. But it’s a complete disaster otherwise. I might be a bloody newbie in the Perl world, but every time I’m confronted with CPAN, I’m lost and confused. There is a major flaw in CPAN causing that feeling: every module on CPAN is essentially an open-source project, but nothing at CPAN works under this assumption. It’s full of closed-down silos.

Let’s start with a simple example: toying with CPANPLUS::Dist::RPM (or maybe it’s this link, who knows which is the canonical one) at work, I’ve noticed it sometimes hangs, consuming 100% of CPU while doing nothing useful. Let’s now assume I’d like to investigate this problem, but I don’t know if this is a bug or a mistake on my side.

So I go to the CPAN page of the package. Oh, there is a discussion forum, let’s click on that! Too bad, it’s broken. Bug reports? Oh yeah, there are a whole three of them, none of which is my problem as far as I can see. And I can actually barely see, since the visual component of that bug tracker makes Bugzilla of 1998 look good. But I’m still not sure that’s a bug, so I wouldn’t file one. What’s next? Maybe there is something new and relevant in the development code in the revision control system? Oh wait, there isn’t any. CVS, SVN, Git, Mercurial, anything? Nope, no such thing on CPAN. Only release tarballs and some weird release differ tool. No revision control for open source hosting in 2009, am I seeing this right?! The only way to ask something is to ask the author by e-mail? What about collaboration, patches, an interactive community process for single modules?

Dear Github guys, if you happen to read this, please host the CPAN for us! Revision control, bug tracking, code review, documentation parser — if you could add some discussion forums, you’d be a perfect CPAN hoster!

So CPAN is so far: EPIC FAIL in discussion forums, somewhat FAIL in bug reports, EPIC FAIL in encouraging open development. That is basic open source functionality nowadays, you know. And those points are not nearly scratching the surface of the criticism.

Every other page on CPAN is different in design and interaction, there is no common and concise web interface, and there are many different docs/search/rating mirrors which ultimately produce a lot of Google spam. There is an awesome lot of cruft, a lot of broken modules which pop up prominently as the first search result, and no clear indication whether a module is abandoned or actively developed. Even the most potentially useful features like dependency resolution are crippled: dependencies work only in one direction, so whenever I’d like to know how people actually use some module, I’m lost again. This is the CPAN of today, confusing and rusty. CPAN is naked and it seems nobody wants to point that out. I do not want to think that nobody actually notices.

The situation with CPAN is symptomatic of the whole Perl community. Whether it’s Perl.com, Perl.org, the Perl Mongers site, Perl Monks, use Perl or CPAN, it’s always the same: unreadable and misaligned content, incomprehensible navigation and straining colors, self-representation on the web coming straight from 1999 [1]. All the good code in the world and the power of the language won’t help anyone as long as people are alienated by ugly tools, visually and technically. Why can’t CPAN have the visual docs design from http://perldoc.perl.org, which at least features a syntax highlighter? Some CPAN mirror I land on every now and then from Google is even uglier than the one at http://search.cpan.org. Do we care at all about how those sites look? Do we care about fellow Perlers, about how hard they have to look for information? Why isn’t there some central site for Perl information? Why is every Perl project so independent that things like Perl Iron Man happen without cooperation with Planet Perl? [2]

The Perl community has so many possibilities, but most of them stay unused. Most people are probably content with what they have and wouldn’t want to change anything. That’s fine, Perl’s way certainly supports that, but then we can forget about a Perl revival. It’d be a shame, but we’d have only ourselves to blame, not some superstitions about Perl being a “write-only language” or “ASCII soup”. The first impression counts and many newbies might not make it to the code at all, because they’ll struggle with finding tutorials first. They won’t find out why Perl is great and will leave for other, probably inferior, languages, because they’ll be reading some ugly outdated quickstart documentation from 1997. They won’t find the shiny things, but they should be able to, as their first Google search result.

  1. Let’s not forget the sheer number of sites a Perler might need to visit to get all the information
  2. Actually there is an easy explanation: at CPAN, if you have a proposal or a patch, you can’t actually do anything more useful than fork and upload your own package to CPAN. The same goes for the Planets: open-source-style cooperation seems mostly unknown to the Perl 5 community. This changes with Perl 6, but it needs to change for Perl 5 too.

On the state of i18n in Perl

April 26th, 2009 5 comments

The following text represents an effort to describe the situation I’ve encountered when I came to the Perl world last December. I’ve done some translating for the Debian project and I was a bit shocked about the state of Perl’s i18n. I have to admit, I’m still an inexperienced hacker, but I wanted to write this article to raise some awareness for the issues described if I’m right and learn something new if I’m wrong. Anyway, I tried to keep this article constructive and it’s still just my opinion, so please comment appropriately.

Disclaimer: I’m essentially talking about l10n, but most people know it as i18n, so I’m keeping “i18n” in text.

The i18n problem

When it comes to making your application translatable in Perl, there are actually two schools of doing this: via Maketext and via GNU gettext. GNU gettext is the best-known software translation tool, used in most open-source projects, while Maketext is a child of the Perl world. And the bad thing is: Maketext is currently more popular, but if you are using Maketext for making your application translatable, you are doing it wrong!

Let’s look at how Maketext works, according to its documentation and contrast that with the gettext way.

The Maketext manual defines the process as follows (quoting freely):

  • Decide what system you’ll use for lexicon keys (i.e. base language)
  • Create a class for your localization project
  • Create a class for the language your internal keys are in
  • Go and write your program
  • Once the program is otherwise done, and once its localization for the first language works right (via the data and methods in Projname::L10N::en_us), you can get together the data for translation.
  • Submit all messages/phrases/etc. to translators
    • Translators may request clarification of the situation in which a particular phrase is found
    • Each translator should make clear what dependencies the number causes in the sentence
    • Remind the translators to consider the case where N is 0
    • Remember to ask your translators about numeral formatting in their language
    • The basic quant method that Locale::Maketext provides should be good for many languages. […] For the particularly problematic Slavic languages, what you may need is a method which you provide with the number, the citation form of the noun to quantify, and the case and gender that the sentence’s syntax projects onto that noun slot.
  • Once you’ve localized your program/site/etc. for all desired languages, be sure to show the result (whether live, or via screenshots) to the translators.
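As a rough illustration of the class setup those steps describe, here is a minimal sketch using the core Locale::Maketext module. The MyApp::L10N names are hypothetical placeholders (the manual uses Projname::L10N), and the _AUTO lexicon entry is an assumption made only to keep the example short; a real project would list its phrases explicitly.

```perl
use strict;
use warnings;
use Locale::Maketext;

# Hypothetical project class: the "class for your localization project".
package MyApp::L10N;
our @ISA = ('Locale::Maketext');

# Class for the language the internal keys are written in.
package MyApp::L10N::en;
our @ISA = ('MyApp::L10N');
our %Lexicon = (
    _AUTO => 1,    # assumption: use the key itself when no entry exists
);

package main;

# Ask for an English language handle; get_handle returns undef on failure.
my $lh = MyApp::L10N->get_handle('en')
    or die "No language handle available";

# The bracket notation from the manual: quant receives the number (_1)
# and the noun, and has to produce the correctly inflected phrase.
for my $hits (0, 1, 3) {
    print $lh->maketext('Your search matched [quant,_1,document]!', $hits), "\n";
}
```

With only one noun form given, the stock quant simply appends an “s” for every number except 1. That convention is English-only, so each further language needs its own class with its own quant logic.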

There is a lot of sense in this and this has certainly been valid back in 1999, but a lot of work in this process is not specified. For example, the translation process itself is questionable:

  • How do you “Submit all messages/phrases/etc. to translators”?
  • How do you integrate translations back from translators?
  • How do you resubmit translation strings if they change?
  • How do you communicate “situation in which a particular phrase is found” (i.e. context)?
  • What happens if one phrase has to be translated differently depending on context? How does one implement that in a module properly?
  • How does the translator “make clear what dependencies the number causes”? At what extents does that happen? Will the developer even understand him at all?
  • Does the programmer really have to understand all of implications of each language implemented? Should every programmer on the team understand them?
  • Who actually implements that “quant” method? How? What about languages with exceptions?

One basic, but fatal, mistake Maketext makes is offloading a lot of linguistic work onto the programmer.

  • One particularly important point is plural forms support (‘1 apple’, ‘2 apples’), which matters for many languages outside of the USA and Western Europe. Maketext requires you to write a quant function that gets a string and a number as parameters and does some voodoo to produce the right string. The voodoo is undefined. In gettext it is defined: a formula for producing plural forms selects one of the provided plural phrases.
  • No translator in their right mind will ever write a Perl module for a language (they aren’t programmers, remember?), so the programmer will have to do it and will also have to understand the implications.
  • The quant notation ("Your search matched [quant,_1,document]!") foolishly assumes word order is the same in all languages. Implementing a quant method properly would require passing the whole sentence into the function and doing a complete linguistic transformation, which is highly non-trivial and better done by a human.
  • Most of those linguistic “conventions” like number formatting or plural forms do not change over time and can be compiled in one place. One such place is Unicode’s CLDR project, which also includes plural form building and number/date formatting among other country- and language-dependent data.
  • It can’t even be assumed that the translators actually know all of these conventions! They might assume they know them, but a translator is not necessarily doing translations for a living; he might be a volunteer, like in most open source projects. Imagine what happens when an amateur translator explains the inner workings of his native language to a programmer.
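To make the gettext side of that first point concrete, here is a small sketch of a plural formula at work. The expression is the Russian rule as listed in the GNU gettext manual (nplurals=3); the sub names and the three apple phrases are my own illustrative assumptions, not part of any library.

```perl
use strict;
use warnings;
use utf8;

# Russian plural rule in the style of a gettext Plural-Forms header.
# Returns the index (0, 1 or 2) of the phrase to use for the number $n.
sub plural_index_ru {
    my ($n) = @_;
    return ($n % 10 == 1 && $n % 100 != 11) ? 0
         : ($n % 10 >= 2 && $n % 10 <= 4
            && ($n % 100 < 10 || $n % 100 >= 20)) ? 1
         : 2;
}

# Hypothetical translator-supplied phrases, in the order the rule expects.
my @apple_forms = ('%d яблоко', '%d яблока', '%d яблок');

# Pick the phrase by index and fill in the number.
sub apples_ru {
    my ($n) = @_;
    return sprintf($apple_forms[ plural_index_ru($n) ], $n);
}

binmode STDOUT, ':encoding(UTF-8)';
print apples_ru($_), "\n" for 1, 2, 5, 11, 21;
```

The programmer only ever calls the lookup with a number. The formula lives in the catalog header and the phrases come from the translator, so nobody has to write a per-language Perl module.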

Compared to Maketext, gettext has a saner, more practical approach: it provides a standardized translation string format, handles updates of message catalogs cleanly, provides all necessary tools for message extraction, doesn’t require any additional modules, works mostly language-agnostically, and provides contexts and translators’ comments. Even the plural form calculation formulae are explicitly noted in the manual. It also emphasizes asynchronous translation: translation strings can be extracted and imported at any time in the lifecycle of a project. A developer essentially has to do the following:

  • Implement using gettext in his project (depends on the language used)
  • Mark extractable strings
  • Run extraction and merging scripts (mostly included by gettext)
  • Submit translation files to translators
  • Copy received translations back into the project
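The extraction-and-merge step is what makes this workflow asynchronous. The following is a deliberately simplified sketch of the idea behind a catalog update, not a reimplementation of gettext’s msgmerge: the real tool also does fuzzy matching and keeps obsolete entries as comments. The sub name and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Simplified catalog update. Arguments:
#   $old       - hashref of msgid => msgstr from the previous translation
#   $extracted - arrayref of msgids currently marked in the source code
sub merge_catalog {
    my ($old, $extracted) = @_;
    my %merged;
    for my $msgid (@$extracted) {
        # Keep an existing translation; new strings start out untranslated.
        $merged{$msgid} = exists $old->{$msgid} ? $old->{$msgid} : '';
    }
    return \%merged;
}

# 'Quit' has disappeared from the code, 'Save as' is new.
my $old = { 'Save' => 'Speichern', 'Quit' => 'Beenden' };
my $new = merge_catalog($old, [ 'Save', 'Save as' ]);

for my $msgid (sort keys %$new) {
    my $state = length($new->{$msgid}) ? $new->{$msgid} : '(untranslated)';
    print "$msgid => $state\n";
}
```

Because finished translations survive each update, the translator only has to fill in what is new, no matter how often the developer re-runs the extraction.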

gettext is of course not perfect. It lacks several vastly important features, like proper gender support (e.g. “He was born” and “She was born” are different in Russian). But it generally follows the “It mostly works” principle, making the features needed 95% of the time available. Workflow tools make using gettext a snap. Compared to Maketext, it is also easier for the programmer to support and easier for the translator to produce translations with. The dreaded quant function actually makes using Maketext properly for translations impossible.

Apart from those technical shortcomings, there is a bigger threat.

Community separation

Remember TPJ13? TPJ13 is an excellent summary of i18n problems, which every developer, even a non-Perl one, should read. Its solution part, however, is hopelessly out of date: don’t forget, TPJ13 is turning ten years old this year. Back in 1999 gettext had no plural forms support and also lacked many other features, so the authors’ point was valid at the time. However, gettext implemented its support for plurals rather quickly, and at that point Maketext should have been retired immediately. Sadly, this has not happened.

That misunderstanding haunts us to this day. Every novice Perl hacker is introduced to TPJ13 and tends to believe Maketext is the way to go. Failing to see its shortcomings, however, results in well-meant but still failed creations like Locale::Maketext::Lexicon, which tries hard to bring the world of gettext to Maketext-infected minds. What we get is crazy stuff like (verbatim from the POD)

#: Hello.pm:11
msgid "You have %quant(%1,piece) of mail."
msgstr "Sie haben %quant(%1,Poststueck,Poststuecken)."

instead of a proper (German spelling corrected a bit):

#: Hello.pm:11
msgid "You have 1 piece of mail."
msgid_plural "You have %d pieces of mail"
msgstr[0] "Sie haben 1 Poststueck"
msgstr[1] "Sie haben %d Poststuecke"

The former has virtually no tool support (not even gettext’s extraction routine xgettext), but extraction is supported by home-grown xgettext.pl (notice the .pl suffix). And there we have some fatal stuff going on:

  • Locale::Maketext::Lexicon is considered the solution for using Maketext with .po files.
  • Neither Locale::Maketext::Lexicon nor xgettext.pl has any notion of proper gettext plurals
  • .po files created by xgettext.pl are not fully supported by translation tools like PoEdit, KBabel, Launchpad Rosetta, 99translations.com etc.
  • Catalyst::Plugin::I18N, the only i18n plugin for the extremely popular Catalyst web framework, is based on Locale::Maketext::Lexicon
  • xgettext.pl has support for Template-Toolkit templates, YAML, FormFu and Mason. Original gettext’s xgettext does not.

So there we have it: Perl hackers mostly use tools which are unsuitable and incompatible with the rest of the world, without knowing it. The right tools actually can’t help them become “sane”, since xgettext can’t extract all those formats which xgettext.pl can, and I don’t think that’ll change anytime soon.

Alternatives

Luckily, some hackers have produced the libintl-perl library, which basically re-implements GNU gettext in Perl. There is a pure Perl implementation of message catalogs called Locale::gettext_pp, an XS version called Locale::gettext_xs (Warning: this one has some problems with mod_perl2!), a Perl wrapper around that (Locale::Messages) and, building upon that, an excellent Perl-y implementation of the framework, Locale::TextDomain. These tools are worth your time.
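For a feel of what that looks like in code, here is a minimal sketch using Locale::TextDomain’s __nx function. It assumes libintl-perl is installed from CPAN. The text domain 'myapp', the './locale' directory and the mail_notice sub are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# Requires libintl-perl from CPAN. 'myapp' is a hypothetical text domain and
# './locale' a hypothetical directory that would hold the compiled catalogs.
use Locale::TextDomain ('myapp', './locale');

sub mail_notice {
    my ($count) = @_;
    # __nx: plural-aware lookup with named placeholders. The translator
    # supplies as many msgstr[n] forms as the target language needs.
    return __nx(
        'You have one piece of mail.',
        'You have {count} pieces of mail.',
        $count,
        count => $count,
    );
}

print mail_notice($_), "\n" for 1, 5;
```

When no catalog for the current locale is found, the call falls back to the English strings given in the code, so the program stays usable before any translation exists.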

Even though we have Locale::TextDomain, what should be done to amend the whole Maketext situation? I’d propose several possible actions:

  • Read the GNU gettext Manual to fully understand what these tools can do for you
  • Educate your colleagues, tell them about this article and explain the differences
  • If you can, port your current code to Locale::TextDomain
  • Don’t use Maketext for any new code
  • Update important code using Maketext like the Catalyst plugin mentioned above to support gettext
  • Update TPJ13 to reflect the situation
  • Port extraction routines from xgettext.pl to xgettext

This and general awareness of the issue should bring Perl’s i18n back on track. Thank you for reading!