For God’s sake: there is more to the world than just USA

Raganwald links to an article about e-mail address validation using regular expressions (strictly speaking it is “checking” rather than validating, since validating would include asking the mail server in question). That website seems to be the first stop for every programmer who wants to check e-mail addresses entered by visitors or to find them in a blob of text. So far there is nothing wrong with it; the expression even supports +-addressing, which some prominent sites such as FON still fail to support.

There is just one small thing: there is more to the world than just the USA, the English language and, for that matter, ASCII. info@müller.info is valid. info@例子.测试 already is too, and such addresses will someday be commonplace. The definitive rule of thumb for e-mail validation should be:

Before you validate an e-mail address, make sure you convert it to Punycode!
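
To make the rule concrete, here is a minimal Python sketch. It assumes Python's built-in `idna` codec (which implements the older IDNA 2003 rules) is good enough for the conversion; `ASCII_EMAIL`, `to_ascii_address` and `looks_like_email` are hypothetical names, and the pattern is a deliberately simple stand-in for whatever expression you already use, not the one from the linked article.

```python
import re

# Stand-in ASCII-only pattern (an assumption, not the article's regex).
# The last label allows digits and hyphens so that "xn--..." TLDs pass.
ASCII_EMAIL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z0-9-]{2,}$")


def to_ascii_address(address):
    """Return the address with its domain part converted to Punycode."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No "@" at all: nothing to convert, let the pattern reject it.
        return address
    # The built-in "idna" codec encodes each label of the domain separately.
    return local + "@" + domain.encode("idna").decode("ascii")


def looks_like_email(address):
    """Check the address only after its domain has been converted."""
    try:
        candidate = to_ascii_address(address)
    except UnicodeError:
        # Raised for domains the codec cannot encode (e.g. empty labels).
        return False
    return ASCII_EMAIL.match(candidate) is not None


print(to_ascii_address("info@müller.info"))  # info@xn--mller-kva.info
print(looks_like_email("info@müller.info"))
print(looks_like_email("info@例子.测试"))
```

Only the domain is converted here; the part before the @ is left untouched, so this remains a plausibility check and not proof that the mailbox exists.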

  1. November 2nd, 2007 at 13:44 | #1

    Yes, there is MUCH more to the world than en_US. We Canadians can attest to that. I used to work with a company that has an email product, and after ten years in the business they realized that handling 90% or 95% or even 99% of the possible emails was not enough: sooner or later a client would need to handle müller.info and they would have to upgrade their validity checking. They now warn people if the email address looks unusual (they don’t even call it invalid) but accept that the only way to be sure is to send the email and see if it is rejected by the host.

    What I liked about that post is that it discussed the trade-offs; it didn’t pretend that there was a foolproof mechanism, even for US addresses.

  2. November 2nd, 2007 at 15:06 | #2

    Hey, you’re here so soon :) Thanks for your comment; I actually found that article good too. I’m just worried that the first site on Google for the query “e-mail validation regular expression” does not even mention the rest of the world. There are a lot of newbies and copy-pasters who won’t even consider thinking for themselves when a solution from an “authority” is available. I’ve also contacted regular-expressions.info on this topic; hopefully they’ll correct it someday.

  3. November 3rd, 2007 at 16:20 | #3

    No, please stop. The IDN must die. It’s a bad, bad idea and it’s only implemented so that every company with an umlaut in its name must buy both domains (or, even three of them, one with an “u”, one with “ü” and one with “ue”). Hence, more money for the registrars.

    And don’t think that I’m saying this because I am a stupid American that has no idea that there are other alphabets and scripts out there. My last name has a “ń” in it. But I don’t need it in my website URL.

    I know what you’re going to say - that the CJKV folks have it even worse. That’s exactly why we must keep the domain system to ASCII. How am I supposed to copy an email address from a business card if it looks like this: info@例子.测试?

  4. Phil
    November 3rd, 2007 at 18:29 | #4

    One thing my company does, and I imagine others do too, is give people with “international” characters in their name multiple email address aliases. So Heinrich Müller would have hmüller@company.com and hmueller@company.com. This defeats both overly restrictive regexps and the inability of people with US keyboards to easily type the name, while they can still use their “proper” name with people who speak their name’s language.

  5. November 3rd, 2007 at 21:34 | #5

    jfedor: I’m actually not defending IDN in any way. I grew up with two writing systems and I actually feel the need for some kind of URL internationalization. However, I do not really feel that the current solution will be the ultimate one. But the mere fact that it exists is enough for me to want everyone else to support it, so that it’s at least usable. Normal people will not be thinking about whatever technical problems there might be when registering a domain name; they’ll ask their registrar whether they can have a .de domain written like their surname and the registrar will say yes, it’s possible. We can’t possibly make these people liable for the lack of a good general solution when they try to register at some web site which tells them they don’t have a valid e-mail address. We have to support IDN, whether we want it or not. As web developers, we’ve committed to this.

  6. November 3rd, 2007 at 21:39 | #6

    phil: This is not about e-mail names (before the @), but about domain names. It’s been widely criticized that one can now have umlauts in the domain name, but not in the name before the @ without considerable effort. What your company is doing is a workaround at best: consider what you’ll be printing on a business card. Two or three different addresses won’t do; in the end you’d be printing and giving away only the most commonly understandable address, which makes the other aliases obsolete. No hope for a solution as far as I can see now :(
