Is regexp recognition of email addresses hard?

I recently read somewhere that writing a regexp to match an email address, taking into account all the variations and possibilities of the standard, is extremely hard and significantly more complicated than one would initially assume.

Can anyone provide some insight as to why that is?

Are there any known and proven regexps that actually do this fully?

What are some good alternatives to using regexps for matching email addresses?


For the formal e-mail spec, yes: it is technically impossible with a regex, because elements such as comments can nest recursively (especially if you don't first replace comments with whitespace) and because addresses come in several different formats (an e-mail address isn't always someone@somewhere.tld). You can get close (with some massive and incomprehensible regex patterns), but a far better way of checking an e-mail address is the very familiar handshake:

  • they tell you their e-mail
  • you e-mail them a confirmation link containing a GUID
  • when they click on the link you know that:

      • the e-mail is correct
      • it exists
      • they own it

Far better than blindly accepting an e-mail address.
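
The handshake above can be sketched in Python roughly as follows. This is only an illustration: the `ConfirmationStore` class, the in-memory dictionary standing in for a database, and the `https://example.com/confirm` URL are all assumptions, and actually sending the mail is left out.

```python
import uuid


class ConfirmationStore:
    """In-memory stand-in (an assumption) for a table of pending confirmations."""

    def __init__(self):
        self._pending = {}      # token -> e-mail address awaiting confirmation
        self.confirmed = set()  # addresses whose owner clicked the link

    def start(self, email):
        """Record the address and return a fresh GUID to embed in the link."""
        token = str(uuid.uuid4())
        self._pending[token] = email
        return token

    def confirm(self, token):
        """Return the confirmed address, or None for an unknown or reused token."""
        email = self._pending.pop(token, None)
        if email is not None:
            self.confirmed.add(email)
        return email


def confirmation_link(token, base_url="https://example.com/confirm"):
    """Build the link that would be mailed to the user (hypothetical URL)."""
    return f"{base_url}?token={token}"
```

A token can be confirmed only once, so a second click on the same link (or a guessed token) returns `None` rather than confirming anything.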


There are a number of Perl modules (for example) that do this. Don't try to write your own regexp to do it. Look at:

Mail::VRFY, which does syntax and network checks (does an SMTP server somewhere accept this address?)

https://metacpan.org/pod/Mail::VRFY

RFC::RFC822::Address, a recursive descent email address parser

https://metacpan.org/pod/RFC::RFC822::Address

Mail::RFC822::Address, regexp-based address validation, worth looking at just for the insane regexp

http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Similar tools exist for other languages. Insane regexp below...

    (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
    \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
    ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
    \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
    31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
    ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
    (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
    (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
    ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
    r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
     \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
    ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
    )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
     \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
    )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
    *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
    |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
    \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
    \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
    ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
    ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
    ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
    :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
    :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
    :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
    [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
    \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
    @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
    (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
    :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
    \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
    \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
    ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
    :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
    ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
    .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
    ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
    [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
    r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
    \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
    |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
    00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
    .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
    ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
    :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
    ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
    ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
    ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
    ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
    \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
    ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
    ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
    :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
    \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
    [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
    ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
    ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
    ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
    ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
    @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
     \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
    ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
    )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
    (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
    \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
    "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
    *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
    +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
    .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
    ?:\r\n)?[ \t])*))*)?;\s*)
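
Comments in an address can nest, which a pattern like the one above cannot track on its own, so they are typically stripped before matching. Here is a minimal Python sketch of that step using a depth counter. The function name `strip_comments` and the choice to replace each comment with a single space are illustrative assumptions, not taken from the Perl module.

```python
def strip_comments(text):
    """Replace each (possibly nested) parenthesised comment with one space."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a "quoted string", where parentheses are literal
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # A backslash escapes the next character; keep the pair
            # only when we are outside any comment.
            if depth == 0:
                out.append(text[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes:
            out.append(ch)
        elif ch == "(":
            if depth == 0:
                out.append(" ")  # the whole comment collapses to one space
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    if depth != 0:
        raise ValueError("unbalanced comment parentheses")
    return "".join(out)
```

The counter is the part a plain regular expression lacks: it has to remember how many parentheses are still open, and that number has no fixed upper bound.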
    

Validating e-mail addresses isn't really very helpful anyway. It will not catch common typos or made-up email addresses, since these tend to look syntactically like valid addresses.

If you want to be sure an address is valid, you have no choice but to send a confirmation mail.

If you just want to be sure that the user inputs something that looks like an email rather than just "asdf", then check for an @. More complex validation does not really provide any benefit.
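
A check that light could look like the Python sketch below. The name `looks_like_email` is hypothetical, and requiring some text on both sides of the @ is a small assumption that goes slightly beyond "check for an @".

```python
def looks_like_email(text):
    """Cheap sanity check: at least one '@' with text on both sides of the last one."""
    local, sep, domain = text.strip().rpartition("@")
    return bool(sep and local and domain)
```

It rejects "asdf" and accepts someone@somewhere.tld. It deliberately says nothing about whether the address exists, which only the confirmation mail can establish.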

(I know this doesn't answer your questions, but I think it's worth mentioning anyway.)
