Is regexp recognition of email addresses hard?
I recently read somewhere that writing a regexp to match an email address, taking into account all the variations and possibilities of the standard, is extremely hard and significantly more complicated than one would initially assume.
Can anyone provide some insight as to why that is?
Are there any known and proven regexps that actually do this fully?
What are some good alternatives to using regexps for matching email addresses?
For the formal e-mail spec, yes: it is technically impossible with a regex, both because constructs such as comments can nest recursively (especially if you don't first replace comments with whitespace) and because addresses come in several different formats (an e-mail address isn't always someone@somewhere.tld). You can get close (with some massive and incomprehensible regex patterns), but a far better way of checking an e-mail address is to do the very familiar handshake:
when they click on the link you know that:
Far better than blindly accepting an e-mail address.
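A minimal sketch of that handshake in Python, assuming an in-memory store and leaving out the actual sending of the mail. The names `pending`, `confirmed`, `start_verification` and `confirm` are hypothetical, chosen only for illustration; a real application would persist the tokens and expire them.

```python
import secrets

# Hypothetical in-memory stores; a real system would use a database.
pending = {}       # token -> address still awaiting confirmation
confirmed = set()  # addresses whose link has been clicked


def start_verification(address):
    """Create an unguessable token to embed in the confirmation link."""
    token = secrets.token_urlsafe(32)
    pending[token] = address
    return token


def confirm(token):
    """Handle a click on the link; return the address, or None if unknown."""
    address = pending.pop(token, None)  # each token works only once
    if address is not None:
        confirmed.add(address)
    return address
```

The token would be mailed as part of a link, and `confirm` would run when that link is requested. No syntax check is involved: an address that cannot receive mail never gets confirmed.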
There are a number of Perl modules (for example) that do this. Don't try and write your own regexp to do it. Look at
Mail::VRFY
will do syntax and network checks (does an SMTP server somewhere accept this address?)
https://metacpan.org/pod/Mail::VRFY
RFC::RFC822::Address
- a recursive descent email address parser.
https://metacpan.org/pod/RFC::RFC822::Address
Mail::RFC822::Address
- regexp-based address validation, worth looking at just for the insane regexp
http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
Similar tools exist for other languages. Insane regexp below...
(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t]
)+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:
rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(
?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[
t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-
31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*
](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+
(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:
(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z
|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)
?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:
rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[
t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)
?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t]
)*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[
t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*
)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t]
)+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)
*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+
|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:r
n)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:
rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t
]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31
]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](
?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?
:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?
:rn)?[ t])*))*>(?:(?:rn)?[ t])*)|(?:[^()<>@,;:".[] 00- 31]+(?:(?
:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?
[ t]))*"(?:(?:rn)?[ t])*)*:(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[]
00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|
.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>
@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"
(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t]
)*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?
:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[
]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 00-
31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(
?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;
:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([
^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:"
.[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[
]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".
[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]
r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[]
00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]
|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[]
00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|
.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,
;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?
:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*
(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".
[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[
^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]
]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)(?:,s*(
?:(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(
?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[
["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t
])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t
])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?
:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|
Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:
[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[
]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)
?[ t])*(?:@(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["
()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)
?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>
@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[
t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,
;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t]
)*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?
(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".
[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:
rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[[
"()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])
*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])
+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:
.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z
|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(
?:rn)?[ t])*))*)?;s*)
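To illustrate the point about nested comments: a classical regular expression cannot track arbitrary nesting depth, but a few lines of ordinary code with a counter can. The sketch below is an assumption-laden simplification, not a full RFC parser; `strip_comments` is a hypothetical helper that replaces each parenthesised comment with a single space and leaves quoted strings alone.

```python
def strip_comments(text):
    """Replace each (possibly nested) parenthesised comment with one space."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a "quoted string", ( and ) are literal
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # A backslash escapes the next character; keep the pair
            # only when we are not inside a comment.
            if depth == 0:
                out.append(text[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(" ")  # the whole comment collapses to a space
        elif depth == 0:
            out.append(ch)
        i += 1
    if depth != 0:
        raise ValueError("unbalanced comment")
    return "".join(out)
```

For example, `strip_comments("someone(a (nested) comment)@somewhere.tld")` yields `"someone @somewhere.tld"`, while parentheses inside a quoted local part are left untouched.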
Validating e-mail addresses isn't really very helpful anyway. It will not catch common typos or made-up email addresses, since these tend to look syntactically like valid addresses.
If you want to be sure an address is valid, you have no choice but to send a confirmation mail.
If you just want to be sure that the user inputs something that looks like an email rather than just "asdf", then check for an @. More complex validation does not really provide any benefit.
(I know this doesn't answer your questions, but I think it's worth mentioning anyway)
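The "check for an @" test could be sketched like this in Python. The function name `looks_like_email` is hypothetical, and requiring at least one character on each side of the @ is my own small addition to the suggestion, not something the standard mandates.

```python
def looks_like_email(text):
    """True if the input has an @ with something on both sides of it."""
    # rpartition splits on the LAST @, so a quoted local part that
    # itself contains an @ is still accepted.
    local, at, domain = text.strip().rpartition("@")
    return bool(at and local and domain)
```

This rejects input such as "asdf" and accepts anything shaped like someone@somewhere.tld; whether the address really exists is still only answered by a confirmation mail.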