RFC822: email address validation with regular expressions

as you know, this is how we validate an email address:

(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t]
)+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:
rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(
?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ 
t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-
31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*
](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+
(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:
(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z
|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)
?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[
 t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)
?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t]
)*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[
 t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*
)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t]
)+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)
*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+
|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:r
n)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t
]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31
]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](
?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?
:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?
:rn)?[ t])*))*>(?:(?:rn)?[ t])*)|(?:[^()<>@,;:".[] 00-31]+(?:(?
:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?
[ t]))*"(?:(?:rn)?[ t])*)*:(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 
00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|
.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>
@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"
(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t]
)*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?
:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[
]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 00-
31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(
?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;
:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([
^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:"
.[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[
]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".
[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]
r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 
00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]
|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 
00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|
.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,
;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?
:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*
(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".
[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[
^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]
]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)(?:,s*(
?:(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(
?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[
["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t
])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t
])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?
:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|
Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:
[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[
]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)
?[ t])*(?:@(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["
()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)
?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>
@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[
 t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,
;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t]
)*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?
(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".
[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:
rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[[
"()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])
*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])
+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:
.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z
|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(
?:rn)?[ t])*))*)?;s*)

can you please explain to me what is going on here?

are we looking at a string and deciding whether it is an email address?

can you at least explain the first line:

(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:rn)?[ t]

can you give me an example of an email address that is rarely seen, yet it is valid??


I think you should forget about this. If you're trying to be better at regular expressions, that is one thing, and there are probably better ways of learning. Otherwise, validating email addresses is an extremely complex and error-prone activity, and I'm not aware of any cut-and-paste solution that will completely cover all the corner cases. Down that path lies madness. If you have an Internet connected application, it's better to validate an address by actually sending a confirmation email.


If you want to understand what is going on, you should look at a decent module such as Email::Address and note how the pattern is built from its constituent pieces:

my $CTL            = q{x00-x1Fx7F};
my $special        = q{()<>[]:;@\,."};

my $text           = qr/[^x0Ax0D]/;

my $quoted_pair    = qr/$text/;

my $ctext          = qr/(?>[^()]+)/;
my ($ccontent, $comment) = (q{})x2;
for (1 .. $COMMENT_NEST_LEVEL) {
  $ccontent = qr/$ctext|$quoted_pair|$comment/;
  $comment  = qr/s*((?:s*$ccontent)*s*)s*/;
}
my $cfws           = qr/$comment|s+/;

my $atext          = qq/[^$CTL$specials]/;
my $atom           = qr/$cfws*$atext+$cfws*/;
my $dot_atom_text  = qr/$atext+(?:.$atext+)*/;
my $dot_atom       = qr/$cfws*$dot_atom_text$cfws*/;

my $qtext          = qr/[^"]/;
my $qcontent       = qr/$qtext|$quoted_pair/;
my $quoted_string  = qr/$cfws*"$qcontent+"$cfws*/;

my $word           = qr/$atom|$quoted_string/;

etc etc etc.

my $simple_word    = qr/$atom|.|s*"$qcontent+"s*/;
my $obs_phrase     = qr/$simple_word+/;

my $phrase         = qr/$obs_phrase|(?:$word+)/;

my $local_part     = qr/$dot_atom|$quoted_string/;
my $dtext          = qr/[^[]]/;
my $dcontent       = qr/$dtext|$quoted_pair/;
my $domain_literal = qr/$cfws*[(?:s*$dcontent)*s*]$cfws*/;
my $domain         = qr/$dot_atom|$domain_literal/;

I saw this expression yesterday

/^([w!#$%&'*+-/=?^`{|}~]+.)*[w!#$%&'*+-/=?^`{|}~]+@((((([a-z0-9]{1}[a-z0-9-]{0,62}[a-z0-9]{1})|[a-z]).)+[a-z]{2,6})|(d{1,3}.){3}d{1,3}(:d{1,5})?)$/i

from http://fightingforalostcause.net/misc/2006/compare-email-regex.php

链接地址: http://www.djcxy.com/p/92756.html

上一篇: 使用正则表达式jquery验证电子邮件

下一篇: RFC822:使用正则表达式进行电子邮件地址验证