What characters are allowed in an email address?
I'm not asking about full email validation.
I just want to know what are allowed characters in user-name
and server
parts of email address. This may be oversimplified, maybe email adresses can take other forms, but I don't care. I'm asking about only this simple form: user-name@server
(eg wild.wezyr@best-server-ever.com) and allowed characters in both parts.
See RFC 5322: Internet Message Format and, to a lesser extent, RFC 5321: Simple Mail Transfer Protocol.
RFC 822 also covers email addresses, but it deals mostly with its structure:
addr-spec = local-part "@" domain ; global address
local-part = word *("." word) ; uninterpreted
; case-preserved
domain = sub-domain *("." sub-domain)
sub-domain = domain-ref / domain-literal
domain-ref = atom ; symbolic reference
And as usual, Wikipedia has a decent article on email addresses:
The local-part of the email address may use any of these ASCII characters:
A
to Z
and a
to z
; 0
to 9
; !#$%&'*+-/=?^_`{|}~
; .
, provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (eg John..Doe@example.com
is not allowed but "John..Doe"@example.com
is allowed); "(),:;<>@[]
characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash); john.smith(comment)@example.com
and (comment)john.smith@example.com
are both equivalent to john.smith@example.com
. In addition to ASCII characters, as of 2012 you can use international characters above U+007F
, encoded as UTF-8.
For validation, see Using a regular expression to validate an email address.
The domain
part is defined as follows:
The Internet standards (Request for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters a
through z
(in a case-insensitive manner), the digits 0
through 9
, and the hyphen ( -
). The original specification of hostnames in RFC 952, mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or blank spaces are permitted.
Watch out! There is a bunch of knowledge rot in this thread (stuff that used to be true and now isn't).
To avoid false-positive rejections of actual email addresses in the current and future world, and from anywhere in the world, you need to know at least the high-level concept of RFC 3490, "Internationalizing Domain Names in Applications (IDNA)". I know folks in US and A often aren't up on this, but it's already in widespread and rapidly increasing use around the world (mainly the non-English dominated parts).
The gist is that you can now use addresses like mason@日本.com and wildwezyr@fahrvergnügen.net. No, this isn't yet compatible with everything out there (as many have lamented above, even simple qmail-style +ident addresses are often wrongly rejected). But there is an RFC, there's a spec, it's now backed by the IETF and ICANN, and--more importantly--there's a large and growing number of implementations supporting this improvement that are currently in service.
I didn't know much about this development myself until I moved back to Japan and started seeing email addresses like hei@やる.ca and Amazon URLs like this:
http://www.amazon.co.jp/エレクトロニクス-デジタルカメラ-ポータブルオーディオ/b/ref=topnav_storetab_e?ie=UTF8&node=3210981
I know you don't want links to specs, but if you rely solely on the outdated knowledge of hackers on Internet forums, your email validator will end up rejecting email addresses that non-Enlish users increasingly expect to work. For those users, such validation will be just as annoying as the commonplace brain-dead form that we all hate, the one that can't handle a + or a three-part domain name or whatever.
So I'm not saying it's not a hassle, but the full list of characters "allowed under some/any/none conditions" is (nearly) all characters in all languages. If you want to "accept all valid email addresses (and many invalid too)" then you have to take IDN into account, which basically makes a character-based approach useless (sorry), unless you first convert the internationalized email addresses to Punycode.
After doing that you can follow (most of) the advice above.
Wikipedia has a good article on this, and the official spec is here. From Wikipdia:
The local-part of the e-mail address may use any of these ASCII characters:
Additionally, quoted-strings (ie: "John Doe"@example.com) are permitted, thus allowing characters that would otherwise be prohibited, however they do not appear in common practice. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".
链接地址: http://www.djcxy.com/p/2712.html上一篇: 如何在Android SearchView中关闭键盘?
下一篇: 电子邮件地址允许使用哪些字符?