This answer directly addresses several issues in the currently highest-voted answer.
It also reinterprets and optimizes the regex used, for example, in WebKit for the email input type.
The ordering of some sub-expressions, and of characters and ranges within character classes, is intentional. Some parts of the patterns list lower-case letters, then digits, then upper-case letters, on the assumption that this order matches the most frequent input or, failing that, the canonicalized form.
First, an attempt at a potentially correct implementation, barring errata (still under active development and improvement):
Email (RFCs 5322 and 5321 interpreted for Internet addresses)
- This JavaScript validation RegExp, once the foldable white space is stripped, matches addr-spec as used for a Mailbox. I believe this is the most common validation use case.
- The emphasis is on INTERNET addresses, as opposed to intranet or local ones. If you need a local/intranet host name, substitute your own pattern for the domain portion.
- quoted-string is intentionally not allowed to be empty. This may be a slight deviation from the RFC as strictly defined. If anyone thinks this should not be so, please comment.
- I have interpreted or inferred the following from the specs: the maximum length of the domain portion of an email address is intended to be 254, excluding the implicit (usually omitted) final dot (".") for the domain root. I take this to leave room in a 256-byte string buffer for the longest domain part, the implicit final dot, and a null terminator: 254 (full domain name without the final dot) + 1 (final dot) + 1 (null terminator, \0 or \x00) = 256. The local part should have a maximum length of 64.
- CFWS is omitted, as per spec. Strip the white space from the expanded pattern (except the single space in the quoted-pair character class) before use, as your environment (such as JavaScript) requires. I will add a one-liner once I have finalized the expression.
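The length limits in the notes above can also be checked separately from any pattern. The following is a minimal sketch under this answer's interpretation (local part at most 64 characters, domain at most 254 without the final dot); `withinLengthLimits` is a hypothetical helper name, and the function checks lengths only, not syntax:

```javascript
// Hypothetical helper: checks only the length limits as interpreted in this answer.
const MAX_LOCAL = 64;   // maximum local-part length
const MAX_DOMAIN = 254; // maximum domain length, excluding the implicit final dot

function withinLengthLimits(address) {
  // Split on the last "@", because a quoted local part may itself contain "@".
  // Assumption: the domain is not an address literal that contains "@".
  const at = address.lastIndexOf("@");
  if (at < 1) return false; // no "@" at all, or an empty local part
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  return local.length <= MAX_LOCAL &&
         domain.length >= 1 && domain.length <= MAX_DOMAIN;
}

console.log(withinLengthLimits("user@example.com"));              // true
console.log(withinLengthLimits("a".repeat(65) + "@example.com")); // false
```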
Domain part for Intranet/Local:
(?=.{1,254}$)[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-zA-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*
Full and Expanded RegExp:
^(?:
[-^-~/-9A-Z!#-'*+=?]+(?:\.[-^-~/-9A-Z!#-'*+=?]+)*
|
"
(?:
[!#-[\]-~]
|
\\[ -~\t]
)+
"
)@(?:
(?=.{4,254}$)(?:[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)\.)+[a-zA-Z][a-z0-9A-Z-]{0,61}[a-z0-9A-Z]
|
\[
(?:
(?:25[0-5]|(?:1[0-9]|2[0-4]|[1-9]|)[0-9])(?:\.(?:25[0-5]|(?:1[0-9]|2[0-4]|[1-9]|)[0-9])){3}
|
[a-zA-Z0-9-]*[a-zA-Z0-9]:[!-Z^-~]+
)
\]
)$
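As a rough sketch of how the expanded pattern might be compiled in JavaScript: each line is trimmed and the pieces are joined, which removes the layout white space but keeps the single space inside the quoted-pair class, because that space sits in the middle of its line. The names `expanded` and `addrSpec` are illustrative. In this sketch the two address-literal alternatives are wrapped in their own group so that the brackets apply to both, and dcontent is repeated with `+` to follow `1*dcontent`.

```javascript
// The expanded pattern, one piece per line; String.raw keeps the backslashes as written.
const expanded = String.raw`
^(?:
  [-^-~/-9A-Z!#-'*+=?]+(?:\.[-^-~/-9A-Z!#-'*+=?]+)*
|
  "
  (?:
    [!#-[\]-~]
  |
    \\[ -~\t]
  )+
  "
)@(?:
  (?=.{4,254}$)(?:[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)\.)+[a-zA-Z][a-z0-9A-Z-]{0,61}[a-z0-9A-Z]
|
  \[
  (?:
    (?:25[0-5]|(?:1[0-9]|2[0-4]|[1-9]|)[0-9])(?:\.(?:25[0-5]|(?:1[0-9]|2[0-4]|[1-9]|)[0-9])){3}
  |
    [a-zA-Z0-9-]*[a-zA-Z0-9]:[!-Z^-~]+
  )
  \]
)$`;

// Trim each line and join: only leading/trailing layout white space is removed.
const addrSpec = new RegExp(
  expanded.split("\n").map((line) => line.trim()).join("")
);

console.log(addrSpec.test("user@example.com"));           // true
console.log(addrSpec.test("user@[192.168.0.1]"));         // true
console.log(addrSpec.test('"john\\ doe"@example.com'));   // true (space as quoted-pair)
console.log(addrSpec.test("user@localhost"));             // false (Internet domains only)
console.log(addrSpec.test('""@example.com'));             // false (empty quoted-string)
```

The general address-literal branch only checks the `tag:content` shape; it does not validate IPv6 syntax.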
For comparison, the original, apparently as reposted by @DouglasDaseeco:
^(?:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
|
"
(?:
[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
|
\\[\x01-\x09\x0b\x0c\x0e-\x7f]
)*
"
)@(?:
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
|
\[
(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])
|
[a-z0-9-]*[a-z0-9]:
(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)
\]
)$
Specification
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
Optimized: Strict #1
/^[--9^-~A-Z!#-'*+=?]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
Alternative
/^[--9^-~A-Z!#-'*+=?]{1,64}@(?=.{1,254}$)[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*$/
From the WebKit project (practically the same as the WHATWG specification)
Original:
^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
Optimized:
^[--9^-~A-Z!#-'*+=?]+@[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*$
Alternative:
^[--9^-~A-Z!#-'*+=?]{1,64}@(?=.{1,254}$)[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*$
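One way to gain some confidence that the optimized local-part class accepts exactly the same characters as the original is to enumerate the ASCII range and compare. This is only a sketch; `acceptedChars` is a hypothetical helper written for this check:

```javascript
// Return, in ascending order, every ASCII character (0-127) that a
// single character class accepts.
function acceptedChars(charClassSource) {
  const re = new RegExp("^" + charClassSource + "$");
  let out = "";
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (re.test(ch)) out += ch;
  }
  return out;
}

const originalClass = "[a-zA-Z0-9.!#$%&'*+\\/=?^_`{|}~-]";
const optimizedClass = "[--9^-~A-Z!#-'*+=?]";

const a = acceptedChars(originalClass);
const b = acceptedChars(optimizedClass);

console.log(a === b);  // true
console.log(a.length); // 82 (atext plus ".")
```

The check covers only the character class, not the ordering or performance claims.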
Some definitions extracted from RFC 5322
address = mailbox / group
mailbox = name-addr / addr-spec
name-addr = [display-name] angle-addr
angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
group = display-name ":" [group-list] ";" [CFWS]
display-name = phrase
mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list
address-list = (address *("," address)) / obs-addr-list
group-list = mailbox-list / CFWS / obs-group-list
addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part
domain = dot-atom / domain-literal / obs-domain
domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
dtext = %d33-90 / %d94-126 / obs-dtext
; Printable US-ASCII characters not including "[", "]", or "\"
quoted-string = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]
qcontent = qtext / quoted-pair
qtext = %d33 / %d35-91 / %d93-126 / obs-qtext
; Printable US-ASCII characters not including "\" or the quote character
dot-atom = [CFWS] dot-atom-text [CFWS]
dot-atom-text = 1*atext *("." 1*atext)
atext = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" /
"=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~"
Some definitions expanded or reinterpreted
dtext = !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuvwxyz{|}~
= !-Z^-~
qtext = !#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~
= !#-[\]-~
atext = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-/=?^_`{|}~
= -^-~/-9A-Z!#-'*+=?
DIGIT = %x30-39 ; 0-9
= 0-9
= \d
ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
= A-Za-z
VCHAR = %x21-7E ; Visible (printing) characters
= !-~
WSP = SP / HTAB ; White space
= [ \t]
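The condensed character classes above can be cross-checked against the ABNF ranges mechanically. The sketch below rebuilds each set from its decimal ranges (or from the spelled-out atext list) and compares it with what the condensed class accepts; `fromRanges` and `acceptedChars` are hypothetical helper names:

```javascript
// Build a string of all characters in the given inclusive decimal ranges.
function fromRanges(ranges) {
  let out = "";
  for (const [lo, hi] of ranges) {
    for (let code = lo; code <= hi; code++) out += String.fromCharCode(code);
  }
  return out;
}

// Return, in ascending order, every ASCII character a character class accepts.
function acceptedChars(charClassSource) {
  const re = new RegExp("^" + charClassSource + "$");
  let out = "";
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (re.test(ch)) out += ch;
  }
  return out;
}

const dtext = fromRanges([[33, 90], [94, 126]]);
const qtext = fromRanges([[33, 33], [35, 91], [93, 126]]);
const atextList =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-/=?^_`{|}~";
const atext = [...atextList].sort().join(""); // ascending code order

console.log(acceptedChars("[!-Z^-~]") === dtext);             // true
console.log(acceptedChars("[!#-[\\]-~]") === qtext);          // true
console.log(acceptedChars("[-^-~/-9A-Z!#-'*+=?]") === atext); // true
```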
Seeming issues in the accepted, upvoted answer
- The very first version of that answer does not seem to be a regex at all.
- Part of the answer seems to deal with parsing the whole message, including headers, content, and body. Not all of the spec is relevant here. You have to go through all the RFCs and specs for the obscure points, but keep the focus on addr-spec.
Parentheses seem to be mismatched or mispositioned
The last part of the IPv4 pattern seems to have been grouped together with the general address-literal pattern.
Control characters should be prohibited
This includes, but is not limited to, \x7f (ASCII 127, DEL).
RFC 5321 section 4.1.2:
Systems MUST NOT define mailboxes in such a way as to require the use
in SMTP of non-ASCII characters (octets with the high order bit set
to one) or ASCII "control characters" (decimal value 0-31 and 127).
These characters MUST NOT be used in MAIL or RCPT commands or other
commands that require mailbox names.
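To illustrate the point, the sketch below counts how many ASCII control characters (0-31 and 127) are accepted by the quoted-string text class from the original pattern and by the revised qtext class from this answer. The variable names are illustrative only:

```javascript
// Quoted-string text class as it appears in the original pattern.
const originalQtext = /^[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]$/;
// Revised qtext class from this answer: printable US-ASCII except '"' and "\".
const revisedQtext = /^[!#-[\]-~]$/;

// Control characters as listed in RFC 5321 section 4.1.2.
function isControl(code) {
  return code <= 31 || code === 127;
}

let originalControls = 0;
let revisedControls = 0;
for (let code = 0; code < 128; code++) {
  if (!isControl(code)) continue;
  const ch = String.fromCharCode(code);
  if (originalQtext.test(ch)) originalControls++;
  if (revisedQtext.test(ch)) revisedControls++;
}

console.log(originalControls);            // 29
console.log(revisedControls);             // 0
console.log(originalQtext.test("\x7f"));  // true  (DEL accepted)
console.log(revisedQtext.test("\x7f"));   // false (DEL rejected)
```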
Control characters are not allowed in address-literal
RFC 5321 sections 4.1.2, 4.1.3:
address-literal = "[" ( IPv4-address-literal /
IPv6-address-literal /
General-address-literal ) "]"
IPv4-address-literal = Snum 3("." Snum)
IPv6-address-literal = "IPv6:" IPv6-addr
General-address-literal = Standardized-tag ":" 1*dcontent
Standardized-tag = Ldh-str
; Standardized-tag MUST be specified in a
; Standards-Track RFC and registered with IANA
Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig
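For the IPv4-address-literal rule (`Snum 3("." Snum)`), the octet sub-pattern used in this answer can be checked exhaustively. The sketch below assumes the intent is to accept decimal 0 through 255; note that, as written, the sub-pattern rejects leading zeros such as "01", which is stricter than the `1*3DIGIT` form of Snum. The names `snum`, `octet`, and `ipv4` are illustrative:

```javascript
// Octet sub-pattern from this answer's IPv4 branch.
const snum = "(?:25[0-5]|(?:1[0-9]|2[0-4]|[1-9]|)[0-9])";
const octet = new RegExp("^" + snum + "$");
const ipv4 = new RegExp("^" + snum + "(?:\\." + snum + "){3}$");

// Every integer from 0 to 999 should match exactly when it is <= 255.
let mismatches = 0;
for (let n = 0; n <= 999; n++) {
  if (octet.test(String(n)) !== (n <= 255)) mismatches++;
}

console.log(mismatches);               // 0
console.log(octet.test("01"));         // false (leading zero rejected)
console.log(ipv4.test("192.168.0.1")); // true
console.log(ipv4.test("256.0.0.1"));   // false
console.log(ipv4.test("1.2.3"));       // false
```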
If these solutions are cross-checked and peer-verified, anyone may incorporate this info into the original community-wiki answer, with appropriate credit.