Over the years I have slowly developed a regular expression that validates most email addresses correctly, assuming they don't use an IP address as the server part.

I use it in several PHP programs, and it works most of the time. However, from time to time I get contacted by someone that is having trouble with a site that uses it, and I end up having to make some adjustment (most recently I realized that I wasn't allowing four-character TLDs).

What is the best regular expression you have or have seen for validating emails?

I've seen several solutions that use functions that use several shorter expressions, but I'd rather have one long complex expression in a simple function instead of several short expression in a more complex function.

  • The regex that can validate that an IDNA is correctly formatted does not fit in stackexchange. (The rules on canonicalisation are really tortuous and particularly ill-suited to regex processing.) Commented Aug 29, 2017 at 23:51
  • Why you should not do this: Can it cause harm to validate email addresses with a regex? Commented Jan 9, 2018 at 14:30
  • The regexes may vary: in some cases an email address can contain a space, and in others it cannot contain any spaces. Commented Jul 23, 2018 at 4:21
  • You can check Symfonys regex for loose and strict check: github.com/symfony/symfony/blob/5.x/src/Symfony/Component/… Commented May 16, 2021 at 16:15
  • Using just a regex can harm server security, but if it is only used as an input pattern, I suggest using this: stackoverflow.com/questions/5601647/… Commented Jun 7, 2021 at 21:42

78 Answers

8

If you are fine with accepting empty values (which is not an invalid email) and are running PHP 5.2+, I would suggest:

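static public function checkEmail($email, $ignore_empty = false) {
    // Optionally treat a missing or empty value as acceptable.
    if ($ignore_empty && (is_null($email) || $email === ''))
        return true;
    // filter_var() returns the filtered string or false, so compare explicitly to get a boolean.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}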

1 Comment

What would be an example of an address with empty values? Please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar - the answer should appear as if it was written today).
8

I use multi-step validation. There isn't any perfect way to validate an email address, but at least you can notify users when they are doing something wrong. Here is my approach:

  1. I first validate with the very basic regex which just checks if the email contains exactly one @ sign and it is not blank before or after that sign. e.g. /^[^@\s]+@[^@\s]+$/

  2. If the first validator does not pass (and for most addresses it should, although it is not perfect), warn the user that the email is invalid and do not allow him/her to continue with the input.

  3. If it passes, validate against a stricter regex, one which might disallow valid emails. If that does not pass, the user is warned about a possible error but is allowed to continue, unlike step (1), where the user is blocked because it is an obvious error.

So in other words, the first liberal validation is just to strip obvious errors and it is treated as "error". People type a blank address, address without @ sign and so on. This should be treated as an error. The second one is more strict, but it is treated as a "warning" and the user is allowed to continue with the input, but warned to at least examine if he/she entered a valid entry. The key here is in the error/warning approach - the error being something that can't under 99.99% circumstances be a valid email.

Of course, you can adjust what makes the first regex more liberal and the second one more strict.

Depending on what you need, the above approach might work for you.
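
The error/warning split described above could be sketched roughly like this in Python. The loose pattern is the one from step 1; the strict pattern is only a placeholder I picked for illustration, and the names `Verdict` and `check_email` are hypothetical.

```python
import re
from enum import Enum


class Verdict(Enum):
    ERROR = "error"      # obviously not an email address: block the input
    WARNING = "warning"  # unusual but possibly valid: let the user confirm
    OK = "ok"


# Step 1: exactly one "@", with something non-blank on both sides.
LOOSE = re.compile(r"^[^@\s]+@[^@\s]+$")

# Step 3: a stricter placeholder pattern (assumption); it may reject valid addresses.
STRICT = re.compile(r"^[A-Za-z0-9._+\-']+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}$")


def check_email(address: str) -> Verdict:
    """Classify an address as a hard error, a soft warning, or OK."""
    if LOOSE.fullmatch(address) is None:
        return Verdict.ERROR
    if STRICT.fullmatch(address) is None:
        return Verdict.WARNING
    return Verdict.OK
```

A form handler would refuse to proceed on `Verdict.ERROR` and only show a confirmation prompt on `Verdict.WARNING`.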

2 Comments

Technically, an email address can contain more than one @. It's an astonishingly weird discovery I made recently. E.g.: "very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com
Agreed, but I never claimed my method is 100% foolproof. It works in most cases. You have to be realistic at some point and discard very unlikely cases. Most email addresses are [email protected]. If someone actually chooses an email address which uses the most liberal syntax of all, he/she is in for a real treat of issues with various server/client programs not properly validating or allowing such an email, or simply not working at all while sending/receiving. Such a user would then be forced to use a more "standard" syntax to ensure it works everywhere.
8

I'm still using:

^[A-Za-z0-9._+\-\']+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}$

But with IPv6 and Unicode coming up, perhaps this is best:

console.log(/^[\p{L}!#-'*+\-/\d=?^-~]+(\.[\p{L}!#-'*+\-/\d=?^-~]+)*@[^@\s]{2,}$/u.test("תה.בועות@😀.fm"))

Gmail allows sequential dots, but Microsoft Exchange Server 2007 refuses them, which follows the most recent standard afaik.

7 Comments

Doesn't allow "John Smith"@example.com.
True, but when is that actually needed?
Any time an email address has a space in it?
@DavidConrad You mean "John\ Smith"@example.com according to this comment.
7
public bool ValidateEmail(string sEmail)
{
    if (sEmail == null)
    {
        return false;
    }

    int nFirstAT = sEmail.IndexOf('@');
    int nLastAT = sEmail.LastIndexOf('@');

    if ((nFirstAT > 0) && (nLastAT == nFirstAT) && (nFirstAT < (sEmail.Length - 1)))
    {
        return (Regex.IsMatch(sEmail, @"^[a-zA-Z0-9]*([_][a-zA-Z0-9]+)*([.][a-zA-Z0-9]+)*([.][a-zA-Z0-9]+)*(([_][a-zA-Z0-9]+)*)?@[a-z][a-zA-Z0-9]*\.([a-z][a-zA-Z0-9]*(\.[a-z][a-zA-Z0-9]*)?)$"));
    }
    else
    {
        return false;
    }
}

1 Comment

This will sometimes fail; a user in an email address may contain "@" characters if they are inside a quoted-string.
6

I don't believe the claim made by bortzmeyer that "The grammar (specified in RFC 5322) is too complicated for that" (to be handled by a regular expression).

Here is the grammar (from 3.4.1. Addr-Spec Specification):

addr-spec       =   local-part "@" domain
local-part      =   dot-atom / quoted-string / obs-local-part
domain          =   dot-atom / domain-literal / obs-domain
domain-literal  =   [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
dtext           =   %d33-90 /          ; Printable US-ASCII
                    %d94-126 /         ;  characters not including
                    obs-dtext          ;  "[", "]", or "\"

Assuming that dot-atom, quoted-string, obs-local-part, obs-domain are themselves regular languages, this is a very simple grammar. Just replace the local-part and domain in the addr-spec production with their respective productions, and you have a regular language, directly translatable to a regular expression.
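
To make the substitution argument concrete, here is a rough Python sketch that builds one pattern out of the sub-productions. It deliberately ignores CFWS/FWS and all the obs-* forms, which is a big simplifying assumption and not the full grammar; the constant and function names are mine.

```python
import re

# Simplifying assumption: CFWS/FWS and the obs-* productions are left out.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"

QTEXT = r"[\x21\x23-\x5B\x5D-\x7E]"   # %d33 / %d35-91 / %d93-126
QUOTED_PAIR = r"\\[\x20-\x7E\t]"      # backslash followed by a printable char, space or tab
QUOTED_STRING = rf'"(?:{QTEXT}|{QUOTED_PAIR})*"'

DTEXT = r"[\x21-\x5A\x5E-\x7E]"       # %d33-90 / %d94-126
DOMAIN_LITERAL = rf"\[{DTEXT}*\]"

# Substitute the productions into addr-spec = local-part "@" domain.
LOCAL_PART = rf"(?:{DOT_ATOM}|{QUOTED_STRING})"
DOMAIN = rf"(?:{DOT_ATOM}|{DOMAIN_LITERAL})"
ADDR_SPEC = re.compile(rf"{LOCAL_PART}@{DOMAIN}")


def is_addr_spec(text: str) -> bool:
    """True if text matches the simplified addr-spec pattern in full."""
    return ADDR_SPEC.fullmatch(text) is not None
```

Because folding white space is left out, a bare space inside a quoted string is rejected here, while an escaped one (a quoted-pair) is accepted.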

4 Comments

You should investigate CFWS before you start making assumptions here. It's a nightmare.
CFWS = (1*([FWS] comment) [FWS]) / FWS. Still, I see no rule that makes the language not regular. It's complicated, for sure, but a complicated regular expression could handle it nevertheless.
This doesn't answer the question. It's in response to another answer.
CFWS is not part of the email address, it's part of the MIME syntax. See my answer stackoverflow.com/a/63841473/7117939 for why this is.
6

I know this question is about regular expressions, but I am guessing that 90% of all developers reading these solutions are trying to validate an email address in an HTML form displayed in a browser.

If this is the case, I'd suggest checking out the new HTML5 <input type="email"> form element:

HTML5:

 <input type="email" required />

CSS 3:

 input:required {
      background-color: rgba(255, 0, 0, 0.2);
 }

 input:focus:invalid {
     box-shadow: 0 0 1em red;
     border-color: red;
 }

 input:focus:valid {
     box-shadow: 0 0 1em green;
     border-color: green;
 }

It is at HTML5 Form Validation Without JS - JSFiddle - Code Playground.

This has a couple of advantages:

  1. Automatic validation and no custom solution needed: simple and easy to implement
  2. No JavaScript, and no problems if JavaScript has been disabled
  3. No server has to calculate anything for that
  4. The user has immediate feedback
  5. Old browsers should automatically fallback to input type "text"
  6. Mobile browsers can display a specialized keyboard (@-Keyboard)
  7. Form validation feedback is very easy with CSS 3

The apparent downside might be missing validation for old browsers, but that'll change over time. I'd prefer this over any of these insane regular expression masterpieces.

2 Comments

The other down side is that this is client-side only. Good for providing a smooth user experience, bad for validating data.
The problem with the default email validation is that it has lots of false positives. You'd need to use my complete pattern to eliminate all false positives while preventing false negatives from sneaking in. That pattern can be added via the pattern attribute. See my post for more info.
5

This rule matches what our Postfix server could not send to.

Allow letters, numbers, -, _, +, ., &, /, and !

No [email protected]

No [email protected]

/^([a-z0-9\+\._\/&!][-a-z0-9\+\._\/&!]*)@(([a-z0-9][-a-z0-9]*\.)([-a-z0-9]+\.)*[a-z]{2,})$/i

5

For PHP I'm using the email address validator from the Nette Framework:

/* public static */ function isEmail($value)
{
    $atom = "[-a-z0-9!#$%&'*+/=?^_`{|}~]"; // RFC 5322 unquoted characters in local-part
    $localPart = "(?:\"(?:[ !\\x23-\\x5B\\x5D-\\x7E]*|\\\\[ -~])+\"|$atom+(?:\\.$atom+)*)"; // Quoted or unquoted
    $alpha = "a-z\x80-\xFF"; // Superset of IDN
    $domain = "[0-9$alpha](?:[-0-9$alpha]{0,61}[0-9$alpha])?"; // RFC 1034 one domain component
    $topDomain = "[$alpha](?:[-0-9$alpha]{0,17}[$alpha])?";
    return (bool) preg_match("(^$localPart@(?:$domain\\.)+$topDomain\\z)i", $value);
}

4

We have used http://www.aspnetmx.com/ with a degree of success for a few years now. You can choose the level you want to validate at (e.g. syntax check, check for the domain, MX records or the actual email).

For front-end forms we generally verify that the domain exists and the syntax is correct, and then we do stricter verification to clean out our database before doing bulk mail-outs.

2 Comments

The link is broken (it times out) - "Unable to connect. An error occurred during a connection to www.aspnetmx.com."
This was originally answered in the year 2008. :-) Where has the time gone....
4

This is one of the regexes for email:

^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$

1 Comment

It looks like line noise. Do you have an explanation and/or reference for it?
4

No one mentioned the issue of localization (i18n). What if you have clients coming from all over the world?

You will then need to sub-categorize your regex per country/area; I have seen developers end up building a large dictionary or configuration for this. Detecting the user's browser language setting may be a good starting point.

4

For me the right way for checking email addresses is:

  1. Check that symbol @ exists, and before and after it there are some non-@ symbols: /^[^@]+@[^@]+$/
  2. Try to send an email to this address with some "activation code".
  3. When the user "activated" his/her email address, we will see that all is right.

Of course, you can show a warning or tooltip in the front end when the user types a "strange" email, to help him/her avoid common mistakes such as no dot in the domain part or spaces in the name without quoting. But you must accept the address "hello@world" if the user really wants it.

Also, remember that the email address standard has evolved and can evolve again, so you can't just write some "standard-valid" regexp once and for all. And remember that some real-world mail servers deviate from details of the common standard and in fact work with their own "modified standard".

So, just check for @, hint the user on the front end, and send verification emails to the given address.
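
A minimal Python sketch of that flow, assuming the actual sending is done by a caller-supplied `send` callable (hypothetical, as are the function names). It only covers the @ check and generating the activation code.

```python
import re
import secrets

# Step 1: at least one non-@ character on each side of a single @.
MINIMAL = re.compile(r"[^@]+@[^@]+")


def looks_like_email(address: str) -> bool:
    return MINIMAL.fullmatch(address) is not None


def start_verification(address, send):
    """Send an activation code and return it, or return None if the @ check fails.

    `send` is a hypothetical callable(address, code) that delivers the mail.
    """
    if not looks_like_email(address):
        return None
    code = secrets.token_urlsafe(16)  # unguessable activation code
    send(address, code)
    return code
```

The returned code would be stored next to the address and compared when the user clicks the activation link.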

4

Just about every regular expression I've seen, including some used by Microsoft, will not allow the following valid email to get through: [email protected]

I just had a real customer with an email address in this format who couldn't place an order.

Here's what I settled on:

  • A minimal regular expression that won't have false negatives. Alternatively use the MailAddress constructor with some additional checks (see below):
  • Checking for common typos such as .cmo or .gmial.com and asking for confirmation: "Are you sure this is your correct email address? It looks like there may be a mistake." Allow the user to accept what they typed if they are sure.
  • Handling bounces when the email is actually sent and manually verifying them to check for obvious mistakes.

try
{
    var email = new MailAddress(str);

    if (email.Host.EndsWith(".cmo"))
    {
        return EmailValidation.PossibleTypo;
    }

    if (!email.Host.EndsWith(".") && email.Host.Contains("."))
    {
        return EmailValidation.OK;
    }
}
catch
{
    return EmailValidation.Invalid;
}
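
The typo check from the second bullet could look something like this in Python. The typo tables are made-up examples (an assumption), and `suggest_correction` is a hypothetical name; in practice you would build the tables from your own bounce data.

```python
# Hypothetical typo tables (assumption); extend them from real bounce data.
TLD_TYPOS = {".cmo": ".com", ".con": ".com", ".ocm": ".com"}
DOMAIN_TYPOS = {"gmial.com": "gmail.com"}


def suggest_correction(address):
    """Return a likely intended address, or None if nothing looks like a typo."""
    local, sep, host = address.rpartition("@")
    if not sep or not local or not host:
        return None
    host = host.lower()
    if host in DOMAIN_TYPOS:
        return f"{local}@{DOMAIN_TYPOS[host]}"
    for wrong, right in TLD_TYPOS.items():
        if host.endswith(wrong):
            return f"{local}@{host[:-len(wrong)]}{right}"
    return None
```

A non-None result is only a hint for the confirmation prompt; the user can still keep what they typed.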

5 Comments

This answer is misleading and unrelated to question. Allowing users to enter wrong email is a business decision, question is about validating it with regex.
The first answer to this post does pass [email protected] just fine.
What programming language? C#? Java? Something else?
The .gmial.com example is not in the example code.
I have never ever seen "Gmail" misspelled as "Gmial".
4

According to RFC 2821 and RFC 2822, the local-part of an email address may use any of these ASCII characters:

  1. Uppercase and lowercase letters
  2. The digits 0 through 9
  3. The characters, !#$%&'*+-/=?^_`{|}~
  4. The character "." provided that it is not the first or last character in the local-part.
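
The four rules listed above can be checked without a regex at all. A small Python sketch (the function name is hypothetical, and it covers only the unquoted form and only these four rules, so consecutive dots are not rejected):

```python
import string

# Rules 1-3: letters, digits, and the listed special characters.
ALLOWED = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_valid_unquoted_local_part(local: str) -> bool:
    """Check an unquoted local-part against the four rules listed above."""
    if not local:
        return False
    # Rule 4: "." is allowed, but not as the first or last character.
    if local[0] == "." or local[-1] == ".":
        return False
    return all(ch == "." or ch in ALLOWED for ch in local)
```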

Matches:

Non-Matches:

For one that is RFC 2821 and 2822 compliant, you can use:

^((([!#$%&'*+\-/=?^_`{|}~\w])|([!#$%&'*+\-/=?^_`{|}~\w][!#$%&'*+\-/=?^_`{|}~\.\w]{0,}[!#$%&'*+\-/=?^_`{|}~\w]))[@]\w+([-.]\w+)*\.\w+([-.]\w+)*)$

Email - RFC 2821, 2822 Compliant

1 Comment

Why doesn't it work on Håkan.Söderström@malmö.se ?
4

Although very detailed answers have already been added, I think they are too complex for a developer who is just looking for a simple method to validate an email address or to get all email addresses from a string in Java.

public static boolean isEmailValid(@NonNull String email) {
    return android.util.Patterns.EMAIL_ADDRESS.matcher(email).matches();
}

As far as the regular expression is concerned, I always use this one, which works for my problems.

"[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}"

If you are looking to find all email addresses in a string by matching the email regular expression, you can find a method at this link.

2 Comments

Re "which works for my problems.": What would those problems be? What are some examples of false positives and false negatives? How do you handle those?
What programming language? Java? This was comment number 2 and question number 2.
4

I always use the below regular expression to validate the email address. It covers all formats of email addresses based on English language characters.

"\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)\Z";

Given below is a C# example:

Add the using directive:

using System.Text.RegularExpressions;

and use the below method to pass the email address and get a boolean in return

private bool IsValidEmail(string email) {
    // The pattern is written in lower case, so it is matched case-insensitively below.
    const string pattern = @"\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)\Z";

    // Regex.IsMatch throws on null input, so reject null and empty strings first.
    return !string.IsNullOrEmpty(email) && Regex.IsMatch(email, pattern, RegexOptions.IgnoreCase);
}

This method validates the email string passed in the parameter. It returns false when the parameter is null, an empty string, or not a valid email address. It only returns true when the parameter contains a valid email address string.

4 Comments

Does this code accept "Håkan.Söderström@malmö.se" or "试@例子.测试.مثال.آزمایشی" emails?
It's for standard Email Servers with standard characters. In case of non English language one should have to make its own customized ReGex.
Regex and email spec includes UTF-8, hence illogical response.
In what way is it the best regular expression? Most comprehensive? Simplest? Fewest false negatives? Fewest false positives? The fastest? Fewest number of user complaints in actual real-world use? Some combination of these properties? Something else? Please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar - the answer should appear as if it was written today).
3

I would not suggest using a regex at all; email addresses are way too complicated for that. This is a common problem, so I would guess there are many libraries that contain a validator. If you use Java, the EmailValidator of Apache Commons Validator is a good one.

3

Here is the one I've built. It is not a bulletproof version, but it is 'simple' and checks almost everything.

[\w+-]+(?:\.[\w+-]+)*@[\w+-]+(?:\.[\w+-]+)*(?:\.[a-zA-Z]{2,4})

I think an explanation is in order so you can modify it if you want:

(e) [\w+-]+ matches a-z, A-Z, 0-9, _, +, - at least one time

(m) (?:\.[\w+-]+)* matches a-z, A-Z, 0-9, _, +, - zero or more times, but each group needs to start with a . (dot)

@ = @

(i) [\w+-]+ matches a-z, A-Z, 0-9, _, +, - at least one time

(l) (?:\.[\w+-]+)* matches a-z, A-Z, 0-9, _, +, - zero or more times, but each group needs to start with a . (dot)

(com) (?:\.[a-zA-Z]{2,4}) matches a-z, A-Z 2 to 4 times, starting with a . (dot)

giving e(.m)@i(.l).com where (.m) and (.l) are optional but also can be repeated multiple times.
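
For anyone who wants to tweak the pieces separately, the same expression can be assembled from named parts. This is just a Python sketch of the breakdown above; the constant names are mine, and `re.ASCII` is an assumption to keep `\w` limited to ASCII letters, digits and underscore.

```python
import re

WORD = r"[\w+-]+"              # (e) and (i): one or more word chars, "+" or "-"
DOTTED = rf"(?:\.{WORD})*"     # (m) and (l): zero or more ".word" groups
TLD = r"(?:\.[a-zA-Z]{2,4})"   # (com): a dot followed by 2 to 4 letters

# e(.m)@i(.l).com
EMAIL = re.compile(rf"{WORD}{DOTTED}@{WORD}{DOTTED}{TLD}", re.ASCII)


def matches(address: str) -> bool:
    return EMAIL.fullmatch(address) is not None
```

Changing `{2,4}` in `TLD` is the place to start if longer top-level domains need to pass.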

I think this validates all valid email addresses while blocking potentially invalid ones, without using an overly complex regular expression, which is unnecessary in most cases.

Notice this will allow [email protected], but that is the compromise for keeping it simple.

1 Comment

Thanks! This worked for me. Here is a tested C/C++ escaped version used with Qt5: QRegExp rx("[\\w+-]+(?:\\.[\\w+-]+)*@[\\w+-]+(?:\\.[\\w+-]+)*(?:\\.[a-zA-Z]{2,})");
3

I’ve had a similar desire: wanting a quick check for syntax in email addresses without going overboard (the Mail::RFC822::Address answer which is the obviously correct one) for an email send utility. I went with this (I’m a POSIX regular expression person, so I don’t normally use \d and such from PCRE, as they make things less legible to me):

preg_match("_^[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*@[0-9A-Za-z]([-0-9A-Za-z]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([-0-9A-Za-z]{0,61}[0-9A-Za-z])?)*\$_", $adr)

This is RFC-correct, but it explicitly excludes the obsolete forms as well as direct IP addresses (IP addresses and legacy IP addresses both), which someone in the target group of that utility (mostly: people who bother us in #sendmail on IRC) would not normally want or need anyway.

IDNs (internationalised domain names) are explicitly not in the scope of email: addresses like “foo@cäcilienchor-bonn.de” must be written “[email protected]” on the wire instead (this includes mailto: links in HTML and such fun), only the GUI is allowed to display (and accept then convert) such names to (and from) the user.
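
The display-form to wire-form conversion can be sketched in Python with the built-in "idna" codec. Note that this codec implements the older IDNA 2003 rules, so treat it as an illustration rather than a complete solution; the function names are hypothetical.

```python
def to_wire_form(address: str) -> str:
    """Convert the domain part to its ASCII (xn--) form for use on the wire."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no @ in address")
    # Python's built-in "idna" codec follows IDNA 2003 (assumption: good enough here).
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


def to_display_form(address: str) -> str:
    """Convert an ASCII (xn--) domain back to Unicode for showing in a GUI."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no @ in address")
    return f"{local}@{domain.encode('ascii').decode('idna')}"
```

Only the domain is converted; the local part is passed through untouched.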

2 Comments

Re "legacy IP addresses": Do you mean IPv4 IP addresses?
@PeterMortensen: (thanks for the syntax highlighting and English fixes, but something seems to be broken now, it says community wiki with you as author?) yes, legacy IP addresses is what IPv4 addresses have been called for a couple of years now, IP addresses are IPv6 addresses.
3

If you want to improve on a regex that has been working reasonably well over several years, then the answer depends on what exactly you want to achieve - what kinds of email addresses have been failing. Fine-tuning email regexes is very difficult, and I have yet to see a perfect solution.

  • If your application involves something very technical in nature (or something internal to organizations), then maybe you need to support IP addresses instead of domain names, or comments in the "local" part of the email address.
  • If your application is multinational, I would consider focusing on Unicode and UTF-8 support.

The leading answer to your question currently links to a "fully RFC‑822–compliant regex". However, in spite of the complexity of that regex and its presumed attention to detail in RFC rules, it completely fails when it comes to Unicode support.

The regex that I've written for most of my applications focuses on Unicode support, as well as reasonably good overall adherence to RFC standards:

/^(?!\.)((?!.*\.{2})[a-zA-Z0-9\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0250-\u02AF\u0300-\u036F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0530-\u058F\u0590-\u05FF\u0600-\u06FF\u0700-\u074F\u0750-\u077F\u0780-\u07BF\u07C0-\u07FF\u0900-\u097F\u0980-\u09FF\u0A00-\u0A7F\u0A80-\u0AFF\u0B00-\u0B7F\u0B80-\u0BFF\u0C00-\u0C7F\u0C80-\u0CFF\u0D00-\u0D7F\u0D80-\u0DFF\u0E00-\u0E7F\u0E80-\u0EFF\u0F00-\u0FFF\u1000-\u109F\u10A0-\u10FF\u1100-\u11FF\u1200-\u137F\u1380-\u139F\u13A0-\u13FF\u1400-\u167F\u1680-\u169F\u16A0-\u16FF\u1700-\u171F\u1720-\u173F\u1740-\u175F\u1760-\u177F\u1780-\u17FF\u1800-\u18AF\u1900-\u194F\u1950-\u197F\u1980-\u19DF\u19E0-\u19FF\u1A00-\u1A1F\u1B00-\u1B7F\u1D00-\u1D7F\u1D80-\u1DBF\u1DC0-\u1DFF\u1E00-\u1EFF\u1F00-\u1FFFu20D0-\u20FF\u2100-\u214F\u2C00-\u2C5F\u2C60-\u2C7F\u2C80-\u2CFF\u2D00-\u2D2F\u2D30-\u2D7F\u2D80-\u2DDF\u2F00-\u2FDF\u2FF0-\u2FFF\u3040-\u309F\u30A0-\u30FF\u3100-\u312F\u3130-\u318F\u3190-\u319F\u31C0-\u31EF\u31F0-\u31FF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FFF\uA000-\uA48F\uA490-\uA4CF\uA700-\uA71F\uA800-\uA82F\uA840-\uA87F\uAC00-\uD7AF\uF900-\uFAFF\.!#$%&'*+-/=?^_`{|}~\-\d]+)@(?!\.)([a-zA-Z0-9\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0250-\u02AF\u0300-\u036F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0530-\u058F\u0590-\u05FF\u0600-\u06FF\u0700-\u074F\u0750-\u077F\u0780-\u07BF\u07C0-\u07FF\u0900-\u097F\u0980-\u09FF\u0A00-\u0A7F\u0A80-\u0AFF\u0B00-\u0B7F\u0B80-\u0BFF\u0C00-\u0C7F\u0C80-\u0CFF\u0D00-\u0D7F\u0D80-\u0DFF\u0E00-\u0E7F\u0E80-\u0EFF\u0F00-\u0FFF\u1000-\u109F\u10A0-\u10FF\u1100-\u11FF\u1200-\u137F\u1380-\u139F\u13A0-\u13FF\u1400-\u167F\u1680-\u169F\u16A0-\u16FF\u1700-\u171F\u1720-\u173F\u1740-\u175F\u1760-\u177F\u1780-\u17FF\u1800-\u18AF\u1900-\u194F\u1950-\u197F\u1980-\u19DF\u19E0-\u19FF\u1A00-\u1A1F\u1B00-\u1B7F\u1D00-\u1D7F\u1D80-\u1DBF\u1DC0-\u1DFF\u1E00-\u1EFF\u1F00-\u1FFF\u20D0-\u20FF\u2100-\u214F\u2C00-\u2C5F\u2C60-\u2C7F\u2C80-\u2CFF\u2D00-\u2D2F\u2D30-\u2D7F\u2D80-\u2DDF\u2F00-\u2FDF\u2FF0-\u2FFF\u3040-\u309F\
u30A0-\u30FF\u3100-\u312F\u3130-\u318F\u3190-\u319F\u31C0-\u31EF\u31F0-\u31FF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FFF\uA000-\uA48F\uA490-\uA4CF\uA700-\uA71F\uA800-\uA82F\uA840-\uA87F\uAC00-\uD7AF\uF900-\uFAFF\-\.\d]+)((\.([a-zA-Z\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0250-\u02AF\u0300-\u036F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0530-\u058F\u0590-\u05FF\u0600-\u06FF\u0700-\u074F\u0750-\u077F\u0780-\u07BF\u07C0-\u07FF\u0900-\u097F\u0980-\u09FF\u0A00-\u0A7F\u0A80-\u0AFF\u0B00-\u0B7F\u0B80-\u0BFF\u0C00-\u0C7F\u0C80-\u0CFF\u0D00-\u0D7F\u0D80-\u0DFF\u0E00-\u0E7F\u0E80-\u0EFF\u0F00-\u0FFF\u1000-\u109F\u10A0-\u10FF\u1100-\u11FF\u1200-\u137F\u1380-\u139F\u13A0-\u13FF\u1400-\u167F\u1680-\u169F\u16A0-\u16FF\u1700-\u171F\u1720-\u173F\u1740-\u175F\u1760-\u177F\u1780-\u17FF\u1800-\u18AF\u1900-\u194F\u1950-\u197F\u1980-\u19DF\u19E0-\u19FF\u1A00-\u1A1F\u1B00-\u1B7F\u1D00-\u1D7F\u1D80-\u1DBF\u1DC0-\u1DFF\u1E00-\u1EFF\u1F00-\u1FFF\u20D0-\u20FF\u2100-\u214F\u2C00-\u2C5F\u2C60-\u2C7F\u2C80-\u2CFF\u2D00-\u2D2F\u2D30-\u2D7F\u2D80-\u2DDF\u2F00-\u2FDF\u2FF0-\u2FFF\u3040-\u309F\u30A0-\u30FF\u3100-\u312F\u3130-\u318F\u3190-\u319F\u31C0-\u31EF\u31F0-\u31FF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FFF\uA000-\uA48F\uA490-\uA4CF\uA700-\uA71F\uA800-\uA82F\uA840-\uA87F\uAC00-\uD7AF\uF900-\uFAFF]){2,63})+)$/i

I'll avoid copy-pasting complete answers, so I'll just link this to a similar answer I provided here: How to validate a unicode email?

There is also a live demo available for the regex above at: http://jsfiddle.net/aossikine/qCLVH/3/

3

The regular expressions posted for this question are out of date now, because of the new generic top-level domains (gTLDs) coming in (e.g. .london, .basketball, .通販). To validate an email address there are two answers (that would be relevant to the vast majority).

  1. As the main answer says - don't use a regular expression. Just validate it by sending an email to the address (catch exceptions for invalid addresses)
  2. Use a very generic regex to at least make sure that they are using an email structure like {something}@{something}.{something}. There's no point in going for a detailed regex, because you won't catch them all and there'll be a new batch in a few years and you'll have to update your regular expression again.

I have decided to use the regular expression because, unfortunately, some users don't read forms and put the wrong data in the wrong fields. This will at least alert them when they try to put something which isn't an email into the email input field and should save you some time supporting users on email issues.

(.+)@(.+){2,}\.(.+){2,}

3 Comments

What is the difference between a gTLD and a TLD?
They're all the same really, just categorised differently. There are Country Code TLDs (ccTLD), like .co.uk or .fr. These are assigned to each country and are one factor that helps search engines understand the location/target audience. Sponsored TLDs (sTLD) are assigned to organisations or governments, e.g. .gov. The generics (gTLD) cover the extensions which are generic, e.g. .com, .london, .mail, etc. There are some restrictions on which ones you can use, and prices can be very different, but Google also says it doesn't matter too much whether you're on a .com or a .whatever.
The initial {2,} constraint will cause valid 1-letter domains to fail, e.g.: [email protected]
3

Following is the regular expression for validating an email address:

^.+@\w+(\.\w+)+$

1 Comment

Given all the previous answers, such a simple regular expression requires an explanation (e.g., why wasn't the huge complexity in the previous answers necessary?). What are its properties? What does it fail for? What are some examples that it does work for? What are some examples that it doesn't work for? Please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar - the answer should appear as if it was written today).
3

This answer largely and directly addresses multiple issues in the currently highest upvoted answer.

This answer also reinterprets and optimizes the regex as used in WebKit for example, for the Email Input Type.

The explicit ordering of certain parts of the expressions and characters/ranges in character classes is in some cases intentional. For example, some parts of the patterns have been intentionally optimized for lower-case alpha, digits, and upper-case alpha in that order, assuming that to cover the most frequent usages, and even if not frequent, then as canonicalized.

First, an attempt at a potentially correct implementation, barring any errata (currently, still under active development and improvement):

Email (RFCs 5322 and 5321 interpreted for Internet addresses)

  1. This validation RegExp for JavaScript as specified, once the foldable white space is stripped, is for addr-spec, used as Mailbox. I believe this is the most common validation use-case.
  2. Emphasis on INTERNET as opposed to Intranet or Local. If you need local/intranet host name version, please substitute the domain portion with your own.
  3. quoted-string is intentionally not allowed to be empty. This may be a slight deviation from the RFC as strictly defined. If anyone thinks this should not be so, please comment.
  4. I have interpreted the specs or inferred as follows: The maximum length of the domain portion of an email address is intended to be 254, excluding an implicit (usually omitted) final period (dot: ".") for the domain root. I interpret that this is intended to leave room in a 256 string buffer for the longest domain part, an implicit final period/dot, and a null terminator, as follows: 254 (full domain name without the final dot) + 1 (final dot) + 1 (\0 or \x00 etc. null terminator) = 256. The local part should have a max length of 64.
  5. CFWS omitted, as per spec, but strip the white space from the pattern (except the single space in the quoted-pair character class) before use, as your environment (such as JavaScript) requires. I will add a one-liner once I have finalized the expression.
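
The length interpretation in point 4 is easy to check separately from the regex. A Python sketch under that interpretation (the function and constant names are mine; the 63-character label limit is an assumption taken from the `{0,61}` quantifiers in the patterns, and lengths are counted in characters, which equals bytes only for ASCII):

```python
MAX_LOCAL = 64    # local part
MAX_DOMAIN = 254  # full domain name without the final root dot
MAX_LABEL = 63    # one leading char + {0,61} + one trailing char (assumption)


def within_length_limits(address: str) -> bool:
    """Check the local/domain/label length limits described in point 4."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if domain.endswith("."):
        domain = domain[:-1]  # drop the usually omitted root dot
    if len(local) > MAX_LOCAL or len(domain) > MAX_DOMAIN:
        return False
    return all(1 <= len(label) <= MAX_LABEL for label in domain.split("."))
```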

Domain part for Intranet/Local:

(?=.{1,254}$)[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-zA-Z]([a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*

Full and Expanded RegExp:

^(?:
    [-^-~/-9A-Z!#-'*+=?]+(?:\.[-^-~/-9A-Z!#-'*+=?]+)*
    |
    "
        (?:
            [!#-[\]-~]
            |
            \\[ -~\t]
        )+
    "
)@(?:
    (?=.{4,254}$)(?:[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)\.)+[a-zA-Z][a-z0-9A-Z-]{0,61}[a-z0-9A-Z]
    |
    \[
        (?:25[0-5]|(?:1[0-9]|2[0-4]|[1-9]|)[0-9])(?:\.(?:25[0-5]|(?:1[0-9]|2[0-4]|[1-9]|)[0-9])){3}
        |
        [a-zA-Z0-9-]*[a-zA-Z0-9]:[!-Z^-~]
    \]
)$

For comparison, the original, as seems to have been reposted by @DouglasDaseeco:

^(?:
    [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
    |
    "
        (?:
            [\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
            |
            \\[\x01-\x09\x0b\x0c\x0e-\x7f]
        )*
    "
)@(?:
    (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
    |
    \[
        (?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])
        |
        [a-z0-9-]*[a-z0-9]:
            (?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)
    \]
)$

From the WhatWG

Specification

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

Optimized: Strict #1

/^[--9^-~A-Z!#-'*+=?]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

Alternative

/^[--9^-~A-Z!#-'*+=?]{1,64}@(?=.{1,254}$)[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*$/

From the WebKit project; this is practically the same as WHATWG

Original:

^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

Optimized:

^[--9^-~A-Z!#-'*+=?]+@[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*$

Alternative:

^[--9^-~A-Z!#-'*+=?]{1,64}@(?=.{1,254}$)[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)(?:\.[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*$
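
The alternative form adds two length limits: at most 64 characters in the local part and at most 254 in the domain. A minimal Python sketch of those boundaries follows; the helper `accepts` and the generated test addresses are my own.

```python
import re

# "Alternative" pattern copied from above; split only for readability.
LIMITED = re.compile(
    r"^[--9^-~A-Z!#-'*+=?]{1,64}@(?=.{1,254}$)"
    r"[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|)"
    r"(?:\.[a-z0-9A-Z](?:[a-z0-9A-Z-]{0,61}[a-z0-9A-Z]|))*$"
)


def accepts(address: str) -> bool:
    return LIMITED.fullmatch(address) is not None


# Three 63-character labels plus one 62-character label and 3 dots = 254.
domain_254 = ".".join(["a" * 63] * 3 + ["a" * 62])
# Four 63-character labels and 3 dots = 255, one character too long.
domain_255 = ".".join(["a" * 63] * 4)

print(accepts("a" * 64 + "@example.com"), accepts("a" * 65 + "@example.com"))
print(accepts("user@" + domain_254), accepts("user@" + domain_255))
```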

Some definitions extracted from RFC 5322

address         = mailbox / group
mailbox         = name-addr / addr-spec
name-addr       = [display-name] angle-addr
angle-addr      = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
group           = display-name ":" [group-list] ";" [CFWS]
display-name    = phrase
mailbox-list    = (mailbox *("," mailbox)) / obs-mbox-list
address-list    = (address *("," address)) / obs-addr-list
group-list      = mailbox-list / CFWS / obs-group-list

addr-spec       = local-part "@" domain
local-part      = dot-atom / quoted-string / obs-local-part
domain          = dot-atom / domain-literal / obs-domain
domain-literal  = [CFWS] "[" *([FWS] dtext) "]" [CFWS]
dtext           = %d33-90 / %d94-126 / obs-dtext
                        ; Printable US-ASCII characters not including "[", "]", or "\"

quoted-string   = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]
qcontent        = qtext / quoted-pair
qtext           = %d33 / %d35-91 / %d93-126 / obs-qtext
                        ; Printable US-ASCII characters not including "\" or the quote character

dot-atom        = [CFWS] dot-atom-text [CFWS]
dot-atom-text   = 1*atext *("." atext)
atext           = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" /
                    "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~"

Some definitions expanded or reinterpreted

dtext           = !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuvwxyz{|}~
                = !-Z^-~
qtext           = !#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~
                = !#-[\]-~
atext           = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-/=?^_`{|}~
                = -^-~/-9A-Z!#-'*+=?

DIGIT           = %x30-39               ; 0-9
                = 0-9
                = \d
ALPHA           = %x41-5A / %x61-7A     ; A-Z / a-z
                = A-Za-z
VCHAR           = %x21-7E               ; Visible (printing) characters
                = !-~
WSP             = SP / HTAB             ; White space
                = [ \t]
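
The compact ranges above can be checked mechanically against the numeric ranges in the RFC 5322 grammar. A small Python sketch, where the helper `chars` is my own and the obs-* alternatives are deliberately left out:

```python
import re
import string


def chars(char_class: str) -> set:
    """Return every ASCII character matched by a regex character class."""
    return {chr(c) for c in range(128) if re.fullmatch(char_class, chr(c))}


# Compact classes from the expanded definitions above.
dtext = chars(r"[!-Z^-~]")
qtext = chars(r"[!#-[\]-~]")
atext = chars(r"[-^-~/-9A-Z!#-'*+=?]")

# The same sets built directly from the RFC 5322 ABNF.
rfc_dtext = {chr(c) for c in [*range(33, 91), *range(94, 127)]}
rfc_qtext = {chr(c) for c in [33, *range(35, 92), *range(93, 127)]}
rfc_atext = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")

print(dtext == rfc_dtext, qtext == rfc_qtext, atext == rfc_atext)
```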

Seeming issues in the accepted, upvoted answer

  • The original-original answer does not seem to be regex.
  • Part of the answer seems to deal with parsing the whole message including headers/content/body. The whole spec is irrelevant. You have to go through all the RFCs and specs for the obscure points, but keep focus on the addr-spec.

Parentheses seem to be mismatched or mispositioned

Last part of the IPv4 pattern seems to have been grouped together with the address-literal pattern.
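
The effect of that grouping can be reproduced with a reduced version of the bracketed part. In this Python sketch the names are mine, the pieces are assembled from the octet alternatives of the one-line form of the original, and the general-literal tail uses the shorter `[!-Z^-~]+` class to keep the example small.

```python
import re

OCTET_ALTS = r"25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?"
OCTET = "(?:" + OCTET_ALTS + ")"
GENERAL = r"[a-z0-9-]*[a-z0-9]:[!-Z^-~]+"

# Grouping as in the original: the general literal is an alternative
# to the LAST octet only, so it still needs three octets in front of it.
ORIGINAL_GROUPING = r"\[(?:" + OCTET + r"\.){3}(?:" + OCTET_ALTS + "|" + GENERAL + r")\]"

# Regrouped: a complete IPv4 literal OR a complete general literal.
REGROUPED = r"\[(?:(?:" + OCTET + r"\.){3}" + OCTET + "|" + GENERAL + r")\]"

for literal in ("[1.2.3.4]", "[ipv6:abc]", "[1.2.3.ipv6:abc]"):
    print(
        literal,
        bool(re.fullmatch(ORIGINAL_GROUPING, literal)),
        bool(re.fullmatch(REGROUPED, literal)),
    )
```

With the original grouping the hybrid `[1.2.3.ipv6:abc]` is accepted while the plain `[ipv6:abc]` is not, which is the opposite of what the grammar intends.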

Control characters should be prohibited

Including but not limited to \x7f which is equal to ASCII 127 or DEL.

RFC 5321 section 4.1.2:

Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters (octets with the high order bit set to one) or ASCII "control characters" (decimal value 0-31 and 127). These characters MUST NOT be used in MAIL or RCPT commands or other commands that require mailbox names.
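
A quick Python check of which ASCII control characters each quoted-string class lets through. The variable names are mine; the two classes are copied from the original and from the revised pattern.

```python
import re

ORIGINAL_QTEXT = r"[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]"
REVISED_QTEXT = r"[!#-[\]-~]"

# Control characters as defined in the RFC 5321 text: 0-31 and 127.
CONTROLS = [chr(code) for code in [*range(32), 127]]

original_controls = {c for c in CONTROLS if re.fullmatch(ORIGINAL_QTEXT, c)}
revised_controls = {c for c in CONTROLS if re.fullmatch(REVISED_QTEXT, c)}

print("original accepts DEL:", "\x7f" in original_controls)
print("control characters accepted by revised class:", len(revised_controls))
```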

Control characters are not allowed in address-literal

RFC 5321 sections 4.1.2, 4.1.3:

address-literal  = "[" ( IPv4-address-literal /
                 IPv6-address-literal /
                 General-address-literal ) "]"

IPv4-address-literal  = Snum 3("."  Snum)

IPv6-address-literal  = "IPv6:" IPv6-addr

General-address-literal  = Standardized-tag ":" 1*dcontent

Standardized-tag  = Ldh-str
                  ; Standardized-tag MUST be specified in a
                  ; Standards-Track RFC and registered with IANA

Ldh-str        = *( ALPHA / DIGIT / "-" ) Let-dig

If these solutions are cross-checked and peer-verified, anyone may incorporate this info into the original community-wiki answer, with appropriate credit.

1 Comment

This answer is underappreciated... Thanks!
2

A regex that does exactly what the standards say is allowed, according to what I've seen about them, is this:

/^(?!(^[.-].*|.*[.-]@|.*\.{2,}.*)|^.{254}.+@)([a-z\xC0-\xFF0-9!#$%&'*+\/=?^_`{|}~.-]+@)(?!.{253}.+$)((?!-.*|.*-\.)([a-z0-9-]{1,63}\.)+[a-z]{2,63}|(([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])\.){3}([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9]))$/gim

Demo / Debuggex analysis (interactive)

Split up:

^(?!(^[.-].*|.*[.-]@|.*\.{2,}.*)|^.{254}.+@)
([a-z\xC0-\xFF0-9!#$%&'*+\/=?^_`{|}~.-]+@)
(?!.{253}.+$)
(
    (?!-.*|.*-\.)
    ([a-z0-9-]{1,63}\.)+
    [a-z]{2,63}
    |
    (([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])\.){3}
    ([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])
)$

Analysis:

(?!(^[.-].*|.*[.-]@|.*\.{2,}.*)|^.{254}.+@)

Negative lookahead that rejects an address starting with . or -, a local part ending with . or -, .. anywhere in the address, or more than 254 characters before the @


([a-z\xC0-\xFF0-9!#$%&'*+\/=?^_`{|}~.-]+@)

matching 1 or more of the permitted characters, with the negative lookahead applying to it


(?!.{253}.+$)

Negative lookahead for the domain name part, restricting it to 253 characters in total


(?!-.*|.*-\.)

Negative lookahead for the domain name, which must not start with - or contain a label ending with - (that is, a - directly before a .)


([a-z0-9-]{1,63}\.)+

simple group match for the allowed characters in a domain name, which are limited to 63 characters each


[a-z]{2,63}

simple group match for the allowed top-level domain, which currently still is restricted to letters only, but does include >4 letter TLDs.


(([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])\.){3}
([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])

the alternative for domain names: this matches the first 3 numbers in an IP address with a . behind it, and then the fourth number in the IP address without . behind it.
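
For readers who want to try the pieces described above, here is a minimal Python sketch. It assumes Python's `re` module, which has no `g` flag, so only case-insensitivity is carried over, and `re.fullmatch` is used on one address at a time. The helper name `is_valid` is my own.

```python
import re

# The pattern from the answer, split along the same lines as the analysis.
PATTERN = re.compile(
    r"^(?!(^[.-].*|.*[.-]@|.*\.{2,}.*)|^.{254}.+@)"
    r"([a-z\xC0-\xFF0-9!#$%&'*+\/=?^_`{|}~.-]+@)"
    r"(?!.{253}.+$)"
    r"("
    r"(?!-.*|.*-\.)"
    r"([a-z0-9-]{1,63}\.)+"
    r"[a-z]{2,63}"
    r"|"
    r"(([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])\.){3}"
    r"([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])"
    r")$",
    re.IGNORECASE,
)


def is_valid(address: str) -> bool:
    return PATTERN.fullmatch(address) is not None


for address in (
    "john.doe@example.com",
    "john@192.168.0.1",
    ".john@example.com",
    "john..doe@example.com",
    "john@-example.com",
    "john@example-.com",
):
    print(address, is_valid(address))
```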

1 Comment

Don't use this. It will reject international domains like "öåüñ". blog.cloudflare.com/non-latinutf8-domains-now-fully-supported
2

As per my understanding, it will most probably be covered by...

/^([a-z0-9_-]+)(@[a-z0-9-]+)(\.[a-z]+|\.[a-z]+\.[a-z]+)?$/is

3 Comments

Improvements and suggestions always act as a catalyst, so please share yours; they help me improve as well.
Gmail users often use . and + in their email nick, and some comments on this page mention ' and !.
This is too restrictive, and does not permit numbers in domain names, characters in the user part. o'[email protected], [email protected], and [email protected] are all valid email addresses that this does not validate.
2

I found a regular expression that is compliant with RFC 2822, the predecessor of RFC 5322. This regular expression appears to perform fairly well and will cover most cases; however, with RFC 5322 becoming the standard, there may be some holes that ought to be plugged.

^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$

The documentation says you shouldn't use the above regular expression, but instead favour this flavour, which is a bit more manageable.

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

I noticed this is case-sensitive, so I altered it to accept uppercase letters and anchored it at both ends.

^[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?$
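
An alternative to widening every character class is to keep the lowercase pattern and let the regex engine fold case. This is a Python sketch under that assumption; `re.ASCII` is added so that case folding stays within ASCII (otherwise a few non-ASCII characters, such as the Kelvin sign, can match `[a-z]` under `re.IGNORECASE`). The names are my own.

```python
import re

# Lowercase pattern from above, matched with a case-insensitive flag.
EMAIL = re.compile(
    r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    r"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?",
    re.IGNORECASE | re.ASCII,
)

# Same pattern without the flag, for comparison.
EMAIL_CASE_SENSITIVE = re.compile(EMAIL.pattern)

address = "John.Doe@Example.COM"
print(bool(EMAIL.fullmatch(address)), bool(EMAIL_CASE_SENSITIVE.fullmatch(address)))
```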

2

A new top-level domain, "yandex", was recently added. Possible emails: [email protected]. Uppercase letters are also supported, so a slightly modified version of acrosman's solution is:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,6})$
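
A short Python sketch of what this pattern does and does not accept. The sample addresses are made up for illustration; the `{2,6}` quantifier on the last label is what admits a six-letter TLD and nothing longer.

```python
import re

PATTERN = re.compile(
    r"^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,6})$"
)


def accepts(address: str) -> bool:
    return PATTERN.fullmatch(address) is not None


for address in (
    "user@mail.yandex",          # six-letter TLD fits {2,6}
    "User.Name@Example.COM",     # uppercase letters are allowed
    "user@example.photography",  # longer TLD is rejected
    "o'brien@example.com",       # apostrophe is not in the local-part class
):
    print(address, accepts(address))
```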

2 Comments

This is too restrictive, and disallows valid email addresses like o'[email protected]
Re "acrosman's solution": User acrosman has not posted a solution or answer, only a question. What answer does this refer to?
2

Java Mail API does magic for us.

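import javax.mail.internet.InternetAddress;

// Place inside any class. Returns true when JavaMail accepts the address syntax.
static boolean isValidEmail(String email)
{
    try
    {
        InternetAddress internetAddress = new InternetAddress(email);
        internetAddress.validate();
        return true;
    }
    catch (Exception ex)
    {
        return false;
    }
}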

I got this from here.

1 Comment

Java Mail API is an optional package for use with Java SE platform and is included in the Java EE platform.
2

Writing a regular expression that covers every case takes a lot of effort. Instead, you can use the pyIsEmail package.

The text below is taken from the pyIsEmail website.

pyIsEmail is a no-nonsense approach for checking whether that user-supplied email address could be real.

Regular expressions are cheap to write, but often require maintenance when new top-level domains come out or don’t conform to email addressing features that come back into vogue. pyIsEmail allows you to validate an email address – and even check the domain, if you wish – with one simple call, making your code more readable and faster to write. When you want to know why an email address doesn’t validate, they even provide you with a diagnosis.

Usage

For the simplest usage, import and use the is_email function:

from pyisemail import is_email

address = "[email protected]"
bool_result = is_email(address)
detailed_result = is_email(address, diagnose=True)

You can also check whether the domain used in the email is a valid domain and whether or not it has a valid MX record:

from pyisemail import is_email

address = "[email protected]"
bool_result_with_dns = is_email(address, check_dns=True)
detailed_result_with_dns = is_email(address, check_dns=True, diagnose=True)

These are primary indicators of whether an email address can even be issued at that domain. However, a valid response here is not a guarantee that the email exists, merely that it can exist.

In addition to the base is_email functionality, you can also use the validators by themselves. Check the validator source doc to see how this works.

2 Comments

Re "...when new top-level domains come out": Aren't there literally thousands by now?
This sounds more like an advert. What does it actually do? What is the gist? Does it go live over the Internet to do some lookups or checks (that involves some DNS stuff)? Effectively trying to send the email to see what happens? Or something else?
2

I did not find any answer that validates the top-level domain name, but it should be considered.

So for me the following worked:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}AAA|AARP|ABB|ABBOTT|ABOGADO|AC|ACADEMY|ACCENTURE|ACCOUNTANT|ACCOUNTANTS|ACO|ACTIVE|ACTOR|AD|ADAC|ADS|ADULT|AE|AEG|AERO|AF|AFL|AG|AGENCY|AI|AIG|AIRFORCE|AIRTEL|AL|ALIBABA|ALIPAY|ALLFINANZ|ALSACE|AM|AMICA|AMSTERDAM|ANALYTICS|ANDROID|AO|APARTMENTS|APP|APPLE|AQ|AQUARELLE|AR|ARAMCO|ARCHI|ARMY|ARPA|ARTE|AS|ASIA|ASSOCIATES|AT|ATTORNEY|AU|AUCTION|AUDI|AUDIO|AUTHOR|AUTO|AUTOS|AW|AX|AXA|AZ|AZURE|BA|BAIDU|BAND|BANK|BAR|BARCELONA|BARCLAYCARD|BARCLAYS|BARGAINS|BAUHAUS|BAYERN|BB|BBC|BBVA|BCN|BD|BE|BEATS|BEER|BENTLEY|BERLIN|BEST|BET|BF|BG|BH|BHARTI|BI|BIBLE|BID|BIKE|BING|BINGO|BIO|BIZ|BJ|BLACK|BLACKFRIDAY|BLOOMBERG|BLUE|BM|BMS|BMW|BN|BNL|BNPPARIBAS|BO|BOATS|BOEHRINGER|BOM|BOND|BOO|BOOK|BOOTS|BOSCH|BOSTIK|BOT|BOUTIQUE|BR|BRADESCO|BRIDGESTONE|BROADWAY|BROKER|BROTHER|BRUSSELS|BS|BT|BUDAPEST|BUGATTI|BUILD|BUILDERS|BUSINESS|BUY|BUZZ|BV|BW|BY|BZ|BZH|CA|CAB|CAFE|CAL|CALL|CAMERA|CAMP|CANCERRESEARCH|CANON|CAPETOWN|CAPITAL|CAR|CARAVAN|CARDS|CARE|CAREER|CAREERS|CARS|CARTIER|CASA|CASH|CASINO|CAT|CATERING|CBA|CBN|CC|CD|CEB|CENTER|CEO|CERN|CF|CFA|CFD|CG|CH|CHANEL|CHANNEL|CHAT|CHEAP|CHLOE|CHRISTMAS|CHROME|CHURCH|CI|CIPRIANI|CIRCLE|CISCO|CITIC|CITY|CITYEATS|CK|CL|CLAIMS|CLEANING|CLICK|CLINIC|CLINIQUE|CLOTHING|CLOUD|CLUB|CLUBMED|CM|CN|CO|COACH|CODES|COFFEE|COLLEGE|COLOGNE|COM|COMMBANK|COMMUNITY|COMPANY|COMPARE|COMPUTER|COMSEC|CONDOS|CONSTRUCTION|CONSULTING|CONTACT|CONTRACTORS|COOKING|COOL|COOP|CORSICA|COUNTRY|COUPONS|COURSES|CR|CREDIT|CREDITCARD|CREDITUNION|CRICKET|CROWN|CRS|CRUISES|CSC|CU|CUISINELLA|CV|CW|CX|CY|CYMRU|CYOU|CZ|DABUR|DAD|DANCE|DATE|DATING|DATSUN|DAY|DCLK|DE|DEALER|DEALS|DEGREE|DELIVERY|DELL|DELTA|DEMOCRAT|DENTAL|DENTIST|DESI|DESIGN|DEV|DIAMONDS|DIET|DIGITAL|DIRECT|DIRECTORY|DISCOUNT|DJ|DK|DM|DNP|DO|DOCS|DOG|DOHA|DOMAINS|DOOSAN|DOWNLOAD|DRIVE|DUBAI|DURBAN|DVAG|DZ|EARTH|EAT|EC|EDEKA|EDU|EDUCATION|EE|EG|EMAIL|EMERCK|ENERGY|ENGINEER|ENGINEERING|ENTERPRISES|EPSON|EQUIPMENT|ER|
ERNI|ES|ESQ|ESTATE|ET|EU|EUROVISION|EUS|EVENTS|EVERBANK|EXCHANGE|EXPERT|EXPOSED|EXPRESS|FAGE|FAIL|FAIRWINDS|FAITH|FAMILY|FAN|FANS|FARM|FASHION|FAST|FEEDBACK|FERRERO|FI|FILM|FINAL|FINANCE|FINANCIAL|FIRESTONE|FIRMDALE|FISH|FISHING|FIT|FITNESS|FJ|FK|FLIGHTS|FLORIST|FLOWERS|FLSMIDTH|FLY|FM|FO|FOO|FOOTBALL|FORD|FOREX|FORSALE|FORUM|FOUNDATION|FOX|FR|FRESENIUS|FRL|FROGANS|FUND|FURNITURE|FUTBOL|FYI|GA|GAL|GALLERY|GAME|GARDEN|GB|GBIZ|GD|GDN|GE|GEA|GENT|GENTING|GF|GG|GGEE|GH|GI|GIFT|GIFTS|GIVES|GIVING|GL|GLASS|GLE|GLOBAL|GLOBO|GM|GMAIL|GMO|GMX|GN|GOLD|GOLDPOINT|GOLF|GOO|GOOG|GOOGLE|GOP|GOT|GOV|GP|GQ|GR|GRAINGER|GRAPHICS|GRATIS|GREEN|GRIPE|GROUP|GS|GT|GU|GUCCI|GUGE|GUIDE|GUITARS|GURU|GW|GY|HAMBURG|HANGOUT|HAUS|HEALTH|HEALTHCARE|HELP|HELSINKI|HERE|HERMES|HIPHOP|HITACHI|HIV|HK|HM|HN|HOCKEY|HOLDINGS|HOLIDAY|HOMEDEPOT|HOMES|HONDA|HORSE|HOST|HOSTING|HOTELES|HOTMAIL|HOUSE|HOW|HR|HSBC|HT|HU|HYUNDAI|IBM|ICBC|ICE|ICU|ID|IE|IFM|IINET|IL|IM|IMMO|IMMOBILIEN|IN|INDUSTRIES|INFINITI|INFO|ING|INK|INSTITUTE|INSURANCE|INSURE|INT|INTERNATIONAL|INVESTMENTS|IO|IPIRANGA|IQ|IR|IRISH|IS|ISELECT|IST|ISTANBUL|IT|ITAU|IWC|JAGUAR|JAVA|JCB|JE|JETZT|JEWELRY|JLC|JLL|JM|JMP|JO|JOBS|JOBURG|JOT|JOY|JP|JPRS|JUEGOS|KAUFEN|KDDI|KE|KFH|KG|KH|KI|KIA|KIM|KINDER|KITCHEN|KIWI|KM|KN|KOELN|KOMATSU|KP|KPN|KR|KRD|KRED|KW|KY|KYOTO|KZ|LA|LACAIXA|LAMBORGHINI|LAMER|LANCASTER|LAND|LANDROVER|LANXESS|LASALLE|LAT|LATROBE|LAW|LAWYER|LB|LC|LDS|LEASE|LECLERC|LEGAL|LEXUS|LGBT|LI|LIAISON|LIDL|LIFE|LIFEINSURANCE|LIFESTYLE|LIGHTING|LIKE|LIMITED|LIMO|LINCOLN|LINDE|LINK|LIVE|LIVING|LIXIL|LK|LOAN|LOANS|LOL|LONDON|LOTTE|LOTTO|LOVE|LR|LS|LT|LTD|LTDA|LU|LUPIN|LUXE|LUXURY|LV|LY|MA|MADRID|MAIF|MAISON|MAKEUP|MAN|MANAGEMENT|MANGO|MARKET|MARKETING|MARKETS|MARRIOTT|MBA|MC|MD|ME|MED|MEDIA|MEET|MELBOURNE|MEME|MEMORIAL|MEN|MENU|MEO|MG|MH|MIAMI|MICROSOFT|MIL|MINI|MK|ML|MM|MMA|MN|MO|MOBI|MOBILY|MODA|MOE|MOI|MOM|MONASH|MONEY|MONTBLANC|MORMON|MORTGAGE|MOSCOW|MOTORCYCLES|MOV|MOVIE|MOVISTAR|MP|MQ|MR|MS|MT|MTN|MTPC|MTR|MU|MUSEUM|MUTUELLE|MV|MW|MX|MY|MZ|NA|NA
DEX|NAGOYA|NAME|NAVY|NC|NE|NEC|NET|NETBANK|NETWORK|NEUSTAR|NEW|NEWS|NEXUS|NF|NG|NGO|NHK|NI|NICO|NINJA|NISSAN|NL|NO|NOKIA|NORTON|NOWRUZ|NP|NR|NRA|NRW|NTT|NU|NYC|NZ|OBI|OFFICE|OKINAWA|OM|OMEGA|ONE|ONG|ONL|ONLINE|OOO|ORACLE|ORANGE|ORG|ORGANIC|ORIGINS|OSAKA|OTSUKA|OVH|PA|PAGE|PAMPEREDCHEF|PANERAI|PARIS|PARS|PARTNERS|PARTS|PARTY|PE|PET|PF|PG|PH|PHARMACY|PHILIPS|PHOTO|PHOTOGRAPHY|PHOTOS|PHYSIO|PIAGET|PICS|PICTET|PICTURES|PID|PIN|PING|PINK|PIZZA|PK|PL|PLACE|PLAY|PLAYSTATION|PLUMBING|PLUS|PM|PN|POHL|POKER|PORN|POST|PR|PRAXI|PRESS|PRO|PROD|PRODUCTIONS|PROF|PROMO|PROPERTIES|PROPERTY|PROTECTION|PS|PT|PUB|PW|PY|QA|QPON|QUEBEC|RACING|RE|READ|REALTOR|REALTY|RECIPES|RED|REDSTONE|REDUMBRELLA|REHAB|REISE|REISEN|REIT|REN|RENT|RENTALS|REPAIR|REPORT|REPUBLICAN|REST|RESTAURANT|REVIEW|REVIEWS|REXROTH|RICH|RICOH|RIO|RIP|RO|ROCHER|ROCKS|RODEO|ROOM|RS|RSVP|RU|RUHR|RUN|RW|RWE|RYUKYU|SA|SAARLAND|SAFE|SAFETY|SAKURA|SALE|SALON|SAMSUNG|SANDVIK|SANDVIKCOROMANT|SANOFI|SAP|SAPO|SARL|SAS|SAXO|SB|SBS|SC|SCA|SCB|SCHAEFFLER|SCHMIDT|SCHOLARSHIPS|SCHOOL|SCHULE|SCHWARZ|SCIENCE|SCOR|SCOT|SD|SE|SEAT|SECURITY|SEEK|SELECT|SENER|SERVICES|SEVEN|SEW|SEX|SEXY|SFR|SG|SH|SHARP|SHELL|SHIA|SHIKSHA|SHOES|SHOW|SHRIRAM|SI|SINGLES|SITE|SJ|SK|SKI|SKIN|SKY|SKYPE|SL|SM|SMILE|SN|SNCF|SO|SOCCER|SOCIAL|SOFTBANK|SOFTWARE|SOHU|SOLAR|SOLUTIONS|SONY|SOY|SPACE|SPIEGEL|SPREADBETTING|SR|SRL|ST|STADA|STAR|STARHUB|STATEFARM|STATOIL|STC|STCGROUP|STOCKHOLM|STORAGE|STUDIO|STUDY|STYLE|SU|SUCKS|SUPPLIES|SUPPLY|SUPPORT|SURF|SURGERY|SUZUKI|SV|SWATCH|SWISS|SX|SY|SYDNEY|SYMANTEC|SYSTEMS|SZ|TAB|TAIPEI|TAOBAO|TATAMOTORS|TATAR|TATTOO|TAX|TAXI|TC|TCI|TD|TEAM|TECH|TECHNOLOGY|TEL|TELEFONICA|TEMASEK|TENNIS|TF|TG|TH|THD|THEATER|THEATRE|TICKETS|TIENDA|TIFFANY|TIPS|TIRES|TIROL|TJ|TK|TL|TM|TMALL|TN|TO|TODAY|TOKYO|TOOLS|TOP|TORAY|TOSHIBA|TOURS|TOWN|TOYOTA|TOYS|TR|TRADE|TRADING|TRAINING|TRAVEL|TRAVELERS|TRAVELERSINSURANCE|TRUST|TRV|TT|TUBE|TUI|TUSHU|TV|TW|TZ|UA|UBS|UG|UK|UNIVERSITY|UNO|UOL|US|UY|UZ|VA|VACATIONS|VANA|VC|VE|VEGAS|VENTURES|VERISIGN|VERSICHERUN
G|VET|VG|VI|VIAJES|VIDEO|VILLAS|VIN|VIP|VIRGIN|VISION|VISTA|VISTAPRINT|VIVA|VLAANDEREN|VN|VODKA|VOLKSWAGEN|VOTE|VOTING|VOTO|VOYAGE|VU|WALES|WALTER|WANG|WANGGOU|WATCH|WATCHES|WEATHER|WEBCAM|WEBER|WEBSITE|WED|WEDDING|WEIR|WF|WHOSWHO|WIEN|WIKI|WILLIAMHILL|WIN|WINDOWS|WINE|WME|WORK|WORKS|WORLD|WS|WTC|WTF|XBOX|XEROX|XIN|XN--11B4C3D|XN--1QQW23A|XN--30RR7Y|XN--3BST00M|XN--3DS443G|XN--3E0B707E|XN--3PXU8K|XN--42C2D9A|XN--45BRJ9C|XN--45Q11C|XN--4GBRIM|XN--55QW42G|XN--55QX5D|XN--6FRZ82G|XN--6QQ986B3XL|XN--80ADXHKS|XN--80AO21A|XN--80ASEHDB|XN--80ASWG|XN--90A3AC|XN--90AIS|XN--9DBQ2A|XN--9ET52U|XN--B4W605FERD|XN--C1AVG|XN--C2BR7G|XN--CG4BKI|XN--CLCHC0EA0B2G2A9GCD|XN--CZR694B|XN--CZRS0T|XN--CZRU2D|XN--D1ACJ3B|XN--D1ALF|XN--ECKVDTC9D|XN--EFVY88H|XN--ESTV75G|XN--FHBEI|XN--FIQ228C5HS|XN--FIQ64B|XN--FIQS8S|XN--FIQZ9S|XN--FJQ720A|XN--FLW351E|XN--FPCRJ9C3D|XN--FZC2C9E2C|XN--G2XX48C|XN--GECRJ9C|XN--H2BRJ9C|XN--HXT814E|XN--I1B6B1A6A2E|XN--IMR513N|XN--IO0A7I|XN--J1AEF|XN--J1AMH|XN--J6W193G|XN--JLQ61U9W7B|XN--KCRX77D1X4A|XN--KPRW13D|XN--KPRY57D|XN--KPU716F|XN--KPUT3I|XN--L1ACC|XN--LGBBAT1AD8J|XN--MGB9AWBF|XN--MGBA3A3EJT|XN--MGBA3A4F16A|XN--MGBAAM7A8H|XN--MGBAB2BD|XN--MGBAYH7GPA|XN--MGBB9FBPOB|XN--MGBBH1A71E|XN--MGBC0A9AZCG|XN--MGBERP4A5D4AR|XN--MGBPL2FH|XN--MGBT3DHD|XN--MGBTX2B|XN--MGBX4CD0AB|XN--MK1BU44C|XN--MXTQ1M|XN--NGBC5AZD|XN--NGBE9E0A|XN--NODE|XN--NQV7F|XN--NQV7FS00EMA|XN--NYQY26A|XN--O3CW4H|XN--OGBPF8FL|XN--P1ACF|XN--P1AI|XN--PBT977C|XN--PGBS0DH|XN--PSSY2U|XN--Q9JYB4C|XN--QCKA1PMC|XN--QXAM|XN--RHQV96G|XN--S9BRJ9C|XN--SES554G|XN--T60B56A|XN--TCKWE|XN--UNUP4Y|XN--VERMGENSBERATER-CTB|XN--VERMGENSBERATUNG-PWB|XN--VHQUV|XN--VUQ861B|XN--WGBH1C|XN--WGBL6A|XN--XHQ521B|XN--XKC2AL3HYE2A|XN--XKC2DL3A5EE0H|XN--Y9A3AQ|XN--YFRO4I67O|XN--YGBI2AMMX|XN--ZFR164B|XPERIA|XXX|XYZ|YACHTS|YAMAXUN|YANDEX|YE|YODOBASHI|YOGA|YOKOHAMA|YOUTUBE|YT|ZA|ZARA|ZERO|ZIP|ZM|ZONE|ZUERICH|ZW)\b

That easily discarded emails like [email protected], [email protected], etc.

The domain name can be further edited if needed, e.g., specific country domain, etc.

Another list of top level domains that updates frequently.
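
Because such lists change often, one option is to keep the TLDs out of the regular expression and check the last label against a set loaded from a list. The following is a minimal Python sketch; both function names are hypothetical, the sample text is made up, and the assumed file layout (one TLD per line, `#` for comment lines) is my assumption about how such plain-text lists are published.

```python
def parse_tld_list(text: str) -> set[str]:
    """Collect lowercase TLDs from text with one TLD per line.

    Blank lines and lines starting with '#' are skipped (assumed layout).
    """
    tlds = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            tlds.add(line.lower())
    return tlds


def has_known_tld(address: str, tlds: set[str]) -> bool:
    """True if the last label of the domain part is in the given set."""
    _, at_sign, domain = address.rpartition("@")
    if not at_sign or "." not in domain:
        return False
    return domain.rpartition(".")[2].lower() in tlds


# Made-up sample; a real list would be loaded from a file that is refreshed regularly.
SAMPLE_LIST = "# sample list\nCOM\nORG\nYANDEX\nXN--P1AI\n"
TLDS = parse_tld_list(SAMPLE_LIST)

print(has_known_tld("user@mail.yandex", TLDS))
print(has_known_tld("user@example.notatld", TLDS))
```

This only checks the final label, so it would be combined with one of the syntax patterns above rather than used on its own.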

3 Comments

As pointed out in multiple comments on other answers here already, the list of valid TLDs is growing rapidly. Your "2-letter ccTLD or one of big-6, info, mobi, etc" would have been reasonable five years ago, but no longer works at all reliably.
Even at time of original writing, this was already invalid by a couple hundred TLD's. As of currently, you're missing a little under 1200 possibilities (and growing at a pretty regular rate) Current list of valid domains: data.iana.org/TLD/tlds-alpha-by-domain.txt
Nearly 8,000 characters in a regular expression? There is something seriously wrong. Can't it be layered, refactored, split into several pieces, or similar instead of one big regular expression? There is presumably a lot of redundancy in it.
