Regular expression for email addresses (replacing the Visual Studio 2008 default).

Published 9 January 2009 11:19 AM | Phil Gilmore

Email validation regular expression

Phil Gilmore

I recently had a user report that they couldn't register on my client's site.  The site's registration page reported that their email address was invalid.  The email address was indeed strange, but valid nonetheless.  We were using a regular expression validator control to check the email address on registration.  The problem was that the regular expression the Visual Studio designer put in there for email addresses didn't recognize it.

Here is the regular expression that the Visual Studio 2008 designer conveniently put into the page on our behalf.

\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

 

This has the following problems:

  • Rules only apply at a single point in the input.  Any characters can precede or follow a valid email address, including spaces, punctuation characters, periods, @ symbols, etc.  For example, the input ~!@#$%^&*()_+=-[]{}';":/.,?>< nobody@nowhere.com~! @#$%^&*()_+=-[]{}';":/.,?>< would pass this validation.
  • User name may contain multiple consecutive periods.  For example, the input nobody..lives........here@nowhere.com would pass this validation.
  • Domain name may contain multiple consecutive periods after the first single period.  For example, the input nobody@nowhere.co....uk would pass this validation.
  • Many valid email addresses will not pass this validation.  For example, the input nobody-@nowhere.com would fail.
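
The four cases above can be reproduced with a short script. This is only a sketch in JavaScript, assuming the pattern is applied as a plain unanchored search (here via test()); the variable names are illustrative, and the first sample is a shortened stand-in for the punctuation-heavy input above.

```javascript
// The Visual Studio 2008 default pattern, copied verbatim from the post.
const vsDefault = /\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/;

// An unanchored search succeeds as soon as ANY substring matches.
const samples = [
  "~! nobody@nowhere.com ~!",               // junk before and after
  "nobody..lives........here@nowhere.com",  // repeated periods in the user name
  "nobody@nowhere.co....uk",                // repeated periods in the domain
  "nobody-@nowhere.com",                    // valid address that gets rejected
];

const results = samples.map((input) => ({ input, passes: vsDefault.test(input) }));
for (const r of results) {
  console.log(r.passes, r.input);
}
```

The first three inputs pass because a well-formed substring such as here@nowhere.com is enough to satisfy the search; the last one fails because the pattern requires a word character immediately before the @.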

 

Obviously, there are some flaws here.  If you are familiar with regular expressions, you can see immediately that it's missing the ^ and $ anchors, for example.  Rather than try to massage this one into compliance, I started with a new one.  There are probably a million of these on the web, and no doubt some are better than mine, but I thought I'd blog it anyway since it's done and working.  Here is what I came up with.

^([\w-_]+\.)*[\w-_]+@([\w-_]+\.)*[\w-_]+\.[\w-_]+$

 

This regular expression has the following attributes:

  1. User name must be one character or more.
  2. User name may contain one or more periods.
  3. User name must not begin or end with a period.
  4. Double contiguous periods are not allowed in the user name.
  5. User name must only contain characters a-z, A-Z, 0-9, hyphens, underscores and single periods.
  6. Domain name must be three characters or more.
  7. Domain name must contain one or more periods.
  8. Domain name may not begin or end with a period.
  9. Double contiguous periods are not allowed in the domain name.
  10. Domain name must only contain characters a-z, A-Z, 0-9, hyphens, underscores and single periods.
  11. A single @ symbol is required between the user name and the domain name.
  12. Allowed special characters (hyphen and underscore) are permitted at any frequency anywhere in the user name or domain name.
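
The attributes above can be spot-checked with a small table of inputs. This is a sketch in JavaScript rather than .NET, and the sample addresses are made up for illustration. Note that JavaScript's \w is ASCII-only, so behavior on non-ASCII letters may differ from the validator control.

```javascript
// The replacement pattern, copied verbatim from the post.
const emailPattern = /^([\w-_]+\.)*[\w-_]+@([\w-_]+\.)*[\w-_]+\.[\w-_]+$/;

// Hypothetical samples, each paired with the result the attribute list predicts.
const cases = [
  ["nobody@nowhere.com", true],     // plain address
  ["no.body@nowhere.co.uk", true],  // single periods on both sides
  ["nobody-@nowhere.com", true],    // hyphen allowed anywhere
  ["a@b.c", true],                  // shortest accepted form
  [".nobody@nowhere.com", false],   // user name begins with a period
  ["nobody.@nowhere.com", false],   // user name ends with a period
  ["no..body@nowhere.com", false],  // contiguous periods in the user name
  ["nobody@nowhere", false],        // domain has no period
  ["nobody@nowhere..com", false],   // contiguous periods in the domain
  ["nobody@@nowhere.com", false],   // more than one @
];

// Collect every sample whose actual result differs from the predicted one.
const mismatches = cases.filter(
  ([input, expected]) => emailPattern.test(input) !== expected
);
console.log(mismatches.length === 0 ? "all samples behave as listed" : mismatches);
```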

Of course, you may choose to add more allowed special characters.   Although I've never seen one in an email address, you may need to allow the plus (+) character, for example.  This is easy.  Just add \+ to each character class, keeping the hyphen last so that it is not read as a range: change all instances of [\w-_] to [\w\+_-] and it will permit it.
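
One thing to watch when extending the character class: pasting \+ directly after \w yields [\w\+-_], where +-_ is read as a range running from + to _ and quietly admits characters such as @ and the period. The sketch below shows the difference in JavaScript; the variable names are illustrative, and I am assuming other regex flavors apply the same range rule.

```javascript
// Literal substitution of \w with \w\+ : "+-_" is now a range from "+" to "_".
const naive = /^[\w\+-_]+$/;

// Hyphen moved to the end of the class, where it is taken literally.
const safer = /^[\w+_-]+$/;

console.log(naive.test("a@b"));               // true: "@" falls inside the range
console.log(safer.test("a@b"));               // false
console.log(safer.test("first+tag-name_1"));  // true
```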

I tested this regular expression against all the email addresses in the client's user database (about 5500 addresses).  All email addresses that failed were either obviously invalid or outright blank (they had been imported from another process and never validated against a regular expression).  Out of 5500 email addresses, only 9 non-blank addresses failed, and all of them were invalid.

Phil Gilmore (www.interactiveasp.net)

Comments

# Naveen said on June 9, 2009 7:59 AM:

Hi, I used this regular expression to validate the email ID in JavaScript, but it is not working correctly: \w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*  Please offer me a correct expression!

Here is the code I used for validation:

// The function header was missing from the pasted code; the name is assumed.
function validateEmail()
{
    var box = document.getElementById("TextBox1");
    var email = box.value;

    // Reject an empty field before running the pattern.
    if (email == "")
    {
        alert("Enter U R E-Mail Id.");
        box.focus();
        return false;
    }

    //var mail=/^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;
    var mail = /^[\w][\w\.-]*[\w]@[\w][\w\.-]*[\w]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$/;

    // Test the value read above. The pasted code tested
    // document.frm.email.value, a different field from TextBox1.
    if (!mail.test(email))
    {
        alert("Invalid E-mail Address! Please Re-enter.");
        box.value = "";
        box.focus();
        return false;
    }

    return true;
}

# Bear said on June 30, 2009 2:49 AM:

.NET treats expressions used with regular expression validator controls as if they begin with ^ and end with $ even if they don't, which is why the anchors aren't in the default.
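
A rough way to picture that behavior is a check that accepts a value only when the match spans the entire input. The helper below is a hypothetical JavaScript sketch of that idea, not the validator's actual source; matchesWholeInput and designerPattern are names made up for illustration.

```javascript
// Hypothetical helper: valid only when the first match covers the whole value.
function matchesWholeInput(pattern, value) {
  const m = pattern.exec(value);
  return m !== null && m.index === 0 && m[0].length === value.length;
}

// The designer's default pattern, still without ^ and $.
const designerPattern = /\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/;

console.log(matchesWholeInput(designerPattern, "nobody@nowhere.com"));                     // true
console.log(matchesWholeInput(designerPattern, "nobody..lives........here@nowhere.com")); // false
console.log(matchesWholeInput(designerPattern, "nobody@nowhere.co....uk"));               // false
```

Under this kind of whole-input check, the repeated-period inputs from the post are rejected even though the pattern itself carries no anchors.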

\w already encompasses letters, numbers, AND UNDERSCORES, so [\w-_] could be written as [\w-] instead.  Plus, including an unescaped hyphen anywhere other than the end is bad practise: if neither side is a class shorthand, you will either allow all characters between those on either side of the hyphen (but not the hyphen itself, unless it is part of that range) or get a big fat error if the range is impossible.
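
Both points can be checked quickly. The sketch below uses JavaScript, so the exact error type is an assumption specific to that engine; the probe characters and variable names are illustrative.

```javascript
// \w already covers the underscore, so the two classes should agree.
const original = /^[\w-_]+$/;
const simplified = /^[\w-]+$/;
const probes = ["a", "Z", "7", "_", "-", ".", "@", "+"];
const sameBehaviour = probes.every(
  (ch) => original.test(ch) === simplified.test(ch)
);
console.log(sameBehaviour); // true

// An impossible range (start after end) is rejected when the pattern is compiled.
let rangeError = null;
try {
  new RegExp("[z-a]");
} catch (e) {
  rangeError = e;
}
console.log(rangeError instanceof SyntaxError); // true
```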

Beside all this I recommend you read the RFCs that apply to email addresses and domains as your list of target features misses the mark by a considerable margin even if your own database doesn't exploit any of the missing options.