Does PHP's filter_var FILTER_VALIDATE_EMAIL Actually Work?

Does PHP's filter_var FILTER_VALIDATE_EMAIL actually work?

The regular expression used in the PHP 5.3.3 filter code is based on Michael Rushton's blog post about email address validation. It does seem to work for the case you mention.

You could also check out some of the options in Comparing E-mail Address Validating Regular Expressions (the regexp currently used in PHP is one of those tested).

Then you could choose a regexp you like better, and use it in a call to preg_match().
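For example, a pattern can be wrapped in a small helper around preg_match(). The pattern below is a hypothetical, deliberately conservative one written for this illustration (it is not one of the patterns from the comparison page). It only accepts unquoted dot-atom local parts and dotted domains ending in an alphabetic TLD, so it is stricter than the RFCs:

```php
<?php
// Hypothetical, deliberately conservative pattern written for this example.
// Local part: dot-separated atoms of letters, digits and common symbols.
// Domain: one or more dot-separated labels followed by an alphabetic TLD.
const CONSERVATIVE_EMAIL_REGEX =
    '/^[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*'
    . '@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$/iD';

function matchesConservativePattern(string $email): bool
{
    // preg_match() returns 1 on a match, 0 on no match and false on error.
    return preg_match(CONSERVATIVE_EMAIL_REGEX, $email) === 1;
}

var_dump(matchesConservativePattern('very.common@example.com'));         // bool(true)
var_dump(matchesConservativePattern('"much.more unusual"@example.com')); // bool(false)
```

The trade-off is the usual one: a pattern you wrote yourself is easy to reason about, but it will reject some addresses the RFCs allow.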

Or else you could take the regexp and replace the one in the file PHP/ext/filter/logical_filters.c, function php_filter_validate_email(), and rebuild PHP.

Should I use filter_var to validate email?

Yes, you should.

Using the standard library validation instead of a home-brewed one has multiple benefits:

  1. Many eyeballs (well, at least two pairs) have already looked at the code you will be using, hopefully belonging to people experienced in email validation, even before it was merged into a release.
  2. It's unit tested.
  3. Other people use the same check and report bugs, and you get the fixes for free with PHP updates.

However, checking the format of an email address is only the first line of defense. If you really want to know whether it's real, you will have to send a message to it.
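A minimal sketch of the recommended approach (the helper name isSyntacticallyValidEmail() is hypothetical). filter_var() returns the address itself on success and false on failure, so the result should be compared strictly against false:

```php
<?php
// filter_var() returns the filtered value (the address) on success and
// false on failure, so compare strictly against false.
function isSyntacticallyValidEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

$input = 'very.common@example.com';
if (isSyntacticallyValidEmail($input)) {
    // Only the format has been checked; to know the mailbox is real,
    // send a message to it (e.g. a confirmation link).
    echo "Format OK: $input\n";
} else {
    echo "Rejected: $input\n";
}
```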

PHP FILTER_VALIDATE_EMAIL does not work correctly

Validating e-mail addresses is kinda complicated.
Take a look at this list:

Valid email addresses

  1. niceandsimple@example.com
  2. very.common@example.com
  3. a.little.lengthy.but.fine@dept.example.com
  4. disposable.style.email.with+symbol@example.com
  5. user@[IPv6:2001:db8:1ff::a0b:dbd0]
  6. "much.more unusual"@example.com
  7. "very.unusual.@.unusual.com"@example.com
  8. "very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com
  9. postbox@com (top-level domains are valid hostnames)
  10. admin@mailserver1 (local domain name with no TLD)
  11. !#$%&'*+-/=?^_`{}|~@example.org
  12. "()<>[]:,;@\"!#$%&'*+-/=?^_`{}| ~.a"@example.org
  13. " "@example.org (space between the quotes)
  14. üñîçøðé@example.com (Unicode characters in local part)

Invalid email addresses

  1. Abc.example.com (an @ character must separate the local and domain
    parts)
  2. A@b@c@example.com (only one @ is allowed outside quotation marks)
  3. a"b(c)d,e:f;g<h>i[j\k]l@example.com (none of the special characters
    in this local part are allowed outside quotation marks)
  4. just"not"right@example.com (quoted strings must be dot separated, or
    the only element making up the local-part)
  5. this is"not\allowed@example.com (spaces, quotes, and backslashes may
    only exist when within quoted strings and preceded by a backslash)
  6. this\ still\"not\\allowed@example.com (even if escaped (preceded by
    a backslash), spaces, quotes, and backslashes must still be
    contained by quotes)

Source: http://en.wikipedia.org/wiki/Email_address

Almost all e-mail validation implementations are "bugged", but the PHP implementation is fine to work with because it accepts all common e-mail addresses.
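To see how this plays out on your own PHP version, a quick script along these lines can be run over entries from the lists. Only a few clear-cut cases are included in this sketch; the unusual entries are where PHP versions tend to disagree, so add them to the arrays and check the output yourself:

```php
<?php
// Common addresses from the "valid" list and structurally broken ones
// from the "invalid" list.
$common = [
    'niceandsimple@example.com',
    'very.common@example.com',
    'a.little.lengthy.but.fine@dept.example.com',
    'disposable.style.email.with+symbol@example.com',
];
$invalid = [
    'Abc.example.com',
    'A@b@c@example.com',
];

foreach (array_merge($common, $invalid) as $address) {
    $accepted = filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
    printf("%-50s %s\n", $address, $accepted ? 'accepted' : 'rejected');
}
```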

UPDATE:

Found on http://www.php.net/manual/en/filter.filters.validate.php

Regarding "partial" addresses with no . in the domain part, a comment in the source code (in ext/filter/logical_filters.c) justifies this rejection thus:

 * The regex below is based on a regex by Michael Rushton.
 * However, it is not identical. I changed it to only consider routeable
 * addresses as valid. Michael's regex considers a@b a valid address
 * which conflicts with section 2.3.5 of RFC 5321 which states that:
 *
 * Only resolvable, fully-qualified domain names (FQDNs) are permitted
 * when domain names are used in SMTP. In other words, names that can
 * be resolved to MX RRs or address (i.e., A or AAAA) RRs (as discussed
 * in Section 5) are permitted, as are CNAME RRs whose targets can be
 * resolved, in turn, to MX or address RRs. Local nicknames or
 * unqualified names MUST NOT be used.
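The practical effect is easy to check: addresses whose domain part contains no dot, such as entries 9 and 10 of the "valid" list above, are rejected by filter_var(), while the same local part with a fully-qualified domain passes. The addresses in this sketch are just examples:

```php
<?php
// Domains without a dot are not fully-qualified, so filter_var() rejects
// them (see the source comment quoted above).
$addresses = [
    'postbox@com',                   // bool(false)
    'admin@mailserver1',             // bool(false)
    'admin@mailserver1.example.com', // string(29) "admin@mailserver1.example.com"
];
foreach ($addresses as $address) {
    var_dump(filter_var($address, FILTER_VALIDATE_EMAIL));
}
```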

And here is the class from Michael Rushton (the original link is broken, so the source is reproduced below), which supports both RFC 5321 and RFC 5322:

<?php
/**
* Squiloople Framework
*
* LICENSE: Feel free to use and redistribute this code.
*
* @author Michael Rushton <michael@squiloople.com>
* @link http://squiloople.com/
* @package Squiloople
* @version 1.0
* @copyright © 2012 Michael Rushton
*/
/**
* Email Address Validator
*
* Validate email addresses according to the relevant standards
*/
final class EmailAddressValidator
{
// The RFC 5321 constant
const RFC_5321 = 5321;
// The RFC 5322 constant
const RFC_5322 = 5322;
/**
* The email address
*
* @access private
* @var string $_email_address
*/
private $_email_address;
/**
* A quoted string local part is either allowed (true) or not (false)
*
* @access private
* @var boolean $_quoted_string
*/
private $_quoted_string = FALSE;
/**
* An obsolete local part is either allowed (true) or not (false)
*
* @access private
* @var boolean $_obsolete
*/
private $_obsolete = FALSE;
/**
* A basic domain name is either required (true) or not (false)
*
* @access private
* @var boolean $_basic_domain_name
*/
private $_basic_domain_name = TRUE;
/**
* A domain literal domain is either allowed (true) or not (false)
*
* @access private
* @var boolean $_domain_literal
*/
private $_domain_literal = FALSE;
/**
* Comments and folding white spaces are either allowed (true) or not (false)
*
* @access private
* @var boolean $_cfws
*/
private $_cfws = FALSE;
/**
* Set the email address and turn on the relevant standard if required
*
* @access public
* @param string $email_address
* @param null|integer $standard
*/
public function __construct($email_address, $standard = NULL)
{
// Set the email address
$this->_email_address = $email_address;
// Set the relevant standard or throw an exception if an unknown is requested
switch ($standard)
{
// Do nothing if no standard requested
case NULL:
break;
// Otherwise if RFC 5321 requested
case self::RFC_5321:
$this->setStandard5321();
break;
// Otherwise if RFC 5322 requested
case self::RFC_5322:
$this->setStandard5322();
break;
// Otherwise throw an exception
default:
throw new Exception('Unknown RFC standard for email address validation.');
}
}
/**
* Call the constructor fluently
*
* @access public
* @static
* @param string $email_address
* @param null|integer $standard
* @return EmailAddressValidator
*/
public static function setEmailAddress($email_address, $standard = NULL)
{
return new self($email_address, $standard);
}
/**
* Validate the email address using a basic standard
*
* @access public
* @return EmailAddressValidator
*/
public function setStandardBasic()
{
// A quoted string local part is not allowed
$this->_quoted_string = FALSE;
// An obsolete local part is not allowed
$this->_obsolete = FALSE;
// A basic domain name is required
$this->_basic_domain_name = TRUE;
// A domain literal domain is not allowed
$this->_domain_literal = FALSE;
// Comments and folding white spaces are not allowed
$this->_cfws = FALSE;
// Return the EmailAddressValidator object
return $this;
}
/**
* Validate the email address using RFC 5321
*
* @access public
* @return EmailAddressValidator
*/
public function setStandard5321()
{
// A quoted string local part is allowed
$this->_quoted_string = TRUE;
// An obsolete local part is not allowed
$this->_obsolete = FALSE;
// Only a basic domain name is not required
$this->_basic_domain_name = FALSE;
// A domain literal domain is allowed
$this->_domain_literal = TRUE;
// Comments and folding white spaces are not allowed
$this->_cfws = FALSE;
// Return the EmailAddressValidator object
return $this;
}
/**
* Validate the email address using RFC 5322
*
* @access public
* @return EmailAddressValidator
*/
public function setStandard5322()
{
// A quoted string local part is disallowed
$this->_quoted_string = FALSE;
// An obsolete local part is allowed
$this->_obsolete = TRUE;
// Only a basic domain name is not required
$this->_basic_domain_name = FALSE;
// A domain literal domain is allowed
$this->_domain_literal = TRUE;
// Comments and folding white spaces are allowed
$this->_cfws = TRUE;
// Return the EmailAddressValidator object
return $this;
}
/**
* Either allow (true) or do not allow (false) a quoted string local part
*
* @access public
* @param boolean $allow
* @return EmailAddressValidator
*/
public function setQuotedString($allow = TRUE)
{
// Either allow (true) or do not allow (false) a quoted string local part
$this->_quoted_string = $allow;
// Return the EmailAddressValidator object
return $this;
}
/**
* Either allow (true) or do not allow (false) an obsolete local part
*
* @access public
* @param boolean $allow
* @return EmailAddressValidator
*/
public function setObsolete($allow = TRUE)
{
// Either allow (true) or do not allow (false) an obsolete local part
$this->_obsolete = $allow;
// Return the EmailAddressValidator object
return $this;
}
/**
* Either require (true) or do not require (false) a basic domain name
*
* @access public
* @param boolean $allow
* @return EmailAddressValidator
*/
public function setBasicDomainName($allow = TRUE)
{
// Either require (true) or do not require (false) a basic domain name
$this->_basic_domain_name = $allow;
// Return the EmailAddressValidator object
return $this;
}
/**
* Either allow (true) or do not allow (false) a domain literal domain
*
* @access public
* @param boolean $allow
* @return EmailAddressValidator
*/
public function setDomainLiteral($allow = TRUE)
{
// Either allow (true) or do not allow (false) a domain literal domain
$this->_domain_literal = $allow;
// Return the EmailAddressValidator object
return $this;
}
/**
* Either allow (true) or do not allow (false) comments and folding white spaces
*
* @access public
* @param boolean $allow
* @return EmailAddressValidator
*/
public function setCFWS($allow = TRUE)
{
// Either allow (true) or do not allow (false) comments and folding white spaces
$this->_cfws = $allow;
// Return the EmailAddressValidator object
return $this;
}
/**
* Return the regular expression for a dot atom local part
*
* @access private
* @return string
*/
private function _getDotAtom()
{
return "([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*";
}
/**
* Return the regular expression for a quoted string local part
*
* @access private
* @return string
*/
private function _getQuotedString()
{
return '"(?>[ !#-\[\]-~]|\\\[ -~])*"';
}
/**
* Return the regular expression for an obsolete local part
*
* @access private
* @return string
*/
private function _getObsolete()
{
return '([!#-\'*+\/-9=?^-~-]+|"(?>'
. $this->_getFWS()
. '(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\xFF]))*'
. $this->_getFWS()
. '")(?>'
. $this->_getCFWS()
. '\.'
. $this->_getCFWS()
. '(?1))*';
}
/**
* Return the regular expression for a domain name domain
*
* @access private
* @return string
*/
private function _getDomainName()
{
// Return the basic domain name format if required
if ($this->_basic_domain_name)
{
return '(?>' . $this->_getDomainNameLengthLimit()
. '[a-z\d](?>[a-z\d-]*[a-z\d])?'
. $this->_getCFWS()
. '\.'
. $this->_getCFWS()
. '){1,126}[a-z]{2,6}';
}
// Otherwise return the full domain name format
return $this->_getDomainNameLengthLimit()
. '([a-z\d](?>[a-z\d-]*[a-z\d])?)(?>'
. $this->_getCFWS()
. '\.'
. $this->_getDomainNameLengthLimit()
. $this->_getCFWS()
. '(?2)){0,126}';
}
/**
* Return the regular expression for an IPv6 address
*
* @access private
* @return string
*/
private function _getIPv6()
{
return '([a-f\d]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f\d][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?';
}
/**
* Return the regular expression for an IPv4-mapped IPv6 address
*
* @access private
* @return string
*/
private function _getIPv4MappedIPv6()
{
return '(?3)(?>:(?3)){5}:|(?!(?:.*[a-f\d]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?';
}
/**
* Return the regular expression for an IPv4 address
*
* @access private
* @return string
*/
private function _getIPv4()
{
return '(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?6)){3}';
}
/**
* Return the regular expression for a domain literal domain
*
* @access private
* @return string
*/
private function _getDomainLiteral()
{
return '\[(?:(?>IPv6:(?>'
. $this->_getIPv6()
. '))|(?>(?>IPv6:(?>'
. $this->_getIPv4MappedIPv6()
. '))?'
. $this->_getIPv4()
. '))\]';
}
/**
* Return either the regular expression for folding white spaces or its backreference
*
* @access private
* @param boolean $define
* @return string
*/
private function _getFWS($define = FALSE)
{
// Return the backreference if $define is set to FALSE otherwise return the regular expression
if ($this->_cfws)
{
return !$define ? '(?P>fws)' : '(?<fws>(?>(?>(?>\x0D\x0A)?[\t ])+|(?>[\t ]*\x0D\x0A)?[\t ]+)?)';
}
}
/**
* Return the regular expression for comments
*
* @access private
* @return string
*/
private function _getComments()
{
return '(?<comment>\((?>'
. $this->_getFWS()
. '(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?P>comment)))*'
. $this->_getFWS()
. '\))';
}
/**
* Return either the regular expression for comments and folding white spaces or its backreference
*
* @access private
* @param boolean $define
* @return string
*/
private function _getCFWS($define = FALSE)
{
// Return the backreference if $define is set to FALSE
if ($this->_cfws && !$define)
{
return '(?P>cfws)';
}
// Otherwise return the regular expression
if ($this->_cfws)
{
return '(?<cfws>(?>(?>(?>'
. $this->_getFWS(TRUE)
. $this->_getComments()
. ')+'
. $this->_getFWS()
. ')|'
. $this->_getFWS()
. ')?)';
}
}
/**
* Establish and return the valid format for the local part
*
* @access private
* @return string
*/
private function _getLocalPart()
{
// The local part may be obsolete if allowed
if ($this->_obsolete)
{
return $this->_getObsolete();
}
// Otherwise the local part must be either a dot atom or a quoted string if the latter is allowed
if ($this->_quoted_string)
{
return '(?>' . $this->_getDotAtom() . '|' . $this->_getQuotedString() . ')';
}
// Otherwise the local part must be a dot atom
return $this->_getDotAtom();
}
/**
* Establish and return the valid format for the domain
*
* @access private
* @return string
*/
private function _getDomain()
{
// The domain must be either a domain name or a domain literal if the latter is allowed
if ($this->_domain_literal)
{
return '(?>' . $this->_getDomainName() . '|' . $this->_getDomainLiteral() . ')';
}
// Otherwise the domain must be a domain name
return $this->_getDomainName();
}
/**
* Return the email address length limit
*
* @access private
* @return string
*/
private function _getEmailAddressLengthLimit()
{
return '(?!(?>' . $this->_getCFWS() . '"?(?>\\\[ -~]|[^"])"?' . $this->_getCFWS() . '){255,})';
}
/**
* Return the local part length limit
*
* @access private
* @return string
*/
private function _getLocalPartLengthLimit()
{
return '(?!(?>' . $this->_getCFWS() . '"?(?>\\\[ -~]|[^"])"?' . $this->_getCFWS() . '){65,}@)';
}
/**
* Establish and return the domain name length limit
*
* @access private
* @return string
*/
private function _getDomainNameLengthLimit()
{
return '(?!' . $this->_getCFWS() . '[a-z\d-]{64,})';
}
/**
* Check to see if the domain can be resolved to MX RRs
*
* @access private
* @param array $domain
* @return integer|boolean
*/
private function _verifyDomain($domain)
{
// Return 0 if the domain cannot be resolved to MX RRs
if (!checkdnsrr(end($domain), 'MX'))
{
return 0;
}
// Otherwise return true
return TRUE;
}
/**
* Perform the validation check on the email address's syntax and, if required, call _verifyDomain()
*
* @access public
* @param boolean $verify
* @return boolean|integer
*/
public function isValid($verify = FALSE)
{
// Return false if the email address has an incorrect syntax
if (!preg_match(
'/^'
. $this->_getEmailAddressLengthLimit()
. $this->_getLocalPartLengthLimit()
. $this->_getCFWS()
. $this->_getLocalPart()
. $this->_getCFWS()
. '@'
. $this->_getCFWS()
. $this->_getDomain()
. $this->_getCFWS(TRUE)
. '$/isD'
, $this->_email_address
))
{
return FALSE;
}
// Otherwise check to see if the domain can be resolved to MX RRs if required
if ($verify)
{
return $this->_verifyDomain(explode('@', $this->_email_address));
}
// Otherwise return 1
return 1;
}
}

filter_var W/ FILTER_VALIDATE_EMAIL vs custom REGEX

I would use FILTER_VALIDATE_EMAIL if you don't have a lot of experience validating email or writing regular expressions. It is also important to remember that no regex or filter will prevent emails that bounce back; they only check that the entered address looks like it should. Thus noaddress@nodomain.tld will pass the check, even though mail to it will not be delivered unless nodomain.tld happens to exist and there happens to be a mail account for the user noaddress there.

See How to check if an email address exists without sending an email?
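As a rough sketch of where syntax checking ends, the hypothetical helper below runs filter_var() and can optionally ask DNS whether the domain has an MX record, falling back to an A record (RFC 5321 treats a domain without MX records as its own implicit MX). Rushton's class above only checks MX. Even when both checks pass, only sending a message proves the mailbox exists, and checkdnsrr() performs a live lookup, so the example call leaves it switched off:

```php
<?php
// Hypothetical helper: syntax check with filter_var(), plus an optional DNS
// lookup. Neither step proves that the mailbox exists.
function emailLooksDeliverable(string $email, bool $checkDns = false): bool
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false; // malformed address
    }
    if (!$checkDns) {
        return true; // syntax only: noaddress@nodomain.tld passes here
    }
    // The domain is everything after the last '@'. Domain literals such as
    // [192.0.2.1] are not handled and will fail the lookup.
    $domain = substr($email, strrpos($email, '@') + 1);
    // Accept an MX record, or an A record as the implicit MX.
    return checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A');
}

var_dump(emailLooksDeliverable('noaddress@nodomain.tld')); // bool(true)
```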

How consistent is FILTER_VALIDATE_EMAIL?

As you can see on http://3v4l.org/vKONS, the behavior of FILTER_VALIDATE_EMAIL is not consistent across PHP versions!

http://3v4l.org/vKONS outputs this for PHP 5.2.0, 5.2.14 - 5.2.17, 5.3.3 - 5.3.18 and 5.4.0 - 5.4.8:

string(37) ""this is a valid address"@example.com" 
bool(false)

and this for 5.2.1 - 5.2.13 and 5.3.0 - 5.3.2:

string(37) ""this is a valid address"@example.com" 
string(37) ""this is a valid address"@example.com"

It is remarkable that the address was rejected by 5.2.0, accepted by 5.2.1 - 5.2.13, and then rejected again from 5.2.14 on!
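The 3v4l snippet itself isn't shown here, but its output is consistent with a script along these lines (an assumption), which you can run to see what your own PHP version does with the quoted-string address:

```php
<?php
$email = '"this is a valid address"@example.com';
var_dump($email); // string(37) ""this is a valid address"@example.com"
// Prints the address if it is accepted and bool(false) if it is rejected;
// which one you get depends on the PHP version.
var_dump(filter_var($email, FILTER_VALIDATE_EMAIL));
echo 'PHP ', PHP_VERSION, "\n";
```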

By the way, 3v4l.org is a great resource for checking such behavior changes across all available PHP versions.

There are several open bugs mentioning FILTER_VALIDATE_EMAIL, but none seems to match your kind of error. You might add it to the PHP bug tracker.

Zend_Validate_EmailAddress versus filter_var(..., FILTER_VALIDATE_EMAIL)

Both can be used to validate an email address, but Zend_Validate_EmailAddress is more powerful. While filter_var is a simple yes or no validator, there are many options which can change how strict Zend_Validate_EmailAddress is.

You can choose which parts to validate, rules for validating those parts, and even to validate MX records.

Finally, Zend_Validate_EmailAddress can be readily used in combination with Zend_Filter_Input and Zend_Form where filter_var can't.

PHP are patterns still necessary or does filter_var take care of it all

If by patterns you mean regular expressions, then the answer to your question is yes. Why? The built-in filters may not sanitize or validate your data exactly how you want. The filters may be overly broad, or they may conform too rigidly to standards for your particular circumstance. The filters may not actually conform to standards at all.

For example, FILTER_SANITIZE_EMAIL and FILTER_VALIDATE_EMAIL might allow strange email addresses that, while technically legal in the RFC sense, may be undesirable depending on your needs. It is up to you as the developer, the creator of your application, to decide what you really want to accept for an e-mail address.

The PHP filter creators understood that one size fits all is an impractical proposition. Therefore, you can supply your own sanitizing/validating filter with FILTER_CALLBACK and your own validating filter using FILTER_VALIDATE_REGEXP. Are we back at square one? Are we better off?
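Both hooks use the ordinary filter_var() call with an 'options' entry. In this sketch the "@example.com only" rule is a hypothetical policy, and the built-in trim() stands in for a custom sanitizer:

```php
<?php
// Own validating filter: FILTER_VALIDATE_REGEXP takes the pattern in the
// 'options' array and returns the value on a match, false otherwise.
$pattern = '/@example\.com$/D'; // hypothetical policy: example.com only
$companyOnly = filter_var(
    'jane.doe@example.com',
    FILTER_VALIDATE_REGEXP,
    ['options' => ['regexp' => $pattern]]
);
var_dump($companyOnly); // string(20) "jane.doe@example.com"

// Own sanitizing filter: FILTER_CALLBACK passes the value to any callable
// (here the built-in trim) and returns whatever the callable returns.
$trimmed = filter_var('  jane.doe@example.com ', FILTER_CALLBACK, ['options' => 'trim']);
var_dump($trimmed); // string(20) "jane.doe@example.com"
```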

The real question is whether you are willing to buy into the "filtering framework/methodology" established by the PHP filter system. Do I? I use their filter system as a first pass, then I use my own carefully crafted sanitizers and validators (yes, I use both FILTER_CALLBACK and FILTER_VALIDATE_REGEXP on top of the generic sanitizers/validators). This is especially true for me when processing HTML forms, as I no longer use $_POST and $_GET. I use filter_input_array().
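filter_input_array() takes a definition array that maps each field to a filter and its options. Because it reads real request data, this sketch uses filter_var_array(), which accepts the same definition for an ordinary array. The username rule is a hypothetical example of a custom validator layered next to a generic one:

```php
<?php
// One entry per field: which filter to run and with which options.
// filter_input_array(INPUT_POST, $definition) accepts the same structure.
$definition = [
    'email'    => ['filter' => FILTER_VALIDATE_EMAIL],
    // Hypothetical custom rule: 3 to 16 word characters.
    'username' => [
        'filter'  => FILTER_VALIDATE_REGEXP,
        'options' => ['regexp' => '/^\w{3,16}$/D'],
    ],
];

// Stand-in for submitted form data.
$formData = ['email' => 'very.common@example.com', 'username' => 'x'];

$result = filter_var_array($formData, $definition);
// Fields that fail are false; fields missing from the input are null.
var_dump($result);
```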

So, Mr. Smithyyy, don't reinvent the wheel, but do think for yourself. The key to using the PHP filter system is to create a system, and for some (like me) that means wrapping the filter functions in a class. Using various class properties that might store predefined filters, one could imagine a system where various methods, using loops, filter all your data, leaving you with the final output of a good array, or a bad one (which you can take action on, based on your particular circumstance). But, as Mr. Wall of the Perl community notes, "There's more than one way to do it."
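One way such a wrapper might look, as a minimal sketch: the FormFilter class and its rule format are hypothetical, not part of PHP. The predefined filters live in a property, and a loop applies them and splits the input into a good array and a bad array:

```php
<?php
// Hypothetical wrapper: predefined filters are stored in a property, and a
// loop applies them, splitting the input into good and bad values.
final class FormFilter
{
    /** @var array field name => ['filter' => int, 'options' => array (optional)] */
    private $rules;

    public function __construct(array $rules)
    {
        $this->rules = $rules;
    }

    /** Returns [good, bad]: filtered values that passed, raw values that failed. */
    public function apply(array $input): array
    {
        $good = [];
        $bad = [];
        foreach ($this->rules as $field => $rule) {
            $raw = $input[$field] ?? null;
            $options = isset($rule['options']) ? ['options' => $rule['options']] : [];
            // Missing fields fail, and so does anything the filter rejects.
            // (FILTER_VALIDATE_BOOLEAN rules would need separate handling,
            // because false is a legitimate result there.)
            $value = $raw === null ? false : filter_var($raw, $rule['filter'], $options);
            if ($value === false) {
                $bad[$field] = $raw;
            } else {
                $good[$field] = $value;
            }
        }
        return [$good, $bad];
    }
}

$filter = new FormFilter([
    'email' => ['filter' => FILTER_VALIDATE_EMAIL],
    'age'   => ['filter' => FILTER_VALIDATE_INT, 'options' => ['min_range' => 0, 'max_range' => 150]],
]);
[$good, $bad] = $filter->apply(['email' => 'very.common@example.com', 'age' => '-3']);
var_dump($good, $bad);
```

Keeping the raw value in the bad array lets the caller decide how to report each failure.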
