FILTER_SANITIZE vs FILTER_VALIDATE: What's the Difference, and Which to Use

It depends on what your application needs. Validation tells you "Yes, this is (or isn't) a valid float", while sanitization strips any unacceptable characters and returns the cleaned value, without telling you whether the original input was valid to begin with.

The same applies to the other FILTER_SANITIZE_* and FILTER_VALIDATE_* constants, but in this example we'll look at floating-point validation and sanitization, as asked in the original question.

Let's take a look!

$float = 0.032;
$not_float = "0.03b2";

var_dump(filter_var($float, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION));
var_dump(filter_var($not_float, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION));

var_dump(filter_var($float, FILTER_VALIDATE_FLOAT));
var_dump(filter_var($not_float, FILTER_VALIDATE_FLOAT));

The output of the above dumps would be:

string(5) "0.032" // $float     FILTER_SANITIZE_NUMBER_FLOAT
string(5) "0.032" // $not_float FILTER_SANITIZE_NUMBER_FLOAT
float(0.032)      // $float     FILTER_VALIDATE_FLOAT
bool(false)       // $not_float FILTER_VALIDATE_FLOAT

FILTER_SANITIZE_NUMBER_FLOAT returns the sanitized value as a string (PHP is loosely typed, so "0.032" == 0.032 holds under loose comparison).

Also note the FILTER_FLAG_ALLOW_FRACTION flag, which keeps the decimal point in place (without that flag the result would be "0032").
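
To make the effect of the flag concrete, here is a small sketch that sanitizes the same input with and without it. The second input string and the variable names are made up for illustration; FILTER_FLAG_ALLOW_THOUSAND is a further standard flag that keeps the comma.

```php
<?php
$input = '0.03b2';

// No flag: everything except digits, "+" and "-" is removed, including the dot.
$no_flag = filter_var($input, FILTER_SANITIZE_NUMBER_FLOAT);

// FILTER_FLAG_ALLOW_FRACTION: the decimal point survives.
$with_fraction = filter_var($input, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION);

// Flags can be combined; here the thousands separator is kept as well.
$with_both = filter_var(
    '1,234.5x',
    FILTER_SANITIZE_NUMBER_FLOAT,
    FILTER_FLAG_ALLOW_FRACTION | FILTER_FLAG_ALLOW_THOUSAND
);

var_dump($no_flag, $with_fraction, $with_both);
```

Note that a sanitized string such as "1,234.5" is not necessarily something you can feed straight into arithmetic; sanitizing only controls which characters remain.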

As you can see, FILTER_VALIDATE_FLOAT returns boolean false if the input isn't a valid float, and the actual float value if it is (which is normally a "truthy" value). Keep in mind that 0.00 is a "falsy" value, so to check whether validation failed you should use strict comparison, in case the input was zero but still valid.

if (filter_var($input, FILTER_VALIDATE_FLOAT) === false) {
    // Oh noes! $input wasn't a valid float!
}
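
The zero pitfall is easy to reproduce. The following sketch compares a truthiness check with a strict check on a valid "0.00" input; the helper name is_valid_float is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: true when $input is a valid float, including zero.
function is_valid_float(string $input): bool
{
    return filter_var($input, FILTER_VALIDATE_FLOAT) !== false;
}

// "0.00" is valid, but the validated result is the falsy value 0.0.
$zero = filter_var('0.00', FILTER_VALIDATE_FLOAT);

// A plain truthiness test misreads the valid zero as a failure.
$loose_check_rejects_zero = !$zero;

// Strict comparison against false distinguishes "zero" from "invalid".
$strict_check_accepts_zero = ($zero !== false);
```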

You can see it for yourself in this live demo.

To conclude
If you want to use the value in calculations, you might want to validate it and let the user know when the format is invalid. Alternatively, you could sanitize it and use it anyway.
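
The two strategies can be sketched side by side. Both function names below are hypothetical; FILTER_NULL_ON_FAILURE is a standard flag that makes a failed validation return null instead of false, which sidesteps the zero pitfall.

```php
<?php
// Strict strategy: accept the input only if it already is a valid float.
function strict_float(string $input): ?float
{
    return filter_var($input, FILTER_VALIDATE_FLOAT, FILTER_NULL_ON_FAILURE);
}

// Lenient strategy: strip unacceptable characters first, then validate what is left.
function lenient_float(string $input): ?float
{
    $clean = filter_var($input, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION);
    return filter_var($clean, FILTER_VALIDATE_FLOAT, FILTER_NULL_ON_FAILURE);
}

var_dump(strict_float('0.03b2'));  // rejected: the caller can report the error
var_dump(lenient_float('0.03b2')); // silently cleaned and accepted
```

The lenient variant never tells the user that anything was wrong, so it is probably best kept for cases where a silently "repaired" number is acceptable.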

Other filters
The example here shows the usage of FILTER_SANITIZE_NUMBER_FLOAT and FILTER_VALIDATE_FLOAT, but there are other validation and sanitization filters. See the links below for a full description.

  • List of validation filters
  • List of sanitation filters

When to filter/sanitize data: before database insertion or before display?

When it comes to displaying user submitted data, the generally accepted mantra is to "Filter input, escape output."

I would recommend against escaping things like HTML entities before the data goes into the database, because you never know when HTML will not be your display medium. Also, different situations require different types of output escaping. For example, embedding a string in JavaScript requires different escaping than embedding it in HTML. Escaping ahead of time may lull you into a false sense of security.

So the basic rule of thumb is: sanitize right before use, and specifically for that use, not pre-emptively.

(Please note that I am talking about escaping output for display only, not for SQL. Data bound for an SQL string must still be escaped.)
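
As a rough illustration of "escape for the specific use", the sketch below takes one raw value and escapes it differently for an HTML context and for a JavaScript string context. The two function names are hypothetical; htmlspecialchars and json_encode with the JSON_HEX_* flags are standard PHP, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
// Escape for an HTML text or attribute context.
function escape_for_html(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Escape for embedding as a string literal inside an inline <script> block.
// The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes.
function escape_for_js(string $raw): string
{
    return json_encode(
        $raw,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// The raw value is what would be stored; escaping happens only at output time.
$stored = '<b>Tom & "Jerry"</b>';

echo '<p>' . escape_for_html($stored) . "</p>\n";
echo '<script>var comment = ' . escape_for_js($stored) . ";</script>\n";
```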

What does FILTER_SANITIZE_STRING do?

According to PHP Manual:

Strip tags, optionally strip or encode special characters.

According to W3Schools:

The FILTER_SANITIZE_STRING filter strips or encodes unwanted characters.

This filter removes data that is potentially harmful for your application. It is used to strip tags and remove or encode unwanted characters.

Now, that doesn't tell us much. Let's go see some PHP sources.

ext/filter/filter.c:

static const filter_list_entry filter_list[] = {
    /*...*/
    { "string", FILTER_SANITIZE_STRING, php_filter_string },
    { "stripped", FILTER_SANITIZE_STRING, php_filter_string },
    { "encoded", FILTER_SANITIZE_ENCODED, php_filter_encoded },
    /*...*/

Now, let's go see how php_filter_string is defined.

ext/filter/sanitizing_filters.c:

/* {{{ php_filter_string */
void php_filter_string(PHP_INPUT_FILTER_PARAM_DECL)
{
    size_t new_len;
    unsigned char enc[256] = {0};

    /* strip high/strip low ( see flags )*/
    php_filter_strip(value, flags);

    if (!(flags & FILTER_FLAG_NO_ENCODE_QUOTES)) {
        enc['\''] = enc['"'] = 1;
    }
    if (flags & FILTER_FLAG_ENCODE_AMP) {
        enc['&'] = 1;
    }
    if (flags & FILTER_FLAG_ENCODE_LOW) {
        memset(enc, 1, 32);
    }
    if (flags & FILTER_FLAG_ENCODE_HIGH) {
        memset(enc + 127, 1, sizeof(enc) - 127);
    }

    php_filter_encode_html(value, enc);

    /* strip tags, implicitly also removes \0 chars */
    new_len = php_strip_tags_ex(Z_STRVAL_P(value), Z_STRLEN_P(value), NULL, NULL, 0, 1);
    Z_STRLEN_P(value) = new_len;

    if (new_len == 0) {
        zval_dtor(value);
        if (flags & FILTER_FLAG_EMPTY_STRING_NULL) {
            ZVAL_NULL(value);
        } else {
            ZVAL_EMPTY_STRING(value);
        }
        return;
    }
}

I'll skip commenting on the flags since they're already explained on the Internet, like you said, and focus instead on what is always performed, which is not so well documented.

First - php_filter_strip. It doesn't do much, just takes the flags you pass to the function and processes them accordingly. It does the well-documented stuff.

Then we build a lookup map (enc) and call php_filter_encode_html. This is more interesting: it converts every character marked in the map to a numeric HTML entity. Quotes (" and ') are marked by default, while & and characters with ASCII codes lower than 32 or higher than 127 are marked only when the corresponding flags are set. So with FILTER_FLAG_ENCODE_AMP, & in your string becomes &#38;.
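
To illustrate the map-then-encode idea, here is a userland PHP sketch that mimics the logic. It is not the actual implementation: both function names are hypothetical, only the quote and ampersand switches are modelled, and it assumes the C function emits decimal numeric entities (which matches results such as &#39; for a single quote).

```php
<?php
// Build a 256-entry map of byte values that should be encoded,
// analogous to the enc[] array in php_filter_string.
function build_encode_map(bool $encodeQuotes = true, bool $encodeAmp = false): array
{
    $enc = array_fill(0, 256, false);
    if ($encodeQuotes) {
        $enc[ord("'")] = true;
        $enc[ord('"')] = true;
    }
    if ($encodeAmp) {
        $enc[ord('&')] = true;
    }
    return $enc;
}

// Replace every marked byte with a decimal numeric entity; copy the rest unchanged.
function encode_marked_bytes(string $value, array $enc): string
{
    $out = '';
    $len = strlen($value);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($value[$i]);
        $out .= $enc[$byte] ? '&#' . $byte . ';' : $value[$i];
    }
    return $out;
}

echo encode_marked_bytes('Tom & "Jerry"', build_encode_map(true, true)), "\n";
```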

Then comes the call to php_strip_tags_ex, which strips HTML, XML and PHP tags (according to its definition in /ext/standard/string.c) and removes NULL bytes, like the comment says.

The code that follows is used for internal string management and doesn't really do any sanitization. Well, not exactly: passing the undocumented flag FILTER_FLAG_EMPTY_STRING_NULL makes the filter return NULL instead of an empty string when the sanitized string is empty, but that is not especially useful. An example:

var_dump(filter_var("yo", FILTER_SANITIZE_STRING, FILTER_FLAG_EMPTY_STRING_NULL));
var_dump(filter_var("\0", FILTER_SANITIZE_STRING, FILTER_FLAG_EMPTY_STRING_NULL));
var_dump(filter_var("yo", FILTER_SANITIZE_STRING));
var_dump(filter_var("\0", FILTER_SANITIZE_STRING));

string(2) "yo"
NULL
string(2) "yo"
string(0) ""

There isn't much more going on, so the manual was fairly correct - to sum it up:

  • Always: strip HTML, XML and PHP tags, strip NULL bytes.
  • FILTER_FLAG_NO_ENCODE_QUOTES - Do not encode quotes (' and "), which are encoded by default.
  • FILTER_FLAG_STRIP_LOW - Strip characters with ASCII value below 32.
  • FILTER_FLAG_STRIP_HIGH - Strip characters with ASCII value above 127.
  • FILTER_FLAG_ENCODE_LOW - Encode characters with ASCII value below 32.
  • FILTER_FLAG_ENCODE_HIGH - Encode characters with ASCII value above 127.
  • FILTER_FLAG_ENCODE_AMP - Encode the & character to &#38; (not &amp;).
  • FILTER_FLAG_EMPTY_STRING_NULL - Return NULL instead of empty strings.

Is there a way to pass more than one filter flag option to a given PHP sanitize or validate filter?

You need to take the bitwise OR (| operator) of the flag values and pass that:

filter_var($data, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH);

From the manual:

options

Associative array of options or bitwise disjunction of flags

"bitwise disjunction" is another way of saying bitwise OR.

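Both forms mentioned in the manual excerpt, the bitwise OR of flags and the associative array, can be sketched as follows. FILTER_VALIDATE_INT is used here only because the effect of its flags is easy to see; the input strings and variable names are made up for illustration.

```php
<?php
// Form 1: pass the OR-ed flags directly as the third argument.
$flags = FILTER_FLAG_ALLOW_HEX | FILTER_FLAG_ALLOW_OCTAL;
$from_hex   = filter_var('0x1a', FILTER_VALIDATE_INT, $flags);
$from_octal = filter_var('017', FILTER_VALIDATE_INT, $flags);

// Without the flags, hexadecimal notation is rejected.
$no_flags = filter_var('0x1a', FILTER_VALIDATE_INT);

// Form 2: an associative array, which can carry flags and options together.
$ranged = filter_var('0xff', FILTER_VALIDATE_INT, [
    'flags'   => FILTER_FLAG_ALLOW_HEX,
    'options' => ['min_range' => 0, 'max_range' => 255],
]);

var_dump($from_hex, $from_octal, $no_flags, $ranged);
```

The array form is the one to reach for when a filter needs options (such as a range) in addition to flags.
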
What is the difference between PHP's strip_tags and filter_var functions?

strip_tags() does just that. According to PHP documentation it:

strips HTML and PHP tags from a string

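filter_var() gives you a bit more to work with, since you can use different filters with it. For example, FILTER_SANITIZE_EMAIL removes every character that is not allowed in an email address (which does not by itself guarantee that the result is a valid address).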

As for the difference between strip_tags and filter_var with FILTER_SANITIZE_STRIPPED specifically: strip_tags keeps a lone less-than symbol, while filter_var with FILTER_SANITIZE_STRIPPED removes it together with everything that follows.

I.e.:

strip_tags("testing < practice") will return "testing < practice"
filter_var("testing < practice", FILTER_SANITIZE_STRIPPED) will return "testing "
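
The strip_tags behaviour and the email filter mentioned above can be tried out with a short sketch. The sample strings and variable names are invented for illustration. FILTER_SANITIZE_STRIPPED itself is left out of the runnable code because it is deprecated as of PHP 8.1 and would raise a deprecation notice there.

```php
<?php
// strip_tags removes real tags but keeps a "<" that is followed by whitespace.
$no_tags   = strip_tags('<b>bold</b> text');
$lone_less = strip_tags('testing < practice');

// FILTER_SANITIZE_EMAIL only removes characters that are not allowed in an address...
$cleaned = filter_var('john(.doe)@exa//mple.com', FILTER_SANITIZE_EMAIL);

// ...so the result should still be validated before it is trusted.
$cleaned_is_valid = filter_var($cleaned, FILTER_VALIDATE_EMAIL) !== false;

// Sanitizing does not turn arbitrary text into a valid address.
$garbage          = filter_var('not an email', FILTER_SANITIZE_EMAIL);
$garbage_is_valid = filter_var($garbage, FILTER_VALIDATE_EMAIL) !== false;

var_dump($no_tags, $lone_less, $cleaned, $cleaned_is_valid, $garbage, $garbage_is_valid);
```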

