Strip_Tags and HTMLentities

PHP Security (strip_tags, htmlentities)

I applaud your efforts.
You should, friendly community member, consider decoupling your operations.

1) Have one function/routine/class/method for filtering input (filter_input_array(), strip_tags(), str_ireplace(), trim(), etc ...). You may want to create functions that use loops to do filtering. Tricks such as double encoding, one-time-strip-spoofing, and more can defeat single usage of things like strip_tags().

Here is a strip_tags() wrapper method from my Sanitizer class.
Notice how it compares the old value to the new value to see if they are equal. If they are not equal, it keeps on using strip_tags(). Note that quite a bit of preliminary INPUT_POST / $_POST checking is done before this method is executed, and another version of this using trim() actually runs before this one.

private function removeHtml(&$value)
{
    if (is_scalar($value)) {
        // Keep stripping until a pass changes nothing.
        do {
            $old = $value;
            $value = strip_tags($value);
        } while ($value !== $old);
    } else if (is_array($value) && !empty($value)) {
        foreach ($value as $field => &$string) {
            do {
                $old = $string;
                $string = strip_tags($string);
            } while ($string !== $old);
        }

        // Break the reference so later code cannot overwrite the last element.
        unset($string);
    } else {
        throw new Exception('The data being HTML sanitized is neither scalar nor in an array.');
    }

    return;
}

2) Have another one for validating input (filter_var_array(), preg_match(), mb_strlen(), etc ...).
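As a rough sketch of what such a validation routine could look like (the function name validateSignup() and the field names email, username, and age are made up for illustration, and it assumes every submitted value is a string):

```php
<?php
// Hypothetical validator: returns an array of error messages keyed by field.
// An empty array means the input passed every check.
function validateSignup(array $input): array
{
    $errors = [];

    // filter_var() returns false when the value is not a valid email address.
    if (filter_var($input['email'] ?? '', FILTER_VALIDATE_EMAIL) === false) {
        $errors['email'] = 'Invalid email address.';
    }

    // Whitelist: 3 to 20 ASCII letters, digits, or underscores.
    if (preg_match('/\A[A-Za-z0-9_]{3,20}\z/', $input['username'] ?? '') !== 1) {
        $errors['username'] = 'Username must be 3-20 letters, digits, or underscores.';
    }

    // Integer within an allowed range.
    $range = ['options' => ['min_range' => 18, 'max_range' => 120]];
    if (filter_var($input['age'] ?? '', FILTER_VALIDATE_INT, $range) === false) {
        $errors['age'] = 'Age must be a whole number between 18 and 120.';
    }

    return $errors;
}
```

Validation only answers "is this value acceptable?". It does not change the value, which keeps it separate from both filtering and escaping.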

Then, when your data needs to switch contexts ...

A) For databases, use prepared statements (PDO, preferably).

B) For returning / transmitting user input to the browser, escape the output with htmlentities() or htmlspecialchars() as appropriate.
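Here is a minimal sketch of A) and B) together. It assumes the pdo_sqlite driver is installed and uses an in-memory database in place of a real server; the comments table and its columns are hypothetical.

```php
<?php
// Assumption: pdo_sqlite is available. Table and column names are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (author TEXT, body TEXT)');

$author = "Robert'); DROP TABLE comments;--";
$body   = '<script>alert(1)</script>';

// A) Database context: the values travel separately from the SQL text,
//    so quotes inside them cannot change the query.
$insert = $pdo->prepare('INSERT INTO comments (author, body) VALUES (:author, :body)');
$insert->execute([':author' => $author, ':body' => $body]);

$select = $pdo->prepare('SELECT author, body FROM comments WHERE author = :author');
$select->execute([':author' => $author]);
$row = $select->fetch(PDO::FETCH_ASSOC);

// B) HTML context: escape only at the moment of output.
$html = '<p>' . htmlspecialchars($row['body'], ENT_QUOTES, 'UTF-8') . '</p>';
echo $html, PHP_EOL;
```

The data is stored exactly as submitted and only changes form when it crosses into the HTML context.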

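In terms of magic quotes, the best thing to do is just disable them in php.ini. (They were deprecated in PHP 5.3 and removed in PHP 5.4, so this only matters on very old installations.)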

Now, with those various constructs having their own areas of responsibility, all you have to do is manage the flow of logic and data inside of your handler file. This includes providing error messages to the user (when necessary) and handling errors/exceptions.

There is no need to use htmlentities() or htmlspecialchars immediately if the data is going from the HTML form directly into the database. The point of escaping data is to prevent it from being interpreted as executable instructions inside a new context. There is no danger htmlentities() or htmlspecialchars can resolve when passing data to a SQL query engine (that is why you filter and validate the input, and use (PDO) prepared statements).

However, after the data is retrieved from database tables and is directly destined for the browser, ok, now use htmlentities() or htmlspecialchars. Create a function that uses a for or foreach loop to handle that scenario.

Here is a snippet from my Escaper class

public function superHtmlSpecialChars($html)
{
    return htmlspecialchars($html, ENT_QUOTES | ENT_HTML5, 'UTF-8', false);
}

public function superHtmlEntities(&$html)
{
    $html = htmlentities($html, ENT_QUOTES | ENT_HTML5, 'UTF-8', false);
}

public function htmlSpecialCharsArray(array &$html)
{
    foreach ($html as &$value) {
        $value = $this->superHtmlSpecialChars($value);
    }

    unset($value);
}

public function htmlEntitiesArray(array &$html)
{
    foreach ($html as &$value) {
        $this->superHtmlEntities($value);
    }

    unset($value);
}

You'll have to tailor your code to your own personal tastes and situation.

Note, if you plan on processing the data before sending it to the browser, do the processing first, then escape with your handy-dandy htmlentities() or htmlspecialchars looping function.
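A small illustration of why the order matters, using truncation as the processing step (the sample string and the cut-off lengths are arbitrary):

```php
<?php
$bio = 'Tom & Jerry <fans> unite';

// Process first (truncate to 11 bytes), then escape: the output is intact.
$good = htmlspecialchars(substr($bio, 0, 11), ENT_QUOTES, 'UTF-8');

// Escape first, then truncate: the cut lands in the middle of "&amp;".
$bad = substr(htmlspecialchars($bio, ENT_QUOTES, 'UTF-8'), 0, 7);

echo $good, PHP_EOL; // Tom &amp; Jerry
echo $bad, PHP_EOL;  // Tom &am
```

The second result sends a broken entity to the browser, which is exactly the kind of thing escaping last avoids.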

You can do it!

Strip-tags vs htmlentities vs other. Which has the better security in php?

You have it confused. strip_tags() should only be used when you want to remove HTML tags from a block of text. It shouldn't be applied on a user-input password. For example, if your user had a password hello<there>, it would strip away <there> and the password then would just be hello. This could cause a lot of problems.
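To see the problem concretely, here is a quick sketch. The use of password_hash() and password_verify() is my own addition for illustration; the point is only that the stripped value is no longer the password the user typed.

```php
<?php
$password = 'hello<there>';

// strip_tags() treats <there> as a tag and removes it.
$stripped = strip_tags($password);
var_dump($stripped); // string(5) "hello"

// If the hash was made from the password as typed, the stripped
// version no longer matches it (and vice versa).
$hash = password_hash($password, PASSWORD_DEFAULT);
var_dump(password_verify($password, $hash)); // bool(true)
var_dump(password_verify($stripped, $hash)); // bool(false)
```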

Similarly, htmlentities() isn't helpful against SQL injection either. This function just converts characters that are special in HTML to their entity equivalents, which prevents them from causing issues when displayed on a web page.

To be safe against SQL injections, you don't need htmlentities() or strip_tags(). What you need is prepared statements. Properly prepare your query with placeholders. Also, note that the mysql_* functions are deprecated (and were removed entirely in PHP 7.0); you should use mysqli_* or PDO.

Off the top of my head, untested:

/* Create a new mysqli object with database connection parameters */
$mysqli = new mysqli('localhost', 'username', 'password', 'db');

if ($mysqli->connect_errno) {
    echo 'Connection failed: ' . $mysqli->connect_error;
    exit();
}

/* Create a prepared statement */
$stmt = $mysqli->prepare('SELECT first_name, last_name, bio FROM users WHERE username=?');

/* Bind parameters
   s - string, b - blob, i - int, etc.
   $user must already hold the username to look up */
$stmt->bind_param('s', $user);

/* Execute it */
$stmt->execute();

/* Get a result set from the prepared statement */
$result = $stmt->get_result();

/* Fetch result rows as an associative array */
while ($row = $result->fetch_assoc()) {
    // Escape before echoing: the row may contain user-supplied HTML
    echo '<pre>' . htmlspecialchars(print_r($row, true), ENT_QUOTES, 'UTF-8') . '</pre>';
    // do something with $row
}

For more information, refer to:

  • Cross Site Scripting - OWASP
  • Prepared Statements - PHP Manual

html_entities Vs strip_tags with mysql_real_escape_string

You have to be clear about the purpose first. There is no all-purpose escaping.

htmlspecialchars or htmlentities

Purpose

Display user inputs as HTML.

Example

<div><?php echo htmlspecialchars($_POST['data'], ENT_QUOTES, 'UTF-8') ?></div>

Note

Use this function only when you are just about to display the value. Do not apply it in advance and store the result in a variable.

Otherwise you will end up asking "has this variable already been escaped...?"

... and cause yourself trouble.
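A short sketch of the trouble in question: if a value gets escaped once early on and once more at output time, the visitor sees the entities themselves (the sample string is arbitrary).

```php
<?php
$input = 'Fish & <chips>';

// Escaped once: correct for HTML output.
$once = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Escaped again by code that did not know it was already escaped.
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

echo $once, PHP_EOL;  // Fish &amp; &lt;chips&gt;
echo $twice, PHP_EOL; // Fish &amp;amp; &amp;lt;chips&amp;gt;
```

A browser renders the first as "Fish & <chips>" and the second as the literal text "Fish &amp; &lt;chips&gt;".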

strip_tags

Do not use this function. Sanitizing is the wrong approach.

mysql_real_escape_string

Purpose

Embed user input in SQL.

Example

<?php
$sql = sprintf(
    "SELECT * FROM table WHERE name = '%s' AND address = '%s'",
    mysql_real_escape_string($_POST['name'], $link),
    mysql_real_escape_string($_POST['address'], $link)
);

Note

All mysql_* functions are deprecated (and were removed in PHP 7.0). You should use PDO instead, which provides prepared statements and placeholders in place of manual escaping.

PHP Manual - PDO::prepare

When to use Strip_tags and when to use htmlspecialchars()

htmlspecialchars and htmlentities are for displaying text in web pages. They translate the characters that have special meaning in HTML, such as the < and > characters that surround tags, into their entity codes. For instance, if the string contains

Use <table> to create a table on a web page.

it will be converted to

Use &lt;table&gt; to create a table on a web page.

When you display the string on a web page, you'll then see the intended message correctly.

strip_tags completely removes all the HTML tags. So the above string would be converted to:

Use  to create a table on a web page.

If you display this, it doesn't make much sense. This is often used to sanitize input that isn't really meant for display, and shouldn't contain anything that looks like an HTML tag in the first place, such as usernames. It would probably be better, though, to just validate it against whatever rules you have for those values (e.g. usernames should just be alphanumeric characters).

In my opinion, strip_tags() is almost always the wrong tool. It's a simple crutch to prevent XSS attacks, since code without any HTML tags can't introduce scripts. But it's a broad brush that doesn't usually match the specific needs.

And it's generally wrong to do these conversions when processing input. Do them when you're using the data, performing whatever escaping is necessary at that time. So you use mysqli_real_escape_string() if you're substituting the variable into a query (but you really should use prepared statements instead of this), htmlentities() when you're displaying it on a web page, urlencode() when you're putting it into a URL query string, etc.
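For example, here is a sketch of one value passing through two contexts, each with its own escaping applied at the point of use (the /search path and the q parameter are made up):

```php
<?php
$term = 'rock & roll <live>';

// URL context: encode the value for use in a query string.
$url = '/search?q=' . urlencode($term);

// HTML context: escape both the attribute value and the link text.
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
      . htmlspecialchars($term, ENT_QUOTES, 'UTF-8')
      . '</a>';

echo $link, PHP_EOL;
// <a href="/search?q=rock+%26+roll+%3Clive%3E">rock &amp; roll &lt;live&gt;</a>
```

The raw $term stays untouched the whole time, so it can still be passed as-is to a prepared statement or any other context.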

strip_tags and html_entity_decode combination doesn't work as expected

If you look closer you will see that you have

&amp;lt; and not &lt;.

So your first htmlspecialchars_decode just converts

&amp;lt; to &lt;

which is still an entity. A second run of htmlspecialchars_decode (or html_entity_decode) can decode it to <, but strip_tags cannot remove it.
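The behaviour with double-encoded input can be reproduced with a short sketch like this (the sample string is invented):

```php
<?php
// Double-encoded markup: "<b>bold</b>" that has been escaped twice.
$stored = '&amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;';

// The first decode only peels off one layer of encoding.
$first = htmlspecialchars_decode($stored);  // &lt;b&gt;bold&lt;/b&gt;

// There is no literal "<" yet, so strip_tags() finds nothing to remove.
$afterStrip = strip_tags($first);           // unchanged

// The second decode produces real tags, which strip_tags() can remove.
$second = htmlspecialchars_decode($first);  // <b>bold</b>
$clean  = strip_tags($second);              // bold

echo $clean, PHP_EOL;
```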

PHP: htmlentities/strip_tags

You only need to apply htmlentities() to the raw content. So you can apply htmlentities() to the raw content (the article text) and then invoke a function to add syntax highlighting after that. So long as you check that your syntax highlighting code cannot introduce unexpected nasties, you don't need to call htmlentities() again.

And if you're saying that you use the a element to highlight code, I strongly suggest you use the code element instead, which is designed to provide markup for lines or blocks of programming code. The a element should only be used as an anchor for a hyperlink.

For instance, you could use

<code class="highlighted-code">/* line of code here */</code>

Then you could use a cascading style sheet to provide background colour for any element of type code with class equal to "highlighted-code", for instance:

code.highlighted-code {background-color: yellow}

strip_tags + html entities to get only numbers

You also need to remove the special pieces of text used to define entities, so you need at least one more pass:

$total_price_paid = strip_tags($total_price_paid);
$total_price_paid = preg_replace("/&#?[a-z0-9]{2,8};/i", "", $total_price_paid);
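To check what the two passes do, here is a run against a made-up input (the markup and the price are invented for the example):

```php
<?php
// Hypothetical input: a price wrapped in a tag, with named entities around it.
$total_price_paid = '<span class="price">&pound;1234.50</span>&nbsp;';

// Pass 1: remove the tags. The entities are still there.
$total_price_paid = strip_tags($total_price_paid);
echo $total_price_paid, PHP_EOL; // &pound;1234.50&nbsp;

// Pass 2: remove named and numeric entities such as &pound; or &#163;.
$total_price_paid = preg_replace("/&#?[a-z0-9]{2,8};/i", "", $total_price_paid);
echo $total_price_paid, PHP_EOL; // 1234.50
```

If the value may also contain thousands separators or other stray characters, a further filter would be needed; this sketch only covers tags and entities.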



