htmlspecialchars vs. htmlentities When Concerned with XSS

Should I use htmlentities() on all output? (preventing XSS attacks)

There are two benefits to using htmlentities():

  • XSS prevention
  • Converting special characters to proper HTML entities; for example, it converts the copyright character © to &copy;. In HTML content you should use the appropriate HTML entity instead of inserting a raw special character.

For XSS prevention, you could use htmlspecialchars() instead, but it will only convert some basic characters to HTML entities, namely quotes, ampersand and the less than/greater than characters.

In answer to your question, you should use htmlentities() when outputting any content that could contain user input or special characters.

Is htmlentities() bullet proof?

No. You need to specify the proper encoding explicitly (e.g. UTF-8). Chris Shiflett has a post showing how code can still be injected when htmlentities() is called without the appropriate encoding.

http://shiflett.org/blog/2005/dec/google-xss-example
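
The advice about specifying the encoding can be sketched as a small wrapper. This is only an illustration, assuming the page itself is served as UTF-8; the helper name e() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): escape a value for HTML text
// or a quoted attribute, always passing the flags and encoding explicitly.
function e(string $value): string
{
    // ENT_QUOTES escapes both quote types; ENT_SUBSTITUTE replaces invalid
    // byte sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo e('<script>alert("xss")</script>'), "\n";
// Prints: &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

Routing all output through one helper like this means the encoding argument cannot be forgotten at an individual call site.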

XSS attack - regex or htmlspecialchars

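PDO, when used with prepared statements and bound parameters, does a very effective job of protecting your queries from SQL injection. Note that this is a separate problem from XSS, which has to be handled when the data is output. With bound parameters there is no need to worry about whether or not you remembered to protect your queries, because it is automatic. Several frameworks support this feature as well.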
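
A minimal sketch of a bound parameter with PDO, assuming the pdo_sqlite extension is available; the in-memory database and the comments table are hypothetical and exist only to keep the example self-contained. Binding keeps the value out of the SQL text, but the stored value is unchanged, so it still needs escaping when it is written into HTML.

```php
<?php
// In-memory SQLite database, used only so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

$input = '<script>alert(1)</script>\'; DROP TABLE comments; --';

// Bound parameter: the value never becomes part of the SQL text.
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute([':body' => $input]);

// The value comes back exactly as entered, markup included,
// so it is escaped at output time.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();
$escaped = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo '<p>', $escaped, "</p>\n";
```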

If I'm not using PDO because of a client requirement or the like, I will at the very least build an automatic htmlspecialchars function into my connection class so that I never forget to do it (though this is my least favorite option).

As a UI guy, I always attack my security issues starting on the front end. Proper, well-designed front-end validation can stop unintentional issues from even reaching the query in the first place, and it is the most effective UI pattern for reporting problems to the user. Blocking characters such as < or ; makes sense in most fields, because they just don't fit. You can't rely on the front end alone, though, because a person can bypass it by turning off JavaScript. But it's a good first step and a great way to limit improper queries on heavily trafficked sites. My validation of choice for quick and effective front-end validation of fields is here.

When used correctly, is htmlspecialchars sufficient for protection against all XSS?

htmlspecialchars() is enough to prevent document-creation-time HTML injection with the limitations you state (i.e. no injection into tag content or an unquoted attribute).

However, there are other kinds of injection that can lead to XSS, and the condition "There are no <script> tags in the document." doesn't cover all cases of JS injection. You might, for example, have an event handler attribute (requires JS-escaping inside HTML-escaping):

<div onmouseover="alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!

or, even worse, a javascript: link (requires JS-escaping inside URL-escaping inside HTML-escaping):

<a href="javascript:alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!

It is usually best to avoid these constructs anyway, but especially when templating. Writing <?php echo htmlspecialchars(urlencode(json_encode($something))) ?> is quite tedious.
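
If such a construct cannot be avoided, the layering for the event handler case (JS-escaping inside HTML-escaping) might look like the sketch below. The helper name js_attr() is hypothetical, and the example assumes the value is valid UTF-8 so that json_encode() succeeds.

```php
<?php
// Hypothetical helper: encode a PHP value as a JavaScript literal, then
// escape that literal for use inside a quoted HTML attribute.
function js_attr($value): string
{
    // The JSON_HEX_* flags turn <, >, &, ' and " inside strings into
    // \uXXXX escapes, so they cannot terminate the JavaScript string.
    $flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
    $json = json_encode($value, $flags);

    // The quotes that delimit the JSON string are escaped for HTML here.
    return htmlspecialchars($json, ENT_QUOTES, 'UTF-8');
}

$xss = "');alert(document.cookie);//";
echo '<div onmouseover="alert(', js_attr($xss), ')">hover</div>', "\n";
```

Note that the helper emits its own string delimiters, so the template does not wrap the value in quotes.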

And... injection issues can happen on the client-side as well (DOM XSS); htmlspecialchars() won't protect you against a piece of JavaScript writing to innerHTML (commonly .html() in poor jQuery scripts) without explicit escaping.

And... XSS has a wider range of causes than just injections. Other common causes are:

  • allowing the user to create links, without checking for known-good URL schemes (javascript: is the most well-known harmful scheme but there are more)

  • deliberately allowing the user to create markup, either directly or through light-markup schemes (like bbcode which is invariably exploitable)

  • allowing the user to upload files (which can through various means be reinterpreted as HTML or XML)
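
For the first cause above, a check against known-good URL schemes could be sketched like this. The function name is_safe_link() is hypothetical, and limiting the allowlist to http and https is an assumption; adjust it to the schemes your application really needs.

```php
<?php
// Hypothetical helper: accept a user-supplied link only if it parses as a
// URL and its scheme is on an allowlist (http and https are an assumption).
function is_safe_link(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Compare the scheme case-insensitively against the allowlist.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

var_dump(is_safe_link('https://example.com/page'));
var_dump(is_safe_link('javascript:alert(1)'));
```

An allowlist is used rather than a blocklist because javascript: is not the only harmful scheme.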

XSS attack still works despite htmlspecialchars() doing its work

You can't sanitize every type of XSS with htmlspecialchars.
htmlspecialchars can help protect against XSS inside HTML tags or in some quoted HTML attributes.

Each type of XSS has to be handled with its own sanitization method.



  1. User input placed inside HTML:
<p><?php echo $user_entered_variable; ?></p>

Attack vector:
<script>alert(1)</script>

This type of XSS can be prevented with the htmlspecialchars function, because the attacker needs < and > to create a new HTML tag.

Solution:

<p><?php echo htmlspecialchars($user_entered_variable); ?></p>



  2. User input placed inside a single-quoted attribute:
<img title='<?php echo htmlspecialchars($user_entered_variable);?>'/>

Attack vector:
' onload='alert(1)' '

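Before PHP 8.1, htmlspecialchars does not encode the single quote ' by default. You must turn that on with the ENT_QUOTES option (it is part of the default flags from PHP 8.1 onward, but passing it explicitly keeps the behavior the same on every version).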

Solution:

    <img title='<?php echo htmlspecialchars($user_entered_variable,ENT_QUOTES);?>'/>
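
The difference between the two flag settings can be seen with the attack vector above. This is a small sketch that passes the flags explicitly, so the result does not depend on the PHP version's defaults.

```php
<?php
$payload = "' onload='alert(1)' '";

// ENT_COMPAT escapes double quotes only, so the single quotes survive.
$compat = htmlspecialchars($payload, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES escapes both kinds of quote.
$quotes = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');

echo $compat, "\n"; // ' onload='alert(1)' '
echo $quotes, "\n"; // &#039; onload=&#039;alert(1)&#039; &#039;
```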



  3. User input placed inside URL attributes (src, href, formaction, ...):
<iframe src="<?php echo htmlspecialchars($user_entered_variable); ?>"></iframe>
<img src="<?php echo htmlspecialchars($user_entered_variable); ?>">
<a href="<?php echo htmlspecialchars($user_entered_variable); ?>">Link</a>

<script>function openLink(link){window.open(link);}</script>
<button onclick="openLink('<?php echo htmlspecialchars($user_entered_variable); ?>')">JavaScript Window XSS</button>

Attack vectors: javascript:alert(1), javascript://alert(1)

htmlspecialchars will not prevent these vectors, because they contain no HTML special characters.
To prevent such attacks, you need to validate the input as a URL with a known-good scheme such as http or https.

Solution:

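<?php
// Validate the raw value first. FILTER_VALIDATE_URL alone is not enough,
// because it can also accept non-HTTP schemes, so check the scheme too.
$scheme = strtolower((string) parse_url($user_entered_variable, PHP_URL_SCHEME));
$isValidURL = filter_var($user_entered_variable, FILTER_VALIDATE_URL) !== false
    && in_array($scheme, array('http', 'https'), true);
if (!$isValidURL) {
    $user_entered_variable = 'invalid://invalid';
}
// Escape for the HTML attribute context only at output time.
$url = htmlspecialchars($user_entered_variable, ENT_QUOTES, 'UTF-8');
// For the onclick handler, build a JavaScript string literal first, then escape it for HTML.
$jsUrl = htmlspecialchars(json_encode($user_entered_variable), ENT_QUOTES, 'UTF-8');
?>
<iframe src="<?php echo $url; ?>"></iframe>
<img src="<?php echo $url; ?>">
<a href="<?php echo $url; ?>">Link</a>

<script>function openLink(link){window.open(link);}</script>
<button onclick="openLink(<?php echo $jsUrl; ?>)">JavaScript Window XSS</button>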




  4. User input placed inside a <script> tag without any quotes:
<script>
var inputNumber = <?php echo $user_entered_variable; ?>
</script>

Attack vector: 1;alert(1)

In some cases we can simply quote the input and sanitize it with htmlspecialchars, but if the input has to be an integer, we can prevent XSS with input validation instead.

Solution:

<script>
var inputNumber = <?php echo intval($user_entered_variable); ?>
</script>
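
A short sketch of what intval() does with the attack vector above. It also shows one possible way to embed a string value in a script block with json_encode(); that part is an addition for illustration and is not taken from the answer above. The variable names are hypothetical.

```php
<?php
// intval() keeps the leading integer and drops the rest of the payload.
$number = intval('1;alert(1)');

// For string values, json_encode() produces a quoted JavaScript literal;
// the JSON_HEX_* flags keep sequences like </script> out of the output.
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$name = json_encode('</script><script>alert(1)</script>', $flags);

echo "<script>\n";
echo "var inputNumber = {$number};\n";
echo "var inputName = {$name};\n";
echo "</script>\n";
```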

Always quote variables when they are placed inside an HTML attribute, and sanitize them properly for that context.

Is htmlspecialchars in HTML value attributes enough to prevent XSS?

Assuming you're using a modern version of PHP, htmlspecialchars should do the trick.

It's important to note that you must also declare the same encoding (UTF-8) for the whole page via headers and meta tags. Otherwise, you're subject to UTF-7 injection.

Also note that htmlspecialchars is fine only for attributes like value, which don't interpret JavaScript. It's not enough for src and friends.

htmlentities() vs. htmlspecialchars()

From the PHP documentation for htmlentities:

This function is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.

From the PHP documentation for htmlspecialchars:

Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.

The difference is what gets encoded. The choices are everything (entities) or "special" characters, like ampersand, double and single quotes, less than, and greater than (specialchars).

I prefer to use htmlspecialchars whenever possible.

For example:

echo htmlentities('<Il était une fois un être>.');
// Output: &lt;Il &eacute;tait une fois un &ecirc;tre&gt;.
//                ^^^^^^^^                 ^^^^^^^

echo htmlspecialchars('<Il était une fois un être>.');
// Output: &lt;Il était une fois un être&gt;.
//                ^                 ^

