Should I use htmlentities() on all output? (preventing XSS attacks)
There are two benefits to using htmlentities():
- XSS prevention
- Converting special characters to proper HTML entities; for example, it converts the copyright character to &copy;. In HTML content you should use the appropriate HTML entity instead of inserting a raw special character.
For XSS prevention, you could use htmlspecialchars() instead, but it converts only a few basic characters to HTML entities: quotes, the ampersand, and the less-than/greater-than characters.
In answer to your question, you should use htmlentities() when outputting any content that could contain user input or special characters.
Is htmlentities() bullet proof?
You need to specify the proper encoding explicitly (e.g. UTF-8). Chris Shiflett had a post on how code can still be injected when htmlentities is called without the appropriate encoding.
http://shiflett.org/blog/2005/dec/google-xss-example
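As a minimal sketch of passing the encoding explicitly, a small wrapper can pin the flags and charset so no call site relies on defaults. The function name e() is hypothetical (it is not from the post above), and the sketch assumes the page and the source file are both UTF-8.

```php
<?php
// Hypothetical helper: always escape with explicit flags and charset.
// ENT_QUOTES encodes both quote types; ENT_SUBSTITUTE replaces invalid
// byte sequences instead of returning an empty string.
function e(string $value): string
{
    return htmlentities($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo e('<a href="x">© 2005</a>'), "\n";
```

Calling the wrapper everywhere, rather than htmlentities() directly, keeps the charset argument from being forgotten in one template.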
xss attack - regex or htmlspecialchars
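PDO prepared statements do a very effective job of protecting your queries from SQL injection, which is a separate problem from XSS. As long as every value goes in as a bound parameter, there is no need to worry about whether or not you remembered to protect your queries, because it is automatic. Several other frameworks support this feature as well. Output written into HTML still has to be escaped separately.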
If I'm not using PDO because of a client requirement or the like, I will at the very least build into my connection class an automatic htmlspecialchars function so that I never forget to do it (though this is my least favorite option)
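To make the division of labour concrete, here is a rough sketch of what a bound parameter does and does not do. It assumes the pdo_sqlite extension is available and uses an in-memory database; the table name comments is made up for the example.

```php
<?php
// In-memory SQLite database, used here only so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (body TEXT)');

$input = "<script>alert(1)</script>'); DROP TABLE comments; --";

// The bound parameter is sent as data, so it cannot alter the SQL statement.
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute([':body' => $input]);

// The stored value is unchanged, markup included: escape it when printing.
$body = $pdo->query('SELECT body FROM comments')->fetchColumn();
echo htmlspecialchars($body, ENT_QUOTES, 'UTF-8'), "\n";
```

The query is safe without any escaping of $input, but the script tag is stored as-is, so the htmlspecialchars() call at output time is still what prevents XSS.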
As a UI guy, I always attack my security issues starting on the front end. Proper, well-designed front-end validation can stop unintentional issues from even reaching the query in the first place, and it is the most effective UI pattern for reporting problems to the user. Blocking characters such as < or ; makes sense in most fields, because they just don't fit. You can't rely on the front end alone, though, because a person could bypass it by turning off JavaScript. But it's a good first step and a great way to limit improper queries on heavily trafficked sites. My validation of choice for quick and effective front-end validation of fields is here.
When used correctly, is htmlspecialchars sufficient for protection against all XSS?
htmlspecialchars() is enough to prevent document-creation-time HTML injection with the limitations you state (i.e. no injection into tag content or unquoted attributes).
However, there are other kinds of injection that can lead to XSS, and this condition:
There are no <script> tags in the document.
doesn't cover all cases of JS injection. You might, for example, have an event handler attribute (requires JS-escaping inside HTML-escaping):
<div onmouseover="alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!
or, even worse, a javascript: link (requires JS-escaping inside URL-escaping inside HTML-escaping):
<a href="javascript:alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!
It is usually best to avoid these constructs anyway, but especially when templating. Writing <?php echo htmlspecialchars(urlencode(json_encode($something))) ?> is quite tedious.
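If an event handler attribute really can't be avoided, the nesting can at least be hidden in one place. This is only a sketch: js_attr() is a hypothetical helper name, and it covers the event handler case (JS-escaping inside HTML-escaping), not the javascript: link case, which needs a URL-escaping layer in between.

```php
<?php
// Hypothetical helper: JS-escape first (json_encode), then HTML-escape
// for the attribute the JavaScript lives in.
function js_attr($value): string
{
    $json = json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
    return htmlspecialchars($json, ENT_QUOTES, 'UTF-8');
}

$xss = "'); alert(document.cookie); ('";
// json_encode() supplies the string quotes, so none are written in the template.
echo '<div onmouseover="alert(' . js_attr($xss) . ')">hover</div>', "\n";
```

Because json_encode() emits a complete JavaScript string literal, the template must not add its own quotes around the value.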
And... injection issues can happen on the client side as well (DOM XSS); htmlspecialchars() won't protect you against a piece of JavaScript writing to innerHTML (commonly .html() in poor jQuery scripts) without explicit escaping.
And... XSS has a wider range of causes than just injections. Other common causes are:
- allowing the user to create links, without checking for known-good URL schemes (javascript: is the most well-known harmful scheme but there are more)
- deliberately allowing the user to create markup, either directly or through light-markup schemes (like bbcode, which is invariably exploitable)
- allowing the user to upload files (which can through various means be reinterpreted as HTML or XML)
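For the first cause in that list, checking against known-good schemes could look roughly like this. The helper name is_allowed_link() and the default allowlist are assumptions for illustration, not something the answer prescribes.

```php
<?php
// Hypothetical helper: accept a user-supplied link only if its scheme is
// on an allowlist. The allowed schemes are an assumption; adjust as needed.
function is_allowed_link(string $url, array $schemes = ['http', 'https', 'mailto']): bool
{
    $scheme = parse_url(trim($url), PHP_URL_SCHEME);
    return is_string($scheme) && in_array(strtolower($scheme), $schemes, true);
}

var_dump(is_allowed_link('https://example.com/')); // bool(true)
var_dump(is_allowed_link('javascript:alert(1)'));  // bool(false)
var_dump(is_allowed_link('/relative/path'));       // bool(false)
```

Note that this version rejects relative URLs because they have no scheme; whether to allow them is a separate decision. An accepted link still needs htmlspecialchars() when it is written into an href.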
XSS attack still works despite htmlspecialchars() doing its work
You can't sanitize every type of XSS with htmlspecialchars. htmlspecialchars may help protect against XSS inside HTML tags or some quoted HTML attributes.
Each type of XSS has to be handled with its own sanitization method.
- User input placed inside HTML:
<p><?php echo $user_entered_variable; ?></p>
Attack vector: <script>alert(1)</script>
This type of XSS can be sanitized with the htmlspecialchars function, because the attacker needs < and > to create a new HTML tag.
Solution:
<p><?php echo htmlspecialchars($user_entered_variable); ?></p>
- User input placed inside single quoted attribute:
<img title='<?php echo htmlspecialchars($user_entered_variable);?>'/>
Attack vector: ' onload='alert(1)' '
Before PHP 8.1, htmlspecialchars does not encode the single quote ' by default. Turn it on explicitly with the ENT_QUOTES option.
Solution:
<img title='<?php echo htmlspecialchars($user_entered_variable,ENT_QUOTES);?>'/>
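The difference between the two flag settings is easy to see on the attack vector itself. This sketch passes the flags explicitly in both calls, so it does not depend on which default your PHP version uses (the default changed to include ENT_QUOTES in PHP 8.1).

```php
<?php
$attack = "' onload='alert(1)' '";

// ENT_COMPAT leaves single quotes alone (the default before PHP 8.1).
echo htmlspecialchars($attack, ENT_COMPAT, 'UTF-8'), "\n";
// ENT_QUOTES encodes them, so the value cannot close a '...' attribute.
echo htmlspecialchars($attack, ENT_QUOTES, 'UTF-8'), "\n";
```

The first line prints the vector unchanged; the second prints it with every single quote replaced by &#039;.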
- User input placed inside URL attributes: src, href, formaction, ...
<iframe src="<?php echo htmlspecialchars($user_entered_variable); ?>"></iframe>
<img src="<?php echo htmlspecialchars($user_entered_variable); ?>">
<a href="<?php echo htmlspecialchars($user_entered_variable); ?>">Link</a>
<script>function openLink(link){window.open(link);}</script>
<button onclick="openLink('<?php echo htmlspecialchars($user_entered_variable); ?>')">JavaScript Window XSS</button>
Attack vector: javascript:alert(1), javascript://alert(1)
htmlspecialchars will not prevent those vectors, because they contain no HTML special characters (see the htmlspecialchars documentation).
To prevent such attacks, you need to validate the input as a URL.
Solution:
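<?php
// Validate the raw input first, and allow only http/https schemes
// (FILTER_VALIDATE_URL on its own accepts any scheme followed by ://).
$scheme = strtolower((string) parse_url($user_entered_variable, PHP_URL_SCHEME));
$isValidURL = filter_var($user_entered_variable, FILTER_VALIDATE_URL) !== false
    && in_array($scheme, ['http', 'https'], true);
if (!$isValidURL) {
    $user_entered_variable = 'invalid://invalid';
}
// Escape for the HTML attribute context only at output time.
$url = htmlspecialchars($user_entered_variable, ENT_QUOTES, 'UTF-8');
?>
<iframe src="<?php echo $url; ?>"></iframe>
<img src="<?php echo $url; ?>">
<a href="<?php echo $url; ?>">Link</a>
<script>function openLink(link){window.open(link);}</script>
<!-- Inside an event handler the value needs JS-escaping (json_encode) inside HTML-escaping. -->
<button onclick="openLink(<?php echo htmlspecialchars(json_encode($user_entered_variable), ENT_QUOTES, 'UTF-8'); ?>)">JavaScript Window XSS</button>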
- User input placed inside a script tag without any quotes:
<script>
var inputNumber = <?php echo $user_entered_variable; ?>
</script>
Attack vector: 1;alert(1)
In some cases we can quote the input and escape it for the JavaScript context (htmlspecialchars alone does not escape backslashes or newlines there, so json_encode is a safer fit), but if the input must be an integer, input validation is enough to prevent XSS.
Solution:
<script>
var inputNumber = <?php echo intval($user_entered_variable); ?>
</script>
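intval() silently truncates the input, which is safe for output but may hide bad data. If you would rather reject the value outright, filter_var() with FILTER_VALIDATE_INT is one option; the comparison below is just a sketch of the two behaviours.

```php
<?php
$input = '1;alert(1)';

// intval() keeps the leading digits and drops the rest.
var_dump(intval($input));                          // int(1)
// FILTER_VALIDATE_INT rejects the whole value instead of truncating it.
var_dump(filter_var($input, FILTER_VALIDATE_INT)); // bool(false)
var_dump(filter_var('42', FILTER_VALIDATE_INT));   // int(42)
```

Either way, only an integer can reach the script block.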
Always quote variables when they are placed inside an HTML attribute, and sanitize them properly for that context.
Is htmlspecialchars in html value attributes enough to prevent xss?
Assuming you're using a modern version of PHP, htmlspecialchars should do the trick.
It's important to note that you must also declare the same encoding (UTF-8) for the whole page via headers and meta tags. Otherwise, you're subject to UTF-7 injection.
Also note that htmlspecialchars is fine only for attributes like value that don't interpret JavaScript. It's not enough for src and friends.
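Put together, that might look something like the sketch below: the same charset is declared in the HTTP header, in the markup, and in the escaping call. The helper name text_input() is hypothetical.

```php
<?php
// Hypothetical helper: render a text input with an escaped value attribute.
function text_input(string $name, string $value): string
{
    return sprintf(
        '<input type="text" name="%s" value="%s">',
        htmlspecialchars($name, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($value, ENT_QUOTES, 'UTF-8')
    );
}

// Declare the same encoding in the HTTP header and in the markup.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo '<meta charset="UTF-8">', "\n";
echo text_input('q', '"><script>alert(1)</script>'), "\n";
```

The attribute is always written with double quotes, so the escaped value cannot break out of it.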
htmlentities() vs. htmlspecialchars()
From the PHP documentation for htmlentities:
This function is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.
From the PHP documentation for htmlspecialchars:
Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.
The difference is what gets encoded. The choices are everything (entities) or "special" characters, like ampersand, double and single quotes, less than, and greater than (specialchars).
I prefer to use htmlspecialchars whenever possible.
For example:
echo htmlentities('<Il était une fois un être>.');
// Output: &lt;Il &eacute;tait une fois un &ecirc;tre&gt;.
//                ^^^^^^^^                 ^^^^^^^
echo htmlspecialchars('<Il était une fois un être>.');
// Output: &lt;Il était une fois un être&gt;.
//                ^                 ^