Sanitize/Rewrite HTML on the Client Side

How safe is client-side HTML Sanitization?

I believe there is a little misunderstanding about the purpose and nature of such "sanitizers".

The purpose of a sanitizer (e.g. Angular's ngSanitize) is not to prevent "bad" data from being sent to the server-side. It is rather the other way around: A sanitizer is there to protect the non-malicious user from malicious data (being either a result of a security hole on the server side (yeah, no setup is perfect) or being fetched from other sources (ones that you are not in control of)).

Of course, being a client-side feature, a sanitizer could be bypassed, but (since the sanitizer is there to protect the user (not the server)) bypassing it would only leave the bypasser unprotected (which you can't do anything about, nor shouldn't you care - it's their choice).

Furthermore, sanitizers (can) have another (potentially more important) role: A sanitizer is a tool that helps the developer to better organize their code in a way that it is more easily testable for certain kinds of vulnerabilities (e.g. XSS attacks) and even helps in the actual code auditing for such kind of security holes.

In my opinion, the Angular docs summarize the concept pretty neatly:

Strict Contextual Escaping (SCE) is a mode in which AngularJS requires bindings in certain contexts to result in a value that is marked as safe to use for that context.

[...]

SCE assists in writing code in way that (a) is secure by default and (b) makes auditing for security vulnerabilities such as XSS, clickjacking, etc. a lot easier.

[...]

In a more realistic example, one may be rendering user comments, blog articles, etc. via bindings. (HTML is just one example of a context where rendering user controlled input creates security vulnerabilities.)

For the case of HTML, you might use a library, either on the client side, or on the server side, to sanitize unsafe HTML before binding to the value and rendering it in the document.

How would you ensure that every place that used these types of bindings was bound to a value that was sanitized by your library (or returned as safe for rendering by your server?) How can you ensure that you didn't accidentally delete the line that sanitized the value, or renamed some properties/fields and forgot to update the binding to the sanitized value?

To be secure by default, you want to ensure that any such bindings are disallowed unless you can determine that something explicitly says it's safe to use a value for binding in that context. You can then audit your code (a simple grep would do) to ensure that this is only done for those values that you can easily tell are safe - because they were received from your server, sanitized by your library, etc. You can organize your codebase to help with this - perhaps allowing only the files in a specific directory to do this. Ensuring that the internal API exposed by that code doesn't markup arbitrary values as safe then becomes a more manageable task.


Note 1: Emphasis is mine.

Note 2: Sorry for the lengthy quote, but I consider this to be a very improtant (as much as sensitive) matter and one that is too often misunderstood.

aurelia: properly sanitizing innerHTML-bound data

The sanatizeHTML value converter is a very simple sanitizer, and only remove the scripts tags. See the code here.

You can create your own value converter with a more complex santizer. Check this answer for more details about how to sanitize html in a browser.

But don't forget to never trust the browser, if you can it's better to sanitize the html in the server side before to send it to the browser to display it.

Sanitizing html string with javascript using browser to interpret html

No script embedded in the HTML can execute until it is put in the document. Try running this code on any page:

var html = "<script>document.body.innerHTML = '';</script>";
var div = document.createElement('div');
div.innerHTML = html;

You will notice nothing change. If the "malicious" script in the HTML was run, then the document should have vanished. So, you can use the DOM to sanitize HTML without worrying about bad JS being in the HTML. As long as you snip out the script in your sanitizer of course.


By the way, your approach is pretty safe and smarter than what most people try (parse it with regex, the poor fools). However, it's best to rely on good, trusted HTML sanitizing libraries for this, like HTML Purifier. Or, if you want to do it client-side, you can use ESAPI-JS (recommended by @Brett Zamir)



Related Topics



Leave a reply



Submit