How to Prevent JavaScript Injection Attacks Within User-Generated HTML

You think that's it? Check this out.

Whatever approach you take, you definitely need to use a whitelist. It's the only way to even come close to being safe about what you're allowing on your site.

EDIT:

I'm not familiar with .NET, unfortunately, but you can check out Stack Overflow's own battle with XSS (https://blog.stackoverflow.com/2008/06/safe-html-and-xss/) and the code that was written to parse HTML posted on this site (Archive.org link). You will probably need to adapt it because your whitelist is bigger, but it should get you started.
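A minimal JavaScript sketch of the whitelist idea, assuming users only need a handful of formatting tags with no attributes at all: escape the entire input, then switch back on only exact matches of the allowed tags. `ALLOWED_TAGS`, `escapeHtml` and `sanitizeWithAllowlist` are hypothetical names; this is not Stack Overflow's sanitizer, and it does not check that tags are balanced.

```javascript
// Hypothetical allowlist. Deliberately attribute-free: any tag that carries
// attributes (onclick=, style=, href=, ...) stays escaped as plain text.
const ALLOWED_TAGS = ["b", "i", "em", "strong", "u", "p", "br", "ul", "ol", "li", "blockquote"];

// Turn the whole input into inert text.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Escape everything, then re-enable only exact allowlisted tags
// such as <b>, </b> or <br/>.
function sanitizeWithAllowlist(input) {
  const names = ALLOWED_TAGS.join("|");
  const pattern = new RegExp(`&lt;(/?)(${names})\\s*(/?)&gt;`, "gi");
  return escapeHtml(input).replace(pattern, (_m, close, name, selfClose) =>
    `<${close}${name.toLowerCase()}${selfClose}>`);
}

console.log(sanitizeWithAllowlist("<b>hi</b><script>alert(1)</script>"));
```

Because every `&` is escaped first, input that already contains text like `&lt;b&gt;` cannot be turned into a real tag by the second step.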

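Preventing HTML and script injections in JavaScript

You can encode < and > as their HTML entity equivalents, &lt; and &gt;. Encode & as well, and do it first, so that the entities produced by the later replacements are not escaped a second time.

html = html.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");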

How to inject HTML code safely (avoid HTML injection)

You should sanitize the HTML code entered by your users.

One way to do it is by defining a whitelist: you define a list of tags (and for each one of them you can also define a list of allowed attributes) which are allowed to be in the output HTML. Every other tag which is not explicitly allowed will be removed by the sanitization.
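A simplified sketch of a per-tag attribute whitelist, loosely modeled on the tag rules in the policy below. It assumes the HTML has already been parsed into a tag name and an attribute map by a real HTML parser (not shown); `POLICY`, `hasOwn` and `filterElement` are hypothetical names, and invalid attributes are simply dropped rather than removing the whole tag.

```javascript
// Hypothetical policy: allowed tag -> { allowed attribute -> value pattern }.
const POLICY = {
  p: { align: /^(left|right|center|justify)$/i },
  a: { href: /^(https?:\/\/|mailto:)/i, rel: /^nofollow$/i },
  b: {},
  i: {},
  ul: {},
  li: {},
};

// Own-property check, so names like "constructor" never match the prototype.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns null when the tag is not allowed (the caller drops it);
// otherwise returns only the attributes that are allowed and valid.
function filterElement(tagName, attributes) {
  const tag = tagName.toLowerCase();
  if (!hasOwn(POLICY, tag)) return null;
  const rules = POLICY[tag];
  const kept = {};
  for (const [name, value] of Object.entries(attributes)) {
    const attr = name.toLowerCase();
    if (hasOwn(rules, attr) && rules[attr].test(value)) {
      kept[attr] = value;
    }
  }
  return kept;
}

console.log(filterElement("a", { href: "javascript:alert(1)", onclick: "x" }));
```

The caller still has to escape the surviving attribute values when it writes the element back out as HTML.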

Plenty of whitelists are already available. For example, you can use the one written for the TinyMCE HTML editor (see here):

<?xml version="1.0" encoding="UTF-8" ?>

<!--
TinyMCE
-->

<anti-samy-rules xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="antisamy.xsd">

<directives>
<directive name="omitXmlDeclaration" value="true" />
<directive name="omitDoctypeDeclaration" value="false" />
<directive name="maxInputSize" value="100000" />
<directive name="embedStyleSheets" value="false" />
<directive name="useXHTML" value="true" />
<directive name="formatOutput" value="true" />
</directives>

<common-regexps>

<!--
From W3C:
This attribute assigns a class name or set of class names to an
element. Any number of elements may be assigned the same class
name or names. Multiple class names must be separated by white
space characters.
-->
<regexp name="htmlTitle" value="[a-zA-Z0-9\s\-_',:\[\]!\./\\\(\)&amp;]*" />

<!-- force non-empty with a '+' at the end instead of '*'
-->
<regexp name="onsiteURL" value="([\p{L}\p{N}\p{Zs}/\.\?=&amp;\-~])+" />

<!-- ([\w\\/\.\?=&;\#-~]+|\#(\w)+)
-->

<!-- ([\p{L}/ 0-9&\#-.?=])*
-->
<regexp name="offsiteURL" value="(\s)*((ht|f)tp(s?)://|mailto:)[A-Za-z0-9]+[~a-zA-Z0-9-_\.@\#\$%&amp;;:,\?=/\+!\(\)]*(\s)*" />
</common-regexps>

<!--
Tag.name = a, b, div, body, etc.
Tag.action = filter: remove tags, but keep content, validate: keep content as long as it passes rules, remove: remove tag and contents
Attribute.name = id, class, href, align, width, etc.
Attribute.onInvalid = what to do when the attribute is invalid, e.g., remove the tag (removeTag), remove the attribute (removeAttribute), filter the tag (filterTag)
Attribute.description = What rules in English you want to tell the users they can have for this attribute. Include helpful things so they'll be able to tune their HTML
-->

<!--
Some attributes are common to all (or most) HTML tags. There aren't many that qualify for this. You have to make sure there are no
collisions between any of these attribute names and attribute names of other tags that serve different purposes.
-->

<common-attributes>

<attribute name="lang"
description="The 'lang' attribute tells the browser what language the element's attribute values and content are written in">

<regexp-list>
<regexp value="[a-zA-Z]{2,20}" />
</regexp-list>
</attribute>

<attribute name="title"
description="The 'title' attribute provides text that shows up in a 'tooltip' when a user hovers their mouse over the element">

<regexp-list>
<regexp name="htmlTitle" />
</regexp-list>
</attribute>

<attribute name="href" onInvalid="filterTag">

<regexp-list>
<regexp name="onsiteURL" />
<regexp name="offsiteURL" />

</regexp-list>
</attribute>

<attribute name="align"
description="The 'align' attribute of an HTML element is a direction word, like 'left', 'right' or 'center'">

<literal-list>
<literal value="center" />
<literal value="left" />
<literal value="right" />
<literal value="justify" />
<literal value="char" />
</literal-list>
</attribute>
<attribute name="style"
description="The 'style' attribute provides the ability for users to change many attributes of the tag's contents using a strict syntax" />
</common-attributes>

<!--
This requires normal updates as browsers continue to diverge from the W3C and each other. As long as the browser wars continue
this is going to continue. I'm not sure war is the right word for what's going on. Doesn't somebody have to win a war after
a while?


-->

<global-tag-attributes>
<attribute name="title" />
<attribute name="lang" />
<attribute name="style" />
</global-tag-attributes>

<tags-to-encode>
<tag>g</tag>
<tag>grin</tag>
</tags-to-encode>

<tag-rules>

<!-- Remove -->

<tag name="script" action="remove" />
<tag name="noscript" action="remove" />
<tag name="iframe" action="remove" />
<tag name="frameset" action="remove" />
<tag name="frame" action="remove" />
<tag name="noframes" action="remove" />
<tag name="head" action="remove" />
<tag name="title" action="remove" />
<tag name="base" action="remove" />
<tag name="style" action="remove" />
<tag name="link" action="remove" />
<tag name="input" action="remove" />
<tag name="textarea" action="remove" />

<!-- Truncate -->
<tag name="br" action="truncate" />

<!-- Validate -->

<tag name="p" action="validate">
<attribute name="align" />
</tag>
<tag name="div" action="validate" />
<tag name="span" action="validate" />
<tag name="i" action="validate" />
<tag name="b" action="validate" />
<tag name="strong" action="validate" />
<tag name="s" action="validate" />
<tag name="strike" action="validate" />
<tag name="u" action="validate" />
<tag name="em" action="validate" />
<tag name="blockquote" action="validate" />
<tag name="tt" action="truncate" />

<tag name="a" action="validate">
<attribute name="href" onInvalid="filterTag" />

<attribute name="nohref">

<literal-list>
<literal value="nohref" />
<literal value="" />
</literal-list>
</attribute>

<attribute name="rel">

<literal-list>
<literal value="nofollow" />
</literal-list>
</attribute>
</tag>

<!-- List tags
-->
<tag name="ul" action="validate" />
<tag name="ol" action="validate" />
<tag name="li" action="validate" />
<tag name="dl" action="validate" />
<tag name="dt" action="validate" />
<tag name="dd" action="validate" />
</tag-rules>

<css-rules>

<property name="text-decoration" default="none"
description="">

<category-list>
<category value="visual" />
</category-list>

<literal-list>
<literal value="underline" />
<literal value="overline" />
<literal value="line-through" />
</literal-list>
</property>
</css-rules>
</anti-samy-rules>

This policy is designed to sanitize the HTML entered in an HTML editor, so it may fit your needs.
You can also find more policies here.

The file above is in the policy format read by OWASP AntiSamy. If you use Java, you can implement an equivalent policy with the OWASP java-html-sanitizer, or you can define a custom one.

How to prevent JavaScript Injection Attacks

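You can obfuscate or hash variable names and/or values. However, anything that runs in the browser can be read and modified by the user, so the safer option is not to rely on JavaScript for this at all: do all of the logic on the server side instead.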
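A small sketch of what "do the logic on the server" can look like, assuming a shop-style scenario: the server recomputes an order total from its own price table and ignores any price the client sends, so editing variables in the browser console changes nothing. `PRICES` and `computeOrderTotal` are hypothetical names, and prices are in cents.

```javascript
// Hypothetical server-side price table (cents): the only source of truth.
const PRICES = { apple: 50, pear: 75 };

// Recompute the total from item IDs and quantities only;
// any price or total submitted by the client is ignored.
function computeOrderTotal(items) {
  let total = 0;
  for (const { id, quantity } of items) {
    if (!Object.prototype.hasOwnProperty.call(PRICES, id)) {
      throw new Error(`unknown item: ${id}`);
    }
    if (!Number.isInteger(quantity) || quantity < 1) {
      throw new Error(`bad quantity for ${id}`);
    }
    total += PRICES[id] * quantity;
  }
  return total;
}

// The tampered "price: 1" field is never read.
console.log(computeOrderTotal([{ id: "apple", quantity: 2, price: 1 }, { id: "pear", quantity: 1 }]));
```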

Prevent users from injecting JavaScript/markup into an element

You need to filter your input to take out unwanted tags. There really isn't much else to it.

How can I secure my website from injections (Cookie editing)

There are several flags you can set on a cookie to mitigate the security risk.

  1. The Secure flag. It mandates that the cookie is only sent over HTTPS. This matters even when your site already uses HTTPS. Why? Say a bank site forces users onto HTTPS. A user connects to the bank site over HTTPS and logs in successfully, and the bank site sends the session cookie back to the user's browser over HTTPS. Assume a hacker can sniff the network and see all cleartext communication between the user and the bank site. So far the hacker sees nothing useful, because the cookie has only traveled over a secure channel. Here is what the hacker can do if the Secure flag is not set:

    a) The hacker creates a site www.maliciouscode.com.

    b) He sends an email to the end user with the link, luring the end user to the site.

    c) The user takes the bait and opens http://www.maliciouscode.com in another browser tab.

    d) The malicious site redirects the user's browser to the bank site over plain HTTP.

    e) The user's browser sends the cookie to the bank site over HTTP.

    f) The hacker intercepts the cookie, since it is sent as cleartext, places it in his own browser and logs in as the user.

  2. The HttpOnly flag. The name is misleading: it is not the opposite of the Secure flag. It means the browser should only use the cookie in HTTP (and HTTPS) requests, and not expose it through other means such as JavaScript. If a browser that supports HttpOnly receives a cookie with the HttpOnly flag and client-side script tries to read it, the browser hides it from the script (it is left out of document.cookie). This limits the damage of an XSS attack, because injected script cannot steal the cookie.
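A minimal sketch of setting both flags from JavaScript on the server side. `buildSessionCookie` is a hypothetical helper, and `Path=/` is an assumption for a site-wide session cookie; the `Secure` and `HttpOnly` attributes are the two flags described above.

```javascript
// Build a Set-Cookie header value for a session cookie.
// Secure: only sent over HTTPS. HttpOnly: not readable via document.cookie.
function buildSessionCookie(name, value) {
  return `${name}=${encodeURIComponent(value)}; Path=/; Secure; HttpOnly`;
}

console.log(buildSessionCookie("session", "abc 123"));
```

In a Node.js `http` handler this would be used as `res.setHeader("Set-Cookie", buildSessionCookie("session", token))`, where `token` stands for whatever session identifier your application generates.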

Best way to handle security and avoid XSS with user entered URLs

If you think URLs can't contain code, think again!

https://owasp.org/www-community/xss-filter-evasion-cheatsheet

Read that, and weep.

Here's how we do it on Stack Overflow:

using System.Text.RegularExpressions;

/// <summary>
/// returns "safe" URL, stripping anything outside normal charsets for URL
/// </summary>
public static string SanitizeUrl(string url)
{
    return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
}
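Note that a character filter like `SanitizeUrl` keeps letters, `:` and parentheses, so `javascript:alert(1)` passes through it unchanged. A complementary sketch, assuming the standard WHATWG `URL` class (global in modern browsers and Node.js), is to parse the URL and allow only known schemes. `isSafeUrl`, `ALLOWED_PROTOCOLS` and the placeholder base `https://example.invalid/` are hypothetical; the base only exists so relative links can be parsed.

```javascript
// Schemes we are willing to put in an href.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

// Placeholder base so relative links such as "/questions/1" can be parsed.
const BASE = "https://example.invalid/";

function isSafeUrl(input) {
  let parsed;
  try {
    // The parser lowercases the scheme and strips tabs, newlines and
    // leading spaces, so tricks like "JaVaScRiPt:" or "java\tscript:" are normalized.
    parsed = new URL(input, BASE);
  } catch (e) {
    return false; // unparseable input is rejected
  }
  return ALLOWED_PROTOCOLS.has(parsed.protocol);
}

console.log(isSafeUrl("javascript:alert(1)"));
```

A URL that passes this check still has to be HTML-attribute-encoded when it is written into the page.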

Prevent XSS attacks site-wide

I hate to break it to you, but:

  1. XSS is an output problem, not an input problem. Filtering/validating input is an additional layer of defence, but it can never protect you completely from XSS. Take a look at the XSS cheat sheet by RSnake: there are just too many ways to get past a filter.
  2. There is no easy way to fix a legacy application. You have to properly encode anything that you put into your HTML or JavaScript files, and that does mean revisiting every piece of code that generates HTML.

See OWASP's XSS prevention cheat sheet for information on how to prevent XSS.



Some comments suggest that input validation is a better strategy than encoding/escaping at output time. I'll just quote from OWASP's XSS prevention cheat sheet:

Traditionally, input validation has been the preferred approach for handling untrusted data. However, input validation is not a great solution for injection attacks. First, input validation is typically done when the data is received, before the destination is known. That means that we don't know which characters might be significant in the target interpreter. Second, and possibly even more importantly, applications must allow potentially harmful characters in. For example, should poor Mr. O'Malley be prevented from registering in the database simply because SQL considers ' a special character?

To elaborate: when the user enters a string like O'Malley, you don't know whether you will need that string in JavaScript, in HTML or in some other language. If it's in JavaScript, you have to render it as O\x27Malley, and if it's in HTML, it should look like O&#39;Malley. This is why it is recommended to store the string in your database exactly as the user entered it, and then escape it appropriately for its final destination.
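A sketch of escaping the same stored value differently per destination. `encodeForHtml` and `encodeForJsString` are hypothetical helper names; the JavaScript rule follows the OWASP cheat sheet's approach of encoding every non-alphanumeric character below 256 as `\xHH`, and assumes the result is placed inside a quoted string literal. In production, prefer a vetted encoding library over hand-written helpers.

```javascript
// For text placed in HTML element content or a quoted attribute value.
function encodeForHtml(s) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return s.replace(/[&<>"']/g, (ch) => map[ch]);
}

// For text placed inside a quoted JavaScript string literal:
// every non-alphanumeric character below 256 becomes \xHH.
function encodeForJsString(s) {
  return s.replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 256 ? "\\x" + code.toString(16).padStart(2, "0") : ch;
  });
}

const stored = "O'Malley"; // kept exactly as the user typed it
console.log(encodeForHtml(stored));     // O&#39;Malley
console.log(encodeForJsString(stored)); // O\x27Malley
```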


