Sanitizing HTML input value
You are really asking two questions (or at least it can be interpreted that way):
- Can the quoted value attribute of input[type="text"] be injected if quotes are disallowed?
- Can an arbitrary quoted attribute of an element be injected if quotes are disallowed?
The second is trivially demonstrated by the following:
<a href="javascript:alert(1234);">Foo</a>
Or
<div onmousemove="alert(123);">...
The first is a bit more complicated.
HTML5
According to the HTML5 spec:
Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.
Which is further refined in quoted attributes to:
The attribute name, followed by zero or more space characters, followed by a single U+003D EQUALS SIGN character, followed by zero or more space characters, followed by a single """ (U+0022) character, followed by the attribute value, which, in addition to the requirements given above for attribute values, must not contain any literal U+0022 QUOTATION MARK characters ("), and finally followed by a second single """ (U+0022) character.
So in short, any character except an "ambiguous ampersand" (&[a-zA-Z0-9]+; when the result is not a valid character reference) and a quote character is valid inside of an attribute.
HTML 4.01
HTML 4.01 is less descriptive than HTML5 about the syntax (one of the reasons HTML5 was created in the first place). However, it does say this:
When script or style data is the value of an attribute (either style or the intrinsic event attributes), authors should escape occurrences of the delimiting single or double quotation mark within the value according to the script or style language convention. Authors should also escape occurrences of "&" if the "&" is not meant to be the beginning of a character reference.
Note, this is saying what an author should do, not what a parser should do. So a parser could technically accept or reject invalid input (or mangle it to be valid).
XML 1.0
The XML 1.0 Spec defines an attribute as:
Attribute ::= Name Eq AttValue
where AttValue is defined as:
AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
The & is similar to the concept of an "ambiguous ampersand" from HTML5; however, it basically means "any unencoded ampersand".
Note, though, that it explicitly excludes < from attribute values.
So while HTML5 allows it, XML 1.0 explicitly denies it.
What Does It Mean
It means that a compliant, bug-free HTML5 parser will treat < characters in an attribute value as plain text, while a compliant, bug-free XML parser will raise an error.
It also means that for a compliant and bug free parser, HTML 4.01 will behave in unspecified and potentially odd ways (since the specification doesn't detail the behavior).
And this gets down to the crux of the issue. In the past, HTML was such a loose spec that every browser had slightly different rules for how it would deal with malformed HTML. Each would try to "fix" it, or "interpret" what you meant. So while an HTML5-compliant browser wouldn't execute the JS in <input type="text" value="<script>alert(0)</script>">, there's nothing to say that an HTML 4.01-compliant browser wouldn't. And there's nothing to say that a bug may not exist in the XML or HTML5 parser that causes it to be executed (though that would be a pretty significant problem).
THAT is why OWASP (and most security experts) recommend that you encode either all non-alphanumeric characters or &<" inside of an attribute value. There's no cost in doing so, only the added security of knowing how the browser's parser will interpret the value.
Do you have to? No. But defense in depth suggests that, since there's no cost to doing so, the potential benefit is worth it.
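As a rough illustration of that recommendation, the sketch below encodes &, < and " (plus >, as an extra precaution) before a value is placed inside a double-quoted attribute. The helper name escapeAttribute and the sample value are assumptions made for this example; they do not come from OWASP or from the specs quoted above.

```javascript
// Hypothetical helper: encode the characters that matter inside a
// double-quoted HTML attribute value. A single-pass replace is used so that
// entities produced here are never encoded a second time.
function escapeAttribute(value) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
  return String(value).replace(/[&<>"]/g, (ch) => entities[ch]);
}

// Sample value that tries to break out of the attribute.
const userValue = '"><script>alert(0)</script>';
const html = '<input type="text" value="' + escapeAttribute(userValue) + '">';
console.log(html);
// <input type="text" value="&quot;&gt;&lt;script&gt;alert(0)&lt;/script&gt;">
```

With the quote encoded, the value can no longer terminate the attribute, so the parser's handling of < inside it stops mattering.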
How to sanitize JS and HTML in inputs?
Because JavaScript can be disabled, sanitization is not an operation for the frontend; this task should be performed on the backend. Best practice says...
- Validate input (frontend)
  - Ensure that the data conforms to what you expect before submission
- Sanitize input (backend)
  - Employ means on the backend to escape or remove unsafe characters before the data reaches your application's storage layer
- Escape output (backend)
  - As an additional safety measure, before outputting, be sure to escape anything coming from a 3rd party source
You are encouraged to validate data input on the frontend, notifying the user that certain characters are not permitted when they try to submit invalid data. In the event that JavaScript gets disabled, your backend will still know what to do with the malformed data.
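The three steps above can be sketched roughly as follows. Everything here is illustrative: the function names, the 500-character limit and the choice of which control characters to strip are assumptions for the example, and in a real application the backend part would run on the server rather than in the browser.

```javascript
// 1. Validate: reject input that does not look like what we expect.
//    The 500-character limit is an arbitrary assumption for this sketch.
function isValidComment(text) {
  return typeof text === "string" && text.trim().length > 0 && text.length <= 500;
}

// 2. Sanitize (backend): remove ASCII control characters other than tab,
//    line feed and carriage return, then trim surrounding whitespace.
function sanitizeComment(text) {
  return text.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "").trim();
}

// 3. Escape (backend, at output time): encode HTML-significant characters.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// The backend repeats the validation because the frontend check can be bypassed.
function renderComment(rawInput) {
  if (!isValidComment(rawInput)) {
    return null;
  }
  const stored = sanitizeComment(rawInput);
  return "<p>" + escapeHtml(stored) + "</p>";
}

console.log(renderComment("  <b>hi</b>\u0007 ")); // <p>&lt;b&gt;hi&lt;/b&gt;</p>
console.log(renderComment("   "));                // null
```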
Sanitizing HTML input
You will have to decide between good and lightweight. The recommended choice is 'HTMLPurifier', because it provides no-fuss secure defaults. As a faster alternative, 'htmLawed' is often advised.
See also this quite objective overview from the HTMLPurifier author: http://htmlpurifier.org/comparison
Sanitize html inputs with php
Validation depends mainly on the context of your website: what has to be checked to keep the database as consistent as possible.
There are also some generic clean-up steps that apply almost everywhere, such as trim() and stripslashes().
The main purpose of validation is to check user input that will be stored in the database and used later, such as the email, phone number and password a user enters at login or sign-up.
For example, you should validate that a phone number is numeric and exactly 12 digits long, or that an email address is in the correct format.
As for what to use for validation, you can read about:
Filters: https://www.w3schools.com/php/php_filter.asp
Regular expressions: https://www.w3schools.com/php/php_regex.asp
String functions: https://www.w3schools.com/php/php_ref_string.asp
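To make the phone and email checks concrete, here is a minimal sketch. It is written in JavaScript rather than PHP (in PHP the same checks are commonly done with preg_match() and filter_var() with FILTER_VALIDATE_EMAIL). The function names are hypothetical, and the email pattern is deliberately loose: it only checks the overall shape, not full standards compliance.

```javascript
// Hypothetical validator: a phone number must be exactly 12 digits.
function isValidPhone(phone) {
  return /^[0-9]{12}$/.test(phone);
}

// Hypothetical validator: something@domain.tld with no whitespace.
// This is a shape check only, not a complete email grammar.
function isValidEmail(email) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}

console.log(isValidPhone("123456789012"));     // true
console.log(isValidPhone("12345678901a"));     // false
console.log(isValidEmail("user@example.com")); // true
console.log(isValidEmail("user@example"));     // false
```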
Javascript Form Sanitization
You don't have to pattern-match every character individually; you can match against the whole string and look for any character outside of A-Z, a-z or 0-9. The regexp match method returns an array if it finds a match and null if nothing is found, so to turn the result into an inverted boolean just prepend a !; if you only want to convert it to a boolean, prepend !!.
function jsValidationAndSanitization() {
  /**
   * Validate the search input that comes from an HTML form.
   * @return {boolean} true if the input is non-empty and purely alphanumeric
   */
  var submittedInput = document.forms["form"]["search_input"].value;
  if (submittedInput === "") {
    console.log("error: empty input");
    return false;
  }
  // Matches any run of characters outside the whitelist (A-Z, a-z, 0-9).
  // The range A-z would also admit [ \ ] ^ _ and `, so both cases are spelled out.
  var disallowedPattern = /[^A-Za-z0-9]+/;
  var result = submittedInput.match(disallowedPattern);
  if (result) { console.log(result); }
  // match() returns an array on a match and null otherwise,
  // so !result is true only when no disallowed character was found.
  return !result;
}
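The difference between ! and !! on the result of match() can be seen in isolation with a DOM-free sketch like the one below. The names disallowed, hasDisallowedChars and isAlphanumeric are made up for this example.

```javascript
// Any run of characters outside A-Z, a-z and 0-9.
const disallowed = /[^A-Za-z0-9]+/;

// !! converts the match result to a plain boolean: array -> true, null -> false.
function hasDisallowedChars(input) {
  return !!input.match(disallowed);
}

// ! converts and inverts: true only when nothing disallowed was found.
function isAlphanumeric(input) {
  return input !== "" && !input.match(disallowed);
}

console.log(hasDisallowedChars("abc 123")); // true
console.log(isAlphanumeric("abc123"));      // true
console.log(isAlphanumeric("a_b"));         // false
```

Note that the underscore in the last call is only rejected because the class spells out A-Za-z; a range written as A-z also spans the characters [ \ ] ^ _ and `.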
Todo List Sanitizing Input jQuery
You can strip all HTML tags on input with a regex pattern:
$("#todoInput").val().replace(/(<([^>]+)>)/ig,"");
var todoList = [];
$("#add").on("click", function() {
  var todoItem = $("#todoInput").val().replace(/(<([^>]+)>)/ig,"");
  if (!todoItem.trim()) {
    alert("please enter a to do");
  } else {
    todoList.push(todoItem);
    //empty the input field on click
    $("#todoInput").val("");
    //add a mark complete button to every array item
    //publish array
    var addedTodo = todoList[todoList.length - 1];
    console.log(addedTodo);
    $(".todoContainer")
      .append(
        '<li class="eachItem">' +
        '<p class="todoItemStyle">' + addedTodo +
        '</p><button class="sm-btn" id="deleteButton"> delete </button> <button class="sm-btn" id="completeButton"> complete </button></li>'
      )
      .addClass("todoStyle");
  }
});
//add button to complete all items
$("#completeAll").on("click", function() {
  $(".todoItemStyle").toggleClass("completed");
});
//add button to restart list
$("#newList").on("click", function() {
  todoList = [];
  $(".todoContainer").html("");
});
//button to remove a todo
$("body").on("click", "#deleteButton", function() {
  $(this).parent().remove();
  console.log("delete button pressed");
});
//button to complete a todo
$("body").on("click", "#completeButton", function() {
  $(this).siblings(".todoItemStyle").toggleClass("completed");
  console.log("completed clicked");
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="main container">
  <div class="all container">
    <h1> My List </h1>
    <div class="buttons">
      <button id="newList" class="global-buttons"> New List </button>
      <button id="completeAll" class="global-buttons"> Complete All</button>
    </div>
    <div class="container list">
      <label> Input your to do </label>
      <input type="text" id="todoInput" placeholder="Input your to do here" name="todo">
      <button id="add" class="global-buttons"> Add to list </button>
      <ul class="todoContainer">
      </ul>
    </div>
  </div>
</div>
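It may help to see what the tag-stripping regex used above does on a few sample inputs. In this sketch stripTags is a hypothetical wrapper around the same replace() call; the inputs are made up.

```javascript
// Hypothetical wrapper around the regex from the answer: removes every
// "<...>" run, i.e. a "<" followed by one or more non-">" characters and a ">".
function stripTags(input) {
  return input.replace(/(<([^>]+)>)/ig, "");
}

console.log(stripTags("<b>buy milk</b>"));             // "buy milk"
console.log(stripTags("a < b and c > d"));             // "a  d" (plain text between < and > is lost)
console.log(stripTags("<img src=x onerror=alert(1)")); // unchanged: there is no closing ">"
```

Because a tag without its closing > passes through untouched, it is probably safer not to rely on the regex alone when building markup by string concatenation; setting the item with jQuery's .text() instead of concatenating it into the HTML string avoids the problem.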
Sanitizing HTML in submitted form data
Django comes with a template filter called striptags, which you can use in a template:
{{ value|striptags }}
It uses the function strip_tags, which lives in django.utils.html. You can also use it to clean your form data:
from django.utils.html import strip_tags
message = strip_tags(form.cleaned_data['message'])
When is it best to sanitize user input?
I like to sanitize it as early as possible, which means the sanitizing happens when the user tries to enter invalid data. If there's a TextBox for their age, and they type in anything other than a number, I don't let the keypress for the letter go through.
Then, in whatever is reading the data (often a server), I do a sanity check as I read the data in, just to make sure that nothing slips through due to a more determined user (such as someone hand-editing files, or even modifying packets!)
Edit: Overall, sanitize early and sanitize any time you've lost sight of the data for even a second (e.g. File Save -> File Open)
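For the age example, the two layers might look roughly like this. The function names, the three-digit limit and the upper bound of 150 are assumptions made for the sketch, and the DOM wiring is shown only as a comment.

```javascript
// Early check: should this keypress be allowed into the age field?
function isAllowedAgeKey(key) {
  return /^[0-9]$/.test(key);
}

// Possible wiring in a browser (not executed here); "ageInput" is a
// hypothetical reference to the input element:
//   ageInput.addEventListener("keydown", function (e) {
//     // e.key has length 1 for printable keys; Backspace, Tab, etc. pass through.
//     if (e.key.length === 1 && !isAllowedAgeKey(e.key)) e.preventDefault();
//   });

// Later check: whoever reads the data validates it again and returns
// a number, or null when the value is not a plausible age.
function parseAge(raw) {
  if (typeof raw !== "string" || !/^[0-9]{1,3}$/.test(raw)) {
    return null;
  }
  const age = Number(raw);
  return age <= 150 ? age : null; // 150 is an arbitrary upper bound
}

console.log(isAllowedAgeKey("7")); // true
console.log(isAllowedAgeKey("a")); // false
console.log(parseAge("42"));       // 42
console.log(parseAge("4x2"));      // null
```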