What Are All the Escape Characters

What are all the escape characters?

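Here is the full list. These are the escape sequences that Java defines; \s is a recent addition (Java 15), and most other languages do not recognize it.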

  • \t Insert a tab in the text at this point.
  • \b Insert a backspace in the text at this point.
  • \n Insert a newline in the text at this point.
  • \r Insert a carriage return in the text at this point.
  • \f Insert a formfeed in the text at this point.
  • \s Insert a space in the text at this point.
  • \' Insert a single quote character in the text at this point.
  • \" Insert a double quote character in the text at this point.
  • \\ Insert a backslash character in the text at this point.
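As a quick check of what these sequences stand for, the sketch below prints the code point behind each one. It uses Python rather than the language the list above was written for, on the assumption that Python string literals treat these particular sequences the same way; \s is left out because Python does not define it. The names ESCAPES and code_points are made up for the example.

```python
# Map each escape sequence (as written in source code) to the single
# character it denotes in a Python string literal.
ESCAPES = {
    r"\t": "\t",   # tab
    r"\b": "\b",   # backspace
    r"\n": "\n",   # newline (line feed)
    r"\r": "\r",   # carriage return
    r"\f": "\f",   # form feed
    r"\'": "\'",   # single quote
    r"\"": "\"",   # double quote
    r"\\": "\\",   # backslash
}


def code_points():
    """Return {escape sequence: code point} for the table above."""
    return {seq: ord(ch) for seq, ch in ESCAPES.items()}


if __name__ == "__main__":
    for seq, cp in code_points().items():
        print(f"{seq} -> U+{cp:04X}")
```

Each two-character sequence in the source turns into exactly one character in the resulting string.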

What does it mean to escape a string?

Escaping a string means to reduce ambiguity in quotes (and other characters) used in that string. For instance, when you're defining a string, you typically surround it in either double quotes or single quotes:

"Hello World."

But what if my string had double quotes within it?

"Hello "World.""

Now I have ambiguity - the interpreter doesn't know where my string ends. If I want to keep my double quotes, I have a couple options. I could use single quotes around my string:

'Hello "World."'

Or I can escape my quotes:

"Hello \"World.\""

Any quote that is preceded by a backslash is escaped, and understood to be part of the value of the string.
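The two options above can be compared directly. A minimal sketch in Python, picked here only for illustration because its quoting rules match the ones just described:

```python
# The same text written two ways; both yield identical string contents.
single_quoted = 'Hello "World."'   # outer single quotes, nothing to escape
escaped = "Hello \"World.\""       # outer double quotes, inner ones escaped


def same_value() -> bool:
    """True if both spellings produce the same string."""
    return single_quoted == escaped


print(escaped)        # the escape characters are not part of the value
print(len(escaped))   # 14
print(same_value())   # True
```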

When it comes to queries, MySQL has certain keywords it watches for that we cannot use in our queries without causing some confusion. Suppose we had a table of values where a column was named "Select", and we wanted to select that:

SELECT select FROM myTable

We've now introduced some ambiguity into our query. Within our query, we can reduce that ambiguity by using back-ticks:

SELECT `select` FROM myTable

This removes the confusion we've introduced by using poor judgment in selecting field names.

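For string values (as opposed to identifiers), much of this can be handled by passing your values through mysql_real_escape_string(). Note that this function belongs to PHP's old mysql extension, which was deprecated in PHP 5.5 and removed in PHP 7.0; current code should use mysqli or PDO, preferably with prepared statements. In the example below, user-submitted data is passed through this function so that it cannot break out of the quoted values in the query: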

// Query
$query = sprintf("SELECT * FROM users WHERE user='%s' AND password='%s'",
mysql_real_escape_string($user),
mysql_real_escape_string($password));

Other methods exist for escaping strings, such as addslashes, addcslashes, quotemeta, and more, though you'll find that when the goal is to run a safe query, by and large developers prefer mysql_real_escape_string or pg_escape_string (in the context of PostgreSQL).
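An alternative to escaping by hand is to let the database driver bind the values as parameters, so that quotes in user input never reach the SQL parser as syntax. The sketch below illustrates the idea with Python's built-in sqlite3 module and an in-memory database rather than MySQL; the table layout and the find_user helper are assumptions made up for the example.

```python
import sqlite3


def find_user(conn, user, password):
    """Look up a user with bound parameters instead of string formatting."""
    cur = conn.execute(
        "SELECT user FROM users WHERE user = ? AND password = ?",
        (user, password),  # passed to the driver as data, not as SQL text
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("o'brien", "secret"))

# A quote inside the value is harmless.
print(find_user(conn, "o'brien", "secret"))
# A classic injection attempt is treated as a literal name and matches nothing.
print(find_user(conn, "x' OR '1'='1", "anything"))
```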

Why do some characters have Escape character before them?

See the GSM 03.38 specification for information about the encoding.

“Why” questions are always difficult to answer precisely, but my guess is that the goal is to be able to encode the characters deemed most common with 7 bits, while other, less frequent characters will require 14 bits.

There are only 1120 bits per SMS, so saving space is desirable. At 7 bits per character, a “normal” text message holds up to 160 characters (1120 / 7) instead of the 140 that 8-bit characters would allow; each escaped character uses two of those 160 slots.
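The arithmetic can be sketched as follows. The set of escaped characters is an assumption based on the basic GSM 03.38 extension table (it leaves out the form feed entry), and the function names are made up for the example.

```python
SMS_BITS = 1120      # payload size quoted above
SEPTET_BITS = 7      # one GSM 03.38 character slot

# Characters reached through the escape code; each costs two septets.
EXTENSION_CHARS = set("^{}\\[~]|€")


def septets_needed(text: str) -> int:
    """Count 7-bit units, assuming every character is encodable in GSM 03.38."""
    return sum(2 if ch in EXTENSION_CHARS else 1 for ch in text)


def fits_in_one_sms(text: str) -> bool:
    """True if the text fits into a single 1120-bit message."""
    return septets_needed(text) * SEPTET_BITS <= SMS_BITS


print(SMS_BITS // SEPTET_BITS)   # 160 characters at 7 bits each
print(SMS_BITS // 8)             # 140 characters at 8 bits each
print(septets_needed("a[b]"))    # 6: two plain characters, two escaped ones
```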

JavaScript - How to show escape characters in a string?

If your goal is to have

str = "Hello\nWorld";

and output what it contains in string literal form, you can use JSON.stringify:

console.log(JSON.stringify(str)); // "Hello\nWorld" (quotes included; the newline appears as backslash + n)

const str = "Hello\nWorld";
const json = JSON.stringify(str);
console.log(json); // "Hello\nWorld"
// Print each character of the result with its code unit in hex.
for (let i = 0; i < json.length; ++i) {
    console.log(`${i}: ${json.charAt(i)} (0x${json.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0")})`);
}
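The same idea carries over to other languages that ship a JSON encoder. For comparison, a minimal Python sketch using the standard json module:

```python
import json

text = "Hello\nWorld"
literal = json.dumps(text)   # JSON string literal: quotes added, newline escaped

print(literal)                   # "Hello\nWorld" with backslash and n as two characters
print(len(text), len(literal))   # 11 14
print(json.loads(literal) == text)
```

Decoding the literal again gives back the original string, which is a handy way to confirm that nothing was lost.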

Escaping characters in C

You need to special case the replacement of '\n' to '\\' + 'n' etc.

There is no need to make a local copy of src to scan for special characters. You can simplify the code this way:

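#include <stdlib.h>

/* Returns a newly allocated copy of src with \n, \t, \\ and \" escaped,
   or NULL if allocation fails. The caller must free() the result. */
char *escapeChars(const char *src) {
    size_t i, j;
    char *pw;

    /* First pass: count the characters that need an extra backslash. */
    for (i = j = 0; src[i] != '\0'; i++) {
        if (src[i] == '\n' || src[i] == '\t' ||
            src[i] == '\\' || src[i] == '\"') {
            j++;
        }
    }
    pw = malloc(i + j + 1);
    if (pw == NULL)
        return NULL;

    /* Second pass: copy, expanding each special character to two. */
    for (i = j = 0; src[i] != '\0'; i++) {
        switch (src[i]) {
        case '\n': pw[i+j] = '\\'; pw[i+j+1] = 'n'; j++; break;
        case '\t': pw[i+j] = '\\'; pw[i+j+1] = 't'; j++; break;
        case '\\': pw[i+j] = '\\'; pw[i+j+1] = '\\'; j++; break;
        case '\"': pw[i+j] = '\\'; pw[i+j+1] = '\"'; j++; break;
        default:   pw[i+j] = src[i]; break;
        }
    }
    pw[i+j] = '\0';
    return pw;
}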

Note that you should also escape some other characters: '\r', and the non-printing or non-portable characters in the range 1 to 31 and 127 to 255 for ASCII. Escaping these as octal sequences is more work but manageable at your skill level.
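To illustrate the octal escaping mentioned above, here is a rough sketch in Python rather than C. The function name escape_bytes and the choice of always emitting three octal digits are assumptions for the example, not part of the C code above.

```python
# Short escapes for the characters that have one.
SHORT = {
    ord("\n"): "\\n",
    ord("\t"): "\\t",
    ord("\r"): "\\r",
    ord("\\"): "\\\\",
    ord('"'): '\\"',
}


def escape_bytes(data: bytes) -> str:
    """Return a C-style escaped rendering of data."""
    out = []
    for b in data:
        if b in SHORT:
            out.append(SHORT[b])
        elif b < 32 or b >= 127:
            # Fixed-width octal, so a following digit cannot extend the escape.
            out.append(f"\\{b:03o}")
        else:
            out.append(chr(b))
    return "".join(out)


print(escape_bytes(b'a\tb"\x01\xff'))   # a\tb\"\001\377
```

Padding to three digits matters because an octal escape takes up to three digits: a shorter escape followed by a literal digit would be read as one longer escape.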

Which characters need to be escaped in HTML?

If you're inserting text content in your document in a location where text content is expected [1], you typically only need to escape the same characters as you would in XML. Inside of an element, this just includes the entity escape ampersand & and the element delimiter less-than and greater-than signs < >:

& becomes &amp;
< becomes &lt;
> becomes &gt;

Inside of attribute values you must also escape the quote character you're using:

" becomes &quot;
' becomes &#39;

In some cases it may be safe to skip escaping some of these characters, but I encourage you to escape all five in all cases to reduce the chance of making a mistake.
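In practice a library routine usually does this for you. As one example, Python's standard html.escape function covers all five characters when its quote argument is left at the default of True (it writes the apostrophe as the hexadecimal reference &#x27;):

```python
import html

raw = """<a href="x">Tom & Jerry's</a>"""
escaped = html.escape(raw)   # quote=True by default, so " and ' are escaped too

print(escaped)
# Unescaping gives back the original text.
print(html.unescape(escaped) == raw)
```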

If your document encoding does not support all of the characters that you're using, such as if you're trying to use emoji in an ASCII-encoded document, you also need to escape those. Most documents these days are encoded using the fully Unicode-supporting UTF-8 encoding where this won't be necessary.

In general, you should not escape spaces as &nbsp;. &nbsp; is not a normal space, it's a non-breaking space. You can use these instead of normal spaces to prevent a line break from being inserted between two words, or to insert extra space without it being automatically collapsed, but this is usually a rare case. Don't do this unless you have a design constraint that requires it.


[1] By "a location where text content is expected", I mean inside of an element or quoted attribute value where normal parsing rules apply. For example: <p>HERE</p> or <p title="HERE">...</p>. What I wrote above does not apply to content that has special parsing rules or meaning, such as inside of a script or style tag, or as an element or attribute name. For example: <NOT-HERE>...</NOT-HERE>, <script>NOT-HERE</script>, <style>NOT-HERE</style>, or <p NOT-HERE="...">...</p>.

In these contexts, the rules are more complicated and it's much easier to introduce a security vulnerability. I strongly discourage you from ever inserting dynamic content in any of these locations. I have seen teams of competent security-aware developers introduce vulnerabilities by assuming that they had encoded these values correctly, but missing an edge case. There's usually a safer alternative, such as putting the dynamic value in an attribute and then handling it with JavaScript.

If you must, please read the Open Web Application Security Project's XSS Prevention Rules to help understand some of the concerns you will need to keep in mind.

Storing escape characters in unix variable

The problem is caused by backticks. Use $( ) instead, and it goes away:

var="*a<br>*b<br>*c"
var=$(printf '%s\n' "$var" | sed 's/\*/\\*/g')
printf '%s\n' "$var"

(Why is this problem caused by backticks? Because the only way to nest them is to escape the inner ones with backslashes, so they necessarily change how backslashes behave; whereas $( ), because it uses different starting and ending sigils, can be nested natively).


That said, if your shell is one (like bash) with ksh-inspired extensions, you don't need sed at all here, as the shell can perform simple string replacements natively via parameter expansion:

var="*a<br>*b<br>*c"
printf '%s\n' "${var//'*'/'\*'}"

For background on why this answer uses printf instead of echo, see "Why is printf better than echo?" on Unix & Linux Stack Exchange, or the APPLICATION USAGE section of the POSIX specification for echo.

Rules for C++ string literals escape character

Control characters:

(Hex codes assume an ASCII-compatible character encoding.)

  • \a = \x07 = alert (bell)
  • \b = \x08 = backspace
  • \t = \x09 = horizontal tab
  • \n = \x0A = newline (or line feed)
  • \v = \x0B = vertical tab
  • \f = \x0C = form feed
  • \r = \x0D = carriage return
  • \e = \x1B = escape (non-standard GCC extension)

Punctuation characters:

  • \" = quotation mark (backslash not required for '"')
  • \' = apostrophe (backslash not required for "'")
  • \? = question mark (used to avoid trigraphs)
  • \\ = backslash

Numeric character references:

  • \ + up to 3 octal digits
  • \x + any number of hex digits
  • \u + 4 hex digits (Unicode BMP, new in C++11)
  • \U + 8 hex digits (Unicode astral planes, new in C++11)

\0 = \00 = \000 = octal escape for null character

If you do want an actual digit character after a \0, then yes, I recommend string concatenation. Note that the whitespace between the parts of the literal is optional, so you can write "\0""0".

What characters do I need to escape in XML documents?

If you use an appropriate class or library, they will do the escaping for you. Many XML issues are caused by string concatenation.
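As an illustration of leaving the escaping to a library, the sketch below builds a small document with Python's xml.etree.ElementTree instead of concatenating strings. The element name, the attribute name and the build helper are made up for the example.

```python
import xml.etree.ElementTree as ET


def build(text: str, attr: str) -> str:
    """Serialize <item note="...">...</item>, letting the library escape."""
    item = ET.Element("item", {"note": attr})
    item.text = text
    return ET.tostring(item, encoding="unicode")


doc = build("a < b & c", 'say "hi" & <bye>')
print(doc)

# Parsing the result returns the original, unescaped values.
parsed = ET.fromstring(doc)
print(parsed.text)
print(parsed.get("note"))
```

The round trip through the parser is the useful check here: whatever entities the serializer chose, the values that come back are the ones that went in.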

XML escape characters

There are only five:

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;

Escaping characters depends on where the special character is used.

The examples can be validated at the W3C Markup Validation Service.

Text

The safe way is to escape all five characters in text. However, the three characters ", ' and > needn't be escaped in text:

<?xml version="1.0"?>
<valid>"'></valid>

Attributes

The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes:

<?xml version="1.0"?>
<valid attribute=">"/>

The ' character needn't be escaped in attributes if the quotes are ":

<?xml version="1.0"?>
<valid attribute="'"/>

Likewise, the " needn't be escaped in attributes if the quotes are ':

<?xml version="1.0"?>
<valid attribute='"'/>

Comments

None of the five special characters should be escaped in comments, where they are taken literally:

<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>

CDATA

None of the five special characters should be escaped in CDATA sections, where they are taken literally:

<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>

Processing instructions

None of the five special characters should be escaped in XML processing instructions, where they are taken literally:

<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>

XML vs. HTML

HTML has its own set of escape codes which cover a lot more characters.


