HTML, JS, and CSS Entity Characters

A character set and a character encoding together define a number for each character and the storage format of that number. The character set assigns each character a numeric value (its code). The character encoding defines how that value is stored: how many bytes are used and what binary layout they have (with or without marker bits).

Sometimes the two terms can be treated as the same thing, because there is only one correspondence between them. For example, ASCII, ISO-8859-1, GB2312, and GBK are each both a character set and a character encoding. When a character set has multiple storage forms, the terms differ. For example, Unicode is only a character set, and it has several character encodings, such as UTF-8, UTF-16, and UTF-32.
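To make the distinction concrete, here is a minimal JavaScript sketch. It assumes a runtime that provides the standard TextEncoder (a modern browser or Node.js). It takes one Unicode code point and shows how the same number is stored under UTF-8 and UTF-16. The variable names are illustrative only.

```javascript
// One character, one code point, two different storage formats.
const ch = '\u9ED1';                   // the Chinese character for "black"
const codePoint = ch.codePointAt(0);   // the number assigned by the character set (Unicode)

// UTF-8: variable length, 1 to 4 bytes per code point.
const utf8Bytes = Array.from(new TextEncoder().encode(ch));

// UTF-16: one or two 16-bit code units per code point.
const utf16Units = [];
for (let i = 0; i < ch.length; i++) {
  utf16Units.push(ch.charCodeAt(i));
}

const hex = (n) => n.toString(16).toUpperCase();
console.log('code point: U+' + hex(codePoint));             // code point: U+9ED1
console.log('UTF-8 bytes: ' + utf8Bytes.map(hex).join(' ')); // UTF-8 bytes: E9 BB 91
console.log('UTF-16 units: ' + utf16Units.map(hex).join(' ')); // UTF-16 units: 9ED1
```

The code point is the same in both cases; only the stored bytes differ.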

Next, let's take a look at the entity characters of HTML, JS, and CSS.

HTML Entity Characters

Generally speaking, reserved characters in HTML (characters with special meaning, such as the "<" that starts a tag) cannot be used directly in text and must be written as character entities. Character entities are also used for characters that cannot be typed directly on the keyboard.
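As a rough illustration of that replacement, the sketch below swaps the four reserved characters for their entity names (&amp;lt;, &amp;gt;, &amp;amp;, &amp;quot;) before text is placed into a page. The function name escapeHtml and the lookup table are hypothetical helpers written for this example, not part of any library.

```javascript
// Map each reserved character to its named entity.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
};

// Replace every reserved character in the text with its entity.
function escapeHtml(text) {
  return text.replace(/[&<>"]/g, (c) => HTML_ESCAPES[c]);
}

console.log(escapeHtml('<a href="x">Tom & Jerry</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

Because the replacement runs in a single pass, the "&" inside an entity it has just produced is not escaped a second time.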

Common HTML Entity Names and Characters

Entity Name   Character   Description
&lt;   <   less-than sign, used to display tag markup
&gt;   >   greater-than sign, used to display tag markup
&amp;   &   ampersand, can be used to display other special characters
&quot;   "   quotation mark
&reg;   ®   registered
&copy;   ©   copyright
&trade;   ™   trademark
&nbsp;   (space)   non-breaking space

A character entity has three parts: an ampersand '&', an entity name (or a '#' followed by an entity number), and a semicolon ';'.

Example: Character entity for "<"

Two representations of "<":
Entity name method: &lt;
Entity numbering method: &#60; // or &#x3C; (hexadecimal format: &#xHH;)
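The numbering method can be sketched in a few lines of JavaScript. The helper name toNumericEntities is an assumption made for this example; it derives both numeric forms from the character's code.

```javascript
// Build the decimal and hexadecimal numeric entities for one character.
function toNumericEntities(ch) {
  const cp = ch.codePointAt(0);
  return {
    decimal: '&#' + cp + ';',
    hex: '&#x' + cp.toString(16).toUpperCase() + ';',
  };
}

const lessThan = toNumericEntities('<');
console.log(lessThan.decimal); // &#60;
console.log(lessThan.hex);     // &#x3C;
```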

Character Number

The character number is the character's code point. In the ASCII range it equals the ASCII code, and it can be written in decimal or hexadecimal. Some special characters also have entity names and can be written by name. You can check the ISO-8859-1 character set table below. This is not limited to reserved characters: any character can be written as a character entity using its entity number. Both ISO-8859-1 (the default in HTML 4.01) and UTF-8 (the default in HTML5) are built on ASCII.

ASCII character set

ASCII stands for "American Standard Code for Information Interchange".

Character Numeric Description
     0 - 31 Control characters (see below)
     32 space
! 33 exclamation mark
" 34 quotation mark
# 35 number sign
$ 36 dollar sign
% 37 percent sign
& 38 ampersand
' 39 apostrophe
( 40 left parenthesis
) 41 right parenthesis
* 42 asterisk
+ 43 plus sign
, 44 comma
- 45 hyphen
. 46 period
/ 47 slash
0 48 digit 0
1 49 digit 1
2 50 digit 2
3 51 digit 3
4 52 digit 4
5 53 digit 5
6 54 digit 6
7 55 digit 7
8 56 digit 8
9 57 digit 9
: 58 colon
; 59 semicolon
< 60 less-than
= 61 equals-to
> 62 greater-than
? 63 question mark
@ 64 at sign
A 65 uppercase A
B 66 uppercase B
C 67 uppercase C
D 68 uppercase D
E 69 uppercase E
F 70 uppercase F
G 71 uppercase G
H 72 uppercase H
I 73 uppercase I
J 74 uppercase J
K 75 uppercase K
L 76 uppercase L
M 77 uppercase M
N 78 uppercase N
O 79 uppercase O
P 80 uppercase P
Q 81 uppercase Q
R 82 uppercase R
S 83 uppercase S
T 84 uppercase T
U 85 uppercase U
V 86 uppercase V
W 87 uppercase W
X 88 uppercase X
Y 89 uppercase Y
Z 90 uppercase Z
[ 91 left square bracket
\ 92 backslash
] 93 right square bracket
^ 94 caret
_ 95 underscore
` 96 grave accent
a 97 lowercase a
b 98 lowercase b
c 99 lowercase c
d 100 lowercase d
e 101 lowercase e
f 102 lowercase f
g 103 lowercase g
h 104 lowercase h
i 105 lowercase i
j 106 lowercase j
k 107 lowercase k
l 108 lowercase l
m 109 lowercase m
n 110 lowercase n
o 111 lowercase o
p 112 lowercase p
q 113 lowercase q
r 114 lowercase r
s 115 lowercase s
t 116 lowercase t
u 117 lowercase u
v 118 lowercase v
w 119 lowercase w
x 120 lowercase x
y 121 lowercase y
z 122 lowercase z
{ 123 left curly brace
| 124 vertical bar
} 125 right curly brace
~ 126 tilde

ASCII control characters (range 00-31, plus 127) are designed to control hardware devices. Control characters (except horizontal tabs, newlines, and carriage returns) are not relevant to HTML documents.

Character Numeric Description
NUL 00 null character
SOH 01 start of header
STX 02 start of text
ETX 03 end of text
EOT 04 end of transmission
ENQ 05 enquiry
ACK 06 acknowledge
BEL 07 bell (ring)
BS 08 backspace
HT 09 horizontal tab
LF 10 line feed
VT 11 vertical tab
FF 12 form feed
CR 13 carriage return
SO 14 shift out
SI 15 shift in
DLE 16 data link escape
DC1 17 device control 1
DC2 18 device control 2
DC3 19 device control 3
DC4 20 device control 4
NAK 21 negative acknowledgement
SYN 22 synchronize
ETB 23 end transmission block
CAN 24 cancel
EM 25 end of medium
SUB 26 substitute
ESC 27 escape
FS 28 file separator
GS 29 group separator
RS 30 record separator
US 31 unit separator          
DEL 127 delete (rubout)

ISO-8859-1 Character Set

Character Entity Number Entity Name Description
     0 - 31 Control characters
     32 space
! 33 exclamation mark
" 34 &quot; quotation mark
# 35 number sign
$ 36 dollar sign
% 37 percent sign
& 38 &amp; ampersand
' 39 apostrophe
( 40 left parenthesis
) 41 right parenthesis
* 42 asterisk
+ 43 plus sign
, 44 comma
- 45 hyphen-minus
. 46 full stop
/ 47 solidus
0 48 digit zero
1 49 digit one
2 50 digit two
3 51 digit three
4 52 digit four
5 53 digit five
6 54 digit six
7 55 digit seven
8 56 digit eight
9 57 digit nine
: 58 colon
; 59 semicolon
< 60 &lt; less-than sign
= 61 equals sign
> 62 &gt; greater-than sign
? 63 question mark
@ 64 commercial at
A 65 Latin capital letter A
B 66 Latin capital letter B
C 67 Latin capital letter C
D 68 Latin capital letter D
E 69 Latin capital letter E
F 70 Latin capital letter F
G 71 Latin capital letter G
H 72 Latin capital letter H
I 73 Latin capital letter I
J 74 Latin capital letter J
K 75 Latin capital letter K
L 76 Latin capital letter L
M 77 Latin capital letter M
N 78 Latin capital letter N
O 79 Latin capital letter O
P 80 Latin capital letter P
Q 81 Latin capital letter Q
R 82 Latin capital letter R
S 83 Latin capital letter S
T 84 Latin capital letter T
U 85 Latin capital letter U
V 86 Latin capital letter V
W 87 Latin capital letter W
X 88 Latin capital letter X
Y 89 Latin capital letter Y
Z 90 Latin capital letter Z
[ 91 left square bracket
\ 92 reverse solidus
] 93 right square bracket
^ 94 circumflex accent
_ 95 low line
` 96 grave accent
a 97 Latin small letter a
b 98 Latin small letter b
c 99 Latin small letter c
d 100 Latin small letter d
e 101 Latin small letter e
f 102 Latin small letter f
g 103 Latin small letter g
h 104 Latin small letter h
i 105 Latin small letter i
j 106 Latin small letter j
k 107 Latin small letter k
l 108 Latin small letter l
m 109 Latin small letter m
n 110 Latin small letter n
o 111 Latin small letter o
p 112 Latin small letter p
q 113 Latin small letter q
r 114 Latin small letter r
s 115 Latin small letter s
t 116 Latin small letter t
u 117 Latin small letter u
v 118 Latin small letter v
w 119 Latin small letter w
x 120 Latin small letter x
y 121 Latin small letter y
z 122 Latin small letter z
{ 123 left curly bracket
| 124 vertical line
} 125 right curly bracket
~ 126 tilde
     127 Control character

In ISO-8859-1, the code values from 128 to 159 are not assigned to printable characters.
By convention, browsers display them using the Windows-1252 mapping:

Character Entity Number Entity Name Description
€ 128 &euro; euro sign
     129 NOT USED
‚ 130 &sbquo; single low-9 quotation mark
ƒ 131 &fnof; Latin small letter f with hook
„ 132 &bdquo; double low-9 quotation mark
… 133 &hellip; horizontal ellipsis
† 134 &dagger; dagger
‡ 135 &Dagger; double dagger
ˆ 136 &circ; modifier letter circumflex accent
‰ 137 &permil; per mille sign
Š 138 &Scaron; Latin capital letter S with caron
‹ 139 &lsaquo; single left-pointing angle quotation mark
Œ 140 &OElig; Latin capital ligature OE
     141 NOT USED
Ž 142 &#381; Latin capital letter Z with caron
     143 NOT USED
     144 NOT USED
‘ 145 &lsquo; left single quotation mark
’ 146 &rsquo; right single quotation mark
“ 147 &ldquo; left double quotation mark
” 148 &rdquo; right double quotation mark
• 149 &bull; bullet
– 150 &ndash; en dash
— 151 &mdash; em dash
˜ 152 &tilde; small tilde
™ 153 &trade; trade mark sign
š 154 &scaron; Latin small letter s with caron
› 155 &rsaquo; single right-pointing angle quotation mark
œ 156 &oelig; Latin small ligature oe
     157 NOT USED
ž 158 &#382; Latin small letter z with caron
Ÿ 159 &Yuml; Latin capital letter Y with diaeresis

HTML entities exist so that special characters do not cause problems when HTML is parsed, but they also create new XSS problems, because they make XSS filtering harder. A character entity is never treated as a tag, but when it is written inside a tag attribute value it is decoded and the value is used as normal. For example, suppose the href attribute value can be injected and the filter blocks the string "javascript:". Then in

<a href=javascript:alert(1); >aa</a>

the alert(1) will not be executed, because the filter rejects the value. If javascript:alert(1); is first converted to entity characters (for example, writing "j" as &#106;), the filter no longer sees "javascript:", the browser decodes the entities, and the alert executes normally when the link is clicked. XSS payloads can therefore use attribute values that accept the js pseudo-protocol (javascript: followed by the script to execute). Entity characters are reportedly decoded only after the DOM tree node has been created, which is why entity characters cannot create nodes.
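The filtering problem can be shown with a simplified JavaScript sketch. Both functions are hypothetical: naiveFilterAllows stands in for a filter that only looks at the raw text, and decodeNumericEntities approximates what a browser does to an attribute value. Real browsers also handle named entities and other edge cases that this sketch ignores.

```javascript
// Decode decimal (&#106;) and hexadecimal (&#x6A;) numeric entities.
// Simplified: assumes every number is a valid code point.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === 'x' || body[0] === 'X';
    const cp = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);
    return String.fromCodePoint(cp);
  });
}

// Hypothetical naive filter: rejects values that literally start with "javascript:".
function naiveFilterAllows(value) {
  return !/^\s*javascript:/i.test(value);
}

const plain = 'javascript:alert(1)';
// The same value with every letter of "javascript" written as a numeric entity.
const encoded = '&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;:alert(1)';

console.log(naiveFilterAllows(plain));                 // false
console.log(naiveFilterAllows(encoded));               // true
console.log(decodeNumericEntities(encoded) === plain); // true
```

One way to close this gap is to decode the value first and filter the decoded result, so the filter sees the same text the browser will use.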

CSS Entity Characters

HTML is not the only language with entity characters; CSS and JS also support an escaped notation for characters. The numbers used are Unicode code points, and the page itself is typically stored in the UTF-8 character encoding. Some common forms are shown below.

Example of CSS Entity Characters

<!DOCTYPE html>
<html>
<head>
<style>
/* Append U+21E0 (leftwards dashed arrow) after every paragraph. */
p::after {
  content: '\21E0';
}
</style>
</head>
<body>
<p>Display</p>
</body>
</html>

CSS escapes are not limited to special symbols; ordinary characters (a-z, 0-9) can be written the same way. The format is a backslash followed by the hexadecimal ASCII value. The escape must be placed in the attribute value position, for example:

<p style="\61\62">aaaa</p> <!-- \61\62 is read as "ab"; the value after each backslash is the character's number in hexadecimal -->
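The backslash-plus-hexadecimal rule can be sketched in JavaScript as a pair of helpers. The names toCssEscape and decodeCssEscapes are assumptions made for this example, and the decoder covers only the hexadecimal form of CSS escapes. The encoder puts a space after each escape, which CSS treats as the end of the escape, so a following hex digit is not absorbed into the number.

```javascript
// Encode every character as "\" + hexadecimal code point.
function toCssEscape(text) {
  let out = '';
  for (const ch of text) {
    out += '\\' + ch.codePointAt(0).toString(16).toUpperCase() + ' ';
  }
  return out.trimEnd();
}

// Decode "\" + 1 to 6 hex digits, plus one optional whitespace terminator.
function decodeCssEscapes(text) {
  return text.replace(/\\([0-9a-fA-F]{1,6})\s?/g, (match, digits) => {
    const cp = parseInt(digits, 16);
    // Zero and out-of-range values become the replacement character.
    if (cp === 0 || cp > 0x10FFFF) return '\uFFFD';
    return String.fromCodePoint(cp);
  });
}

console.log(toCssEscape('ab'));            // \61 \62
console.log(decodeCssEscapes('\\61\\62')); // ab
```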

JavaScript Entity Characters

Example of JavaScript Entity Characters

var code2 = '\u0061';
document.write(code2); // a

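In JS, characters in the ASCII range do not need the "\u" form; a backslash followed by the octal value of the ASCII code also works. Octal escapes are a legacy feature and are rejected in strict mode and in template literals.

var code = '\141'; // octal 141 = decimal 97 = "a"
Hexadecimal escapes start with "\x" followed by exactly two hexadecimal digits, for example '\x61'.
JS string literals therefore have three numeric escape forms: "\u" (four hexadecimal digits), "\x" (two hexadecimal digits), and a bare "\" followed by octal digits.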

JavaScript also provides escapes for some special characters, such as:
\n (line feed), \r (carriage return), \' (single quote), etc.
As noted, "\" can also be followed by octal or hexadecimal numbers.
For example, the character "a" can be represented as:
"\141" or "\x61" (note the lowercase "x").
Characters above the range that "\x" and octal escapes can reach, such as the Chinese character "黑" (black), must use the "\u" form: "\u9ED1" (four hexadecimal digits, with a lowercase "u"). Here "u" stands for Unicode and the digits are one 16-bit (double-byte) value.
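A short JavaScript check of these equivalences is sketched below. Because the legacy octal literal is a syntax error in strict mode and in modules, the sketch computes the octal case from its digits rather than writing '\141' directly; the variable names are illustrative.

```javascript
// Three spellings of the same character "a" (code 97).
const fromHex = '\x61';        // \x + exactly two hex digits
const fromUnicode = '\u0061';  // \u + exactly four hex digits
// Same value that the legacy '\141' literal produces in non-strict code.
const fromOctal = String.fromCharCode(parseInt('141', 8));

console.log(fromHex === 'a', fromUnicode === 'a', fromOctal === 'a'); // true true true

// Above U+00FF only the \u form is available.
const hei = '\u9ED1';
console.log(hei.length, hei.charCodeAt(0).toString(16)); // 1 9ed1
```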

The octal escape string is as follows (octal escapes only work in non-strict scripts):
<script>
// Decodes to: alert("黑客防线");
eval("\141\154\145\162\164\50\42\u9ED1\u5BA2\u9632\u7EBF\42\51\73")
</script>

The hexadecimal escape string is as follows:
<script>
// Decodes to the same statement: alert("黑客防线");
eval("\x61\x6C\x65\x72\x74\x28\x22\u9ED1\u5BA2\u9632\u7EBF\x22\x29\x3B")
</script>

Passing the same string to alert instead of eval displays the decoded source text without executing it:
<script>
alert("\x61\x6C\x65\x72\x74\x28\x22\u9ED1\u5BA2\u9632\u7EBF\x22\x29\x3B")
</script>
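Escape strings like the hexadecimal one above do not have to be written by hand. The sketch below generates one from ordinary source text, using "\x" for values up to FF and "\u" for everything else. The function name toJsEscapes is hypothetical.

```javascript
// Convert each UTF-16 code unit of the input into a \xHH or \uHHHH escape.
function toJsEscapes(source) {
  let out = '';
  for (let i = 0; i < source.length; i++) {
    const unit = source.charCodeAt(i);
    const hex = unit.toString(16).toUpperCase();
    out += unit <= 0xFF
      ? '\\x' + hex.padStart(2, '0')
      : '\\u' + hex.padStart(4, '0');
  }
  return out;
}

// The statement used in the examples above, with the Chinese text as \u escapes.
const statement = 'alert("\u9ED1\u5BA2\u9632\u7EBF");';
console.log(toJsEscapes(statement));
// \x61\x6C\x65\x72\x74\x28\x22\u9ED1\u5BA2\u9632\u7EBF\x22\x29\x3B
```

The output matches the hexadecimal escape string passed to eval in the example above.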

Entity Representation Format of HTML, CSS, and JS

Based on the numbering of characters:

1. HTML entity: "&#" + character number + ";" // The character number is in decimal; use "&#x" for hexadecimal.

2. CSS entity: "\" + character number // The character number must be converted to hexadecimal.

3. JavaScript: "\u" + character number // The character number must be converted to four hexadecimal digits. Numbers after a bare "\" are octal; numbers after "\u" and "\x" are hexadecimal.
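The three formats can be compared side by side with a small JavaScript sketch. The helper name entityForms is an assumption made for this example. For characters above U+FFFF it uses the "\u{...}" code point escape, which goes beyond the three forms listed above.

```javascript
// Produce the HTML, CSS, and JavaScript forms for one character.
function entityForms(ch) {
  const cp = ch.codePointAt(0);
  const hex = cp.toString(16).toUpperCase();
  return {
    html: '&#' + cp + ';',   // decimal
    css: '\\' + hex,          // hexadecimal
    js: cp <= 0xFFFF
      ? '\\u' + hex.padStart(4, '0')  // four hexadecimal digits
      : '\\u{' + hex + '}',           // code point escape
  };
}

const forms = entityForms('a');
console.log(forms.html, forms.css, forms.js); // &#97; \61 \u0061
```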


