When Should One Use HTML Entities

When should one use HTML entities?

You don't generally need to use HTML character entities if your editor supports Unicode. Entities can be useful when:

  • Your keyboard does not support the character you need to type. For example, many keyboards have no key for the em dash or the copyright symbol.
  • Your editor does not support Unicode (very common some years ago, but probably not today).
  • You want to make it explicit in the source what is happening. For example, the &nbsp; entity is clearer than the corresponding whitespace character (a non-breaking space), which looks like an ordinary space in the source.
  • You need to escape HTML special characters like <, &, or ".
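
To make the equivalence between an entity and its character concrete, here is a small sketch in Python (chosen only for illustration; nothing above prescribes a language). It uses the standard library's html.unescape to show that a named entity is just another spelling of a Unicode character, and why &nbsp; is easier to spot in source than the character itself.

```python
import html

# Named entities decode to ordinary Unicode characters.
decoded_copyright = html.unescape("&copy;")  # copyright sign, U+00A9
decoded_mdash = html.unescape("&mdash;")     # em dash, U+2014
decoded_nbsp = html.unescape("&nbsp;")       # non-breaking space, U+00A0

print(decoded_copyright == "\u00a9")  # True
print(decoded_mdash == "\u2014")      # True

# A non-breaking space is not a plain space, even though most editors
# display the two identically.
print(decoded_nbsp == " ")            # False
print(decoded_nbsp == "\u00a0")       # True
```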

Should I still use HTML entities? Why?

If the encoding is set correctly (and the document is saved as UTF-8) you should be able to work with just the characters. From the W3C:

Using an encoding such as UTF-8 means that you can avoid the need for most escapes and just work with characters.

http://www.w3.org/International/questions/qa-escapes

However, you still need to use entities for special characters such as the greater-than and less-than signs.

Is there any reason to use HTML entities over the characters themselves?

Entities are definitely needed if you want to render characters that would otherwise confuse the HTML parser (< and >, and the & itself).

If you set the character set to UTF-8, then you can use all other characters as raw UTF-8.

<meta charset='utf-8'>

However, some older browsers don't understand this tag, so they might not render UTF-8 characters properly.
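
As a rough illustration of why the declared charset has to match the bytes actually saved, the following Python sketch encodes a small document as UTF-8 and then decodes it two ways. The document string is made up for the example; decoding with Latin-1 stands in for a client that ignores or misreads the declaration.

```python
# A tiny document containing raw non-ASCII characters (hypothetical content).
doc = "<meta charset='utf-8'><p>\u00a9 Caf\u00e9</p>"

data = doc.encode("utf-8")       # the bytes as saved on disk or sent over the wire

right = data.decode("utf-8")     # client honours the declared charset
wrong = data.decode("latin-1")   # client assumes a different charset

print(right == doc)              # True
print("\u00c2\u00a9" in wrong)   # True: the copyright sign became two characters
print("Caf\u00c3\u00a9" in wrong)  # True: the e-acute became two characters
```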

I also wanted to point out that HTML entities are sometimes used to tell the browser to render a concept rather than a specific character. Some browsers render an entity in a more readable way than the corresponding Unicode character; for example, Lynx renders &trade; as (tm) instead of ™.

Do I need to use HTML entities when storing data in the database?

HTML entities were introduced years ago to transport character information over the wire when transport was not binary safe, and for cases where the user agent (browser) did not support the charset encoding of the transport layer or server.

Because an HTML entity contains only very basic characters (&, ;, a-z and 0-9), and those characters have the same binary encoding in most character sets, entities were and still are very safe from those side effects.

However, when you store something in the database, you don't have these issues, because you're normally in control and you know what text you can store in the database and how.

For example, if you allow Unicode for text inside the database, you can store all characters; none is actually special. Note that you need to know your database here, as there are some technical details you can run into. For instance, if you don't know the charset encoding of your database connection, you can't tell your database exactly which text you want to store. But generally, you just store the text and retrieve it later. Nothing special to deal with.

In fact there are downsides when you use HTML entities instead of the plain character:

  • HTML entities consume more space: &uuml; is much larger than ü in LATIN-1, UTF-8, UTF-16 or UTF-32.
  • HTML entities need further processing. They need to be created, and when read, they need to be parsed. Imagine you need to search for a specific text in your database: that, like any other operation on the text, would need additional handling. That's just overhead.
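
Both downsides can be checked with a few lines of Python. This is only a sketch: the stored value is a hypothetical database field, and the little-endian UTF-16 and UTF-32 codecs are used so that no byte-order mark is counted.

```python
import html

entity = "&uuml;"
char = "\u00fc"  # the letter u with diaeresis

# Size: the entity is six ASCII bytes; the character is at most four bytes.
entity_size = len(entity.encode("ascii"))
char_sizes = {enc: len(char.encode(enc))
              for enc in ("latin-1", "utf-8", "utf-16-le", "utf-32-le")}
print(entity_size)  # 6
print(char_sizes)   # {'latin-1': 1, 'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}

# Processing: text stored with entities cannot be searched directly.
stored = "M&uuml;ller"          # hypothetical value saved with entities
needle = "M\u00fcller"
print(needle in stored)                 # False
print(needle in html.unescape(stored))  # True, but only after decoding
```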

The real fun starts when you mix both concepts. You come to a place you really don't want to go into. So just don't do it because you ain't gonna need it.

Why are HTML character entities necessary?

Two main things.

  1. They let you use characters that are not defined in the current charset. E.g., you can legally use ASCII as the charset and still include arbitrary Unicode characters through entities.
  2. They let you quote characters that HTML gives special meaning to, as Simon noted.
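
The first point can be sketched in Python, whose xmlcharrefreplace error handler replaces every character the target charset cannot represent with a numeric character reference. The sample text is arbitrary.

```python
import html

text = "caf\u00e9 \u00a9 \u4e2d"  # e-acute, copyright sign, a CJK character

# Encode as pure ASCII; anything outside ASCII becomes &#NNN;.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'caf&#233; &#169; &#20013;'

# An HTML-aware consumer recovers the original characters.
round_trip = html.unescape(ascii_bytes.decode("ascii"))
print(round_trip == text)  # True
```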

PHP & mySQL: When exactly to use htmlentities?

Here's the general rule of thumb.

Escape variables at the last possible moment.

You want your variables to be clean representations of the data. That is, if you are trying to store the last name of someone named "O'Brien", then you definitely don't want these:

O&#039;Brien
O\'Brien

.. because, well, that's not his name: there are no ampersands or slashes in it. When you take that variable and output it in a particular context (e.g. insert it into an SQL query, or print it to an HTML page), that is when you modify it.

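$name = "O'Brien";

// SQL context: escape for SQL only. $mysqli is assumed to be an open mysqli
// connection (the old mysql_* functions were removed in PHP 7).
$sql = "SELECT * FROM people "
     . "WHERE lastname = '" . $mysqli->real_escape_string($name) . "'";

// HTML context: escape for HTML only.
$html = "<div>Last Name: " . htmlentities($name, ENT_QUOTES, 'UTF-8') . "</div>";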

You never want to have htmlentities-encoded strings stored in your database. What happens when you want to generate a CSV or PDF, or anything which isn't HTML?

Keep the data clean, and only escape for the specific context of the moment.
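
The same rule can be sketched end to end in Python with an in-memory SQLite database. This is an illustration under assumptions, not the answer's own code: the table name mirrors the query above, and a bound parameter is used in place of manual SQL escaping.

```python
import csv
import html
import io
import sqlite3

name = "O'Brien"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (lastname TEXT)")
# SQL context: a bound parameter keeps the raw value out of the SQL text.
conn.execute("INSERT INTO people (lastname) VALUES (?)", (name,))
stored = conn.execute("SELECT lastname FROM people").fetchone()[0]
print(stored)  # O'Brien (the clean value, no entities, no backslashes)

# HTML context: escape only now, at output time.
html_out = "<div>Last Name: " + html.escape(stored, quote=True) + "</div>"
print(html_out)  # <div>Last Name: O&#x27;Brien</div>

# CSV context: the same clean value, handled by the CSV writer's own rules.
buf = io.StringIO()
csv.writer(buf).writerow(["lastname", stored])
csv_out = buf.getvalue()
print(csv_out)  # lastname,O'Brien
```

Because the stored value is never HTML-encoded, the CSV output needs no decoding step.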

Do I need to use HTML entities for special characters if I'm using the UTF-8 charset?

No, symbols like ©, é, the German umlauts ä, ö, ü, ß and all the other stuff can be used just like any other character when using UTF-8.

But note that some things still have to be entities because they have a special meaning in HTML (< and >, for example, which should still be replaced with &lt; and &gt; if you want to use them in your text).
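
A quick way to see the split between "special in HTML" and "merely non-ASCII" is Python's html.escape, shown here as a sketch with an arbitrary sample string. It rewrites only the markup-significant characters and leaves everything else alone.

```python
import html

text = "\u00a9 2 < 3 & \u00e9 > e"  # copyright sign and e-acute mixed with <, &, >

escaped = html.escape(text, quote=False)
print(escaped)  # © 2 &lt; 3 &amp; é &gt; e
```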

Is it okay to use HTML entities in attributes?

Yes, it's perfectly fine. Character references are valid inside attributes, too, and will be treated as character references just the same.

For reference, see:

  • A description of character references (they may be found within text)
  • A description of text

URL + htmlentities ? what to think about this?

This is not about HTML entities in URLs. This is about you putting arbitrary data into HTML, which means you need to HTML escape any special characters in it. That this data happens to be a URL is irrelevant.

  1. You need to escape any arbitrary data you put into the URL with urlencode to preserve characters with a special meaning in the URL.
  2. The arbitrary blob of data you get from step one needs to be HTML escaped for the same reasons when put into HTML. As you see in your example, there's an & in your data which is required to be escaped to &amp; by HTML rules.

If you did not use the URL in an HTML context, there'd be no need to HTML escape it. HTML entities have no place in a URL. A URL in an HTML context must be HTML escaped though, like any other data.
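
The two steps can be sketched in Python, where urllib.parse.urlencode plays the role that urlencode plays in PHP. The host example.com and the parameter names are placeholders.

```python
import html
from urllib.parse import urlencode

# Step 1: URL-encode the arbitrary data that goes into the URL.
params = {"q": "fish & chips", "page": "2"}
query = urlencode(params)
print(query)  # q=fish+%26+chips&page=2

url = "https://example.com/search?" + query

# Step 2: HTML-escape the finished URL because it is placed into HTML.
link = '<a href="' + html.escape(url, quote=True) + '">results</a>'
print(link)
# <a href="https://example.com/search?q=fish+%26+chips&amp;page=2">results</a>
```

The & inside the data became %26 in step 1, while the & that separates the parameters became &amp; in step 2. Outside an HTML context only step 1 applies.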


