Store HTML Entities in Database? or Convert When Retrieved

Store HTML entities in database? Or convert when retrieved?

I'd recommend storing the most raw form of the data in the database. That gives you the most flexibility when choosing how and where to output that data.

If you find that performance is a problem, you could cache the HTML-formatted version of this data somehow. Remember that premature optimization is a bad thing.

Do I need to use HTML entities when storing data in the database?

HTML entities were introduced years ago to carry character information over the wire when transport was not binary safe, and for cases where the user agent (browser) did not support the charset encoding used by the transport layer or server.

Because an HTML entity consists only of very basic characters (&, #, ;, letters and digits), and those characters have the same binary encoding in most character sets, entities were and still are safe from those side effects.

However, when you store something in the database you don't have these issues, because you are normally in control and you know what text you can store and how.

For example, if you allow Unicode for text inside the database, you can store all characters; none of them is special. Note that you need to know your database here, because there are some technical details you can run into. For instance, if you don't know the charset encoding of your database connection, you can't tell the database exactly which text you want to store. But generally, you just store the text and retrieve it later. Nothing special to deal with.
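As a rough illustration of the connection-charset detail, here is a minimal sketch assuming MySQL accessed through PDO. The helper names buildUtf8Dsn and connectUtf8 and the host, database and credential values are hypothetical; the charset parameter in the DSN is what declares the encoding of the connection.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that states the connection
// charset explicitly, so the server knows how to interpret incoming text.
function buildUtf8Dsn(string $host, string $dbName): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

// Hypothetical helper: opens the connection (needs a reachable MySQL server).
function connectUtf8(string $host, string $dbName, string $user, string $password): PDO
{
    return new PDO(buildUtf8Dsn($host, $dbName), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

echo buildUtf8Dsn('localhost', 'app'), PHP_EOL;
```

The tables and columns would also need a Unicode charset such as utf8mb4 for this to help; the DSN only covers the connection.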

In fact there are downsides when you use HTML entities instead of the plain character:

  • HTML entities consume more space: &uuml; is much larger than ü in LATIN-1, UTF-8, UTF-16 or UTF-32.
  • HTML entities need further processing. They need to be created, and when read, they need to be parsed. Imagine you need to search for specific text in your database: that, like any other operation on the text, would need additional handling. That's just overhead.
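Both downsides can be checked with a few lines of PHP. This is only a sketch: the sample name is made up, and the characters are written as explicit byte escapes so the result does not depend on how the file itself is saved.

```php
<?php
// One character, three representations.
$entity = '&uuml;';    // HTML entity form
$utf8   = "\xC3\xBC";  // ü encoded as UTF-8
$latin1 = "\xFC";      // ü encoded as ISO-8859-1 (LATIN-1)

// strlen() counts bytes, which is what the database has to store.
$sizes = [
    'entity' => strlen($entity),
    'utf8'   => strlen($utf8),
    'latin1' => strlen($latin1),
];
print_r($sizes);

// Searching: a plain-text needle does not match text stored as entities.
$storedAsEntities = 'M&uuml;ller';
$storedRaw        = "M\xC3\xBCller";
$needle           = "M\xC3\xBCller";

$foundInEntities = strpos($storedAsEntities, $needle) !== false;
$foundInRaw      = strpos($storedRaw, $needle) !== false;
var_dump($foundInEntities, $foundInRaw);
```

The entity takes 6 bytes against 2 in UTF-8 and 1 in LATIN-1, and the search only succeeds against the raw text.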

The real fun starts when you mix both concepts. You end up in a place you really don't want to be. So just don't do it, because you ain't gonna need it.

Decoding htmlentities from database PHP

Just echo the $new_html.

$new_html = html_entity_decode($ohtml, ENT_QUOTES);
echo $new_html;

EDIT: Slow I am.. Trying to understand his question took so long others already posted comments with the solution...

EDIT2: Try this code, it's working for me:

$html = "<b>I hate entities</b>";

$ohtml = htmlentities($html);

$new_html = html_entity_decode($ohtml, ENT_QUOTES);
echo $new_html;

Converting back to HTML characters

http://us3.php.net/manual/en/function.htmlspecialchars-decode.php
This is the answer to your problems.
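For comparison, here is a small sketch of how htmlspecialchars_decode differs from html_entity_decode. The sample string is invented for illustration.

```php
<?php
$encoded = '&lt;b&gt;caf&eacute; &amp; bar&lt;/b&gt;';

// htmlspecialchars_decode() reverses only the special characters
// (&amp;, &lt;, &gt; and, with ENT_QUOTES, both kinds of quotes).
$specialOnly = htmlspecialchars_decode($encoded, ENT_QUOTES);

// html_entity_decode() also reverses named entities such as &eacute;.
$all = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $specialOnly, PHP_EOL; // the &eacute; entity is left as it is
echo $all, PHP_EOL;         // the entity becomes a UTF-8 é
```

Which one fits depends on which function encoded the text in the first place.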

I am not sure how you should store the HTML tags. Obviously you can't store them in their original form, and it is bloated to use them with special entities. I recommend instead linking to a file that can be accessed that holds the HTML in it.

OK, you can store them as such, but you should escape them for the SQL query too, for example with http://us3.php.net/manual/en/function.addslashes.php (prepared statements are the safer way to do this).

I don't think it's efficient to store a large block of HTML in the field; I guess it depends on the size.

Saving and Displaying HTML and special characters in a mysql database safely?

While encoding characters is a good thing, one must make sure not to over-encode.

Only encode what /needs/ to be encoded at that time. Don't encode the HTML before putting it into your database. You may want to print things out later, or you may want to run searches against it. Use the proper escape sequences for SQL (or, better yet, use PDO with prepared statements).

Only when you are sending things to the browser should you escape the HTML, and then you need to decide what kind of escaping you need. To convert characters like < and & into character entities (&lt; and &amp;) so that they display properly, use the escape function meant for that, such as htmlspecialchars.
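The whole flow might look roughly like the sketch below. It assumes an in-memory SQLite database and a made-up posts table purely to stay self-contained; the same pattern applies to a MySQL connection through PDO.

```php
<?php
// Assumption: in-memory SQLite and a hypothetical `posts` table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)');

$raw = '<b>Tom & Jerry\'s</b>';

// Store the raw text; the bound parameter takes care of SQL escaping.
$insert = $pdo->prepare('INSERT INTO posts (body) VALUES (:body)');
$insert->execute([':body' => $raw]);

// Read it back unchanged, ready for searching or non-HTML output.
$stored = $pdo->query('SELECT body FROM posts WHERE id = 1')->fetchColumn();

// Escape only at the moment the text is written into an HTML page.
$forHtml = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $forHtml, PHP_EOL;
```

The database holds exactly what the user typed, and the HTML escaping decision is made once, at output time.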

How to save HTML to database and retrieve it properly

The rule of thumb is the following:

  1. Store the RAW HTML in your database without any encoding or sanitizing. A SQL server doesn't care if you store a string containing XSS code.
  2. When displaying this output to your page make sure that it is sanitized.

So:

[HttpPost, ActionName("Create")]
[ValidateAntiForgeryToken]
public ActionResult Create(Post model)
{
    // store model.Data directly in your database without any cleaning or sanitizing
    return RedirectToAction("Index"); // hypothetical target action; adjust to your app
}

and then when displaying:

@Html.Raw(HtmlUtility.SanitizeHtml(Model.Data))

Notice how I used the Html.Raw helper here to ensure that you don't get double HTML encoded output. The HtmlUtility.SanitizeHtml function should already take care of sanitizing the value and return a safe string that you could display in your view and it will not be further encoded. If on the other hand you used @HtmlUtility.SanitizeHtml(Model.Data), then the @ razor function would HTML encode the result of the SanitizeHtml function which might not be what you are looking for.

How to retrieve original text after using htmlspecialchars() and htmlentities()

There are two problems.

First, you are double-encoding HTML characters by using both htmlentities and htmlspecialchars. Both of those functions do the same thing, but htmlspecialchars only does it with a subset of characters that have HTML character entity equivalents (the special ones.) So with your example, the ampersand would be encoded twice (since it is a special character), so what you would actually get would be:

$example = 'Welcome & This is a test paragraph';

$example = htmlentities($example);
var_dump($example); // 'Welcome &amp; This is a test paragraph'

$example = htmlspecialchars($example);
var_dump($example); // 'Welcome &amp;amp; This is a test paragraph'

Decide which one of those functions you need to use (probably htmlspecialchars will be sufficient) and use only one of them.
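If text has already been encoded twice, it has to be decoded twice to get the original back. The sketch below uses the same sample sentence and also shows the double_encode parameter of htmlspecialchars, which is one way to avoid re-encoding existing entities.

```php
<?php
$example = 'Welcome & This is a test paragraph';

$once  = htmlspecialchars($example, ENT_QUOTES, 'UTF-8'); // & becomes &amp;
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');    // &amp; becomes &amp;amp;

// Each encoding pass needs one decoding pass to undo it.
$decodedOnce  = htmlspecialchars_decode($twice, ENT_QUOTES);
$decodedTwice = htmlspecialchars_decode($decodedOnce, ENT_QUOTES);

// With double_encode set to false, existing entities are left alone.
$guarded = htmlspecialchars($once, ENT_QUOTES, 'UTF-8', false);

var_dump($decodedTwice === $example, $guarded === $once);
```

Relying on double_encode hides the underlying problem, though; encoding once, at output time, is the cleaner fix.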

Second, you are using these functions at the wrong time. htmlentities and htmlspecialchars will not do anything to "sanitize" your data for input into your database. (Not saying that's what you're intending, as you haven't mentioned this, but many people do seem to try to do this.) If you want to protect yourself from SQL injection, bind your values to prepared statements. Escaping it as you are currently doing with mysqli_real_escape_string is good, but it isn't really sufficient.

htmlspecialchars and htmlentities have specific purposes: to convert characters in strings that you are going to output into an HTML document. Just wait to use them until you are ready to do that.


