PHP Messing with HTML Charset Encoding

You have probably mixed encodings.
For example, a page that is sent as ISO-8859-1 but receives UTF-8 text from MySQL or XML will typically show garbled characters.

To solve this problem, you must keep track of the encoding of every input relative to the encoding you have chosen to use internally.

If you send the page as ISO-8859-1, the input from the user will also be ISO-8859-1.

header("Content-Type: text/html; charset=iso-8859-1");

And if MySQL sends latin1, you do not have to do anything.

But if your input is not ISO-8859-1, you must convert it before sending it to the user, or adapt it for MySQL before storing it.

$text = mb_convert_encoding($text, mb_internal_encoding(), 'UTF-8'); // Convert from UTF-8 to the internal encoding

In short, you must always convert input to the internal encoding and convert output to match the external encoding.
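As a rough illustration of that rule, here is a minimal sketch. The constant INTERNAL_ENCODING and the helpers to_internal() and to_external() are hypothetical names, not part of the code in this post; the sketch assumes the mbstring extension is loaded and that ISO-8859-1 is the internal encoding.

```php
<?php
// Hypothetical helpers: convert on the way in, convert on the way out.
const INTERNAL_ENCODING = 'ISO-8859-1'; // assumption: same choice as in this post

// Convert incoming text from its source encoding to the internal encoding.
function to_internal(string $text, string $from): string
{
    return mb_convert_encoding($text, INTERNAL_ENCODING, $from);
}

// Convert internal text to whatever encoding the receiver expects.
function to_external(string $text, string $to): string
{
    return mb_convert_encoding($text, $to, INTERNAL_ENCODING);
}

// Byte escapes keep the example independent of this file's own encoding.
$utf8Input = "caf\xC3\xA9";                     // "café" as UTF-8 (5 bytes)
$internal  = to_internal($utf8Input, 'UTF-8');  // "caf\xE9" in ISO-8859-1 (4 bytes)
$output    = to_external($internal, 'UTF-8');   // the original UTF-8 bytes again
```

Note that a round trip through ISO-8859-1 only preserves characters that exist in ISO-8859-1; anything else is lost in the first conversion.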


This is the internal encoding I have chosen to use.

mb_internal_encoding('iso-8859-1'); // Internal encoding

This is the code I use.

mb_language('uni'); // Mail encoding
mb_internal_encoding('iso-8859-1'); // Internal encoding
mb_http_output('pass'); // Pass HTTP output through unconverted

// Convert $text from $from_code (auto-detected if empty)
// to $to_code (the internal encoding if empty).
function convert_encoding($text, $from_code='', $to_code='')
{
    if (empty($from_code))
    {
        $from_code = mb_detect_encoding($text, 'auto');
        // Treat pure ASCII, or a failed detection (false), as iso-8859-1
        if ($from_code === false || $from_code == 'ASCII')
        {
            $from_code = 'iso-8859-1';
        }
    }

    if (empty($to_code))
    {
        return mb_convert_encoding($text, mb_internal_encoding(), $from_code);
    }
    return mb_convert_encoding($text, $to_code, $from_code);
}

// Escape HTML entities. If $code is given, $text is treated as being in
// that encoding and the result is converted to the internal encoding.
function encoding_html($text, $code='')
{
    if (empty($code))
    {
        return htmlentities($text, ENT_NOQUOTES, mb_internal_encoding());
    }

    return mb_convert_encoding(htmlentities($text, ENT_NOQUOTES, $code), mb_internal_encoding(), $code);
}

// Decode HTML entities, with the same handling of $code as encoding_html().
function decoding_html($text, $code='')
{
    if (empty($code))
    {
        return html_entity_decode($text, ENT_NOQUOTES, mb_internal_encoding());
    }

    return mb_convert_encoding(html_entity_decode($text, ENT_NOQUOTES, $code), mb_internal_encoding(), $code);
}

HTML Website Character Encoding Mess

I found a similar post with these steps:

  1. Make sure the database charset/collation is UTF-8.
  2. On the page where you enter these Russian characters (the form, textarea), make sure the encoding is UTF-8 by setting Content-Type to text/html; charset=utf-8. Enter the Russian text directly into the form input.
  3. On the processing page that handles this form and inserts the data into the database, run SET NAMES utf8 in a separate query before the insert, so the data is stored as UTF-8.
  4. When you render the content from the database in a view, make sure the Content-Type is text/html; charset=utf-8.

Make sure that the Content-Type is not windows-1251 or iso-8859-1/latin1. Make sure the database charset/collation is NOT ISO-8859-1/Latin1.
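One way to pin the connection charset from PHP, instead of issuing SET NAMES by hand, is to put it in the PDO DSN. This is only a sketch: mysql_utf8_dsn() is a hypothetical helper, the host, database, and credentials are placeholders, and the utf8mb4 default is my assumption (MySQL's utf8 is limited to 3-byte characters). Pass 'utf8' as the third argument to match the steps above exactly.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that fixes the connection charset.
function mysql_utf8_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Usage (placeholder host, database, and credentials):
// $pdo = new PDO(mysql_utf8_dsn('localhost', 'example_db'), 'user', 'secret');
//
// With mysqli, $mysqli->set_charset('utf8mb4') serves the same purpose.
```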

Utf8 in html correct and php html output messed up

To close this question myself (because I feel rather stupid right now), the one who actually solved this is Marc B as his comments made me understand the process of text encoding.

After setting the header (Content-Type and charset) as well as the meta tag in HTML, I discovered, just as Marc suspected, that my IDE had saved the PHP file in an encoding other than UTF-8. Saving the file as UTF-8 and replacing the garbled special characters fixed my issue.

Please excuse this, I wasn't fully aware of what I was doing.
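If you suspect the same cause, a quick check of the file's bytes can help. The sketch below assumes the mbstring extension; is_valid_utf8() is a hypothetical helper and page.php is a placeholder path.

```php
<?php
// Hypothetical helper: reports whether a byte string is well-formed UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

// Usage on a source file (placeholder path):
// var_dump(is_valid_utf8(file_get_contents('page.php')));
```

A file that contains only ASCII also passes this check, so a true result does not prove the editor is set to UTF-8. A false result, however, means the file was saved in some other encoding.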

Using PHP in HTML pages without messing up Character Encoding

I think the problem now may be the editor you are using.

I created a file with plain Windows Notepad, and it did not show the characters correctly.

However, when I pasted my code into Notepad++ and saved it as "Encoding/Encode in UTF-8 without BOM" (byte order mark), it displayed correctly.

Visit notepad-plus-plus.org to download it. It supports several encodings.

Content-Type charset won't change

It appeared I had a Chrome encoding extension installed, which was set to Windows-1251: https://chrome.google.com/webstore/detail/set-character-encoding/bpojelgakakmcfmjfilgdlmhefphglae

Charset=utf8 not working in my PHP page

Sounds like you don't serve your content as UTF-8. Do this by setting the correct header:

header('Content-type: text/html; charset=utf-8');

In addition, to be really sure the browser understands, add a meta tag:

<meta http-equiv="Content-type" content="text/html; charset=utf-8" />

Note that, depending on where the text comes from, you might have to check some other things too (database connection, source file encoding, ...). I've listed a lot of them in one of my answers to a similar question.
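Putting the header and the meta tag together might look like the sketch below. render_utf8_page() is a hypothetical helper, not something from the answer; it escapes its arguments as UTF-8 and returns a minimal page containing the same meta tag.

```php
<?php
// Hypothetical helper: returns a minimal HTML page that declares UTF-8.
function render_utf8_page(string $title, string $body): string
{
    // Escape as UTF-8 so multi-byte characters pass through untouched.
    $t = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $b = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n"
        . "<title>{$t}</title>\n</head>\n"
        . "<body><p>{$b}</p></body>\n</html>\n";
}

// Usage: the header has to be sent before any output.
// header('Content-type: text/html; charset=utf-8');
// echo render_utf8_page('Test', "caf\xC3\xA9");
```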

Fix incorrectly displayed encoding on an html document with php

  1. You need to save the page with UTF-8 without BOM encoding.
  2. Add this header on top of your script:

    header("Content-Type: text/html; charset=UTF-8");

[EDIT]: How to Save Files as UTF-8 without BOM:

At the OP's request, here is how to do it on Windows:

  1. Download Notepad++. It is an awesome text editor that you should be using.
  2. Install it.
  3. Open the PHP script that contains this code in Notepad++. The page where you are doing all the coding. Yes, that file on your computer.
  4. In Notepad++, from the Encoding menu at the top, select "Convert to UTF-8 without BOM".
  5. Save the file.
  6. Upload to your webserver by FTP or whatever you use.
  7. Now, run that script.
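To check from PHP whether a file still starts with a BOM, something like the following could be used. This is a sketch only: UTF8_BOM, has_utf8_bom(), and strip_utf8_bom() are hypothetical names, and page.php is a placeholder path. The UTF-8 BOM is the three bytes EF BB BF.

```php
<?php
// The UTF-8 byte order mark as raw bytes.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the data begins with a UTF-8 BOM.
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: returns the data without a leading UTF-8 BOM.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Usage (placeholder path):
// var_dump(has_utf8_bom(file_get_contents('page.php')));
```

A BOM at the start of a PHP file is sent as output before the script runs, which is why it can interfere with header() calls.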

