PHP: Convert Any String to UTF-8 Without Knowing the Original Character Set, or At Least Try

PHP: Convert any string to UTF-8 without knowing the original character set, or at least try

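What you're asking for is extremely hard, because many byte sequences are valid in several encodings at once. If possible, have the user (or the data source) specify the encoding; that is the most reliable option. Preventing an attack shouldn't be much easier or harder that way.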

However, you could try doing this:

$encoding = mb_detect_encoding($text, mb_detect_order(), true); // returns false if nothing matches
$utf8 = $encoding === false ? false : iconv($encoding, "UTF-8", $text);

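Passing true as the third argument enables strict mode, which might help you get a better result: mb_detect_encoding() then returns false instead of the closest match when the string is not valid in any of the candidate encodings.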
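As a rough sketch of the approach above, the detection and conversion can be wrapped in one function that handles the failure case. The function name `to_utf8_best_effort` and the default candidate list are assumptions made for illustration, and `mb_convert_encoding()` is swapped in for `iconv()` so that the encoding name returned by mbstring is consumed by mbstring as well.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): best-effort conversion to UTF-8.
 * Returns the converted string, or false when no candidate encoding fits.
 * The candidate list is an assumption; adjust it to the data you expect.
 */
function to_utf8_best_effort(string $text, array $candidates = ['UTF-8', 'ISO-8859-1'])
{
    // Already valid UTF-8 (this includes plain ASCII): nothing to convert.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Strict mode: get false back instead of a "closest match" guess.
    $encoding = mb_detect_encoding($text, $candidates, true);
    if ($encoding === false) {
        return false;
    }

    return mb_convert_encoding($text, 'UTF-8', $encoding);
}

// "caf\xE9" is "café" in ISO-8859-1; the result holds the UTF-8 bytes "caf\xC3\xA9".
$utf8 = to_utf8_best_effort("caf\xE9");
```

The order of the candidates matters: ISO-8859-1 assigns a character to essentially every byte, so it acts as a catch-all and belongs at the end of the list. This is still a guess, not a guarantee.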

How to convert any character encoding to UTF8 on PHP

Rather than blindly trying to detect the encoding, you should first check if the page that you downloaded has a listed character set. The character set may be set in the HTTP response header, for example:

Content-Type: text/html; charset=utf-8

Or in the HTML as a meta tag, for example:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> 

Only if neither is available should you try to guess the encoding with mb_detect_encoding() or other methods.
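The lookup order described above could be sketched roughly as follows. The function name `declared_charset` and the regular expressions are illustrative assumptions; a real page may declare its charset in ways this simple pattern does not catch.

```php
<?php
/**
 * Hypothetical helper: look for a declared charset, first in a Content-Type
 * header value, then in the HTML itself. Returns null when none is declared.
 */
function declared_charset(?string $contentTypeHeader, string $html): ?string
{
    // Header value such as "text/html; charset=utf-8".
    if ($contentTypeHeader !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentTypeHeader, $m)) {
        return strtoupper($m[1]);
    }

    // Matches both <meta charset="..."> and the http-equiv form.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }

    return null;
}

// Usage: prefer the declared charset, and only guess when nothing is declared.
$html = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />';
$charset = declared_charset(null, $html)
    ?? mb_detect_encoding($html, ['UTF-8', 'ISO-8859-1'], true);
```

A declared charset can still be wrong or unsupported, so it is worth validating the value before passing it to a conversion function.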

Set the character set and convert to UTF-8 without BOM

PHP does not have any concept of character encodings; strings are binary data. The trick that makes everything seem to work is setting the output device, whether it's a web page or a terminal, to the correct character encoding.

If you are generating a web page, you can send the content-type header to tell the browser how the page is encoded.

header("Content-type: text/html;charset=utf-8");
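For the "without BOM" part of the question, one possible approach is to drop the three-byte UTF-8 byte order mark (EF BB BF) from the start of the string before sending it. The helper name `strip_utf8_bom` is an assumption for illustration, and the sketch assumes the text is already UTF-8.

```php
<?php
/**
 * Hypothetical helper: remove a leading UTF-8 byte order mark, if present.
 * Strings are treated as raw bytes, so the BOM is compared byte by byte.
 */
function strip_utf8_bom(string $text): string
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        // The cast guards against older PHP versions returning false
        // when the string consists of the BOM alone.
        return (string) substr($text, 3);
    }
    return $text;
}

$clean = strip_utf8_bom("\xEF\xBB\xBFhello");
```

A BOM in the PHP source file itself is output before any code runs, which can prevent header() from working, so it also helps to save the script as UTF-8 without BOM in the editor.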

