Convert UTF-8 Characters to ISO-8859-1 and Back in PHP

Have a look at iconv() or mb_convert_encoding().
By the way, why don't utf8_encode() and utf8_decode() work for you?

utf8_decode: Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1

utf8_encode: Encodes an ISO-8859-1 string to UTF-8

So essentially

$utf8 = 'ÄÖÜ'; // file must be UTF-8 encoded
$iso88591_1 = utf8_decode($utf8);
$iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $utf8);
$iso88591_3 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

$iso88591 = 'ÄÖÜ'; // file must be ISO-8859-1 encoded
$utf8_1 = utf8_encode($iso88591);
$utf8_2 = iconv('ISO-8859-1', 'UTF-8', $iso88591);
$utf8_3 = mb_convert_encoding($iso88591, 'UTF-8', 'ISO-8859-1');

should all do the same thing: utf8_encode() and utf8_decode() require no special extension (note that both are deprecated as of PHP 8.2), mb_convert_encoding() requires ext/mbstring, and iconv() requires ext/iconv.
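As a quick sanity check, the round trip can be verified on raw bytes. The sketch below assumes ext/mbstring and ext/iconv are loaded, and it writes "ÄÖÜ" as explicit byte escapes so the result does not depend on how the script file is saved. The variable names are only illustrative.

```php
<?php
// "ÄÖÜ" as explicit UTF-8 bytes, independent of the source file's encoding.
$utf8 = "\xC3\x84\xC3\x96\xC3\x9C";

// UTF-8 -> ISO-8859-1 with both extensions.
$isoMb    = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$isoIconv = iconv('UTF-8', 'ISO-8859-1', $utf8);

// ISO-8859-1 -> UTF-8 again.
$back = mb_convert_encoding($isoMb, 'UTF-8', 'ISO-8859-1');

echo bin2hex($isoMb), "\n";      // c4d6dc
var_dump($isoMb === $isoIconv);  // bool(true)
var_dump($back === $utf8);       // bool(true)
```

Comparing bin2hex() output is usually more reliable than echoing the strings, because the terminal or browser applies its own encoding when displaying them.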

How to convert a string with “ (ISO-8859-1) characters to normal (UTF-8) characters?


$final = '<li>Jain R.K. and Iyengar S.R.K., “Advanced Engineering Mathematicsâ€, Narosa Publications,</li>';

$final = str_replace("Â", "", $final);
$final = str_replace("’", "'", $final);
$final = str_replace("“", '"', $final);
$final = str_replace('–', '-', $final);
$final = str_replace('â€', '"', $final);

For existing data, I replaced the weird characters with their proper equivalents, as shown above.

For future data, I set the charset to UTF-8 in PHP, in the HTML, and on the database connection.
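If the stray characters come from UTF-8 text that was once read as Windows-1252 and then saved as UTF-8 again, the damage can sometimes be reversed in one step instead of listing every replacement. This is only a sketch under that assumption and requires ext/mbstring; `$broken` is a made-up sample. Sequences containing a byte that has no Windows-1252 character (such as the closing quote that shows up as 'â€') cannot be recovered this way and still need str_replace().

```php
<?php
// "â€œ" stored as UTF-8: a left curly quote after being double-encoded.
$broken = "\xC3\xA2\xE2\x82\xAC\xC5\x93";

// Map each character back to its single Windows-1252 byte; together
// those bytes are the original UTF-8 sequence.
$fixed = mb_convert_encoding($broken, 'Windows-1252', 'UTF-8');

echo bin2hex($fixed), "\n"; // e2809c, the UTF-8 bytes of the left curly quote
```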

PHP UTF8 - ISO-8859-1 encoding

This is a common format in mail headers, known as "quoted-printable". All non-ASCII characters are encoded. (See http://en.wikipedia.org/wiki/Quoted-printable)

The string is encapsulated by

=?<encoding>?Q?<string>?=

<encoding> names the character set, here ISO-8859-1.

<string> is the encoded string itself.

Use imap_mime_header_decode() to decode the string (before using utf8_encode())!
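Where ext/imap is not installed, iconv_mime_decode() from ext/iconv can decode such a header and convert it to UTF-8 in the same call. A minimal sketch; the header value is an invented example, not taken from the question.

```php
<?php
// Hypothetical header value: "Jörg" in ISO-8859-1, Q-encoded.
$header = '=?ISO-8859-1?Q?J=F6rg?=';

// The third argument is the charset of the returned string.
$decoded = iconv_mime_decode($header, 0, 'UTF-8');

echo bin2hex($decoded), "\n"; // 4ac3b67267 ("Jörg" in UTF-8)
```

Because the target charset is passed directly, no separate utf8_encode() step is needed with this variant.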

ISO 8859 1 octal back to normal characters

Simple workaround:

The first string contains only octal ISO-8859-1 escapes, while the second contains double-backslashed ISO-8859-1 escapes mixed with UTF-16 escapes (why is the real question). The code below takes the octal codes, converts them to hex, packs them into binary, and encodes the result as UTF-8. The UTF-16 codes are already in hex, so they are only packed and encoded as UTF-8.

For future reference on charsets: http://www.fileformat.info/info/charset/index.htm

<?php
// In double quotes PHP itself turns \341 into the raw ISO-8859-1 byte 0xE1.
$string = "Tak hur\341 v posteli po pr\341ci a jde se sp\355nkat";
// Here the escapes stay literal text: \355 (octal) and \u0161 (UTF-16 hex).
$string2 = "Som nen\\355 ja len chodiaca kapuc\\341 pra\\u0161iva ignorujuca";

print decode_str($string2)."<br>";
print decode_str($string);

function decode_str($string){
    return utf16_to_utf8(iso_to_utf8($string));
}

function iso_to_utf8($string){
    // Literal \NNN octal escape: octal -> two hex digits -> raw ISO-8859-1 byte.
    $string = preg_replace_callback('#\\\\([0-3][0-7]{2})#', function ($m) {
        return pack("H*", sprintf("%02x", octdec($m[1])));
    }, $string);
    return mb_convert_encoding($string, "UTF-8", "ISO-8859-1");
}

function utf16_to_utf8($string){
    // Literal \uXXXX escape: hex -> two raw big-endian bytes -> UTF-8.
    return preg_replace_callback('#\\\\u([0-9a-fA-F]{4})#', function ($m) {
        return mb_convert_encoding(pack("H*", $m[1]), "UTF-8", "UTF-16BE");
    }, $string);
}

?>

PHP: recover a broken non-English string (ISO-8859-1) as UTF-8

Answering my own question.

The broken characters are in ISO-8859-1, but not exactly.
They have to be converted to bytes and then converted again to ksc5601.
For that I just use a mapping table, because ksc5601 follows no algorithmic rule. It uses its own mapping table.

https://github.com/jihuichoi/correct-broken-korean-iso8859-1-to-utf8
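The two-step idea (back to bytes, then reinterpret as Korean) can be sketched with ext/mbstring, assuming the bytes are plain EUC-KR, the usual byte encoding of the KS C 5601 character set. This is not the approach of the linked repository, which uses its own mapping table, and it may not cover the cases that made a table necessary. The sample string is invented.

```php
<?php
// "한글" in EUC-KR is C7 D1 B1 DB. Read as ISO-8859-1 and saved as UTF-8
// it becomes "ÇÑ±Û", written here as explicit bytes.
$broken = "\xC3\x87\xC3\x91\xC2\xB1\xC3\x9B";

// Step 1: undo the wrong ISO-8859-1 interpretation to get the raw bytes back.
$bytes = mb_convert_encoding($broken, 'ISO-8859-1', 'UTF-8');

// Step 2: interpret those bytes as EUC-KR and convert to UTF-8.
$fixed = mb_convert_encoding($bytes, 'UTF-8', 'EUC-KR');

echo bin2hex($bytes), "\n"; // c7d1b1db
echo $fixed, "\n";
```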

Converting ISO-8859-1 charcodes to UTF-8

It's rather trivial: convert the hex string to binary, then convert the ISO-8859-1 binary to UTF-8 binary:

$input = '4BFC434845000000';
echo iconv('ISO-8859-1', 'UTF-8', hex2bin($input));

Optionally strip out the NUL bytes at some point.
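One way to strip the padding, as a small sketch: trim the trailing NUL bytes before converting. It assumes the NULs only appear as padding at the end of the value.

```php
<?php
$input = '4BFC434845000000';

// hex2bin() yields the raw ISO-8859-1 bytes; rtrim() drops the trailing NUL padding.
$bytes = rtrim(hex2bin($input), "\0");
$utf8  = iconv('ISO-8859-1', 'UTF-8', $bytes);

echo bin2hex($utf8), "\n"; // 4bc3bc434845 ("KüCHE" in UTF-8)
```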

Converting string from ISO-8859-1 to UTF8

I tried it in interactive mode. Your text seems to be encoded in UTF-8 already:

$ php -a
Interactive mode enabled

php > $text = "<p>Ayurveda ist die älteste Lebens- und Gesundheitslehre der Welt. Sie ist in einer Hochkultur auf dem Gebiet des heutigen Indien entstanden und ihre Prinzipien sind universell gültig.
php " </p>";
php > echo utf8_encode(html_entity_decode($text));
<p>Ayurveda ist die älteste Lebens- und Gesundheitslehre der Welt. Sie ist in einer Hochkultur auf dem Gebiet des heutigen Indien entstanden und ihre Prinzipien sind universell gültig.
</p>
php > echo utf8_decode(html_entity_decode($text));
<p>Ayurveda ist die älteste Lebens- und Gesundheitslehre der Welt. Sie ist in einer Hochkultur auf dem Gebiet des heutigen Indien entstanden und ihre Prinzipien sind universell gültig.
</p>
php >

You can try the same in your environment. If the problem persists when you load your page, you can try iconv() to fix it.
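When it is unclear whether a string is already UTF-8, a check before converting helps avoid double encoding. The helper below is a hypothetical sketch (the name `to_utf8` is not from the answer) that assumes ext/mbstring and treats anything that is not valid UTF-8 as ISO-8859-1.

```php
<?php
// Hypothetical helper: convert only when the input is not valid UTF-8 already.
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(to_utf8("\xC3\xA4")), "\n"; // c3a4 ("ä" was already UTF-8)
echo bin2hex(to_utf8("\xE4")), "\n";     // c3a4 ("ä" converted from ISO-8859-1)
```

The check is a heuristic: a short ISO-8859-1 string can happen to be valid UTF-8, so it is best used when the source is known to be one of the two encodings.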

Translating ISO-8859-1 to UTF-8 problem

I found the solution here: PHP: Problems converting "’" character from ISO-8859-1 to UTF-8

The server claims it's serving up ISO-8859-1, but it's really Windows-1252, which converts to UTF-8 without a problem.
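The difference is easy to see on the byte 0x92, which Windows-1252 uses for the right single quote. A minimal sketch, assuming ext/mbstring is available:

```php
<?php
// 0x92 is "’" in Windows-1252 but a control character in ISO-8859-1.
$byte = "\x92";

$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // c292   (U+0092, an invisible control character)
echo bin2hex($asCp1252), "\n"; // e28099 (U+2019, right single quote)
```

So when the source is really Windows-1252, naming it as the source encoding gives the expected character.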


