Convert UTF-8 characters to ISO-8859-1 and back in PHP
Have a look at iconv() or mb_convert_encoding().
Just by the way: why don't utf8_encode() and utf8_decode() work for you?
utf8_decode: Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1
utf8_encode: Encodes an ISO-8859-1 string to UTF-8
So essentially
$utf8 = 'ÄÖÜ'; // file must be UTF-8 encoded
$iso88591_1 = utf8_decode($utf8);
$iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $utf8);
$iso88591_3 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$iso88591 = 'ÄÖÜ'; // file must be ISO-8859-1 encoded
$utf8_1 = utf8_encode($iso88591);
$utf8_2 = iconv('ISO-8859-1', 'UTF-8', $iso88591);
$utf8_3 = mb_convert_encoding($iso88591, 'UTF-8', 'ISO-8859-1');
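All of these should do the same thing: utf8_encode() and utf8_decode() require no special extension, mb_convert_encoding() requires ext/mbstring, and iconv() requires ext/iconv. Note that utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, so prefer one of the other two in new code.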
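A minimal sketch of the round trip, written with byte escapes so the result does not depend on how the source file itself is saved. The helper name convert_charset() is hypothetical; it assumes that at least one of ext/mbstring or ext/iconv is loaded.

```php
<?php
// Hypothetical helper: convert between charsets with whichever extension
// is available (assumes ext/mbstring or ext/iconv is loaded).
function convert_charset(string $text, string $to, string $from): string
{
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($text, $to, $from);
    }
    if (function_exists('iconv')) {
        $out = iconv($from, $to, $text);
        if ($out === false) {
            throw new RuntimeException("iconv could not convert $from to $to");
        }
        return $out;
    }
    throw new RuntimeException('Neither ext/mbstring nor ext/iconv is available');
}

$iso  = "\xC4\xD6\xDC";                                  // "ÄÖÜ" as ISO-8859-1 bytes
$utf8 = convert_charset($iso, 'UTF-8', 'ISO-8859-1');    // "ÄÖÜ" as UTF-8 bytes
$back = convert_charset($utf8, 'ISO-8859-1', 'UTF-8');   // the original bytes again

echo bin2hex($utf8), "\n"; // c384c396c39c
echo bin2hex($back), "\n"; // c4d6dc
```

Comparing bin2hex() output rather than echoing the strings sidesteps any confusion caused by the terminal's own encoding.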
How to convert String with “ (ISO-8859-1) characters to normal (UTF-8) characters?
$final = '<li>Jain R.K. and Iyengar S.R.K., “Advanced Engineering Mathematicsâ€, Narosa Publications,</li>';
$final = str_replace("Â", "", $final);
$final = str_replace("’", "'", $final);
$final = str_replace("“", '"', $final);
$final = str_replace('–', '-', $final);
$final = str_replace('â€', '"', $final);
For existing data, I replaced the garbled sequences with the proper characters, as shown above.
For future data, I set the charset to UTF-8 in PHP, in the HTML, and on the database connections.
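For existing data, an alternative to replacing sequences one by one is to reverse the mis-decoding itself. This is a different technique from the str_replace() calls above, sketched under the assumption that the text was valid UTF-8 that got read as Windows-1252 and then re-encoded as UTF-8 exactly once. The helper name undo_double_encoding() is hypothetical. Sequences that lost a byte along the way (such as the truncated â€ in the example string) cannot be recovered this way and are left untouched.

```php
<?php
// Hypothetical helper: undo one round of "UTF-8 bytes read as Windows-1252".
function undo_double_encoding(string $text): string
{
    // Map every character back to the single Windows-1252 byte it came from.
    $bytes = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');

    // Accept the result only if it is valid UTF-8 and garbling it again
    // reproduces the input exactly; otherwise return the input unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')
        && mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') === $text) {
        return $bytes;
    }
    return $text;
}

// Build a garbled sample by misreading correct UTF-8 as Windows-1252.
$good   = "\xE2\x80\x9CJain\xE2\x80\x99s book";                  // “Jain’s book
$broken = mb_convert_encoding($good, 'UTF-8', 'Windows-1252');   // â€œJainâ€™s book
$fixed  = undo_double_encoding($broken);

var_dump($fixed === $good); // bool(true)
```

The second check is a deliberate safeguard: text that was never garbled, or that contains characters outside Windows-1252, fails the comparison and is returned as-is.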
PHP UTF8 - ISO-8859-1 encoding
This is a common format in mail headers: a MIME "encoded-word" (RFC 2047) using the Q encoding, which is closely modeled on "quoted-printable". All non-ASCII characters are encoded. (See http://en.wikipedia.org/wiki/Quoted-printable)
The string is wrapped as
=?<encoding>?Q?<string>?=
where <encoding> names the character set (here: ISO-8859-1) and <string> is the encoded text itself.
Please use imap_mime_header_decode() to decode the string (before using utf8_encode())!
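A minimal sketch of that decoding step. Since ext/imap is not always installed, this example swaps in iconv_mime_decode() from ext/iconv, which decodes the same header syntax and converts to a target charset in one call. The sample header value is made up for illustration.

```php
<?php
// Made-up sample: "André" in ISO-8859-1, Q-encoded as a MIME header word.
$header = '=?ISO-8859-1?Q?Andr=E9?=';

// iconv_mime_decode() decodes the header and converts the result to the
// charset named in the third argument (requires ext/iconv).
$decoded = iconv_mime_decode($header, 0, 'UTF-8');

echo bin2hex($decoded), "\n"; // 416e6472c3a9 ("André" as UTF-8 bytes)
```

Because the conversion to UTF-8 happens inside the call, no separate utf8_encode() step is needed with this variant.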
ISO 8859 1 octal back to normal characters
Simple workaround:
The first string contains only octal ISO-8859-1 escapes, while the second contains double-backslashed ISO-8859-1 escapes mixed with UTF-16 escapes (why is an open question). The code below takes the octal codes, converts them to hex, packs them into binary, and encodes the result as UTF-8. The UTF-16 codes are already in hex, so they are only packed and encoded as UTF-8.
For future reference on charsets: http://www.fileformat.info/info/charset/index.htm
<?php
// Double-quoted literal: PHP itself turns \341 and \355 into raw ISO-8859-1 bytes.
$string = "Tak hur\341 v posteli po pr\341ci a jde se sp\355nkat";
// Here the backslashes are literal, so the \NNN and \uXXXX escapes are still plain text.
$string2 = "Som nen\\355 ja len chodiaca kapuc\\341 pra\\u0161iva ignorujuca";

print decode_str($string2)."<br>";
print decode_str($string);

function decode_str($string){
    return utf16_to_utf8(iso_to_utf8($string));
}

// Replace each literal \NNN octal escape with the byte it names,
// then convert the whole string from ISO-8859-1 to UTF-8.
function iso_to_utf8($string){
    $string = preg_replace_callback('#\\\\([0-3][0-7]{2})#', function ($m) {
        // octal -> two-digit hex -> binary byte
        return pack("H*", sprintf('%02x', octdec($m[1])));
    }, $string);
    return mb_convert_encoding($string, "UTF-8", "ISO-8859-1");
}

// Replace each literal \uXXXX escape with the UTF-8 form of that UTF-16 code unit.
function utf16_to_utf8($string){
    return preg_replace_callback('#\\\\u([0-9a-fA-F]{4})#', function ($m) {
        return mb_convert_encoding(pack("H*", $m[1]), "UTF-8", "UTF-16BE");
    }, $string);
}
?>
PHP recovery broken non-english string(iso 8859-1) as utf-8
Answering my own question:
The broken characters look like ISO-8859-1, but not exactly.
They need to be converted back to bytes and then decoded again as KS C 5601.
For that I simply use a mapping table, because KS C 5601 does not follow any rule; it uses its own mapping table.
https://github.com/jihuichoi/correct-broken-korean-iso8859-1-to-utf8
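The two steps can be sketched with mbstring alone for the clean case. This is not the linked repository's approach: it assumes the original bytes were EUC-KR (the usual encoding of the KS C 5601 character set) and were misread as pure ISO-8859-1. Data that is "not exactly" ISO-8859-1, as described above, would still need a custom mapping table.

```php
<?php
// Build a broken sample: Korean text stored as EUC-KR bytes, misread as ISO-8859-1.
$original = "\u{D55C}\u{AE00}";                                   // two Hangul syllables, UTF-8
$eucKr    = mb_convert_encoding($original, 'EUC-KR', 'UTF-8');    // what was really stored
$broken   = mb_convert_encoding($eucKr, 'UTF-8', 'ISO-8859-1');   // what the reader saw

// Step 1: turn the misread characters back into the raw bytes.
$bytes = mb_convert_encoding($broken, 'ISO-8859-1', 'UTF-8');
// Step 2: decode those bytes with the encoding they were really in.
$fixed = mb_convert_encoding($bytes, 'UTF-8', 'EUC-KR');

var_dump($fixed === $original); // bool(true)
```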
Converting ISO-8859-1 charcodes to UTF-8
It's rather trivial: convert the hex string to binary, then convert the ISO-8859-1 bytes to UTF-8:
$input = '4BFC434845000000';
echo iconv('ISO-8859-1', 'UTF-8', hex2bin($input));
Optionally strip out the NUL bytes at some point.
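One way to strip them, as a small sketch: it assumes the NUL bytes are trailing padding only, so rtrim() is enough. NULs in the middle of the data would need str_replace() instead.

```php
<?php
$input = '4BFC434845000000';

// hex2bin() yields the raw ISO-8859-1 bytes; rtrim() drops the trailing NUL padding.
$bytes = rtrim(hex2bin($input), "\0");
$utf8  = iconv('ISO-8859-1', 'UTF-8', $bytes);

echo $utf8, "\n";          // KüCHE
echo bin2hex($utf8), "\n"; // 4bc3bc434845
```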
Converting string from ISO-8859-1 to UTF8
I tried it in interactive mode. Your text seems to be encoded in UTF-8 already:
$ php -a
Interactive mode enabled
php > $text = "<p>Ayurveda ist die älteste Lebens- und Gesundheitslehre der Welt. Sie ist in einer Hochkultur auf dem Gebiet des heutigen Indien entstanden und ihre Prinzipien sind universell gültig.
php " </p>";
php > echo utf8_encode(html_entity_decode($text));
<p>Ayurveda ist die älteste Lebens- und Gesundheitslehre der Welt. Sie ist in einer Hochkultur auf dem Gebiet des heutigen Indien entstanden und ihre Prinzipien sind universell gültig.
</p>
php > echo utf8_decode(html_entity_decode($text));
<p>Ayurveda ist die älteste Lebens- und Gesundheitslehre der Welt. Sie ist in einer Hochkultur auf dem Gebiet des heutigen Indien entstanden und ihre Prinzipien sind universell gültig.
</p>
php >
You can try the same commands in your own environment. If the problem persists when you load your page, you can try iconv() to fix it.
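If the text may already be UTF-8, it helps to check before converting, since encoding UTF-8 a second time garbles it. A minimal sketch with a hypothetical helper ensure_utf8(), assuming that anything which is not valid UTF-8 is ISO-8859-1:

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not valid UTF-8 yet.
// Assumption: any input that fails the check is ISO-8859-1.
function ensure_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8; converting again would double-encode it
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(ensure_utf8("\xC3\xA4")), "\n"; // c3a4 (already UTF-8 "ä", unchanged)
echo bin2hex(ensure_utf8("\xE4")), "\n";     // c3a4 (ISO-8859-1 "ä", converted)
```

The check is a heuristic: a few ISO-8859-1 byte sequences happen to be valid UTF-8 as well, so knowing the real source encoding is always better than guessing.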
Translating ISO-8859-1 to UTF-8 problem
I found the solution here: PHP: Problems converting "’" character from ISO-8859-1 to UTF-8
The server claims it's serving up ISO-8859-1, but it's really Windows-1252, which converts to UTF-8 without a problem.
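The difference shows up in the byte range 0x80 to 0x9F, where Windows-1252 has printable characters and ISO-8859-1 has control characters. A short sketch using byte 0x92, the curly apostrophe from the linked question, and assuming ext/mbstring:

```php
<?php
$byte = "\x92"; // right single quotation mark in Windows-1252

// Read as ISO-8859-1, 0x92 is the control character U+0092.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
// Read as Windows-1252, it is U+2019, the curly apostrophe.
$as1252   = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // c292
echo bin2hex($as1252), "\n";   // e28099
```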