PHP Curl UTF-8 Charset
Simple:
cURL returns the response bytes exactly as the server sent them, which in this case is UTF-8.
If the rest of your code expects ISO-8859-1, you just need to decode them.
Description
string utf8_decode ( string $data )
This function decodes data, assumed to be UTF-8 encoded, to ISO-8859-1.
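As a minimal sketch of that decoding step: utf8_decode() is deprecated as of PHP 8.2, so the example below uses mb_convert_encoding() for the same UTF-8 to ISO-8859-1 conversion. The helper name utf8_to_latin1 is made up for illustration, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: converts a UTF-8 string to ISO-8859-1.
// Assumes the mbstring extension is available.
function utf8_to_latin1(string $utf8): string
{
    // Argument order is (string, to_encoding, from_encoding).
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// "é" is two bytes in UTF-8 (C3 A9) and one byte in ISO-8859-1 (E9).
$latin1 = utf8_to_latin1("\xC3\xA9");
echo bin2hex($latin1), PHP_EOL; // e9
```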
HTML and PHP cURL response utf-8 encoding problem
Both pages are UTF-8 encoded, and cURL returns them as is. The problem lies in the subsequent processing: assuming that libxml2 is involved, it tries to guess the encoding from <meta> elements, but if there are none, it assumes ISO-8859-1. It can be forced to assume UTF-8 if a UTF-8 BOM ("\xEF\xBB\xBF") is prepended to the HTML.
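The BOM trick described above could be sketched roughly as follows. Both function names are hypothetical, and the second one assumes that the DOM extension (backed by libxml2) is what parses the HTML.

```php
<?php
// Hypothetical helper: prepend a UTF-8 BOM unless one is already present.
function with_utf8_bom(string $html): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($html, $bom, 3) === 0 ? $html : $bom . $html;
}

// Hypothetical helper: parse UTF-8 HTML that may lack a <meta> charset.
// Assumes the DOM extension (libxml2) is available.
function load_html_with_bom(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // Silence warnings about imperfect real-world markup.
    @$doc->loadHTML(with_utf8_bom($html));
    return $doc;
}
```

Calling load_html_with_bom($html) on the string returned by cURL should then give a DOMDocument whose text nodes are read as UTF-8, even when the page declares no charset.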
Getting correct encoding from php cURL
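Chrome's default encoding is UTF-8, and if you set cURL to UTF-8 as well with
curl_setopt($ch, CURLOPT_ENCODING, 'UTF-8');
your text should come out as expected. Be aware, though, that CURLOPT_ENCODING fills the Accept-Encoding request header, which negotiates compression (gzip, deflate) and not the character set, so this value does not convert anything by itself.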
Also, detecting the encoding is painful, since mb_detect_encoding can run into many issues,
but in this case it can help if you specify the expected order of detection, like so:
mb_detect_encoding($val, 'UTF-8,ISO-8859-15');
In my personal experience it is worthless without specifying the candidates in the right order. For example, you need to list UTF-8 before ISO-8859-1 in your encoding_list, or it will return ISO-8859-1 in most cases, because every byte sequence is valid ISO-8859-1.
UPDATE:
The docs say that
CURLOPT_ENCODING => ''
sends an Accept-Encoding header containing all supported encoding types. You can try that, but as I said, since you are dealing with a known encoding, which is UTF-8, please try:
$ch = curl_init();
curl_setopt_array($ch, array(
CURLOPT_URL => $url,
CURLOPT_HTTPHEADER => $headers,
CURLOPT_ENCODING => 'UTF-8',
CURLOPT_RETURNTRANSFER => true
));
PHP CURL isn't processing encoded return data properly
Set the CURLOPT_ENCODING option to "" (an empty string). cURL then advertises every compression it supports and decodes the response itself, so you do not get compressed garbage back:
curl_setopt($ch, CURLOPT_ENCODING ,"");
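Put together, a fetch routine using that option might look like the sketch below. The helper names curl_options_for and fetch_body are invented for this example, and the curl extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: options for fetching a page with automatic
// decompression. Assumes the curl extension is loaded.
function curl_options_for(string $url, array $headers = []): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_HTTPHEADER     => $headers,
        // Empty string: advertise every compression libcurl supports
        // and decode the response body automatically.
        CURLOPT_ENCODING       => '',
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Hypothetical helper: fetch a URL and return the decompressed body.
function fetch_body(string $url, array $headers = []): string
{
    $ch = curl_init();
    curl_setopt_array($ch, curl_options_for($url, $headers));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}
```

The body comes back decompressed but still in whatever character set the server used; any charset conversion is a separate step.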
How do I encode content fetched via CURL in PHP?
After doing a lot more experimenting I stumbled on this solution, which fixed everything.
My script fetched the URL contents and loaded them into a DOM document like this:
$html = file_get_contents_curl($link_url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
Per the linked article, I changed it to this:
$html = file_get_contents_curl($link_url);
$doc = new DOMDocument();
@$doc->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
I also eliminated the use of utf8_decode, and everything displayed properly.
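The 'HTML-ENTITIES' pseudo-encoding used above is deprecated as of PHP 8.2. One possible replacement, sketched here and not taken from the original answer, is mb_encode_numericentity(), which turns every non-ASCII character into a numeric entity so the parser no longer has to guess the charset. The function names are hypothetical; mbstring is assumed for the first helper and the DOM extension for the second.

```php
<?php
// Hypothetical helper: replace every non-ASCII character in a UTF-8
// string with a decimal numeric entity. Assumes mbstring is loaded.
function non_ascii_to_entities(string $utf8Html): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8Html, $map, 'UTF-8');
}

// Hypothetical helper: load UTF-8 HTML into a DOMDocument after
// converting it to plain ASCII with entities.
function load_html_with_entities(string $utf8Html): DOMDocument
{
    $doc = new DOMDocument();
    @$doc->loadHTML(non_ascii_to_entities($utf8Html));
    return $doc;
}

echo non_ascii_to_entities("<p>caf\xC3\xA9</p>"), PHP_EOL; // <p>caf&#233;</p>
```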
php-curl doesn't support utf-8 in url
You just need to urlencode() the UTF-8 characters, like this:
php > $api = 'http://www.aparat.com/etc/api/';
php > $searchEndpoint = 'videoBySearch/text/';
php > var_dump($api . $searchEndpoint . urlencode('نوروز'));
string(79) "http://www.aparat.com/etc/api/videoBySearch/text/%D9%86%D9%88%D8%B1%D9%88%D8%B2"
php > $encodedUrl = $api . $searchEndpoint . urlencode('نوروز');
php > var_dump($encodedUrl);
string(79) "http://www.aparat.com/etc/api/videoBySearch/text/%D9%86%D9%88%D8%B1%D9%88%D8%B2"
php > var_dump(json_decode(file_get_contents($encodedUrl)));
object(stdClass)#1 (2) {
["videobysearch"]=>
array(20) {
[0]=>
object(stdClass)#2 (17) {
["id"]=>
string(7) "4126322"
["title"]=>
string(35) "استقبال از نوروز 1395"
["username"]=>
string(6) "tabaar"
...
}
...
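The same URL construction can be wrapped in a small function, as in this sketch. The name build_search_url is made up here. It uses rawurlencode() instead of urlencode() because the term sits in the path, where a space should become %20 and not "+"; for the search term shown above both functions give the same result.

```php
<?php
// Hypothetical helper: builds the search URL from the example above.
function build_search_url(string $term): string
{
    $api = 'http://www.aparat.com/etc/api/';
    $searchEndpoint = 'videoBySearch/text/';
    // rawurlencode() percent-encodes spaces as %20, which suits a path
    // segment; urlencode() would emit "+" instead.
    return $api . $searchEndpoint . rawurlencode($term);
}

echo build_search_url('نوروز'), PHP_EOL;
```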
Unicode is getting Encoded CURL PHP
First you have to identify the source website character encoding.
Choose a page and download it... using the terminal, type:
$ curl -D headers.txt -o page.html http://www.example.com/index.html
The response headers are saved into headers.txt, while the page source HTML is stored in page.html.
Inspect the two files with a text editor and search for Content-Type;
you should find an indication of the character encoding in at least one of them.
If you're not successful, you can use file
to try to "guess" the character encoding by inspecting the file contents (on Linux the flag is a lowercase -i):
$ file -I page.html
The output looks like this:
page.html: text/plain; charset=iso-8859-1
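Inside a PHP script, the header inspection from this first step could be approximated as below. This is only a sketch: charset_from_content_type is a hypothetical helper, and the regular expression covers the common "charset=..." form and not every variant a server might send.

```php
<?php
// Hypothetical helper: extract the charset parameter from a
// Content-Type header value, or return null when none is declared.
function charset_from_content_type(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

echo charset_from_content_type('text/html; charset=iso-8859-1'), PHP_EOL; // ISO-8859-1
var_dump(charset_from_content_type('text/html')); // NULL
```

With cURL, the header value can be read after the request with curl_getinfo($curl, CURLINFO_CONTENT_TYPE).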
Second, you have to decide or understand what the destination character set is:
Are you storing the web page in a text file? What is the expected character encoding of the file?
Are you parsing the web page within PHP in order to fetch some data of interest?
Are you serving the web page (totally or partially) back on your website? What is the character encoding of the website?
Finally, improve your PHP script to make the proper charset conversion after the page is retrieved with
$page = curl_exec($curl);
You may use mb_convert_encoding; note that the target encoding comes before the source encoding:
$page = mb_convert_encoding( $page, 'UTF-8', 'ISO-8859-1' );
// arguments: string, to, from
Alternatively, iconv can be used for the same purpose.
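A minimal iconv version of the same conversion, assuming the page was detected as ISO-8859-1 and the destination is UTF-8, might look like this. The helper name page_to_utf8 is hypothetical, and the iconv extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: convert a fetched page to UTF-8 with iconv.
// Assumes the iconv extension is loaded.
function page_to_utf8(string $page, string $sourceCharset): string
{
    // iconv() takes (from_encoding, to_encoding, string), the reverse
    // of mb_convert_encoding()'s (string, to_encoding, from_encoding).
    $converted = iconv($sourceCharset, 'UTF-8', $page);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset");
    }
    return $converted;
}

// The single ISO-8859-1 byte E9 ("é") becomes the UTF-8 pair C3 A9.
echo bin2hex(page_to_utf8("\xE9", 'ISO-8859-1')), PHP_EOL; // c3a9
```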