Decode gzipped web page retrieved via cURL in PHP
I use cURL with this option, which requests gzip and makes cURL decompress the response automatically:
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
php curl response show gzip or encoded data
In the request header I allowed only gzip and deflate and removed br, and it worked for me. So instead of $header[] = 'Accept-Encoding: gzip, deflate, br';
I used $header[] = 'Accept-Encoding: gzip, deflate';
Thanks for the help, everyone.
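As a rough sketch of the same idea, the helper below removes br from any Accept-Encoding entry in a header array before it is handed to CURLOPT_HTTPHEADER. The function name without_brotli is made up for this illustration and is not part of cURL or PHP.

```php
<?php
// Hypothetical helper: drop "br" from every Accept-Encoding request header.
function without_brotli(array $headers): array
{
    $result = [];
    foreach ($headers as $header) {
        if (stripos($header, 'Accept-Encoding:') !== 0) {
            $result[] = $header; // unrelated header, keep as-is
            continue;
        }
        $value = substr($header, strlen('Accept-Encoding:'));
        $codings = array_map('trim', explode(',', $value));
        $kept = array_filter($codings, function ($coding) {
            // Compare only the coding name, ignoring any ";q=..." weight.
            $name = strtolower(trim(explode(';', $coding)[0]));
            return $coding !== '' && $name !== 'br';
        });
        $result[] = 'Accept-Encoding: ' . implode(', ', $kept);
    }
    return $result;
}

$header = without_brotli(['Accept-Encoding: gzip, deflate, br', 'Accept: */*']);
// curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
```

The curl_setopt call is left commented out because it needs an existing handle in $ch.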
How to decode Content-Encoding: gzip, gzip using curl?
You can decode it by stripping the 10-byte gzip header from the body and passing the rest to gzinflate.
$url = "http://www.dealstan.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
$return = gzinflate(substr($return, 10));
print_r($return);
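Cutting a fixed 10 bytes only works when the gzip header carries no optional fields. A slightly more defensive sketch, assuming the zlib extension is loaded, checks for the gzip magic bytes and lets gzdecode parse the header, repeating for as long as another gzip layer is found. The name gunzip_all_layers and the limit of 5 layers are arbitrary choices for this example.

```php
<?php
// Hypothetical helper: peel off nested gzip layers (e.g. "Content-Encoding: gzip, gzip").
function gunzip_all_layers(string $body, int $maxLayers = 5): string
{
    for ($i = 0; $i < $maxLayers; $i++) {
        // Every gzip stream starts with the magic bytes 0x1f 0x8b.
        if (strncmp($body, "\x1f\x8b", 2) !== 0) {
            break;
        }
        $decoded = @gzdecode($body);
        if ($decoded === false) {
            break; // looked like gzip but was not valid; keep what we have
        }
        $body = $decoded;
    }
    return $body;
}

// Simulate a doubly compressed response body.
$twice = gzencode(gzencode('<html>hello</html>'));
echo gunzip_all_layers($twice), PHP_EOL;
```

Data that is not gzip at all is returned unchanged, so the helper can be applied to the result of curl_exec whether or not cURL already decoded it.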
Requesting a GZIP'ed page and processing with cURL and PHP
You can request a gzipped encoding with curl_setopt, like this:
curl_setopt($curl, CURLOPT_ENCODING, 'gzip');
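With that option set, cURL decompresses the body itself, so curl_exec already returns the plain content. You only need gzdecode when the body is still compressed, for example because you sent the Accept-Encoding header by hand instead of setting CURLOPT_ENCODING: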
$response = gzdecode($response);
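A minimal sketch of a request that relies on cURL's own decoding could look like the following. It assumes the curl extension is available; gzip_aware_curl_options and fetch_decoded are names invented for this example, and the timeout value is arbitrary.

```php
<?php
// Hypothetical helper: options for a request whose body cURL decodes itself.
function gzip_aware_curl_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        // Empty string: advertise every encoding this cURL build supports
        // and decode the response automatically.
        CURLOPT_ENCODING       => '',
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    ];
}

// Hypothetical wrapper: returns the already decompressed body or throws.
function fetch_decoded(string $url): string
{
    $curl = curl_init();
    curl_setopt_array($curl, gzip_aware_curl_options($url));
    $response = curl_exec($curl);
    if ($response === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($curl));
    }
    return $response; // no gzdecode() call needed here
}

// $html = fetch_decoded('http://example.com/');
```

The call at the end is commented out because it needs network access.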
decode curl response gzip multipart attachment in PHP
The solution turned out to be really simple; I just hadn't thought of it before.
Once I had extracted the decoded attachment, all I needed was:
$xml_string = gzdecode($decoded_attachment);
and the result is the expected XML attachment.
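The answer does not show how the attachment was extracted, so the following is only a sketch under assumptions: it takes a single MIME part as a string, assumes headers and body are separated by a blank line, and assumes the body may be base64-encoded as announced by a Content-Transfer-Encoding header. The function name decode_gzipped_part is hypothetical.

```php
<?php
// Hypothetical helper: turn one MIME part holding gzipped data into plain text.
function decode_gzipped_part(string $part): string
{
    // Split the part's headers from its body at the first blank line.
    $pieces = preg_split("/\r?\n\r?\n/", $part, 2);
    if (count($pieces) !== 2) {
        throw new InvalidArgumentException('part has no header/body separator');
    }
    [$rawHeaders, $body] = $pieces;

    // Undo the transfer encoding first, if the part declares base64.
    if (preg_match('/^Content-Transfer-Encoding:\s*base64/mi', $rawHeaders)) {
        $body = base64_decode(preg_replace('/\s+/', '', $body));
    }

    $plain = @gzdecode($body);
    if ($plain === false) {
        throw new RuntimeException('attachment is not valid gzip data');
    }
    return $plain;
}

// Build a sample part to try the helper on.
$part = "Content-Type: application/gzip\r\n"
      . "Content-Transfer-Encoding: base64\r\n\r\n"
      . chunk_split(base64_encode(gzencode('<markets/>')));
$xml_string = decode_gzipped_part($part);
```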
curl php not getting response
Try removing this header, since cURL can only decode br (Brotli) if it was built with Brotli support: $headers[]='Accept-Encoding:gzip,deflate,br';
How to find out in PHP, if the output will be gzipped by Apache?
After searching and trying out a bit, I found out several things.
1) The output may be gzip-compressed by PHP itself via the zlib library. This can be disabled at runtime:
ini_set('zlib.output_compression', false);
2) An Apache module may additionally gzip the output. From PHP it is not possible to see whether this will happen after the script has run, but there is a fairly reliable way to prevent it:
header("Content-Encoding: none");
This is not standards-compliant, but it makes Apache assume that the content cannot be compressed, so it does not step in.
There may be many other situations (nginx, another gzipping extension, and so on), but in most cases this combination will do the trick:
// disable zlib
ini_set('zlib.output_compression', false);
// Force termination of all instantiated buffers
while (@ob_end_flush());
// prevent apache from gzipping
header("Content-Encoding: none");
// prevent the browsers from showing a cached version before showing the new one
header('Cache-Control: no-cache');
// Start the output to enable buffering
header('Content-Type: text/html; charset=utf-8' );
// Push the beginning of the page to the browser
ob_flush();
flush();
// Do stuff here.
Hope this helps someone.
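For the detection side of the question, a partial check can be sketched as follows. It only reports whether zlib.output_compression is switched on and, when PHP runs as an Apache module and apache_get_modules is therefore available, whether mod_deflate is loaded. A loaded module does not prove that a given response will be compressed, so treat the result as a hint. The function name output_compression_status is made up for this example.

```php
<?php
// Hypothetical helper: report what PHP can see about output compression.
function output_compression_status(): array
{
    // The setting may be "", "0", "Off", "1", "On" or a buffer size such as "4096".
    $raw = strtolower((string) ini_get('zlib.output_compression'));
    $zlib = !in_array($raw, ['', '0', 'off', 'false', 'no'], true);

    // apache_get_modules() only exists when PHP runs as an Apache module.
    $deflate = function_exists('apache_get_modules')
        ? in_array('mod_deflate', apache_get_modules(), true)
        : null; // unknown under CLI, FPM, nginx and so on

    return ['zlib' => $zlib, 'mod_deflate' => $deflate];
}

var_dump(output_compression_status());
```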
How to parse response compressed using GZip in php?
Quote: "As you see, the response is .gz format". No, it is not. The server says Content-Type: application/x-gzip, but that is wrong: it is an XML file with the name "markets_20151127T065210.gz".
Quote: "how to parse that response compressed using GZip in php script and echo xml format?"
Your XML is in $xml by line 4 of this script:
<?php
$ch=hhb_curl_init();
$xml=hhb_curl_exec2($ch,'http://services.eoddsmaker.net/demo/feeds/V1.0/markets.ashx?l=1&bid=43&sid=50&cid=58&lid=10&u=kwaninmacau&p=kwaninmacau',$headers,$cookies,$requeststring);
var_dump('headers:',$headers,'cookies:',$cookies,'requeststring:',$requeststring,'xml:',$xml);
function hhb_curl_init($custom_options_array = array())
{
if (empty($custom_options_array)) {
$custom_options_array = array();
//i feel kinda bad about this.. argv[1] of curl_init wants a string(url), or NULL
//at least i want to allow NULL aswell :/
}
if (!is_array($custom_options_array)) {
throw new InvalidArgumentException('$custom_options_array must be an array!');
}
;
$options_array = array(
CURLOPT_AUTOREFERER => true,
CURLOPT_BINARYTRANSFER => true,
CURLOPT_COOKIESESSION => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_FORBID_REUSE => false,
CURLOPT_HTTPGET => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_CONNECTTIMEOUT => 10,
CURLOPT_TIMEOUT => 11,
CURLOPT_ENCODING => ""
//CURLOPT_REFERER=>'example.org',
//CURLOPT_USERAGENT=>'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0'
);
if (!array_key_exists(CURLOPT_COOKIEFILE, $custom_options_array)) {
//do this only conditionally because tmpfile() call..
static $curl_cookiefiles_arr = array(); //workaround for https://bugs.php.net/bug.php?id=66014
$curl_cookiefiles_arr[] = $options_array[CURLOPT_COOKIEFILE] = tmpfile();
$options_array[CURLOPT_COOKIEFILE] = stream_get_meta_data($options_array[CURLOPT_COOKIEFILE]);
$options_array[CURLOPT_COOKIEFILE] = $options_array[CURLOPT_COOKIEFILE]['uri'];
}
//we can't use array_merge() because of how it handles integer-keys, it would/could cause corruption
foreach ($custom_options_array as $key => $val) {
$options_array[$key] = $val;
}
unset($key, $val, $custom_options_array);
$curl = curl_init();
if($curl===false){
throw new RuntimeException('could not create a curl handle! curl_init() returned false');
}
if(false===curl_setopt_array($curl, $options_array)){
$errno=curl_errno($curl);
$error=curl_error($curl);
throw new RuntimeException('could not set options on curl! curl_setopt_array returned false. curl_errno :'.$errno.'. curl_error: '.$error);
}
return $curl;
}
function hhb_curl_exec($ch, $url)
{
static $hhb_curl_domainCache = "";//warning, this will not work properly with 2 different curl's visiting 2 different sites.
//should probably use SplObjectStorage here, so each curl can have its own cache..
//$hhb_curl_domainCache=&$this->hhb_curl_domainCache;
//$ch=&$this->curlh;
if (!($ch instanceof CurlHandle) && !(is_resource($ch) && get_resource_type($ch) === 'curl')) { //PHP 8+ gives a CurlHandle object, older versions a resource
throw new InvalidArgumentException('$ch must be a curl handle!');
}
if (!is_string($url)) {
throw new InvalidArgumentException('$url must be a string!');
}
$tmpvar = "";
if (parse_url($url, PHP_URL_HOST) === null) {
if (substr($url, 0, 1) !== '/') {
$url = $hhb_curl_domainCache . '/' . $url;
} else {
$url = $hhb_curl_domainCache . $url;
}
}
;
if(false===curl_setopt($ch, CURLOPT_URL, $url)){
$errno=curl_errno($ch);
$error=curl_error($ch);
throw new RuntimeException('could not set CURLOPT_URL on curl! curl_setopt returned false. curl_errno :'.$errno.'. curl_error: '.$error.'. url: '.var_export($url,true));
}
$html = curl_exec($ch);
if (curl_errno($ch)) {
throw new Exception('Curl error (curl_errno=' . curl_errno($ch) . ') on url ' . var_export($url, true) . ': ' . curl_error($ch));
// echo 'Curl error: ' . curl_error($ch);
}
if ($html === '' && 204 != ($tmpvar = curl_getinfo($ch, CURLINFO_HTTP_CODE)) /*204 is "No Content": success, but no output*/ ) {
throw new Exception('Curl returned nothing for ' . var_export($url, true) . ' but HTTP_RESPONSE_CODE was ' . var_export($tmpvar, true));
}
;
//remember that curl (usually) auto-follows the "Location: " http redirects..
$hhb_curl_domainCache = parse_url(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), PHP_URL_HOST);
return $html;
}
function hhb_curl_exec2($ch, $url, &$returnHeaders = array(), &$returnCookies = array(), &$verboseDebugInfo = "")
{
$returnHeaders = array();
$returnCookies = array();
$verboseDebugInfo = "";
if (!($ch instanceof CurlHandle) && !(is_resource($ch) && get_resource_type($ch) === 'curl')) { //PHP 8+ gives a CurlHandle object, older versions a resource
throw new InvalidArgumentException('$ch must be a curl handle!');
}
if (!is_string($url)) {
throw new InvalidArgumentException('$url must be a string!');
}
$verbosefileh = tmpfile();
if($verbosefileh===false){
throw new RuntimeException('can not create a tmpfile for curl\'s stderr. tmpfile returned false');
}
$verbosefile = stream_get_meta_data($verbosefileh);
$verbosefile = $verbosefile['uri'];
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_STDERR, $verbosefileh);
curl_setopt($ch, CURLOPT_HEADER, 1);
$html = hhb_curl_exec($ch, $url);
$verboseDebugInfo = file_get_contents($verbosefile);
curl_setopt($ch, CURLOPT_STDERR, NULL);
fclose($verbosefileh);
unset($verbosefile, $verbosefileh);
$headers = array();
$crlf = "\x0d\x0a";
$thepos = strpos($html, $crlf . $crlf, 0);
$headersString = substr($html, 0, $thepos);
$headerArr = explode($crlf, $headersString);
$returnHeaders = $headerArr;
unset($headersString, $headerArr);
$htmlBody = substr($html, $thepos + 4); //should work on utf8/ascii headers... utf32? not so sure..
unset($html);
//I REALLY HOPE THERE EXIST A BETTER WAY TO GET COOKIES.. good grief this looks ugly..
//at least it's tested and seems to work perfectly...
$grabCookieName = function($str,&$len)
{
$len=0;
$ret = "";
$i = 0;
for ($i = 0; $i < strlen($str); ++$i) {
++$len;
if ($str[$i] === ' ') {
continue;
}
if ($str[$i] === '=') {
--$len;
break;
}
$ret .= $str[$i];
}
return urldecode($ret);
};
foreach ($returnHeaders as $header) {
//Set-Cookie: crlfcoookielol=crlf+is%0D%0A+and+newline+is+%0D%0A+and+semicolon+is%3B+and+not+sure+what+else
/*Set-Cookie:ci_spill=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%22305d3d67b8016ca9661c3b032d4319df%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3A%2285.164.158.128%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A109%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F43.0.2357.132+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1436874639%3B%7Dcab1dd09f4eca466660e8a767856d013; expires=Tue, 14-Jul-2015 13:50:39 GMT; path=/
Set-Cookie: sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT;
//Cookie names cannot contain any of the following '=,; \t\r\n\013\014'
//
*/
if (stripos($header, "Set-Cookie:") !== 0) {
continue;
/**/
}
$header = trim(substr($header, strlen("Set-Cookie:")));
$len=0;
while (strlen($header) > 0) {
$cookiename = $grabCookieName($header,$len);
$returnCookies[$cookiename] = '';
$header = substr($header, $len + 1); //also remove the =
if (strlen($header) < 1) {
break;
}
;
$thepos = strpos($header, ';');
if ($thepos === false) { //last cookie in this Set-Cookie.
$returnCookies[$cookiename] = urldecode($header);
break;
}
$returnCookies[$cookiename] = urldecode(substr($header, 0, $thepos));
$header = trim(substr($header, $thepos + 1)); //also remove the ;
}
}
unset($header, $cookiename, $thepos);
return $htmlBody;
}
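The cookie loop above stores every name=value segment of a Set-Cookie line, attributes such as expires and path included. If only the cookie itself is wanted, a shorter sketch is possible. It deliberately differs from the code above by keeping just the first pair of each Set-Cookie line; first_cookie_pairs is a name invented for this example.

```php
<?php
// Hypothetical helper: extract only the cookie name/value from Set-Cookie lines.
function first_cookie_pairs(array $headerLines): array
{
    $cookies = [];
    foreach ($headerLines as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue; // not a Set-Cookie header
        }
        $value = trim(substr($line, strlen('Set-Cookie:')));
        // The cookie itself is everything before the first ';'.
        $pair = explode(';', $value, 2)[0];
        $parts = explode('=', $pair, 2);
        if (count($parts) !== 2 || trim($parts[0]) === '') {
            continue; // malformed line without a name=value pair
        }
        $cookies[urldecode(trim($parts[0]))] = urldecode(trim($parts[1]));
    }
    return $cookies;
}

$cookies = first_cookie_pairs([
    'HTTP/1.1 200 OK',
    'Set-Cookie: sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT;',
    'set-cookie:a=b%3Bc; path=/',
]);
print_r($cookies);
```

It can be fed the $headers array that hhb_curl_exec2 fills in.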
Is there any way to get curl to decompress a response without sending the Accept headers in the request?
Probably the easiest thing to do is just use gunzip to do it:
curl -sH 'Accept-encoding: gzip' http://example.com/ | gunzip -
Or there's also --compressed, with which curl will decompress the response (I believe), since it knows the response is compressed. But I'm not sure if that meets your needs.