Why Does file_get_contents Work with google.com But Not with My Site

file_get_contents gets different file from google than shown in browser

Nothing, really. The problem is that the page uses JavaScript and AJAX to load its contents. So, in order to get a "snapshot" of the page, you need to "run" it: that is, you need to execute the JavaScript code, which PHP does not do.

Your best bet is to use a headless browser such as PhantomJS. If you search, you will find tutorials explaining how to do it.
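As a concrete illustration (assuming a headless Chrome/Chromium binary is installed; PhantomJS works similarly with a short script), you can dump the JavaScript-rendered DOM from the command line and then read the file from PHP:

```shell
# Dump the DOM after JavaScript has run; binary name and flags may vary by install
chrome --headless --disable-gpu --dump-dom 'https://example.com' > rendered.html
```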

NOTE

If all you're looking for is a way to retrieve raw search results, you might want to try Google's search API.
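For example, a request to Google's Custom Search JSON API is just a URL you can fetch with file_get_contents or cURL. A minimal sketch of building that URL (the key and cx values below are placeholders you would obtain from the Google API console):

```php
<?php
// Build a Custom Search JSON API request URL.
// YOUR_API_KEY and YOUR_CX are placeholders, not real credentials.
$params = http_build_query(array(
    'key' => 'YOUR_API_KEY',
    'cx'  => 'YOUR_CX',
    'q'   => 'php file_get_contents',
));
$url = 'https://www.googleapis.com/customsearch/v1?' . $params;
echo $url;
// The JSON response could then be fetched and decoded:
// $results = json_decode(file_get_contents($url), true);
```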

file_get_contents script works with some websites but not others

$html = file_get_html('http://google.com/'); // requires the Simple HTML DOM library
$title = $html->find('title', 0)->innertext; // find() returns an array, so request the first match

Or, if you prefer, with preg_match. And you should really be using cURL instead of file_get_contents:

function curl($url) {
    // Send browser-like request headers so the server serves the normal page
    $headers[] = "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
    $headers[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    $headers[] = "Accept-Language: en-us,en;q=0.5";
    $headers[] = "Accept-Encoding: gzip,deflate";
    $headers[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $headers[] = "Keep-Alive: 115";
    $headers[] = "Connection: keep-alive";
    $headers[] = "Cache-Control: max-age=0";

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_ENCODING, "gzip");   // decompress gzipped responses
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
    $data = curl_exec($curl);
    curl_close($curl);
    return $data;
}

$data = curl('http://www.google.com');
$regex = '#<title>(.*?)</title>#mis';
preg_match($regex, $data, $match);
var_dump($match);
echo $match[1];
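Regex works for a quick title grab, but a sketch using PHP's built-in DOMDocument is more robust against attribute and whitespace variations (shown here on a static HTML string so it runs without a network connection):

```php
<?php
// Parse HTML and read the <title> with DOMDocument instead of a regex.
$html = '<html><head><title>Google</title></head><body></body></html>';
$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect real-world markup
$doc->loadHTML($html);
libxml_clear_errors();
$titles = $doc->getElementsByTagName('title');
$title = $titles->length ? $titles->item(0)->textContent : null;
echo $title; // Google
```

In practice you would pass the string returned by curl() instead of the hard-coded $html above.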

file_get_contents is not working for some url

Some URLs cannot be retrieved by file_get_contents because the server checks whether the request comes from a browser or from a script. If it detects a script, it simply refuses to serve the page contents.

So I had to make the request look like a browser request. I used the following code to get the second URL's contents. The required headers might differ for different web servers, because each might perform different checks.

Even so, why don't you try the following code? If you are lucky, it might work for you!

function getUrlContent($url) {
    // Start with an empty cookie jar (the original opened a handle and never closed it)
    file_put_contents("cookies.txt", "");
    $parts = parse_url($url);
    $host = $parts['host'];
    $ch = curl_init();
    // Browser-like headers; some servers check these before serving the page
    $header = array(
        "Host: {$host}",
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.8',
        'Cache-Control: max-age=0',
        'Connection: keep-alive',
        'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36',
    );

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$url = "http://adfoc.us/1575051";
$html = getUrlContent($url);

Thanks everyone for the guidance.
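If you'd rather stay with file_get_contents, the same browser-imitation idea can be sketched with a stream context that sets a User-Agent, which is often all such servers check (the header values below are illustrative):

```php
<?php
// file_get_contents sends no User-Agent by default; some servers reject that.
// A stream context lets you add browser-like headers without cURL.
$context = stream_context_create(array(
    'http' => array(
        'method' => 'GET',
        'header' => "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)\r\n" .
                    "Accept: text/html,application/xhtml+xml\r\n",
    ),
));
// The context is passed as the third argument:
// $html = file_get_contents('http://adfoc.us/1575051', false, $context);
var_dump(is_resource($context));
```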

file_get_contents does not work with protocol-relative URLs?

You can't use file_get_contents with //google.com because what it actually tries is file:///google.com. When you use a protocol-relative URL in your web browser, the browser substitutes the protocol of the page you are currently on: if you are on https://mywebsite.com and link to //google.com, the browser actually requests https://google.com. PHP has no such context, so you need to write file_get_contents('http://google.com');
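A small helper can apply the browser's substitution rule yourself before calling file_get_contents (the function name here is just an illustration):

```php
<?php
// Hypothetical helper: prepend a scheme to protocol-relative URLs,
// mimicking what a browser does with the current page's protocol.
function absolutize($url, $scheme = 'http') {
    if (substr($url, 0, 2) === '//') {
        return $scheme . ':' . $url;
    }
    return $url; // already has a scheme (or is a relative path)
}

echo absolutize('//google.com');          // http://google.com
echo "\n";
echo absolutize('//google.com', 'https'); // https://google.com
echo "\n";
echo absolutize('http://example.com');    // unchanged
```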


