get_headers Inconsistency

The problem has nothing to do with the length of the domain name; it is simply a matter of whether the domain exists.

You are using a DNS service that resolves non-existent domains to a server that gives you a "friendly" error page, which it returns with a 200 response code. This means the problem is not specific to get_headers(); it affects any procedure that relies on sensible DNS lookups.

A way to handle this without hardcoding a workaround for every environment you work in might look something like this:

// A domain that definitely does not exist. The easiest way to guarantee that
// this continues to work is to use an illegal top-level domain (TLD) suffix
$testDomain = 'idontexist.tld';

// If this resolves to an IP, we know that we are behind a service such as this
// We can simply compare the actual domain we test with the result of this
$badIP = gethostbyname($testDomain);

// Then when you want to get_headers()
$url = 'http://www.domainnnnnnnnnnnnnnnnnnnnnnnnnnnn.com/CraxyFile.jpg';

$host = parse_url($url, PHP_URL_HOST);
if (gethostbyname($host) === $badIP) {
    // The domain does not exist - probably handle this as if it were a 404
} else {
    // do the actual get_headers() stuff here
}

You may want to somehow cache the return value of the first call to gethostbyname(), since you know you are looking up a name that does not exist, and this can often take a few seconds.
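The caching suggested above might be sketched roughly like this. The function names sentinelIp() and looksHijacked() and the injectable $resolver argument are assumptions made for illustration (the resolver defaults to gethostbyname()), and the cache only lives for the current process:

```php
<?php
// Hypothetical helper: resolve the sentinel domain once per process and reuse the result.
// $resolver is injectable so the lookup can be swapped out; it defaults to gethostbyname().
function sentinelIp(string $testDomain = 'idontexist.tld', ?callable $resolver = null): string
{
    static $cache = [];
    if (!array_key_exists($testDomain, $cache)) {
        $resolver = $resolver ?? 'gethostbyname';
        $cache[$testDomain] = $resolver($testDomain);
    }
    return $cache[$testDomain];
}

// Hypothetical helper: true when $host resolves to the same address as the sentinel,
// which suggests the DNS service is answering for a name that does not exist.
function looksHijacked(string $host, ?callable $resolver = null): bool
{
    $resolver = $resolver ?? 'gethostbyname';
    $badIp = sentinelIp('idontexist.tld', $resolver);
    // gethostbyname() returns the unmodified name on failure, so a sentinel
    // that is not an IP address means no such service is in the way
    if (filter_var($badIp, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    return $resolver($host) === $badIp;
}
```

With this, the slow lookup of the non-existent name happens at most once per request, however many URLs you check afterwards. To keep the value across requests you would need to store it somewhere persistent.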

How to catch error when get_headers() tries to resolve unresolvable host?

The script shouldn't stop running, as the message produced is just a warning. I tested this script myself and that's the behaviour I see. The documentation says that get_headers() returns FALSE on failure, so your condition should actually be:

if ($headers === FALSE) {
    // never mind, just move on to the next URL in the list...
}
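Put into a loop, that check could look something like the sketch below. The helper name headersForUrls() and the injectable $fetch argument are made up for illustration; by default it calls get_headers() with the warning suppressed, and the strict comparison against false is what decides whether a URL is skipped:

```php
<?php
// Hypothetical helper: fetch headers for each URL and skip the ones that fail.
// $fetch is injectable for testing; by default it wraps get_headers().
function headersForUrls(array $urls, ?callable $fetch = null): array
{
    $fetch = $fetch ?? function (string $url) {
        // @ hides the warning; failure is still reported through the false return value
        return @get_headers($url);
    };

    $results = [];
    foreach ($urls as $url) {
        $headers = $fetch($url);
        if ($headers === false) {
            continue; // unresolvable or unreachable: move on to the next URL
        }
        $results[$url] = $headers;
    }
    return $results;
}
```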

php get_headers exception handling

NEW VERSION

Again, this does not directly answer the question, but avoiding the error gives the expected result.

get_headers() performs an HTTP GET without a User-Agent header, which moneysupermarket.com does not like. Use ini_set() to set the default user agent for the request, and everything works:

ini_set("user_agent","Mozilla custom agent");
$headers = get_headers("http://www.moneysupermarket.com/mortgages/", 1);
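If you would rather not change the user agent for the whole script, a stream context passed as the third argument of get_headers() should do the same for a single request. This is only a sketch: it needs PHP 7.1 or later for that argument, and agentContext() and headersWithAgent() are hypothetical names:

```php
<?php
// Hypothetical helper: build a stream context that sends the given User-Agent.
function agentContext(string $userAgent)
{
    return stream_context_create([
        'http' => ['user_agent' => $userAgent],
    ]);
}

// Hypothetical wrapper: same call as above, but the user agent applies to this request only.
// The third argument of get_headers() requires PHP 7.1 or later.
function headersWithAgent(string $url, string $userAgent)
{
    return get_headers($url, 1, agentContext($userAgent));
}
```

Calling headersWithAgent("http://www.moneysupermarket.com/mortgages/", "Mozilla custom agent") would then replace the two lines above.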

PREVIOUS

Apparently moneysupermarket.com resets the connection if the request is not well formed. Make the request with cURL instead (taken from the cURL manual page):

// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.moneysupermarket.com/mortgages/");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla custom agent");

// grab URL and pass it to the browser
curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);

php get_headers location

$headers = get_headers('http://www.google.com',1);
echo $headers["Location"];
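Note that the Location key is only present when the server actually redirects, and when several redirects are followed the value is an array rather than a string. A small guard along these lines may help; lastLocation() is a hypothetical helper name, and the lower-casing of keys is there because servers differ in how they capitalise header names:

```php
<?php
// Hypothetical helper: return the final Location value from an associative
// get_headers() result, or null when the response did not redirect.
function lastLocation(array $headers): ?string
{
    // Header names are case-insensitive, so normalise the keys first
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['location'])) {
        return null;
    }
    // When several redirects are followed, the value is an array of locations
    $location = $headers['location'];
    return is_array($location) ? end($location) : $location;
}
```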

None of the cases match the result of get_headers()

The last three headers use HTTP/1.0, so no case will match if the server speaks HTTP/1.1 and does not return 200.

Maybe you should try:

$h = get_headers($url, 1);
$h = explode(' ', $h[0]);
$responseCode = $h[1];

switch ($responseCode) {
    case '200':
        // ...
}
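A slightly more defensive variant of the same idea pulls the code out with a regular expression, so the protocol version no longer matters. This is a sketch; statusCode() is a hypothetical helper, and it returns null for anything that does not look like a status line:

```php
<?php
// Hypothetical helper: extract the numeric status code from a status line such as
// "HTTP/1.1 404 Not Found", whatever the protocol version.
function statusCode(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}
```

You would pass it $h[0] from get_headers(), after checking that get_headers() did not return false, and then switch on the integer (case 200, case 404 and so on) instead of on the whole status line.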

