PHP Get Url of Redirect from Source Url

How to get final URL after following HTTP redirections in pure PHP?

/**
* get_redirect_url()
* Gets the address that the provided URL redirects to,
* or FALSE if there's no redirect.
*
* @param string $url
* @return string|false
*/
function get_redirect_url($url){
    $url_parts = @parse_url($url);
    if (!$url_parts) return false;
    if (!isset($url_parts['host'])) return false; //can't process relative URLs
    if (!isset($url_parts['path'])) $url_parts['path'] = '/';

    //use TLS (needs the openssl extension) and port 443 for https:// URLs, port 80 otherwise
    $scheme = isset($url_parts['scheme']) ? strtolower($url_parts['scheme']) : 'http';
    $is_https = ($scheme === 'https');
    $port = isset($url_parts['port']) ? (int)$url_parts['port'] : ($is_https ? 443 : 80);

    $sock = fsockopen(($is_https ? 'ssl://' : '') . $url_parts['host'], $port, $errno, $errstr, 30);
    if (!$sock) return false;

    $request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n";
    $request .= 'Host: ' . $url_parts['host'] . "\r\n";
    $request .= "Connection: Close\r\n\r\n";
    fwrite($sock, $request);
    $response = '';
    while(!feof($sock)) $response .= fread($sock, 8192);
    fclose($sock);

    //header names are case-insensitive, hence the "i" modifier
    if (preg_match('/^Location:[ \t]*(.+?)$/mi', $response, $matches)){
        $location = trim($matches[1]);
        if (substr($location, 0, 1) == "/"){
            //root-relative redirect: prepend scheme, host and any explicit port
            return $scheme . "://" . $url_parts['host'] . (isset($url_parts['port']) ? ':' . $port : '') . $location;
        }
        return $location;
    }
    return false;
}

/**
* get_all_redirects()
* Follows and collects all redirects, in order, for the given URL.
*
* @param string $url
* @return array
*/
function get_all_redirects($url){
    $redirects = array();
    while ($newurl = get_redirect_url($url)){
        //stop if we have already seen this URL (redirect loop)
        if (in_array($newurl, $redirects)){
            break;
        }
        $redirects[] = $newurl;
        $url = $newurl;
    }
    return $redirects;
}

/**
* get_final_url()
* Gets the address that the URL ultimately leads to.
* Returns $url itself if it isn't a redirect.
*
* @param string $url
* @return string
*/
function get_final_url($url){
    $redirects = get_all_redirects($url);
    if (count($redirects)>0){
        return array_pop($redirects);
    } else {
        return $url;
    }
}

And, as always, give credit:

http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
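
The get_redirect_url() function above only rewrites Location values that start with "/". As a rough sketch of how other relative forms could be handled, the helper below resolves a Location value against the URL that returned it. The name resolve_location() is hypothetical and not part of the credited code; it covers absolute, scheme-relative, root-relative and simple path-relative values, and makes no attempt to normalise "./" or "../" segments or query-only values.

```php
<?php
/**
 * Resolve a Location header value against the URL that returned it.
 * Illustrative helper (an assumption, not part of the original snippet).
 *
 * @param string $base_url  the URL that was requested
 * @param string $location  the raw Location header value
 * @return string|false
 */
function resolve_location($base_url, $location){
    $location = trim($location);
    if ($location === '') return false;

    // Already absolute, e.g. "https://example.com/a"
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) return $location;

    $base = parse_url($base_url);
    if (!$base || !isset($base['scheme'], $base['host'])) return false;

    // Scheme-relative, e.g. "//cdn.example.com/a"
    if (substr($location, 0, 2) === '//') return $base['scheme'] . ':' . $location;

    $origin = $base['scheme'] . '://' . $base['host']
            . (isset($base['port']) ? ':' . $base['port'] : '');

    // Root-relative, e.g. "/login"
    if ($location[0] === '/') return $origin . $location;

    // Path-relative, e.g. "next.php": replace the last segment of the base path
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}
```

With something like this in place, the Location branch of get_redirect_url() could return resolve_location($url, $matches[1]) instead of testing for a leading slash itself.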

PHP get redirected link

The server is not redirecting: the response status is 200, not a 3xx redirect status such as 301 or 302.

The returned HTML contains

<frameset rows="24,100%" frameborder="0">
<frame src="frame_header.php?hello=&title=" scrolling="no" />
<frame src="http://www.promptfile.com/l/DA155F60EF-FC274DCD8D" />

</frameset><noframes>http://www.promptfile.com/l/DA155F60EF-FC274DCD8D</noframes>

which would cause a web browser to load that page in a frame. curl does not parse the returned HTML; it only processes the HTTP headers, because it is an HTTP client, not a browser. So the second URL is never actually requested (you can verify this by looking at what curl returned in your example code), and the effective URL you are getting back is the correct one.
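
If the framed address is what you are after, it has to be pulled out of the returned HTML yourself. A minimal sketch, assuming the markup is as simple as the frameset above: the helper name extract_frame_sources() is made up for illustration, and a regular expression is used only to keep the example short; a real HTML parser such as DOMDocument would be more robust on arbitrary pages.

```php
<?php
/**
 * Return the src attribute of every <frame> or <iframe> tag in $html.
 * Hypothetical helper for illustration only.
 *
 * @param string $html
 * @return array
 */
function extract_frame_sources($html){
    $sources = array();
    // \b after "frame" keeps <frameset> from matching
    if (preg_match_all('/<i?frame\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/is', $html, $matches)){
        foreach ($matches[2] as $src){
            // attribute values may contain entities such as &amp;
            $sources[] = html_entity_decode($src);
        }
    }
    return $sources;
}

$html = '<frameset rows="24,100%" frameborder="0">'
      . '<frame src="frame_header.php?hello=&title=" scrolling="no" />'
      . '<frame src="http://www.promptfile.com/l/DA155F60EF-FC274DCD8D" />'
      . '</frameset>';

// Print one frame source per line
foreach (extract_frame_sources($html) as $src){
    echo $src, "\n";
}
```

The absolute URL in the result can then be requested with a second curl call.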

Getting the source HTML from an url that redirects

This may be because the site wants to use cookies, so it keeps redirecting because it is failing to set a cookie.

Replace this:

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);

with:

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_MAXREDIRS, 10);
curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie.txt');

You need the CURLOPT_COOKIEJAR option to set a cookie file.
CURLOPT_MAXREDIRS is the maximum number of redirects allowed; 10 should be enough.

If it still gives you an error, you can use:

if($errno = curl_errno($curl)) {
echo $errno;
}

This will show you the error code (call it after curl_exec()).
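
Putting the options and the error check together, a sketch of a complete fetch might look like the following. The function name fetch_following_redirects(), the shape of the returned array, the connect timeout and the temporary cookie file are all assumptions made for the example; only the curl options themselves come from the answer above.

```php
<?php
/**
 * Fetch a page, following redirects and keeping cookies between hops.
 * Illustrative wrapper (name and return shape are assumptions).
 *
 * @param string $url
 * @param int $max_redirects
 * @return array  keys: html, final_url, status, errno, error
 */
function fetch_following_redirects($url, $max_redirects = 10){
    // A writable file for curl to store cookies in
    $cookie_file = tempnam(sys_get_temp_dir(), 'cookie');

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_MAXREDIRS, $max_redirects);
    curl_setopt($curl, CURLOPT_COOKIEJAR, $cookie_file);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);

    $html = curl_exec($curl);

    $result = array(
        'html'      => ($html === false) ? null : $html,
        'final_url' => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL),
        'status'    => curl_getinfo($curl, CURLINFO_HTTP_CODE),
        'errno'     => curl_errno($curl),   // 0 means no error
        'error'     => curl_error($curl),   // human-readable message
    );

    // Release the handle, then remove the temporary cookie file
    unset($curl);
    @unlink($cookie_file);

    return $result;
}
```

A caller would check $result['errno'] first and, if it is 0, read the page from $result['html'] and the address curl ended up at from $result['final_url'].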


