Check If Links Are Broken in PHP

Check if links are broken in PHP

You can check for a broken link using this function:

function check_url($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    $data = curl_exec($ch);
    $headers = curl_getinfo($ch);
    curl_close($ch);

    return $headers['http_code'];
}

You need to have the cURL extension installed for this to work. Now you can check for broken links using:

$check_url_status = check_url($url);
if ($check_url_status == 200)
    echo "Link Works";
else
    echo "Broken Link";

Also see this reference for HTTP status codes: HTTP Status Codes

You may also want to check for the 301 and 302 (redirect) status codes.
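A small helper along those lines could treat redirects as working links too. This is just a sketch (the function name is mine, and which status codes you accept is up to you); pass it the code returned by check_url() above:

```php
<?php
// Sketch: treat 200 plus the common redirect codes as "working".
function status_is_ok($code) {
    return in_array((int) $code, array(200, 301, 302), true);
}

// e.g. status_is_ok(check_url($url))
```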

Another method would be to use the get_headers() function, but this only works on PHP 5 or later:

function check_url($url) {
    $headers = @get_headers($url);
    $headers = is_array($headers) ? implode("\n", $headers) : $headers;

    // The status line looks like "HTTP/1.1 200 OK"
    return (bool) preg_match('#^HTTP/\S+\s+(200|301|302)\s#i', $headers);
}

In this case, just check the return value:

if (check_url($url))
    echo "Link Works";
else
    echo "Broken Link";

Hope this helps you :).

Check link works and if not visually identify it as broken

As the sites you want to check are created by different people, there is unlikely to be a one-liner to detect if a link is broken or not over a vast number of sites.

I suggest that you create a simple function for each site that detects if the link is broken for that particular site. When you want to check a link, you would decide which function to run on the external site's HTML based on the domain name.

You can use parse_url() to extract the domain/host from the file links:

// Get your URL from the database. Here I'll just set it:
$file_url_from_database = 'http://example.com/link/to/file?var=1&hello=world#file';

$parsed_link = parse_url($file_url_from_database);
$domain = $parsed_link['host']; // $domain now equals 'example.com'
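Note that parse_url() returns false for seriously malformed URLs and omits the 'host' key for relative links, so a small defensive wrapper avoids notices. A sketch (the function name is mine):

```php
<?php
// Sketch: extract the host from a link, or null if there isn't one.
function link_domain($url) {
    $parts = parse_url($url);
    return (is_array($parts) && isset($parts['host'])) ? $parts['host'] : null;
}
```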

You could store the function names in an associative array and call them that way:

function check_domain_com(){ ... }
function check_example_com(){ ... }

$link_checkers = array();
$link_checkers['domain.com'] = 'check_domain_com';
$link_checkers['example.com'] = 'check_example_com';

or store the functions in the array (PHP >=5.3).

$link_checkers = array();
$link_checkers['domain.com'] = function(){ ... };
$link_checkers['example.com'] = function(){ ... };

and call these with

if (isset($link_checkers[$domain]))
    // call the function stored under the index 'example.com'
    call_user_func($link_checkers[$domain]);
else
    throw new Exception("I don't know how to check the domain $domain");

Alternatively you could just use a bunch of if statements

if ($domain == 'domain.com')
    check_domain_com();
else if ($domain == 'example.com')
    check_example_com(); // this function is called

The functions could return a boolean (true or false; 0 or 1) to use, or call another function themselves if needed (for example to add an extra CSS class to broken links).

I did something similar recently, though I was fetching metadata for stock photography from multiple sites. I used an abstract class because I had a few functions to run for each site.
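To illustrate the abstract-class idea, here is a minimal sketch; the class names and the "broken" rule are hypothetical, not taken from any real site:

```php
<?php
// Sketch of the abstract-class approach: one subclass per site,
// each encoding that site's own notion of a broken link.
abstract class SiteLinkChecker {
    // Return true if the fetched HTML indicates a dead link.
    abstract public function isBroken($html);
}

class ExampleComChecker extends SiteLinkChecker {
    public function isBroken($html) {
        // Hypothetical rule: example.com shows this text for dead files.
        return strpos($html, 'File not found') !== false;
    }
}
```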

As a side note, it would be wise to store the last checked date in your database and limit the checking rate to something like 24 or 48 hours (or further apart depending on your needs).
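The rate limit itself is simple arithmetic on the stored timestamp. A sketch, assuming you keep the last-checked time as a Unix timestamp in your database (the function and parameter names are mine):

```php
<?php
// Sketch: is this link due for a re-check? $now is injectable for testing.
function needs_recheck($last_checked, $interval_hours = 24, $now = null) {
    $now = ($now === null) ? time() : $now;
    return ($now - $last_checked) >= $interval_hours * 3600;
}
```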


Edit to clarify implementation a little:

As making HTTP requests to other websites is potentially very slow, you will want to check and update link statuses independently of page loads. You could achieve this as follows:

  • A script could run every 12 hours and check all links from the database that were last checked more than 24 hours ago. For each 'old' link, it would update the active and last_checked columns in your database appropriately.
  • When someone requests a page, your script would read from the active column in your database instead of downloading the remote page to check every time.
  • (extra thought) When a new link is submitted, it is checked immediately in the script, or added to a queue to be checked by the server as soon as possible.

As people can easily click a link to check its current state, it would be redundant to let them click a button on your page to check it (nothing against the idea, though).

Note that the potentially resource-heavy update-all script should not be executable (accessible) via web.
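A common way to enforce that, assuming the script is run from cron via the PHP CLI, is to bail out unless it was started from the command line. A sketch:

```php
<?php
// Sketch: refuse to run the heavy update-all script via the web.
function running_from_cli() {
    return PHP_SAPI === 'cli';
}

if (!running_from_cli()) {
    header('HTTP/1.1 403 Forbidden');
    exit('Run me from cron, not the browser.');
}
```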

How do I check for valid (not dead) links programmatically using PHP?

Use the PHP cURL extension. Unlike fopen(), it can also make HTTP HEAD requests, which are sufficient to check the availability of a URL and save you a ton of bandwidth, as you don't have to download the entire body of the page.

As a starting point you could use some function like this:

function is_available($url, $timeout = 30) {
    $ch = curl_init(); // get cURL handle

    // set cURL options
    $opts = array(
        CURLOPT_RETURNTRANSFER => true,      // do not output to browser
        CURLOPT_URL            => $url,      // set URL
        CURLOPT_NOBODY         => true,      // do a HEAD request only
        CURLOPT_TIMEOUT        => $timeout,  // set timeout
    );
    curl_setopt_array($ch, $opts);

    curl_exec($ch); // do it!

    $retval = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200; // check if HTTP OK

    curl_close($ch); // close handle

    return $retval;
}

However, there's a ton of possible optimizations: You might want to re-use the cURL instance and, if checking more than one URL per host, even re-use the connection.
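Re-using one handle for a whole batch of URLs could look roughly like this. It is only a sketch (the function name is mine, and it does not tune connection re-use per host):

```php
<?php
// Sketch: HEAD-check many URLs with a single cURL handle.
// Returns a map of url => HTTP status code (0 on failure).
function check_urls(array $urls, $timeout = 10) {
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_NOBODY         => true,
        CURLOPT_TIMEOUT        => $timeout,
    ));

    $results = array();
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url); // only the URL changes per request
        curl_exec($ch);
        $results[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    }

    curl_close($ch);
    return $results;
}
```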

Oh, and the is_available() function above checks strictly for HTTP response code 200. It does not follow redirects (302), but there is also a cURL option for that (CURLOPT_FOLLOWLOCATION).
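For example, a variant that counts a redirect chain ending in a 200 as available could look like this. CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS are real cURL options; the function name is mine:

```php
<?php
// Sketch: HEAD check that follows up to 5 redirects, so a 301/302
// that eventually lands on a 200 counts as available.
function is_available_following_redirects($url, $timeout = 30) {
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_URL            => $url,
        CURLOPT_NOBODY         => true,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_FOLLOWLOCATION => true, // follow Location: headers
        CURLOPT_MAXREDIRS      => 5,    // but give up after 5 hops
    ));
    curl_exec($ch);
    $ok = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200;
    curl_close($ch);
    return $ok;
}
```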

How to check if link is downloadable file in php?

Using cURL for a remote file:

function checkRemoteFile($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // don't download content
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    $result = curl_exec($ch);
    curl_close($ch); // free the handle

    return $result !== false;
}

EDIT: I may have misunderstood you, but if you just want to check whether the URL actually exists, the code below is all you need.

function url_exists($url) {
    // Read only the first byte so the whole file is not downloaded
    return @file_get_contents($url, false, null, 0, 1) !== false;
}
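If you'd rather stick with file_get_contents(), PHP's HTTP stream wrapper can send a HEAD request so no body is downloaded at all. A sketch (the function name is mine; $http_response_header is the variable PHP populates after an HTTP-wrapper call):

```php
<?php
// Sketch: check a URL with a HEAD request via the HTTP stream wrapper.
function url_exists_head($url) {
    $context = stream_context_create(array(
        'http' => array(
            'method'        => 'HEAD',
            'ignore_errors' => true, // fetch headers even on 4xx/5xx
        ),
    ));
    @file_get_contents($url, false, $context);

    // PHP sets $http_response_header after an HTTP-wrapper request;
    // its first entry is the status line, e.g. "HTTP/1.1 200 OK".
    return isset($http_response_header[0])
        && preg_match('#^HTTP/\S+\s+200#', $http_response_header[0]) === 1;
}
```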

Check if a link is working and hide/disable if it's broken using HTML/JavaScript

Since your application backend is Django, I would recommend using Python to check whether an HTTP-based URL is available or not.

As @T. J. Crowder suggested, keep track of the links/questions and implement a method checkUrlAvailable(url) that takes a URL as its argument.

I've found the documentation on built-in template tags and filters, which shows how to apply filters in a Django template.

All in all, we can combine everything like the following:

<td width="100%"><center>
  {% if server|checkUrlAvailable %}
    <a href="//{{ server.ServerName }}.ilo.lab.dba.co.il"> http://{{ server.ServerName }}.ilo.lab.dba.co.il </a>
  {% else %}
    http://{{ server.ServerName }}.ilo.lab.dba.co.il
  {% endif %}
</center></td>

The problem I am having is checking a non-existing server/URL with Python. I will update this answer once it is resolved.

To write checkUrlAvailable for use in a Django template, take a look at the documentation. The idea is to build a customised filter that checks the existence of a given URL. The pseudo-code could look like the following, but I urge you to dive into these answers:

import requests

def checkUrlAvailable(url):
    resp = requests.head(url)
    if resp.status_code == 200:
        return True
    else:
        return False

I just borrowed the code snippet from a Stack Overflow answer.

(to be updated soon)

How to find broken links on a website

For Chrome, there is the Hexometer extension.

See LinkChecker for Firefox.

For Mac OS there is a tool called Integrity which can check URLs for broken links.

For Windows there is Xenu's Link Sleuth.

PHP-JSON : Check broken links

OK, after eight hours of tests and trials I finally got this working. Thanks a lot to Vytautas; he taught me a lot, mostly how to debug.

For anyone who wants to check broken links using JSON + PHP + CURL :

  1. First of all, check that cURL is installed and enabled on your server.
  2. For those who don't understand cURL: if there is a response from your URL, there will be a status code (like 200 or 404). If the URL entered is blank, invalid or anything like that, it will return status code 0.
  3. If you can't get the proper response from the PHP page, use Firebug (Console tab) to check the headers and responses. Also use breakpoints to see if variables are being passed correctly.

Here is the PHP code:

function url_exists($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);

    if (curl_exec($ch) === false) // these 2 lines are here for debugging
        die('Curl error: ' . curl_error($ch));

    $retcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $retcode;
}

$response = array(
    'status' => url_exists($_GET['location'])
);
echo json_encode($response);

I was doing two things wrong in PHP: I should have used $_GET['location'] instead of $location, and I should have echoed $response instead of using a second variable.

And the js function :

function UrlExistsNew(url, callback) {
    $.getJSON('checklink.php', { location: url }, function (data) {
        callback.call(null, data.status);
    });
}

Another thing I was doing wrong in the JS was how I passed the callback to the function: I should have used callback.call instead of callback.apply.

Simple usage :

UrlExistsNew($(this).val(), function (status) {
    if (status === 404) $(element).css('background-color', '#FC0');
});

