How to detect search engine bots with PHP?
Here's a Search Engine Directory of Spider names.
Then you use $_SERVER['HTTP_USER_AGENT'] to check whether the agent is one of those spiders.
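// The header can be missing, so guard with isset(); stripos() is case-insensitive.
if (isset($_SERVER['HTTP_USER_AGENT']) && stripos($_SERVER['HTTP_USER_AGENT'], 'googlebot') !== false)
{
    // what to do
}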
How can one detect a crawler / spider using PHP?
According to Verifying Googlebot:
You can verify that a bot accessing your server really is Googlebot (or another Google user-agent) by using a reverse DNS lookup, verifying that the name is in the googlebot.com domain, and then doing a forward DNS lookup using that googlebot name. This is useful if you're concerned that spammers or other troublemakers are accessing your site while claiming to be Googlebot.
For example:
host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.
host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1
Google doesn't post a public list of IP addresses for webmasters to whitelist. This is because these IP address ranges can change, causing problems for any webmasters who have hard coded them. The best way to identify accesses by Googlebot is to use the user-agent (Googlebot).
You can do a reverse DNS lookup:
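function validateGoogleBotIP($ip) {
    // Reverse lookup only, e.g. "crawl-66-249-66-1.googlebot.com".
    // gethostbyaddr() returns the IP itself when no name is found, which will not match.
    $hostname = gethostbyaddr($ip);
    return preg_match('/\.google(bot)?\.com$/i', (string) $hostname) === 1;
}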
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (strpos($userAgent, 'Google') !== false) {
    if (validateGoogleBotIP($_SERVER['REMOTE_ADDR'])) {
        echo 'It is ACTUALLY google';
    } else {
        echo 'Someone\'s faking it!';
    }
} else {
    echo 'Nothing to do with Google';
}
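The quoted Google guidance also asks for a forward lookup on the returned name, which the function above skips. Here is a minimal sketch of the full round trip. The function name isVerifiedGooglebot and the two optional callables are my own additions (they are only there so the logic can be tried without touching DNS), and gethostbynamel() only returns IPv4 addresses, so this sketch assumes an IPv4 client.

```php
<?php
/**
 * Forward-confirmed reverse DNS check for a client claiming to be Googlebot.
 * $reverse and $forward are hypothetical injection points for testing;
 * by default PHP's own gethostbyaddr() and gethostbynamel() are used.
 */
function isVerifiedGooglebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP
    // when no name is found and false on malformed input.
    $hostname = $reverse($ip);
    if (!is_string($hostname) || $hostname === $ip) {
        return false;
    }

    // Step 2: the name must end in .googlebot.com or .google.com.
    $hostname = rtrim($hostname, '.');
    if (preg_match('/\.google(bot)?\.com$/i', $hostname) !== 1) {
        return false;
    }

    // Step 3: forward lookup. gethostbynamel() returns a list of IPv4
    // addresses, or false on failure. The original IP must be in that list.
    $addresses = $forward($hostname);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

In a real request you would call it as isVerifiedGooglebot($_SERVER['REMOTE_ADDR']). DNS lookups are slow, so it is probably worth caching the result per IP.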
How to recognize bots with PHP?
You should filter by user-agent strings. The database at http://www.robotstxt.org/db.html lists about 300 common user-agents sent by bots. Running through that list and ignoring bot user-agents before you run your SQL statement should solve your problem for all practical purposes.
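As a rough sketch of that idea: check the user-agent against a list of signatures and only run the database write when nothing matches. The function names, the $recordVisit callback (standing in for whatever executes your SQL statement), and the short signature list in the usage comment are all made up for illustration; a real list would be built from the database linked above.

```php
<?php
/**
 * Returns true when the user-agent contains any of the given signatures.
 * Matching is case-insensitive. An empty or missing user-agent never matches.
 */
function matchesBotSignature(?string $userAgent, array $signatures): bool
{
    if ($userAgent === null || $userAgent === '') {
        return false;
    }
    foreach ($signatures as $signature) {
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

/**
 * Calls $recordVisit (a hypothetical stand-in for the SQL statement)
 * unless the user-agent looks like a bot. Returns true if it was called.
 */
function recordVisitUnlessBot(?string $userAgent, array $signatures, callable $recordVisit): bool
{
    if (matchesBotSignature($userAgent, $signatures)) {
        return false; // bot: skip the write
    }
    $recordVisit();
    return true;
}

// Possible usage (signature list is illustrative, not complete):
// $signatures = ['googlebot', 'baiduspider', 'slurp'];
// recordVisitUnlessBot($_SERVER['HTTP_USER_AGENT'] ?? null, $signatures, function () {
//     /* run the INSERT or UPDATE here */
// });
```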
If you don't want the search engines to even reach the page, use a basic robots.txt file to block them.
How to detect search engine visits on my site, like phpBB does?
You can go by either IP addresses or the 'User-Agent' string that the bot or web browser sends you.
When Googlebot (or most other well-behaved robots) visits your website, it sends a User-Agent header identifying itself, which PHP exposes as $_SERVER['HTTP_USER_AGENT']. Some examples are:
Googlebot/2.1 (+http://www.google.com/bot.html)
NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html
Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/531.4 (KHTML, like Gecko)
You can find many more examples in online directories of user-agent strings.
You could then use PHP to examine those user-agent strings and determine if the user is a search engine or not. I use something like this often:
$searchengines = array(
'Googlebot',
'Slurp',
'search.msn.com',
'nutch',
'simpy',
'bot',
'ASPSeek',
'crawler',
'msnbot',
'Libwww-perl',
'FAST',
'Baidu',
);
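// Read the header once; it may be missing entirely.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$is_se = false;
foreach ($searchengines as $searchengine) {
    // stripos() is case-insensitive, so no strtolower() calls are needed.
    if ($userAgent !== '' && stripos($userAgent, $searchengine) !== false) {
        $is_se = true;
        break;
    }
}
if ($is_se) { print('It\'s a search engine!'); }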
Remember that no detection method (Google Analytics, another statistics package, or otherwise) is going to be 100% accurate. Some web browsers allow you to set a custom user-agent string, and some misbehaving web crawlers may not send a user-agent string at all. This method is probably effective for 95%+ of crawlers/visitors, though.
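Since some crawlers send no user-agent at all, it can help to keep that case separate instead of lumping it in with normal visitors. A small sketch, assuming a three-way result is useful to you. The function name classifyUserAgent and the labels 'unknown', 'crawler', and 'visitor' are invented for this example.

```php
<?php
/**
 * Classifies a request by its user-agent string.
 * Returns 'unknown' for a missing or blank user-agent, 'crawler' when any
 * signature matches (case-insensitive), and 'visitor' otherwise.
 */
function classifyUserAgent(?string $userAgent, array $signatures): string
{
    $userAgent = trim((string) $userAgent);
    if ($userAgent === '') {
        // No header at all: cannot tell, so do not guess either way.
        return 'unknown';
    }
    foreach ($signatures as $signature) {
        if (stripos($userAgent, $signature) !== false) {
            return 'crawler';
        }
    }
    return 'visitor';
}

// Possible usage:
// $kind = classifyUserAgent($_SERVER['HTTP_USER_AGENT'] ?? null, ['bot', 'crawler', 'slurp']);
```

A spoofed user-agent will still get through as 'visitor', so this only narrows the gap rather than closing it.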
Detect if a page is visited by a bot
Well, after some digging on Google I found this.
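// $bots is a caller-supplied array of name => user-agent substring.
function isBot(array $bots)
{
    // stripos() is case-insensitive, so the agent does not need lower-casing.
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    foreach ($bots as $name => $bot)
    {
        if (stripos($agent, $bot) !== false)
        {
            return true;
        }
    }
    // Only report "not a bot" after every entry has been checked.
    return false;
}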
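The loop iterates over name => signature pairs, so one natural variation is to return which bot matched instead of just true or false. This is only a sketch: detectBotName and the $exampleBots list are hypothetical, and the signatures are a tiny sample rather than a real database.

```php
<?php
/**
 * Returns the display name of the first bot whose signature appears in the
 * user-agent (case-insensitive), or null when nothing matches.
 * $bots maps a display name to a user-agent substring.
 */
function detectBotName(?string $userAgent, array $bots): ?string
{
    if ($userAgent === null || $userAgent === '') {
        return null;
    }
    foreach ($bots as $name => $signature) {
        if (stripos($userAgent, $signature) !== false) {
            return $name;
        }
    }
    // Every entry was checked and none matched.
    return null;
}

// Illustrative list only.
$exampleBots = [
    'Google' => 'googlebot',
    'Baidu'  => 'baiduspider',
    'Bing'   => 'bingbot',
];

// Possible usage:
// $botName = detectBotName($_SERVER['HTTP_USER_AGENT'] ?? null, $exampleBots);
```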
Thanks for the support Dale!!