Python Check If Website Exists


You can use a HEAD request instead of GET. It downloads only the headers, not the body, and you can then check the response status.

For Python 2.7.x, you can use httplib:

import httplib

c = httplib.HTTPConnection('www.example.com')
c.request('HEAD', '/')
if c.getresponse().status == 200:
    print('web site exists')
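
On Python 3, httplib was renamed to http.client, so a rough equivalent (the 5-second timeout is an arbitrary choice) would be:

```python
import http.client


def head_status(host, path='/'):
    """Issue a HEAD request and return the integer status code."""
    conn = http.client.HTTPSConnection(host, timeout=5)
    try:
        conn.request('HEAD', path)
        return conn.getresponse().status
    finally:
        conn.close()

# head_status('www.example.com') should be 200 when the site is up
```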

or urllib2:

import urllib2

try:
    urllib2.urlopen('http://www.example.com/some_page')
except urllib2.HTTPError as e:
    print(e.code)
except urllib2.URLError as e:
    print(e.args)
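
On Python 3, urllib2 was split into urllib.request and urllib.error, so the same idea looks like this (the timeout value is an arbitrary addition):

```python
import urllib.error
import urllib.request

try:
    urllib.request.urlopen('http://www.example.com/some_page', timeout=5)
except urllib.error.HTTPError as e:
    print(e.code)    # e.g. 404 when the page does not exist
except urllib.error.URLError as e:
    print(e.reason)  # e.g. a DNS failure or refused connection
```

Note that HTTPError is a subclass of URLError, so it must be caught first.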

or, for 2.7 and 3.x, you can install requests:

import requests

response = requests.get('http://www.example.com')
if response.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist')
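
Note that requests follows redirects and can hang on slow hosts, so a slightly more defensive sketch (the timeout value and the helper name are arbitrary choices) might look like:

```python
import requests


def site_exists(url, timeout=5):
    """Return True if the URL answers with a non-error status code."""
    try:
        response = requests.head(url, timeout=timeout, allow_redirects=True)
        return response.status_code < 400
    except requests.exceptions.RequestException:
        # Covers DNS failures, refused connections, timeouts, etc.
        return False
```

Some servers reject HEAD requests outright; falling back to requests.get on a 405 is a common workaround.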

Check if a Website Exists With Requests Isn't Working

You can use try/except like this:

import requests
from requests.exceptions import ConnectionError

try:
    requests.get('http://www.example.com')
except ConnectionError:
    print('Web site does not exist')
else:
    print('Web site exists')

Python check if website exists for a list of websites

Try combining xrange and the string zfill method in a loop.

import requests


def test_for_200(url):
    req = requests.get(url)
    return req.status_code == 200


def numbers():
    for n in xrange(100000):  # use range() on Python 3
        yield str(n).zfill(5)


results = {}
for num in numbers():
    url = "http://{}.com".format(num)
    results[num] = test_for_200(url)

results will look something like this:

>>> results
{'00000': True, '00001': False, ...}

Check whether url exists or not without downloading the content using python

Something like the below. See the HTTP HEAD method for more info.

import requests

urls = ['https://www.google.com', 'https://www.google.com/you_can_not_find_me']
for idx, url in enumerate(urls, 1):
    r = requests.head(url)
    if r.status_code == 200:
        print(f'{idx}) {url} was found')
    else:
        print(f'{idx}) {url} was NOT found')

Output:

1) https://www.google.com was found
2) https://www.google.com/you_can_not_find_me was NOT found

Checking if a website exist with python3

I'll throw in some ideas to get you started; whole careers are built around spidering :) By the way, http://www.pastaia.co seems to just be down, and that's a big part of the trick: handling the unexpected when crawling the web. Ready? Here we go...

import requests

filepath = 'url.txt'
with open(filepath) as fp:
    for url in fp:
        url = url.strip()  # drop the trailing newline read from the file
        print(url)
        try:
            request = requests.get(url)
            if request.status_code == 200:
                print('Web site exists')
        except requests.exceptions.RequestException:
            print('Web site does not exist')
  • make it a for loop; you just want to loop over the whole file, right?
  • wrap the request in try/except, so that if it blows up for whatever reason (bad DNS, a non-200 response, perhaps a .pdf page; the web is the wild wild west) the code won't crash. You can move on to the next site in the list and record the error however you'd like.
  • you can add other kinds of conditions too; perhaps the page needs to be a certain length? A 200 response code doesn't always mean the page is valid, just that the site returned success, but it's a good place to start.
  • consider adding a user-agent to your request; you may want to mimic a browser, or perhaps have your program identify itself as super bot 9000
  • if you want to get further into spidering and parsing of the text, look at using Beautiful Soup: https://www.crummy.com/software/BeautifulSoup/
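
Putting those suggestions together, a version of the loop might look like the sketch below; the User-Agent string and the minimum body length of 100 characters are arbitrary examples, not values from the question:

```python
import requests

HEADERS = {'User-Agent': 'super bot 9000'}  # arbitrary; identify your crawler


def looks_valid(response, min_length=100):
    """A 200 status alone isn't proof of a real page; also require a minimal body."""
    return response.status_code == 200 and len(response.text) >= min_length


def check_sites(filepath='url.txt'):
    with open(filepath) as fp:
        for line in fp:
            url = line.strip()
            if not url:
                continue  # skip blank lines in the file
            try:
                response = requests.get(url, headers=HEADERS, timeout=5)
                print(url, 'exists' if looks_valid(response) else 'looks suspect')
            except requests.exceptions.RequestException as exc:
                # Bad DNS, timeouts, refused connections: record and move on.
                print(url, 'failed:', exc)
```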

Check in Python if URL exists

I think the difference between the browser and your Python code is the underlying HTTP request.
The Python code may not work because the HTTP request it constructs does not exactly match the one generated by the browser.

Add custom headers (using the ones you provided):

print(requests.get(url, headers=headers).status_code)

It works on my side for the url http://www.rajivbajaj.net/, returning 200.

In this example, I guess the web site does something special for certain user agents.
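
For illustration, a minimal sketch of sending a browser-like User-Agent; the header string below is just an example, not the one from the question:

```python
import requests

# A browser-like User-Agent; some sites refuse requests' default one.
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/120.0 Safari/537.36'),
}


def status_with_headers(url):
    """Fetch the URL with the custom headers and return the status code."""
    return requests.get(url, headers=headers, timeout=5).status_code
```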


