Python check if website exists
You can use HEAD request instead of GET. It will only download the header, but not the content. Then you can check the response status from the headers.
For Python 2.7.x, you can use httplib:
import httplib
c = httplib.HTTPConnection('www.example.com')
c.request("HEAD", '/')
if c.getresponse().status == 200:
    print('web site exists')
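On Python 3, httplib was renamed to http.client. A minimal sketch of the same HEAD check (the head_status helper name and the 5-second timeout are my own choices):

```python
import http.client

def head_status(host, path='/'):
    """Send a HEAD request and return the HTTP status code."""
    conn = http.client.HTTPConnection(host, timeout=5)
    try:
        conn.request('HEAD', path)
        return conn.getresponse().status
    finally:
        conn.close()

# head_status('www.example.com') == 200 would mean the site exists
```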
or urllib2:
import urllib2
try:
    urllib2.urlopen('http://www.example.com/some_page')
except urllib2.HTTPError, e:
    print(e.code)
except urllib2.URLError, e:
    print(e.args)
or, for 2.7 and 3.x, you can install requests:
import requests
response = requests.get('http://www.example.com')
if response.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist')
Check if a Website Exists With Requests Isn't Working
You can use try/except like this:
import requests
from requests.exceptions import ConnectionError

try:
    request = requests.get('http://www.example.com')
except ConnectionError:
    print('Web site does not exist')
else:
    print('Web site exists')
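Catching only ConnectionError misses timeouts and other failure modes. A sketch of a broader check (the website_exists name, the 5-second timeout, and treating any status below 400 as "exists" are my own choices, not from the answer above):

```python
import requests

def website_exists(url, timeout=5):
    """Return True if the URL responds with a non-error status."""
    try:
        response = requests.get(url, timeout=timeout)
        return response.status_code < 400
    except requests.exceptions.RequestException:
        # base class covering ConnectionError, Timeout, TooManyRedirects, ...
        return False
```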
Python check if website exists for a list of websites
Try combining xrange and the string zfill method in a loop.
import requests

def test_for_200(url):
    req = requests.get(url)
    return req.status_code == 200

def numbers():
    for n in xrange(100000):
        yield str(n).zfill(5)

results = {}
for num in numbers():
    url = "http://{}.com".format(num)
    results[num] = test_for_200(url)

results will look something like this:
>>> results
{'00000': True, '00001': False, ...}
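Note that xrange exists only on Python 2; on Python 3 the same generator can be written with range (a sketch of just the number-padding part):

```python
def numbers():
    """Yield '00000' through '99999' as zero-padded strings."""
    for n in range(100000):
        yield str(n).zfill(5)  # pad to 5 digits: 0 -> '00000'
```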
Check whether url exists or not without downloading the content using python
Something like the below. See HTTP HEAD for more info.
import requests

urls = ['https://www.google.com', 'https://www.google.com/you_can_not_find_me']
for idx, url in enumerate(urls, 1):
    r = requests.head(url)
    if r.status_code == 200:
        print(f'{idx}) {url} was found')
    else:
        print(f'{idx}) {url} was NOT found')
Output:
1) https://www.google.com was found
2) https://www.google.com/you_can_not_find_me was NOT found
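One caveat: some servers reject HEAD outright (typically with 405 Method Not Allowed) even though the page exists, so a GET fallback can help. A sketch with a hypothetical check_url helper (the name, timeout, and fallback policy are my own assumptions):

```python
import requests

def check_url(url, timeout=5):
    """Try HEAD first; fall back to GET if the server rejects HEAD."""
    try:
        r = requests.head(url, allow_redirects=True, timeout=timeout)
        if r.status_code == 405:  # Method Not Allowed: server dislikes HEAD
            r = requests.get(url, timeout=timeout)
        return r.status_code == 200
    except requests.exceptions.RequestException:
        return False
```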
Checking if a website exist with python3
I'll throw in ideas to get you started, whole careers are built around spidering :) btw, http://www.pastaia.co seems to just be down. And that's a big part of the trick, how to handle the unexpected when crawling the web. Ready? Here we go...
import requests

filepath = 'url.txt'
with open(filepath) as fp:
    for url in fp:
        url = url.strip()  # drop the trailing newline each file line carries
        print(url)
        try:
            request = requests.get(url)  # here is where I'm getting the error
            if request.status_code == 200:
                print('Web site exists')
        except:
            print('Web site does not exist')
- make it a for loop, you just want to loop the whole file right?
- do a try and except, that way if it blows up for whatever reason (of which there can be lots, like bad DNS, non-200 returned, perhaps it's a .pdf page, the web is the wild wild west) the code won't crash and you can check the next site in the list and just record the error however you'd like
- you can add other kinds of conditions in there too, perhaps the page needs to be a certain length? And just because it's a response code 200 doesn't always mean the page is valid, just that the site returned success, but it's a good place to start
- consider adding a user-agent to your request, you may want to mimic a browser, or perhaps have your program identify itself as super bot 9000
- if you want to get further into spidering and parsing of the text, look at using beautifulsoup: https://www.crummy.com/software/BeautifulSoup/
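The user-agent suggestion above can be sketched like this (the header value is just the example name from the list; by default requests identifies itself as python-requests):

```python
import requests

# identify the crawler explicitly; 'super bot 9000' is just the example
# name from the suggestion above -- any descriptive string works
headers = {'User-Agent': 'super bot 9000'}

def fetch(url, timeout=5):
    return requests.get(url, headers=headers, timeout=timeout)
```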
Check in Python if URL exists
I think the difference between the browser and the Python code is the underlying HTTP request.
The Python code does not work because the HTTP request it constructs is not exactly like the one generated by the browser.
Add custom headers (using the ones you provided):
print(requests.get(url, headers=headers).status_code)
It works on my side for the URL http://www.rajivbajaj.net/, returning 200.
In this example, I guess the web site has done something special for some user-agents.