How can I open a website with urllib via proxy in Python?
By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:
$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py # Using http://myproxy.example.com:1234 as a proxy
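In Python 3 you can exercise the same mechanism from inside a script: set the variable before urllib reads it, and `urllib.request.getproxies()` will report what was picked up (on Unix it reads the environment; on Windows and macOS it may also consult system settings). A minimal sketch, reusing the proxy address above:

```python
import os
import urllib.request

# Set the proxy before urllib inspects the environment
os.environ['http_proxy'] = 'http://myproxy.example.com:1234'

# getproxies() reports the proxy mapping that urlopen will use
print(urllib.request.getproxies().get('http'))
```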
If you instead want to specify a proxy inside your application, you can pass a proxies argument to urlopen (Python 2's urllib):
import urllib  # Python 2 only; urlopen's proxies argument does not exist in Python 3

proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)
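In Python 3, where urlopen no longer accepts a proxies argument, the closest equivalent is a ProxyHandler installed into an opener. A sketch, reusing the placeholder proxy address from above (the actual network call is commented out):

```python
import urllib.request

proxies = {'http': 'http://myproxy.example.com:1234'}
proxy_handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(proxy_handler)

# opener.open("http://www.google.com") would now route through the proxy
```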
Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?
import time
import urllib  # Python 2

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print("Trying HTTP proxy %s" % proxy)
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print("Got URL using proxy %s" % proxy)
        break
    except IOError:  # a bare except would also swallow KeyboardInterrupt
        print("Trying next proxy in 5 seconds")
        time.sleep(5)
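The same fallback loop can be sketched for Python 3, where each candidate gets its own opener; the hostnames are the same placeholders as above, and a real run would need network access:

```python
import time
import urllib.request
import urllib.error

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']

def try_proxies(url, proxies, timeout=10):
    """Return the response from the first working proxy, or None."""
    for proxy in proxies:
        print("Trying HTTP proxy %s" % proxy)
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({'http': proxy}))
        try:
            return opener.open(url, timeout=timeout)
        except urllib.error.URLError:
            print("Trying next proxy in 5 seconds")
            time.sleep(5)
    return None
```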
Setting proxy to urllib.request (Python3)
You should be calling set_proxy() on an instance of class Request, not on the class itself:
from urllib import request as urlrequest
proxy_host = 'localhost:1234' # host and port of your proxy
url = 'http://www.httpbin.org/ip'
req = urlrequest.Request(url)
req.set_proxy(proxy_host, 'http')
response = urlrequest.urlopen(req)
print(response.read().decode('utf8'))
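Note that set_proxy() mutates only that one Request: it rewrites the request's host to the proxy address while keeping the original URL for the proxy to forward, so other requests are unaffected. This can be checked without any network access:

```python
from urllib import request as urlrequest

req = urlrequest.Request('http://www.httpbin.org/ip')
req.set_proxy('localhost:1234', 'http')

# The request is now addressed to the proxy...
print(req.host)      # localhost:1234
# ...while the original URL is preserved for the proxy to forward
print(req.full_url)  # http://www.httpbin.org/ip
```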
Unable to access website with urllib and proxy
Should you be using http as the protocol, not socks? Thus:
proxyhand = urllib.request.ProxyHandler({"http" : "http://localhost:5678"})
urllib.request.urlretrieve with proxy?
You need to use your proxy object, not just instantiate it (you created an object but never assigned it to a variable, so you can't use it). Try this pattern:
import urllib.request

# create the ProxyHandler and assign it to a variable
proxy = urllib.request.ProxyHandler({'http': '127.0.0.1'})
# construct a new opener using your proxy settings
opener = urllib.request.build_opener(proxy)
# install the opener at the module level
urllib.request.install_opener(opener)
# make a request
urllib.request.urlretrieve('http://www.google.com')
Or, if you do not need to stick to the standard library, use requests (this code is from the official documentation):
import requests
proxies = {"http": "http://10.10.1.10:3128",
           "https": "http://10.10.1.10:1080"}
requests.get("http://example.org", proxies=proxies)
Unable to use https proxy within urllib.request
While we were testing the proxies, there was unusual traffic from your computer network to Google services, and that caused the error response, because whatismyipaddress relies on Google's services. The issue did not affect other sites such as Stack Overflow.
from urllib import request
from bs4 import BeautifulSoup
url = 'https://whatismyipaddress.com/proxy-check'
proxies = {
    # 'https': 'https://167.172.229.86:8080',
    # 'https': 'https://51.91.137.248:3128',
    'https': 'https://118.70.144.77:3128',
}
user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
headers = {
    'User-Agent': user_agent,
    'accept-language': 'ru,en-US;q=0.9,en;q=0.8,tr;q=0.7'
}
proxy_support = request.ProxyHandler(proxies)
opener = request.build_opener(proxy_support)
# opener.addheaders = [('User-Agent', user_agent)]
request.install_opener(opener)
req = request.Request(url, headers=headers)
try:
    response = request.urlopen(req).read()
    soup = BeautifulSoup(response, "html5lib")
    ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
    print(ip_addr)
except Exception as e:
    print(e)
Using urllib.request returns Proxy Auto-Config file
import urllib.request

req = urllib.request.Request('http://www.espncricinfo.com/', data=None, headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
})
proxy_support = urllib.request.ProxyHandler({'http': 'ip:port'})
opener = urllib.request.build_opener(proxy_support)
# make opener object the global default opener
urllib.request.install_opener(opener)
f = urllib.request.urlopen(req)
g = open('writing.txt', 'w')
g.write(f.read().decode('utf-8'))
g.close()  # note the parentheses: g.close alone does nothing
Scraping web-page data with urllib with headers and proxy
From the documentation:
urllib will auto-detect your proxy settings and use those. This is through the ProxyHandler, which is part of the normal handler chain when a proxy setting is detected. Normally that's a good thing, but there are occasions when it may not be helpful. One way to do this is to set up our own ProxyHandler with no proxies defined. This is done using similar steps to setting up a Basic Authentication handler.
See https://docs.python.org/3/howto/urllib2.html#proxies
Proxy with urllib2
proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')