How to Open a Website with Urllib via Proxy in Python

How can I open a website with urllib via proxy in Python?

By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:

$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py # Using http://myproxy.example.com:1234 as a proxy
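
If you want to check what urllib has picked up from the environment, urllib.request.getproxies() (urllib.getproxies() on Python 2) returns the detected proxies as a dictionary:

import urllib.request

# Prints e.g. {'http': 'http://myproxy.example.com:1234'} when http_proxy is set
print(urllib.request.getproxies())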

If you instead want to specify a proxy inside your application, you can give a proxies argument to urlopen:

import urllib

proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)
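
Note that this proxies argument only exists on Python 2's urllib.urlopen(); on Python 3 the equivalent is to build an opener around a ProxyHandler, roughly like this:

import urllib.request

proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
response = opener.open("http://www.google.com")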

Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?

import time
import urllib

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print("Trying HTTP proxy %s" % proxy)
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print("Got URL using proxy %s" % proxy)
        break
    except IOError:
        print("Trying next proxy in 5 seconds")
        time.sleep(5)

Setting a proxy for urllib.request (Python 3)

You should be calling set_proxy() on an instance of class Request, not on the class itself:

from urllib import request as urlrequest

proxy_host = 'localhost:1234' # host and port of your proxy
url = 'http://www.httpbin.org/ip'

req = urlrequest.Request(url)
req.set_proxy(proxy_host, 'http')

response = urlrequest.urlopen(req)
print(response.read().decode('utf8'))
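
The second argument to set_proxy() is the scheme the proxy should be used for, so for an https:// target URL you would pass 'https' instead of 'http'.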

Unable to access website with urllib and proxy

Should you be using http as the protocol rather than socks? That is:

proxyhand = urllib.request.ProxyHandler({"http" : "http://localhost:5678"})
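
For completeness, here is a minimal sketch of wiring that handler into an opener (assuming a proxy really is listening on localhost:5678):

import urllib.request

proxyhand = urllib.request.ProxyHandler({"http": "http://localhost:5678"})
opener = urllib.request.build_opener(proxyhand)
urllib.request.install_opener(opener)
# every urlopen() call for an http:// URL now goes through the proxy
response = urllib.request.urlopen("http://www.httpbin.org/ip")
print(response.read().decode('utf-8'))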

urllib.request.urlretrieve with proxy?

You need to actually use your proxy object, not just instantiate it: you created the handler, but never assigned it to a variable or hooked it into an opener, so it was never used. Try this pattern:

import urllib.request

# create the handler and assign it to a variable
proxy = urllib.request.ProxyHandler({'http': '127.0.0.1'})
# construct a new opener using your proxy settings
opener = urllib.request.build_opener(proxy)
# install the opener at the module level
urllib.request.install_opener(opener)
# make a request
urllib.request.urlretrieve('http://www.google.com')
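
urlretrieve() downloads to a temporary file unless you pass a target filename; it returns the local path together with the response headers:

filename, headers = urllib.request.urlretrieve('http://www.google.com', 'google.html')
print(filename)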

Or, if you are not restricted to the standard library, use requests (this code is from its official documentation):

import requests

proxies = {"http": "http://10.10.1.10:3128",
           "https": "http://10.10.1.10:1080"}

requests.get("http://example.org", proxies=proxies)
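
Like urllib, requests will also pick up proxies from the http_proxy/https_proxy environment variables automatically if you do not pass a proxies dictionary.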

Unable to use https proxy within urllib.request

While we were testing the proxies, Google's services reported "unusual traffic from your computer network", and that was the reason for the error response, because whatismyipaddress.com relies on Google's services. The issue did not affect other sites such as stackoverflow.com.

from urllib import request
from bs4 import BeautifulSoup

url = 'https://whatismyipaddress.com/proxy-check'

proxies = {
    # 'https': 'https://167.172.229.86:8080',
    # 'https': 'https://51.91.137.248:3128',
    'https': 'https://118.70.144.77:3128',
}

user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
headers = {
    'User-Agent': user_agent,
    'accept-language': 'ru,en-US;q=0.9,en;q=0.8,tr;q=0.7'
}

proxy_support = request.ProxyHandler(proxies)
opener = request.build_opener(proxy_support)
# opener.addheaders = [('User-Agent', user_agent)]
request.install_opener(opener)

req = request.Request(url, headers=headers)
try:
    response = request.urlopen(req).read()
    soup = BeautifulSoup(response, "html5lib")
    ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
    print(ip_addr)
except Exception as e:
    print(e)
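
If whatismyipaddress.com keeps rejecting proxied requests, a simpler way to confirm the proxy is in use is the httpbin endpoint mentioned earlier in this article, which returns the requesting IP as JSON and needs no HTML parsing (a sketch, assuming the opener above is already installed and the target URL is https so the 'https' proxy entry applies):

import json
from urllib import request

with request.urlopen('https://httpbin.org/ip') as resp:
    print(json.loads(resp.read().decode('utf-8'))['origin'])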

Using urllib.request returns Proxy Auto-Config file

import urllib.request

req = urllib.request.Request('http://www.espncricinfo.com/', data=None, headers={
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
})

proxy_support = urllib.request.ProxyHandler({'http': 'ip:port'})
opener = urllib.request.build_opener(proxy_support)
# make opener object the global default opener.
urllib.request.install_opener(opener)

f = urllib.request.urlopen(req)

g = open('writing.txt','w')
g.write(f.read().decode('utf-8'))
g.close()
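
The same download and write can also be expressed with context managers, so both the response and the file are closed even if something goes wrong:

with urllib.request.urlopen(req) as f, open('writing.txt', 'w', encoding='utf-8') as g:
    g.write(f.read().decode('utf-8'))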

Scraping web-page data with urllib with headers and proxy

From the documentation

urllib will auto-detect your proxy settings and use those. This is done through the ProxyHandler, which is part of the normal handler chain when a proxy setting is detected. Normally that's a good thing, but there are occasions when it may not be helpful. One way to avoid this is to set up our own ProxyHandler, with no proxies defined. This is done using similar steps to setting up a Basic Authentication handler.

See https://docs.python.org/3/howto/urllib2.html#proxies for details.
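
In other words, if urllib keeps picking up a proxy auto-config file or other system proxy settings you do not want, install a ProxyHandler built from an empty dictionary, which disables proxy detection entirely:

import urllib.request

# an empty ProxyHandler overrides the auto-detected system/PAC proxy settings
proxy_support = urllib.request.ProxyHandler({})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)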

Proxy with urllib2

import urllib2

proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')
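
urllib2 only exists on Python 2; on Python 3 the same pattern works with urllib.request:

import urllib.request

proxy = urllib.request.ProxyHandler({'http': '127.0.0.1'})
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.google.com')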

