Source interface with Python and urllib2
Unfortunately, the stack of standard library modules in use (urllib2, httplib, socket) is somewhat badly designed for the purpose. At the key point in the operation, HTTPConnection.connect (in httplib) delegates to socket.create_connection, which in turn gives you no "hook" whatsoever between the creation of the socket instance sock and the sock.connect call, so there is nowhere to insert the sock.bind that you need just before sock.connect to set the source IP. (I'm evangelizing widely for NOT designing abstractions in such an airtight, excessively encapsulated way, and I'll be speaking about that at OSCON this Thursday under the title "Zen and the Art of Abstraction Maintenance", but here your problem is how to deal with a stack of abstractions that WERE designed this way, sigh.)
When you're facing such problems you have only two not-so-good solutions: either copy, paste and edit the misdesigned code into which you need to place a "hook" that the original designer didn't cater for; or "monkey-patch" that code. Neither is GOOD, but both can work, so at least let's be thankful that we have such options (by using an open-source and dynamic language). In this case, I think I'd go for monkey-patching (which is bad, but copy-and-paste coding is even worse), with a code fragment such as:
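import socket

# sourceIP is assumed to be defined already: the local address to send from.
true_socket = socket.socket

def bound_socket(*a, **k):
    sock = true_socket(*a, **k)
    sock.bind((sourceIP, 0))  # port 0 lets the OS choose the source port
    return sock

socket.socket = bound_socket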
Depending on your exact needs (do you need all sockets to be bound to the same source IP, or...?) you could simply run this before using urllib2 normally, or (in more complex ways of course) run it at need just for those outgoing sockets you DO need to bind in a certain way (then each time restore socket.socket = true_socket to get out of the way for future sockets yet to be created). The second alternative adds its own complications to orchestrate properly, so I'm waiting for you to clarify whether you do need such complications before explaining them all.
AKX's good answer is a variant on the "copy / paste / edit" alternative, so I don't need to expand much on that. Note, however, that it doesn't exactly reproduce socket.create_connection in its connect method; see the source here (at the very end of the page) and decide what other functionality of the create_connection function you may want to embody in your copied/pasted/edited version if you decide to go that route.
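The "patch only when needed, then restore" variant mentioned above could be sketched roughly as a context manager. This is only an illustration, not the answer's own code: the name bound_sockets and its source_ip parameter are made up here, and the sketch assumes Python 3 and a single thread (the patch is global, so other threads creating sockets inside the block would be bound as well).

```python
import contextlib
import socket


@contextlib.contextmanager
def bound_sockets(source_ip):
    """Temporarily make every newly created socket bind to source_ip."""
    true_socket = socket.socket

    def bound_socket(*args, **kwargs):
        sock = true_socket(*args, **kwargs)
        sock.bind((source_ip, 0))  # port 0 lets the OS pick a free source port
        return sock

    socket.socket = bound_socket
    try:
        yield
    finally:
        # Always restore the real class, even if the body raised.
        socket.socket = true_socket


# Hypothetical usage: only sockets created inside the block are bound.
#     with bound_sockets("127.0.0.1"):
#         ...  # open the connection here
```

Because the replacement is a plain function rather than a class, code that subclasses socket.socket or runs isinstance checks against it while the patch is active may misbehave, which is one more reason to keep the patched region as small as possible.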
What are the differences between the urllib, urllib2, urllib3 and requests module?
I know it's been said already, but I'd highly recommend the requests Python package.
If you've used languages other than Python, you probably think urllib and urllib2 are easy to use, need little code, and are highly capable; that's how I used to think. But the requests package is so unbelievably useful and short that everyone should be using it.
First, it supports a fully RESTful API, with one function per HTTP verb, and is as easy as:
import requests
resp = requests.get('http://www.mywebsite.com/user')
resp = requests.post('http://www.mywebsite.com/user')
resp = requests.put('http://www.mywebsite.com/user/put')
resp = requests.delete('http://www.mywebsite.com/user/delete')
Regardless of whether you use GET or POST, you never have to encode parameters again; it simply takes a dictionary as an argument and is good to go:
userdata = {"firstname": "John", "lastname": "Doe", "password": "jdoe123"}
resp = requests.post('http://www.mywebsite.com/user', data=userdata)
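For comparison, here is a rough sketch of the manual encoding step that requests spares you, using only the Python 3 standard library (urllib.parse and urllib.request, the successors of urllib and urllib2). The request is built but deliberately not sent, and the URL is the same placeholder used above.

```python
from urllib.parse import urlencode
from urllib.request import Request

userdata = {"firstname": "John", "lastname": "Doe", "password": "jdoe123"}

# The standard library wants the POST body as bytes, so the dictionary
# has to be form-encoded and then converted by hand.
body = urlencode(userdata).encode("ascii")

# Build (but do not send) the request; supplying data makes it a POST.
req = Request("http://www.mywebsite.com/user", data=body)

print(req.get_method())  # POST
print(body)              # b'firstname=John&lastname=Doe&password=jdoe123'
```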
Plus it even has a built-in JSON decoder (again, I know json.loads() isn't a lot more to write, but this sure is convenient):
resp.json()
Or if your response data is just text, use:
resp.text
This is just the tip of the iceberg. This is the list of features from the requests site:
- International Domains and URLs
- Keep-Alive & Connection Pooling
- Sessions with Cookie Persistence
- Browser-style SSL Verification
- Basic/Digest Authentication
- Elegant Key/Value Cookies
- Automatic Decompression
- Unicode Response Bodies
- Multipart File Uploads
- Connection Timeouts
- .netrc support
- Python 2.7, 3.6—3.9
- Thread-safe.
Import error: No module name urllib2
As stated in the urllib2 documentation:
"The urllib2 module has been split across several modules in Python 3 named urllib.request and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to Python 3."
So you should instead be saying:
from urllib.request import urlopen
html = urlopen("http://www.google.com/").read()
print(html)
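If the same file has to run on both Python 2 and Python 3, one common pattern (shown here only as a sketch, not something the quoted documentation prescribes) is to try the Python 3 locations first and fall back to urllib2:

```python
try:
    # Python 3: urllib2 was split into urllib.request and urllib.error
    from urllib.request import urlopen
    from urllib.error import URLError
except ImportError:
    # Python 2: both names still live in urllib2
    from urllib2 import urlopen, URLError

# From here on, the rest of the code can call urlopen(...) and catch
# URLError without caring which interpreter is running it.
```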
Your current, now-edited code sample is incorrect because you are saying urllib.urlopen("http://www.google.com/") instead of just urlopen("http://www.google.com/").
Python: urllib/urllib2/httplib confusion
Focus on urllib2 for this; it works quite well. Don't mess with httplib: it's not the top-level API.
What you're noting is that urllib2 doesn't follow the redirect.
You need to fold in an instance of HTTPRedirectHandler that will catch and follow the redirects.
Further, you may want to subclass the default HTTPRedirectHandler to capture information that you'll then check as part of your unit testing.
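import urllib2  # Python 2; in Python 3 these names live in urllib.request
# self.cookies is assumed to be a cookie jar held by the test case
cookie_handler = urllib2.HTTPCookieProcessor(self.cookies)
redirect_handler = urllib2.HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler, cookie_handler)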
You can then use this opener object to POST and GET, handling redirects and cookies properly.
You may also want to add your own subclass of HTTPHandler to capture and log various error codes.
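The subclassing idea could look something like the following in Python 3, where the same classes live in urllib.request and the cookie jar in http.cookiejar. The class name RecordingRedirectHandler and its redirects attribute are invented for this sketch. Note that build_opener already installs a default redirect handler; passing an instance of a subclass replaces that default, so the value added here is the recording, not the following.

```python
import http.cookiejar
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember each hop for later checks."""

    def __init__(self):
        super().__init__()
        self.redirects = []  # list of (status_code, new_url) tuples

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Record the hop, then let the stock handler build the new request.
        self.redirects.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


cookies = http.cookiejar.CookieJar()
redirect_handler = RecordingRedirectHandler()
opener = urllib.request.build_opener(
    redirect_handler, urllib.request.HTTPCookieProcessor(cookies)
)
```

After a call such as opener.open(url), a unit test could then assert on redirect_handler.redirects to check which status codes and target URLs were seen.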
Changing the IP address for urllib2
Try using the following code:
import urllib.request as urllib2
proxy = urllib2.ProxyHandler({"http": "118.69.140.108:53281"})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
page = urllib2.urlopen("http://example.com/")
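install_opener changes the behaviour of every later urlopen call in the process. If only some requests should go through the proxy, a possible variation (a sketch only; build_proxy_opener is a made-up helper name) is to keep the opener local and call its open method directly:

```python
import urllib.request


def build_proxy_opener(proxy_address):
    """Return an opener that routes plain-HTTP requests through proxy_address."""
    proxy = urllib.request.ProxyHandler({"http": proxy_address})
    return urllib.request.build_opener(proxy)


opener = build_proxy_opener("118.69.140.108:53281")

# Only requests made through this opener use the proxy, for example:
#     page = opener.open("http://example.com/", timeout=10)
# Plain urllib.request.urlopen(...) calls elsewhere are left untouched.
```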
Alternatively, you can use the requests library, which makes it easier:
import requests
url = "http://example.com/"
page = requests.get(url, proxies={"http": "http://118.69.140.108:53281"})
Hope this helps.