Python-Requests close http connection
As discussed here, there really isn't such a thing as an "HTTP connection": what httplib refers to as the HTTPConnection is really the underlying TCP connection, which doesn't know much about your requests at all. Requests abstracts that away, so you won't ever see it.
Recent versions of Requests do in fact keep the TCP connection alive after your request. If you want your TCP connections to close, you can configure requests not to use keep-alive. In old versions of Requests (before 1.0) this was done through the session's config dict:
s = requests.session()
s.config['keep_alive'] = False
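The config dict no longer exists in requests 1.x and later. A minimal sketch of a common modern equivalent (assuming only that requests is installed) is to put a Connection: close header on the session:

```python
import requests

# Sketch for modern requests (1.x+), where Session.config was removed.
# Sending "Connection: close" asks the server to close the TCP connection
# after each response, so no keep-alive connection stays pooled.
s = requests.Session()
s.headers['Connection'] = 'close'
```

Alternatively, use the session as a context manager (`with requests.Session() as s: ...`) so all pooled connections are closed when the block exits.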
How to close requests.Session()?
In requests' source code, Session.close only closes all underlying adapters. Closing an HTTPAdapter in turn clears its underlying PoolManager, and all the established connections inside that PoolManager are then closed. But the PoolManager will create a fresh connection if there is no usable connection.
Critical code:
# requests.Session
def close(self):
    """Closes all adapters and as such the session"""
    for v in self.adapters.values():
        v.close()

# requests.adapters.HTTPAdapter
def close(self):
    """Disposes of any internal state.

    Currently, this closes the PoolManager and any active ProxyManager,
    which closes any pooled connections.
    """
    self.poolmanager.clear()
    for proxy in self.proxy_manager.values():
        proxy.clear()
# urllib3.poolmanager.PoolManager
def connection_from_pool_key(self, pool_key, request_context=None):
    """
    Get a :class:`ConnectionPool` based on the provided pool key.

    ``pool_key`` should be a namedtuple that only contains immutable
    objects. At a minimum it must have the ``scheme``, ``host``, and
    ``port`` fields.
    """
    with self.pools.lock:
        # If the scheme, host, or port doesn't match existing open
        # connections, open a new ConnectionPool.
        pool = self.pools.get(pool_key)
        if pool:
            return pool

        # Make a fresh ConnectionPool of the desired type
        scheme = request_context['scheme']
        host = request_context['host']
        port = request_context['port']
        pool = self._new_pool(scheme, host, port, request_context=request_context)
        self.pools[pool_key] = pool

    return pool
So if I understand its structure correctly, closing a Session leaves you with almost the same thing as creating a new Session and assigning it to the old variable, so you can still use it to send requests. If I have misunderstood anything, feel free to correct me :D
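This is easy to check: a minimal sketch (assuming only that requests is installed) showing that close() leaves the adapters mounted, so the session remains usable:

```python
import requests

# close() clears each adapter's PoolManager, but the adapters themselves
# stay mounted on the session, so a later request simply opens a new pool.
s = requests.Session()
s.close()
print(sorted(s.adapters))  # the http:// and https:// adapters are still there
```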
Closing python requests connection
Yes, there is a call to session.close behind the get code. If you use a proper IDE, PyCharm for example, you can step into get to see what is happening. Inside get there is a call to request:
return request('get', url, params=params, **kwargs)
Within the definition of that request function, the call to session.close is made.
Following the link to the requests repo, you can see how the session is controlled:
# By using the 'with' statement we are sure the session is closed, thus we
# avoid leaving sockets open which can trigger a ResourceWarning in some
# cases, and look like a memory leak in others.
with sessions.Session() as session:
return session.request(method=method, url=url, **kwargs)
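You can verify the with-statement guarantee yourself. A small sketch (the TrackingSession subclass below is made up purely for illustration) that records when close() fires:

```python
import requests

calls = []

class TrackingSession(requests.Session):
    """Hypothetical subclass that records calls to close()."""
    def close(self):
        calls.append('closed')
        super().close()

# Exiting the with block triggers close(), exactly as requests.get()
# does for the temporary Session it creates internally.
with TrackingSession():
    pass

print(calls)  # ['closed']
```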
Python 3 requests how to force use a new connection for each request?
I suspect that your problem isn't server related. The servers are probably behaving correctly and the problem is the threads.
Considering the code from the related question, if it is up to date: when PAUSE is set to true, which happens 50% of the time when the first argv argument is set to 1, dozens of threads are created every second (actually num_connections threads; the (pressed - lastpressed).total_seconds() > 0.5 and self.paused = not self.paused logic makes a new batch start every second). On Linux you would check this with top -H -p $pid, watch ps -T -p $pid, or watch ls /proc/$pid/task/; you are probably on Windows, where there are equivalent ways to check this.
Each batch of connections is correct when considered in isolation; the byte range is being correctly set on the headers, and by sniffing the traffic yourself you'll see that they are just fine. The problem arises when new batches of threads arrive doing the same work: you get many threads downloading similar ranges in different batches, giving you the same data. Since your writing logic is relative rather than absolute, if two threads give you the same 123rd chunk, your self.position += len(chunk) will increase for both identical chunks, which can be why you get over 100%.
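The effect of relative bookkeeping with duplicated chunks can be sketched without any network (the names here are illustrative, not taken from the original code):

```python
import io

# Two batches deliver the same 4-byte chunk for offset 0, plus one chunk
# for offset 4. Absolute writes keep the file 8 bytes long, but the
# relative progress counter climbs to 12: "over 100%".
buf = io.BytesIO()
position = 0
chunks = [(0, b'AAAA'), (0, b'AAAA'), (4, b'BBBB')]

for offset, chunk in chunks:
    position += len(chunk)   # relative: counts the duplicate twice
    buf.seek(offset)
    buf.write(chunk)         # absolute: the duplicate just overwrites itself

print(position, len(buf.getvalue()))  # 12 8
```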
To test whether this is what happens, try downloading an ever-increasing file and check whether the saved file suffers from these double increments:
0000000000 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 03 ................
0000000010 00 00 00 04 00 00 00 05 00 00 00 06 00 00 00 07 ................
0000000020 00 00 00 08 00 00 00 09 00 00 00 0a 00 00 00 0b ................
0000000030 00 00 00 0c 00 00 00 0d 00 00 00 0e 00 00 00 0f ................
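That layout is just consecutive 4-byte big-endian counters; a one-liner reproduces the first row:

```python
# First 16 bytes of the ever-increasing file: counters 0..3, 4 bytes each.
row = b''.join(i.to_bytes(4, byteorder='big') for i in range(4))
print(row.hex())  # 00000000000000010000000200000003
```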
Or simulate one file range server yourself by doing something similar to this:
#!/usr/bin/env python3
from http.server import BaseHTTPRequestHandler, HTTPServer

hostname = "localhost"
serverport = 8081

filesizemegabytes = 8  # or .25 for a small test file
filesizebytes = int(filesizemegabytes * 1024 * 1024)
filesizebytestr = str(filesizebytes)

def body_slice(start, stop):
    # Bytes [start, stop) of a virtual file made of 4-byte
    # big-endian counters: 0, 1, 2, ...
    first, last = start // 4, (stop - 1) // 4
    data = b''.join(i.to_bytes(4, byteorder='big') for i in range(first, last + 1))
    offset = start - first * 4
    return data[offset:offset + (stop - start)]

class Server(BaseHTTPRequestHandler):
    def do_GET(self):
        self.do(True)

    def do_HEAD(self):
        self.do(False)

    def do(self, writebody=True):
        rangestr = self.headers.get('range')
        if isinstance(rangestr, str) and rangestr.startswith('bytes='):
            self.send_response(206)
            rangestr = rangestr[6:]
            start, end = (int(i) for i in rangestr.split('-'))
            stop = end + 1  # HTTP byte ranges are inclusive
            self.send_header('Content-Range', 'bytes ' + rangestr + '/' + filesizebytestr)
        else:
            self.send_response(200)
            start, stop = 0, filesizebytes
        self.send_header('Content-type', 'application/octet-stream')
        self.send_header('Accept-Ranges', 'bytes')
        self.send_header('Content-Length', str(stop - start))
        self.end_headers()
        if writebody:
            self.wfile.write(body_slice(start, stop))

if __name__ == '__main__':
    serverinstance = HTTPServer((hostname, serverport), Server)
    print("Server started http://%s:%s" % (hostname, serverport))
    try:
        serverinstance.serve_forever()
    except KeyboardInterrupt:
        pass
    serverinstance.server_close()
Considerations about resource usage
You don't need multithreading for multiple downloads. "Green" concurrency is enough, since you don't need more than one CPU; you just need to wait for IO. Instead of threads + requests, a more suitable solution would be asyncio + aiohttp (aiohttp because requests is not designed for async use, although you will find some adaptations in the wild).
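A rough shape of that approach, as a standalone sketch: the aiohttp call is replaced here by a sleep so the example runs without network or aiohttp installed, and the range math is purely illustrative:

```python
import asyncio

async def fetch_range(start, end):
    # With aiohttp this would be something like:
    #   async with session.get(url, headers={'Range': f'bytes={start}-{end}'}) as resp: ...
    await asyncio.sleep(0.01)  # stand-in for the awaited HTTP request
    return (start, end)

async def main():
    # All four "downloads" wait concurrently on one thread, one CPU.
    tasks = [fetch_range(i * 1024, (i + 1) * 1024 - 1) for i in range(4)]
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results)
```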
Lastly, keep-alives are useful when you are planning to connect again, which seems to be your case. Are your source and origin IPs:ports the same? You are trying to force connections to close, but once you realize the problem is not the servers, reanalyze your situation and consider whether it is not better to keep connections alive.