Python: HTTP Post a large file with streaming
Reading through the mailing list thread linked to by systempuntoout, I found a clue towards the solution.
The mmap module lets you open a file so that it acts like a string. Parts of the file are loaded into memory on demand, so the whole file never has to be read in at once.
Here's the code I'm using now:
import urllib2
import mmap
# Open the file as a memory mapped string. Looks like a string, but
# actually accesses the file behind the scenes.
f = open('somelargefile.zip', 'rb')
mmapped_file_as_string = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
# Do the request
request = urllib2.Request(url, mmapped_file_as_string)
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)
#close everything
mmapped_file_as_string.close()
f.close()
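The snippet above is Python 2 (urllib2), but the same mmap trick works in Python 3. Here is a minimal self-contained sketch of the key property, that the mapping behaves like a bytes object while reading pages from disk on demand (the HTTP request itself is omitted; the temp file and its contents are made up for illustration):

```python
import mmap
import os
import tempfile

# Create a sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello world' * 100)
    path = tmp.name

with open(path, 'rb') as f:
    # Looks like bytes, but pages are read from disk on demand.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    print(len(mm))    # 1100
    print(mm[:5])     # b'hello'
    mm.close()

os.unlink(path)
```

Because the mapping supports `len()` and slicing, urllib2 can treat it exactly like an in-memory request body without the memory cost.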
How to upload large files using POST method in Python?
When you pass the files arg, the requests lib makes a multipart form upload, i.e. it is like submitting a form where the file is passed as a named field (file in your example).
I suspect the problem you saw is that when you pass a file object as the data arg, as suggested in the docs here https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads, requests does a streaming upload, but the raw file content is used as the whole HTTP POST body.
So I think the server at the other end is expecting a form with a file field, while we're just sending the binary content of the file by itself.
What we need is some way to wrap the content of the file with the right "envelope" as we send it to the server, so that it can recognise the data we are sending.
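To make that "envelope" concrete: a multipart/form-data body wraps the raw file bytes in boundary lines and per-field headers. The helper below is a hand-rolled sketch purely for illustration (the field name, file name, and content are arbitrary); for real uploads use a library such as the one suggested below, which also streams:

```python
import uuid

def multipart_body(field, filename, content,
                   content_type='application/octet-stream'):
    # Hypothetical helper: builds the multipart/form-data "envelope"
    # around raw file bytes, the way a browser form submission would.
    boundary = uuid.uuid4().hex
    body = b''.join([
        ('--%s\r\n' % boundary).encode(),
        ('Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
         % (field, filename)).encode(),
        ('Content-Type: %s\r\n\r\n' % content_type).encode(),
        content,
        ('\r\n--%s--\r\n' % boundary).encode(),
    ])
    return boundary, body

boundary, body = multipart_body('file', 'example.bin', b'raw file bytes')
print(body.startswith(('--' + boundary).encode()))  # True
```

The boundary from the body must also appear in the Content-Type header (multipart/form-data; boundary=...), which is why the toolbelt exposes encoder.content_type.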
See this issue where others have noted the same problem: https://github.com/psf/requests/issues/1584
I think the best suggestion from there is to use this additional lib, which provides streaming multipart form file upload: https://github.com/requests/toolbelt#multipartform-data-encoder
For example:
from requests_toolbelt import MultipartEncoder
import requests
encoder = MultipartEncoder(
    fields={'file': ('myfilename.xyz', open(path, 'rb'), 'text/plain')}
)
response = requests.post(
    url, data=encoder, headers={'Content-Type': encoder.content_type}
)
Download large file in python with requests
With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If the response is chunk-encoded, uncomment the if
                # below and set chunk_size to None.
                # if chunk:
                f.write(chunk)
    return local_filename
Note that the number of bytes returned by iter_content is not always exactly chunk_size; the final chunk is usually smaller, and for compressed or chunk-encoded responses the sizes can vary from one iteration to the next.
See body-content-workflow and Response.iter_content for further reference.
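The varying chunk sizes are easy to see with a local stand-in for the response body (iter_chunks here is a hypothetical helper mimicking what iter_content does over a socket, not the requests API itself):

```python
import io

def iter_chunks(fobj, chunk_size=8192):
    # Hypothetical stand-in for Response.iter_content: yields blocks of
    # at most chunk_size bytes; the final block is usually smaller.
    while True:
        chunk = fobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

sizes = [len(c) for c in iter_chunks(io.BytesIO(b'x' * 20000))]
print(sizes)  # [8192, 8192, 3616]
```

The code that consumes the chunks should therefore never assume a fixed block size; it should simply write whatever arrives, as the download loop above does.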
Streaming POST a large file to CherryPy by Python client
If it's CherryPy specific upload you can skip multipart/form-data
encoding obstacles and just send streaming POST body of file contents.
client
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2
import io
import os
class FileLenIO(io.FileIO):

    def __init__(self, name, mode='r', closefd=True):
        io.FileIO.__init__(self, name, mode, closefd)
        self.__size = os.stat(name).st_size

    def __len__(self):
        return self.__size
f = FileLenIO('/home/user/Videos/video.mp4', 'rb')
request = urllib2.Request('http://127.0.0.1:8080/upload', f)
request.add_header('Content-Type', 'application/octet-stream')
# you can add custom header with filename if you need it
response = urllib2.urlopen(request)
print response.read()
server
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import tempfile
import shutil
import cherrypy
config = {
    'global': {
        'server.socket_host': '127.0.0.1',
        'server.socket_port': 8080,
        'server.thread_pool': 8,
        # remove any limit on the request body size; cherrypy's default is 100MB
        'server.max_request_body_size': 0,
        # increase server socket timeout to 60s; cherrypy's default is 10s
        'server.socket_timeout': 60
    }
}
class App:

    @cherrypy.config(**{'response.timeout': 3600})  # default is 300s
    @cherrypy.expose()
    def upload(self):
        '''Handle non-multipart upload'''
        destination = os.path.join('/home/user/test-upload')
        with open(destination, 'wb') as f:
            shutil.copyfileobj(cherrypy.request.body, f)
        return 'Okay'

if __name__ == '__main__':
    cherrypy.quickstart(App(), '/', config)
Tested on 1.3GiB video file. Server-side memory consumption is under 10MiB, client's under 5MiB.
Streaming download large file with python-requests interrupting
Several things can interrupt a download: network problems, timeouts, and so on. But since the server reports the file size before the download starts, you can check whether you have downloaded the whole file. Using urllib:
site = urllib.urlopen("http://python.org")
meta = site.info()
print meta.getheaders("Content-Length")
Using requests:
r = requests.get("http://python.org")
r.headers["Content-Length"]
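If a partial file is already on disk, you can compare its size against Content-Length and resume with a Range request instead of starting over. resume_headers below is a hypothetical helper (it assumes the server supports byte ranges; the file name is made up for illustration):

```python
import os

def resume_headers(path):
    # Build a Range header from the bytes already on disk, so a retried
    # requests.get(url, headers=..., stream=True) can pick up where it
    # left off instead of re-downloading from byte zero.
    start = os.path.getsize(path) if os.path.exists(path) else 0
    return {'Range': 'bytes=%d-' % start} if start else {}

# Example: a partially downloaded file of 1000 bytes.
with open('partial.bin', 'wb') as f:
    f.write(b'\0' * 1000)
print(resume_headers('partial.bin'))  # {'Range': 'bytes=1000-'}
os.remove('partial.bin')
```

When resuming, open the local file in 'ab' mode to append the remaining chunks, and check that the server answered with status 206 (Partial Content) rather than 200 before appending.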