Python: HTTP POST a large file with streaming

Reading through the mailing list thread linked to by systempuntoout, I found a clue towards the solution.

The mmap module allows you to open a file so that it acts like a string. Parts of the file are loaded into memory on demand.
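
To illustrate the string-like behaviour, here is a quick sketch of my own (not part of the original answer), using the same file as below:

import mmap

f = open('somelargefile.zip', 'rb')
mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
print len(mm)    # total file size, like len() on a string
print mm[:4]     # slicing reads only those bytes from disk
mm.close()
f.close()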

Here's the code I'm using now:

import urllib2
import mmap

# Open the file as a memory-mapped string. Looks like a string, but
# actually accesses the file behind the scenes.
f = open('somelargefile.zip', 'rb')
mmapped_file_as_string = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Do the request (url is the upload endpoint from the question)
request = urllib2.Request(url, mmapped_file_as_string)
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)

# Close everything
mmapped_file_as_string.close()
f.close()
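
If you're on Python 3, urllib2 has been folded into urllib.request; the same approach looks roughly like this (the url is a placeholder for your upload endpoint):

import mmap
import urllib.request

url = 'http://example.com/upload'  # placeholder; use your endpoint

f = open('somelargefile.zip', 'rb')
mmapped_file_as_string = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# mmap objects support len() and read(), so urllib.request can set
# Content-Length and stream the body without buffering it
request = urllib.request.Request(url, data=mmapped_file_as_string)
request.add_header('Content-Type', 'application/zip')
response = urllib.request.urlopen(request)

mmapped_file_as_string.close()
f.close()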

How to upload large files using POST method in Python?

When you pass the files argument, the requests library makes a multipart form upload, i.e. it is like submitting a form where the file is passed as a named field (file in your example).
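
For reference, that call shape looks something like this (url and path stand in for your own values):

import requests

# multipart/form-data upload: requests builds the form envelope,
# but may read the whole file into memory to do so
with open(path, 'rb') as f:
    response = requests.post(url, files={'file': f})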

I suspect the problem you saw is that when you pass a file object as the data argument, as suggested in the docs here https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads, it does a streaming upload, but the file content is used as the whole HTTP POST body.
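
A minimal sketch of that streaming variant (again with placeholder url and path):

import requests

# streamed upload: the file's bytes become the raw POST body,
# with no multipart envelope around them
with open(path, 'rb') as f:
    response = requests.post(url, data=f)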

So I think the server at the other end is expecting a form with a file field, but we're just sending the binary content of the file by itself.

What we need is some way to wrap the content of the file with the right "envelope" as we send it to the server, so that it can recognise the data we are sending.

See this issue where others have noted the same problem: https://github.com/psf/requests/issues/1584

I think the best suggestion from there is to use this additional lib, which provides streaming multipart form file upload: https://github.com/requests/toolbelt#multipartform-data-encoder

For example:

from requests_toolbelt import MultipartEncoder
import requests

# path is the local file to upload, url the target endpoint
encoder = MultipartEncoder(
    fields={'file': ('myfilename.xyz', open(path, 'rb'), 'text/plain')}
)
response = requests.post(
    url, data=encoder, headers={'Content-Type': encoder.content_type}
)
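
Passing headers={'Content-Type': encoder.content_type} matters: each MultipartEncoder generates its own boundary string, and the server needs that exact boundary to parse the form body.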

Download a large file in Python with requests

With the following streaming code, Python memory usage stays bounded regardless of the size of the downloaded file:

import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the
                # if below and set chunk_size to None.
                #if chunk:
                f.write(chunk)
    return local_filename

Note that the number of bytes returned by iter_content is not always exactly chunk_size; the actual amount varies from iteration to iteration and can be larger, for example when the response is compressed and requests decodes it for you.

See body-content-workflow and Response.iter_content for further reference.
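
As an aside, if you don't need requests to decode the content, you can hand the raw urllib3 response straight to shutil. A sketch (note this bypasses content decoding, so gzip/deflate bodies are written to disk as-is):

import shutil
import requests

def download_file_raw(url, local_filename):
    # stream the raw socket-level response directly to disk
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename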

Streaming POST of a large file to CherryPy from a Python client

If the upload is specific to CherryPy, you can skip the multipart/form-data encoding obstacles and just send the file contents as a streaming POST body.

client

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib2
import io
import os


class FileLenIO(io.FileIO):
    '''File object that knows its own length, so urllib2 can set a
    Content-Length header and stream the body instead of buffering it.'''

    def __init__(self, name, mode='r', closefd=True):
        io.FileIO.__init__(self, name, mode, closefd)
        self.__size = os.stat(name).st_size

    def __len__(self):
        return self.__size


f = FileLenIO('/home/user/Videos/video.mp4', 'rb')
request = urllib2.Request('http://127.0.0.1:8080/upload', f)
request.add_header('Content-Type', 'application/octet-stream')
# you can add a custom header with the filename if you need it
response = urllib2.urlopen(request)

print response.read()

server

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import shutil

import cherrypy

config = {
    'global' : {
        'server.socket_host' : '127.0.0.1',
        'server.socket_port' : 8080,
        'server.thread_pool' : 8,
        # remove any limit on the request body size; CherryPy's default is 100MB
        'server.max_request_body_size' : 0,
        # increase server socket timeout to 60s; CherryPy's default is 10s
        'server.socket_timeout' : 60
    }
}


class App:

    @cherrypy.config(**{'response.timeout': 3600})  # default is 300s
    @cherrypy.expose()
    def upload(self):
        '''Handle non-multipart upload'''
        destination = os.path.join('/home/user/test-upload')
        with open(destination, 'wb') as f:
            shutil.copyfileobj(cherrypy.request.body, f)
        return 'Okay'


if __name__ == '__main__':
    cherrypy.quickstart(App(), '/', config)

Tested with a 1.3GiB video file. Server-side memory consumption stays under 10MiB, the client's under 5MiB.

Streaming download of a large file with python-requests gets interrupted

There might be several issues that cause a download to be interrupted: network problems and so on. But we know the file size before we start the download, so we can check whether the whole file arrived. Using urllib:

import urllib

site = urllib.urlopen("http://python.org")
meta = site.info()
print meta.getheaders("Content-Length")

Using requests:

r = requests.get("http://python.org")
r.headers["Content-Length"]
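
Putting that together, here is a sketch (the function name and the Range-resume step are my own additions) that checks the bytes on disk against the advertised size and fetches only the missing tail, assuming the server sends Content-Length and honours Range requests:

import os
import requests

def finish_download(url, local_filename):
    # how big should the file be?
    head = requests.head(url, allow_redirects=True)
    expected = int(head.headers['Content-Length'])
    # how much do we already have on disk?
    done = os.path.getsize(local_filename) if os.path.exists(local_filename) else 0
    if done < expected:
        # ask the server only for the bytes we are missing
        resume = {'Range': 'bytes=%d-' % done}
        with requests.get(url, headers=resume, stream=True) as r:
            r.raise_for_status()
            with open(local_filename, 'ab') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
    return os.path.getsize(local_filename) == expected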

