Stream Large Binary Files with Urllib2 to File

Stream large binary files with urllib2 to file

There is no reason to work line by line (that means small chunks, and it also requires Python to find the line ends for you); just read it in bigger chunks, e.g.:

# from urllib2 import urlopen  # Python 2
from urllib.request import urlopen  # Python 3

response = urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as f:
    while True:
        chunk = response.read(CHUNK)
        if not chunk:
            break
        f.write(chunk)

Experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements.
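If you would rather measure than guess, a minimal benchmarking sketch along these lines (assuming url is defined as above and points to a reasonably large file) can compare a few candidate chunk sizes:

import time
from urllib.request import urlopen

# Download the same resource with each chunk size and time it.
for chunk_size in (4 * 1024, 16 * 1024, 64 * 1024, 256 * 1024):
    start = time.monotonic()
    response = urlopen(url)
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    print(chunk_size, total, time.monotonic() - start)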

Download binary encrypted file with urllib, keep it as stream

When you make a call to print(), the standard Python interpreter is going to wrap the binary contents in the binary string notation (b'string contents'). The extra characters are probably messing up GPG's read of the file. You can try removing the extra characters by hand if piping is really important to you, or just do a quick write in Python:

binary_file = file_response.read()
with open('file1', 'wb') as output:
    output.write(binary_file)

(I don't understand your apparent aversion to this)

edit:
You could also use the sys.stdout.buffer object:

import sys

binary_file = file_response.read()
sys.stdout.buffer.write(binary_file)
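If the point of piping to GPG is to avoid holding the whole file in memory, a minimal sketch (assuming file_response is the response object returned by urlopen, as above) can stream it to stdout in chunks instead:

import sys

CHUNK = 16 * 1024
while True:
    # file_response is assumed to be the urlopen() response from the question.
    chunk = file_response.read(CHUNK)
    if not chunk:
        break
    sys.stdout.buffer.write(chunk)
sys.stdout.buffer.flush()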

How to write a large binary file from the internet in python 3 without reading the entire file to memory?

No, urlopen will return a file-like object over a socket. Quoting the documentation:

Open a network object denoted by a URL for reading. If the URL does not
have a scheme identifier, or if it has file: as its scheme identifier,
this opens a local file (without universal newlines); otherwise it
opens a socket to a server somewhere on the network. If the connection
cannot be made the IOError exception is raised. If all went well, a
file-like object is returned. This supports the following methods:
read(), readline(), readlines(), fileno(), close(), info(), getcode() and geturl().

So, since the seek method is neither supported by the object urlopen returns nor needed by copyfileobj, we can deduce that there is no need to store all the content in memory.
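A minimal sketch of that approach, assuming url and a destination path local_path are defined, copies the response to disk chunk by chunk:

import shutil
from urllib.request import urlopen

# copyfileobj reads the source in fixed-size chunks and writes each chunk
# to the destination, so memory usage stays bounded.
with urlopen(url) as response, open(local_path, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)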

python + urllib2: streaming ends prematurely

This turned out to be a bug elsewhere in the program, NOT in the urllib2/zlib handling. I can recommend the pattern used in the code above if you need to handle large gzip files.
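The code referred to above is not reproduced here, but a minimal sketch of that kind of pattern, reading the compressed response in chunks and decompressing incrementally with zlib (assuming url points to gzip-encoded data, and using out.dat as an illustrative filename), might look like this:

import zlib
from urllib.request import urlopen

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

with urlopen(url) as response, open('out.dat', 'wb') as out_file:
    while True:
        chunk = response.read(16 * 1024)
        if not chunk:
            break
        out_file.write(decompressor.decompress(chunk))
    # Flush whatever remains in the decompressor's internal buffer.
    out_file.write(decompressor.flush())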

Download large file in python with requests

With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:

import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the `if`
                # below and set chunk_size to None.
                # if chunk:
                f.write(chunk)
    return local_filename

Note that the number of bytes returned by each iter_content call is not guaranteed to equal chunk_size; it can vary from one iteration to the next and may be larger (for example, when the response is decompressed on the fly).

See body-content-workflow and Response.iter_content for further reference.
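As a quick usage sketch (the URL below is purely illustrative):

if __name__ == '__main__':
    # Hypothetical URL, used only to show how download_file is called.
    saved = download_file('https://example.com/large-archive.zip')
    print('Saved to', saved)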
