Stream large binary files with urllib2 to file
There's no reason to work line by line (that means small chunks, and it requires Python to find the line ends for you), just read it in bigger chunks, e.g.:
# from urllib2 import urlopen  # Python 2
from urllib.request import urlopen  # Python 3

response = urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as f:
    while True:
        chunk = response.read(CHUNK)
        if not chunk:
            break
        f.write(chunk)
Experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements.
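The same chunked copy can also be done with the standard library's shutil.copyfileobj, which runs an equivalent read/write loop internally. A minimal sketch, using an in-memory io.BytesIO stream as a stand-in for the real urlopen response (and "downloaded.bin" as a hypothetical output filename):

```python
import io
import shutil

# Stand-in for the urlopen(url) response; in real use, pass the
# response object itself (any file-like object exposing read()).
response = io.BytesIO(b"x" * 100_000)

CHUNK = 16 * 1024  # same 16 KiB buffer as the manual loop above

with open("downloaded.bin", "wb") as f:
    # copyfileobj performs the same bounded read/write loop internally
    shutil.copyfileobj(response, f, length=CHUNK)
```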
Download binary encrypted file with urllib, keep it as stream
When you make a call to print(), the standard Python interpreter is going to wrap the binary contents in the binary string notation (b'string contents'). The extra characters are probably messing up GPG's read of the file. You can try removing the extra characters by hand if piping is really important to you, or just do a quick write in the Python:
binary_file = file_response.read()
with open('file1', 'wb') as output:
    output.write(binary_file)
(I don't understand your apparent aversion to this)
edit:
You could also use the sys.stdout object:
import sys

binary_file = file_response.read()
sys.stdout.buffer.write(binary_file)
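For a large file you can avoid reading it all at once and stream it to stdout in chunks instead. A sketch using shutil.copyfileobj (the helper name stream_to is mine, not from the original answer):

```python
import shutil
import sys

def stream_to(dst, src, chunk=16 * 1024):
    """Copy src to dst in fixed-size chunks without buffering the
    whole file in memory; dst can be sys.stdout.buffer for binary piping."""
    shutil.copyfileobj(src, dst, length=chunk)

# Real use: stream_to(sys.stdout.buffer, file_response)
```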
How to write a large binary file from the internet in python 3 without reading the entire file to memory?
No, urlopen will return a file-like object over a socket. Quoting:
Open a network object denoted by a URL for reading. If the URL does not
have a scheme identifier, or if it has file: as its scheme identifier,
this opens a local file (without universal newlines); otherwise it
opens a socket to a server somewhere on the network. If the connection
cannot be made the IOError exception is raised. If all went well, a
file-like object is returned. This supports the following methods:
read(), readline(), readlines(), fileno(), close(), info(), getcode() and geturl().
So since the seek method is neither supported by the object urlopen returns nor used by copyfileobj, we can deduce that there is no need to store all the content in memory.
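That deduction can be checked without touching the network: shutil.copyfileobj only ever calls read() on its source, so a minimal object exposing read() alone (like the socket-backed response) is enough. A small demonstration with a stand-in reader class (SocketLikeReader is my own illustration, not part of any library):

```python
import io
import shutil

class SocketLikeReader:
    """Mimics urlopen's response: supports read() but not seek()."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.read_sizes = []  # record the size of every read() request

    def read(self, size=-1):
        self.read_sizes.append(size)
        return self._buf.read(size)

src = SocketLikeReader(b"a" * 50_000)
dst = io.BytesIO()
shutil.copyfileobj(src, dst, length=16 * 1024)  # never calls seek()
```

Every read request stays bounded by the 16 KiB buffer, so memory use is constant regardless of the download size.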
python + urllib2: streaming ends prematurely
This turned out to be a bug elsewhere in the program, NOT in the urllib2/zlib handling. I can recommend the pattern used in the code above if you need to handle large gzip files.
Download large file in python with requests
With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the if
                # below and set the chunk_size parameter to None.
                # if chunk:
                f.write(chunk)
    return local_filename
Note that the number of bytes returned by iter_content is not exactly the chunk_size; it's often far bigger, and it is expected to differ on every iteration.
See body-content-workflow and Response.iter_content for further reference.
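The requests documentation also describes a lower-level alternative: copying Response.raw (the underlying urllib3 stream) straight to the file with shutil.copyfileobj. A hedged sketch, assuming the third-party requests package is installed; note that iter_content is usually preferred because it handles content decoding for you, whereas raw needs decode_content set explicitly:

```python
import shutil

def download_raw(url, local_filename):
    """Stream the raw socket to disk; an alternative to iter_content.
    Requires the third-party 'requests' package (pip install requests)."""
    import requests  # imported here so the sketch stays self-contained

    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # Decode gzip/deflate transfer encodings transparently;
        # without this, bytes come off the wire as-is.
        r.raw.decode_content = True
        with open(local_filename, "wb") as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename
```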