Download large file in python with requests
With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have chunk encoded response uncomment if
                # and set chunk_size parameter to None.
                #if chunk:
                f.write(chunk)
    return local_filename
Note that the number of bytes returned by each iteration of iter_content is not necessarily exactly chunk_size; it can be larger or smaller and may differ from one iteration to the next, because decoding (for example of a compressed response) can take place.
See body-content-workflow and Response.iter_content for further reference.
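Because chunk sizes can vary as noted above, a progress counter should add up what actually arrives rather than assume chunk_size per iteration. The following is a minimal sketch, not the original answer's code: save_stream is a hypothetical helper name, and response is assumed to behave like a requests.Response opened with stream=True.

```python
def save_stream(response, path, chunk_size=8192):
    """Write a streamed response body to `path` and return the bytes written.

    Assumption: `response` provides iter_content(chunk_size=...), as a
    requests.Response opened with stream=True does.
    """
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            # Count what actually arrived, not what was requested.
            written += len(chunk)
            f.write(chunk)
    return written
```

With requests this might be called inside `with requests.get(url, stream=True) as r:` as `save_stream(r, local_filename)`, after `r.raise_for_status()`.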
Problems downloading large files with requests?
Thanks to SilentGhost on IRC #python, who pointed this out and suggested I upgrade requests, which solved it (from 2.22.0 to 2.24.0).
Upgrading the package is done like this:
pip install requests --upgrade
Another option that may help someone looking at this question is pycurl; here is a good starting point: https://github.com/rajatkhanduja/PyCurl-Downloader
You can also add --libcurl to your curl command to get a good indication of how to use pycurl.
Save a large file using the Python requests library
Oddly enough, requests doesn't have anything simple for this. You'll have to iterate over the response and write those chunks to a file:
import requests

response = requests.get('http://www.example.com/image.jpg', stream=True)
# Throw an error for bad status codes
response.raise_for_status()
with open('output.jpg', 'wb') as handle:
    for block in response.iter_content(1024):
        handle.write(block)
I usually just use urllib.urlretrieve() (urllib.request.urlretrieve() in Python 3). It works, but if you need to use a session or some sort of authentication, the above code works as well.
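For comparison, the standard-library route mentioned above might look like the sketch below in Python 3. The helper name fetch is hypothetical, and the function simply wraps urllib.request.urlretrieve, which the Python documentation describes as a legacy interface.

```python
import urllib.request


def fetch(url, filename):
    """Download `url` to `filename` using only the standard library.

    Returns the local path and the Content-Length header value
    (None if the server did not send one).
    """
    path, headers = urllib.request.urlretrieve(url, filename)
    return path, headers.get("Content-Length")
```

This needs no third-party package, but it offers no session handling or authentication helpers, which is where the requests version is the better fit.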
How to download a file using Python requests, when that file is being served with redirect?
- Pass cookies={"PHPSESSID": "3r7ql7poiparp92ia7ltv8nai5"} instead of headers={"cookie": "PHPSESSID=3r7ql7poiparp92ia7ltv8nai5"}. This is because the requests library does headers.pop('Cookie', None) upon redirect.
- Retry if resp.url is not f"https://www.fadedpage.com/books/{bookID}/{fileType}.php". This is because the server first redirects link.php with a different bookID to showbook.php.
- A download of downloadFile("20170817", "html") contains the text "The First Part of this book is intended for pupils", not "woodland slope behind St. Pierre-les-Bains" that is contained in a download of downloadFile("20130603", "html").
import requests

def downloadFile(bookID, fileType, retry=1):
    cookies = {"PHPSESSID": "3r7ql7poiparp92ia7ltv8nai5"}
    url = f"https://www.fadedpage.com/link.php?file={bookID}.{fileType}"
    print("Getting ", url)
    with requests.get(url, cookies=cookies) as resp:
        # Retry if the server redirected somewhere other than the expected page
        if resp.url != f"https://www.fadedpage.com/books/{bookID}/{fileType}.php":
            if retry:
                return downloadFile(bookID, fileType, retry=retry-1)
            else:
                raise Exception(f"Unexpected final URL: {resp.url}")
        with open(f"{bookID}.{fileType}", 'wb') as f:
            f.write(resp.content)

def isValidDownload(bookID, fileType="html"):
    """
    A download of `downloadFile("20170817", "html")` should produce
    a file 20170817.html which contains the text "The First Part of
    this book is intended for pupils". If it doesn't, it isn't getting
    the full text file.
    """
    with open(f"{bookID}.{fileType}") as f:
        raw = f.read()
    test = ""
    if bookID == "20130603":
        test = "woodland slope behind St. Pierre-les-Bains"
    if bookID == "20170817":
        test = "The First Part of this book is intended for pupils"
    return test in raw
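When a download ends up on the wrong page, it can help to look at every redirect hop. The sketch below assumes resp is a requests.Response, whose history attribute lists the intermediate redirect responses, oldest first; redirect_chain is a hypothetical helper name and not part of the answer above.

```python
def redirect_chain(resp):
    """Return [(status_code, url), ...] for each hop, ending with the final response.

    Assumption: `resp` has `status_code`, `url` and `history` attributes,
    as a requests.Response does.
    """
    hops = [(r.status_code, r.url) for r in resp.history]
    hops.append((resp.status_code, resp.url))
    return hops
```

Printing redirect_chain(resp) before the resp.url comparison in downloadFile would show whether link.php sent the request to showbook.php or to the expected book page.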
Streaming download large file with python-requests interrupting
Several issues, such as network problems, can interrupt a download. However, the server usually reports the file size in the Content-Length header before the download starts, so you can check whether you have downloaded the whole file. You can read it using urllib:
import urllib.request

site = urllib.request.urlopen("http://python.org")
meta = site.info()
print(meta.get_all("Content-Length"))
Using requests:
import requests

r = requests.get("http://python.org")
r.headers["Content-Length"]
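The check described above could be sketched as follows: compare the announced Content-Length with the size of the file on disk. The helper name is_complete is hypothetical. One caveat worth stating as an assumption: if the response was compressed (a Content-Encoding header is present) and decoded while saving, the size on disk may legitimately differ from Content-Length.

```python
import os


def is_complete(headers, path):
    """Compare the size on disk with the Content-Length header.

    Returns True or False when the header is present, and None when the
    server did not announce a size (e.g. chunked transfer encoding).
    `headers` is any mapping of response headers, such as r.headers.
    """
    expected = headers.get("Content-Length")
    if expected is None:
        return None
    return os.path.getsize(path) == int(expected)
```

A False result suggests the transfer was cut short and the download should be retried; None means the size cannot be verified this way.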