Download file from web in Python 3
If you want to read the contents of a web page into a variable, just read the response of urllib.request.urlopen:
import urllib.request
...
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read() # a `bytes` object
text = data.decode('utf-8') # a `str`; this step can't be used if data is binary
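Hard-coding 'utf-8' only works when the server actually sends UTF-8. As a minimal sketch, you could ask the response which charset it declares and fall back to UTF-8 otherwise. The helper name `fetch_text` and its `fallback_encoding` parameter are made up for this example; `get_content_charset()` is the standard method on the response's header object.

```python
import urllib.request


def fetch_text(url, fallback_encoding="utf-8"):
    """Return the body at `url` as a `str`, decoded with the declared charset."""
    with urllib.request.urlopen(url) as response:
        data = response.read()  # a `bytes` object
        # `headers` behaves like an email.message.Message; this returns the
        # charset parameter of Content-Type, or None if the server sent none.
        charset = response.headers.get_content_charset()
    # Hypothetical policy: trust the declared charset, else assume UTF-8.
    return data.decode(charset or fallback_encoding)
```

This still fails on binary data, and a server can declare the wrong charset, so treat it as a convenience rather than a guarantee.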
The easiest way to download and save a file is to use the urllib.request.urlretrieve function:
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
urllib.request.urlretrieve(url, file_name)
import urllib.request
...
# Download the file from `url`, save it in a temporary directory and get the
# path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable:
file_name, headers = urllib.request.urlretrieve(url)
But keep in mind that urlretrieve is listed under the "legacy interface" in the Python documentation and might become deprecated at some point.
So the most robust way to do this is to use the urllib.request.urlopen function to return a file-like object that represents an HTTP response and copy it to a real file using shutil.copyfileobj:
import urllib.request
import shutil
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
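To see why this approach keeps memory use flat no matter how big the file is, here is a hand-written loop that does roughly what shutil.copyfileobj does. The names `copy_in_chunks` and `download` and the 64 KiB chunk size are illustrative choices for this sketch, not part of any library.

```python
import urllib.request


def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy `src` to `dst` one chunk at a time and return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means the stream is exhausted
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def download(url, file_name):
    """Save `url` under `file_name`, holding at most one chunk in memory."""
    with urllib.request.urlopen(url) as response, open(file_name, "wb") as out_file:
        return copy_in_chunks(response, out_file)
```

In practice shutil.copyfileobj is the better choice; writing the loop yourself is mainly useful if you want to hook in something like a progress counter.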
If this seems too complicated, you may want to go simpler and store the whole download in a bytes object and then write it to a file. But this works well only for small files, because the entire body is held in memory at once.
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    data = response.read() # a `bytes` object
    out_file.write(data)
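If you go the read-everything route, one way to guard against an unexpectedly large file is to cap how much you are willing to read. This is only a sketch: `download_small` and the 10 MiB default are hypothetical, and it assumes `read(n)` comes back short only at the end of the body.

```python
import urllib.request


def download_small(url, file_name, max_bytes=10 * 1024 * 1024):
    """Save `url` under `file_name`, refusing bodies larger than `max_bytes`."""
    with urllib.request.urlopen(url) as response:
        # Ask for one byte more than the limit: getting it proves the body is
        # too large, without ever buffering much more than `max_bytes`.
        data = response.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError(f"{url} is larger than {max_bytes} bytes")
    with open(file_name, "wb") as out_file:
        out_file.write(data)
    return len(data)
```

The output file is opened only after the size check, so an oversized download does not leave a truncated file behind.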
It is possible to extract .gz (and maybe other formats) compressed data on the fly. gzip.GzipFile reads its input sequentially (unseekable file objects have been supported since Python 3.2), so the HTTP server does not need to support random access to the file.
import urllib.request
import gzip
...
# Read the first 64 bytes of the file inside the .gz archive located at `url`
url = 'http://example.com/something.gz'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read(64) # a `bytes` object
        # Or do anything shown above using `uncompressed` instead of `response`.
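For instance, combining this with the copyfileobj approach gives a way to save the decompressed contents straight to disk without ever storing the .gz itself. A rough sketch, where the function name `download_and_gunzip` is invented for illustration:

```python
import gzip
import shutil
import urllib.request


def download_and_gunzip(url, file_name):
    """Stream the .gz at `url` and write its decompressed contents to `file_name`."""
    with urllib.request.urlopen(url) as response:
        # mode="rb" is passed explicitly so GzipFile does not try to infer
        # the mode from the response object.
        with gzip.GzipFile(fileobj=response, mode="rb") as uncompressed, \
                open(file_name, "wb") as out_file:
            shutil.copyfileobj(uncompressed, out_file)
```

Both the compressed input and the decompressed output are processed in chunks, so this should be fine for large archives as well.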
Unable to download file from URL using python
Try this, it worked for me. It sends a browser-like User-Agent header, because some servers reject requests that lack one.
import requests
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36',
}
response = requests.get(
    "https://www.cmegroup.com/content/dam/cmegroup/notices/clearing/2020/08/Chadv20-239.pdf", headers=headers)
response.raise_for_status() # fail early instead of saving an error page as a PDF
with open("Chadv20-239.pdf", 'wb') as pdf:
    pdf.write(response.content)
How do I download a file using urllib.request in Python 3?
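Assuming g is the object returned by urlopen, it is a response object rather than the downloaded bytes, so it cannot be passed to write() directly. Change
f.write(g)
to
f.write(g.read())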
How to download a file over HTTP?
Use urllib.request.urlopen():
import urllib.request
with urllib.request.urlopen('http://www.example.com/') as f:
    html = f.read().decode('utf-8')
This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers.
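As a rough sketch of what the "more complex stuff" might look like, the function below sets a custom User-Agent through urllib.request.Request and turns the two standard urllib error types into a single exception. The name `fetch_bytes`, the default User-Agent string, the 10-second timeout and the choice of RuntimeError are all assumptions made for this example.

```python
import urllib.error
import urllib.request


def fetch_bytes(url, user_agent="example-downloader/0.1", timeout=10):
    """Return the body at `url` as bytes, sending a custom User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        raise RuntimeError(f"server returned HTTP {err.code} for {url}") from err
    except urllib.error.URLError as err:
        # No usable answer at all: DNS failure, refused connection, and so on.
        raise RuntimeError(f"could not reach {url}: {err.reason}") from err
```

HTTPError is a subclass of URLError, so it has to be caught first; with the two except clauses swapped, the HTTP-specific branch would never run.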
On Python 2, the method is in urllib2:
import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()
Downloading a file with a URL using python
To download the file:
In [1]: import requests
In [2]: url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods'
In [3]: with open('COVID-19-transport-use-statistics.ods', 'wb') as out_file:
   ...:     content = requests.get(url, stream=True).content
   ...:     out_file.write(content)
And then you can use pandas-ods-reader to read the file by running:
pip install pandas-ods-reader
Then:
In [4]: from pandas_ods_reader import read_ods
In [5]: df = read_ods('COVID-19-transport-use-statistics.ods', 1)
In [6]: df
Out[6]:
Department for Transport statistics ... unnamed.9
0 https://www.gov.uk/government/statistics/trans... ... None
1 None ... None
2 Use of transport modes: Great Britain, since 1... ... None
3 Figures are percentages of an equivalent day o... ... None
4 None ... Percentage
.. ... ... ...
390 Transport for London Tube and Bus ... None
391 Buses (excl. London) ... None
392 Cycling ... None
393 Any other queries ... None
394 Media enquiries ... None
And you can save it as a csv if that is what you want using df.to_csv('my_data.csv', index=False)
Python3, download file from url by button click
By adding the proper headers and using a session, we can download and save the file with the requests module.
import requests
headers = {
"Host": "freemidi.org",
"Connection": "keep-alive",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.9",
}
session = requests.Session()
# The website sets the cookies on the first request
req1 = session.get("https://freemidi.org/getter-13560", headers = headers)
# Request again to download
req2 = session.get("https://freemidi.org/getter-13560", headers = headers)
print(len(req2.content)) # Size of the MIDI file in bytes
with open("testFile.mid", "wb") as saveMidi:
    saveMidi.write(req2.content)
Basic http file downloading and saving to disk in python?
On Python 2, a clean way to download a file is:
import urllib
testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")
This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.
This example uses the urllib library and retrieves the file directly from the source. In Python 3 this class moved to urllib.request and has been deprecated since 3.3, so prefer urllib.request.urlretrieve or urlopen there.