Download a file from HTTPS using download.file()
It might be easiest to try the RCurl package. Install the package and try the following:
# install.packages("RCurl")
library(RCurl)
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
## Or
## x <- getURL(URL, ssl.verifypeer = FALSE)
out <- read.csv(textConnection(x))
head(out[1:6])
# RT SERIALNO DIVISION PUMA REGION ST
# 1 H 186 8 700 4 16
# 2 H 306 8 700 4 16
# 3 H 395 8 100 4 16
# 4 H 506 8 700 4 16
# 5 H 835 8 800 4 16
# 6 H 989 8 700 4 16
dim(out)
# [1] 6496 188
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv",destfile="reviews.csv",method="libcurl")
Trying to Download File via https
To download a file from a URL that needs no login form (at most HTTP basic authentication), do something like:
import requests
url = 'http://somewebsite.org'
user, password = 'bob', 'I love cats'
resp = requests.get(url, auth=(user, password))
with open("result.zip", "wb") as fout:
    fout.write(resp.content)
Of course, you should check whether you got a valid response before writing the zip file.
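One way to do that check, as a minimal sketch: the helper name save_response is hypothetical, and it only assumes that the object passed in has status_code and content attributes, as a requests response does.

```python
def save_response(resp, path):
    """Write resp.content to path only if the server answered 200 OK.

    resp is expected to look like a requests.Response (it needs the
    attributes status_code and content). Returns True on success.
    """
    if resp.status_code != 200:
        print("unexpected status code:", resp.status_code)
        return False
    with open(path, "wb") as fout:
        fout.write(resp.content)
    return True


# Hypothetical usage with the variables from the snippet above:
# resp = requests.get(url, auth=(user, password))
# save_response(resp, "result.zip")
```

If you would rather get an exception than a return value, requests also offers resp.raise_for_status().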
For a considerable number of websites with a login form, the following recipe will work. However, if the site (asite.com in the example below) relies heavily on JavaScript, it might not.
Use a requests session in order to store any session cookies, and perform the following three steps:
- GET the login URL. This will fetch any session cookies or CSRF protection cookies.
- POST to the login URL with the username and password. The names of the form fields to be posted depend on the page. Use your web browser in debug mode to learn the right values to post; there can be more parameters than just username and password.
- GET the document URL and save the result to a file.
On Firefox, for example, go to the website you want to log in to, press F12 (for debug mode), click on the Network tab and then reload the page.
You might
Fill in the login form and submit and look in the debug panel for a POST request.
Generic Python code would look like this:
import requests

def login_and_download(login_post_url, login_data, url_of_your_document):
    # login_post_url and login_data depend on the site you want to connect
    # to; you have to analyze its login procedure. login_data is a dict of
    # the form fields to post (hypothetical example: username and password).
    ses = requests.session()
    # Step 1: get the login page
    rslt = ses.get("https://www.asite.com/login-home")
    # now any potentially required cookie will be set
    if rslt.status_code != 200:
        print("failed getting login page")
        return False
    # for simple pages you can proceed to login
    # for slightly more complicated pages you might have to parse the HTML
    # for really annoying pages that use loads of JavaScript it might be
    # even more complicated
    # Step 2: perform a POST request to log in
    rslt = ses.post(login_post_url, data=login_data)
    if rslt.status_code != 200:
        print("failed logging in")
        return False
    # Step 3: download the URL that you want to get
    rslt = ses.get(url_of_your_document)
    if rslt.status_code != 200:
        print("failed fetching the file")
        return False
    with open("result.zip", "wb") as fout:
        # write the body of the last response (the document itself)
        fout.write(rslt.content)
    return True
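For the "slightly more complicated" pages, parsing the HTML often means collecting the hidden form fields (a CSRF token, for instance) so they can be posted back along with the credentials. A rough sketch using only the standard library follows; the names HiddenInputParser and hidden_fields, as well as the field names in the usage note, are made up for illustration, and real sites may need something different.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            # A missing value attribute is stored as an empty string.
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden input fields found in the HTML text."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields
```

With the login page from step 1 you could then build the POST data along the lines of login_data = {**hidden_fields(rslt.text), "username": "bob", "password": "I love cats"}, where "username" and "password" stand in for whatever field names the site's form actually uses.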
Unable to download file from URL using python
Try this; it worked for me.
import requests
headers = {
"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'}
response = requests.get(
"https://www.cmegroup.com/content/dam/cmegroup/notices/clearing/2020/08/Chadv20-239.pdf", headers=headers)
# the with block closes the file even if writing fails
with open("Chadv20-239.pdf", 'wb') as pdf:
    pdf.write(response.content)
download.file works only over https
I'm posting my comment above as answer.
Please refer to this bug report : http://github.com/joshuaulrich/quantmod/issues/83
It seems there is an issue with curl.
If wget is working fine, you can follow the same advice and try the R command options(download.file.method="wget") to make quantmod download using wget instead of curl.
Download File Using JavaScript/jQuery
Use an invisible <iframe>:
<iframe id="my_iframe" style="display:none;"></iframe>
<script>
function Download(url) {
document.getElementById('my_iframe').src = url;
};
</script>
To force the browser to download a file it would otherwise be capable of rendering (such as HTML or text files), you need the server to set the file's MIME type to a nonsensical value, such as application/x-please-download-me, or alternatively application/octet-stream, which is used for arbitrary binary data.
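If the server side happens to be under your control, the MIME type idea can be sketched with Python's standard http.server module. This is only an illustration: the class name ForceDownloadHandler, the serve() helper and the port are made up for the example, and a real site would normally configure this in its own web server or framework.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class ForceDownloadHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory as generic binary data."""

    def guess_type(self, path):
        # Report every file as arbitrary binary data instead of its real
        # type, so the Content-Type header is application/octet-stream.
        return "application/octet-stream"


def serve(port=8000):
    # Blocks until interrupted; call it yourself when you want the server.
    HTTPServer(("127.0.0.1", port), ForceDownloadHandler).serve_forever()
```

Calling serve() and pointing the Download(url) function above at a file on http://127.0.0.1:8000/ would be one way to try it out locally.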
If you only want to open it in a new tab, the only way to do this is for the user to click on a link with its target attribute set to _blank.
In jQuery:
$('a#someID').attr({target: '_blank',
href : 'http://localhost/directory/file.pdf'});
Whenever that link is clicked, it will download the file in a new tab/window.
How to download a file over HTTP?
Use urllib.request.urlopen():
import urllib.request
with urllib.request.urlopen('http://www.example.com/') as f:
    html = f.read().decode('utf-8')
This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers.
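As an example of changing headers, here is a small sketch that wraps the URL in a urllib.request.Request carrying a custom User-Agent. The helper names build_request and fetch_text and the user-agent string are hypothetical, and error handling is still left out.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request for url that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url, user_agent="my-downloader/0.1"):
    """Download url with the custom header and decode the body as UTF-8."""
    with urllib.request.urlopen(build_request(url, user_agent)) as f:
        return f.read().decode("utf-8")


# Hypothetical usage:
# html = fetch_text("http://www.example.com/")
```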
On Python 2, the method is in urllib2:
import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()