Download a File from Https Using Download.File()

Download a file from HTTPS using download.file()

It might be easiest to try the RCurl package. Install the package and try the following:

# install.packages("RCurl")
library(RCurl)
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
## Or
## x <- getURL(URL, ssl.verifypeer = FALSE)
out <- read.csv(textConnection(x))
head(out[1:6])
# RT SERIALNO DIVISION PUMA REGION ST
# 1 H 186 8 700 4 16
# 2 H 306 8 700 4 16
# 3 H 395 8 100 4 16
# 4 H 506 8 700 4 16
# 5 H 835 8 800 4 16
# 6 H 989 8 700 4 16
dim(out)
# [1] 6496 188

download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv",destfile="reviews.csv",method="libcurl")

Trying to Download File via https

To download a file from a non protected url do something like:

import requests
url = 'http://somewebsite.org'
user, password = 'bob', 'I love cats'
resp = requests.get(url, auth=(user, password))
with open("result.zip", "wb") as fout:
fout.write(resp.content)

If course you should check whether you got a valid response before writing the zip file.

For a considerable amount of websites with login following recipe will work:
However if asite.com uses too much javascript, this might not necessarily work.

Use a requests session in order to store any session cookies and perform following three steps.

  1. GET the login url. This will get potential session cookies or CSRF protection cookies
  2. POST to the login url with the username and password. the name of the forms to be posted depend on the page. Use your web browser in debug mode to learn about the right values that you have to post, this can be more parameters than username and password
  3. List item
  4. GET the document url and save the result to a file.

On Firefox for example you go to the website you want to login, you press F12 (for debug mode), click on the network tab and then on reload.
You might

Fill in the login form and submit and look in the debug panel for a POST request.

The generic python code would look like.
import requests

def login_and_download():
ses = requests.session()

# Step 1 get the login page
rslt = ses.get("https://www.asite.com/login-home")
# now any potentially required cookie will be set

if rslt.status_code != 200:
print("failed getting login page")
return False

# for simple pages you can procedd to login
# for a little more complicated pages you might have to parse the
# HTML
# for really annoying pages that use loads of javascript it might be
# even more complicated


# Step 2 perform a post request to login
login_post_url = # This depends on the site you want to connect to. you have analyze the login
# procedure
rslt = ses.post(login_post_url)

if rslt.status_code != 200:
print("failed logging in")
return False

# Step 3 download the url, that you want to get.
rslt = ses.get(url_of_your_document)
if rslt.status_code != 200:
print("failed fetching the file")
return False
with open("result.zip", "wb") as fout:
fout.write(resp.content)

Unable to download file from URL using python

Check this, It's worked for me.

import requests
headers = {
"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'}
response = requests.get(
"https://www.cmegroup.com/content/dam/cmegroup/notices/clearing/2020/08/Chadv20-239.pdf", headers=headers)
pdf = open("Chadv20-239.pdf", 'wb')
pdf.write(response.content)
pdf.close()

download.file works only over https

I'm posting my comment above as answer.

Please refer to this bug report : http://github.com/joshuaulrich/quantmod/issues/83

It seems there is an issue with curl.

If wget is working fine, you can follow the same advice and try the R command options(download.file.method="wget") to make quantmod download using wget instead of curl.

Download File Using JavaScript/jQuery

Use an invisible <iframe>:

<iframe id="my_iframe" style="display:none;"></iframe>
<script>
function Download(url) {
document.getElementById('my_iframe').src = url;
};
</script>

To force the browser to download a file it would otherwise be capable of rendering (such as HTML or text files), you need the server to set the file's MIME Type to a nonsensical value, such as application/x-please-download-me or alternatively application/octet-stream, which is used for arbitrary binary data.

If you only want to open it in a new tab, the only way to do this is for the user to a click on a link with its target attribute set to _blank.

In jQuery:

$('a#someID').attr({target: '_blank', 
href : 'http://localhost/directory/file.pdf'});

Whenever that link is clicked, it will download the file in a new tab/window.

How to download a file over HTTP?

Use urllib.request.urlopen():

import urllib.request
with urllib.request.urlopen('http://www.example.com/') as f:
html = f.read().decode('utf-8')

This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers.

On Python 2, the method is in urllib2:

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()


Related Topics



Leave a reply



Submit