Using Selenium in Python to Save a Webpage on Firefox

Save a Web Page with Python Selenium

Unfortunately, you can't do what you want with Selenium alone. You can use driver.page_source to get the HTML, but that is all you get: images, stylesheets and other resources are not saved with it.
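
If the HTML alone is enough, it can simply be written to disk. The sketch below assumes an already running driver; save_page_source is a hypothetical helper name, not part of Selenium, and it saves only what page_source returns, not a "complete" page.

```python
from pathlib import Path


def save_page_source(driver, path, encoding="utf-8"):
    """Write the HTML that Selenium reports for the current page to `path`.

    Hypothetical helper: images, stylesheets and scripts referenced by
    the page are NOT downloaded, only the HTML string itself.
    """
    html = driver.page_source  # HTML of the current page as reported by the driver
    target = Path(path)
    target.write_text(html, encoding=encoding)
    return target
```

For example, save_page_source(driver, "page.html") after driver.get(...) has loaded the page.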

Selenium also can't interact with the native dialog that Firefox opens when you choose Save As.

You can use the following to bring up the dialog, but you will then need something like AutoIt to complete it:

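from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

# Press Ctrl+S in the page (assumes an existing `driver`) to request the
# "Save As" dialog; the dialog itself must be handled outside Selenium.
saveas = (ActionChains(driver)
          .key_down(Keys.CONTROL)
          .send_keys('s')
          .key_up(Keys.CONTROL))
saveas.perform()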

using Selenium, Firefox, Python to save download of EPS files to disk after automated clicking of download link

Thank you to @unutbu for helping me solve this. I just didn't understand the anatomy of a file download. I do understand a little bit better now.

I ended up installing an extension called "Live HTTP Headers" on Firefox to examine the headers sent by the server. As it turned out, the 'EPS' files were sent with a 'Content-Type' of 'application/octet-stream'.
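
The same check can be sketched in plain Python with the standard library's urllib, instead of a browser extension. This assumes the server answers HEAD requests (some servers do not, in which case a GET is needed); bare_mime_type and fetch_content_type are hypothetical helper names. The bare type, with parameters such as "; charset=..." stripped, is the value to add to the preference list.

```python
import urllib.request


def bare_mime_type(content_type_header):
    """Strip parameters and normalise case:
    'Application/Octet-Stream; charset=binary' -> 'application/octet-stream'."""
    return content_type_header.split(";", 1)[0].strip().lower()


def fetch_content_type(url, timeout=10):
    """Send a HEAD request and return the bare MIME type the server reports.

    Assumes the server supports HEAD; returns '' if no Content-Type is sent.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bare_mime_type(response.headers.get("Content-Type", ""))
```

For the EPS files described here, fetch_content_type(eps_url) would be expected to return "application/octet-stream", where eps_url stands for the download link's URL.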

I changed the Firefox preferences to the following, and now the EPS files are saved to disk as expected:

from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference('browser.helperApps.neverAsk.saveToDisk',
                       'image/jpeg,image/png,application/octet-stream')
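
A sketch of the same setup with Selenium 4's Options object is below. Besides the preference from this answer, it assumes two further Firefox preferences that are commonly paired with it, browser.download.folderList (2 means "use a custom folder") and browser.download.dir; these are not part of the original answer. The helper names are hypothetical, and whether Firefox still honours these preferences may depend on its version, so they are worth verifying against the version you run.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Build a dict of Firefox preferences that save `mime_types`
    to `download_dir` without a prompt (hypothetical helper)."""
    return {
        # Assumed extra preference: 2 = save into the folder in browser.download.dir
        "browser.download.folderList": 2,
        # Assumed extra preference: target folder for downloads
        "browser.download.dir": str(download_dir),
        # From the answer: comma-separated MIME types saved without asking
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_firefox_options(download_dir, mime_types):
    """Apply the preferences to Selenium 4 Firefox options.

    The import is local so the dict builder above works without Selenium.
    """
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in firefox_download_prefs(download_dir, mime_types).items():
        options.set_preference(name, value)
    return options
```

Usage would look like webdriver.Firefox(options=make_firefox_options("/tmp/eps", ["image/jpeg", "image/png", "application/octet-stream"])), where the folder path is only an example.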

How to download the PDF by using Selenium Module (FireFox) in Python 3

Apart from Tarun's solution, you can also download the file with JavaScript and store it as a blob, then pull the data into Python with Selenium's execute_script, as shown in this answer.

In your case:

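import json

# Raw string so the backslashes in the query are kept literally
# (in a normal string, '\131' would be read as an octal escape).
url = r'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\131INTR.pdf'

# The XHR runs asynchronously and stores the file on window as a data URL.
# It is subject to the same-origin policy, so the browser should already
# be on a page from the same site.
driver.execute_script("""
window.file_contents = null;
var xhr = new XMLHttpRequest();
xhr.responseType = 'blob';
xhr.onload = function() {
    var reader = new FileReader();
    reader.onloadend = function() {
        window.file_contents = reader.result;
    };
    reader.readAsDataURL(xhr.response);
};
xhr.open('GET', %(download_url)s);
xhr.send();
""" % {'download_url': json.dumps(url)})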

Once the request has finished, the file is stored on the window object as a base64-encoded data URL, so you can extract it into Python:

import base64
import time

time.sleep(3)  # crude wait for the asynchronous XHR to finish
downloaded_file = driver.execute_script(
    "return (window.file_contents !== null ? window.file_contents.split(',')[1] : null);")
with open('/Users/Chetan/Desktop/dummy.pdf', 'wb') as f:
    f.write(base64.b64decode(downloaded_file))
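
A fixed time.sleep(3) may be too short for a large file or a slow server, and if the request fails, file_contents simply stays null. A sketch of polling with a timeout instead is below; wait_for and decode_data_url are hypothetical helpers, not Selenium APIs. Selenium's own WebDriverWait(driver, 30).until(...) would serve the same polling purpose.

```python
import base64
import time


def wait_for(poll, timeout=30.0, interval=0.5):
    """Call poll() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        value = poll()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        time.sleep(interval)


def decode_data_url(data_url):
    """Return the bytes of a base64 data URL such as 'data:application/pdf;base64,...'."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload)


# Usage with the script above (assumes an existing `driver`):
# data_url = wait_for(lambda: driver.execute_script("return window.file_contents;"))
# with open("dummy.pdf", "wb") as f:
#     f.write(decode_data_url(data_url))
```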

