Selenium Webdriver: How to Download a PDF File with Python

How to download file in pdf with selenium edge web driver in specific custom folder in python selenium?

NEW (works on edge)

To use this you have to install pyautogui library with the command pip install pyautogui

import time
import pyautogui
from selenium import webdriver

driver = webdriver.Edge()

pdf_url = 'http://www.africau.edu/images/default/sample.pdf'
driver.get(pdf_url)

time.sleep(3)

pyautogui.hotkey('ctrl', 's')
time.sleep(2)
path_and_filename = r'C:\Users\gt\Desktop\test.pdf'
pyautogui.typewrite(path_and_filename)
pyautogui.press('enter')

OLD (works on chrome)

This is the code I use to automatically download a pdf to a specific path. If you have windows, just put your account name in r'C:\Users\...\Desktop'. Moreover, you have to put the path of your driver in chromedriver_path. The code below downloads a sample pdf.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
download_path = r'C:\Users\...\Desktop'
options.add_experimental_option('prefs', {
"download.default_directory": download_path, # change default directory for downloads
"download.prompt_for_download": False, # to auto download the file
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True # it will not show PDF directly in chrome
})

chromedriver_path = '...'
driver = webdriver.Chrome(options=options, service=Service(chromedriver_path))

pdf_url = 'http://www.africau.edu/images/default/sample.pdf'
driver.get(pdf_url)

Downloading a PDF using Selenium, Chrome and Python

You need to replace "plugins.plugins_disabled": ["Chrome PDF Viewer"]

With:

"plugins.always_open_pdf_externally": True

Hope this helps you!

How to download PDF from url in python

I fixed it, i only changed one function. The correct url is in the given page_source of the driver (with beautifulsoup you can parse html, xml etc.):

from bs4 import BeautifulSoup

def extract_url(driver):
soup = BeautifulSoup(driver.page_source, "html.parser")
object_element = soup.find("object")
data = object_element.get("data")

return f"https://webice.ongc.co.in{data}"

The hostname part may can be extracted from the driver.
I think i did not changed anything else, but if it not work for you, I can paste the full code.

Old Answer:

if you print the text of the returned page (print(driver.page_source)) i think you would get a message that says something like:
"Because of your system configuration the pdf can't be loaded"

This is because the requested site checks some preferences to decide if you are a roboter or not. Maybe it helps to change some arguments (screen size, user agent) to fix this. Here are some information about, how to detect a headless browser.

And for the next time you should paste all relevant code into the question (imports) to make it easier to test.

How to download PDF by clicking using selenium chrome driver

To download the Domestic Fixed Deposits pdf from the website you need to click on the element inducing WebDriverWait for the element_to_be_clickable() and you can use the following locator strategy:

options = Options()
options.add_argument("start-maximized")
options.add_experimental_option('prefs', {
"download.default_directory": "C:\\Datafiles",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True
})
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.axisbank.com/interest-rate-on-deposits?cta=homepage-rhs-fd")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//h2[text()='Domestic Fixed Deposits']//following-sibling::div[1]/span"))).click()

Note: You have to add the following imports :

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Browser Snapshot:

fixed deposit



Related Topics



Leave a reply



Submit