How to download file in pdf with selenium edge web driver in specific custom folder in python selenium?
NEW (works on edge)
To use this you have to install pyautogui
library with the command pip install pyautogui
import time
import pyautogui
from selenium import webdriver
driver = webdriver.Edge()
pdf_url = 'http://www.africau.edu/images/default/sample.pdf'
driver.get(pdf_url)
time.sleep(3)
pyautogui.hotkey('ctrl', 's')
time.sleep(2)
path_and_filename = r'C:\Users\gt\Desktop\test.pdf'
pyautogui.typewrite(path_and_filename)
pyautogui.press('enter')
OLD (works on chrome)
This is the code I use to automatically download a pdf to a specific path. If you have windows, just put your account name in r'C:\Users\...\Desktop'
. Moreover, you have to put the path of your driver in chromedriver_path
. The code below downloads a sample pdf.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
options = webdriver.ChromeOptions()
download_path = r'C:\Users\...\Desktop'
options.add_experimental_option('prefs', {
"download.default_directory": download_path, # change default directory for downloads
"download.prompt_for_download": False, # to auto download the file
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True # it will not show PDF directly in chrome
})
chromedriver_path = '...'
driver = webdriver.Chrome(options=options, service=Service(chromedriver_path))
pdf_url = 'http://www.africau.edu/images/default/sample.pdf'
driver.get(pdf_url)
Downloading a PDF using Selenium, Chrome and Python
You need to replace "plugins.plugins_disabled": ["Chrome PDF Viewer"]
With:
"plugins.always_open_pdf_externally": True
Hope this helps you!
How to download PDF from url in python
I fixed it, i only changed one function. The correct url is in the given page_source of the driver (with beautifulsoup you can parse html, xml etc.):
from bs4 import BeautifulSoup
def extract_url(driver):
soup = BeautifulSoup(driver.page_source, "html.parser")
object_element = soup.find("object")
data = object_element.get("data")
return f"https://webice.ongc.co.in{data}"
The hostname part may can be extracted from the driver.
I think i did not changed anything else, but if it not work for you, I can paste the full code.
Old Answer:
if you print the text of the returned page (print(driver.page_source)
) i think you would get a message that says something like:
"Because of your system configuration the pdf can't be loaded"
This is because the requested site checks some preferences to decide if you are a roboter or not. Maybe it helps to change some arguments (screen size, user agent) to fix this. Here are some information about, how to detect a headless browser.
And for the next time you should paste all relevant code into the question (imports) to make it easier to test.
How to download PDF by clicking using selenium chrome driver
To download the Domestic Fixed Deposits pdf from the website you need to click on the element inducing WebDriverWait for the element_to_be_clickable() and you can use the following locator strategy:
options = Options()
options.add_argument("start-maximized")
options.add_experimental_option('prefs', {
"download.default_directory": "C:\\Datafiles",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True
})
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.axisbank.com/interest-rate-on-deposits?cta=homepage-rhs-fd")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//h2[text()='Domestic Fixed Deposits']//following-sibling::div[1]/span"))).click()
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Browser Snapshot:
Related Topics
Matplotlib Connect Scatterplot Points with Line - Python
How to Extract Info Within a #Shadow-Root (Open) Using Selenium Python
How to Make a Python Script Executable
Creating Lowpass Filter in Scipy - Understanding Methods and Units
What Is the Most Efficient Way of Counting Occurrences in Pandas
How to Leave/Exit/Deactivate a Python Virtualenv
Re.Sub Erroring with "Expected String or Bytes-Like Object"
Embedding Ipython Qt Console in a Pyqt Application
How to Use Valgrind with Python
How to Change the Host and Port That the Flask Command Uses
Reading Two Text Files Line by Line Simultaneously
How to Specify "Nullable" Return Type with Type Hints
Do I Need to Import Submodules Directly
How to Run a Python Script in a Web Page