Downloading with Chrome Headless and Selenium

Yes, this is a deliberate "feature", for security reasons. As mentioned earlier, the bug discussion is here: https://bugs.chromium.org/p/chromium/issues/detail?id=696481

Support for enabling downloads in headless mode was added in Chrome version 62.0.3196.0 and above.

Here is a Python implementation. I had to register the command with the chromedriver commands myself. I will try to submit a PR so that it is included in the library in the future.

def enable_download_in_headless_chrome(self, driver, download_dir):
    # Add the missing Chrome "send_command" support to the Selenium webdriver
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')

    # Allow downloads in headless mode and save them to download_dir
    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
    command_result = driver.execute("send_command", params)

For reference here is a little repo to demonstrate how to use this:
https://github.com/shawnbutton/PythonHeadlessChrome

Update 2020-05-01: There have been comments saying this no longer works. Given that this patch is now over a year old, it is quite possible that the underlying library has changed.
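
If the patched "send_command" route stops working, one thing to try is sending the same DevTools command through execute_cdp_cmd, which recent Selenium Python bindings expose on the Chrome driver. This is a minimal sketch, not a confirmed fix: the helper name allow_headless_downloads is made up for illustration, and whether Page.setDownloadBehavior is still honored depends on your Chrome and chromedriver versions.

```python
import os


def allow_headless_downloads(driver, download_dir):
    """Ask Chrome to save downloads into download_dir via a DevTools command.

    Assumes `driver` offers execute_cdp_cmd(cmd, cmd_args), as the Chrome
    driver in recent Selenium releases does. Hypothetical helper name.
    """
    params = {
        "behavior": "allow",
        # An absolute path avoids any ambiguity about the working directory.
        "downloadPath": os.path.abspath(download_dir),
    }
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

With a real driver this would be called once, after the driver is created and before driver.get(...).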

Python Selenium Headless download

In case anybody is interested: after two days of searching :), I managed to make it work!

I found the answer in the bug tracker, in this comment: https://bugs.chromium.org/p/chromium/issues/detail?id=696481#c86

The code I used is:

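from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def enable_download_headless(browser, download_dir):
    # Register chromedriver's "send_command" endpoint with the Selenium webdriver
    browser.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    # Allow downloads in headless mode and save them to download_dir
    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
    browser.execute("send_command", params)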

if __name__ == '__main__':
    options = Options()
    options.add_argument("--disable-notifications")
    options.add_argument('--no-sandbox')
    options.add_argument('--verbose')
    options.add_experimental_option("prefs", {
        "download.default_directory": "C:\\tmp",
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing_for_trusted_sources_enabled": False,
        "safebrowsing.enabled": False
    })
    options.add_argument('--disable-gpu')
    options.add_argument('--disable-software-rasterizer')
    options.add_argument('--headless')
    driver_path = "C:\\Users\\tmp\\chromedriver.exe"
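    # `options=` replaces the deprecated `chrome_options=` keyword
    driver = webdriver.Chrome(driver_path, options=options)
    enable_download_headless(driver, "C:/tmp")
    # `url` is not defined in this snippet; set it to the page that triggers the download
    driver.get(url)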

Maybe this will be of some use to others in the future.
There are probably a lot of unnecessary options in there, but I haven't had time to trim them yet :).

Download file through RemoteWebDriver and chrome in headless mode

Solution:

Create a shared folder on the machine where the Selenium node is running (or on a machine it has access to), and set the download path preference to that network path:

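// Backslashes must be escaped in Java string literals; this is the UNC path \\localhost\folderfordownload
String downloadFilepath = "\\\\localhost\\folderfordownload";
chromePrefs.put("download.default_directory", downloadFilepath);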
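
The same preference can be set from Python. This is a minimal sketch, assuming the same share name as in the Java snippet; network_download_prefs is a hypothetical helper name, and the raw string keeps the backslashes of the UNC path intact.

```python
def network_download_prefs(share_path):
    """Build Chrome prefs that send downloads to share_path (hypothetical helper)."""
    return {
        "download.default_directory": share_path,
        "download.prompt_for_download": False,
    }


# Raw string: the backslashes in the UNC path are kept literally.
prefs = network_download_prefs(r"\\localhost\folderfordownload")
print(prefs["download.default_directory"])
```

The dictionary would then be passed to Chrome with options.add_experimental_option("prefs", prefs), as in the Python answer earlier in this thread.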

Headless mode and non headless mode have different behavior when downloading same file multiple times

Reasons

Based on your experiments and on my own experience:

Chrome and headless Chrome may behave differently, for many reasons. I have not found clear documentation, but some of the browser options and preferences supplied when the chromedriver session starts may be ignored by the headless browser, or may lead to different behaviour (as with downloads in your case). You can find several related questions on Stack Overflow and in GitHub issues that seem to support this assumption:

Headless chrome + ignore-certificate-errors
Selenium headless chrome error "Bootstrap's JavaScript requires jQuery"
Selenium Java - moveToElement does not work in headless but works in chrome
App that converts image to base64 string displays different results on browser's console when run in headed and headless mode. Does anyone know why?
etc.

The main reason I see is that Chrome is built on Chromium and adds many adjustments and customizations on top of it, whereas in headless mode it seems that a plain Chromium browser is launched (not Chrome).

How to deal with this behaviour

I assume you have access to the file system from Java (based on your code example).

1. If you only want to validate the file content, you can simply delete the downloaded file after validating it.

2. If you need to keep all the files, I suggest moving each newly downloaded file to another directory and renaming it as you like. That way you control the file names and can make sure they are all unique.

You could use the Apache Commons library to manage files:
https://zetcode.com/java/fileutils/
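
The second option can be sketched as follows. The answer refers to Java; this sketch uses Python's standard library instead, and archive_download is a hypothetical helper name, not part of any library. It assumes the download has already finished before it is called.

```python
import shutil
import uuid
from pathlib import Path


def archive_download(downloaded_file, archive_dir):
    """Move a finished download into archive_dir under a unique name.

    The unique name is the original stem plus a random suffix, so repeated
    downloads of the same file do not overwrite each other.
    """
    source = Path(downloaded_file)
    target_dir = Path(archive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{source.stem}-{uuid.uuid4().hex}{source.suffix}"
    shutil.move(str(source), str(target))
    return target
```

Because the original file is moved away each time, the browser always writes to a clean name, which sidesteps the differing rename behaviour between headless and non-headless mode.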

If you run tests in parallel, try to keep tests with download steps within a single thread, or use unique download directories per browser, or implement synchronization for download actions.
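
The "unique download directory per browser" option could look like the sketch below. Both helper names are hypothetical, and the prefs keys are the ones already used in the Python answer earlier in this thread.

```python
import tempfile


def new_download_dir(base_dir=None):
    """Create and return a fresh, empty directory for one browser session."""
    return tempfile.mkdtemp(prefix="downloads-", dir=base_dir)


def prefs_for_session(download_dir):
    """Chrome prefs pointing one browser at its own download directory."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
    }
```

Each parallel test would call new_download_dir() once, pass prefs_for_session(...) to its own browser options, and only ever look for files in that directory.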
