RSelenium on Docker: Where Are Files Downloaded

RSelenium on docker: where are files downloaded?

The Docker container is isolated from the host that runs it, so files the browser downloads are written to the container's filesystem. To reach them from the host, map a directory on the host to the directory in the container that receives the downloads.

You can do this when starting your container:

docker run -d -p 4445:4444 -p 5901:5900 -v /home/john/test:/home/seluser/Downloads selenium/standalone-firefox-debug:2.53.1

Here (I am running Docker on Linux) I have mapped a directory on my Linux host (/home/john/test) to a directory in the container (/home/seluser/Downloads).

We then need to add the necessary information to the Firefox profile:

library(RSelenium)

# Firefox preferences: save to the mapped container directory without prompting
ePrefs <- makeFirefoxProfile(
  list(
    "browser.download.dir" = "/home/seluser/Downloads",
    "browser.download.folderList" = 2L,  # 2 = use the custom directory above
    "browser.download.manager.showWhenStarting" = FALSE,
    "browser.helperApps.neverAsk.saveToDisk" = "multipart/x-zip,application/zip,application/x-zip-compressed,application/x-compressed,application/msword,application/csv,text/csv,image/png,image/jpeg,application/pdf,text/html,text/plain,application/excel,application/vnd.ms-excel,application/x-excel,application/x-msexcel,application/octet-stream"
  )
)

# Host port 4445 is mapped to the container's Selenium port 4444
remDr <- remoteDriver(extraCapabilities = ePrefs, port = 4445)
remDr$open()
remDr$navigate("http://www.colorado.edu/conflict/peace/download/")

# Click the first link whose href contains "zip" to trigger the download
firstzip <- remDr$findElement("xpath", "//a[contains(@href, 'zip')]")
firstzip$clickElement()

We can check if the download is on the HOST machine:

> list.files("/home/john/test/")
[1] "peace.zip"

Accessing file downloads from containerized RSpec/Capybara and Selenium Chrome

The following modifications resolved my issue.

spec/spec_helper.rb

Pass the following prefs in chromeOptions:

"prefs" => {
  'download.default_directory' => '/tmp',
  'download.directory_upgrade' => true,
  'download.prompt_for_download' => false
}

Here is the complete file:

require 'colorize'
require 'capybara/dsl'
require 'capybara/rspec'
require 'byebug'

RSpec.configure do |config|
  config.color = true
  config.tty = true
  config.formatter = :documentation
  config.include Capybara::DSL
end

def create_web_session
  Capybara.app_host = 'https://github.com'
  Capybara.run_server = false # don't start Rack

  if ENV['CHROME_URL']
    Capybara.register_driver :selenium_chrome_headless do |app|
      args = [
        '--no-default-browser-check',
        '--start-maximized',
        '--headless',
        '--disable-dev-shm-usage',
        '--whitelisted-ips'
      ]
      caps = Selenium::WebDriver::Remote::Capabilities.chrome("chromeOptions" => {
        "args" => args,
        "prefs" => {
          'download.default_directory' => '/tmp',
          'download.directory_upgrade' => true,
          'download.prompt_for_download' => false
        }
      })

      Capybara::Selenium::Driver.new(
        app,
        browser: :remote,
        desired_capabilities: caps,
        url: ENV['CHROME_URL']
      )
    end
  end
  @session = Capybara::Session.new(:selenium_chrome_headless)
  #@session = Capybara::Session.new(:selenium_chrome)
end

spec/test/demo_spec.rb

Change directory to /tmp and look for the download in /tmp

require 'spec_helper.rb'
require 'webdrivers/chromedriver'

sleep 1

RSpec.describe 'basic_tests', type: :feature do
  before(:each) do
    @session = create_web_session
    Dir.chdir "/tmp"
  end

  it 'Load page' do
    @session.visit '/docker/compose/releases/tag/1.27.0'
    @session.find_link('Source code (zip)')
    @session.click_link('Source code (zip)')
    sleep 3 # give the download time to finish
    f = File.join('/tmp', 'compose-1.27.0.zip')
    # File.exists? was removed in Ruby 3.2; File.exist? works on all versions
    expect(File.exist?(f)).to be true
    File.delete(f)
  end
end
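
The fixed sleep 3 in the spec above can be too short for a slow download and is wasted time for a fast one. Below is a minimal sketch of a polling alternative. The helper name wait_for_download and its timeout and interval parameters are my own assumptions, not part of Capybara or Selenium. It treats the download as finished once the file exists and its size is unchanged between two checks, which is a heuristic rather than a guarantee.

```ruby
# Hypothetical helper (not part of Capybara or Selenium): poll the shared
# download directory until the file exists and its size has stopped changing.
def wait_for_download(path, timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  last_size = nil
  loop do
    if File.exist?(path)
      size = File.size(path)
      # Same non-zero size on two consecutive checks: assume the write is done.
      return path if size.positive? && size == last_size
      last_size = size
    end
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise "download did not finish within #{timeout}s: #{path}"
    end
    sleep interval
  end
end
```

In the spec, the sleep 3 line could then become wait_for_download(f, timeout: 30), placed after f is assigned.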

docker-compose.yml

Share /tmp as a docker volume between the rspec and chrome containers

version: '3.7'
networks:
  mynet:
volumes:
  downloads:
services:
  rspec-chrome:
    container_name: rspec-chrome
    image: rspec-chrome
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      CHROME_URL: http://chrome:4444/wd/hub
    stdin_open: true
    tty: true
    networks:
      mynet:
    depends_on:
      - chrome
    volumes:
      - downloads:/tmp
  chrome:
    container_name: chrome
    image: selenium/standalone-chrome
    networks:
      mynet:
    volumes:
      - /dev/shm:/dev/shm
      - downloads:/tmp

Downloading data using RSelenium & Docker containers (makeFirefoxProfile & mime types)

It turned out that the MIME types were different on each website. For School 1, Dataset A, the file I was trying to download was a standard csv file (text/csv). The other schools/datasets were all application/x-csv MIME types.

To find the MIME type of the file I was attempting to download, I followed these steps: https://developer.mozilla.org/en-US/docs/Learn/Server-side/Configuring_server_MIME_types.
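
Another way to check what a server reports, separate from the steps linked above, is to read the Content-Type response header directly. The sketch below is an illustration only: media_type and remote_media_type are hypothetical helper names, the HEAD request needs network access and does not follow redirects, and some servers answer HEAD differently from GET.

```ruby
require 'net/http'
require 'uri'

# Strip parameters such as "; charset=UTF-8" and lowercase the media type,
# leaving the bare value that a MIME type list would contain.
def media_type(content_type)
  content_type.to_s.split(';').first.to_s.strip.downcase
end

# Send a HEAD request and return the bare media type the server reports.
# Defined for illustration; it is not called here because it needs network access.
def remote_media_type(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
  media_type(response['content-type'])
end

puts media_type('text/html; charset=UTF-8') # prints "text/html"
puts media_type('Application/X-CSV')        # prints "application/x-csv"
```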

There is also a known bug when it comes to specifying the file location. My final code looked like this:

library(RSelenium)
library(stringr)  # str_replace_all(); stringr also re-exports the %>% pipe

# Replace the forward slashes in the working directory path with backslashes
file_path <- getwd() %>% str_replace_all("/", "\\\\\\\\")

# Set download info for remoteDriver (i.e., where to save datasets)
fprof <- makeFirefoxProfile(list(
  browser.download.dir = file_path,
  browser.download.folderList = 2L,
  browser.download.manager.showWhenStarting = FALSE,
  browser.helperApps.neverAsk.saveToDisk = "application/x-csv,attachment/csv,application/excel,text/csv,application/vnd.ms-excel,application/vnd.ms-excel.addin.macroenabled.12,application/vnd.ms-excelsheet.binary.macroenabled.12,application/vnd.ms-excel.template.macroenabled.12,application/vnd.ms-excel.sheet.macroenabled.12,image/png,application/zip,application/pdf"
))

Note that I added application/x-csv,attachment/csv to the browser.helperApps.neverAsk.saveToDisk list.

Important: It is worth adding application/csv to that list. I did not do it here, but had to do it when downloading another file of MIME type text/html; charset=UTF-8 to get it to download.

I also ditched Docker by installing Java and switching to rsDriver(), so that I could watch R click through the browser instead of taking a screenshot at each step (this is not necessary).

Lastly, I believe that you should not have the same file types for browser.helperApps.neverAsk.openFile and browser.helperApps.neverAsk.saveToDisk, because they contradict each other. Since I wanted it to automatically save the file, I needed to only include browser.helperApps.neverAsk.saveToDisk.
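
To check two such lists against each other, something like the following sketch could help. The helper names parse_mime_list and overlapping_types and the sample values are made up for illustration. The code cleans up a comma-separated preference value (stray spaces, mixed case, duplicates) and reports any MIME type that appears in both lists.

```ruby
# Split a comma-separated preference value into clean, unique MIME types.
def parse_mime_list(value)
  value.split(',').map { |type| type.strip.downcase }.reject(&:empty?).uniq
end

# Return the MIME types that appear in both preference values.
def overlapping_types(save_to_disk, open_file)
  parse_mime_list(save_to_disk) & parse_mime_list(open_file)
end

# Sample values, made up for this example.
save_list = 'application/x-csv, text/csv ,application/zip'
open_list = 'text/csv,application/pdf'

p overlapping_types(save_list, open_list) # prints ["text/csv"]
puts parse_mime_list(save_list).join(',') # prints application/x-csv,text/csv,application/zip
```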

How to set the download directory in docker selenium

You can add a volume to your Selenium container that maps a folder on the host to a folder in the container, so that the service definition looks like this:

selenium:
  image: selenium/standalone-chrome
  ports:
    - 4444:4444
  restart: always
  volumes:
    - /path/to/host/folder:/Downloads

Any file that a process in the container writes to /Downloads then appears in /path/to/host/folder on the host.

Problem downloading file from a new tab with Rselenium

I was able to download the Excel file with the following:

library(RSelenium)
driver <- rsDriver(browser = c("chrome"), port = 4341L)
remDr <- driver$client

# Select the dropdown menu

remDr$findElement(using = "xpath",'//*[@id="ReportViewer1_ctl05_ctl04_ctl00"]') -> dropdown
dropdown$clickElement()

# Select the Excel file

remDr$findElement(using = "xpath",'//*[@id="ReportViewer1_ctl05_ctl04_ctl00_Menu"]/div[1]') -> exceldownload
exceldownload$clickElement()

