RSelenium on docker: where are files downloaded?
The Docker container is separate from the host that runs it, so files the browser downloads are saved inside the container. To reach them, map a directory on the host to the directory in the container where the files are downloaded.
You can do this when starting your container:
docker run -d -p 4445:4444 -p 5901:5900 -v /home/john/test:/home/seluser/Downloads selenium/standalone-firefox-debug:2.53.1
Here (I am running Docker on Linux) I have mapped a directory on my Linux host (/home/john/test) to a directory in the container (/home/seluser/Downloads).
We then need to add the necessary information to the Firefox profile:
library(RSelenium)
# Firefox preferences: save to the mapped container directory without prompting
ePrefs <- makeFirefoxProfile(
  list(
    "browser.download.dir" = "/home/seluser/Downloads",
    "browser.download.folderList" = 2L,
    "browser.download.manager.showWhenStarting" = FALSE,
    "browser.helperApps.neverAsk.saveToDisk" = "multipart/x-zip,application/zip,application/x-zip-compressed,application/x-compressed,application/msword,application/csv,text/csv,image/png,image/jpeg,application/pdf,text/html,text/plain,application/excel,application/vnd.ms-excel,application/x-excel,application/x-msexcel,application/octet-stream"
  )
)
# Port 4445 on the host is mapped to 4444 in the container
remDr <- remoteDriver(extraCapabilities = ePrefs, port = 4445)
remDr$open()
remDr$navigate("http://www.colorado.edu/conflict/peace/download/")
firstzip <- remDr$findElement("xpath", "//a[contains(@href, 'zip')]")
firstzip$clickElement()
We can check if the download is on the HOST machine:
> list.files("/home/john/test/")
[1] "peace.zip"
Accessing file downloads from containerized RSpec/Capybara and Selenium Chrome
The following modifications resolved my issue.
spec/spec_helper.rb
Pass the following prefs in chromeOptions:
"prefs" => {
  'download.default_directory' => '/tmp',
  'download.directory_upgrade' => true,
  'download.prompt_for_download' => false
}
Here is the complete file
require 'colorize'
require 'capybara/dsl'
require 'capybara/rspec'
require 'byebug'
RSpec.configure do |config|
  config.color = true
  config.tty = true
  config.formatter = :documentation
  config.include Capybara::DSL
end
def create_web_session
  Capybara.app_host = 'https://github.com'
  Capybara.run_server = false # don't start Rack
  if ENV['CHROME_URL']
    Capybara.register_driver :selenium_chrome_headless do |app|
      args = [
        '--no-default-browser-check',
        '--start-maximized',
        '--headless',
        '--disable-dev-shm-usage',
        '--whitelisted-ips'
      ]
      caps = Selenium::WebDriver::Remote::Capabilities.chrome("chromeOptions" => {
        "args" => args,
        "prefs" => {
          'download.default_directory' => '/tmp',
          'download.directory_upgrade' => true,
          'download.prompt_for_download' => false
        }
      })
      Capybara::Selenium::Driver.new(
        app,
        browser: :remote,
        desired_capabilities: caps,
        url: ENV['CHROME_URL']
      )
    end
  end
  @session = Capybara::Session.new(:selenium_chrome_headless)
  #@session = Capybara::Session.new(:selenium_chrome)
end
spec/test/demo_spec.rb
Change directory to /tmp and look for the download in /tmp
require 'spec_helper.rb'
require 'webdrivers/chromedriver'
sleep 1
RSpec.describe 'basic_tests', type: :feature do
  before(:each) do
    @session = create_web_session
    Dir.chdir "/tmp"
  end
  it 'Load page' do
    @session.visit '/docker/compose/releases/tag/1.27.0'
    @session.find_link('Source code (zip)')
    @session.click_link('Source code (zip)')
    sleep 3 # give the download time to finish
    f = File.join('/tmp','compose-1.27.0.zip')
    # File.exist? replaces File.exists?, which was removed in Ruby 3.2
    expect(File.exist?(f)).to be true
    File.delete(f)
  end
end
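The fixed sleep 3 in the spec above can be too short for a slow download and wastes time on a fast one. A minimal sketch of a polling alternative follows. wait_for_download is a hypothetical helper, not part of Capybara or RSpec, and it assumes a download is complete once the file exists with a non-zero size that stays the same between two checks (so a legitimately empty file would time out).

```ruby
# Hypothetical helper (not part of Capybara or RSpec): poll until `path`
# exists with a non-zero size that is unchanged between two consecutive
# checks, or raise once `timeout` seconds have passed.
def wait_for_download(path, timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  last_size = nil
  loop do
    if File.exist?(path)
      size = File.size(path)
      # Two identical non-zero readings in a row are treated as "finished"
      return path if size.positive? && size == last_size
      last_size = size
    end
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise "Timed out after #{timeout}s waiting for #{path}"
    end
    sleep interval
  end
end
```

In the spec, the sleep and the existence check could then become f = wait_for_download(File.join('/tmp', 'compose-1.27.0.zip')), which fails with a clear message if the file never appears in the shared volume.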
docker-compose.yml
Share /tmp as a docker volume between the rspec and chrome containers
version: '3.7'
networks:
  mynet:
volumes:
  downloads:
services:
  rspec-chrome:
    container_name: rspec-chrome
    image: rspec-chrome
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      CHROME_URL: http://chrome:4444/wd/hub
    stdin_open: true
    tty: true
    networks:
      mynet:
    depends_on:
      - chrome
    volumes:
      - downloads:/tmp
  chrome:
    container_name: chrome
    image: selenium/standalone-chrome
    networks:
      mynet:
    volumes:
      - /dev/shm:/dev/shm
      - downloads:/tmp
Downloading data using RSelenium & Docker containers (makeFirefoxProfile & mime types)
It turned out that the MIME types were different on each website. For School 1, Dataset A, the file I was trying to download was a standard csv file (text/csv). The other schools/datasets were all application/x-csv MIME types.
To find the MIME type of the file I was attempting to download, I followed these steps: https://developer.mozilla.org/en-US/docs/Learn/Server-side/Configuring_server_MIME_types.
There is also a known bug when it comes to specifying the file location. My final code looked like this:
library(RSelenium)
library(stringr) # provides str_replace_all() and re-exports %>%
# Replace each forward slash in the working directory path with doubled backslashes
file_path <- getwd() %>% str_replace_all("/", "\\\\\\\\")
#Set download info for remoteDriver (aka, where to save datasets)
fprof <- makeFirefoxProfile(list(
  browser.download.dir = file_path,
  browser.download.folderList = 2L,
  browser.download.manager.showWhenStarting = FALSE,
  browser.helperApps.neverAsk.saveToDisk = "application/x-csv,attachment/csv,application/excel,text/csv,application/vnd.ms-excel,application/vnd.ms-excel.addin.macroenabled.12,application/vnd.ms-excelsheet.binary.macroenabled.12,application/vnd.ms-excel.template.macroenabled.12,application/vnd.ms-excel.sheet.macroenabled.12,image/png,application/zip,application/pdf"))
Note that I added application/x-csv,attachment/csv to the browser.helperApps.neverAsk.saveToDisk list.
Important: It is worth adding application/csv to that list. I did not do it here, but had to do it when downloading another file of MIME type text/html; charset=UTF-8 to get it to download.
I also ditched Docker by downloading Java and then switching to rsDriver() so that I could watch R click through the browser instead of screenshotting each step (this is not necessary).
Lastly, I believe that you should not have the same file types for browser.helperApps.neverAsk.openFile and browser.helperApps.neverAsk.saveToDisk, because they contradict each other. Since I wanted it to automatically save the file, I needed to only include browser.helperApps.neverAsk.saveToDisk.
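Since the MIME types differed from site to site, it can help to collect the Content-Type values you observe and build the preference string from them. The sketch below is written in Ruby, to match the Capybara example earlier on this page, rather than R. Both function names are hypothetical, and it assumes the preference should list bare types, without parameters such as charset.

```ruby
# Hypothetical helpers for illustration; not part of RSelenium or Selenium.

# Strip parameters such as "; charset=UTF-8" from a Content-Type value
# and normalise whitespace and case.
def base_mime_type(content_type)
  content_type.to_s.split(';').first.to_s.strip.downcase
end

# Build a comma-separated value suitable for
# browser.helperApps.neverAsk.saveToDisk from observed Content-Type values.
def never_ask_list(content_types)
  content_types.map { |ct| base_mime_type(ct) }.reject(&:empty?).uniq.join(',')
end

observed = ['text/csv', 'application/x-csv', 'text/html; charset=UTF-8', ' Text/CSV ']
puts never_ask_list(observed)
# prints: text/csv,application/x-csv,text/html
```

Building the string this way also avoids stray spaces and duplicate entries in a long hand-edited list.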
How to set the download directory in docker selenium
You can add a volume to your selenium container that maps a folder on the host to a folder in the container, so that your service definition looks like:
selenium:
  image: selenium/standalone-chrome
  ports:
    - 4444:4444
  restart: always
  volumes:
    - /path/to/host/folder:/Downloads
All files that the container process writes to /Downloads will then appear in /path/to/host/folder on your host.
Problem downloading file from a new tab with Rselenium
I was able to download the Excel file with:
library(RSelenium)
driver <- rsDriver(browser = c("chrome"), port = 4341L)
remDr <- driver$client
# Select the dropdown menu,
remDr$findElement(using = "xpath",'//*[@id="ReportViewer1_ctl05_ctl04_ctl00"]') -> dropdown
dropdown$clickElement()
# Select the Excel file
remDr$findElement(using = "xpath",'//*[@id="ReportViewer1_ctl05_ctl04_ctl00_Menu"]/div[1]') -> exceldownload
exceldownload$clickElement()