Run RSelenium in parallel
Start a remoteDriver on each node in the cluster:
library(RSelenium)
library(rvest)
library(magrittr)
library(foreach)
library(doParallel)

URLsPar <- c("http://www.bbc.com/", "http://www.cnn.com", "http://www.google.com",
             "http://www.yahoo.com", "http://www.twitter.com")

# start a Selenium Server
# (startServer() is defunct in newer RSelenium releases; there, rsDriver() or a
# Docker Selenium image fills this role)
selServ <- startServer()

# create one worker per core (minus one) and register it as the foreach backend
(cl <- (detectCores() - 1) %>% makeCluster) %>% registerDoParallel

# open a remoteDriver for each node on the cluster
clusterEvalQ(cl, {
  library(RSelenium)
  remDr <- remoteDriver()
  remDr$open()
})

# each worker navigates with its own remDr and returns the page title
ws <- foreach(x = seq_along(URLsPar), .packages = c("rvest", "magrittr", "RSelenium")) %dopar% {
  remDr$navigate(URLsPar[x])
  remDr$getTitle()[[1]]
}

# close browser on each node
clusterEvalQ(cl, {
  remDr$close()
})

# the cluster was created explicitly with makeCluster(), so stop it explicitly
stopCluster(cl)

# stop Selenium Server
selServ$stop()
> ws
[[1]]
[1] "BBC - Homepage"
[[2]]
[1] "CNN - Breaking News, U.S., World, Weather, Entertainment & Video News"
[[3]]
[1] "Google"
[[4]]
[1] "Yahoo"
[[5]]
[1] "Welcome to Twitter - Login or Sign up"
Running RSelenium in parallel using Docker
In the meantime, I figured out how to fix my mistake, and I am leaving a comment here in case anyone else faces the same issue. I cannot explain the logic behind it, but my code runs as expected when I replace
remDr <- remoteDriver(remoteServerAddr = "192.168.99.100", port = 4445L,
                      browserName = "chrome")
with
remDr <- remoteDriver(port = 4445L)
and use a Firefox browser rather than Chrome.
How To Run Selenium-scrapy in parallel
The following sample program creates a thread pool with only 2 threads for demo purposes and then scrapes 4 URLs to get their titles:
from multiprocessing.pool import ThreadPool
from bs4 import BeautifulSoup
from selenium import webdriver
import threading
import gc

class Driver:
    def __init__(self):
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        # suppress logging:
        options.add_experimental_option('excludeSwitches', ['enable-logging'])
        self.driver = webdriver.Chrome(options=options)
        print('The driver was just created.')

    def __del__(self):
        self.driver.quit() # clean up driver when we are cleaned up
        print('The driver has terminated.')

threadLocal = threading.local()

def create_driver():
    # reuse this thread's driver if it already has one
    the_driver = getattr(threadLocal, 'the_driver', None)
    if the_driver is None:
        the_driver = Driver()
        setattr(threadLocal, 'the_driver', the_driver)
    return the_driver.driver

def get_title(url):
    driver = create_driver()
    driver.get(url)
    source = BeautifulSoup(driver.page_source, "lxml")
    title = source.select_one("title").text
    print(f"{url}: '{title}'")

# just 2 threads in our pool for demo purposes:
with ThreadPool(2) as pool:
    urls = [
        'https://www.google.com',
        'https://www.microsoft.com',
        'https://www.ibm.com',
        'https://www.yahoo.com'
    ]
    pool.map(get_title, urls)
    # must be done before terminate is explicitly or implicitly called on the pool:
    del threadLocal
    gc.collect()
# pool.terminate() is called at exit of with block
Prints:
The driver was just created.
The driver was just created.
https://www.google.com: 'Google'
https://www.microsoft.com: 'Microsoft - Official Home Page'
https://www.ibm.com: 'IBM - United States'
https://www.yahoo.com: 'Yahoo'
The driver has terminated.
The driver has terminated.
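The program above relies on `__del__` and garbage collection to quit the browsers. As a possible alternative, here is a minimal sketch that registers every driver as it is created and quits them all in a `finally` block. `FakeDriver` is a hypothetical stand-in for `webdriver.Chrome` so that the sketch runs without a browser, and the names `all_drivers` and `get_driver` are illustrative, not part of Selenium:

```python
import threading
from multiprocessing.pool import ThreadPool


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome (assumption, not Selenium)."""

    def __init__(self):
        self.url = None
        self.closed = False

    def get(self, url):
        self.url = url

    @property
    def title(self):
        # A real driver returns the page title; this one just echoes the URL.
        return f"title of {self.url}"

    def quit(self):
        self.closed = True


thread_local = threading.local()
all_drivers = []                      # every driver created, kept for cleanup
all_drivers_lock = threading.Lock()   # guards all_drivers across threads


def get_driver():
    """Return this thread's driver, creating and registering it on first use."""
    driver = getattr(thread_local, "driver", None)
    if driver is None:
        driver = FakeDriver()
        thread_local.driver = driver
        with all_drivers_lock:
            all_drivers.append(driver)
    return driver


def get_title(url):
    driver = get_driver()
    driver.get(url)
    return url, driver.title


urls = [
    "https://www.google.com",
    "https://www.microsoft.com",
    "https://www.ibm.com",
    "https://www.yahoo.com",
]

try:
    with ThreadPool(2) as pool:
        results = pool.map(get_title, urls)
finally:
    # Quit explicitly instead of waiting for garbage collection.
    for driver in all_drivers:
        driver.quit()

for url, title in results:
    print(f"{url}: '{title}'")
```

With a real driver, only the `FakeDriver()` call would change. The trade-off is one extra list and lock in exchange for cleanup that does not depend on when objects are collected.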
C# run Selenium browsers in parallel
If you launch Chrome with different arguments, it launches a new instance. So giving each driver a different --remote-debugging-port makes it launch a new instance instead of reusing the existing one.
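As a rough illustration of that idea (in Python rather than C#), the sketch below builds one argument list per browser instance, each with its own debugging port. The helper name `chrome_args_for` and the base port 9222 are assumptions chosen for the example; each resulting string would then be handed to the driver's options object:

```python
def chrome_args_for(instance_index, base_port=9222):
    """Build the argument list for one browser instance.

    Hypothetical helper: each instance gets base_port + instance_index,
    so no two instances share a --remote-debugging-port value.
    """
    if instance_index < 0:
        raise ValueError("instance_index must be non-negative")
    port = base_port + instance_index
    return [f"--remote-debugging-port={port}"]


# One argument list per browser we intend to launch.
arg_sets = [chrome_args_for(i) for i in range(3)]

for args in arg_sets:
    print(args)
```

Deriving the port from the instance index keeps the assignment deterministic. It does not check that the port is actually free on the machine.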
Run Parallel Execution on Selenium Grid
Based on my understanding of the documentation, TestNG respects the order of tests in your XML file.
You want to run both classes in parallel, so you set parallel="classes", which is correct as far as it goes. However, parallel="classes" only parallelizes classes that sit inside the same <test> tag; the <test> tags themselves still run one after another. Since each of your <test> tags contains a single class, everything runs in series.
To solve this issue, you can either put multiple classes under the same <test> tag:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="Parallel test suite" parallel="classes" thread-count="2">
  <test thread-count="2" name="Transaction">
    <parameter name="parameterName" value="parameterValue"></parameter>
    <classes>
      <class name="Class1"/>
      <class name="Class2"/>
    </classes>
  </test>
</suite> <!-- Suite -->
Or, as in your case, you can set the parallel option to tests:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="Parallel test suite" parallel="tests" thread-count="2">
  <test thread-count="1" name="Transaction">
    <parameter name="remoteurl" value="http://xx.xx.xxx.xxx:5555/wd/hub"></parameter>
    <classes>
      <class name="POM_Test.ATransactionTest"/>
    </classes>
  </test> <!-- Test -->
  <test thread-count="1" name="MyAlerts">
    <parameter name="remoteurl" value="http://xx.xx.xx.xxx:5556/wd/hub"></parameter>
    <classes>
      <class name="POM_Test.MyAlertsTest"/>
    </classes>
  </test>
</suite> <!-- Suite -->
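If it helps to double-check a suite file before running it, the following is a small Python sketch that parses a suite laid out like the second one and lists, for each <test>, its remoteurl parameter and classes. The host names `node-a.example` and `node-b.example` are made-up placeholders, and `summarize_suite` is a hypothetical helper, not part of TestNG:

```python
import xml.etree.ElementTree as ET

# Sample suite for illustration only; the hosts are placeholder assumptions.
SUITE_XML = """
<suite name="Parallel test suite" parallel="tests" thread-count="2">
  <test thread-count="1" name="Transaction">
    <parameter name="remoteurl" value="http://node-a.example:5555/wd/hub"/>
    <classes>
      <class name="POM_Test.ATransactionTest"/>
    </classes>
  </test>
  <test thread-count="1" name="MyAlerts">
    <parameter name="remoteurl" value="http://node-b.example:5556/wd/hub"/>
    <classes>
      <class name="POM_Test.MyAlertsTest"/>
    </classes>
  </test>
</suite>
"""


def summarize_suite(xml_text):
    """Return the suite's parallel mode and a per-<test> summary."""
    suite = ET.fromstring(xml_text.strip())
    tests = []
    for test in suite.findall("test"):
        params = {p.get("name"): p.get("value") for p in test.findall("parameter")}
        classes = [c.get("name") for c in test.findall("classes/class")]
        tests.append({
            "name": test.get("name"),
            "remoteurl": params.get("remoteurl"),
            "classes": classes,
        })
    return {"parallel": suite.get("parallel"), "tests": tests}


summary = summarize_suite(SUITE_XML)
print("parallel mode:", summary["parallel"])
for t in summary["tests"]:
    print(t["name"], t["remoteurl"], t["classes"])

# Each <test> should point at its own node when running in parallel.
remote_urls = [t["remoteurl"] for t in summary["tests"]]
distinct_nodes = len(set(remote_urls)) == len(remote_urls)
print("distinct nodes:", distinct_nodes)
```

This only inspects the file's structure. It says nothing about whether the nodes are reachable or how TestNG will schedule the threads.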
I hope this fixes your issue.