Run RSelenium in Parallel

Start a Selenium server, then open one remoteDriver on each node of the cluster and let foreach spread the URLs across the nodes:

library(RSelenium)
library(foreach)
library(doParallel)

URLsPar <- c("http://www.bbc.com/", "http://www.cnn.com", "http://www.google.com",
             "http://www.yahoo.com", "http://www.twitter.com")

# start a Selenium Server
# (startServer() is from older RSelenium versions; newer versions replace it with rsDriver())
selServ <- startServer()

# one worker per core, leaving one core free
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

# open a remoteDriver on each node of the cluster; every node keeps its own remDr
clusterEvalQ(cl, {
  library(RSelenium)
  remDr <- remoteDriver()
  remDr$open()
})

# each iteration runs on whichever node is free and uses that node's remDr
ws <- foreach(x = seq_along(URLsPar), .packages = "RSelenium") %dopar% {
  remDr$navigate(URLsPar[x])
  remDr$getTitle()[[1]]
}

# close the browser on each node
clusterEvalQ(cl, {
  remDr$close()
})

# stop the cluster created by makeCluster()
# (stopImplicitCluster() only stops clusters that registerDoParallel() created itself)
stopCluster(cl)

# stop Selenium Server
selServ$stop()

> ws
[[1]]
[1] "BBC - Homepage"

[[2]]
[1] "CNN - Breaking News, U.S., World, Weather, Entertainment & Video News"

[[3]]
[1] "Google"

[[4]]
[1] "Yahoo"

[[5]]
[1] "Welcome to Twitter - Login or Sign up"
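The same lifecycle (open one browser session per worker, reuse it for every URL that worker handles, close each session once at the end) can be sketched in Python with concurrent.futures. This is a minimal sketch, not part of the answer above: FakeDriver is a hypothetical stand-in whose navigate/get_title/close methods only mimic the shape of the remoteDriver calls, so it runs without a Selenium server. The pool initializer plays the role of the first clusterEvalQ call, and the final loop plays the role of the second.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeDriver:
    """Hypothetical stand-in for a browser session such as RSelenium's remoteDriver."""

    def __init__(self):
        self.url = None
        self.closed = False

    def navigate(self, url):
        self.url = url

    def get_title(self):
        return "Title of " + self.url

    def close(self):
        self.closed = True


_worker = threading.local()  # each pool thread keeps its own driver here


def open_driver(registry, lock):
    """Runs once per worker thread, like clusterEvalQ(cl, { remDr$open() })."""
    _worker.driver = FakeDriver()
    with lock:
        registry.append(_worker.driver)  # remember it so it can be closed later


def fetch_title(url):
    """Runs once per URL on whichever worker is free, like the foreach body."""
    driver = _worker.driver
    driver.navigate(url)
    return driver.get_title()


def scrape_titles(urls, workers=2):
    registry, lock = [], threading.Lock()
    with ThreadPoolExecutor(max_workers=workers,
                            initializer=open_driver,
                            initargs=(registry, lock)) as pool:
        titles = list(pool.map(fetch_title, urls))  # same order as urls
    # the pool has shut down; close every session once,
    # like clusterEvalQ(cl, { remDr$close() })
    for driver in registry:
        driver.close()
    return titles, registry


if __name__ == "__main__":
    titles, drivers = scrape_titles(["http://www.bbc.com/", "http://www.cnn.com"])
    print(titles)
```

Because ThreadPoolExecutor starts worker threads lazily, at most `workers` sessions are opened, and pool.map returns the titles in the same order as the input URLs, much like ws above.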

Running RSelenium in parallel using Docker

In the meantime, I figured out how to fix my mistake. In case anyone else is facing the same issue, I am leaving a comment here. I cannot explain the logic behind it, but my code runs as expected when I replace

remDr <- remoteDriver(remoteServerAddr = "192.168.99.100", port = 4445L,
                      browserName = "chrome")

with

remDr <- remoteDriver(port = 4445L)

and use Firefox instead of Chrome. With only port given, remoteDriver() falls back to its defaults, remoteServerAddr = "localhost" and browserName = "firefox", so the client connects to localhost:4445 and requests a Firefox session.

How To Run Selenium-scrapy in parallel

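The following sample program creates a thread pool with only 2 threads for demo purposes and then scrapes 4 URLs to get their titles. Each thread creates its own driver the first time it needs one and reuses it for every URL it processes: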

from multiprocessing.pool import ThreadPool
from bs4 import BeautifulSoup
from selenium import webdriver
import threading
import gc

class Driver:
    def __init__(self):
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        # suppress logging:
        options.add_experimental_option('excludeSwitches', ['enable-logging'])
        self.driver = webdriver.Chrome(options=options)
        print('The driver was just created.')

    def __del__(self):
        self.driver.quit() # clean up driver when we are cleaned up
        print('The driver has terminated.')

threadLocal = threading.local()

def create_driver():
    # reuse this thread's driver if it already has one, otherwise create it
    the_driver = getattr(threadLocal, 'the_driver', None)
    if the_driver is None:
        the_driver = Driver()
        setattr(threadLocal, 'the_driver', the_driver)
    return the_driver.driver

def get_title(url):
    driver = create_driver()
    driver.get(url)
    # the "lxml" parser requires the lxml package to be installed
    source = BeautifulSoup(driver.page_source, "lxml")
    title = source.select_one("title").text
    print(f"{url}: '{title}'")

# just 2 threads in our pool for demo purposes:
with ThreadPool(2) as pool:
    urls = [
        'https://www.google.com',
        'https://www.microsoft.com',
        'https://www.ibm.com',
        'https://www.yahoo.com'
    ]
    pool.map(get_title, urls)
    # must be done before terminate is explicitly or implicitly called on the pool:
    del threadLocal
    gc.collect()
# pool.terminate() is called at exit of with block

Prints:

The driver was just created.
The driver was just created.
https://www.google.com: 'Google'
https://www.microsoft.com: 'Microsoft - Official Home Page'
https://www.ibm.com: 'IBM - United States'
https://www.yahoo.com: 'Yahoo'
The driver has terminated.
The driver has terminated.
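The program above relies on __del__ plus an explicit gc.collect() to quit the drivers. A different technique, swapped in here rather than taken from the answer, is to create a fixed number of drivers up front, lend them to tasks through a queue.Queue, and quit them in a finally block so cleanup does not depend on garbage collection. In this minimal sketch FakeDriver is hypothetical so it runs offline; with Selenium each entry would be a webdriver.Chrome(options=options) built as in the Driver class, and get/title/quit mirror Selenium's driver.get(), driver.title and driver.quit().

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome, so the sketch runs offline."""

    def __init__(self):
        self.title = ""          # Selenium exposes the page title as driver.title
        self.quit_called = False

    def get(self, url):
        self.title = "Title of " + url

    def quit(self):
        self.quit_called = True


def scrape_titles(urls, size=2):
    # create exactly `size` drivers up front and park them in a queue
    drivers = [FakeDriver() for _ in range(size)]
    idle = queue.Queue()
    for driver in drivers:
        idle.put(driver)

    def get_title(url):
        driver = idle.get()      # borrow a driver; blocks until one is free
        try:
            driver.get(url)
            return driver.title
        finally:
            idle.put(driver)     # hand it back even if the page load raised

    try:
        with ThreadPoolExecutor(max_workers=size) as pool:
            titles = list(pool.map(get_title, urls))
    finally:
        for driver in drivers:   # explicit cleanup instead of __del__ + gc.collect()
            driver.quit()
    return titles, drivers


if __name__ == "__main__":
    titles, _ = scrape_titles(["https://www.google.com", "https://www.ibm.com"])
    print(titles)
```

A driver is only ever held by one task at a time, so no thread-local storage is needed, and the number of browsers is fixed by `size` regardless of how many threads the pool actually starts.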

C#: run Selenium browsers in parallel

If you launch Chrome with different arguments, it starts a new instance. So in this case, giving each driver a different --remote-debugging-port makes it launch a new Chrome instance instead of reusing the existing one.
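A minimal sketch of that idea, written in Python rather than C#, and assuming a hypothetical helper chrome_debugging_args and a starting port of 9222 (any ports that are free on the machine would do): each instance receives its own --remote-debugging-port value, so no two instances are launched with identical arguments.

```python
def chrome_debugging_args(count, base_port=9222):
    """Hypothetical helper: one argument list per Chrome instance.

    Every instance gets its own --remote-debugging-port, so no two
    instances are started with the same arguments.
    """
    return [[f"--remote-debugging-port={base_port + i}"] for i in range(count)]


if __name__ == "__main__":
    for args in chrome_debugging_args(3):
        print(args)
```

With Python Selenium, each list would be applied to a fresh webdriver.ChromeOptions() via options.add_argument(arg) before calling webdriver.Chrome(options=options); in C# the corresponding call is ChromeOptions.AddArgument.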

Run Parallel Execution on Selenium Grid

Based on my understanding of the documentation, TestNG respects the order of the tests in your XML file.

Now, you want to run both classes in parallel, so you set parallel="classes", which is correct. However, parallel="classes" only parallelizes the classes inside a single <test> tag. If one <test> contains several classes, those classes run in parallel, but separate <test> tags still run one after another on a single thread.

To solve this, you can either put multiple classes under the same <test> tag:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="Parallel test suite" parallel="classes" thread-count="2">
  <test thread-count="2" name="Transaction">
    <parameter name="parameterName" value="parameterValue"></parameter>
    <classes>
      <class name="Class1"/>
      <class name="Class2"/>
    </classes>
  </test>
</suite> <!-- Suite -->

Or, as fits your case, set the parallel option to tests, so that each <test> tag (each with its own remoteurl) runs on its own thread:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="Parallel test suite" parallel="tests" thread-count="2">
  <test thread-count="1" name="Transaction">
    <parameter name="remoteurl" value="http://xx.xx.xxx.xxx:5555/wd/hub"></parameter>
    <classes>
      <class name="POM_Test.ATransactionTest"/>
    </classes>
  </test> <!-- Test -->
  <test thread-count="1" name="MyAlerts">
    <parameter name="remoteurl" value="http://xx.xx.xx.xxx:5556/wd/hub"></parameter>
    <classes>
      <class name="POM_Test.MyAlertsTest"/>
    </classes>
  </test>
</suite> <!-- Suite -->

I hope this fixes your issue.
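With more Grid nodes, the parallel="tests" layout is simply one <test> tag per node, so it can also be generated instead of written by hand. A minimal sketch, assuming Python 3.9+ (for ET.indent) and a hypothetical build_parallel_suite helper that takes a dict mapping each test name to a (test class, remote URL) pair; the node addresses in the usage block are placeholders, not the ones from the question.

```python
import xml.etree.ElementTree as ET

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">\n')


def build_parallel_suite(tests, thread_count=2):
    """Return testng.xml text with one <test> tag per entry, run in parallel.

    `tests` maps a test name to a (fully qualified class, remote Grid URL) pair.
    """
    suite = ET.Element("suite", {"name": "Parallel test suite",
                                 "parallel": "tests",
                                 "thread-count": str(thread_count)})
    for name, (class_name, remote_url) in tests.items():
        test = ET.SubElement(suite, "test", {"thread-count": "1", "name": name})
        ET.SubElement(test, "parameter", {"name": "remoteurl", "value": remote_url})
        classes = ET.SubElement(test, "classes")
        ET.SubElement(classes, "class", {"name": class_name})
    ET.indent(suite)  # pretty-print in place; needs Python 3.9+
    return HEADER + ET.tostring(suite, encoding="unicode")


if __name__ == "__main__":
    print(build_parallel_suite({
        # hypothetical node addresses; substitute your own Grid URLs
        "Transaction": ("POM_Test.ATransactionTest", "http://node-a.example:5555/wd/hub"),
        "MyAlerts": ("POM_Test.MyAlertsTest", "http://node-b.example:5556/wd/hub"),
    }))
```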



Related Topics



Leave a reply



Submit