Scrolling Page in RSelenium

Assuming you have the following setup:

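library(RSelenium)
# startServer() is defunct in recent RSelenium releases;
# rsDriver() starts the server and opens a browser session.
rD <- rsDriver(browser = "firefox")
remDr <- rD[["client"]]
remDr$setWindowSize(width = 800, height = 300)
remDr$navigate("https://www.r-project.org/about.html")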

You can scroll to the bottom like this:

webElem <- remDr$findElement("css", "body")
webElem$sendKeysToElement(list(key = "end"))

You can scroll back to the top like this:

webElem$sendKeysToElement(list(key = "home"))

To scroll down just a little, use:

webElem$sendKeysToElement(list(key = "down_arrow"))

The valid key names are listed in selKeys, a named list that ships with RSelenium.
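
To see which key names are accepted, you can inspect selKeys directly. The sketch below also defines press_key(), a hypothetical helper (not part of RSelenium) that sends the same named key several times, which may be handy for scrolling a few lines or pages at once. The default pause of 0.2 seconds is an arbitrary assumption.

```r
library(RSelenium)

# selKeys is a named list; its names are the values accepted by list(key = ...).
head(names(selKeys))
# Show only the navigation-related key names.
grep("arrow|page|home|end", names(selKeys), value = TRUE)

# Hypothetical helper: press one named key `n` times on a web element.
press_key <- function(elem, key, n = 1, pause = 0.2) {
  stopifnot(key %in% names(selKeys))  # reject unknown key names early
  for (i in seq_len(n)) {
    elem$sendKeysToElement(list(key = key))
    Sys.sleep(pause)
  }
  invisible(elem)
}

# Usage with the body element found earlier:
# press_key(webElem, "page_down", n = 3)
```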

RSelenium: Scroll down to load web content

If sending keys does not scroll the page, try executeScript() instead:

remDr$executeScript("window.scrollTo(0,document.body.scrollHeight);")
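
executeScript() can also scroll a specific element into view rather than jumping to the end of the page. This is a minimal sketch, assuming remDr is an open session; scroll_into_view() is a hypothetical helper name, and the "footer" selector in the usage line is only an example.

```r
# Hypothetical helper: scroll the first element matching a CSS selector
# into view. Assumes `remDr` is an open RSelenium remoteDriver session.
scroll_into_view <- function(remDr, css) {
  elem <- remDr$findElement(using = "css selector", css)
  # A web element passed through `args` arrives in JavaScript as arguments[0].
  remDr$executeScript("arguments[0].scrollIntoView(true);", args = list(elem))
  invisible(elem)
}

# Usage:
# scroll_into_view(remDr, "footer")
```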

Scroll down a page and load all items before using read_html()

To load the whole page, scroll down bit by bit instead of jumping straight to the end of the page.

# after navigating and accepting cookies, scroll bit by bit

for (i in 1:30) {
  print(i)
  remDr$executeScript("window.scrollBy(0,500);")
  Sys.sleep(1)
}

# get the nodes of all houses
library(rvest)  # provides read_html(), html_nodes() and the %>% pipe
html_full_page <- remDr$getPageSource()[[1]] %>%
  read_html()
x <- html_full_page %>%
  html_nodes('.re-CardPackPremium-carousel')
x
# {xml_nodeset (30)}
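
The fixed count of 30 steps above suits one particular page. As a rough sketch, the same idea can stop as soon as the viewport reaches the bottom. scroll_in_steps() is a hypothetical helper, and the step size, pause, and max_steps defaults are assumptions to tune per site.

```r
# Hypothetical helper: scroll down `step` pixels at a time until the
# viewport reaches the bottom of the page or `max_steps` is hit.
# Assumes `remDr` is an open RSelenium remoteDriver session.
scroll_in_steps <- function(remDr, step = 500, pause = 1, max_steps = 100) {
  # TRUE once the lower edge of the viewport is at (or within 1px of) the page end.
  at_bottom_js <- paste(
    "return window.innerHeight + window.pageYOffset",
    ">= document.body.scrollHeight - 1;"
  )
  for (i in seq_len(max_steps)) {
    remDr$executeScript(sprintf("window.scrollBy(0, %d);", as.integer(step)))
    Sys.sleep(pause)  # give newly requested items time to load
    if (isTRUE(unlist(remDr$executeScript(at_bottom_js)))) {
      return(invisible(i))  # number of scroll steps performed
    }
  }
  invisible(max_steps)
}

# Usage:
# scroll_in_steps(remDr, step = 500, pause = 1)
```

Because the bottom check runs after each pause, content that loads during the pause makes the page taller and the loop keeps going.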

Scrolling through an entire page with RSelenium, then extracting tabular data into a data frame

I solved this issue. Two things were going on. First, the page loaded with the cursor inside a search bar, so key presses never reached the page itself. I got around this by calling remDr$findElement(using = "css", "body")$clickElement() to click into the body of the page. Second, as another answer pointed out, if the scrolling/arrow keys do not work with sendKeysToElement(list(key = "up_arrow")), you should try remDr$executeScript("window.scrollTo(0,document.body.scrollHeight);").

A small sample of my script follows:

library(RSelenium)
library(rvest)
library(tidyverse)

## opens the driver
rD <- rsDriver(browser = "firefox", port = 4545L, verbose = FALSE)
remDr <- rD[["client"]]

link_texts <- c("Base Set", "Promo", "Fossil")
## navigates to the category page
remDr$navigate("https://www.pricecharting.com/category/pokemon-cards")

for (name in link_texts) {
  ## finds the link and clicks on it
  remDr$findElement(using = "link text", name)$clickElement()
  ## clicks the page body to move focus out of the search bar
  remDr$findElement(using = "css", "body")$clickElement()
  ## scrolls to the bottom six times, pausing so more rows can load
  for (i in 1:6) {
    remDr$executeScript("window.scrollTo(0,document.body.scrollHeight);")
    Sys.sleep(1)
  }
  ## gets the entire page source that has been loaded
  html <- remDr$getPageSource()[[1]]
  ## reads in the page source
  page <- read_html(html)

  ## e.g. "Base Set" becomes "base_set"
  data_name <- str_to_lower(str_replace_all(name, " ", "_"))
  ## extracts the table and keeps its first four columns
  df <- page %>%
    html_elements("#games_table") %>%
    html_table() %>%
    pluck(1) %>%
    select(1:4)
  ## stores the data frame under that name
  assign(data_name, df)
  Sys.sleep(3)
  ## returns to the category page for the next link
  remDr$navigate("https://www.pricecharting.com/category/pokemon-cards")
}

## closes the driver and stops the server
remDr$close()
rD$server$stop()
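
The script stores each table with assign(), which creates one variable per set name. One possible alternative is to collect the data frames in a named list. The sketch below shows only that bookkeeping: scrape_set() is a hypothetical stand-in that returns a dummy data frame so the example runs without a browser; in the real loop it would be the scrolling and html_table() steps.

```r
# Hypothetical stand-in for the scraping step; returns a dummy data frame.
scrape_set <- function(name) {
  data.frame(set = name, stringsAsFactors = FALSE)
}

link_texts <- c("Base Set", "Promo", "Fossil")

# Collect one data frame per set in a named list instead of using assign().
tables <- list()
for (name in link_texts) {
  data_name <- tolower(gsub(" ", "_", name, fixed = TRUE))
  tables[[data_name]] <- scrape_set(name)
}

names(tables)

# Individual tables are still reachable by name, e.g. tables$base_set,
# and the whole collection can be combined in one call.
all_sets <- do.call(rbind, tables)
```

A list keeps the results together, so they can be iterated over or saved without looking up variable names in the global environment.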

Check if it's possible to scroll down with RSelenium

I stumbled across a way to do this in Python and adapted it to work in R. Below is a working update of the code I originally posted.

# Open webpage
library(RSelenium)
rD <- rsDriver(browser = "firefox")
remDr <- rD[["client"]]
url <- "https://stocktwits.com/symbol/NZDCHF"
remDr$navigate(url)

# Keep scrolling down the page, loading new content each time.
last_height <- 0
repeat {
  remDr$executeScript("window.scrollTo(0,document.body.scrollHeight);")
  Sys.sleep(3)  # delay by 3 seconds to give new content a chance to load

  # Break once the page height stops growing, i.e. we cannot scroll further
  new_height <- remDr$executeScript("return document.body.scrollHeight")
  if (unlist(last_height) == unlist(new_height)) {
    break
  } else {
    last_height <- new_height
  }
}
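
On a feed that never stops producing content, the repeat loop above would run indefinitely. A hedged variant wraps the same height comparison in a function with an upper bound on the number of rounds. scroll_until_stable() and its max_rounds parameter are hypothetical names introduced here, not part of RSelenium.

```r
# Hypothetical helper: jump to the bottom repeatedly until the page height
# stops growing, giving up after `max_rounds` attempts.
# Assumes `remDr` is an open RSelenium remoteDriver session.
# Returns TRUE if the height stabilised, FALSE if the cap was reached.
scroll_until_stable <- function(remDr, pause = 3, max_rounds = 50) {
  last_height <- -1
  for (i in seq_len(max_rounds)) {
    remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);")
    Sys.sleep(pause)  # wait for new content to load
    new_height <- unlist(remDr$executeScript("return document.body.scrollHeight;"))
    if (identical(as.numeric(new_height), as.numeric(last_height))) {
      return(invisible(TRUE))
    }
    last_height <- new_height
  }
  invisible(FALSE)
}

# Usage:
# reached_end <- scroll_until_stable(remDr, pause = 3, max_rounds = 50)
```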

