Web scraping of key stats in Yahoo! Finance with R
I gave up on Excel a long time ago. R is definitely the way to go for things like this.
library(XML)
stocks <- c("AXP","BA","CAT","CSCO")
for (s in stocks) {
url <- paste0("http://finviz.com/quote.ashx?t=", s)
webpage <- readLines(url)
html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
tableNodes <- getNodeSet(html, "//table")
# ASSIGN TO STOCK NAMED DFS
assign(s, readHTMLTable(tableNodes[[9]],
header= c("data1", "data2", "data3", "data4", "data5", "data6",
"data7", "data8", "data9", "data10", "data11", "data12")))
# ADD COLUMN TO IDENTIFY STOCK
df <- get(s)
df['stock'] <- s
assign(s, df)
}
# COMBINE ALL STOCK DATA
stockdatalist <- cbind(mget(stocks))
stockdata <- do.call(rbind, stockdatalist)
# MOVE STOCK ID TO FIRST COLUMN
stockdata <- stockdata[, c(ncol(stockdata), 1:ncol(stockdata)-1)]
# SAVE TO CSV
write.table(stockdata, "C:/Users/your_path_here/Desktop/MyData.csv", sep=",",
row.names=FALSE, col.names=FALSE)
# REMOVE TEMP OBJECTS
rm(df, stockdatalist)
Webscraping with Yahoo Finance
I rewrote the answer to get only the fundamental data. First of, instead of a loop I put your scrape request into a function. Next I wrote an error catcher function loosely based on possibly
function from purrr. This to be able to return a function instead of a default value. Then you can use map_df
to loop over all the ticker symbols. Whenever there is an error, the data will be NA
but will show the ticker and fill in an error column.
If speed is an issue, you might look into the furrr package to be able to run all of this in parallel.
library(rvest)
library(purrr)
library(dplyr)
get_stats <- function(symbol) {
url <- paste0("https://finance.yahoo.com/quote/",symbol,"/key-statistics?p=", symbol)
df <- url %>%
read_html() %>%
html_table(header = FALSE) %>%
map_df(bind_cols) %>%
as_tibble()
names(df) <- c("valuation_measures", "value")
df["stock"] <- symbol
return(df)
}
catch_error <- function(.f, otherwise=NULL) {
function(...) {
tryCatch({
.f(...)
}, error = function(e) otherwise(...))
}
}
tickers <- c("xxxxxx", "AAPL")
out <- map_df(tickers, catch_error(get_stats, otherwise = function(x) tibble(valuation_measures = NA_character_, value = NA_character_, stock = x, error = "error in getting data")))
# A tibble: 60 x 4
valuation_measures value stock error
<chr> <chr> <chr> <chr>
1 NA NA xxxxxx error in getting data
2 Market Cap (intraday) 5 1.22T AAPL NA
3 Enterprise Value 3 1.23T AAPL NA
4 Trailing P/E 22.07 AAPL NA
5 Forward P/E 1 17.81 AAPL NA
6 PEG Ratio (5 yr expected) 1 1.52 AAPL NA
7 Price/Sales (ttm) 4.54 AAPL NA
8 Price/Book (mrq) 13.61 AAPL NA
9 Enterprise Value/Revenue 3 4.58 AAPL NA
10 Enterprise Value/EBITDA 6 15.69 AAPL NA
# ... with 50 more rows
R: web scraping yahoo.finance after 2019 change
As mentioned in the comment above, here is an alternative that tries to deal with the different table sizes published. I have worked on this and have had help from a friend.
library(rvest)
library(tidyverse)
url <- https://finance.yahoo.com/quote/AAPL/financials?p=AAPL
# Download the data
raw_table <- read_html(url) %>% html_nodes("div.D\\(tbr\\)")
number_of_columns <- raw_table[1] %>% html_nodes("span") %>% length()
if(number_of_columns > 1){
# Create empty data frame with the required dimentions
df <- data.frame(matrix(ncol = number_of_columns, nrow = length(raw_table)),
stringsAsFactors = F)
# Fill the table looping through rows
for (i in 1:length(raw_table)) {
# Find the row name and set it.
df[i, 1] <- raw_table[i] %>% html_nodes("div.Ta\\(start\\)") %>% html_text()
# Now grab the values
row_values <- raw_table[i] %>% html_nodes("div.Ta\\(end\\)")
for (j in 1:(number_of_columns - 1)) {
df[i, j+1] <- row_values[j] %>% html_text()
}
}
view(df)
In R Studio how would I pull a specific piece of data from Yahoo Finance
You can try this.
If you try to call Book Value
with an additional value, it is working. However, when you call only Book Value
, the function is giving an error. You can post this bug to the package provider's GitHub. Here is the link
The following producer might work for you for now.
getQuote("AAPL",what=yahooQF(c("Open","Book Value")))
>
Trade Time Open Book Value
AAPL 2021-04-30 16:00:04 131.78 4.146
I just requested additional Open
price information. With that, API worked.
Extracting data from Yahoo Finance
There is actually a link to a downloadable csv on the page. You can specify the company and time period etc, in R prior to downloading it.
The only problem is that the url needs to include a parameter called "crumbs" which is effectively your session ID, and you'll need to harvest this from the page to construct the url.
I have written a function to do all this. All you need to do is supply the company name and it will give you all financial records in the last 15 years or so as a data frame.
get_financial_data <- function(company)
{
tmp <- tempfile()
yahoo <- httr::handle('https://finance.yahoo.com/quote/')
'https://finance.yahoo.com/quote/' %>%
paste0(company) %>%
paste0('/history?p=') %>%
paste0(company) %>%
httr::GET(handle = yahoo) %>%
httr::content("text") -> raw_html
strsplit(raw_html, "crumb\":\"") %>%
unlist %>%
strsplit("\"") %>%
lapply(`[`, 1) %>%
unlist %>%
table %>%
sort %>%
rev %>%
names %>%
`[`(2) %>%
paste0("https://query1.finance.yahoo.com/v7/finance/download/", company,
"?period1=1079873500&period2=1579873500&interval=1d&events=history&crumb=", .) %>%
httr::GET(handle = yahoo) %>%
httr::content("text", encoding = "UTF-8") %>%
writeBin(tmp, useBytes = T)
suppressWarnings(read.csv(tmp) %>% tibble::as_tibble())
}
Now you can just do:
get_financial_data("SAP")
#> # A tibble: 3,988 x 7
#> Date Open High Low Close Adj.Close Volume
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 2004-03-22 37.4 37.5 36.7 37.3 29.3 1717600
#> 2 2004-03-23 37.8 37.9 37.4 37.5 29.5 1417500
#> 3 2004-03-24 37.4 38.1 37.2 37.5 29.5 1682500
#> 4 2004-03-25 37.9 38.9 37.9 38.8 30.4 2233800
#> 5 2004-03-26 37.9 38.2 37.5 37.9 29.8 3343500
#> 6 2004-03-29 38.6 39.3 38.6 38.8 30.5 1886800
#> 7 2004-03-30 38.8 39.3 38.8 39.1 30.7 997900
#> 8 2004-03-31 39.5 39.8 39.3 39.3 30.9 1818600
#> 9 2004-04-01 39.8 40.6 39.8 40.6 31.9 2570600
#> 10 2004-04-02 40.9 41.3 40.6 41.1 32.3 1708200
#> # ... with 3,978 more rows
Adjusting Yahoo Stock Data Web Scraping to Loop over Dates
Consider mapply
or its non-simplified wrapper, Map
, to iterate elementwise through the pairings of start and end dates and corresponding symbols. Also, avoid the use of assign
and get
and build list of data frames for final rbind
at the end:
library(XML)
...
dateGroup <- data.frame(
start = c(1509519600, 1518159600, 1526799600, 1535439600, 1544079600),
end = c(1518073200, 1526713200, 1535353200, 1543993200, 1550732400)
)
# CROSS JOIN ALL SYMBOLS WITH EACH DATE PAIRING
dt_grp_sym <- merge(dateGroup, data.frame(symbols))
# DEFINED METHOD FOR HTML PROCESSING
proc_html <- function(sym, sd, ed) {
url <- paste0('https://finance.yahoo.com/quote/', sym, '/history?period1=',
sd, '&period2=', ed, '&interval=1d&filter=history&frequency=1d')
print(url)
webpage <- readLines(url, warn=FALSE)
html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
tableNodes <- getNodeSet(html, "//table")
html_df <- transform(readHTMLTable(tableNodes[[1]],
header=c("Date", "Open", "High", "Low",
"Close", "Adj. Close", "Volume")),
symbol = sym)
return(html_df)
}
# ITERATE ELEMENTWISE THROUGH EVERY ROW of dt_grp_sym
df_list <- Map(proc_html, dt_grp_sym$symbols, dt_grp_sym$start, dt_grp_sym$end)
final_df <- do.call(rbind, df_list)
To demonstrate using the Class 1 U.S. railroads:
symbols <- c("UNP", "CSX", "NSC", "CNI", "KSU")
dateGroup <- data.frame(
start = c(1509519600, 1518159600, 1526799600, 1535439600, 1544079600),
end = c(1518073200, 1526713200, 1535353200, 1543993200, 1550732400)
)
dt_grp_sym <- merge(dateGroup, data.frame(symbols))
# CALLING SAME ABOVE FUNCTION
df_list <- with(dt_grp_sym, Map(proc_html, symbols, start, end))
final_df <- do.call(rbind, df_list)
Output
by(final_df, final_df$symbol, head)
# final_df$symbol: CNI
# Date Open High Low Close Adj..Close Volume symbol
# 998 Feb 08, 2018 76.08 76.16 74.11 74.45 72.79 1,508,100 CNI
# 999 Feb 07, 2018 76.86 77.23 76.01 76.17 74.48 1,645,400 CNI
# 1000 Feb 06, 2018 76.21 77.42 74.81 77.14 75.42 2,293,300 CNI
# 1001 Feb 05, 2018 78.00 78.70 77.12 77.17 75.45 1,711,000 CNI
# 1002 Feb 02, 2018 79.17 79.24 78.17 78.46 76.71 1,331,400 CNI
# 1003 Feb 01, 2018 79.91 80.54 79.24 79.82 78.04 1,231,500 CNI
# ------------------------------------------------------------------------------
# final_df$symbol: CSX
# Date Open High Low Close Adj..Close Volume symbol
# 333 Feb 08, 2018 52.91 53.16 50.46 50.47 49.80 7,798,100 CSX
# 334 Feb 07, 2018 53.38 54.36 52.94 52.97 52.26 6,496,200 CSX
# 335 Feb 06, 2018 51.27 54.00 50.12 53.82 53.10 10,563,700 CSX
# 336 Feb 05, 2018 54.89 55.04 51.96 51.99 51.30 9,070,200 CSX
# 337 Feb 02, 2018 56.19 56.35 55.20 55.25 54.51 9,275,800 CSX
# 338 Feb 01, 2018 56.10 57.10 56.04 56.58 55.83 4,079,100 CSX
# ------------------------------------------------------------------------------
# final_df$symbol: KSU
# Date Open High Low Close Adj..Close Volume symbol
# 1330 Feb 08, 2018 107.17 107.64 103.50 103.53 102.15 1,434,600 KSU
# 1331 Feb 07, 2018 106.59 108.27 106.59 107.10 105.67 1,326,800 KSU
# 1332 Feb 06, 2018 103.11 108.02 102.07 107.32 105.89 1,459,400 KSU
# 1333 Feb 05, 2018 109.73 110.44 105.12 105.18 103.77 1,272,100 KSU
# 1334 Feb 02, 2018 112.06 112.85 110.03 110.15 108.68 1,051,900 KSU
# 1335 Feb 01, 2018 112.80 114.00 112.17 112.87 111.36 1,011,200 KSU
# ------------------------------------------------------------------------------
# final_df$symbol: NSC
# Date Open High Low Close Adj..Close Volume symbol
# 665 Feb 08, 2018 142.62 143.27 136.87 136.89 134.22 2,657,200 NSC
# 666 Feb 07, 2018 142.09 144.45 141.37 142.68 139.89 1,464,500 NSC
# 667 Feb 06, 2018 136.99 143.45 134.55 143.05 140.26 2,455,000 NSC
# 668 Feb 05, 2018 144.74 146.73 138.18 138.61 135.90 2,508,900 NSC
# 669 Feb 02, 2018 147.15 147.85 144.61 145.03 142.20 1,774,600 NSC
# 670 Feb 01, 2018 149.28 150.35 147.90 148.47 145.57 1,427,000 NSC
# ------------------------------------------------------------------------------
# final_df$symbol: UNP
# Date Open High Low Close Adj..Close Volume symbol
# 1 Feb 08, 2018 128.70 128.70 124.81 124.86 122.27 6,325,100 UNP
# 2 Feb 07, 2018 130.34 131.82 128.94 128.96 126.29 5,053,000 UNP
# 3 Feb 06, 2018 122.28 131.50 121.50 131.15 128.43 15,734,300 UNP
# 4 Feb 05, 2018 128.59 131.78 124.13 124.14 121.57 6,744,400 UNP
# 5 Feb 02, 2018 131.66 131.73 127.22 129.36 126.68 8,181,200 UNP
# 6 Feb 01, 2018 132.51 133.74 131.86 132.38 129.64 5,597,600 UNP
Related Topics
How to Add Se Error Bars to My Barplot in Ggplot2
Installing Package from a Local .Tar.Gz File on Linux
Insert Function Variable into Graph Title
How to Print a Variable Inside a for Loop to the Console in Real Time as the Loop Is Running
Create Columns from Column of List in Data.Table
How to Figure Third Friday of a Month in R
How to Pad a Vector with Na from the Front
R: Strptime() and Is.Na () Unexpected Results
Plot Title at Bottom of Plot Using Ggplot2
Convert Factor to Date Class for Multiple Columns
How to Prep Transaction Data into Basket for Arules
How to Deploy Shiny App That Uses Local Data
How to Add Legend to Geom_Smooth in Ggplot in R
R: Compare All the Columns Pairwise in Matrix
Rselenium, Chrome, How to Set Download Directory, File Download Error
Reshaping Data to Plot in R Using Ggplot2
How to Preserve Continuous (1,2,3,...N) Ranking Notation When Ranking in R