How can I extract text from R's help command?
help itself doesn't return anything useful. To get the help text, you can read the contents of the help database for a package and parse that.
extract_help <- function(pkg, fn = NULL, to = c("txt", "html", "latex", "ex"))
{
  to <- match.arg(to)
  rdbfile <- file.path(find.package(pkg), "help", pkg)
  rdb <- tools:::fetchRdDB(rdbfile, key = fn)
  convertor <- switch(to,
    txt   = tools::Rd2txt,
    html  = tools::Rd2HTML,
    latex = tools::Rd2latex,
    ex    = tools::Rd2ex
  )
  f <- function(x) capture.output(convertor(x))
  if(is.null(fn)) lapply(rdb, f) else f(rdb)
}
pkg is a character string giving the name of a package.
fn is a character string giving the name of a function within that package. If it is left as NULL, then the help for all the functions in that package gets returned.
to converts the help file to txt, html, latex, or ex (examples).
Example usage:
#Everything in utils
extract_help("utils")
#just one function
extract_help("utils", "browseURL")
#convert to html instead
extract_help("utils", "browseURL", "html")
#a non-base package
extract_help("plyr")
How to write contents of help to a file from within R?
Looks like the two functions you would need are tools:::Rd2txt and utils:::.getHelpFile. This prints the help file to the console, but you may need to fiddle with the arguments to get it to write to a file in the way you want.
For example:
hs <- help(survey)
tools:::Rd2txt(utils:::.getHelpFile(as.character(hs)))
Since these functions aren't currently exported, I would not recommend you rely on them for any production code. It would be better to use them as a guide to create your own stable implementation.
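Building on that, here is a minimal sketch of writing a help page straight to a file; Rd2txt's out argument accepts a filename or connection. The topic ("mean") and output path are just examples, and the sketch relies on the same unexported internals, so treat it as illustrative rather than stable:

```r
# Sketch: render the help page for base::mean into a plain-text file.
hs  <- help("mean")                        # help object for the topic
rd  <- utils:::.getHelpFile(as.character(hs))
out <- file.path(tempdir(), "mean-help.txt")
tools:::Rd2txt(rd, out = out)              # 'out' may be a filename or connection
file.exists(out)
```

The same pattern works with tools::Rd2HTML to produce an .html file instead.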
In R, can I get the help text for a function into a variable?
I think help.search could be of use. For instance, if I wanted everything in the base package:
x <- help.search("*", package="base")
entries <- data.frame(entry=x$matches$Entry, title=x$matches$Title)
entries[c(1, 100, 1000),]
# entry title
# 1 + Arithmetic Operators
# 100 c.POSIXlt Date-Time Classes
# 1000 encoding Functions to Manipulate Connections
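To actually get the help text itself into a variable, the unexported helpers from the previous answer can be combined with capture.output. A hedged sketch (help_text is a name I made up; the extra parentheses in help((topic)) force evaluation of the variable, and the ::: internals may change between R versions):

```r
# Sketch: rendered help text for one topic as a character vector,
# one element per line of the help page.
help_text <- function(topic, package) {
  h <- utils::help((topic), package = (package), help_type = "text")
  capture.output(tools:::Rd2txt(utils:::.getHelpFile(as.character(h))))
}

txt <- help_text("mean", "base")
is.character(txt)   # the page text is now in a variable
```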
R command to extract text between two strings containing curly braces
Use the following regex.
a2 <- "@article{2020, title={Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin}, volume={9}, ISSN={2045-7634}, url={http://dx.doi.org/10.1002/cam4.3002}, DOI={10.1002/cam4.3002}, number={11}, journal={Cancer Medicine}, publisher={Wiley}, author={Ji, Yefeng and Feng, Guanying and Hou, Yunwen and Yu, Yang and Wang, Ruixia and Yuan, Hua}, year={2020}, month={Apr}, pages={3954–3963} }"
sub("^.*title=\\{([^{}]+)\\}.*$", "\\1", a2)
#> [1] "Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin"
Created on 2022-03-19 by the reprex package (v2.0.1)
Edit: an alternative stringr way.
stringr::str_match(a2, "^.*title=\\{([^{}]+)\\}.*$")[,2]
#> [1] "Long noncoding RNA MEG3 decreases the growth of head and neck squamous cell carcinoma by regulating the expression of miR-421 and E-cadherin"
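The same bracket-matching idea generalizes to all fields of the entry at once. A small sketch (my addition, not part of the original answer) using str_match_all to pull every field={value} pair into a named vector:

```r
library(stringr)

# Sketch: extract all field={value} pairs from a BibTeX-style string.
bib <- "@article{key, title={Some Title}, volume={9}, year={2020}}"
m <- str_match_all(bib, "(\\w+)=\\{([^{}]+)\\}")[[1]]
setNames(m[, 3], m[, 2])   # named vector: title, volume, year
```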
Extract text with gsub
We can use str_extract
library(stringr)
str_extract(df$column.with.new.names, "KB_*\\d+[_ ]*[^_]*")
#[1] "KB_1813_B" "KB1720_1" "KB1810 mat"
Or the same pattern can be captured as a group with sub
sub(".*(KB_*\\d+[_ ]*[^_]*).*", "\\1", df$column.with.new.names)
#[1] "KB_1813_B" "KB1720_1" "KB1810 mat"
data
df <- data.frame(column.with.new.names = c("Baseline/Cell_Line_2_KB_1813_B_Baseline",
"Dose 0001/Cell_Line_3_KB1720_1_0001",
"Dose 0010/Cell_Line_1_KB1810 mat_0010"), stringsAsFactors = FALSE)
Extract text between specific strings in a URL
Split the string on "/" and pull the third- and second-to-last elements:
url = "https://www.somewebsiteLink.com/someDirectory/Directory/ascensor/163235494/d"
url2 = "https://www.somewebsiteLink.com/someDirectory/Directory/aire-acondicionado-calefaccion-ascensor/45837493/d"
urls = c(url, url2)
pieces = strsplit(urls, split = "/")
result = lapply(pieces, \(x) x[length(x) - 2:1])
## for older R versions:
# result = lapply(pieces, function(x) x[length(x) - 2:1])
result
# [[1]]
# [1] "ascensor" "163235494"
#
# [[2]]
# [1] "aire-acondicionado-calefaccion-ascensor" "45837493"
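If a character matrix is handier than a list (say, to become data frame columns), the same indexing works with vapply; this is just a variation on the answer above, not a separate technique:

```r
urls <- c(
  "https://www.somewebsiteLink.com/someDirectory/Directory/ascensor/163235494/d",
  "https://www.somewebsiteLink.com/someDirectory/Directory/aire-acondicionado-calefaccion-ascensor/45837493/d"
)
pieces <- strsplit(urls, split = "/")

# One row per URL; columns are the two path segments before the final "/d".
t(vapply(pieces, function(x) x[length(x) - 2:1], character(2)))
```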
Extract text from search result URLs using R
This is a basic idea of how to go about scrapping this pages. Though it might be slow in r if there are many pages to be scrapped.
Now your question is a bit ambiguous. You want the end results to be .txt files. What of the webpages that has pdf??? Okay. you can still use this code and change the file extension to pdf for the webpages that have pdfs.
library(xml2)
library(rvest)
urll="https://search.newyorkfed.org/board_public/search?start=10&Search=&number=10&text=inflation"
urll %>% read_html() %>% html_nodes("div#results a") %>% html_attr("href") %>%
  .[!duplicated(.)] %>%
  lapply(function(x) read_html(x) %>% html_nodes("body")) %>%
  Map(function(x, y) write_html(x, tempfile(y, fileext = ".txt"), options = "format"),
      ., c(paste("tmp", 1:length(.))))
This is the breakdown of the code above:
The URL you want to scrape from:
urll="https://search.newyorkfed.org/board_public/search?start=10&Search=&number=10&text=inflation"
Get all the URLs that you need:
allurls <- urll %>% read_html() %>% html_nodes("div#results a") %>%
  html_attr("href") %>% .[!duplicated(.)]
Where do you want to save your texts? Create the temp files:
tmps <- tempfile(c(paste("tmp",1:length(allurls))),fileext=".txt")
At this point, allurls is a character vector. Each URL has to be read into an XML document before it can be scraped. Then finally write them into the temp files created above:
allurls %>% lapply(function(x) read_html(x) %>% html_nodes("body")) %>%
  Map(function(x, y) write_html(x, y, options = "format"), ., tmps)
Be careful to copy the code exactly: for example, after ..."format"), there is a period (the pipe placeholder .), which is easy to miss.
Your files have now been written to the temp directory; to find out where that is, type tempdir() at the console. You can also change where the files are saved by adjusting the tempfile() call used during scraping.
Hope this helps.
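One caveat: write_html saves HTML markup, not plain text, so the resulting "files" are really HTML with a .txt extension. A hedged variation (same URL and selector as above; requires a network connection to run) that writes the visible page text instead, using rvest's html_text:

```r
library(xml2)
library(rvest)

urll <- "https://search.newyorkfed.org/board_public/search?start=10&Search=&number=10&text=inflation"
allurls <- urll %>% read_html() %>% html_nodes("div#results a") %>%
  html_attr("href") %>% .[!duplicated(.)]
tmps <- tempfile(paste0("tmp", seq_along(allurls)), fileext = ".txt")

# Write the visible text of each page body (not the markup) to its file.
Map(function(u, f) writeLines(html_text(html_nodes(read_html(u), "body")), f),
    allurls, tmps)
```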