How to get the first 10 words in a string in R?
Here is a small function that splits the string into words, keeps the first ten, and pastes them back together.
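string_fun <- function(x) {
  # split on whitespace and keep at most the first ten words
  # (head() avoids padding with NA when there are fewer than ten)
  ul <- head(unlist(strsplit(x, split = "\\s+")), 10)
  paste(ul, collapse = " ")
}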
string_fun(x)
df <- read.table(text = "Keyword,City(Column Header)
The length of the string should not be more than 10 is or are in,New York
The Keyword should be of specific length is or are in,Los Angeles
This is an experimental basis program string is or are in,Seattle
Please help me with getting only the first ten words is or are in,Boston", sep = ",", header = TRUE)
df <- as.data.frame(df)
Using apply (the second column does not change the result here, because every keyword already has at least ten words):
df$Keyword <- apply(df[,1:2], 1, string_fun)
EDIT
Probably this is a more general way to use the function.
df[,1] <- as.character(df[,1])
df$Keyword <- unlist(lapply(df[,1], string_fun))
print(df)
# Keyword City.Column.Header.
# 1 The length of the string should not be more than New York
# 2 The Keyword should be of specific length is or are Los Angeles
# 3 This is an experimental basis program string is or Seattle
# 4 Please help me with getting only the first ten Boston
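As a sketch of the same idea that works on a whole character vector at once and leaves shorter strings untouched, something like the following could be used. The name first_n_words and the sample keywords are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper; not part of the original answer.
first_n_words <- function(x, n = 10) {
  # trim outer whitespace so a leading space does not create an empty first word
  words <- strsplit(trimws(x), "\\s+")
  # keep at most n words per element and glue them back together
  vapply(words, function(w) paste(head(w, n), collapse = " "), character(1))
}

# made-up sample input
keywords <- c("one two three",
              "a b c d e f g h i j k l")
first_n_words(keywords)
# returns c("one two three", "a b c d e f g h i j")
```

With the data frame above, this could be called as df$Keyword <- first_n_words(as.character(df$Keyword)).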
R: how to display the first n characters from a string of words
The other answers didn't eliminate the spaces as you did in your example, so I'll add this:
strsplit(substr(gsub("\\s+", "", Getty), 1, 10), '')[[1]]
#[1] "F" "o" "u" "r" "s" "c" "o" "r" "e" "a"
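The one-liner above can be unpacked step by step. The Getty object is not shown in the answer, so the input below is an assumption chosen to be consistent with the printed output.

```r
# Assumed input: the original `Getty` object is not shown in the answer.
txt <- "Four score and seven years ago"

no_spaces <- gsub("\\s+", "", txt)      # drop all whitespace
first_ten <- substr(no_spaces, 1, 10)   # keep the first ten characters
chars     <- strsplit(first_ten, "")[[1]]  # one element per character

first_ten
# "Fourscorea"
```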
Extract the first (or last) n characters of a string
See ?substr
R> substr(a, 1, 4)
[1] "left"
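For the last n characters, substr can be combined with nchar. This is a minimal sketch; the value of a is an assumption, since its definition is not shown above.

```r
# Assumed value; the original definition of `a` is not shown.
a <- "left right"
n <- 5

first_four <- substr(a, 1, 4)                        # "left"
last_n     <- substr(a, nchar(a) - n + 1, nchar(a))  # "right"
```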
R get N words from a sentence as a string
Pure stringi solution (stringr::word() is overkill and uses more stringi functions than this. stringr handicap-wraps stringi functions):
library(stringi)
sentence <- "The quick brown fox jumps over the lazy dog"
tail(stri_extract_all_words(sentence)[[1]], 2)
## [1] "lazy" "dog"
stri_join(tail(stri_extract_all_words(sentence)[[1]], 2), collapse=" ")
## [1] "lazy dog"
Actually readable version:
library(magrittr)
stri_extract_all_words(sentence)[[1]] %>%
tail(2) %>%
stri_join(collapse=" ")
## [1] "lazy dog"
stri_extract_all_words() also uses a locale-sensitive word-break algorithm, which is more robust than splitting on whitespace in base R.
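For comparison, a base R sketch of the same task without stringi might look like this. It assumes words are separated by whitespace, so any punctuation stays attached to the neighbouring word.

```r
sentence <- "The quick brown fox jumps over the lazy dog"

# split on runs of whitespace, then keep the last two pieces
words    <- strsplit(sentence, "\\s+")[[1]]
last_two <- paste(tail(words, 2), collapse = " ")
last_two
# "lazy dog"
```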
Using str_extract_all to extract only first two words in R?
Just relying on the stringr package.
library(stringr)
species_location<-c('Homo_sapiens_Lausanne_Switzerland', 'Solenopsis_invicta_California_US', 'Rattus_novaborensis_Copenhagen_Denmark', 'Candida_albicans_Crotch_Home')
word(species_location, 1,2, sep="_")
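If stringr is not available, a base R sketch that keeps the first two underscore-separated fields could look like the following. It is an alternative technique, not the stringr implementation, and the name first_two is introduced here only for illustration.

```r
species_location <- c("Homo_sapiens_Lausanne_Switzerland",
                      "Solenopsis_invicta_California_US",
                      "Rattus_novaborensis_Copenhagen_Denmark",
                      "Candida_albicans_Crotch_Home")

# split each string on "_" and re-join the first two pieces
parts     <- strsplit(species_location, "_", fixed = TRUE)
first_two <- vapply(parts,
                    function(p) paste(head(p, 2), collapse = "_"),
                    character(1))
first_two
# returns c("Homo_sapiens", "Solenopsis_invicta", "Rattus_novaborensis", "Candida_albicans")
```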
Extract the first 2 Characters in a string
You can just use the substr function directly to take the first two characters of each string:
x <- c("75 to 79", "80 to 84", "85 to 89")
substr(x, start = 1, stop = 2)
# [1] "75" "80" "85"
You could also write a simple function to do a "reverse" substring, giving the 'start' and 'stop' values assuming the index begins at the end of the string:
revSubstr <- function(x, start, stop) {
  x <- strsplit(x, "")
  sapply(x,
         function(x) paste(rev(rev(x)[start:stop]), collapse = ""),
         USE.NAMES = FALSE)
}
revSubstr(x, start = 1, stop = 2)
# [1] "79" "84" "89"
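The same "count from the end" indexing can also be sketched with substring and nchar, which avoids splitting each string into characters. The name rev_substr2 is hypothetical and not part of the original answer.

```r
# Hypothetical alternative: translate end-based positions into ordinary ones.
rev_substr2 <- function(x, start, stop) {
  n <- nchar(x)
  # position `stop` from the end is n - stop + 1 from the front
  substring(x, n - stop + 1, n - start + 1)
}

x <- c("75 to 79", "80 to 84", "85 to 89")
rev_substr2(x, start = 1, stop = 2)
# returns c("79", "84", "89")
```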
Extract first word from a column and insert into new column
You can use a regex ("([A-Za-z]+)" or "([[:alpha:]]+)" or "(\\w+)") to grab the first word
Dataframe1$COL2 <- gsub("([A-Za-z]+).*", "\\1", Dataframe1$COL1)
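A small self-contained run of that line, using made-up data because the original Dataframe1 is not shown. Note that the pattern assumes each value starts with letters.

```r
# Made-up example data; the original Dataframe1 is not shown.
Dataframe1 <- data.frame(COL1 = c("Apple pie recipe", "Banana bread"),
                         stringsAsFactors = FALSE)

# capture the first run of letters and drop everything after it
Dataframe1$COL2 <- gsub("([A-Za-z]+).*", "\\1", Dataframe1$COL1)
Dataframe1$COL2
# returns c("Apple", "Banana")
```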
Extract words from a string
Use gsub() with a regular expression
x <- c("Resistance_Test DevID (Ohms) 428", "Diode_Test SUBLo (V) 353")
ptn <- "(.*? ){3}"
gsub(ptn, "", x)
[1] "428" "353"
This works because the regular expression (.*? ){3} matches exactly three ({3}) runs of characters each followed by a space ((.*? )), and gsub() then replaces the match with an empty string.
See ?gsub and ?regexp for more information.
If your data has structure that you don't mention in your question, then possibly the regular expression becomes even easier.
For example, if you are always interested in the last word of each line:
ptn <- "(.*? )"
gsub(ptn, "", x)
Or perhaps you know for sure you can only search for digits and discard everything else:
ptn <- "\\D"
gsub(ptn, "", x)
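When the fields are always separated by single spaces, splitting and indexing is another option that some readers may find easier to follow than a regular expression. This is a sketch under that assumption; the names fields, fourth and last are introduced here for illustration.

```r
x <- c("Resistance_Test DevID (Ohms) 428", "Diode_Test SUBLo (V) 353")

# split on single spaces, then pick a field by position
fields <- strsplit(x, " ", fixed = TRUE)
fourth <- vapply(fields, function(f) f[4], character(1))        # 4th field
last   <- vapply(fields, function(f) tail(f, 1), character(1))  # last field

as.numeric(fourth)
# returns c(428, 353)
```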
Obtaining first word in the string
A very simple approach with gsub
gsub("/.*", '', y)
[1] "london" "newyork" "paris"
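The input y is not shown in the answer. The values below are invented purely so that the call can be run and gives the output above; the real data may have looked different.

```r
# Invented input; only the part before the first "/" matters here.
y <- c("london/uk", "newyork/usa", "paris/france")

# remove the first "/" and everything after it
first_part <- gsub("/.*", "", y)
first_part
# returns c("london", "newyork", "paris")
```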
Getting and removing the first character of a string
See ?substring.
x <- 'hello stackoverflow'
substring(x, 1, 1)
## [1] "h"
substring(x, 2)
## [1] "ello stackoverflow"
The idea of having a pop method that both returns a value and has a side effect of updating the data stored in x is very much a concept from object-oriented programming. So rather than defining a pop function to operate on character vectors, we can make a reference class with a pop method.
PopStringFactory <- setRefClass(
  "PopString",
  fields = list(
    x = "character"
  ),
  methods = list(
    initialize = function(x)
    {
      x <<- x
    },
    pop = function(n = 1)
    {
      if(nchar(x) == 0)
      {
        warning("Nothing to pop.")
        return("")
      }
      first <- substring(x, 1, n)
      x <<- substring(x, n + 1)
      first
    }
  )
)
x <- PopStringFactory$new("hello stackoverflow")
x
## Reference class object of class "PopString"
## Field "x":
## [1] "hello stackoverflow"
replicate(nchar(x$x), x$pop())
## [1] "h" "e" "l" "l" "o" " " "s" "t" "a" "c" "k" "o" "v" "e" "r" "f" "l" "o" "w"
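If a side-effect-free style is preferred, one possible sketch is a plain function that returns both pieces and leaves reassignment to the caller. The name pop_first is hypothetical and not part of the original answer.

```r
# Hypothetical helper: returns the popped characters and the remainder.
pop_first <- function(x, n = 1) {
  list(first = substring(x, 1, n),
       rest  = substring(x, n + 1))
}

res <- pop_first("hello stackoverflow")
res$first
# "h"
res$rest
# "ello stackoverflow"
```

The caller would then continue with res$rest instead of relying on the object being updated in place.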