Counting Occurrence of Particular Letter in Vector of Words in R

counting occurrence of particular letter in vector of words in r

Another posibility:

myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL")

sapply(gregexpr("A", myvec, fixed = TRUE), function(x) sum(x > -1))

## [1] 1 0 1 0 3 2 1

EDIT This was begging for a benchmark:

library(stringr); library(stringi); library(microbenchmark); library(qdapDictionaries)

myvec <- toupper(GradyAugmented)

GREGEXPR <- function() sapply(gregexpr("A", myvec, fixed = TRUE), function(x) sum(x > -1))
GSUB <- function() nchar(gsub("[^A]", "", myvec))
STRSPLIT <- function() sapply(strsplit(myvec,""), function(x) sum(x=='A'))
STRINGR <- function() str_count(myvec, "A")
STRINGI <- function() stri_count(myvec, fixed="A")
VAPPLY_STRSPLIT <- function() vapply(strsplit(myvec,""), function(x) sum(x=='A'), integer(1))

(op <- microbenchmark(
GREGEXPR(),
GSUB(),
STRINGI(),
STRINGR(),
STRSPLIT(),
VAPPLY_STRSPLIT(),
times=50L))

## Unit: milliseconds
## expr min lq mean median uq max neval
## GREGEXPR() 477.278895 631.009023 688.845407 705.878827 745.73596 906.83006 50
## GSUB() 197.127403 202.313022 209.485179 205.538073 208.90271 270.19368 50
## STRINGI() 7.854174 8.354631 8.944488 8.663362 9.32927 11.19397 50
## STRINGR() 618.161777 679.103777 797.905086 787.554886 906.48192 1115.59032 50
## STRSPLIT() 244.721701 273.979330 331.281478 294.944321 348.07895 516.47833 50
## VAPPLY_STRSPLIT() 184.042451 206.049820 253.430502 219.107882 251.80117 595.02417 50

boxplot(op)

And stringi whooping some major tail. The vapply + strsplit was a nice approach as was the simple gsub approach. Interesting results for sure.

Sample Image

How to calculate the number of occurrence of a given character in each row of a column of strings?

The stringr package provides the str_count function which seems to do what you're interested in

# Load your example data
q.data<-data.frame(number=1:3, string=c("greatgreat", "magic", "not"), stringsAsFactors = F)
library(stringr)

# Count the number of 'a's in each element of string
q.data$number.of.a <- str_count(q.data$string, "a")
q.data
# number string number.of.a
#1 1 greatgreat 2
#2 2 magic 1
#3 3 not 0

count occurrences among a set of words in r

We can use str_count after pasteing the vector of 'words'

library(stringr)
df1$Scores <- str_count(df1$Col1, paste(words, collapse="|"))
df1$Scores
#[1] 3 3 3 2 0

Or another option is gregexpr from base R

res <- gregexpr(paste0(words, collapse="|"), df1$Col1)
df1$Scores <- lengths(res) * !sapply(res, function(x) -1 %in% x)

data

words <- c("Mon", "Tues", "Wed")
df1 <- structure(list(Col1 = c("Mon,Tues,Wed,Thurs,Fri", "Mon,Tues,Wed,Thurs",
"Mon,Tues,Wed", "Mon,Tues", "Thurs")), .Names = "Col1",
class = "data.frame", row.names = c(NA,
-5L))

count the number of occurrences of ( in a string

( is a special character. You need to escape it:

str_count(s,"\\(")
# [1] 3

Alternatively, given that you're using stringr, you can use the coll function:

str_count(s,coll("("))
# [1] 3

Count word occurrences in R

Let's for the moment assume you wanted the number of element containing "corn":

length(grep("corn", dataset))
[1] 3

After you get the basics of R down better you may want to look at the "tm" package.

EDIT: I realize that this time around you wanted any-"corn" but in the future you might want to get word-"corn". Over on r-help Bill Dunlap pointed out a more compact grep pattern for gathering whole words:

grep("\\<corn\\>", dataset)

Count occurrences of specific words from a dataframe row in R

I Assume this is what you require

Sample data

id <- c(1:4)
text <- c('I have a Dataset with 2 columns a',
'nd multiple rows. first column ID', 'second column the text which',
'n the text which belongs to it.')
dataset <- data.frame(id,text)

Function to find count

library(stringr)
getCount <- function(data,keyword)
{
wcount <- str_count(dataset$text, keyword)
return(data.frame(data,wcount))
}

Calling getCount should give the updated dataset

> getCount(dataset,'second')
id text wcount
1 I have a Dataset with 2 columns a 0
2 nd multiple rows. first column ID 0
3 second column the text which 1
4 n the text which belongs to it. 0

Count the number of all words in a string

You can use strsplit and sapply functions

sapply(strsplit(str1, " "), length)

Count occurrences of words in a string according to a category in R

Here is a base R method to get the count across types.

dataset$wcnt <- rowSums(sapply(c("dog|wolf", "cat|lion"),
function(x) grepl(x, dataset$text)))

Here, sapply runs through the regular expressions of each type and feeds it to grepl. This returns a matrix, where the columns are logical vectors indicating if a particular type (eg, "dog|wolf") was found. rowSums sums the logicals along the rows to get the type variety count.

This returns

dataset
id text wcnt
1 1 saw a cat 1
2 2 found a dog 1
3 3 saw a cat by a dog 2
4 4 There was a lion 1
5 5 Huge wolf 1

If you want the intermediary step, returning logical vectors as variables in your data.frame, you would probably want to set your values up in a named vector and then do cbind with the result.

# construct named vector
myTypes <- c("canine"="dog|wolf", "feline"="cat|lion")
# cbind sapply results of logicals to original data.frame
dataset <- cbind(dataset, sapply(myTypes, function(x) grepl(x, dataset$text)))

This returns

dataset
id text canine feline
1 1 saw a cat FALSE TRUE
2 2 found a dog TRUE FALSE
3 3 saw a cat by a dog TRUE TRUE
4 4 There was a lion FALSE TRUE
5 5 Huge wolf TRUE FALSE


Related Topics



Leave a reply



Submit