Counting Occurrence of Particular Letter in Vector of Words in R

counting occurrence of particular letter in vector of words in r

Another possibility:

myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL")

sapply(gregexpr("A", myvec, fixed = TRUE), function(x) sum(x > -1))

## [1] 1 0 1 0 3 2 1
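
The `sum(x > -1)` step matters because `gregexpr` reports "no match" as a single `-1` rather than as an empty vector, so simply taking the length would count one occurrence for words that contain none. A minimal sketch of that behaviour, using a made-up two-word vector `demo` (not part of the original answer):

```r
# Hypothetical two-element input: one word with matches, one without
demo <- c("BANANA", "KILLS")
m <- gregexpr("A", demo, fixed = TRUE)

# Match positions for "BANANA"; a lone -1 for "KILLS" (no match)
as.integer(m[[1]])
as.integer(m[[2]])

# Counting entries greater than -1 therefore yields the true counts
counts <- sapply(m, function(x) sum(x > -1))
counts
```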

EDIT: This was begging for a benchmark:

library(stringr); library(stringi); library(microbenchmark); library(qdapDictionaries)

myvec <- toupper(GradyAugmented)

GREGEXPR <- function() sapply(gregexpr("A", myvec, fixed = TRUE), function(x) sum(x > -1))
GSUB <- function() nchar(gsub("[^A]", "", myvec))
STRSPLIT <- function() sapply(strsplit(myvec,""), function(x) sum(x=='A'))
STRINGR <- function() str_count(myvec, "A")
STRINGI <- function()  stri_count(myvec, fixed="A")
VAPPLY_STRSPLIT <- function() vapply(strsplit(myvec,""), function(x) sum(x=='A'), integer(1))

(op <- microbenchmark( 
    GREGEXPR(),
    GSUB(),
    STRINGI(),
    STRINGR(),
    STRSPLIT(),
    VAPPLY_STRSPLIT(),    
times=50L))

## Unit: milliseconds
##               expr        min         lq       mean     median        uq        max neval
##         GREGEXPR() 477.278895 631.009023 688.845407 705.878827 745.73596  906.83006    50
##             GSUB() 197.127403 202.313022 209.485179 205.538073 208.90271  270.19368    50
##          STRINGI()   7.854174   8.354631   8.944488   8.663362   9.32927   11.19397    50
##          STRINGR() 618.161777 679.103777 797.905086 787.554886 906.48192 1115.59032    50
##         STRSPLIT() 244.721701 273.979330 331.281478 294.944321 348.07895  516.47833    50
##  VAPPLY_STRSPLIT() 184.042451 206.049820 253.430502 219.107882 251.80117  595.02417    50

boxplot(op)

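stringi wins by a wide margin: its median time is more than 20 times lower than that of the next-fastest approach (gsub). Among the base R options, vapply + strsplit and the simple gsub approach did best, while stringr's str_count had the slowest median. Interesting results for sure.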
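
Timings only mean something if the approaches return the same counts. A minimal check, restricted to the base R variants so that no extra packages are assumed, run on the small vector from the first example:

```r
# Small vector from the first example
myvec <- c("A", "KILLS", "PASS", "JUMP", "BANANA", "AALU", "KPAL")

gregexpr_counts <- sapply(gregexpr("A", myvec, fixed = TRUE), function(x) sum(x > -1))
gsub_counts     <- nchar(gsub("[^A]", "", myvec))
strsplit_counts <- vapply(strsplit(myvec, ""), function(x) sum(x == "A"), integer(1))

# All three should agree element by element
stopifnot(all(gregexpr_counts == gsub_counts),
          all(gsub_counts == strsplit_counts))
gregexpr_counts
```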


How to calculate the number of occurrence of a given character in each row of a column of strings?

The stringr package provides the str_count function, which seems to do what you're interested in:

# Load your example data
q.data <- data.frame(number = 1:3, string = c("greatgreat", "magic", "not"), stringsAsFactors = FALSE)
library(stringr)

# Count the number of 'a's in each element of string
q.data$number.of.a <- str_count(q.data$string, "a")
q.data
#  number     string number.of.a
#1      1 greatgreat           2
#2      2      magic           1
#3      3        not           0
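
If adding a package is not an option, roughly the same result can be had in base R. The sketch below is an alternative to the stringr answer, not a replacement for it; it combines `gregexpr`, `regmatches` and `lengths`:

```r
q.data <- data.frame(number = 1:3,
                     string = c("greatgreat", "magic", "not"),
                     stringsAsFactors = FALSE)

# gregexpr finds every match, regmatches extracts them, lengths counts per row
matches <- regmatches(q.data$string, gregexpr("a", q.data$string, fixed = TRUE))
q.data$number.of.a <- lengths(matches)
q.data$number.of.a
```

Rows without a match yield an empty character vector from `regmatches`, so they are counted as 0 without any special handling.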

count occurrences among a set of words in r

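We can use str_count after pasting the vector of 'words' into a single regular expression, with the alternatives separated by "|" (the data are defined at the end of this answer):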

library(stringr)
df1$Scores <- str_count(df1$Col1, paste(words, collapse="|"))
df1$Scores
#[1] 3 3 3 2 0

Or another option is gregexpr from base R

res <- gregexpr(paste0(words, collapse="|"), df1$Col1)
df1$Scores <-  lengths(res) * !sapply(res, function(x) -1 %in% x)

data

words <- c("Mon", "Tues", "Wed")
df1 <- structure(list(Col1 = c("Mon,Tues,Wed,Thurs,Fri", "Mon,Tues,Wed,Thurs", 
"Mon,Tues,Wed", "Mon,Tues", "Thurs")), .Names = "Col1",
  class = "data.frame", row.names = c(NA, 
 -5L))
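
One thing to keep in mind: the pasted pattern matches substrings, so "Mon" would also be counted inside "Monday". If whole words are wanted, wrapping the alternation in word boundaries is one option. The sketch below uses base R only; the helper `count_matches` and the input `x` are hypothetical names chosen for illustration:

```r
words <- c("Mon", "Tues", "Wed")
# Hypothetical input where "Mon" also appears inside "Monday"
x <- c("Mon,Tues", "Monday,Wed")

# Hypothetical helper: number of regex matches per element
count_matches <- function(pattern, text) {
  lengths(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
}

plain   <- paste(words, collapse = "|")
bounded <- paste0("\\b(", plain, ")\\b")

count_matches(plain, x)    # substring matches
count_matches(bounded, x)  # whole-word matches only
```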

count the number of occurrences of ( in a string

( is a special character in regular expressions. You need to escape it:

str_count(s,"\\(")
# [1] 3

Alternatively, given that you're using stringr, you can use the coll function:

str_count(s,coll("("))
# [1] 3
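
The same two options, escaping versus literal matching, exist in base R. In the sketch below the string `s` is made up, since the one from the question is not shown here:

```r
# Hypothetical input; the original question's string is not shown
s <- "f(x) + g(y) * h(z)"

# Escaped regex and fixed (literal) matching give the same count
n_regex <- lengths(regmatches(s, gregexpr("\\(", s)))
n_fixed <- lengths(regmatches(s, gregexpr("(", s, fixed = TRUE)))
c(n_regex, n_fixed)
```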

Count word occurrences in R

Let's assume for the moment that you wanted the number of elements containing "corn":

length(grep("corn", dataset))
[1] 3

After you get the basics of R down better you may want to look at the "tm" package.

EDIT: I realize that this time around you wanted any string containing "corn", but in the future you might want "corn" only as a whole word. Over on r-help Bill Dunlap pointed out a more compact grep pattern for gathering whole words:

grep("\\<corn\\>", dataset)
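
To see the difference between the two patterns, here is a small sketch on invented data (the original `dataset` is not shown, so the vector below is an assumption). Note that `grep` counts elements that match, not the number of occurrences within an element:

```r
# Hypothetical data; the original `dataset` is not shown
dataset <- c("corn", "popcorn", "corn dog", "acorn", "wheat")

any_corn  <- grep("corn", dataset)          # elements containing "corn" anywhere
word_corn <- grep("\\<corn\\>", dataset)    # elements with "corn" as a whole word

length(any_corn)
length(word_corn)
```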

Count occurrences of specific words from a dataframe row in R

I assume this is what you require.

Sample data

id <- c(1:4)
text <- c('I have a Dataset with 2 columns a',
          'nd multiple rows. first column ID', 'second column the text which',
          'n the text which belongs to it.')
dataset <- data.frame(id,text)

Function to find count

library(stringr)
getCount <- function(data, keyword)
{
  # Count matches in the data frame passed in, not in the global `dataset`
  wcount <- str_count(data$text, keyword)
  return(data.frame(data, wcount))
}

Calling getCount should give the updated dataset

> getCount(dataset,'second')
  id                              text wcount
1  1 I have a Dataset with 2 columns a      0
2  2 nd multiple rows. first column ID      0
3  3      second column the text which      1
4  4   n the text which belongs to it.      0

Count the number of all words in a string

You can use the strsplit and sapply functions:

sapply(strsplit(str1, " "), length)
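
Splitting on a single space works for cleanly spaced text, but repeated spaces produce empty pieces that get counted as words. A slightly more defensive sketch, assuming a made-up `str1` (it is not defined in the original), trims the string and splits on runs of whitespace instead:

```r
# Hypothetical input; `str1` is not defined in the original
str1 <- c("the quick brown fox", "hello  world", "  padded text ")

# Splitting on a single space counts the empty pieces between repeated spaces
naive <- sapply(strsplit(str1, " "), length)

# Trimming and splitting on runs of whitespace avoids that
robust <- lengths(strsplit(trimws(str1), "\\s+"))

naive
robust
```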

Count occurrences of words in a string according to a category in R

Here is a base R method to get the count across types.

dataset$wcnt <- rowSums(sapply(c("dog|wolf", "cat|lion"),
                               function(x) grepl(x, dataset$text)))

Here, sapply runs through the regular expression for each type and feeds it to grepl. This returns a matrix whose columns are logical vectors indicating whether a particular type (e.g., "dog|wolf") was found in each row. rowSums then sums the logicals along the rows to get the type variety count.

This returns

dataset
  id               text wcnt
1  1          saw a cat    1
2  2        found a dog    1
3  3 saw a cat by a dog    2
4  4   There was a lion    1
5  5          Huge wolf    1
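
For anyone who wants to run this end to end, here is a self-contained sketch. The data frame is reconstructed from the printed output, and the intermediate name `hits` is introduced only for illustration:

```r
# Data reconstructed from the printed output
dataset <- data.frame(id = 1:5,
                      text = c("saw a cat", "found a dog", "saw a cat by a dog",
                               "There was a lion", "Huge wolf"),
                      stringsAsFactors = FALSE)

# One logical column per type; TRUE where any alternative of that type matches
hits <- sapply(c("dog|wolf", "cat|lion"), function(x) grepl(x, dataset$text))
hits

dataset$wcnt <- rowSums(hits)
dataset$wcnt
```

Because grepl only reports whether a pattern matched, a row mentioning "cat" twice still contributes 1 for that type.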

If you want the intermediary step, returning logical vectors as variables in your data.frame, you would probably want to set your values up in a named vector and then do cbind with the result.

# construct named vector
myTypes <- c("canine"="dog|wolf", "feline"="cat|lion")
# cbind sapply results of logicals to original data.frame
dataset <- cbind(dataset, sapply(myTypes, function(x) grepl(x, dataset$text)))

This returns

dataset
  id               text canine feline
1  1          saw a cat  FALSE   TRUE
2  2        found a dog   TRUE  FALSE
3  3 saw a cat by a dog   TRUE   TRUE
4  4   There was a lion  FALSE   TRUE
5  5          Huge wolf   TRUE  FALSE
