How to Count the Frequency of a String for Each Row in R

How to count the frequency of a string for each row in R

df$count <- rowSums(df[-1] == "NC")
#    V1 V2 V3 V4 count
# 1 rs1 NC AB NC     2
# 2 rs2 AB NC AA     1
# 3 rs3 NC NC NC     3

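The comparison df[-1] == "NC" drops the first column (the ID) and returns a logical matrix that is TRUE wherever a cell equals "NC". rowSums() then adds up the TRUE values in each row, which gives the count per row.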
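To make the intermediate step visible, here is a minimal sketch that rebuilds a data frame consistent with the output above (the construction of df and the name is_nc are assumptions for illustration) and prints the logical matrix before summing it.

```r
# Hypothetical data consistent with the printed output above
df <- data.frame(
  V1 = c("rs1", "rs2", "rs3"),
  V2 = c("NC", "AB", "NC"),
  V3 = c("AB", "NC", "NC"),
  V4 = c("NC", "AA", "NC"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell in V2..V4 equals "NC"
is_nc <- df[-1] == "NC"
print(is_nc)

# TRUE counts as 1 and FALSE as 0, so the row sums are the per-row counts
df$count <- rowSums(is_nc)
print(df)
```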

Counting overall word frequency when each sentence is a separate row in a dataframe

You can just use table() on the unlisted strsplit() of your column

table(unlist(strsplit(df$Words, " ")))

#      Luke     Luker       Sky Skywalker     Syker      Walk
#         3         1         1         1         1         2

If you need it sorted by frequency:

sort(table(unlist(strsplit(df$Words, " "))), decreasing = TRUE)

#      Luke      Walk     Luker       Sky Skywalker     Syker
#         3         2         1         1         1         1

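where df$Words is your column of interest. strsplit() needs a character vector, so convert the column with as.character() first if it is a factor.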
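The original data frame is not shown, so the following is only a sketch on made-up sentences. It runs the same table(unlist(strsplit())) chain end to end and, as an optional extra step, turns the result into a data frame; the names word_counts, sorted_counts and freq_df are hypothetical.

```r
# Hypothetical example data: one sentence per row
df <- data.frame(
  Words = c("red green", "red blue", "blue red yellow"),
  stringsAsFactors = FALSE
)

# Split each sentence on spaces, flatten to one vector of words, then tabulate
word_counts <- table(unlist(strsplit(df$Words, " ")))

# Most frequent words first
sorted_counts <- sort(word_counts, decreasing = TRUE)
print(sorted_counts)

# Optional: a two-column data frame is often easier to work with later
freq_df <- as.data.frame(sorted_counts, stringsAsFactors = FALSE)
names(freq_df) <- c("word", "n")
print(freq_df)
```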

R: Count the frequency of pairwise matching strings between all rows of a matrix

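You are comparing the wrong values. Compare each row with the transposed matrix, so that every row is checked position by position against every other row, and take the column means to get the proportion of matching entries: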

apply(labels, 1, function(x) colMeans(x == t(labels)))

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]  1.0  0.0  0.0  0.0  0.0  0.4  0.0  0.0  1.0   0.0
 [2,]  0.0  1.0  0.0  0.0  0.2  0.4  0.0  0.0  0.0   1.0
 [3,]  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.2  0.0   0.0
 [4,]  0.0  0.0  0.0  1.0  0.0  0.0  0.2  0.6  0.0   0.0
 [5,]  0.0  0.2  0.0  0.0  1.0  0.0  0.0  0.0  0.0   0.2
 [6,]  0.4  0.4  0.0  0.0  0.0  1.0  0.0  0.0  0.4   0.4
 [7,]  0.0  0.0  0.0  0.2  0.0  0.0  1.0  0.0  0.0   0.0
 [8,]  0.0  0.0  0.2  0.6  0.0  0.0  0.0  1.0  0.0   0.0
 [9,]  1.0  0.0  0.0  0.0  0.0  0.4  0.0  0.0  1.0   0.0
[10,]  0.0  1.0  0.0  0.0  0.2  0.4  0.0  0.0  0.0   1.0
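The labels matrix itself is not shown, so here is a small sketch on a made-up 3 x 3 character matrix that is easy to check by hand. It relies on R recycling the row x down the columns of t(labels), where each column of the transpose is one row of the original matrix. The names sim and n_match are hypothetical.

```r
# Hypothetical labels: rows 1 and 2 agree in two of three positions,
# row 3 shares nothing with the others
labels <- matrix(c("a", "b", "c",
                   "a", "b", "d",
                   "x", "y", "z"),
                 nrow = 3, byrow = TRUE)

# Proportion of positions in which each pair of rows matches
sim <- apply(labels, 1, function(x) colMeans(x == t(labels)))
print(sim)

# The same comparison as raw counts instead of proportions
n_match <- apply(labels, 1, function(x) colSums(x == t(labels)))
print(n_match)
```

The result is symmetric with 1 on the diagonal (or the number of columns, for the count version), since every row matches itself completely.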

Count the frequency of strings in a dataframe R

You can use sapply() to go through counts and match every item in it against the strings column in df using grepl(). This returns a logical vector (TRUE for a match, FALSE for a non-match), which you can sum to get the number of matching entries.

sapply(df, function(x) {
  sapply(counts, function(y) {
    sum(grepl(y, x))
  })
})

This will return:

    strings
pi        5
in        2
pie       2
ie        2
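Neither df nor counts is shown, so the sketch below uses made-up values chosen to be consistent with the result above. Two things are worth keeping in mind: grepl() treats each pattern as a regular expression unless fixed = TRUE is set, and the sum counts how many entries contain the pattern, not how often the pattern occurs inside one entry.

```r
# Hypothetical inputs consistent with the printed result above
df <- data.frame(
  strings = c("pie", "pies", "pin", "spin", "pit"),
  stringsAsFactors = FALSE
)
counts <- c("pi", "in", "pie", "ie")

# For every column of df, count the entries that contain each pattern.
# fixed = TRUE matches the patterns literally instead of as regular expressions.
res <- sapply(df, function(x) {
  sapply(counts, function(y) {
    sum(grepl(y, x, fixed = TRUE))
  })
})
print(res)
```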

Frequency of each word in a set of strings

A pipe chain does the job: drop missing values, convert to lower case, split on spaces, flatten, tabulate, and sort.

library(dplyr)  # provides the %>% pipe

df <- data.frame(column_x = c("hello world", "hello morning hello",
                              "bye bye world"), stringsAsFactors = FALSE)

df$column_x %>%
  na.omit() %>%
  tolower() %>%
  strsplit(split = " ") %>%  # or strsplit(split = "\\W")
  unlist() %>%
  table() %>%
  sort(decreasing = TRUE)
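If you would rather not load a package, the same steps can be sketched in base R. The helper name word_freq and its split argument are hypothetical, and the nzchar() filter is an addition that drops empty strings, which can appear when splitting on a pattern such as "\\W".

```r
# Hypothetical helper: word frequencies for a character vector, base R only
word_freq <- function(x, split = " ") {
  x <- tolower(x[!is.na(x)])           # drop missing values, ignore case
  words <- unlist(strsplit(x, split))  # one long vector of words
  words <- words[nzchar(words)]        # drop empty strings left by the split
  sort(table(words), decreasing = TRUE)
}

res <- word_freq(c("hello world", "hello morning hello", "bye bye world", NA))
print(res)
```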

Counting number of instances of a condition per row R

You can use rowSums.

df$no_calls <- rowSums(df == "nc")
df
#   rsID sample1 sample2 sample3 sample1304 no_calls
# 1 abcd      aa      bb      nc         nc        2
# 2 efgh      nc      nc      nc         nc        4
# 3 ijkl      aa      ab      aa         nc        1

Or, as pointed out by MrFlick, to exclude the first column from the row sums, you can slightly modify the approach to

df$no_calls <- rowSums(df[-1] == "nc")
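One case the snippets above do not cover is missing data. The sketch below rebuilds the example data frame with one value replaced by NA (an assumption made purely for illustration) and shows that rowSums() returns NA for that row unless na.rm = TRUE is passed. The name without_na_rm is hypothetical.

```r
# Hypothetical variant of the data above, with one missing genotype
df <- data.frame(
  rsID       = c("abcd", "efgh", "ijkl"),
  sample1    = c("aa", "nc", "aa"),
  sample2    = c("bb", "nc", NA),   # NA inserted for illustration
  sample3    = c("nc", "nc", "aa"),
  sample1304 = c("nc", "nc", "nc"),
  stringsAsFactors = FALSE
)

# Comparing NA with "nc" gives NA, so the third row sum is NA by default
without_na_rm <- rowSums(df[-1] == "nc")
print(without_na_rm)

# na.rm = TRUE ignores the missing cells and counts only the known "nc" values
df$no_calls <- rowSums(df[-1] == "nc", na.rm = TRUE)
print(df)
```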

Regarding the row names: they are not part of the comparison, so rowSums does not count them. A simple test demonstrates it:

rownames(df)[1] <- "nc"  # name first row "nc"
rowSums(df == "nc") # compute the row sums
# nc  2  3
#  2  4  1   # still the same in first row

