How to count the frequency of a string for each row in R
df$count <- rowSums(df[-1] == "NC")
# V1 V2 V3 V4 count
# 1 rs1 NC AB NC 2
# 2 rs2 AB NC AA 1
# 3 rs3 NC NC NC 3
We can use rowSums on the logical matrix created by the expression df[-1] == "NC". Here df[-1] drops the first (ID) column, and rowSums counts the TRUE values in each row.
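To make the one-liner reproducible, here is a minimal sketch. The input data frame is an assumption, reconstructed from the printed output above, and the intermediate name is_nc is only for illustration.

```r
# Hypothetical reconstruction of the input, based on the printed output above
df <- data.frame(
  V1 = c("rs1", "rs2", "rs3"),
  V2 = c("NC", "AB", "NC"),
  V3 = c("AB", "NC", "NC"),
  V4 = c("NC", "AA", "NC"),
  stringsAsFactors = FALSE
)

# df[-1] drops the ID column; == "NC" yields a logical matrix
is_nc <- df[-1] == "NC"

# TRUE counts as 1, so rowSums gives the number of "NC" entries per row
df$count <- rowSums(is_nc)
df$count
```

Keeping the logical matrix in its own variable makes it easy to inspect which cells matched before summing.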
Counting overall word frequency when each sentence is a separate row in a dataframe
You can just use table() on the unlisted strsplit() of your column:
table(unlist(strsplit(df$Words, " ")))
# Luke Luker Sky Skywalker Syker Walk
# 3 1 1 1 1 2
and if you need it sorted
sort(table(unlist(strsplit(df$Words, " "))), decreasing = TRUE)
# Luke Walk Luker Sky Skywalker Syker
# 3 2 1 1 1 1
where df$Words is your column of interest.
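The steps can also be run one at a time to see what each function contributes. This is only a sketch: the sample sentences are hypothetical (the original data is not shown), chosen so that the counts resemble the output above.

```r
# Hypothetical input: one sentence per row (not the original data)
df <- data.frame(
  Words = c("Luke Skywalker", "Luke Walk Sky", "Luker Syker", "Luke Walk"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row
tokens_by_row <- strsplit(df$Words, " ")

# unlist() flattens the list so table() can count across all rows
word_counts <- table(unlist(tokens_by_row))

# Most frequent words first
sort(word_counts, decreasing = TRUE)
```

Note that splitting on a single space treats "Luke" and "Luker" as different words, and it would leave empty strings behind if a sentence contained consecutive spaces.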
R: Count the frequency of pairwise matching strings between all rows of a matrix
You are comparing the wrong values:
apply(labels, 1, function(x) colMeans(x == t(labels)))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1.0 0.0 0.0 0.0 0.0 0.4 0.0 0.0 1.0 0.0
[2,] 0.0 1.0 0.0 0.0 0.2 0.4 0.0 0.0 0.0 1.0
[3,] 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0
[4,] 0.0 0.0 0.0 1.0 0.0 0.0 0.2 0.6 0.0 0.0
[5,] 0.0 0.2 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.2
[6,] 0.4 0.4 0.0 0.0 0.0 1.0 0.0 0.0 0.4 0.4
[7,] 0.0 0.0 0.0 0.2 0.0 0.0 1.0 0.0 0.0 0.0
[8,] 0.0 0.0 0.2 0.6 0.0 0.0 0.0 1.0 0.0 0.0
[9,] 1.0 0.0 0.0 0.0 0.0 0.4 0.0 0.0 1.0 0.0
[10,] 0.0 1.0 0.0 0.0 0.2 0.4 0.0 0.0 0.0 1.0
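To see why this expression works, it helps to run it on a small case. The matrix below is hypothetical (three items with four labels each), not the data from the question.

```r
# Hypothetical label matrix: one row per item, one column per labelling run
labels <- rbind(
  c("a", "b", "c", "d"),
  c("a", "b", "x", "y"),
  c("a", "b", "c", "d")
)

# t(labels) puts every row of labels into a column, so for each row x,
# x == t(labels) compares x position by position with every row.
# colMeans() then gives the fraction of matching positions per row.
similarity <- apply(labels, 1, function(x) colMeans(x == t(labels)))
similarity
```

In this sketch rows 1 and 3 are identical, so their entry is 1, while row 2 shares two of four labels with each of them, giving 0.5. The diagonal is always 1 because every row matches itself.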
Count the frequency of strings in a dataframe R
You can use sapply() to go over counts and match every item in counts against the strings column in df using grepl(). This returns a logical vector (TRUE if match, FALSE if non-match). You can sum this vector to get the number of matches.
sapply(df, function(x) {
  sapply(counts, function(y) {
    sum(grepl(y, x))
  })
})
This will return:
strings
pi 5
in 2
pie 2
ie 2
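A self-contained sketch of the same idea follows. The contents of df$strings and counts are hypothetical, chosen so the totals match the output above. The fixed = TRUE argument is an addition not present in the original answer: by default grepl() treats the pattern as a regular expression, so literal matching is safer when the search terms may contain characters such as "." or "(".

```r
# Hypothetical data (the original df and counts are not shown)
df <- data.frame(
  strings = c("pie", "pies", "pin", "pint", "spit"),
  stringsAsFactors = FALSE
)
counts <- c("pi", "in", "pie", "ie")

# For each search term, count the rows of df$strings that contain it.
# fixed = TRUE matches the term literally instead of as a regex.
n_matches <- sapply(counts, function(term) {
  sum(grepl(term, df$strings, fixed = TRUE))
})
n_matches
```

This counts rows that contain the term at least once, not the total number of occurrences within each string.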
Frequency of each word in a set of strings
Pipes do the job.
df <- data.frame(column_x = c("hello world", "hello morning hello",
                              "bye bye world"), stringsAsFactors = FALSE)
library(dplyr)  # provides the %>% pipe
df$column_x %>%
  na.omit() %>%                # drop missing sentences
  tolower() %>%                # make the count case-insensitive
  strsplit(split = " ") %>%    # or strsplit(split = "\\W")
  unlist() %>%
  table() %>%
  sort(decreasing = TRUE)
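If you prefer not to load a package just for the pipe, the same steps can be written in base R with explicit intermediates. This is a sketch of an equivalent, not part of the original answer; the variable names are illustrative.

```r
# Same sample data as in the answer above
df <- data.frame(column_x = c("hello world", "hello morning hello",
                              "bye bye world"), stringsAsFactors = FALSE)

# Each step of the pipeline written as an explicit intermediate
sentences <- tolower(na.omit(df$column_x))
words <- unlist(strsplit(sentences, split = " "))
word_freq <- sort(table(words), decreasing = TRUE)
word_freq
```

With this sample data, "hello" appears three times, "bye" and "world" twice each, and "morning" once.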
Counting number of instances of a condition per row R
You can use rowSums:
df$no_calls <- rowSums(df == "nc")
df
# rsID sample1 sample2 sample3 sample1304 no_calls
#1 abcd aa bb nc nc 2
#2 efgh nc nc nc nc 4
#3 ijkl aa ab aa nc 1
Or, as pointed out by MrFlick, to exclude the first column from the row sums, you can slightly modify the approach to:
df$no_calls <- rowSums(df[-1] == "nc")
Regarding the row names: they are not counted in rowSums, and you can make a simple test to demonstrate it:
rownames(df)[1] <- "nc" # name first row "nc"
rowSums(df == "nc") # compute the row sums
#nc 2 3
# 2 4 1 # still the same in first row
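One case the examples above do not cover is missing data. Comparing NA with a string gives NA, so a row containing NA gets an NA count unless na.rm = TRUE is passed to rowSums. The data below is hypothetical and only illustrates that behaviour.

```r
# Hypothetical data with a missing genotype call (not from the original question)
df <- data.frame(
  rsID    = c("abcd", "efgh"),
  sample1 = c("aa", "nc"),
  sample2 = c(NA, "nc"),
  sample3 = c("nc", "nc"),
  stringsAsFactors = FALSE
)

# Comparing NA with "nc" gives NA, which propagates through rowSums
with_na <- rowSums(df[-1] == "nc")

# na.rm = TRUE skips the NA comparisons and counts only the TRUE values
without_na <- rowSums(df[-1] == "nc", na.rm = TRUE)
```

Whether a missing value should be skipped or treated as a no-call depends on the data; na.rm = TRUE simply ignores it.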