Counting Number of Instances of a Condition Per Row R

You can use rowSums on the logical matrix df == "nc": each TRUE counts as 1, so the row sums are the per-row counts.

df$no_calls <- rowSums(df == "nc")
df
#  rsID sample1 sample2 sample3 sample1304 no_calls
#1 abcd      aa      bb      nc         nc        2
#2 efgh      nc      nc      nc         nc        4
#3 ijkl      aa      ab      aa         nc        1

Or, as pointed out by MrFlick, to exclude the first column from the row sums, you can slightly modify the approach:

df$no_calls <- rowSums(df[-1] == "nc")
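
For a self-contained illustration, the sketch below rebuilds the data frame from the printed output above (the construction is an assumption; the original input was not shown) and counts "nc" per row while skipping the ID column. The na.rm = TRUE argument is an addition that only matters if some calls are NA.

```r
# Data frame reconstructed from the printed output; assumed, not the asker's actual input
df <- data.frame(
  rsID       = c("abcd", "efgh", "ijkl"),
  sample1    = c("aa", "nc", "aa"),
  sample2    = c("bb", "nc", "ab"),
  sample3    = c("nc", "nc", "aa"),
  sample1304 = c("nc", "nc", "nc"),
  stringsAsFactors = FALSE
)

# df[-1] drops the rsID column; the comparison gives a logical matrix
# and rowSums() counts the TRUE cells in each row
df$no_calls <- rowSums(df[-1] == "nc", na.rm = TRUE)
```

Dropping the ID column first avoids a false hit if an ID ever happened to equal "nc".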

Regarding the row names: they are not counted by rowSums, and a simple test demonstrates it:

rownames(df)[1] <- "nc"  # name first row "nc"
rowSums(df == "nc") # compute the row sums
#nc  2  3
# 2  4  1   # still the same in the first row

Count occurrences of value in a set of variables in R (per row)

Try

apply(df,MARGIN=1,table)

Here df is your data.frame. When the rows contain different sets of values, this returns a list with one element per row of the data.frame (in the same order). Each element is a table whose names are the distinct values in that row and whose entries are the number of occurrences. If every row happens to produce a table of the same length, apply simplifies the result to a matrix instead of a list.

For instance:

df <- data.frame(V1 = c(10, 20, 10, 20), V2 = c(20, 30, 20, 30), V3 = c(20, 10, 20, 10))
# create a data.frame containing some data
df  # show the data.frame
#   V1 V2 V3
# 1 10 20 20
# 2 20 30 10
# 3 10 20 20
# 4 20 30 10
apply(df, MARGIN = 1, table)  # apply the function table to each row (MARGIN = 1)
# [[1]]
#
# 10 20
#  1  2
#
# [[2]]
#
# 10 20 30
#  1  1  1
#
# [[3]]
#
# 10 20
#  1  2
#
# [[4]]
#
# 10 20 30
#  1  1  1

#desired result
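
If only the count of one particular value per row is needed, or a rectangular result is easier to work with than a list of tables, a rowSums-based variant may be simpler. The sketch below reuses the example data; the vector vals of values to count is an assumption made for illustration.

```r
df <- data.frame(V1 = c(10, 20, 10, 20),
                 V2 = c(20, 30, 20, 30),
                 V3 = c(20, 10, 20, 10))

# Count how often a single value (here 20) occurs in each row
n20 <- rowSums(df == 20)

# Count several values at once: one column per value, one row per data row
vals <- c(10, 20, 30)  # hypothetical set of values of interest
counts <- sapply(vals, function(v) rowSums(df == v))
colnames(counts) <- vals
```

Unlike the list of tables, counts always has one column per requested value, with 0 where a value does not occur in a row.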

Counting number of rows if certain conditions are met

Try this:

library(dplyr)

df_count <- df %>%
  summarise(con1 = sum(B < 0 & C < 0),
            con2 = sum(B > 0 & C > 0),
            con3 = sum(B < 0 & C > 0),
            con4 = sum(B > 0 & C < 0))

df_count
#   con1 con2 con3 con4
# 1    2    2    0    2
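
The input df was not shown in the answer. As a hedged illustration, the sketch below uses a made-up data frame with columns B and C (values chosen so that the counts happen to match the output above) and computes the same four counts in base R. na.rm = TRUE is an addition, so that missing values are skipped rather than turning a count into NA.

```r
# Hypothetical data; the real df from the question is not available
df <- data.frame(B = c(-1, -2, 3, 4, 5, 6),
                 C = c(-5, -1, 2, 7, -3, -4))

# Each condition yields a logical vector; sum() counts its TRUE values
df_count <- with(df, data.frame(
  con1 = sum(B < 0 & C < 0, na.rm = TRUE),
  con2 = sum(B > 0 & C > 0, na.rm = TRUE),
  con3 = sum(B < 0 & C > 0, na.rm = TRUE),
  con4 = sum(B > 0 & C < 0, na.rm = TRUE)
))
```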

Count the number of columns for each row by condition on character and missing

You could use rowSums to count the number of NA or empty values in each row, and then subtract that from the number of columns in the data frame.

test$num <- ncol(test) - rowSums(is.na(test) | test == "")
test
# a b c d num
#1 aa aa aa 3
#2 bb <NA> bb 2
#3 cc aa <NA> 2
#4 dd <NA> <NA> 1
#5 cc cc 2
#6 <NA> dd dd dd 3
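
A self-contained sketch of the same idea, using a small made-up data frame (not the asker's data). The name value_cols is introduced here for illustration; restricting the check to named columns means that re-running the code after num has been added does not count num itself.

```r
# Hypothetical data with both NA and empty strings
test <- data.frame(a = c("aa", "bb", NA, "dd"),
                   b = c("aa", NA, "", NA),
                   c = c("", "bb", "cc", NA),
                   stringsAsFactors = FALSE)

value_cols <- c("a", "b", "c")  # columns to inspect (assumed)

# TRUE where a cell is NA or an empty string
missing_or_empty <- is.na(test[value_cols]) | test[value_cols] == ""

# Filled cells per row = number of inspected columns minus the missing/empty ones
num <- length(value_cols) - rowSums(missing_or_empty)

# Equivalent: count the filled cells directly
num_direct <- rowSums(!missing_or_empty)

test$num <- num
```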

R function that counts rows where conditions are met

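We can expand the vector c(1, 8, 4) so that each 'Task' column is compared with its own target value: indexing with col(df1[i1]) repeats each target down its column, == then gives a logical matrix, and rowSums counts the matches per row.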

i1 <- startsWith(names(df1), 'Task')
df1$COUNT <- rowSums(df1[i1] == c(1, 8, 4)[col(df1[i1])])
df1$COUNT
#[1] 1 1 2 1 3

Or with sweep

rowSums(sweep(df1[i1], 2, c(1, 8, 4), `==`))

Or another option is apply

df1$COUNT <- apply(df1[i1], 1, function(x) sum(x == c(1, 8, 4)))

NOTE: None of the solutions require any external package

data

df1 <- data.frame(Participant = 1:5, Task1 = c(4, 3, 1, 5, 1),
Task2 = c(8, 8, 3, 6, 8), Task3 = c(1, 7, 4, 4, 4))
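
To see why the expansion is needed, note that a plain df1[i1] == c(1, 8, 4) would recycle the vector down the columns rather than across them. The sketch below, using the data above, builds the expanded target matrix explicitly; the names task_cols, targets and target_mat are introduced here for illustration only.

```r
df1 <- data.frame(Participant = 1:5, Task1 = c(4, 3, 1, 5, 1),
                  Task2 = c(8, 8, 3, 6, 8), Task3 = c(1, 7, 4, 4, 4))

task_cols <- startsWith(names(df1), "Task")
targets <- c(1, 8, 4)  # one target per Task column

# Repeat each target down its own column so the shapes line up cell by cell
target_mat <- matrix(targets, nrow = nrow(df1), ncol = length(targets), byrow = TRUE)

# Cell-wise comparison, then count the matches in each row
count <- rowSums(as.matrix(df1[task_cols]) == target_mat)
```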

Count number of rows that fulfill multiple conditions in R

It depends on what your measure of efficiency is, but this works:

sum(df$Ethnicity== 'Asian' & df$Set == 3)
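
A small sketch with made-up data (the column values are assumptions) showing the same count, plus the effect of missing values: a comparison involving NA yields NA, so sum() needs na.rm = TRUE to return a number.

```r
# Hypothetical data; row 5 has an unknown Ethnicity
df <- data.frame(Ethnicity = c("Asian", "Asian", "White", "Asian", NA),
                 Set = c(3, 1, 3, 3, 3),
                 stringsAsFactors = FALSE)

# Logical vector: TRUE where both conditions hold, NA where that is unknown
matches <- df$Ethnicity == "Asian" & df$Set == 3

n_naive <- sum(matches)                # NA, because of the unknown row
n_rows  <- sum(matches, na.rm = TRUE)  # counts only rows known to match
```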

R: count times per column a condition is met and row names appear in a list

We may do this with rowwise

library(dplyr)
df2 %>%
  rowwise %>%
  mutate(x = +(sum(df1[[rownames]][df1$rownames %in% x]) >= 5),
         y = +(sum(df1[[rownames]][df1$rownames %in% y]) >= 5)) %>%
  ungroup

-output

# A tibble: 3 × 5
  rownames batch totalcount     x     y
  <chr>    <chr>      <int> <int> <int>
1 sample1  a             10     1     0
2 sample2  b             15     1     1
3 sample3  a              6     0     1

Or based on the data, a base R option would be

out <- aggregate(. ~ grp, FUN = sum, 
transform(df1, grp = c('x', 'y')[1 + (rownames %in% y)] )[-1])
df2[out$grp] <- +(t(out[-1]) >= 5)

-output

> df2
  rownames batch totalcount x y
1  sample1     a         10 1 0
2  sample2     b         15 1 1
3  sample3     a          6 0 1

data

df1 <- structure(list(rownames = c("m1", "m2", "m3", "m4"), sample1 = c(0L, 
1L, 6L, 3L), sample2 = c(5L, 7L, 2L, 1L), sample3 = c(1L, 5L,
0L, 0L)), class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(rownames = c("sample1", "sample2", "sample3"),
batch = c("a", "b", "a"), totalcount = c(10L, 15L, 6L)),
class = "data.frame", row.names = c(NA,
-3L))
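
The vectors x and y are not defined in the data above. Assuming x <- c("m1", "m3") and y <- c("m2", "m4") (values chosen only because they reproduce the printed output, not taken from the question), a shorter base R sketch using colSums might look like this. The helper name flag_group is hypothetical.

```r
df1 <- data.frame(rownames = c("m1", "m2", "m3", "m4"),
                  sample1 = c(0L, 1L, 6L, 3L),
                  sample2 = c(5L, 7L, 2L, 1L),
                  sample3 = c(1L, 5L, 0L, 0L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(rownames = c("sample1", "sample2", "sample3"),
                  batch = c("a", "b", "a"),
                  totalcount = c(10L, 15L, 6L),
                  stringsAsFactors = FALSE)

x <- c("m1", "m3")  # assumed group membership
y <- c("m2", "m4")  # assumed group membership

# For each sample column named in df2, sum the df1 rows listed in `ids`
# and flag the samples whose sum is at least 5
flag_group <- function(ids) {
  sums <- colSums(df1[df1$rownames %in% ids, df2$rownames, drop = FALSE])
  unname(as.integer(sums >= 5))
}

df2$x <- flag_group(x)
df2$y <- flag_group(y)
```

Selecting the columns by df2$rownames keeps the flags in the same order as the rows of df2.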

Count occurrence of string values per row in dataframe in R (dplyr)

You can use across with rowSums -

library(dplyr)

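# a lambda is used here because passing extra arguments through across(...) is deprecated as of dplyr 1.1.0
df %>% mutate(d9 = rowSums(across(all_of(cols), ~ .x %in% bcde)))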

#  d1    d2    d3    d4    d5    d6    d7    d8       d9
#  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#1 b     a     a     a     a     a     a     a         0
#2 a     a     a     a     c     a     a     a         1
#3 a     b     a     a     a     a     a     a         1
#4 a     a     c     a     a     b     a     a         2
#5 a     a     a     a     a     a     a     a         0
#6 a     a     b     a     a     a     a     a         1
#7 a     a     a     a     a     d     a     a         1
#8 a     a     a     d     a     a     a     a         1

This can also be written in base R -

df$d9 <- rowSums(sapply(df[cols], `%in%`, bcde))
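
Neither cols nor bcde is defined in the snippet. The sketch below assumes cols is a character vector of column names and bcde a vector of target strings, and runs the base R version end to end on made-up data.

```r
# Hypothetical data
df <- data.frame(d1 = c("b", "a", "a"),
                 d2 = c("a", "c", "b"),
                 d3 = c("a", "a", "d"),
                 stringsAsFactors = FALSE)

cols <- c("d2", "d3")          # assumed: columns to search (d1 deliberately excluded)
bcde <- c("b", "c", "d", "e")  # assumed: values to count

# sapply() builds a logical matrix (rows x selected columns);
# rowSums() then counts the TRUE cells in each row
hits <- sapply(df[cols], `%in%`, bcde)
df$d9 <- rowSums(hits)
```

Because %in% never returns NA, missing cells simply count as non-matches. Note that with a single-row data frame sapply returns a plain vector, so rowSums would fail; this sketch assumes at least two rows.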

Count number of columns by a condition (>) for each row

This will give you the vector you are looking for:

rowSums(data > 30)

It works whether data is a matrix or a data.frame. It also uses vectorized functions, so it is preferable to apply, which is little more than a (slow) for loop.

If data is a data.frame, you can add the result as a column by doing:

data$yr.above <- rowSums(data > 30)

or if data is a matrix:

data <- cbind(data, yr.above = rowSums(data > 30))

You can also create a whole new data.frame:

data.frame(yr.above = rowSums(data > 30))

or a whole new matrix:

cbind(yr.above = rowSums(data > 30))
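
A brief sketch with made-up yearly values (the column names are assumptions) illustrating the call and how a missing value propagates unless na.rm = TRUE is set:

```r
# Hypothetical data: one row per site, one column per year
data <- data.frame(y2001 = c(25, 31, 40),
                   y2002 = c(35, 29, NA),
                   y2003 = c(28, 33, 45))

above       <- rowSums(data > 30)                # row 3 becomes NA
above_no_na <- rowSums(data > 30, na.rm = TRUE)  # NA cells are ignored

# The same call works on a matrix
above_matrix <- rowSums(as.matrix(data) > 30, na.rm = TRUE)
```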

