Counting the Number of Values Greater Than 0 in R in Multiple Columns

We can compare the selected columns with 0 and sum the resulting logical values column by column:

colSums(myDF[c("L2", "L3", "L4")] > 0)
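As a minimal sketch of what this one-liner does, assuming a small made-up data frame (the original myDF is not shown, so the values below are hypothetical): the comparison produces a logical matrix, and colSums counts each TRUE as 1.

```r
# Hypothetical data; the original myDF is not shown in the question
myDF <- data.frame(
  Name = c("a", "b", "c", "d"),
  L2   = c(0, 1, 5, 0),
  L3   = c(2, 0, 0, 0),
  L4   = c(3, 4, 1, 7)
)

# Comparing the selected columns with 0 gives a logical matrix
is_pos <- myDF[c("L2", "L3", "L4")] > 0

# colSums treats TRUE as 1 and FALSE as 0, so this counts positives per column
counts <- colSums(is_pos)
print(counts)
```

Selecting the columns by name keeps the non-numeric Name column out of the comparison.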

Count rows which have value 0 for each column (in R)

Try this solution with base R. One feature of matrices is that they are easy to index, so you can apply functions at the row or column level, as in your question. Here is the code:

#Code
colSums(ma>0)

Output:

colSums(ma>0)
x y
2 4

Some data used:

#Data
ma <- structure(c(0, 0, 2, 0.3, 3, 0.1, 2, 1), .Dim = c(4L, 2L), .Dimnames = list(
c("a", "b", "c", "d"), c("x", "y")))

You can also save the output like this:

#Save
res <- colSums(ma>0)
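The same pattern extends to other conditions and to rows. The sketch below rebuilds the matrix with matrix() instead of structure() (an equivalent construction, chosen here for readability) and counts zeros per column and positives per row as well.

```r
# Same values as the example matrix, built with matrix()
ma <- matrix(
  c(0, 0, 2, 0.3, 3, 0.1, 2, 1),
  nrow = 4,
  dimnames = list(c("a", "b", "c", "d"), c("x", "y"))
)

positive_per_col <- colSums(ma > 0)   # values greater than 0 in each column
zero_per_col     <- colSums(ma == 0)  # values equal to 0 in each column
positive_per_row <- rowSums(ma > 0)   # values greater than 0 in each row

print(positive_per_col)
print(zero_per_col)
print(positive_per_row)
```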

Number of column values greater than 0 for given row?

You just need to use the apply function:

## Example Data
dd = data.frame(col1 = c(0, .3, 0), col2=c(0, 1, 0),
col3=c(0.4, 0, 0.8))
apply(dd, 1, function(i) sum(i > 0))

So, to add this to your existing data frame:

dd$col4 =  apply(dd, 1, function(i) sum(i > 0))

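Alternatively, we could convert the data frame to logical values and then use rowSums. Note that if col4 has already been added, it is counted as well, so run this on the original columns only: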

rowSums(dd > 0)
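A short sketch comparing the two approaches on the example data. The value_cols vector is a name introduced here for illustration; it restricts the count to the original columns so that a previously added col4 does not inflate the result.

```r
dd <- data.frame(col1 = c(0, 0.3, 0),
                 col2 = c(0, 1, 0),
                 col3 = c(0.4, 0, 0.8))

# Illustrative helper: the columns that hold the actual values
value_cols <- c("col1", "col2", "col3")

# Row-wise count with apply
by_apply <- apply(dd[value_cols], 1, function(i) sum(i > 0))

# Same count with rowSums on the logical matrix
by_rowsums <- rowSums(dd[value_cols] > 0)

# Store the result, then recount: limiting to value_cols ignores col4
dd$col4 <- by_rowsums
recount <- rowSums(dd[value_cols] > 0)

print(dd)
```

For large data frames, rowSums is usually the faster of the two because it avoids calling an R function once per row.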

Count number of non-NA values greater than 0 by group

To count per column while ignoring NA values, we can use

colSums(df[c("L2", "L3", "L4")] > 0, na.rm = TRUE)

Or you may want a sum per person:

m <- rowsum((df[c("L2", "L3", "L4")] > 0) + 0, df[["Name"]], na.rm = TRUE)

m
#      L2 L3 L4
# Carl  1  1  2
# Joe   1  2  1

There is something fun here. df[c("L2", "L3", "L4")] > 0 is a logical matrix (with NA):

  • Although colSums can work with it without trouble, rowsum cannot, because rowsum requires numeric input. So a fix is to add 0 to this matrix to cast it to a 0-1 numeric matrix;
  • when adding this 0, we must write (df[c("L2", "L3", "L4")] > 0) + 0, not df[c("L2", "L3", "L4")] > 0 + 0. In R's operator precedence, + binds more tightly than >, so the second form compares against (0 + 0) and the result stays logical. Have a try on this toy example:

    5 > 4 + 0  ## TRUE (logical: evaluated as 5 > (4 + 0))
    (5 > 4) + 0 ## 1 (numeric)

    So we need the parentheses to evaluate > first, then +.
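The effect of the parentheses is easiest to see on a small vector. This sketch uses a made-up vector x and checks the type of each result.

```r
x <- c(-1, 2, NA, 5)

# Parsed as x > (0 + 0): the result is still a logical vector
a <- x > 0 + 0

# Comparison first, then + 0: the logical vector is coerced to numeric 0/1,
# and NA stays NA
b <- (x > 0) + 0

print(class(a))
print(class(b))
print(b)
```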

If you want the result to be a data frame, just cast the resulting matrix into a data frame by:

data.frame(m)
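The original df is not shown, so the sketch below uses hypothetical data chosen to reproduce the per-person counts above. It also keeps the group names as a regular column when converting to a data frame, which is one possible layout rather than the only one.

```r
# Hypothetical data chosen to reproduce the counts shown above;
# the original df is not shown
df <- data.frame(
  Name = c("Carl", "Carl", "Joe", "Joe"),
  L2   = c(1, 0, NA, 2),
  L3   = c(NA, 3, 1, 4),
  L4   = c(2, 1, 0, 5)
)

# Logical matrix (with NA) turned into 0/1 numbers so rowsum accepts it
is_pos <- (df[c("L2", "L3", "L4")] > 0) + 0

# One row per Name; na.rm = TRUE drops the NA comparisons
m <- rowsum(is_pos, df[["Name"]], na.rm = TRUE)

# Optional: a data frame with Name as a column instead of row names
res <- data.frame(Name = rownames(m), m, row.names = NULL)
print(res)
```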

Follow-up

People stopped responding because your specific question about getting a function is less interesting than getting the summary dataset.

Well, if you still want to take my approach, I would define such a function as:

extract <- function (person) {
  m <- rowsum((df[c("L2", "L3", "L4")] > 0) + 0, df[["Name"]], na.rm = TRUE)
  rowSums(m)[[person]]
}

Then you can call

extract("Joe")
# 4
extract("Carl")
# 4

Note that this is obviously not the most efficient way to write such a function: if you only want the sum for one person, there is no need to process all the data. We can do:

extract2 <- function (person) {
  ## subset data
  sub <- subset(df, df$Name == person, select = c("L2", "L3", "L4"))
  ## get sum
  sum(sub > 0, na.rm = TRUE)
}

Then you can call

extract2("Joe")
# 4
extract2("Carl")
# 4
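Both functions above read df from the enclosing environment. A variant that takes the data as an argument is easier to reuse and test. The sketch below is one way to write it; the function name count_positive, its cols parameter, and the data are all hypothetical.

```r
# Hypothetical data consistent with the counts above (original df not shown)
df <- data.frame(
  Name = c("Carl", "Carl", "Joe", "Joe"),
  L2   = c(1, 0, NA, 2),
  L3   = c(NA, 3, 1, 4),
  L4   = c(2, 1, 0, 5)
)

# Count values greater than 0 for one person, ignoring NA
count_positive <- function(data, person, cols = c("L2", "L3", "L4")) {
  # Guard against NA names so they never select a row
  rows <- !is.na(data$Name) & data$Name == person
  sum(data[rows, cols] > 0, na.rm = TRUE)
}

joe_total  <- count_positive(df, "Joe")
carl_total <- count_positive(df, "Carl")
print(c(Joe = joe_total, Carl = carl_total))
```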

Count occurrences of value in multiple columns with duplicates

You could just subset the "to" vector so that entries equal to "from" are dropped before counting:

data.table(table(unlist(toy_data[,c(from,to[to!=from])])))

   V1 N
1:  A 3
2:  B 1
3:  C 2
4:  D 1
5:  E 2
6:  F 1

R count values larger than zero in data frame columns

Here is one way. It keeps only the columns in which more than half of the values are greater than 0:

> mat <- data.frame(A=c(12,10,0,14,0,60),B=c(0,0,0,0,13,65))
>
> keep <- (colSums(mat > 0) / nrow(mat)) > 0.5
> keep
    A     B
 TRUE FALSE
>
> mat[, keep, drop = FALSE]
   A
1 12
2 10
3  0
4 14
5  0
6 60
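If the same filter is needed for different thresholds, it can be wrapped in a small function. This is a sketch: the name keep_mostly_positive and its threshold parameter are made up for illustration. It uses colMeans, which matches colSums(...) / nrow(...) when there are no NA values; with na.rm = TRUE the share is computed over the non-missing entries only.

```r
# Keep the columns whose share of values greater than 0 exceeds threshold
keep_mostly_positive <- function(data, threshold = 0.5) {
  # Mean of a logical matrix column = proportion of TRUE values
  share <- colMeans(data > 0, na.rm = TRUE)
  data[, share > threshold, drop = FALSE]
}

mat <- data.frame(A = c(12, 10, 0, 14, 0, 60),
                  B = c(0, 0, 0, 0, 13, 65))

filtered <- keep_mostly_positive(mat)
print(filtered)
```

drop = FALSE keeps the result a data frame even when only one column survives.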

Counting values greater than 0 in a given area (specific Rows * Columns) - Python, Excel, Pandas

Use a boolean mask and sum it:

N = sum((df['Participant'] == 1) & (df['Condition'] == 1) & (df['RT'].notna()))
print(N)

# Output
1

Details:

m1 = df['Participant'] == 1
m2 = df['Condition'] == 1
m3 = df['RT'].notna()
df[['m1', 'm2', 'm3']] = pd.concat([m1, m2, m3], axis=1)
print(df)

# Output
   Participant  Condition    RT     m1     m2     m3
0            1          1  0.10   True   True   True  # All True, N = 1
1            1          1   NaN   True   True  False
2            1          2  0.48   True  False   True
3            2          1  1.20  False   True   True
4            2          2   NaN  False  False  False
5            2          2  0.58  False  False   True

