Count Number of Columns by a Condition (>) for Each Row

Count the number of columns for each row by condition on character and missing values

You can use rowSums to count the NA or empty values in each row, then subtract that count from the number of columns in the data frame.

test$num <- ncol(test) - rowSums(is.na(test) | test == "")
test
# a b c d num
#1 aa aa aa 3
#2 bb <NA> bb 2
#3 cc aa <NA> 2
#4 dd <NA> <NA> 1
#5 cc cc 2
#6 <NA> dd dd dd 3

Count number of columns by a condition (>) for each row

This will give you the vector you are looking for:

rowSums(data > 30)

It works whether data is a matrix or a data.frame. Because rowSums is vectorized, it is preferable to apply, which is little more than a (slow) for loop over the rows.

If data is a data.frame, you can add the result as a column by doing:

data$yr.above <- rowSums(data > 30)

or if data is a matrix:

data <- cbind(data, yr.above = rowSums(data > 30))

You can also create a whole new data.frame:

data.frame(yr.above = rowSums(data > 30))

or a whole new matrix:

cbind(yr.above = rowSums(data > 30))
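For readers working in pandas rather than R, the same idea can be sketched as follows. The data frame, its column names, and the row labels are made up for illustration; only the threshold of 30 comes from the answer above.

```python
import pandas as pd

# Hypothetical data: one row per site, one column per year (all names are assumptions).
data = pd.DataFrame(
    {"y2001": [25, 40, 31], "y2002": [35, 45, 10], "y2003": [30, 50, 29]},
    index=["site_a", "site_b", "site_c"],
)

# data > 30 is a boolean frame; summing along axis=1 counts the True values per row.
yr_above = (data > 30).sum(axis=1)

# Attach the counts as a new column, mirroring data$yr.above in the R answer.
data["yr_above"] = yr_above
print(data)
```

Note that the comparison is strict, so a value of exactly 30 is not counted.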

Counting number of instances of a condition per row in R

You can use rowSums.

df$no_calls <- rowSums(df == "nc")
df
#  rsID sample1 sample2 sample3 sample1304 no_calls
#1 abcd      aa      bb      nc         nc        2
#2 efgh      nc      nc      nc         nc        4
#3 ijkl      aa      ab      aa         nc        1

Or, as pointed out by MrFlick, to exclude the first column from the row sums, you can slightly modify the approach:

df$no_calls <- rowSums(df[-1] == "nc")

Regarding the row names: They are not counted in rowSums and you can make a simple test to demonstrate it:

rownames(df)[1] <- "nc"  # name first row "nc"
rowSums(df == "nc")      # compute the row sums
#nc  2  3
# 2  4  1                # still the same in first row

How to count number of columns by condition on another column

Using a dplyr approach:

library(dplyr)

data <- as.data.frame(cbind(
  '01-01-2018' = c(1.2, 3.1, 0.7, -0.3, 2.0),
  '02-01-2018' = c(-0.1, 2.4, 4.9, -3.3, -2.7),
  '03-01-2018' = c(3.4, -2.6, -1.8, 0.1, 0.3)
))

data$mm <- apply(data, 1, median)

data %>%
  rowwise() %>%
  mutate(count = sum(c_across(1:3) > mm))

#> # A tibble: 5 × 5
#> # Rowwise:
#>   `01-01-2018` `02-01-2018` `03-01-2018`    mm count
#>          <dbl>        <dbl>        <dbl> <dbl> <int>
#> 1          1.2         -0.1          3.4   1.2     1
#> 2          3.1          2.4         -2.6   2.4     1
#> 3          0.7          4.9         -1.8   0.7     1
#> 4         -0.3         -3.3          0.1  -0.3     1
#> 5          2           -2.7          0.3   0.3     1

Count number of columns with INPUT for each row

With dplyr, you can do this without rowwise (note that in dplyr 1.1.0 and later, cur_data() is deprecated in favour of pick()):

library(dplyr)

test1 %>% mutate(item_count = rowSums(cur_data() != ''))
#> # A tibble: 3 x 6
#>   name1 name2     name3  name4       name5  item_count
#>   <chr> <chr>     <chr>  <chr>       <chr>       <dbl>
#> 1 B565  F226      ""     ""          ""              2
#> 2 W342  DUPLICATE ""     ""          ""              2
#> 3 H452  K632      "L553" "DUPLICATE" "R551"          5

Created on 2021-06-07 by the reprex package (v2.0.0)


For the revised data, where the missing entries are NA instead of empty strings:

test1 %>% mutate(item_count = rowSums(!is.na(cur_data())))

# A tibble: 3 x 6
  name1 name2     name3 name4     name5 item_count
  <chr> <chr>     <chr> <chr>     <chr>      <dbl>
1 B565  F226      NA    NA        NA             2
2 W342  DUPLICATE NA    NA        NA             2
3 H452  K632      L553  DUPLICATE R551           5

SQL - Using COUNT() as a WHERE condition

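You can't use an aggregate (COUNT((NumKids>4)>2)) directly in a WHERE clause; that's what HAVING clauses are for. WHERE filters individual rows before they are grouped, while HAVING filters the groups after aggregation.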

Try the following query

SELECT Animal, COUNT(*) AS Count
FROM Table
WHERE NumKids > 4
GROUP BY Animal
HAVING COUNT(*) >= 2
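To see the WHERE / HAVING split in action, here is a minimal sketch using Python's built-in sqlite3 module. The table name animals, the alias n_rows, and the sample rows are assumptions made up for illustration; only the Animal and NumKids columns and the two conditions come from the query above.

```python
import sqlite3

# In-memory database with a hypothetical animals table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (Animal TEXT, NumKids INTEGER)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?)",
    [
        ("cat", 5), ("cat", 6), ("cat", 2),
        ("dog", 7), ("dog", 1),
        ("fox", 5), ("fox", 8), ("fox", 9),
    ],
)

# WHERE drops rows with NumKids <= 4 before grouping;
# HAVING then keeps only the groups with at least 2 remaining rows.
query = """
    SELECT Animal, COUNT(*) AS n_rows
    FROM animals
    WHERE NumKids > 4
    GROUP BY Animal
    HAVING COUNT(*) >= 2
    ORDER BY Animal
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

In this sample, dog is filtered out by HAVING because only one of its rows survives the WHERE condition.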

Count the number of columns for each row of a pandas DataFrame where a condition holds

Compare all columns except the last against the condition column with DataFrame.eq, then count the True values with sum:

data['new'] = data.iloc[:, :-1].eq(data['condition'], axis=0).sum(axis=1)

Another option is to compare all columns after dropping the condition column:

data['new'] = data.drop('condition', axis=1).eq(data['condition'], axis=0).sum(axis=1)

Thanks to @Sayandip Dutta for the comment: another option is to compare all columns, including condition, and subtract 1, because the condition column always matches itself:

data['new'] = data.eq(data['condition'], axis=0).sum(axis=1).sub(1)

print(data)
   w1  w2  w3  w4  w5  condition  new
0   0   5   0   5   7          5    2
1   1   8   0   1   1          1    3
2   0   0   0   0   0          0    5
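A self-contained sketch that rebuilds the frame from the printed output above and runs the three variants side by side. The variable names by_position, by_name, and minus_one are introduced here for illustration only.

```python
import pandas as pd

# Data reconstructed from the printed output above (without the new column).
data = pd.DataFrame(
    {
        "w1": [0, 1, 0],
        "w2": [5, 8, 0],
        "w3": [0, 0, 0],
        "w4": [5, 1, 0],
        "w5": [7, 1, 0],
        "condition": [5, 1, 0],
    }
)

# axis=0 aligns the condition Series with the row index, so each row is
# compared against its own condition value.
by_position = data.iloc[:, :-1].eq(data["condition"], axis=0).sum(axis=1)
by_name = data.drop("condition", axis=1).eq(data["condition"], axis=0).sum(axis=1)
minus_one = data.eq(data["condition"], axis=0).sum(axis=1).sub(1)

# All three variants should agree on this data.
print(pd.DataFrame({"by_position": by_position, "by_name": by_name, "minus_one": minus_one}))
```

The positional variant relies on condition being the last column; dropping the column by name does not depend on column order.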

How to count number of times a condition is met across a row?

My translation of "If a value is not equal to 1, then count it" (using select_dtypes to consider only numeric columns):

df['sums'] = df.select_dtypes('number').ne(1).sum(axis=1)

print(df)

    name  one  two  three  sums
0  steve  0.4  1.0   1.00     1
1   josh  0.8  0.1   1.00     2
2   mike  0.2  0.1   0.99     3
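For a runnable version, the frame can be rebuilt from the printed output above. This is a sketch under that assumption; the intermediate name numeric is introduced here only to show what select_dtypes keeps.

```python
import pandas as pd

# Data reconstructed from the printed output above (without the sums column).
df = pd.DataFrame(
    {
        "name": ["steve", "josh", "mike"],
        "one": [0.4, 0.8, 0.2],
        "two": [1.0, 0.1, 0.1],
        "three": [1.00, 1.00, 0.99],
    }
)

# Keep only the numeric columns, so the name column is left out of the comparison.
numeric = df.select_dtypes("number")

# ne(1) marks every value that is not equal to 1; sum(axis=1) counts them per row.
df["sums"] = numeric.ne(1).sum(axis=1)
print(df)
```

The comparison is exact, so 0.99 counts as "not equal to 1"; a tolerance-based check would need a different expression.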

Count number of rows in each column in a dataframe that satisfy a specific condition

Your example data:

df <- data.frame(a = 1:3, b = 4:6)
threshold <- c(3, 6)

One option is to use sapply(), which applies a function over a list or vector. Here the vector is the column indices of df, created with 1:ncol(df). Inside the function, summing the logical comparison counts the values in each column that are below that column's threshold, since each TRUE counts as 1:

col_num <- 1:ncol(df)
sapply(col_num, function(x) {sum(df[, x] < threshold[x])})

Or, in a single line:

sapply(1:ncol(df), function(x) {sum(df[, x] < threshold[x])})
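For comparison, a pandas sketch of the same per-column count, using the example data above. Wrapping the thresholds in a Series indexed by column name is an assumption made here so that each threshold lines up with its column.

```python
import pandas as pd

# Same example data as the R snippet above.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# One threshold per column, labelled by column name so pandas can align them.
threshold = pd.Series([3, 6], index=df.columns)

# lt compares each column with its own threshold; sum() then counts the True values per column.
below = df.lt(threshold, axis="columns").sum()
print(below)
```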

