Count the number of columns for each row by condition on character and missing values
You could use rowSums to count the number of NAs or empty values in each row and then subtract it from the number of columns in the data frame.
test$num <- ncol(test) - rowSums(is.na(test) | test == "")
test
# a b c d num
#1 aa aa aa 3
#2 bb <NA> bb 2
#3 cc aa <NA> 2
#4 dd <NA> <NA> 1
#5 cc cc 2
#6 <NA> dd dd dd 3
Count number of columns by a condition (>) for each row
This will give you the vector you are looking for:
rowSums(data > 30)
It will work whether data is a matrix or a data.frame. It also uses vectorized functions, so it is preferable to apply, which is little more than a (slow) for loop.
If data is a data.frame, you can add the result as a column by doing:
data$yr.above <- rowSums(data > 30)
or, if data is a matrix:
data <- cbind(data, yr.above = rowSums(data > 30))
You can also create a whole new data.frame:
data.frame(yr.above = rowSums(data > 30))
or a whole new matrix:
cbind(yr.above = rowSums(data > 30))
Counting number of instances of a condition per row R
You can use rowSums.
df$no_calls <- rowSums(df == "nc")
df
# rsID sample1 sample2 sample3 sample1304 no_calls
#1 abcd aa bb nc nc 2
#2 efgh nc nc nc nc 4
#3 ijkl aa ab aa nc 1
Or, as pointed out by MrFlick, to exclude the first column from the row sums, you can slightly modify the approach to:
df$no_calls <- rowSums(df[-1] == "nc")
Regarding the row names: they are not counted in rowSums, and you can make a simple test to demonstrate it:
rownames(df)[1] <- "nc" # name first row "nc"
rowSums(df == "nc") # compute the row sums
#nc 2 3
# 2 4 1 # still the same in first row
How to count number of columns by condition on another column
Using a dplyr approach:
library(dplyr)
data <- as.data.frame(cbind('01-01-2018' = c(1.2,3.1,0.7,-0.3,2.0), '02-01-2018' = c(-0.1, 2.4, 4.9,-3.3,-2.7), '03-01-2018' = c(3.4, -2.6, -1.8, 0.1, 0.3)))
data$mm <- apply(data,1,median)
data %>%
  rowwise() %>%
  mutate(count = sum(c_across(1:3) > mm))
#> # A tibble: 5 × 5
#> # Rowwise:
#> `01-01-2018` `02-01-2018` `03-01-2018` mm count
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1.2 -0.1 3.4 1.2 1
#> 2 3.1 2.4 -2.6 2.4 1
#> 3 0.7 4.9 -1.8 0.7 1
#> 4 -0.3 -3.3 0.1 -0.3 1
#> 5 2 -2.7 0.3 0.3 1
Count number of columns with INPUT for each row
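With dplyr, without rowwise, you may do something like this (note that from dplyr 1.1.0 onward cur_data() is deprecated in favour of pick(), e.g. pick(everything())):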
library(dplyr)
test1 %>% mutate(item_count = rowSums(cur_data() != ''))
#> # A tibble: 3 x 6
#> name1 name2 name3 name4 name5 item_count
#> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 B565 F226 "" "" "" 2
#> 2 W342 DUPLICATE "" "" "" 2
#> 3 H452 K632 "L553" "DUPLICATE" "R551" 5
Created on 2021-06-07 by the reprex package (v2.0.0)
For the revised data, where empty cells are NA:
test1 %>% mutate(item_count = rowSums(!is.na(cur_data())))
# A tibble: 3 x 6
name1 name2 name3 name4 name5 item_count
<chr> <chr> <chr> <chr> <chr> <dbl>
1 B565 F226 NA NA NA 2
2 W342 DUPLICATE NA NA NA 2
3 H452 K632 L553 DUPLICATE R551 5
SQL - Using COUNT() as a WHERE condition
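You can't use an aggregate (COUNT((NumKids>4)>2)) directly in a WHERE clause; that is what HAVING clauses are for. WHERE filters individual rows before grouping, while HAVING filters the groups after the aggregate has been computed.
Try the following query: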
select
Animal, COUNT(*) AS Count
from Table
where NumKids > 4
group by Animal
having COUNT(*) >= 2
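To try the WHERE versus HAVING split without a database server, here is a minimal sketch that runs the same shape of query through Python's built-in sqlite3 module. The table name animals, the helper animals_with_large_litters, and the sample rows are all made up for illustration; they are not from the question.

```python
import sqlite3


def animals_with_large_litters(rows, min_kids=4, min_rows=2):
    """Return (animal, count) pairs for animals that have at least
    `min_rows` rows whose NumKids is greater than `min_kids`.

    Hypothetical helper: table name and schema are assumptions.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE animals (Animal TEXT, NumKids INTEGER)")
        conn.executemany("INSERT INTO animals VALUES (?, ?)", rows)
        query = """
            SELECT Animal, COUNT(*) AS n
            FROM animals
            WHERE NumKids > ?        -- row-level filter, applied before grouping
            GROUP BY Animal
            HAVING COUNT(*) >= ?     -- group-level filter, applied after aggregation
            ORDER BY Animal
        """
        return conn.execute(query, (min_kids, min_rows)).fetchall()
    finally:
        conn.close()


# Made-up sample data: (Animal, NumKids)
sample_rows = [
    ("cat", 5), ("cat", 6), ("cat", 2),
    ("dog", 7), ("dog", 1),
    ("emu", 5), ("emu", 9), ("emu", 8),
]

result = animals_with_large_litters(sample_rows)
print(result)
```

With this sample, dog drops out because only one of its rows passes the WHERE filter, so its group fails the HAVING test.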
Count the number of columns for each row of a pandas DataFrame where a condition holds
Compare all columns except the last one with the condition column using DataFrame.eq, and count the True values with sum:
data['new'] = data.iloc[:, :-1].eq(data['condition'], axis=0).sum(axis=1)
Another idea is to drop the condition column first and compare the remaining columns:
data['new'] = data.drop('condition', axis=1).eq(data['condition'], axis=0).sum(axis=1)
Thanks to @Sayandip Dutta for the comment: the idea is to compare all columns, including condition itself, and then subtract 1, because the condition column always matches itself:
data['new'] = data.eq(data['condition'], axis=0).sum(axis=1).sub(1)
print (data)
w1 w2 w3 w4 w5 condition new
0 0 5 0 5 7 5 2
1 1 8 0 1 1 1 3
2 0 0 0 0 0 0 5
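For anyone who wants to check that the three variants agree, here is a small self-contained sketch. The frame is rebuilt from the printed output above; the variable names by_position, by_name and minus_one are only for illustration.

```python
import pandas as pd

# Frame rebuilt from the printed output above (without the 'new' column).
data = pd.DataFrame({
    "w1": [0, 1, 0],
    "w2": [5, 8, 0],
    "w3": [0, 0, 0],
    "w4": [5, 1, 0],
    "w5": [7, 1, 0],
    "condition": [5, 1, 0],
})

# 1) Select every column except the last by position.
by_position = data.iloc[:, :-1].eq(data["condition"], axis=0).sum(axis=1)

# 2) Drop the condition column by name, wherever it sits.
by_name = data.drop("condition", axis=1).eq(data["condition"], axis=0).sum(axis=1)

# 3) Compare everything, then remove the self-match of the condition column.
minus_one = data.eq(data["condition"], axis=0).sum(axis=1).sub(1)

print(by_position.tolist(), by_name.tolist(), minus_one.tolist())
```

One caveat, as far as I can tell: the subtract-1 variant assumes condition has no missing values, since NaN never compares equal to itself and that row would end up one too low.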
How to count number of times a condition is met across a row?
My translation of "If a value is not equal to 1, then count it" (using select_dtypes to consider only numeric columns):
df['sums'] = df.select_dtypes('number').ne(1).sum(axis=1)
print(df)
name one two three sums
0 steve 0.4 1.0 1.00 1
1 josh 0.8 0.1 1.00 2
2 mike 0.2 0.1 0.99 3
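A runnable version of the above, rebuilt from the printed frame, plus one thing to watch for. The with_gap frame is a hypothetical variant I added (it is not in the question) to show that a missing value also counts as "not equal to 1" unless you mask it out.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the printed output above (without the 'sums' column).
df = pd.DataFrame({
    "name": ["steve", "josh", "mike"],
    "one": [0.4, 0.8, 0.2],
    "two": [1.0, 0.1, 0.1],
    "three": [1.00, 1.00, 0.99],
})

# Count, per row, the numeric cells that differ from 1.
sums = df.select_dtypes("number").ne(1).sum(axis=1)

# Hypothetical variant: mike's 'two' value is missing.
with_gap = df.copy()
with_gap.loc[2, "two"] = np.nan
numeric = with_gap.select_dtypes("number")

# NaN != 1 evaluates to True, so the missing cell is counted here...
counts_nan_included = numeric.ne(1).sum(axis=1)

# ...and excluded here by also requiring the cell to be non-missing.
counts_nan_excluded = (numeric.ne(1) & numeric.notna()).sum(axis=1)

print(sums.tolist(), counts_nan_included.tolist(), counts_nan_excluded.tolist())
```

Which of the two counts is right depends on whether a missing value should be treated as "not 1" in your data.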
Count number of rows in each column in a dataframe that specify a specific condition
Your example data:
df <- data.frame(a = 1:3, b = 4:6)
threshold <- c(3, 6)
One option to resolve your question is to use sapply(), which applies a function over a list or vector. In this case, I create a vector of column indices for df with 1:ncol(df). Inside the function, you can count the number of values less than the given threshold by summing the TRUE cases:
col_num <- 1:ncol(df)
sapply(col_num, function(x) {sum(df[, x] < threshold[x])})
Or, in a single line:
sapply(1:ncol(df), function(x) {sum(df[, x] < threshold[x])})