Count number of rows matching 1 criteria
You may try
library(dplyr)
data %>%
filter(A == 4, B == 2) %>%
nrow
[1] 1
data2 %>%
filter(C == "hat", D == "sock") %>%
nrow
[1] 1
where the filtered rows are:
data %>%
filter(A == 4, B == 2)
A B
1 4 2
data2 %>%
filter(C == "hat", D == "sock")
C D
1 hat sock
How to use R dplyr's summarize to count the number of rows that match a criteria?
You can use sum on logical vectors: it automatically converts them to numeric values (TRUE becomes 1 and FALSE becomes 0), so you need only do:
test %>%
group_by(location) %>%
summarize(total_score = sum(score),
n_outliers = sum(more_than_300))
#> # A tibble: 2 x 3
#> location total_score n_outliers
#> <chr> <dbl> <int>
#> 1 away 927 2
#> 2 home 552 0
Or, if these are your only 3 columns, an equivalent would be:
test %>%
group_by(location) %>%
summarize(across(everything(), sum))
In fact, you don't need to create the more_than_300 column at all; it would suffice to do:
test %>%
group_by(location) %>%
summarize(total_score = sum(score),
n_outliers = sum(score > 300))
get dataframe row count based on conditions
You are asking for the rows where all of the conditions are true, so the length of the filtered frame is the answer (unless I misunderstand the question):
In [16]: from numpy.random import randn; from pandas import DataFrame
In [17]: df = DataFrame(randn(20,4),columns=list('ABCD'))
In [18]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)]
Out[18]:
A B C D
12 0.491683 0.137766 0.859753 -1.041487
13 0.376200 0.575667 1.534179 1.247358
14 0.428739 1.539973 1.057848 -1.254489
In [19]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)].count()
Out[19]:
A 3
B 3
C 3
D 3
dtype: int64
In [20]: len(df[(df['A']>0) & (df['B']>0) & (df['C']>0)])
Out[20]: 3
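The same count can also be obtained without materialising the filtered frame, because a boolean mask sums to the number of True entries. A minimal sketch, assuming a small hand-made frame (the values below are illustrative and are not the random data shown above):

```python
import pandas as pd

# Illustrative data (hypothetical values, not the random frame above)
df = pd.DataFrame({
    "A": [1.0, -0.5, 0.3, 2.0],
    "B": [0.2, 0.4, -1.0, 0.1],
    "C": [0.7, 0.9, 0.5, 0.6],
})

# Boolean Series: True where every condition holds for that row
mask = (df["A"] > 0) & (df["B"] > 0) & (df["C"] > 0)

n_sum = int(mask.sum())  # True counts as 1, False as 0
n_len = len(df[mask])    # same count via the filtered frame

print(n_sum, n_len)  # 2 2
```

Both forms give the same number; summing the mask simply skips building the intermediate frame.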
Count rows matching a criteria relative to current row
Edit 2019-03-07 to cope with OP's expanded dataset
This can be solved by aggregating in a non-equi self-join:
library(data.table)
# coerce character dates to IDate class
cols <- c("start", "end")
setDT(df)[, (cols) := lapply(.SD, as.IDate), .SDcols = cols]
# non-equi self-join and aggregate
tmp <- df[df, on = .(id, start <= end, end >= start), .N, by = .EACHI]
# append counts to original dataset
df[, overlapping.rows := tmp$N]
df
id start end overlapping.rows
1: 174095 2018-12-19 2018-12-31 2
2: 227156 2018-12-19 2018-12-31 1
3: 210610 2018-04-13 2018-09-27 1
4: 27677 2018-04-12 2018-04-26 2
5: 370474 2017-07-13 2017-08-19 1
6: 303693 2017-02-20 2017-04-09 1
7: 74744 2016-10-03 2016-11-05 1
8: 174095 2018-12-01 2018-12-18 2
9: 27677 2018-03-01 2018-05-29 2
10: 111111 2018-01-01 2018-01-31 1
11: 111111 2018-11-11 2018-12-31 1
12: 174095 2018-11-30 2018-12-25 3
Using data.table chaining the code can be written in a more compact but also more convoluted way:
library(data.table)
cols <- c("start", "end")
setDT(df)[, (cols) := lapply(.SD, as.IDate), .SDcols = cols][
, overlapping.rows := df[df, on = .(id, start <= end, end >= start), .N, by = .EACHI]$N][]
Note that the part that appends the results to the original df is based on Frank's comment.
My original attempt to use a second join to append the results to the original df failed when there are different counts for the same id, as pointed out by the OP. This can be fixed by including the row number in the second join:
library(data.table)
# coerce character dates to IDate class
cols <- c("start", "end")
setDT(df)[, (cols) := lapply(.SD, as.IDate), .SDcols = cols]
# append row number
tmp <- df[, rn := .I][
# non-equi self-join and aggregate
df, on = .(id, start <= end, end >= start), .(rn = i.rn, .N), by = .EACHI]
# append counts to original dataset by joining on row number
df[tmp, on = "rn", overlapping.rows := N][, rn := NULL]
df
id start end overlapping.rows
1: 174095 2018-12-19 2018-12-31 2
2: 227156 2018-12-19 2018-12-31 1
3: 210610 2018-04-13 2018-09-27 1
4: 27677 2018-04-12 2018-04-26 2
5: 370474 2017-07-13 2017-08-19 1
6: 303693 2017-02-20 2017-04-09 1
7: 74744 2016-10-03 2016-11-05 1
8: 174095 2018-12-01 2018-12-18 2
9: 27677 2018-03-01 2018-05-29 2
10: 111111 2018-01-01 2018-01-31 1
11: 111111 2018-11-11 2018-12-31 1
12: 174095 2018-11-30 2018-12-25 3
Explanation
The join condition in the non-equi join does the trick. Two intervals [s1, e1] and [s2, e2] do not overlap if the first one ends before the second one starts or the first one starts after the second one has ended,
e1 < s2 OR e2 < s1
Now, if two intervals do intersect/overlap then the opposite of the above must be true. By negating and applying De Morgan's law we get the conditions
s2 <= e1 AND e2 >= s1
which are used in the non-equi join.
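The overlap condition can be checked independently of data.table. The following is a minimal sketch in plain Python, assuming closed intervals and using a subset of the rows from the dataset in this answer; the helper names overlaps and count_overlapping are hypothetical, and the quadratic loop is only for illustration, not a substitute for the non-equi join:

```python
from datetime import date

# (id, start, end) rows: a subset of the dataset used in this answer
rows = [
    (174095, date(2018, 12, 19), date(2018, 12, 31)),
    (174095, date(2018, 12, 1), date(2018, 12, 18)),
    (174095, date(2018, 11, 30), date(2018, 12, 25)),
    (111111, date(2018, 1, 1), date(2018, 1, 31)),
    (111111, date(2018, 11, 11), date(2018, 12, 31)),
]


def overlaps(s1, e1, s2, e2):
    """Closed intervals overlap iff s2 <= e1 AND e2 >= s1."""
    return s2 <= e1 and e2 >= s1


def count_overlapping(rows):
    """For each row, count same-id rows whose interval overlaps it (itself included)."""
    return [
        sum(1 for id2, s2, e2 in rows if id2 == id1 and overlaps(s1, e1, s2, e2))
        for id1, s1, e1 in rows
    ]


counts = count_overlapping(rows)
print(counts)  # [2, 2, 3, 1, 1]
```

The counts agree with the overlapping.rows values shown above for the corresponding rows (1, 8, 12, 10 and 11). Each row counts itself, which is why the minimum is 1.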
Data
OP's expanded dataset as described in OP's EDIT 2019-03-06:
library(data.table)
df <- fread("id start end
174095 2018-12-19 2018-12-31
227156 2018-12-19 2018-12-31
210610 2018-04-13 2018-09-27
27677 2018-04-12 2018-04-26
370474 2017-07-13 2017-08-19
303693 2017-02-20 2017-04-09
74744 2016-10-03 2016-11-05
174095 2018-12-01 2018-12-18
27677 2018-03-01 2018-05-29
111111 2018-01-01 2018-01-31
111111 2018-11-11 2018-12-31
174095 2018-11-30 2018-12-25")
R language count rows of dataframe matching a criteria
With base R:
nrow(population[population$Height > 1.70 & population$Weight > 60, ])
With dplyr:
library(dplyr)
population %>% filter(Height > 1.70 & Weight > 60) %>% nrow()
R function that counts rows where conditions are met
We can use rowSums: expand the vector c(1, 8, 4) so that it lines up with the 'Task' columns, compare element-wise with ==, and take the rowSums of the resulting logical matrix
i1 <- startsWith(names(df1), 'Task')
df1$COUNT <- rowSums(df1[i1] == c(1, 8, 4)[col(df1[i1])])
df1$COUNT
#[1] 1 1 2 1 3
Or with sweep
rowSums(sweep(df1[i1], 2, c(1, 8, 4), `==`))
Or another option is apply
df1$COUNT <- apply(df1[i1], 1, function(x) sum(x == c(1, 8, 4)))
NOTE: None of the solutions require any external package
data
df1 <- data.frame(Participant = 1:5, Task1 = c(4, 3, 1, 5, 1),
Task2 = c(8, 8, 3, 6, 8), Task3 = c(1, 7, 4, 4, 4))
Count Number of Rows in a Dataframe that Match Dynamic Conditions
Here is an approach using dplyr and purrr:
library(dplyr)
library(purrr)
df %>%
group_by(ID) %>%
mutate(xp = map_int(year, function(x) sum(cur_data()$year < x)))
purrr::map_int runs the anonymous function for each element of the year column. dplyr::cur_data() returns the data of the current group as a data frame. Note that cur_data() is deprecated as of dplyr 1.1.0 in favor of pick().