## Count number of records returned by group by

You can do both in one query by applying the OVER clause to a second COUNT:

```sql
select count(*) as RecordsPerGroup,
       count(*) over () as TotalRecords
from temptable
group by column_1, column_2, column_3, column_4
```
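As a quick, runnable illustration of the same pattern, here is a sketch using Python's built-in `sqlite3` (SQLite supports window functions from version 3.25); the table and data are invented for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE temptable (column_1 TEXT, column_2 TEXT);
INSERT INTO temptable VALUES ('a', 'x'), ('a', 'x'), ('a', 'y'), ('b', 'y');
""")

# COUNT(*) is evaluated per group; COUNT(*) OVER () is evaluated over
# the grouped result, so it reports how many groups the query returns.
rows = con.execute("""
    SELECT column_1, column_2,
           COUNT(*) AS RecordsPerGroup,
           COUNT(*) OVER () AS TotalRecords
    FROM temptable
    GROUP BY column_1, column_2
    ORDER BY column_1, column_2
""").fetchall()
print(rows)
# [('a', 'x', 2, 3), ('a', 'y', 1, 3), ('b', 'y', 1, 3)]
```

Note that because the window runs over the grouped result, `TotalRecords` is the number of groups returned, not the number of underlying rows.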

## Count number of rows within each group

Current best practice (tidyverse) is:

```r
require(dplyr)
df1 %>% count(Year, Month)
```

## Count number of rows in a data frame in R based on group

Here's an example that shows how `table(.)` (or, more closely matching your desired output, `data.frame(table(.))`) does what it sounds like you are asking for.

Note also how to share reproducible sample data in a way that others can copy and paste into their session.

Here's the (reproducible) sample data:

```r
mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L),
                       MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
                                      "FEB. 2012", "FEB. 2012",
                                      "MAR. 2012"),
                       VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
                  .Names = c("ID", "MONTH.YEAR", "VALUE"),
                  class = "data.frame", row.names = c(NA, -5L))
mydf
#    ID MONTH.YEAR VALUE
# 1 110  JAN. 2012  1000
# 2 111  JAN. 2012  2000
# 3 121  FEB. 2012  3000
# 4 131  FEB. 2012  4000
# 5 141  MAR. 2012  5000
```

Here's the calculation of the number of rows per group, in two output display formats:

```r
table(mydf$MONTH.YEAR)
#
# FEB. 2012 JAN. 2012 MAR. 2012
#         2         2         1

data.frame(table(mydf$MONTH.YEAR))
#        Var1 Freq
# 1 FEB. 2012    2
# 2 JAN. 2012    2
# 3 MAR. 2012    1
```
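The same per-group tally can be sketched in plain Python with `collections.Counter`, using the `MONTH.YEAR` values from the sample data above:

```python
from collections import Counter

# Count rows per MONTH.YEAR value, mirroring R's table() on mydf.
month_year = ["JAN. 2012", "JAN. 2012", "FEB. 2012", "FEB. 2012", "MAR. 2012"]

counts = Counter(month_year)
print(sorted(counts.items()))
# [('FEB. 2012', 2), ('JAN. 2012', 2), ('MAR. 2012', 1)]
```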

## Count rows within each group when a condition is satisfied in SQL Server

You can do this with two levels of aggregation:

```sql
select id, count(*) as howManyMonths
from (
    select id
    from mytable
    group by id, year(date), month(date)
    having avg(1.0 * isFull) > 0.6
) t
group by id
```

The subquery aggregates by id, year and month, and uses a `having` clause to filter on groups that meet the success rate (`avg()` comes in handy for this). The outer query counts how many months passed the target rate for each id.
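Here is a runnable sketch of the two-level aggregation using Python's built-in `sqlite3`. Since `year()` and `month()` are SQL Server functions, the SQLite version substitutes `strftime`; the sample data is invented so that id 1 hits the 60% target in January but not in February:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE mytable (id INTEGER, date TEXT, isFull INTEGER);
-- id 1, Jan 2020: 3 of 4 full (75% > 60%); Feb 2020: 1 of 4 full (25%)
INSERT INTO mytable VALUES
  (1, '2020-01-05', 1), (1, '2020-01-12', 1),
  (1, '2020-01-19', 1), (1, '2020-01-26', 0),
  (1, '2020-02-02', 0), (1, '2020-02-09', 1),
  (1, '2020-02-16', 0), (1, '2020-02-23', 0);
""")

# Inner query keeps (id, year, month) groups above the 60% rate;
# outer query counts how many such months each id has.
rows = con.execute("""
    SELECT id, COUNT(*) AS howManyMonths
    FROM (
        SELECT id
        FROM mytable
        GROUP BY id, strftime('%Y', date), strftime('%m', date)
        HAVING AVG(1.0 * isFull) > 0.6
    ) t
    GROUP BY id
""").fetchall()
print(rows)  # [(1, 1)] -- only January passes
```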

## How can I count the number of rows within each group using SQL?

Based on your description, you need something like this query:

```sql
SELECT T.SITE, T.DATE, COUNT(*) AS CNTT
FROM YOUR_TABLE AS T
GROUP BY T.SITE, T.DATE
```

## How can I count the number of rows per group in Pandas?

For pandas 0.25+, use named aggregation:

```python
df.groupby(['year_of_award']).agg(number_of_rows=('award', 'count'))
```

For older versions:

```python
df.groupby(['year_of_award']).agg({'award': 'count'}).rename(columns={'award': 'number_of_rows'})
```

## How to count how many rows inside a GROUP BY group meet a certain criterion

I would suggest using `CASE WHEN` (standard ISO SQL syntax), as in this example:

```sql
SELECT a.category,
       SUM(CASE WHEN a.is_interesting = 1 THEN 1 END) AS conditional_count,
       COUNT(*) AS group_count
FROM a
GROUP BY a.category
```

This sums values of 1 and null values (when the condition is false), which comes down to counting the records that meet the condition.

This will, however, return *null* when no records meet the condition. If you want 0 in that case, you can either wrap the `SUM` in `COALESCE`:

```sql
COALESCE(SUM(CASE WHEN a.is_interesting = 1 THEN 1 END), 0)
```

or, shorter, use `COUNT` instead of `SUM`:

```sql
COUNT(CASE WHEN a.is_interesting = 1 THEN 1 END)
```

For `COUNT` it does not matter what value you put in the `THEN` clause, as long as it is not *null*: it counts the instances where the expression is not *null*.

Adding an `ELSE 0` clause also generally makes `SUM` return 0:

```sql
SUM(CASE WHEN a.is_interesting = 1 THEN 1 ELSE 0 END)
```

There is, however, one boundary case where that `SUM` will still return *null*: when there is no `GROUP BY` clause and no records meet the `WHERE` clause. For instance:

```sql
SELECT SUM(CASE WHEN 1 = 1 THEN 1 ELSE 0 END)
FROM a
WHERE 1 = 0
```

will return *null*, while the `COUNT` or `COALESCE` versions will still return 0.
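Both points are easy to verify with a runnable sketch using Python's built-in `sqlite3`; the table and data are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (category TEXT, is_interesting INTEGER);
INSERT INTO a VALUES ('x', 1), ('x', 0), ('y', 0);
""")

# Conditional count per group: COUNT skips the NULLs the CASE produces.
rows = con.execute("""
    SELECT category,
           COUNT(CASE WHEN is_interesting = 1 THEN 1 END) AS conditional_count,
           COUNT(*) AS group_count
    FROM a
    GROUP BY category
    ORDER BY category
""").fetchall()
print(rows)  # [('x', 1, 2), ('y', 0, 1)]

# Boundary case: no GROUP BY and no matching rows -> SUM yields NULL (None)...
sum_result = con.execute(
    "SELECT SUM(CASE WHEN 1 = 1 THEN 1 ELSE 0 END) FROM a WHERE 1 = 0"
).fetchone()[0]

# ...while COUNT still yields 0.
count_result = con.execute(
    "SELECT COUNT(CASE WHEN 1 = 1 THEN 1 END) FROM a WHERE 1 = 0"
).fetchone()[0]
print(sum_result, count_result)  # None 0
```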

## Add a column that counts the number of rows until the first 1, by group in R

```r
df <- data.frame(Group = c(1, 1, 1, 1, 2, 2),
                 var1 = c(1, 0, 0, 1, 1, 1),
                 var2 = c(0, 0, 1, 1, 0, 0),
                 var3 = c(0, 1, 0, 0, 0, 1))
```

This works for any number of variables, as long as the structure is the same as in the example (i.e. Group plus any number of 0/1 variables):

```r
library(dplyr)
library(tidyr)

df %>%
  mutate(rownr = row_number()) %>%
  pivot_longer(-c(Group, rownr)) %>%
  group_by(Group, name) %>%
  mutate(out = cumsum(value != 1 & (cumsum(value) < 1)) + 1,
         out = ifelse(max(out) > n(), 0, max(out))) %>%
  pivot_wider(names_from = name, values_from = c(value, out)) %>%
  select(-rownr)
```

Returns:

```r
#   Group value_var1 value_var2 value_var3 out_var1 out_var2 out_var3
#   <dbl>      <dbl>      <dbl>      <dbl>    <dbl>    <dbl>    <dbl>
# 1     1          1          0          0        1        3        2
# 2     1          0          0          1        1        3        2
# 3     1          0          1          0        1        3        2
# 4     1          1          1          0        1        3        2
# 5     2          1          0          0        1        0        2
# 6     2          1          0          1        1        0        2
```
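The underlying idea can be sketched in plain Python (a hypothetical re-implementation, not the dplyr code itself): for each group and each 0/1 column, report the 1-based position of the first 1 within the group, or 0 if the column never reaches 1 in that group:

```python
data = {
    "Group": [1, 1, 1, 1, 2, 2],
    "var1":  [1, 0, 0, 1, 1, 1],
    "var2":  [0, 0, 1, 1, 0, 0],
    "var3":  [0, 1, 0, 0, 0, 1],
}

def first_one_positions(data):
    """Map (group, column) -> position of the first 1, or 0 if none."""
    groups = sorted(set(data["Group"]))
    cols = [c for c in data if c != "Group"]
    result = {}
    for g in groups:
        idx = [i for i, grp in enumerate(data["Group"]) if grp == g]
        for c in cols:
            vals = [data[c][i] for i in idx]
            result[(g, c)] = vals.index(1) + 1 if 1 in vals else 0
    return result

print(first_one_positions(data))
# {(1, 'var1'): 1, (1, 'var2'): 3, (1, 'var3'): 2,
#  (2, 'var1'): 1, (2, 'var2'): 0, (2, 'var3'): 2}
```

These are the same per-group values as the `out_var*` columns in the dplyr output.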

## How can I count a number of conditional rows within r dplyr mutate?

Here is a `dplyr`-only solution. The trick is to subtract the running count of X (`cumsum(Product == "X")`) from the total count of X (`sum(Product == "X")`) within each `Customer` group:

```r
library(dplyr)

df %>%
  arrange(Customer, Date) %>%
  group_by(Customer) %>%
  mutate(nSubsqX1 = sum(Product == "X") - cumsum(Product == "X"))
```

```r
#    Date       Customer Product nSubsqX1
#    <date>     <chr>    <chr>      <int>
#  1 2020-05-18 A        X              0
#  2 2020-02-10 B        X              5
#  3 2020-02-12 B        Y              5
#  4 2020-03-04 B        Z              5
#  5 2020-03-29 B        X              4
#  6 2020-04-08 B        X              3
#  7 2020-04-30 B        X              2
#  8 2020-05-13 B        X              1
#  9 2020-05-23 B        Y              1
# 10 2020-07-02 B        Y              1
# 11 2020-08-26 B        Y              1
# 12 2020-12-06 B        X              0
# 13 2020-01-31 C        X              3
# 14 2020-09-19 C        X              2
# 15 2020-10-13 C        X              1
# 16 2020-11-11 C        X              0
# 17 2020-12-26 C        Y              0
```
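The subtract-the-running-count trick is easy to verify in plain Python; the `Product` sequence below matches customer B from the output above:

```python
products = ["X", "Y", "Z", "X", "X", "X", "X", "Y", "Y", "Y", "X"]

# For each row, count the X's that occur strictly after it:
# total X's minus the running count of X's up to and including this row.
total_x = sum(p == "X" for p in products)
remaining = []
seen = 0
for p in products:
    seen += (p == "X")
    remaining.append(total_x - seen)

print(remaining)  # [5, 5, 5, 4, 3, 2, 1, 1, 1, 1, 0]
```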

## If the number of rows in a group exceeds X number of observations, randomly sample X number of rows

Here is one way: group by the 'group' column and build a condition inside `slice` that checks whether the number of rows (`n()`) is at least `X`; if so, sample `X` of the row numbers (`row_number()`) without replacement, otherwise return (a shuffle of) all the row numbers:

```r
library(dplyr)

X <- 2
df %>%
  group_by(group) %>%
  slice(if (n() >= X) sample(row_number(), X, replace = FALSE) else
    sample(row_number())) %>%
  ungroup
```

Output:

```r
# A tibble: 5 × 2
#      id group
#   <int> <int>
# 1    10     1
# 2     8     2
# 3     4     2
# 4     1     3
# 5     9     3
```
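The keep-at-most-X-rows-per-group logic can be sketched in plain Python with `random.sample`; the group data here is invented for illustration:

```python
import random

random.seed(42)  # for reproducibility
X = 2
# group id -> row ids in that group
groups = {1: [10], 2: [8, 4, 7], 3: [1, 9, 5, 2]}

# Groups larger than X are randomly downsampled to X rows;
# smaller groups are kept in full.
sampled = {
    g: (random.sample(rows, X) if len(rows) > X else list(rows))
    for g, rows in groups.items()
}
print({g: len(v) for g, v in sampled.items()})  # {1: 1, 2: 2, 3: 2}
```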
