## Count number of records returned by group by

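You can do both in one query by adding a second `COUNT` with an `OVER` clause. Because window functions are evaluated after `GROUP BY`, `COUNT(*) OVER ()` counts the rows that the grouping returns: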

```sql
select
    count(*) RecordsPerGroup,
    COUNT(*) OVER () AS TotalRecords
from temptable
group by column_1, column_2, column_3, column_4
```
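
As a rough check of the query above, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The two-column table and its rows are made up for illustration, and it assumes a SQLite build with window-function support (3.25 or later); other engines may differ in details.

```python
import sqlite3

# Hypothetical sample data: three distinct (column_1, column_2) groups.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temptable (column_1 TEXT, column_2 TEXT)")
conn.executemany(
    "INSERT INTO temptable VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x"), ("b", "x"), ("b", "x")],
)

rows = conn.execute(
    """
    SELECT column_1, column_2,
           COUNT(*)         AS RecordsPerGroup,  -- rows inside each group
           COUNT(*) OVER () AS TotalRecords      -- rows returned by GROUP BY
    FROM temptable
    GROUP BY column_1, column_2
    ORDER BY column_1, column_2
    """
).fetchall()

for row in rows:
    print(row)
# ('a', 'x', 2, 3)
# ('a', 'y', 1, 3)
# ('b', 'x', 3, 3)
```

Every output row carries the same `TotalRecords` value, which is the number of groups rather than the number of rows in the base table.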

## Count number of rows within each group

Current best practice (tidyverse) is:

```r
library(dplyr)

# One row per Year/Month combination, with the row count in column `n`
df1 %>% count(Year, Month)
```

## count number of rows in a data frame in R based on group

Here's an example that shows how `table(.)` (or, more closely matching your desired output, `data.frame(table(.))`) does what it sounds like you are asking for.

Note also how to share reproducible sample data in a way that others can copy and paste into their session.

Here's the (reproducible) sample data:

```r
mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L),
                       MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
                                      "FEB. 2012", "FEB. 2012",
                                      "MAR. 2012"),
                       VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
                  .Names = c("ID", "MONTH.YEAR", "VALUE"),
                  class = "data.frame", row.names = c(NA, -5L))

mydf
#    ID MONTH.YEAR VALUE
# 1 110  JAN. 2012  1000
# 2 111  JAN. 2012  2000
# 3 121  FEB. 2012  3000
# 4 131  FEB. 2012  4000
# 5 141  MAR. 2012  5000
```

Here's the calculation of the number of rows per group, in two output display formats:

```r
table(mydf$MONTH.YEAR)
#
# FEB. 2012 JAN. 2012 MAR. 2012
#         2         2         1

data.frame(table(mydf$MONTH.YEAR))
#        Var1 Freq
# 1 FEB. 2012    2
# 2 JAN. 2012    2
# 3 MAR. 2012    1
```

## Count rows within each group when condition is satisfied SQL Server

You can do this with two levels of aggregation:

```sql
select id, count(*) howManyMonths
from (
    select id
    from mytable
    group by id, year(date), month(date)
    having avg(1.0 * isFull) > 0.6
) t
group by id
```

The subquery aggregates by id, year, and month, and uses a `having` clause to keep only the groups that meet the success rate (`avg()` comes in handy for this). The outer query counts how many months passed the target rate for each id.
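
To see the two levels of aggregation at work, here is a small sketch using Python's built-in `sqlite3` module. SQLite has no `year()`/`month()` functions, so `strftime` stands in for them here; the sample rows are invented, and the column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date TEXT, isFull INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        # id 1, January: 2 of 3 rows full (0.67 > 0.6), so the month qualifies
        (1, "2020-01-05", 1), (1, "2020-01-12", 1), (1, "2020-01-20", 0),
        # id 1, February: 1 of 2 rows full (0.5), so the month is filtered out
        (1, "2020-02-03", 1), (1, "2020-02-17", 0),
        # id 1, March: 1 of 1 rows full, qualifies
        (1, "2020-03-09", 1),
        # id 2, January: 1 of 1 rows full, qualifies
        (2, "2020-01-08", 1),
    ],
)

rows = conn.execute(
    """
    SELECT id, COUNT(*) AS howManyMonths
    FROM (
        SELECT id
        FROM mytable
        -- strftime replaces SQL Server's year(date) and month(date)
        GROUP BY id, strftime('%Y', date), strftime('%m', date)
        HAVING AVG(1.0 * isFull) > 0.6
    ) AS t
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(rows)  # [(1, 2), (2, 1)]
```

Note that an id with no qualifying month does not appear in the result at all, because every one of its groups is removed by the `having` clause before the outer count runs.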

## How can I count the number of rows within each group using SQL?

Based on your description, you need something like this query:

```sql
SELECT T.SITE, T.DATE, COUNT(*) AS CNTT
FROM YOUR_TABLE AS T
GROUP BY T.SITE, T.DATE
```

## How can I count the number of rows per group in Pandas?

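For pandas 0.25+, try named aggregation:

```python
df.groupby(['year_of_award']).agg(number_of_rows=('award', 'count'))
```

On older versions, aggregate and then rename the resulting `award` column:

```python
df.groupby(['year_of_award']).agg({'award': 'count'}).rename(columns={'award': 'number_of_rows'})
```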

## How to count how many rows inside a group by group meets a certain criteria

I would suggest using `CASE WHEN` (standard ISO SQL syntax), as in this example:

```sql
SELECT a.category,
       SUM(CASE WHEN a.is_interesting = 1 THEN 1 END) AS conditional_count,
       COUNT(*) group_count
FROM a
GROUP BY a.category
```

This will sum up values of 1 and null values (when the condition is false), which comes down to actually counting the records that meet the condition.

This will, however, return *null* when no records meet the condition. If you want 0 in that case, you can either wrap the `SUM` like this:

```sql
COALESCE(SUM(CASE WHEN a.is_interesting = 1 THEN 1 END), 0)
```

or, shorter, use `COUNT` instead of `SUM`:

```sql
COUNT(CASE WHEN a.is_interesting = 1 THEN 1 END)
```

For `COUNT`, it does not matter what value you put in the `THEN` clause, as long as it is not *null*: it counts the instances where the expression is not *null*.

Adding an `ELSE 0` clause also generally makes `SUM` return 0:

```sql
SUM(CASE WHEN a.is_interesting = 1 THEN 1 ELSE 0 END)
```

There is, however, one boundary case where that `SUM` will still return *null*: when there is no `GROUP BY` clause and no records meet the `WHERE` clause. For instance:

```sql
SELECT SUM(CASE WHEN 1 = 1 THEN 1 ELSE 0 END)
FROM a
WHERE 1 = 0
```

will return *null*, while the `COUNT` or `COALESCE` versions will still return 0.
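
The differences between these variants are easy to verify. The sketch below puts them side by side in an in-memory SQLite database via Python's `sqlite3` module; the table contents are hypothetical, and SQL `NULL` shows up as Python `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (category TEXT, is_interesting INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    # Category 'x' has two interesting rows, category 'y' has none.
    [("x", 1), ("x", 0), ("x", 1), ("y", 0), ("y", 0)],
)

grouped = conn.execute(
    """
    SELECT category,
           SUM(CASE WHEN is_interesting = 1 THEN 1 END)              AS sum_no_else,
           COALESCE(SUM(CASE WHEN is_interesting = 1 THEN 1 END), 0) AS sum_coalesced,
           COUNT(CASE WHEN is_interesting = 1 THEN 1 END)            AS count_case,
           SUM(CASE WHEN is_interesting = 1 THEN 1 ELSE 0 END)       AS sum_else_zero,
           COUNT(*)                                                  AS group_count
    FROM a
    GROUP BY category
    ORDER BY category
    """
).fetchall()

# Boundary case: no GROUP BY and a WHERE clause that matches nothing.
empty = conn.execute(
    """
    SELECT SUM(CASE WHEN 1 = 1 THEN 1 ELSE 0 END),
           COUNT(CASE WHEN 1 = 1 THEN 1 END)
    FROM a
    WHERE 1 = 0
    """
).fetchone()

print(grouped)  # [('x', 2, 2, 2, 2, 3), ('y', None, 0, 0, 0, 2)]
print(empty)    # (None, 0)
```

Only the plain `SUM` without `ELSE` yields `None` for category `y`, and only `SUM` yields `None` in the empty, ungrouped case.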

## Add a column that count number of rows until the first 1, by group in R

```r
df <- data.frame(Group = c(1, 1, 1, 1, 2, 2),
                 var1 = c(1, 0, 0, 1, 1, 1),
                 var2 = c(0, 0, 1, 1, 0, 0),
                 var3 = c(0, 1, 0, 0, 0, 1))
```

This works for any number of variables, as long as the structure is the same as in the example (i.e. `Group` plus any number of variables that are 0 or 1):

```r
library(dplyr)
library(tidyr)

df %>%
  mutate(rownr = row_number()) %>%                 # keep the original row order
  pivot_longer(-c(Group, rownr)) %>%               # one row per Group/variable/value
  group_by(Group, name) %>%
  mutate(out = cumsum(value != 1 & (cumsum(value) < 1)) + 1,  # position of first 1
         out = ifelse(max(out) > n(), 0, max(out))) %>%       # 0 if the group has no 1
  pivot_wider(names_from = name, values_from = c(value, out)) %>%
  select(-rownr)
```

Returns:

```
  Group value_var1 value_var2 value_var3 out_var1 out_var2 out_var3
  <dbl>      <dbl>      <dbl>      <dbl>    <dbl>    <dbl>    <dbl>
1     1          1          0          0        1        3        2
2     1          0          0          1        1        3        2
3     1          0          1          0        1        3        2
4     1          1          1          0        1        3        2
5     2          1          0          0        1        0        2
6     2          1          0          1        1        0        2
```

## How can I count a number of conditional rows within r dplyr mutate?

Here is a `dplyr`-only solution.

The trick is to subtract the running count of X (`cumsum(Product=="X")`) from the total count of X (`sum(Product=="X")`) within each `Customer` group:

```r
library(dplyr)

df %>%
  arrange(Customer, Date) %>%
  group_by(Customer) %>%
  mutate(nSubsqX1 = sum(Product=="X") - cumsum(Product=="X"))
```

```
   Date       Customer Product nSubsqX1
   <date>     <chr>    <chr>      <int>
 1 2020-05-18 A        X              0
 2 2020-02-10 B        X              5
 3 2020-02-12 B        Y              5
 4 2020-03-04 B        Z              5
 5 2020-03-29 B        X              4
 6 2020-04-08 B        X              3
 7 2020-04-30 B        X              2
 8 2020-05-13 B        X              1
 9 2020-05-23 B        Y              1
10 2020-07-02 B        Y              1
11 2020-08-26 B        Y              1
12 2020-12-06 B        X              0
13 2020-01-31 C        X              3
14 2020-09-19 C        X              2
15 2020-10-13 C        X              1
16 2020-11-11 C        X              0
17 2020-12-26 C        Y              0
```
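
The same "total minus running count" idea can be spelled out step by step. The following is only an illustrative sketch in plain Python, with a handful of made-up rows that are assumed to be sorted by customer and date already; it is not a replacement for the `dplyr` pipeline.

```python
from collections import defaultdict

# Hypothetical rows (customer, date, product), already sorted by customer and date.
rows = [
    ("A", "2020-05-18", "X"),
    ("B", "2020-02-10", "X"),
    ("B", "2020-02-12", "Y"),
    ("B", "2020-03-29", "X"),
    ("C", "2020-01-31", "X"),
    ("C", "2020-12-26", "Y"),
]

# Total number of X purchases per customer: the sum(Product == "X") term.
total_x = defaultdict(int)
for customer, _, product in rows:
    total_x[customer] += product == "X"

# Running number of X purchases per customer: the cumsum(Product == "X") term.
seen_x = defaultdict(int)
subsequent_x = []
for customer, _, product in rows:
    seen_x[customer] += product == "X"
    # X purchases still to come after this row, within the same customer.
    subsequent_x.append(total_x[customer] - seen_x[customer])

print(subsequent_x)  # [0, 1, 1, 0, 0, 0]
```

A row that is itself an X is not counted as "subsequent", because the running count already includes it when the subtraction happens.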

## If the number of rows in a group exceeds X number of observations, randomly sample X number of rows

Here is one way: group by the `group` column and put a condition inside `slice`. If the number of rows in the group (`n()`) is at least `X`, sample `X` of the row indices (`row_number()`); otherwise return every row index, shuffled here with `sample(row_number())`.

```r
library(dplyr)

X <- 2
df %>%
  group_by(group) %>%
  slice(if (n() >= X) sample(row_number(), X, replace = FALSE) else
          sample(row_number())) %>%
  ungroup
```

Output:

```
# A tibble: 5 × 2
     id group
  <int> <int>
1    10     1
2     8     2
3     4     2
4     1     3
5     9     3
```
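
For comparison, the "cap each group at X rows" logic can be sketched in plain Python. The helper name `sample_per_group`, its `seed` parameter, and the sample rows are all hypothetical and chosen only for illustration; the rows are not the data frame from the question.

```python
import random
from collections import defaultdict


def sample_per_group(rows, key, x, seed=None):
    """Return at most x randomly chosen rows from each group."""
    rng = random.Random(seed)

    # Bucket the rows by their group key.
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    sampled = []
    for group_key in sorted(groups):
        members = groups[group_key]
        # Groups with x rows or fewer are kept whole (in shuffled order);
        # larger groups are sampled without replacement.
        k = min(x, len(members))
        sampled.extend(rng.sample(members, k))
    return sampled


# Hypothetical (id, group) rows: group 1 has 1 row, group 2 has 7, group 3 has 2.
rows = [(1, 3), (2, 2), (3, 2), (4, 2), (5, 2),
        (6, 2), (7, 2), (8, 2), (9, 3), (10, 1)]

sampled = sample_per_group(rows, key=lambda r: r[1], x=2, seed=0)
print(len(sampled))  # 5
```

Using `min(x, len(members))` plays the role of the `if`/`else` inside `slice`: small groups pass through untouched in size, and no group ever contributes more than `x` rows.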

