R Count Distinct Elements Based on Two Columns by Group

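Here is a data.table option. Paste the two columns with an explicit separator, so that different pairs cannot collapse into the same string (without one, the pairs 1/11 and 11/1 would both become "111"):

library(data.table)
setDT(data)

data[, N := uniqueN(paste(customer_id, account_id, sep = "_")), by = start_date]
#     customer_id account_id start_date N
#  1:           1         11 2017-01-01 4
#  2:           1         11 2017-01-01 4
#  3:           1         11 2017-01-01 4
#  4:           2         11 2017-01-01 4
#  5:           3         55 2017-01-01 4
#  6:           3         88 2017-01-01 4
#  7:           4         22 2018-02-02 3
#  8:           5         38 2018-02-02 3
#  9:           5         38 2018-02-02 3
# 10:           6         13 2018-02-02 3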

Or, comparing the two columns directly instead of building a string:

data[, N := uniqueN(.SD, by = c("customer_id", "account_id")), by = start_date]
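
To see why the separator matters, here is a minimal sketch on made-up data. The table dt and its values are hypothetical and not taken from the question; they are chosen so that two different pairs paste to the same string when no separator is used.

```r
library(data.table)

# Hypothetical data: (1, 11) and (11, 1) are different pairs,
# but both paste to "111" when no separator is used.
dt <- data.table(
  customer_id = c(1, 11),
  account_id  = c(11, 1),
  start_date  = as.Date("2017-01-01")
)

res <- dt[, .(
  no_sep   = uniqueN(paste0(customer_id, account_id)),            # "111" and "111"
  with_sep = uniqueN(paste(customer_id, account_id, sep = "_")),  # "1_11" and "11_1"
  by_cols  = uniqueN(.SD, by = c("customer_id", "account_id"))    # compares the columns directly
), by = start_date]

res
```

In this sketch no_sep undercounts, while with_sep and by_cols agree. The uniqueN(.SD, by = ...) form avoids the string round trip entirely, which is probably the safer default when the key columns are not guaranteed to be free of the separator character.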

R - Count unique/distinct values in two columns together per group

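You can select the two columns from cur_data() and unlist them into a single vector, then use n_distinct() to count the unique values. Setting na.rm = TRUE keeps NA from being counted as a value. Note that cur_data() is deprecated as of dplyr 1.1.0 in favour of pick().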

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(Count = n_distinct(unlist(select(cur_data(), Party, Party2013)),
                            na.rm = TRUE)) %>%
  ungroup()


#      ID  Wave Party Party2013 Count
#   <int> <int> <chr> <chr>     <int>
# 1     1     1 A     A             2
# 2     1     2 A     NA            2
# 3     1     3 B     NA            2
# 4     1     4 B     NA            2
# 5     2     1 A     C             3
# 6     2     2 B     NA            3
# 7     2     3 B     NA            3
# 8     2     4 B     NA            3

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A",
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))
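
For newer dplyr versions, the same count can be sketched with pick(), which selects columns from the current group. This assumes dplyr 1.1.0 or later; the data is the df from above, repeated so the block runs on its own.

```r
library(dplyr)

df <- data.frame(
  ID        = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  Wave      = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  Party     = c("A", "A", "B", "B", "A", "B", "B", "B"),
  Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA)
)

res <- df %>%
  group_by(ID) %>%
  # pick() returns the selected columns for the current group;
  # unlist() stacks them into one vector before counting
  mutate(Count = n_distinct(unlist(pick(Party, Party2013), use.names = FALSE),
                            na.rm = TRUE)) %>%
  ungroup()

res
```

Dropping the names with use.names = FALSE is optional here; it only keeps the stacked vector plain.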

Count unique values over two columns per group

In summarise(), you can use across() to select multiple columns, unlist them into a single vector, and count the number of unique values per group.

library(dplyr)

df %>%
  group_by(gvkey, Year) %>%
  summarise(n_unique = n_distinct(unlist(across(SICS1:SICS2)))) %>%
  ungroup()

# # A tibble: 4 × 3
#   gvkey  Year n_unique
#   <int> <int>    <int>
# 1  1209  2017        3
# 2  1209  2018        6
# 3  1503  2017        3
# 4  1503  2018        3

Alternatively, stack SICS1 and SICS2 into a single column first, and then count the unique values per group.

df %>%
  tidyr::pivot_longer(SICS1:SICS2) %>%
  group_by(gvkey, Year) %>%
  summarise(n_unique = n_distinct(value)) %>%
  ungroup()
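
The two approaches should give the same counts. Below is a small sketch that checks this on made-up data: the values in df are hypothetical (the question's data is not shown here), pick() is used for the wide version and assumes dplyr 1.1.0 or later, and .groups = "drop" stands in for the trailing ungroup().

```r
library(dplyr)
library(tidyr)

# Hypothetical data with the same column names as in the answer
df <- data.frame(
  gvkey = c(1209, 1209, 1503, 1503),
  Year  = c(2017, 2017, 2018, 2018),
  SICS1 = c("a", "b", "c", "c"),
  SICS2 = c("b", "d", "c", "e")
)

# Wide version: stack the selected columns within each group
wide <- df %>%
  group_by(gvkey, Year) %>%
  summarise(n_unique = n_distinct(unlist(pick(SICS1:SICS2), use.names = FALSE)),
            .groups = "drop")

# Long version: reshape first, then count per group
long <- df %>%
  pivot_longer(SICS1:SICS2) %>%
  group_by(gvkey, Year) %>%
  summarise(n_unique = n_distinct(value), .groups = "drop")

wide$n_unique
long$n_unique
```

The long version reads more naturally when many columns are involved, at the cost of materialising a longer intermediate table.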

Group by and count unique values in several columns in R

Here's an approach using dplyr::across, which is a handy way to calculate across multiple columns:

my_data <- data.frame(
  city = c(rep("A", 3), rep("B", 3)),
  col1 = 1:6,
  col2 = 0,
  col3 = c(1:3, 4, 4, 4),
  col4 = 1:2
)

library(dplyr)
my_data %>%
  group_by(city) %>%
  summarize(across(col1:col4, n_distinct))

# # A tibble: 2 x 5
#   city   col1  col2  col3  col4
# * <chr> <int> <int> <int> <int>
# 1 A         3     1     3     2
# 2 B         3     1     1     2
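
One thing to watch with this pattern: n_distinct() counts NA as a value unless told otherwise. A minimal sketch, using hypothetical data with missing values, passes na.rm = TRUE through a lambda:

```r
library(dplyr)

# Hypothetical data containing NAs
na_data <- data.frame(
  city = c("A", "A", "A", "B", "B", "B"),
  col1 = c(1, NA, 1, 2, 3, NA),
  col2 = c("x", "x", NA, "y", "z", "z")
)

# Default: NA counts as its own distinct value
with_na <- na_data %>%
  group_by(city) %>%
  summarise(across(col1:col2, n_distinct))

# Lambda form: drop NAs before counting
without_na <- na_data %>%
  group_by(city) %>%
  summarise(across(col1:col2, ~ n_distinct(.x, na.rm = TRUE)))

with_na
without_na
```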

Grouping on two columns and counting distinct values using R

You probably have both the plyr and dplyr packages attached. Both export a function named summarise, and whichever package was attached last masks the other. If it is not plyr, then it is probably another package that exports a function named summarise.
Run ?summarise to inspect the available summarise functions on your system.

Make sure you use summarise() from the dplyr package!

library(dplyr)
d %>%
  dplyr::group_by(A, B) %>%
  dplyr::summarise(UNIQUE_COUNT = n_distinct(C)) # <-- dplyr

# # A tibble: 4 x 3
# # Groups:   A [?]
#   A     B     UNIQUE_COUNT
#   <fct> <fct>        <int>
# 1 A     R1               2
# 2 A     R2               1
# 3 B     R1               2
# 4 B     R2               1

d %>%
  dplyr::group_by(A, B) %>%
  plyr::summarise(UNIQUE_COUNT = n_distinct(C)) # <-- plyr

#   UNIQUE_COUNT
# 1            4
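
To check which summarise an unqualified call would actually use, a short sketch with base R tools may help. It assumes only dplyr is attached; with plyr attached afterwards, the results would point at plyr instead.

```r
library(dplyr)

# Attached packages that provide an object called "summarise",
# in search-path order (the first entry wins for unqualified calls)
find("summarise")

# Namespace of the function that an unqualified summarise() resolves to
owner <- environmentName(environment(summarise))
owner  # "dplyr" when nothing masks it
```

Calling conflicts(detail = TRUE) lists every masked name per package, which is useful when the culprit is not plyr.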

R group by | count distinct values grouping by another column

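One way, assuming test_df contains only the columns post_pagename and visit_id (with more columns, use distinct(post_pagename, visit_id) instead, since distinct() on its own compares whole rows):

test_df |>
  distinct() |>
  count(post_pagename)

#   post_pagename     n
#   <fct>         <int>
# 1 A                 3
# 2 B                 2
# 3 C                 1
# 4 D                 1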

Or another

test_df |>
  group_by(post_pagename) |>
  summarise(distinct_visit_ids = n_distinct(visit_id))

# # A tibble: 4 x 2
#   post_pagename distinct_visit_ids
#   <fct>                      <int>
# 1 A                              3
# 2 B                              2
# 3 C                              1
# 4 D                              1

*D has one visit, so it must be counted*
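
The two approaches only agree when the rows differ in nothing but the page and the visit. A small sketch on hypothetical data with an extra column (hit, invented for illustration) shows the difference; the native pipe needs R 4.1 or later.

```r
library(dplyr)

# Hypothetical data: visit 1 on page A appears twice with different hit numbers
test_df <- data.frame(
  post_pagename = c("A", "A", "A", "B"),
  visit_id      = c(1, 1, 2, 3),
  hit           = c(1, 2, 1, 1)
)

# distinct() on all columns keeps both rows of visit 1, so A is overcounted
all_cols <- test_df |>
  distinct() |>
  count(post_pagename)

# Restricting distinct() to the key columns counts each visit once per page
key_cols <- test_df |>
  distinct(post_pagename, visit_id) |>
  count(post_pagename)

# n_distinct() per group is unaffected by extra columns
per_group <- test_df |>
  group_by(post_pagename) |>
  summarise(distinct_visit_ids = n_distinct(visit_id))

all_cols
key_cols
per_group
```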

Count number of unique values in two columns by group

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept = c(101, 101, 101, 102, 102, 103),
                 Emp_Id = c(1, 1, 2, 3, 4, 4))
library(dplyr)

# n_distinct() with several vectors counts distinct combinations
df %>%
  group_by(Webpage) %>%
  summarise(n = n_distinct(Dept, Emp_Id))
#> # A tibble: 2 x 2
#>   Webpage     n
#>     <dbl> <int>
#> 1     111     3
#> 2     222     2

library(data.table)
# paste with a separator so that different pairs cannot produce the same string
setDT(df)[, list(n = uniqueN(paste(Dept, Emp_Id, sep = "_"))), by = Webpage]
#>    Webpage n
#> 1:     111 3
#> 2:     222 2

Created on 2021-03-30 by the reprex package (v1.0.0)
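
For comparison, the same count can be sketched in base R without any packages: split the two columns by Webpage and count the unique rows in each piece. This is an alternative formulation rather than part of the answer above; the data is repeated so the block runs on its own.

```r
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept = c(101, 101, 101, 102, 102, 103),
                 Emp_Id = c(1, 1, 2, 3, 4, 4))

# split() yields one data frame per Webpage; unique() on a data frame
# drops duplicated rows, so nrow() is the number of distinct pairs
n_pairs <- sapply(split(df[c("Dept", "Emp_Id")], df$Webpage),
                  function(g) nrow(unique(g)))

n_pairs
```

The result is a named integer vector keyed by Webpage rather than a data frame.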

Counting unique values based on two columns with repeated rows, R data frame

You can use rle for runs and table for tabulation:

table(rle(df$column2)$values)

# A B
# 2 1

See ?rle and ?table for details.

Or, if you want to take advantage of column1 (which is derived from column2):

table(unique(df)$column2)
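
Since the question's data is not shown here, the following sketch uses a hypothetical df in which column1 is a run id derived from column2. It illustrates that both expressions count runs rather than rows.

```r
# Hypothetical data: three runs (A, B, A), with column1 numbering the runs
df <- data.frame(
  column1 = c(1, 1, 2, 2, 3),
  column2 = c("A", "A", "B", "B", "A")
)

# rle() collapses consecutive repeats; $values holds one entry per run.
# rle() needs an atomic vector, so convert factors with as.character() first.
runs <- table(rle(df$column2)$values)

# unique() drops the repeated (column1, column2) rows, leaving one row per run
runs_unique <- table(unique(df)$column2)

runs
runs_unique
```

The two only agree when column1 changes exactly where a new run starts, which is the assumption built into this example.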

dplyr count unique values in two columns without reshaping long

Since dplyr 1.0.0, an alternative is c_across():

library(dplyr)

d %>%
  group_by(Group) %>%
  mutate(n = n_distinct(c_across(everything())))

# # A tibble: 6 x 4
# # Groups:   Group [3]
#   Group node1 node2     n
#   <chr> <chr> <chr> <int>
# 1 A     a     w         2
# 2 B     b     r         3
# 3 B     b     t         3
# 4 C     c     z         4
# 5 C     c     u         4
# 6 C     c     i         4

Note: everything() in c_across() excludes grouping variables, i.e. Group, so actually n_distinct() takes c(node1, node2) as input. To specify variables, you can also use

  • c_across(node1:node2)
  • c_across(starts_with('node'))
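
As a quick check of the note above, this sketch rebuilds d from the printed output (an assumption, since the original data is not shown) and compares c_across() with an explicit c(node1, node2):

```r
library(dplyr)

# Data reconstructed from the printed result above
d <- data.frame(
  Group = c("A", "B", "B", "C", "C", "C"),
  node1 = c("a", "b", "b", "c", "c", "c"),
  node2 = c("w", "r", "t", "z", "u", "i")
)

res <- d %>%
  group_by(Group) %>%
  mutate(
    n          = n_distinct(c_across(node1:node2)),  # tidyselect form
    n_explicit = n_distinct(c(node1, node2))         # same values, spelled out
  ) %>%
  ungroup()

res
```

Naming the columns explicitly, as here, avoids surprises if further non-grouping columns are added to d later.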
