Add Count of Unique/Distinct Values by Group to the Original Data

Using ave (since you ask for it specifically):

within(df, { count <- ave(type, color, FUN=function(x) length(unique(x)))})

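Make sure that type is a character vector and not a factor: with a factor, the counts are not valid levels and come back as NA. Note also that ave returns a vector of the same type as its input, so count is stored as character here; wrap the call in as.integer() if you need a number.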
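To see the ave approach end to end, here is a minimal sketch on made-up data. The data frame below is hypothetical (only the column names color and type come from the answer), and the as.integer() wrapper is an addition to turn the character result back into numbers.

```r
# Hypothetical example data; only the column names follow the answer above.
df <- data.frame(
  color = c("red", "red", "blue", "blue", "blue"),
  type  = c("a", "b", "a", "a", "a"),
  stringsAsFactors = FALSE  # keep type as character, not factor
)

# ave() returns a vector of the same type as its first argument (character
# here), so convert the per-group counts back to integer.
df$count <- as.integer(
  ave(df$type, df$color, FUN = function(x) length(unique(x)))
)

df$count
```

Every row gets the number of distinct type values in its own color group, which is what the within() call above produces as well.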


Since you also say your data is huge and that speed/performance may therefore be a factor, I'd suggest a data.table solution as well.

require(data.table)
setDT(df)[, count := uniqueN(type), by = color] # v1.9.6+
# if you don't want df to be modified by reference
ans = as.data.table(df)[, count := uniqueN(type), by = color]

uniqueN was implemented in v1.9.6 and is a faster equivalent of length(unique(.)). It also works on data.frames/data.tables.


Other solutions:

Using plyr:

require(plyr)
ddply(df, .(color), mutate, count = length(unique(type)))

Using aggregate:

agg <- aggregate(type ~ color, data = df, FUN = function(x) length(unique(x)))
names(agg)[2] <- "count"  # rename so merge() does not produce type.x / type.y
merge(df, agg, by = "color", all = TRUE)

Counting unique / distinct values by group in a data frame

This should do the trick:

ddply(myvec,~name,summarise,number_of_distinct_orders=length(unique(order_no)))

This requires the plyr package.

R - Count unique/distinct values in two columns together per group

You can subset the data from cur_data() and unlist it to get a vector, then use n_distinct to count the number of unique values. (In dplyr 1.1.0 and later, cur_data() is deprecated in favour of pick().)

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(Count = n_distinct(unlist(select(cur_data(),
    Party, Party2013)), na.rm = TRUE)) %>%
  ungroup()


#     ID  Wave Party Party2013 Count
#  <int> <int> <chr> <chr>     <int>
#1     1     1 A     A             2
#2     1     2 A     NA            2
#3     1     3 B     NA            2
#4     1     4 B     NA            2
#5     2     1 A     C             3
#6     2     2 B     NA            3
#7     2     3 B     NA            3
#8     2     4 B     NA            3

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A",
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))
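
As a cross-check that does not depend on dplyr, the same count can be sketched in base R. This is a swapped-in technique (split plus sapply), not the answer's method; the helper name per_id is made up for illustration.

```r
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A",
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))

# Pool both columns within each ID, drop NA, and count the distinct values.
per_id <- sapply(
  split(df[c("Party", "Party2013")], df$ID),
  function(g) length(unique(na.omit(unlist(g))))
)

# Map the per-group counts back onto the rows (per_id is named by ID).
df$Count <- unname(per_id[as.character(df$ID)])

df
```

With the data above this gives 2 for every row of ID 1 and 3 for every row of ID 2, matching the dplyr output.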

How to count the number of unique values by group?

I think you've got it all wrong here. There is no need for either plyr or <- when using data.table.

Recent versions of data.table, v >= 1.9.6, have a new function uniqueN() just for that.

library(data.table) ## >= v1.9.6
setDT(d)[, .(count = uniqueN(color)), by = ID]
# ID count
# 1: A 3
# 2: B 2

If you want to create a new column with the counts, use the := operator

setDT(d)[, count := uniqueN(color), by = ID]

Or with dplyr use the n_distinct function

library(dplyr)
d %>%
  group_by(ID) %>%
  summarise(count = n_distinct(color))
# Source: local data table [2 x 2]
#
# ID count
# 1 A 3
# 2 B 2

Or (if you want a new column) use mutate instead of summarise

d %>%
  group_by(ID) %>%
  mutate(count = n_distinct(color))

R: Add count for unique values within Group, disregarding other variables within dataframe

We need a grouped n_distinct

library(dplyr)
df %>%
  group_by(group) %>%
  mutate(count = n_distinct(state)) %>%
  ungroup()

R group_by and count distinct values in dataframe column with condition, using mutate

Since c is unique, you can approach it from the other way - count the number of c values that show up in val.

df %>%
  group_by(id) %>%
  mutate(distinctValues = sum(c %in% val))
# # A tibble: 14 x 3
# # Groups: id [6]
# id val distinctValues
# <dbl> <dbl> <int>
# 1 1 100 0
# 2 1 100 0
# 3 2 200 1
# 4 2 300 1
# 5 3 400 0
# 6 4 500 1
# 7 4 500 1
# 8 5 500 1
# 9 5 600 1
# 10 5 600 1
# 11 6 200 2
# 12 6 200 2
# 13 6 300 2
# 14 6 500 2

You could also use distinctValues = sum(unique(val) %in% c) if that seems clearer - it might be a tad less efficient, but not enough to matter unless your data is massive.
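
The second variant can be tried without dplyr using ave. In this rough sketch the data frame is reconstructed from the printed output above, while the lookup vector is an assumption: the original c is not shown here, and the values below were chosen only because they reproduce the printed counts. It is named lookup rather than c to avoid masking base R's c().

```r
# Data reconstructed from the printed output above.
df <- data.frame(
  id  = c(1, 1, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6),
  val = c(100, 100, 200, 300, 400, 500, 500, 500, 600, 600, 200, 200, 300, 500)
)

# Assumption: stand-in for the vector called c in the answer; these values
# are a guess that happens to reproduce the printed counts.
lookup <- c(200, 500)

# For each id, count how many distinct val entries appear in lookup.
df$distinctValues <- ave(
  df$val, df$id,
  FUN = function(v) sum(unique(v) %in% lookup)
)

df
```

Because val is numeric, ave returns a numeric column here, so no type conversion is needed.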

Group by and count unique values in several columns in R

Here's an approach using dplyr::across, which is a handy way to calculate across multiple columns:

my_data <- data.frame(
  city = c(rep("A", 3), rep("B", 3)),
  col1 = 1:6,
  col2 = 0,
  col3 = c(1:3, 4, 4, 4),
  col4 = 1:2
)

library(dplyr)
my_data %>%
  group_by(city) %>%
  summarize(across(col1:col4, n_distinct))

# A tibble: 2 x 5
#   city   col1  col2  col3  col4
# * <chr> <int> <int> <int> <int>
# 1 A         3     1     3     2
# 2 B         3     1     1     2
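
If dplyr is not available, a similar table can be sketched with base R's aggregate. This is an alternative technique, not the across approach itself. One caveat is that the formula interface drops rows containing NA by default, so results can differ from n_distinct on data with missing values.

```r
my_data <- data.frame(
  city = c(rep("A", 3), rep("B", 3)),
  col1 = 1:6,
  col2 = 0,
  col3 = c(1:3, 4, 4, 4),
  col4 = 1:2
)

# "." stands for every column other than city; FUN is applied to each of
# those columns separately within each city.
res <- aggregate(. ~ city, data = my_data, FUN = function(x) length(unique(x)))

res
```

On this data the result has one row per city with the same counts as the tibble above.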

New data frame with unique values and counts

The expected output is not clear, so here are some possible interpretations:

  1. Sum of 'N' by 'date'
library(data.table)
dt[, .(N = sum(N, na.rm = TRUE)), by = date]

  2. Count of unique 'article_id' for each date
dt1[, .(N = uniqueN(article_id)), by = date]

  3. Get the first count by 'date'
dt1[, .(N = first(N)), by = date]

R group by | count distinct values grouping by another column

One way

test_df |>
  distinct() |>
  count(post_pagename)

# post_pagename n
# <fct> <int>
# 1 A 3
# 2 B 2
# 3 C 1
# 4 D 1

Or another

test_df |>
  group_by(post_pagename) |>
  summarise(distinct_visit_ids = n_distinct(visit_id))

# A tibble: 4 x 2
# post_pagename distinct_visit_ids
# <fct> <int>
#1 A 3
#2 B 2
#3 C 1
#4 D 1

*D has one visit, so it must be counted*
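
Note that the first variant counts distinct rows, so it matches the second only when distinctness is judged on post_pagename and visit_id alone. A base R sketch of that idea follows; the data frame is hypothetical, built to reproduce the counts printed above, and the name page_visit is made up for illustration.

```r
# Hypothetical data: page A has a repeated visit_id, D has a single visit.
test_df <- data.frame(
  post_pagename = c("A", "A", "A", "A", "B", "B", "C", "D"),
  visit_id      = c(1, 1, 2, 3, 1, 2, 1, 4)
)

# Keep one row per (post_pagename, visit_id) pair, then count rows per page.
page_visit <- unique(test_df[c("post_pagename", "visit_id")])
counts <- table(page_visit$post_pagename)

counts
```

Restricting unique() to the two relevant columns keeps the count correct even if test_df carries other columns that vary within a visit.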

