R: Count Unique Values by Category

Add count of unique / distinct values by group to the original data

Using ave (since you asked for it specifically):

within(df, {
  count <- ave(type, color, FUN = function(x) length(unique(x)))
})

Make sure that type is a character vector and not a factor.
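A minimal sketch of the above, assuming a hypothetical df with color and type columns (the question's actual data isn't shown):

```r
# hypothetical example data -- the question's actual df is not shown
df <- data.frame(color = c("red", "red", "blue", "blue"),
                 type  = c("a", "b", "a", "a"))

df$type <- as.character(df$type)  # convert first, in case type is a factor

df <- within(df, {
  count <- ave(type, color, FUN = function(x) length(unique(x)))
})
# note: ave() returns a result of the same mode as its input,
# so count is a character column here: "2", "2", "1", "1"
df
```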


Since you also say your data is huge and that speed/performance may therefore be a factor, I'd suggest a data.table solution as well.

require(data.table)
setDT(df)[, count := uniqueN(type), by = color] # v1.9.6+
# if you don't want df to be modified by reference
ans = as.data.table(df)[, count := uniqueN(type), by = color]

uniqueN was implemented in v1.9.6 and is a faster equivalent of length(unique(.)). In addition, it also works on data.frames/data.tables, where it counts unique rows.
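A quick sketch of that behaviour, using made-up column names for illustration:

```r
library(data.table)

df <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))

uniqueN(df$a)  # 2 distinct values in a single column
uniqueN(df)    # 2 distinct rows: (1, "x") and (2, "y")
```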


Other solutions:

Using plyr:

require(plyr)
ddply(df, .(color), mutate, count = length(unique(type)))

Using aggregate:

agg <- aggregate(data=df, type ~ color, function(x) length(unique(x)))
merge(df, agg, by="color", all=TRUE)

Count number of occurrences for each unique value

Perhaps table is what you are after?

dummyData <- rep(c(1, 2, 2, 2), 25)

table(dummyData)
# dummyData
# 1 2
# 25 75

## or another presentation of the same data
as.data.frame(table(dummyData))
# dummyData Freq
# 1 1 25
# 2 2 75
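table() also accepts two vectors, which gives occurrence counts per group as a cross-tabulation. A sketch with made-up color/type vectors:

```r
# made-up example vectors
color <- c("red", "red", "blue", "blue", "blue")
type  <- c("a", "b", "a", "a", "b")

# rows are colors, columns are types; cells count occurrences
table(color, type)
```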

How to count the number of unique values by group?

I think you've got it mixed up here. There is no need for either plyr or <- when using data.table.

Recent versions of data.table, v >= 1.9.6, have a new function uniqueN() just for that.

library(data.table) ## >= v1.9.6
setDT(d)[, .(count = uniqueN(color)), by = ID]
# ID count
# 1: A 3
# 2: B 2

If you want to create a new column with the counts, use the := operator

setDT(d)[, count := uniqueN(color), by = ID]

Or with dplyr use the n_distinct function

library(dplyr)
d %>%
  group_by(ID) %>%
  summarise(count = n_distinct(color))
# Source: local data table [2 x 2]
#
# ID count
# 1 A 3
# 2 B 2

Or (if you want a new column) use mutate instead of summarise

d %>%
  group_by(ID) %>%
  mutate(count = n_distinct(color))

Counting unique / distinct values by group in a data frame

A data.table approach

library(data.table)
DT <- data.table(myvec)

DT[, .(number_of_distinct_orders = length(unique(order_no))), by = name]

data.table v >= 1.9.5 now has a built-in uniqueN function

DT[, .(number_of_distinct_orders = uniqueN(order_no)), by = name]

Count unique values over two columns per group

In summarise(), you can use across() to select multiple columns, unlist them into a single vector, and count the number of unique values per group.

library(dplyr)

df %>%
  group_by(gvkey, Year) %>%
  summarise(n_unique = n_distinct(unlist(across(SICS1:SICS2)))) %>%
  ungroup()

# # A tibble: 4 × 3
# gvkey Year n_unique
# <int> <int> <int>
# 1 1209 2017 3
# 2 1209 2018 6
# 3 1503 2017 3
# 4 1503 2018 3

Another way is to stack SICS1 and SICS2 together first, and then count the number of unique values.

df %>%
  tidyr::pivot_longer(SICS1:SICS2) %>%
  group_by(gvkey, Year) %>%
  summarise(n_unique = n_distinct(value)) %>%
  ungroup()

R - Count unique/distinct values in two columns together per group

You can subset the data from cur_data() and unlist it to get a vector. Use n_distinct to count the number of unique values.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(Count = n_distinct(unlist(select(cur_data(), Party, Party2013)),
                            na.rm = TRUE)) %>%
  ungroup()

# ID Wave Party Party2013 Count
# <int> <int> <chr> <chr> <int>
#1 1 1 A A 2
#2 1 2 A NA 2
#3 1 3 B NA 2
#4 1 4 B NA 2
#5 2 1 A C 3
#6 2 2 B NA 3
#7 2 3 B NA 3
#8 2 4 B NA 3

data

It is easier to help if you provide data in a reproducible format:

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A",
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))

Count unique values by group in R

We can use uniqueN from data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'group' and 'timepoint', get the length of unique elements of 'SID' (uniqueN(SID)).

library(data.table)
setDT(df1)[, .(UnSID = uniqueN(SID)), .(group, timepoint)]

How to count unique values per subject ID in R

We need to remove the data$ prefix, as it extracts the full column and ignores the grouping

library(dplyr)
BehVari <- individData %>%
  group_by(SubID) %>%
  summarise(count = n_distinct(Rating.1))
BehVari

How to count unique values in a column in R

We can use n_distinct() from dplyr to count the number of unique values for a column in a data frame.

textFile <- "id var1
111 A
109 A
112 A
111 A
108 A"

df <- read.table(text = textFile,header = TRUE)
library(dplyr)
df %>% summarise(count = n_distinct(id))

...and the output:

> df %>% summarise(count = n_distinct(id))
count
1 4

We can also summarise the counts within one or more group_by() columns.

textFile <- "id var1
111 A
109 A
112 A
111 A
108 A
201 B
202 B
202 B
111 B
112 B
109 B"

df <- read.table(text = textFile,header = TRUE)
df %>% group_by(var1) %>% summarise(count = n_distinct(id))

...and the output:

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
var1 count
<chr> <int>
1 A 4
2 B 5

Group by and count unique values in several columns in R

Here's an approach using dplyr::across, which is a handy way to calculate across multiple columns:

my_data <- data.frame(
  city = c(rep("A", 3), rep("B", 3)),
  col1 = 1:6,
  col2 = 0,
  col3 = c(1:3, 4, 4, 4),
  col4 = 1:2
)

library(dplyr)
my_data %>%
group_by(city) %>%
summarize(across(col1:col4, n_distinct))

# A tibble: 2 x 5
city col1 col2 col3 col4
* <chr> <int> <int> <int> <int>
1 A 3 1 3 2
2 B 3 1 1 2

