Add count of unique / distinct values by group to the original data
Using ave (since you ask for it specifically):
within(df, { count <- ave(type, color, FUN=function(x) length(unique(x)))})
Make sure that type is a character vector and not a factor.
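A minimal sketch of the ave approach on a small, made-up data frame (the df below is hypothetical, not the asker's data). Because ave writes the results back into a vector of the same type as its input, a character type column yields counts as character strings, so the sketch converts them with as.integer:

```r
# Hypothetical example data; 'type' is deliberately character, not factor
df <- data.frame(
  color = c("red", "red", "blue", "blue", "blue"),
  type  = c("a", "b", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# ave() returns a vector of the same type as its first argument,
# so the per-group counts come back as character and are converted here
df$count <- as.integer(
  ave(df$type, df$color, FUN = function(x) length(unique(x)))
)
```

With a factor column the integer counts would be assigned into a factor, which is why the character requirement matters.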
Since you also say your data is huge and that speed/performance may therefore be a factor, I'd suggest a data.table solution as well.
require(data.table)
setDT(df)[, count := uniqueN(type), by = color] # v1.9.6+
# if you don't want df to be modified by reference
ans = as.data.table(df)[, count := uniqueN(type), by = color]
uniqueN was implemented in v1.9.6 and is a faster equivalent of length(unique(.)). In addition, it also works with data.frames/data.tables.
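As a quick illustration of both points, here is a small sketch with made-up inputs (x and df_small are hypothetical names). It assumes data.table v1.9.6 or later is installed:

```r
library(data.table)  # assumes v1.9.6+

# On an atomic vector, uniqueN() matches length(unique())
x <- c("a", "b", "a", "c", "b")
n_vec <- uniqueN(x)

# On a data.frame, uniqueN() counts distinct rows
df_small <- data.frame(color = c("red", "red", "blue"),
                       type  = c("a", "a", "b"))
n_rows <- uniqueN(df_small)
```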
Other solutions:
Using plyr:
require(plyr)
ddply(df, .(color), mutate, count = length(unique(type)))
Using aggregate:
agg <- aggregate(data=df, type ~ color, function(x) length(unique(x)))
merge(df, agg, by="color", all=TRUE)
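Note that both df and agg contain a column named type, so merge will suffix them as type.x and type.y. A possible variation, sketched on hypothetical data, renames the aggregated column to count before merging:

```r
# Hypothetical example data
df <- data.frame(
  color = c("red", "red", "blue", "blue", "blue"),
  type  = c("a", "b", "a", "b", "c")
)

# Distinct count of 'type' per 'color'
agg <- aggregate(type ~ color, data = df,
                 FUN = function(x) length(unique(x)))

# Rename so the merged result has 'count' instead of type.x / type.y
names(agg)[names(agg) == "type"] <- "count"

res <- merge(df, agg, by = "color", all = TRUE)
```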
Count number of occurrences for each unique value
Perhaps table is what you are after?
dummyData = rep(c(1,2, 2, 2), 25)
table(dummyData)
# dummyData
# 1 2
# 25 75
## or another presentation of the same data
as.data.frame(table(dummyData))
# dummyData Freq
# 1 1 25
# 2 2 75
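If what is needed is the number of distinct values rather than how often each occurs, the same table can be reused. A small sketch (note that table drops NA values by default, so this only agrees with length(unique(.)) when there are no NAs):

```r
dummyData <- rep(c(1, 2, 2, 2), 25)
counts <- table(dummyData)

# One table entry per distinct value
n_distinct_values <- length(counts)

# The occurrence counts as a plain vector, named by value
occurrences <- setNames(as.vector(counts), names(counts))
```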
How to count the number of unique values by group?
I think you've got it all wrong here. There is no need for either plyr or <- when using data.table.
Recent versions of data.table (v >= 1.9.6) have a new function, uniqueN(), just for that.
library(data.table) ## >= v1.9.6
setDT(d)[, .(count = uniqueN(color)), by = ID]
# ID count
# 1: A 3
# 2: B 2
If you want to create a new column with the counts, use the := operator:
setDT(d)[, count := uniqueN(color), by = ID]
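To make the difference between the two forms concrete, here is a sketch on a hypothetical d (made up so that A has three colours and B has two). It uses as.data.table to work on a copy rather than converting d by reference:

```r
library(data.table)  # assumes v1.9.6+

# Hypothetical example data
d <- data.frame(
  ID    = c("A", "A", "A", "B", "B", "B"),
  color = c("red", "green", "blue", "red", "red", "green")
)

dt <- as.data.table(d)  # copy; 'd' itself is left untouched

# Summarising form: one row per ID
summary_dt <- dt[, .(count = uniqueN(color)), by = ID]

# := form: adds 'count' to every row of dt, by reference
dt[, count := uniqueN(color), by = ID]
```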
Or with dplyr, use the n_distinct function:
library(dplyr)
d %>%
group_by(ID) %>%
summarise(count = n_distinct(color))
# Source: local data table [2 x 2]
#
# ID count
# 1 A 3
# 2 B 2
Or (if you want a new column) use mutate instead of summarise:
d %>%
group_by(ID) %>%
mutate(count = n_distinct(color))
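The practical difference is the shape of the result: summarise collapses each group to one row, while mutate keeps every original row and repeats the group's count. A sketch on a hypothetical d:

```r
library(dplyr)

# Hypothetical example data
d <- data.frame(
  ID    = c("A", "A", "A", "B", "B", "B"),
  color = c("red", "green", "blue", "red", "red", "green")
)

# One row per ID
summarised <- d %>%
  group_by(ID) %>%
  summarise(count = n_distinct(color))

# Same number of rows as d, with the count repeated within each ID
mutated <- d %>%
  group_by(ID) %>%
  mutate(count = n_distinct(color)) %>%
  ungroup()
```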
Counting unique / distinct values by group in a data frame
A data.table approach:
library(data.table)
DT <- data.table(myvec)
DT[, .(number_of_distinct_orders = length(unique(order_no))), by = name]
data.table v >= 1.9.5 has a built-in uniqueN function now:
DT[, .(number_of_distinct_orders = uniqueN(order_no)), by = name]
Count unique values over two columns per group
In summarise(), you could use across() to select multiple columns, unlist them into a vector, and count the number of unique values by group.
library(dplyr)
df %>%
group_by(gvkey, Year) %>%
summarise(n_unique = n_distinct(unlist(across(SICS1:SICS2)))) %>%
ungroup()
# # A tibble: 4 × 3
# gvkey Year n_unique
# <int> <int> <int>
# 1 1209 2017 3
# 2 1209 2018 6
# 3 1503 2017 3
# 4 1503 2018 3
Another way is to stack SICS1 and SICS2 together first, and then count the number of unique values.
df %>%
tidyr::pivot_longer(SICS1:SICS2) %>%
group_by(gvkey, Year) %>%
summarise(n_unique = n_distinct(value)) %>%
ungroup()
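For comparison with the data.table answers elsewhere on this page, the same idea (pool the two columns, then count distinct values per group) could be sketched with .SD and uniqueN. This is a swapped-in technique, not part of the answer above, and the DT below is hypothetical data:

```r
library(data.table)  # assumes v1.9.6+ for uniqueN()

# Hypothetical example data with the same column names as the question
DT <- data.table(
  gvkey = c(1209L, 1209L, 1503L, 1503L),
  Year  = c(2017L, 2017L, 2017L, 2017L),
  SICS1 = c("A", "B", "A", "A"),
  SICS2 = c("B", "C", "A", "D")
)

# .SD holds only the columns named in .SDcols for the current group;
# unlist() pools them into one vector before counting distinct values
res <- DT[, .(n_unique = uniqueN(unlist(.SD))),
          by = .(gvkey, Year),
          .SDcols = c("SICS1", "SICS2")]
```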
R - Count unique/distinct values in two columns together per group
You can subset the data from cur_data() and unlist the data to get a vector. Use n_distinct to count the number of unique values.
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Count = n_distinct(unlist(select(cur_data(),
Party, Party2013)), na.rm = TRUE)) %>%
ungroup
# ID Wave Party Party2013 Count
# <int> <int> <chr> <chr> <int>
#1 1 1 A A 2
#2 1 2 A NA 2
#3 1 3 B NA 2
#4 1 4 B NA 2
#5 2 1 A C 3
#6 2 2 B NA 3
#7 2 3 B NA 3
#8 2 4 B NA 3
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A",
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))
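In dplyr 1.1.0 and later, cur_data() is deprecated in favour of pick(), which selects columns from the current group directly. A sketch of the same computation on the data above, assuming dplyr >= 1.1.0 is installed:

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A",
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))

res <- df %>%
  group_by(ID) %>%
  # pick() returns the selected columns for the current group;
  # unlist() pools them, and na.rm = TRUE keeps NA from counting as a value
  mutate(Count = n_distinct(unlist(pick(Party, Party2013), use.names = FALSE),
                            na.rm = TRUE)) %>%
  ungroup()
```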
Count unique values by group in R
We can use uniqueN from data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), group by 'group' and 'timepoint', and get the number of unique elements of 'SID' (uniqueN(SID)).
library(data.table)
setDT(df1)[, .(UnSID=uniqueN(SID)), .(group, timepoint)]
How to count unique values per subject ID in R
We need to remove the data$ prefix, as it extracts the full column instead of the values for the current group:
library(dplyr)
BehVari<- individData%>%
group_by(SubID)%>%
summarise(count = n_distinct(Rating.1))
BehVari
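To see why the prefix matters, compare the two forms on a small hypothetical individData (the values below are made up). With the $ prefix, every group gets the distinct count of the whole column:

```r
library(dplyr)

# Hypothetical example data
individData <- data.frame(
  SubID    = c(1, 1, 2, 2, 2),
  Rating.1 = c(5, 5, 3, 4, 4)
)

# With the prefix: individData$Rating.1 ignores the grouping
wrong <- individData %>%
  group_by(SubID) %>%
  summarise(count = n_distinct(individData$Rating.1))

# Without the prefix: Rating.1 refers to the current group's values only
right <- individData %>%
  group_by(SubID) %>%
  summarise(count = n_distinct(Rating.1))
```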
How to count unique values in a column in R
We can use n_distinct() from dplyr to count the number of unique values for a column in a data frame.
textFile <- "id var1
111 A
109 A
112 A
111 A
108 A"
df <- read.table(text = textFile,header = TRUE)
library(dplyr)
df %>% summarise(count = n_distinct(id))
...and the output:
> df %>% summarise(count = n_distinct(id))
count
1 4
We can also summarise the counts within one or more group_by() columns.
textFile <- "id var1
111 A
109 A
112 A
111 A
108 A
201 B
202 B
202 B
111 B
112 B
109 B"
df <- read.table(text = textFile,header = TRUE)
df %>% group_by(var1) %>% summarise(count = n_distinct(id))
...and the output:
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
var1 count
<chr> <int>
1 A 4
2 B 5
Group by and count unique values in several columns in R
Here's an approach using dplyr::across, which is a handy way to calculate across multiple columns:
my_data <- data.frame(
city = c(rep("A", 3), rep("B", 3)),
col1 = 1:6,
col2 = 0,
col3 = c(1:3, 4, 4, 4),
col4 = 1:2
)
library(dplyr)
my_data %>%
group_by(city) %>%
summarize(across(col1:col4, n_distinct))
# A tibble: 2 x 5
city col1 col2 col3 col4
* <chr> <int> <int> <int> <int>
1 A 3 1 3 2
2 B 3 1 1 2