Counting Unique Values Across Variables (Columns) in R

Counting unique values across variables (columns) in R

The trick is to use apply() over rows (MARGIN = 1), which passes each row to your function as a single vector (here called x). You can then write a custom function, in this case one that uses unique() and length() to get the count you want.

df <- data.frame('2012' = c(3, 5, 6), '2009' = c(1, 3, 7), '2006' = c(4, 2, 3),
                 '2003' = c(4, 2, 5), '2000' = c(1, 3, 6))

# MARGIN = 1 applies the function to each row
df$nunique <- apply(df, 1, function(x) length(unique(x)))
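
For reference, since apply() works row-wise here, the result on this example data should look roughly like this (note that data.frame() prepends an X to the purely numeric column names by default):

df
#  X2012 X2009 X2006 X2003 X2000 nunique
#1     3     1     4     4     1       3
#2     5     3     2     2     3       3
#3     6     7     3     5     6       4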

Count unique values across columns in R

With tidyverse, first convert the factor columns to character, then use map2 to split 'partners' into individual vectors of strings and count the unique values together with 'names' using n_distinct.

library(tidyverse)

df %>%
  mutate_all(as.character) %>%
  mutate(uniquecounts = map2_dbl(names, partners,
                 ~ n_distinct(c(.x, str_split(.y, ", ")[[1]]))))


#    names                  partners uniquecounts
#1    John  Mary, Ashley, John, Kate            4
#2    Mary Charlie, John, Mary, John            3
#3 Charlie               Kate, Marcy            3
#4   David              Mary, Claire            3

With the same logic in base R:

df[] <- lapply(df, as.character)
as.numeric(mapply(function(x, y) length(unique(c(x, y))),
                  df$names, strsplit(df$partners, ", ")))
#[1] 4 3 3 3
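
data

The input data frame isn't shown in this answer; reconstructed from the output above, it would look something like this (treat it as an approximation of the original data):

df <- data.frame(
  names = c("John", "Mary", "Charlie", "David"),
  partners = c("Mary, Ashley, John, Kate", "Charlie, John, Mary, John",
               "Kate, Marcy", "Mary, Claire"),
  stringsAsFactors = FALSE  # already character, so the as.character step is a no-op here
)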

R - Count unique/distinct values in two columns together per group

You can subset the data with cur_data() and unlist it to get a vector, then use n_distinct to count the number of unique values.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(Count = n_distinct(unlist(select(cur_data(), Party, Party2013)),
                            na.rm = TRUE)) %>%
  ungroup()


#     ID  Wave Party Party2013 Count
#  <int> <int> <chr> <chr>     <int>
#1     1     1 A     A             2
#2     1     2 A     NA            2
#3     1     3 B     NA            2
#4     1     4 B     NA            2
#5     2     1 A     C             3
#6     2     2 B     NA            3
#7     2     3 B     NA            3
#8     2     4 B     NA            3
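
As a side note, cur_data() is superseded by pick() in dplyr 1.1.0 and later, so on a recent dplyr the same idea can be written as (a sketch, assuming the same df):

df %>%
  group_by(ID) %>%
  mutate(Count = n_distinct(unlist(pick(Party, Party2013)), na.rm = TRUE)) %>%
  ungroup()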

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A",
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))

Group by and count unique values in several columns in R

Here's an approach using dplyr::across, which is a handy way to calculate across multiple columns:

my_data <- data.frame(
  city = c(rep("A", 3), rep("B", 3)),
  col1 = 1:6,
  col2 = 0,
  col3 = c(1:3, 4, 4, 4),
  col4 = 1:2
)

library(dplyr)
my_data %>%
  group_by(city) %>%
  summarize(across(col1:col4, n_distinct))

# A tibble: 2 x 5
  city   col1  col2  col3  col4
* <chr> <int> <int> <int> <int>
1 A         3     1     3     2
2 B         3     1     1     2
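
If you don't want to name the columns explicitly, across(everything(), n_distinct) should count every non-grouping column the same way:

my_data %>%
  group_by(city) %>%
  summarize(across(everything(), n_distinct))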

How to count unique values over multiple columns using R?

You could unlist and use table to get the counts in base R:

stack(table(unlist(df)))
#Same as
#stack(table(as.matrix(df)))
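
With the data shown at the bottom of this answer, the base-R version should print something close to:

stack(table(unlist(df)))
#  values               ind
#1      1      home,leisure
#2      3         home,work
#3      1      leisure,work
#4      3         work,home
#5      1 work,home,leisure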

If you prefer tidyverse, get the data in long format using pivot_longer and then count:

df %>%
  tidyr::pivot_longer(cols = everything()) %>%
  dplyr::count(value)

# A tibble: 5 x 2
#  value                 n
#  <chr>             <int>
#1 home,leisure          1
#2 home,work             3
#3 leisure,work          1
#4 work,home             3
#5 work,home,leisure     1
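
If what you actually need is a single count of distinct values across all the columns rather than a per-value tally, unlist first and count once:

length(unique(unlist(df)))
#[1] 5

# or, equivalently, with dplyr
dplyr::n_distinct(unlist(df))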

data

df <- structure(list(X1 = c("home,work", "leisure,work", "home,leisure"
), X2 = c("work,home", "work,home,leisure", "work,home"), X3 = c("home,work",
"work,home", "home,work")), class = "data.frame", row.names = c(NA, -3L))

How to count unique values in a column in R

We can use n_distinct() from dplyr to count the number of unique values for a column in a data frame.

textFile <- "id var1
111 A
109 A
112 A
111 A
108 A"

df <- read.table(text = textFile, header = TRUE)
library(dplyr)
df %>% summarise(count = n_distinct(id))

...and the output:

> df %>% summarise(count = n_distinct(id))
  count
1     4
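
The base-R equivalent, if you would rather avoid the dplyr dependency, is simply:

length(unique(df$id))
#[1] 4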

We can also summarise the counts within one or more group_by() columns.

textFile <- "id var1
111 A
109 A
112 A
111 A
108 A
201 B
202 B
202 B
111 B
112 B
109 B"

df <- read.table(text = textFile, header = TRUE)
df %>% group_by(var1) %>% summarise(count = n_distinct(id))

...and the output:

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
  var1  count
  <chr> <int>
1 A         4
2 B         5
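
The ungrouping message is only informational; to silence it, set the .groups argument explicitly:

df %>% group_by(var1) %>% summarise(count = n_distinct(id), .groups = "drop")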

Find the count of unique values in all columns in a dataframe without including NA values (R)

You can use dplyr::n_distinct with na.rm = T:

library(dplyr)
sapply(dat, n_distinct, na.rm = T)
#map_dbl(dat, n_distinct, na.rm = T)

#nat_country         age
#          3           8

In base R, you can use na.omit as well:

sapply(dat, \(x) length(unique(na.omit(x))))
#nat_country         age
#          3           8
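
If you prefer the result as a one-row data frame rather than a named vector, an across() version along these lines should give the same counts (assuming the same dat):

dat %>%
  summarise(across(everything(), ~ n_distinct(.x, na.rm = TRUE)))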

