Counting unique values across variables (columns) in R
The trick is to use 'apply' with MARGIN = 1, which passes each row to a function as a vector (here called x). You can then write a custom function, in this case one that combines 'unique' and 'length', to count the distinct values in that row.
df <- data.frame('2012'=c(3,5,6), '2009'=c(1,3,7), '2006'=c(4,2,3), '2003'=c(4,2,5), '2000'=c(1,3,6))
df$nunique = apply(df, 1, function(x) {length(unique(x))})
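As a quick check of what 'apply' does here, the following sketch rebuilds the same data frame and restricts the count to the original columns, so that re-running the line after nunique exists does not count the new column too. The name year_cols is introduced only for illustration.

```r
# Same data as above; data.frame() prefixes the numeric names with "X"
df <- data.frame('2012' = c(3, 5, 6), '2009' = c(1, 3, 7), '2006' = c(4, 2, 3),
                 '2003' = c(4, 2, 5), '2000' = c(1, 3, 6))

# Hypothetical helper name: remember the original columns so that
# re-running the line below never counts the nunique column itself
year_cols <- names(df)

# apply() with MARGIN = 1 passes each row to the function as a vector x
df$nunique <- apply(df[year_cols], 1, function(x) length(unique(x)))

df$nunique
# 3 3 4
```

Note that apply converts the data frame to a matrix first, so this works best when all the columns share one type.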
Count unique values across columns in R
With tidyverse, first convert factor columns to character, then use map2 to split partners into a vector of individual strings and count the unique values, combined with names, using n_distinct.
library(tidyverse)
df %>%
mutate_all(as.character) %>%
mutate(uniquecounts = map2_dbl(names, partners,
~ n_distinct(c(.x, str_split(.y, ", ")[[1]]))))
#     names                  partners uniquecounts
#1     John  Mary, Ashley, John, Kate            4
#2     Mary Charlie, John, Mary, John            3
#3  Charlie               Kate, Marcy            3
#4    David              Mary, Claire            3
The same logic in base R:
df[] <- lapply(df, as.character)
as.numeric(mapply(function(x, y) length(unique(c(x, y))),
df$names, strsplit(df$partners, ", ")))
#[1] 4 3 3 3
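The answer does not show its input, so the sketch below rebuilds a data frame from the printed output above. That reconstruction is an assumption about the original data. It then repeats the base R approach step by step, with partner_list as an illustrative intermediate name.

```r
# Input reconstructed from the printed output above (assumed, not from the question)
df <- data.frame(
  names    = c("John", "Mary", "Charlie", "David"),
  partners = c("Mary, Ashley, John, Kate", "Charlie, John, Mary, John",
               "Kate, Marcy", "Mary, Claire"),
  stringsAsFactors = FALSE
)

# Split each partners string into a vector of individual names
partner_list <- strsplit(df$partners, ", ")

# Combine each name with its partners and count the distinct values
df$uniquecounts <- mapply(function(x, y) length(unique(c(x, y))),
                          df$names, partner_list, USE.NAMES = FALSE)

df$uniquecounts
# 4 3 3 3
```

Setting stringsAsFactors = FALSE avoids the factor-to-character conversion step altogether.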
R - Count unique/distinct values in two columns together per group
You can subset the data from cur_data() and unlist it to get a vector, then use n_distinct to count the number of unique values. (In dplyr 1.1.0 and later, cur_data() is deprecated in favour of pick().)
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Count = n_distinct(unlist(select(cur_data(),
Party, Party2013)), na.rm = TRUE)) %>%
ungroup
#     ID  Wave Party Party2013 Count
#  <int> <int> <chr> <chr>     <int>
#1     1     1 A     A             2
#2     1     2 A     NA            2
#3     1     3 B     NA            2
#4     1     4 B     NA            2
#5     2     1 A     C             3
#6     2     2 B     NA            3
#7     2     3 B     NA            3
#8     2     4 B     NA            3
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A",
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))
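For comparison, here is a minimal base R sketch of the same grouped count, using split and sapply rather than dplyr. It is one possible equivalent, not part of the original answer, and per_id is an illustrative name.

```r
# Same data as above, written out with data.frame()
df <- data.frame(
  ID        = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  Wave      = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  Party     = c("A", "A", "B", "B", "A", "B", "B", "B"),
  Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA),
  stringsAsFactors = FALSE
)

# One count per ID: pool both columns, drop NA, count distinct values
per_id <- sapply(split(df[c("Party", "Party2013")], df$ID),
                 function(d) length(unique(na.omit(unlist(d)))))

# Map the per-group count back onto every row of that group
df$Count <- unname(per_id[as.character(df$ID)])

df$Count
# 2 2 2 2 3 3 3 3
```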
Group by and count unique values in several columns in R
Here's an approach using dplyr::across, which is a handy way to apply the same calculation to multiple columns:
my_data <- data.frame(
city = c(rep("A", 3), rep("B", 3)),
col1 = 1:6,
col2 = 0,
col3 = c(1:3, 4, 4, 4),
col4 = 1:2
)
library(dplyr)
my_data %>%
group_by(city) %>%
summarize(across(col1:col4, n_distinct))
# A tibble: 2 x 5
#   city   col1  col2  col3  col4
# * <chr> <int> <int> <int> <int>
# 1 A         3     1     3     2
# 2 B         3     1     1     2
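If dplyr is not available, a similar per-group count can be sketched in base R with aggregate. This is an alternative to the across approach, not the original answer's method, and res is an illustrative name.

```r
my_data <- data.frame(
  city = c(rep("A", 3), rep("B", 3)),
  col1 = 1:6,
  col2 = 0,
  col3 = c(1:3, 4, 4, 4),
  col4 = 1:2
)

# For every column other than city, count distinct values within each city
res <- aggregate(. ~ city, data = my_data,
                 FUN = function(x) length(unique(x)))
res
```

One caveat: the formula interface of aggregate drops rows containing NA by default (na.action = na.omit), so the two approaches can differ on data with missing values.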
How to count unique values over multiple columns using R?
You could unlist the data frame and use table to get the counts in base R:
stack(table(unlist(df)))
#Same as
#stack(table(as.matrix(df)))
If you prefer tidyverse, get the data into long format using pivot_longer and then use count.
df %>%
tidyr::pivot_longer(cols = everything()) %>%
dplyr::count(value)
# A tibble: 5 x 2
#  value                 n
#  <chr>             <int>
#1 home,leisure          1
#2 home,work             3
#3 leisure,work          1
#4 work,home             3
#5 work,home,leisure     1
data
df <- structure(list(X1 = c("home,work", "leisure,work", "home,leisure"
), X2 = c("work,home", "work,home,leisure", "work,home"), X3 = c("home,work",
"work,home", "home,work")), class = "data.frame", row.names = c(NA, -3L))
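As a further base R variant, the table can be turned into a data frame with as.data.frame, which lets you name the columns to match the tidyverse output. This is a sketch on the data above; counts is an illustrative name.

```r
df <- data.frame(
  X1 = c("home,work", "leisure,work", "home,leisure"),
  X2 = c("work,home", "work,home,leisure", "work,home"),
  X3 = c("home,work", "work,home", "home,work"),
  stringsAsFactors = FALSE
)

# Flatten all columns into one vector, tabulate, and convert to a data frame
counts <- as.data.frame(table(value = unlist(df)),
                        responseName = "n", stringsAsFactors = FALSE)
counts
```

Each distinct string is counted as-is, so "home,work" and "work,home" remain separate values, just as in the output above.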
How to count unique values in a column in R
We can use n_distinct() from dplyr to count the number of unique values in a column of a data frame.
textFile <- "id var1
111 A
109 A
112 A
111 A
108 A"
df <- read.table(text = textFile,header = TRUE)
library(dplyr)
df %>% summarise(count = n_distinct(id))
...and the output:
> df %>% summarise(count = n_distinct(id))
count
1 4
We can also summarise the counts within one or more group_by() columns.
textFile <- "id var1
111 A
109 A
112 A
111 A
108 A
201 B
202 B
202 B
111 B
112 B
109 B"
df <- read.table(text = textFile,header = TRUE)
df %>% group_by(var1) %>% summarise(count = n_distinct(id))
...and the output:
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
var1 count
<chr> <int>
1 A 4
2 B 5
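The same counts can be sketched without dplyr, using length(unique()) for the whole column and tapply for the per-group version. This is a base R alternative on the same data; per_group is an illustrative name.

```r
textFile <- "id var1
111 A
109 A
112 A
111 A
108 A
201 B
202 B
202 B
111 B
112 B
109 B"
df <- read.table(text = textFile, header = TRUE)

# Distinct ids over the whole column
length(unique(df$id))
# 6

# Distinct ids within each level of var1
per_group <- tapply(df$id, df$var1, function(x) length(unique(x)))
per_group
```

The per-group counts (4 and 5) add up to more than the overall count because ids 111, 112 and 109 appear in both groups.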
Find the count of unique values in all columns in a dataframe without including NA values (R)
You can use dplyr::n_distinct with na.rm = TRUE:
library(dplyr)
sapply(dat, n_distinct, na.rm = TRUE)
# or, with purrr loaded:
# map_dbl(dat, n_distinct, na.rm = TRUE)
#nat_country age
# 3 8
In base R, you can use na.omit as well (the \(x) shorthand requires R 4.1 or later):
sapply(dat, \(x) length(unique(na.omit(x))))
#nat_country age
# 3 8
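The answer does not show dat, so the sketch below uses a small made-up data frame (hypothetical values, chosen only to contain some NA entries) to show why the NA handling matters. The names with_na and without_na are illustrative.

```r
# Hypothetical data: the original dat is not shown in the answer
dat <- data.frame(
  nat_country = c("US", "UK", NA, "US", "FR"),
  age         = c(21, 34, 34, NA, 50),
  stringsAsFactors = FALSE
)

# unique() keeps NA as a value of its own, so NA is included in the count
with_na <- sapply(dat, function(x) length(unique(x)))

# Dropping NA first counts only the observed values
without_na <- sapply(dat, function(x) length(unique(na.omit(x))))

with_na
without_na
```

On this made-up data each column has one more distinct value when NA is counted than when it is dropped.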