How to Select All Unique Combinations of Two Columns in an R Data Frame

How to filter for unique combinations of columns in an R data frame

The following should do it:

unique(df[,c('session','first','last')])

where df is your data frame.
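As a quick illustration, here is a minimal sketch on a made-up data frame. The values and the extra score column are assumptions for demonstration only; they are not from the original question.

```r
# Hypothetical data: rows 1 and 2 share the same session/first/last combination
df <- data.frame(
  session = c(1, 1, 2, 2),
  first   = c("Ann", "Ann", "Bob", "Cy"),
  last    = c("Lee", "Lee", "Ray", "Ray"),
  score   = c(10, 20, 30, 40)
)

# One row per distinct combination of the three key columns
combos <- unique(df[, c("session", "first", "last")])
print(combos)
```

Note that the result contains only the three selected columns. If the other columns must be kept, filtering with !duplicated(...) on the key columns is one option.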

How can I find the unique combinations based on two columns?

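For the given data set, it is enough to find the values in the column "Genus" that appear more than once and then remove the corresponding rows from the data frame, keeping only the rows whose Genus occurs exactly once.

library(dplyr)
countGenus <- df %>% count(Genus)   # one row per Genus, with its frequency in column n
filter(df, Genus %in% filter(countGenus, n == 1)$Genus)   # keep rows whose Genus occurs once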
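The same idea can be sketched in base R with table() instead of dplyr. This is an alternative formulation, not the method from the answer above, and the data frame below (including the Species column) is hypothetical.

```r
# Hypothetical data: "Quercus" appears twice, the other genera once
df <- data.frame(
  Genus   = c("Quercus", "Quercus", "Acer", "Pinus"),
  Species = c("alba", "rubra", "rubrum", "strobus")
)

# Count how often each Genus occurs
genus_counts <- table(df$Genus)

# Names of the genera that occur exactly once
singletons <- names(genus_counts)[genus_counts == 1]

# Keep only the rows belonging to those genera
unique_genus <- df[df$Genus %in% singletons, ]
print(unique_genus)
```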

Select unique combinations of some columns in R, with a random value for another column

I figured out a fast and simple solution.

First, randomly permute the rows:

myD <- myD[sample(nrow(myD)), ]

Next, keep only the first row for each unique combination of x and y:

myD <- myD[!duplicated(myD[,c("x","y")]),]
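The two steps above can be tried on a small example. The data frame and the seed are assumptions made up for illustration; the seed is only there so that repeated runs give the same result.

```r
set.seed(42)  # for reproducibility of this example only

# Hypothetical data: three distinct (x, y) combinations, several z values each
myD <- data.frame(
  x = c(1, 1, 1, 2, 2),
  y = c("a", "a", "b", "b", "b"),
  z = 1:5
)

# Step 1: shuffle the rows
myD <- myD[sample(nrow(myD)), ]

# Step 2: keep the first row seen for each (x, y) combination,
# which after shuffling is a random representative of that combination
picked <- myD[!duplicated(myD[, c("x", "y")]), ]
print(picked)
```

Because the shuffle happens before duplicated() is applied, which z value survives for each combination depends on the random order.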

Select rows from dataframe with unique combination of values from multiple columns

Have you tried the distinct function from dplyr? For your case, it would be something like:

library(dplyr)
df %>% distinct(team, opponent_team, date)

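Another alternative is to use the duplicated function from base R inside dplyr's filter function, as shown below. Because duplicated takes a single object, the columns are first combined into a data frame.

df %>% filter(!duplicated(data.frame(team, opponent_team, date)))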

Creating a df of unique combinations of columns in R where order doesn't matter

A base R method is to create all combinations of political_spectrum_values, taking 3 at a time, using expand.grid, then sort the values within each row and select the unique rows.

df <- expand.grid(first_person = political_spectrum_values,
                  second_person = political_spectrum_values,
                  third_person = political_spectrum_values)

df[] <- t(apply(df, 1, sort))
unique(df)

If each combination is needed as a single string:

unique(apply(df, 1, function(x) paste0(sort(x), collapse = "_")))
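To see the effect of sorting within rows, here is a minimal sketch with three made-up values. The contents of political_spectrum_values are an assumption, as is the use of stringsAsFactors = FALSE to keep the columns as plain character vectors.

```r
# Hypothetical values; the original question's vector is not shown
political_spectrum_values <- c("left", "center", "right")

# All ordered triples (3^3 = 27 rows)
df <- expand.grid(first_person  = political_spectrum_values,
                  second_person = political_spectrum_values,
                  third_person  = political_spectrum_values,
                  stringsAsFactors = FALSE)

# Sort the values within each row so that order no longer matters
df[] <- t(apply(df, 1, sort))

# Rows that were permutations of each other are now identical
result <- unique(df)
nrow(result)
```

With three values taken three at a time, repetition allowed, the 27 ordered rows collapse to 10 order-independent combinations.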

Numbering rows based on unique combinations of multiple columns in R

We can use rowid from data.table

library(data.table)
df1$Id <- with(df1, rowid(Treatments, Replicates))

Or using data.table syntax

setDT(df1)[, Id := rowid(Treatments, Replicates)]

If we need the group id, use .GRP

setDT(df1)[, Id := .GRP, .(Treatments, Replicates)]

Or using dplyr

library(dplyr)
df1 %>%
  group_by(Treatments, Replicates) %>%
  mutate(Id = row_number())

To get the group indices, use cur_group_id() (available in dplyr 1.0.0 and later)

df1 %>%
  group_by(Treatments, Replicates) %>%
  mutate(Id = cur_group_id())

Or in older dplyr versions (passing grouping columns to group_indices is deprecated since 1.0.0)

df1 %>%
  mutate(Id = group_indices(., Treatments, Replicates))

In base R, this can be done using ave

df1$Id <- with(df1, ave(seq_along(Treatments), Treatments,
                        Replicates, FUN = seq_along))

data

df1 <- structure(list(Treatments = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L), Replicates = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L), Value = c(4L, 5L, 7L, 9L, 25L, 39L, 43L, 24L,
12L, 9L, 4L, 2L), Id = c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), row.names = c(NA,
-12L), class = "data.frame")
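For comparison, both kinds of numbering can be sketched in base R alone. The data frame below rebuilds the same Treatments, Replicates and Value columns in compact form; the RowId and GroupId column names and the match() approach for the group id are assumptions added for illustration, not part of the original answer.

```r
# Same Treatments / Replicates / Value columns as the data above, built compactly
df1 <- data.frame(
  Treatments = rep(1:2, each = 6),
  Replicates = rep(rep(1:2, each = 3), times = 2),
  Value      = c(4, 5, 7, 9, 25, 39, 43, 24, 12, 9, 4, 2)
)

# Row number within each Treatments/Replicates group (1, 2, 3, 1, 2, 3, ...)
df1$RowId <- with(df1, ave(seq_along(Treatments), Treatments,
                           Replicates, FUN = seq_along))

# Group id: build one key per row, then number the keys
# in order of first appearance
key <- paste(df1$Treatments, df1$Replicates, sep = "_")
df1$GroupId <- match(key, unique(key))

print(df1)
```

The match() version numbers groups by first appearance, as .GRP does in data.table. cur_group_id() in dplyr numbers groups in sorted order of the grouping columns, which gives the same ids here only because the data is already sorted.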

