Unique Rows, Considering Two Columns, in R, Without Order

Unique rows, considering two columns, in R, without order

There are lots of ways to do this; here is one:

unique(t(apply(df, 1, sort)))
duplicated(t(apply(df, 1, sort)))

The first returns the unique unordered pairs as a sorted matrix; the second returns a logical vector marking the duplicates, which you can negate to subset the original data frame.
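
For example, a minimal sketch with a made-up two-column df (not from the original question), where row 3 repeats row 1 with the values swapped:

df <- data.frame(a = c("x", "y", "y", "z"),
                 b = c("y", "z", "x", "x"))

sorted <- t(apply(df, 1, sort))   # each row sorted, so order no longer matters
unique(sorted)                    # the distinct unordered pairs, as a character matrix
df[!duplicated(sorted), ]         # or keep the original rows; the first occurrence wins
#  a b
#1 x y
#2 y z
#4 z x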

Subset with unique cases, based on multiple columns

You can use the duplicated() function to find the unique combinations:

> df[!duplicated(df[1:3]),]
  v1 v2 v3  v4 v5
1  7  1  A 100 98
2  7  2  A  98 97
3  8  1  C  NA 80
6  9  3  C  75 75

To get only the duplicated rows, you can check in both directions:

> df[duplicated(df[1:3]) | duplicated(df[1:3], fromLast=TRUE),]
  v1 v2 v3 v4 v5
3  8  1  C NA 80
4  8  1  C 78 75
5  8  1  C 50 62
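
For reference, a df consistent with the printed rows above (reconstructed from the output, so treat it as an assumption about the original data) can be built like this:

df <- data.frame(v1 = c(7, 7, 8, 8, 8, 9),
                 v2 = c(1, 2, 1, 1, 1, 3),
                 v3 = c("A", "A", "C", "C", "C", "C"),
                 v4 = c(100, 98, NA, 78, 50, 75),
                 v5 = c(98, 97, 80, 75, 62, 75))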

Identifying unique pairs of values from two columns in a dataframe

We sort each row using apply with MARGIN=1, transpose the result back into rows, get a logical index using duplicated, and then subset the original dataset based on that.

myDf[!duplicated(t(apply(myDf, 1, sort))),]
#    Var1   Var2
#1 dennis mennis
#2 marcus   cool
#3    bat    man
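
One myDf consistent with that output (an assumption, since the original data isn't shown) pairs each row with its reverse:

# Hypothetical input: each pair appears twice, once in each order
myDf <- data.frame(Var1 = c("dennis", "marcus", "bat", "mennis", "cool",   "man"),
                   Var2 = c("mennis", "cool",   "man", "dennis", "marcus", "bat"))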

Unique on a dataframe with only selected columns

OK, if it doesn't matter which value of the remaining column you keep, this should be pretty easy:

dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
> dat[!duplicated(dat[,c('id','id2')]),]
  id id2 somevalue
1  1   1         x
3  3   4         z

Inside the duplicated call, I'm passing only the columns of dat that I don't want duplicated. This code always keeps the first row of any ambiguous set (in this case, the one with x).
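
If you'd rather keep the last occurrence instead, duplicated takes a fromLast argument:

dat[!duplicated(dat[,c('id','id2')], fromLast = TRUE),]
#  id id2 somevalue
#2  1   1         y
#3  3   4         z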

Merge two columns of the same data.frame in R without any conditions and find the unique values

Let's recreate your data:

DF <- read.table(text = "    V1 V2
4 b c
14 g h
10 d g
6 b f
2 a e
5 b e
12 e f
1 a b
3 a f
9 c h
11 d h
7 c d
8 c g
13 f g", header = TRUE, stringsAsFactors = FALSE)

Unlist the two columns into one vector and find unique values in that vector:

u1 <- unique(unlist(DF[, c("V1", "V2")]))
sort(u1)
#[1] "a" "b" "c" "d" "e" "f" "g" "h"

A second vector:

u2 <- c("d", "e", "f")

Find the intersection:

intersect(u1, u2)
#[1] "d" "e" "f"

Find the set difference:

setdiff(u1, u2)
#[1] "b" "g" "a" "c" "h"

Find the count of unique values in all columns in a dataframe without including NA values (R)

You can use dplyr::n_distinct with na.rm = T:

library(dplyr)
sapply(dat, n_distinct, na.rm = T)
# purrr::map_dbl(dat, n_distinct, na.rm = T) gives the same counts

#nat_country age
#          3   8

In base R, you can use na.omit as well:

sapply(dat, \(x) length(unique(na.omit(x))))
#nat_country age
#          3   8
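
As a quick check, a made-up dat (not the asker's data) with three distinct countries, eight distinct ages and a few NAs reproduces those counts:

# Hypothetical data: 3 distinct non-NA countries, 8 distinct non-NA ages
dat <- data.frame(
  nat_country = c("US", "FR", "DE", NA, "US", "FR", "DE", "US", NA, "FR"),
  age         = c(18, 19, 21, 25, 30, 34, 40, 52, NA, 18)
)
sapply(dat, \(x) length(unique(na.omit(x))))
#nat_country age
#          3   8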

How to get unique pairs from dataframe in R?

We can use apply to loop through the rows and sort the elements, transpose the output, apply duplicated, negate it to get a logical index that is TRUE for unique rows and FALSE for duplicates, and use that to subset the rows.

m1[!duplicated(t(apply(m1, 1, sort))),]
#     [,1]            [,2]
#[1,] "CHC.AU.Equity" "SGP.AU.Equity"
#[2,] "CMA.AU.Equity" "SGP.AU.Equity"
#[3,] "AJA.AU.Equity" "AOG.AU.Equity"
#[4,] "AJA.AU.Equity" "GOZ.AU.Equity"
#[5,] "AJA.AU.Equity" "SCG.AU.Equity"
#[6,] "ABP.AU.Equity" "AOG.AU.Equity"
#[7,] "AOG.AU.Equity" "FET.AU.Equity"

Find unique entries in otherwise identical rows

A data.table alternative. Coerce the data frame to a data.table (setDT) and melt the data to long format (melt(df, id.vars = "ID")).

Within each group defined by 'ID' and 'variable' (corresponding to the columns in the wide format) (by = .(ID, variable)), count number of unique values (uniqueN(value)) and check if it's equal to the number of rows in the subgroup (== .N). If so (if), select the entire subgroup (.SD).

Finally, reshape the data back to wide format (dcast).

library(data.table)
setDT(df)
d = melt(df, id.vars = "ID")
dcast(d[ , if(uniqueN(value) == .N) .SD, by = .(ID, variable)], ID + rowid(ID, variable) ~ variable)
#    ID ID_1   x2 x3 x5
# 1:  1    1 <NA>  7  x
# 2:  1    2 <NA> 10  p
# 3:  3    1    c  9  z
# 4:  3    2    d 11  q
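
The original df isn't shown; as a minimal sketch, a hypothetical wide table with two rows per ID, where x1 repeats the same value within each ID while x2 and x3 differ, behaves the same way (x1 is dropped, x2 and x3 survive):

library(data.table)

# Hypothetical input, not the asker's data
df <- data.table(ID = c(1, 1, 3, 3),
                 x1 = c("a", "a", "b", "b"),
                 x2 = c("u", "v", "c", "d"),
                 x3 = c("x", "p", "z", "q"))

d = melt(df, id.vars = "ID")
dcast(d[ , if(uniqueN(value) == .N) .SD, by = .(ID, variable)], ID + rowid(ID, variable) ~ variable)
# x1 disappears from the result; x2 and x3 are kept for both IDs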

