Unique rows, considering two columns, in R, without order
There are lots of ways to do this; here is one:
unique(t(apply(df, 1, sort)))
duplicated(t(apply(df, 1, sort)))
The first call returns the unique rows (with the values in each row sorted); the second returns a logical mask that is TRUE for each repeated row.
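A minimal sketch of what the two calls return, run on a made-up data frame (the name `pairs_df` and its values are assumptions for illustration, not data from the question):

```r
# Hypothetical example data: rows 1 and 2 hold the same pair in swapped order.
pairs_df <- data.frame(a = c("x", "y", "x"),
                       b = c("y", "x", "z"),
                       stringsAsFactors = FALSE)

# Sort the values within each row. apply() returns one column per input row,
# so t() turns the result back into one row per original row.
sorted_rows <- t(apply(pairs_df, 1, sort))

unique_rows <- unique(sorted_rows)    # distinct pairs, ignoring order
dup_mask <- duplicated(sorted_rows)   # TRUE where a pair was already seen

print(unique_rows)
print(dup_mask)
```

Note that the sorted result is a character matrix, not a data frame, so column types and names from the original are not preserved.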
Subset with unique cases, based on multiple columns
You can use the duplicated() function to find the unique combinations:
> df[!duplicated(df[1:3]),]
  v1 v2 v3  v4 v5
1  7  1  A 100 98
2  7  2  A  98 97
3  8  1  C  NA 80
6  9  3  C  75 75
To get only the duplicates, you can check it in both directions:
> df[duplicated(df[1:3]) | duplicated(df[1:3], fromLast=TRUE),]
  v1 v2 v3 v4 v5
3  8  1  C NA 80
4  8  1  C 78 75
5  8  1  C 50 62
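Why both directions are needed can be seen on a plain vector. This is a small sketch with a made-up vector `key` (an assumption for illustration):

```r
# Hypothetical key vector with one value that occurs three times.
key <- c("a", "b", "b", "b", "c")

forward <- duplicated(key)                    # the first "b" is not flagged
backward <- duplicated(key, fromLast = TRUE)  # the last "b" is not flagged

# Combining both directions flags every member of a repeated group.
all_dups <- forward | backward
print(key[all_dups])
```

Each single call leaves one representative of the group unflagged; the logical OR of the two covers the whole group.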
Identifying unique pairs of values from two columns in a dataframe
We sort by row using apply with MARGIN=1, get a logical index using duplicated, and then subset the original dataset based on that.
myDf[!duplicated(t(apply(myDf, 1, sort))),]
#    Var1   Var2
#1 dennis mennis
#2 marcus   cool
#3    bat    man
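For exactly two columns, a vectorised alternative to the row-wise apply is possible with pmin() and pmax(). This is a different technique from the answer above, sketched here on made-up data (`pairs_df` and its values are assumptions):

```r
# Hypothetical data: rows 1 and 3 are the same pair in swapped order.
pairs_df <- data.frame(Var1 = c("dennis", "marcus", "mennis"),
                       Var2 = c("mennis", "cool", "dennis"),
                       stringsAsFactors = FALSE)

# pmin()/pmax() return the smaller and larger value of each row,
# so swapped pairs end up with identical (lo, hi) keys.
lo <- pmin(pairs_df$Var1, pairs_df$Var2)
hi <- pmax(pairs_df$Var1, pairs_df$Var2)

unique_pairs <- pairs_df[!duplicated(data.frame(lo, hi)), ]
print(unique_pairs)
```

This avoids looping over rows, which may matter on large data, but it only generalises to two columns; for more columns the apply/sort approach is the simpler choice.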
Unique on a dataframe with only selected columns
Ok, if it doesn't matter which value in the non-duplicated column you select, this should be pretty easy:
dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
> dat[!duplicated(dat[,c('id','id2')]),]
  id id2 somevalue
1  1   1         x
3  3   4         z
Inside the duplicated call, I'm simply passing only those columns from dat that I don't want duplicates of. This code will always select the first of any ambiguous values (in this case, x).
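If the last of the ambiguous values is wanted instead, one option is the fromLast argument of duplicated(). A short sketch reusing the same dat (the variable name `last_rows` is introduced here for illustration):

```r
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4),
                  somevalue = c("x", "y", "z"))

# fromLast = TRUE scans from the bottom, so the last row of each
# (id, id2) group is the one that is kept.
last_rows <- dat[!duplicated(dat[, c("id", "id2")], fromLast = TRUE), ]
print(last_rows)
```

Here the row with somevalue y is kept for the (1, 1) group instead of the row with x.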
Merge two columns in R present within a same data.frame without any conditions and finding unique values
Let's recreate your data:
DF <- read.table(text = " V1 V2
4 b c
14 g h
10 d g
6 b f
2 a e
5 b e
12 e f
1 a b
3 a f
9 c h
11 d h
7 c d
8 c g
13 f g", header = TRUE, stringsAsFactors = FALSE)
Unlist the two columns into one vector and find unique values in that vector:
u1 <- unique(unlist(DF[, c("V1", "V2")]))
sort(u1)
#[1] "a" "b" "c" "d" "e" "f" "g" "h"
A second vector:
u2 <- c("d", "e", "f")
Find the intersection:
intersect(u1, u2)
#[1] "d" "e" "f"
Find the set difference:
setdiff(u1, u2)
#[1] "b" "g" "a" "c" "h"
Find the count of unique values in all columns in a dataframe without including NA values (R)
You can use dplyr::n_distinct with na.rm = TRUE:
library(dplyr)
sapply(dat, n_distinct, na.rm = TRUE)
#map_dbl(dat, n_distinct, na.rm = TRUE)
#nat_country         age
#          3           8
In base R, you can use na.omit as well:
sapply(dat, \(x) length(unique(na.omit(x))))
#nat_country         age
#          3           8
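To see the effect of dropping NA, here is a small base R sketch on a made-up data frame (`toy` and its values are assumptions; the question's dat is not shown). It uses function(x) rather than the \(x) shorthand so that it also runs on R versions before 4.1:

```r
# Hypothetical data with one NA in each column.
toy <- data.frame(nat_country = c("US", "UK", NA, "US"),
                  age = c(30, NA, 25, 30))

# With NA left in, unique() treats NA as a value of its own.
with_na <- sapply(toy, function(x) length(unique(x)))

# Dropping NA first, as in the base R answer above.
without_na <- sapply(toy, function(x) length(unique(na.omit(x))))

print(with_na)
print(without_na)
```

Each column has two distinct non-missing values, so the count drops from 3 to 2 once NA is excluded.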
How to get unique pairs from dataframe in R?
We can use apply to loop through the rows, sort the elements of each row, transpose the output, and apply duplicated. Negating the result gives a logical index that is TRUE for the first occurrence of each pair and FALSE for duplicates, which we use to subset the rows.
m1[!duplicated(t(apply(m1, 1, sort))),]
#     [,1]            [,2]
#[1,] "CHC.AU.Equity" "SGP.AU.Equity"
#[2,] "CMA.AU.Equity" "SGP.AU.Equity"
#[3,] "AJA.AU.Equity" "AOG.AU.Equity"
#[4,] "AJA.AU.Equity" "GOZ.AU.Equity"
#[5,] "AJA.AU.Equity" "SCG.AU.Equity"
#[6,] "ABP.AU.Equity" "AOG.AU.Equity"
#[7,] "AOG.AU.Equity" "FET.AU.Equity"
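Subsetting the original matrix with the logical index differs from calling unique() on the sorted copy: the former keeps each row as it was originally written. A small sketch on a made-up matrix `m` (an assumption for illustration):

```r
# Hypothetical matrix: row 2 repeats row 1 with the two values swapped.
m <- matrix(c("B", "A",
              "A", "B",
              "A", "C"), ncol = 2, byrow = TRUE)

sorted <- t(apply(m, 1, sort))

sorted_unique <- unique(sorted)   # values within each row are sorted
kept <- m[!duplicated(sorted), ]  # rows keep their original column order

print(sorted_unique)
print(kept)
```

The first kept row is still "B" "A", whereas unique(sorted) reports it as "A" "B". If only one row might survive, adding drop = FALSE to the subset keeps the result a matrix.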
Find unique entries in otherwise identical rows
A data.table alternative. Coerce the data frame to a data.table (setDT). Melt the data to long format (melt(df, id.vars = "ID")).
Within each group defined by 'ID' and 'variable' (corresponding to the columns in the wide format) (by = .(ID, variable)), count the number of unique values (uniqueN(value)) and check whether it equals the number of rows in the subgroup (== .N). If so (if), select the entire subgroup (.SD).
Finally, reshape the data back to wide format (dcast).
library(data.table)
setDT(df)
d = melt(df, id.vars = "ID")
dcast(d[ , if(uniqueN(value) == .N) .SD, by = .(ID, variable)], ID + rowid(ID, variable) ~ variable)
#    ID ID_1   x2 x3 x5
# 1:  1    1 <NA>  7  x
# 2:  1    2 <NA> 10  p
# 3:  3    1    c  9  z
# 4:  3    2    d 11  q