R equivalent of SELECT DISTINCT on two or more fields/variables
unique works on a data.frame, so unique(df[c("var1","var2")]) should be what you want.
Another option is distinct from the dplyr package:
df %>% distinct(var1, var2) # or distinct(df, var1, var2)
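A small sketch of the difference between returning only the listed columns and keeping the rest of each row. It relies on distinct()'s .keep_all argument, which, as far as I know, has been available since dplyr 0.5.0. The data frame and its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 2 share the same (var1, var2) combination
df <- data.frame(var1 = c("a", "a", "b", "b"),
                 var2 = c(1, 1, 2, 3),
                 other = c(10, 20, 30, 40))

# Only the listed columns are returned
only_keys <- distinct(df, var1, var2)

# .keep_all = TRUE also returns the remaining columns, taken from the
# first row of each distinct combination
with_rest <- distinct(df, var1, var2, .keep_all = TRUE)
```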
Note:
For older versions of dplyr (< 0.5.0, 2016-06-24), distinct(df, var1, var2) returned all columns, so an additional step was required:
df %>% select(var1, var2) %>% distinct
(or, in the older nested style, distinct(select(df, var1, var2))).
Subset with unique cases, based on multiple columns
You can use the duplicated() function to find the unique combinations:
> df[!duplicated(df[1:3]),]
v1 v2 v3 v4 v5
1 7 1 A 100 98
2 7 2 A 98 97
3 8 1 C NA 80
6 9 3 C 75 75
To get every row whose combination occurs more than once (including the first occurrence), check in both directions:
> df[duplicated(df[1:3]) | duplicated(df[1:3], fromLast=TRUE),]
v1 v2 v3 v4 v5
3 8 1 C NA 80
4 8 1 C 78 75
5 8 1 C 50 62
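A base R alternative sketch: ave() can compute the size of each combination for every row, and keeping rows whose size exceeds one should select the same rows as the two-direction duplicated() call. The columns g1, g2 and val are hypothetical toy data, not the data behind the output above.

```r
# Hypothetical toy data: only the (g1, g2) combination (1, "a") occurs twice
df <- data.frame(g1 = c(1, 1, 2, 2, 3),
                 g2 = c("a", "a", "b", "c", "a"),
                 val = c(10, 20, 30, 40, 50))

# Two-direction duplicated(): marks every copy, including the first
both_dirs <- df[duplicated(df[1:2]) | duplicated(df[1:2], fromLast = TRUE), ]

# ave() returns, for every row, the number of rows in its (g1, g2) group
group_size <- ave(seq_len(nrow(df)), df$g1, df$g2, FUN = length)
by_count <- df[group_size > 1, ]
```

Counting group sizes has the advantage that the threshold is easy to change, e.g. group_size > 2 for combinations that occur at least three times.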
unique() for more than one variable
How about using unique() itself?
df <- data.frame(yad = c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN"),
per = c("AYLIK", "AYLIK", "2 AYLIK", "2 AYLIK"),
hmm = 1:4)
df
# yad per hmm
# 1 BARBIE AYLIK 1
# 2 BARBIE AYLIK 2
# 3 BAKUGAN 2 AYLIK 3
# 4 BAKUGAN 2 AYLIK 4
unique(df[c("yad", "per")])
# yad per
# 1 BARBIE AYLIK
# 3 BAKUGAN 2 AYLIK
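Using the same df, the sketch below shows that unique() on the column subset keeps the first occurrence of each combination (row names 1 and 3), and that the same logical index from duplicated() can be reused to keep the other columns as well. The names keys, u1, u2 and first_rows are chosen here for illustration.

```r
df <- data.frame(yad = c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN"),
                 per = c("AYLIK", "AYLIK", "2 AYLIK", "2 AYLIK"),
                 hmm = 1:4)
keys <- c("yad", "per")

# Distinct combinations of the key columns only
u1 <- unique(df[keys])

# Same rows selected explicitly with duplicated()
u2 <- df[!duplicated(df[keys]), keys]

# Reusing the index keeps all columns of the first row per combination
first_rows <- df[!duplicated(df[keys]), ]
```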
R - Count unique/distinct values in two columns together per group
You can select the relevant columns from cur_data() and unlist them to get a single vector, then use n_distinct to count the number of unique values.
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Count = n_distinct(unlist(select(cur_data(),
Party, Party2013)), na.rm = TRUE)) %>%
ungroup
# ID Wave Party Party2013 Count
# <int> <int> <chr> <chr> <int>
#1 1 1 A A 2
#2 1 2 A NA 2
#3 1 3 B NA 2
#4 1 4 B NA 2
#5 2 1 A C 3
#6 2 2 B NA 3
#7 2 3 B NA 3
#8 2 4 B NA 3
data
It is easier to help if you provide data in a reproducible format:
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A",
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))
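For readers not using dplyr, a base R sketch of the same per-group count: split the two columns by ID, pool them into one vector, drop NAs and count distinct values, then map each group's count back onto its rows. The name counts is chosen here for illustration.

```r
# Same data as above
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                     Wave = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
                     Party = c("A", "A", "B", "B", "A", "B", "B", "B"),
                     Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA)),
                class = "data.frame", row.names = c(NA, -8L))

# For each ID, pool both columns, drop NAs, and count the distinct values
counts <- sapply(split(df[c("Party", "Party2013")], df$ID), function(d) {
  v <- unlist(d, use.names = FALSE)
  length(unique(v[!is.na(v)]))
})

# Map each group's count back onto the rows of that group
df$Count <- unname(counts[as.character(df$ID)])
df
```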
filter distinct value based on two columns with inverse string values in `r`
We can split 'City.Pair' by '-', sort the elements of each piece in the list output, paste them back together to get a vector, check for duplicates ('i1'), and use the logical vector to subset the rows of 'data2'.
i1 <- !duplicated(apply(sapply(strsplit(as.character(data2$City.Pair), "-"),
sort), 2, paste, collapse="-"))
data2[i1,]
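The same idea wrapped in a small helper, as a sketch. canonical_pair is a hypothetical name, and data2 below is a hypothetical toy version of the question's data. vapply() with character(1) is used instead of sapply() plus apply(), so a malformed entry (e.g. one without "-") still yields one key per row rather than turning the result into a list.

```r
# Hypothetical toy data: "BRU-LIS" and "LHR-LIS" reverse earlier pairs
data2 <- data.frame(City.Pair = c("LIS-BRU", "BRU-LIS", "LIS-LHR", "LHR-LIS", "FAO-MAN"),
                    Total.Passengers = c(100, 120, 5000, 4800, 4000))

# Hypothetical helper: map "B-A" and "A-B" to the same key "A-B"
canonical_pair <- function(x, sep = "-") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(parts, function(p) paste(sort(p), collapse = sep), character(1))
}

# Keep the first row of each unordered pair
i1 <- !duplicated(canonical_pair(data2$City.Pair))
data2[i1, ]
```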
# City.Pair Origin.City Destination.City Total.Passengers Total.Revenue
#1 LIS-BRU LISBON BRUSSELS 100 100.66
#2 LIS-LHR LISBON LONDON 5000 5000.25
#3 LAD-LIS LUANDA LISBON 200 200.75
#5 FAO-MAN FARO MANCHESTER 4000 4000.1
#7 LIS-ORY LISBON PARIS 4000 4000.05
Or using separate with pmin/pmax:
library(dplyr)
library(tidyr)
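separate(data2, City.Pair, into = c("City", "City2"), sep = "-", remove = FALSE) %>%
# duplicated() checks a single vector (its 2nd argument is 'incomparables'),
# so combine the ordered pair into one key first
filter(!duplicated(paste(pmin(City, City2), pmax(City, City2)))) %>%
select(-City, -City2)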
# City.Pair Origin.City Destination.City Total.Passengers Total.Revenue
#1 LIS-BRU LISBON BRUSSELS 100 100.66
#2 LIS-LHR LISBON LONDON 5000 5000.25
#3 LAD-LIS LUANDA LISBON 200 200.75
#4 FAO-MAN FARO MANCHESTER 4000 4000.1
#5 LIS-ORY LISBON PARIS 4000 4000.05
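A base R sketch of the pmin/pmax idea without tidyr, assuming each City.Pair contains exactly one "-". The two halves are extracted with sub(), and pmin/pmax (which also work on character vectors) put each pair in a fixed order. data2 here is hypothetical toy data.

```r
# Hypothetical toy data: "BRU-LIS" and "LHR-LIS" reverse earlier pairs
data2 <- data.frame(City.Pair = c("LIS-BRU", "BRU-LIS", "LIS-LHR", "LHR-LIS", "FAO-MAN"),
                    Total.Passengers = c(100, 120, 5000, 4800, 4000))

pair <- as.character(data2$City.Pair)
a <- sub("-.*$", "", pair)   # text before the "-"
b <- sub("^.*-", "", pair)   # text after the "-"

# Order each pair alphabetically, then drop repeated keys
keep <- !duplicated(paste(pmin(a, b), pmax(a, b), sep = "-"))
data2[keep, ]
```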
Select groups with more than one distinct value
There are several possibilities; here's my favorite:
library(data.table)
setDT(df)[, if(+var(number)) .SD, by = from]
# from number
# 1: 2 1
# 2: 2 2
Basically, for each group we check whether there is any variance; if there is, we return that group's rows (.SD).
With base R, I would go with
df[as.logical(with(df, ave(number, from, FUN = var))), ]
# from number
# 3 2 1
# 4 2 2
Edit: for non-numeric data you could try the uniqueN function, new in the development version of data.table at the time of writing (or use length(unique(number)) > 1 instead):
setDT(df)[, if(uniqueN(number) > 1) .SD, by = from]
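A base R sketch of the length(unique(...)) > 1 variant that also works for non-numeric columns. Counting distinct values gives 1 for a single-row group instead of the NA that var() returns, so such groups are simply dropped. The data and the name n_distinct_per_group are hypothetical.

```r
# Hypothetical toy data: group 1 repeats one value, group 2 has two values,
# group 3 has a single row
df <- data.frame(from = c(1, 1, 2, 2, 3),
                 number = c("a", "a", "x", "y", "z"))

# Number of distinct values of `number` within each level of `from`
n_distinct_per_group <- tapply(df$number, df$from, function(v) length(unique(v)))

# Keep the rows whose group has more than one distinct value
keep <- as.character(df$from) %in% names(n_distinct_per_group)[n_distinct_per_group > 1]
df[keep, ]
```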