R Equivalent of Select Distinct on Two or More Fields/Variables

unique() works on data frames, so unique(df[c("var1", "var2")]) should be what you want.

Another option is distinct() from the dplyr package:

df %>% distinct(var1, var2) # or distinct(df, var1, var2)

Note:

For older versions of dplyr (< 0.5.0, released 2016-06-24), distinct() required an additional step:

df %>% select(var1, var2) %>% distinct

(or, in the older non-piped style, distinct(select(df, var1, var2))).

Subset with unique cases, based on multiple columns

You can use the duplicated() function to find the unique combinations:

> df[!duplicated(df[1:3]),]
  v1 v2 v3  v4 v5
1  7  1  A 100 98
2  7  2  A  98 97
3  8  1  C  NA 80
6  9  3  C  75 75

To get only the duplicates, you can check in both directions (from the first row and from the last):

> df[duplicated(df[1:3]) | duplicated(df[1:3], fromLast=TRUE),]
  v1 v2 v3 v4 v5
3  8  1  C NA 80
4  8  1  C 78 75
5  8  1  C 50 62
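
The data frame behind these outputs is not shown in the answer. As a minimal sketch, the version below rebuilds it from the six rows printed above (the column types are an assumption) so that both calls can be run end to end:

```r
# Rebuilt from the printed output above; column types are assumed.
df <- data.frame(
  v1 = c(7, 7, 8, 8, 8, 9),
  v2 = c(1, 2, 1, 1, 1, 3),
  v3 = c("A", "A", "C", "C", "C", "C"),
  v4 = c(100, 98, NA, 78, 50, 75),
  v5 = c(98, 97, 80, 75, 62, 75)
)

key_cols <- df[1:3]  # the columns that define a "case"

# First occurrence of each v1/v2/v3 combination
first_rows <- df[!duplicated(key_cols), ]

# Every row whose combination occurs more than once
dup_rows <- df[duplicated(key_cols) | duplicated(key_cols, fromLast = TRUE), ]
```

Here first_rows keeps rows 1, 2, 3 and 6, and dup_rows keeps rows 3, 4 and 5, matching the printed results.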

unique() for more than one variable

How about using unique() itself?

df <- data.frame(yad = c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN"),
                 per = c("AYLIK",  "AYLIK",  "2 AYLIK", "2 AYLIK"),
                 hmm = 1:4)

df
#       yad     per hmm
# 1  BARBIE   AYLIK   1
# 2  BARBIE   AYLIK   2
# 3 BAKUGAN 2 AYLIK   3
# 4 BAKUGAN 2 AYLIK   4

unique(df[c("yad", "per")])
#       yad     per
# 1  BARBIE   AYLIK
# 3 BAKUGAN 2 AYLIK

R - Count unique/distinct values in two columns together per group

You can subset the columns from cur_data() and unlist them to get a single vector, then use n_distinct() to count the number of unique values. (In dplyr 1.1.0 and later, cur_data() is deprecated in favor of pick().)

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(Count = n_distinct(unlist(select(cur_data(), 
                   Party, Party2013)), na.rm = TRUE)) %>%
  ungroup

#     ID  Wave Party Party2013 Count
#  <int> <int> <chr> <chr>     <int>
#1     1     1 A     A             2
#2     1     2 A     NA            2
#3     1     3 B     NA            2
#4     1     4 B     NA            2
#5     2     1 A     C             3
#6     2     2 B     NA            3
#7     2     3 B     NA            3
#8     2     4 B     NA            3

data

It is easier to help if you provide data in a reproducible format:

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A", 
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))
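
For comparison, the same per-group count can be sketched in base R. This is only one possible translation of the dplyr pipeline, not part of the original answer; the names counts and Count are chosen here for illustration:

```r
# Same data as in the answer above.
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Wave = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L), Party = c("A", "A", "B", "B", "A", 
"B", "B", "B"), Party2013 = c("A", NA, NA, NA, "C", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))

# Per ID: pool both party columns, drop NAs, count the distinct values.
counts <- sapply(
  split(df[c("Party", "Party2013")], df$ID),
  function(g) length(unique(na.omit(unlist(g))))
)

# Map the per-group counts back onto the individual rows.
df$Count <- unname(counts[as.character(df$ID)])
```

Pooling the two columns with unlist() before calling unique() is what makes the count cover both columns together rather than each column separately.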

filter distinct value based on two columns with inverse string values in `r`

We can split 'City.Pair' on '-', sort the elements of each item in the resulting list, paste them back together to give a vector, check that vector for duplicates ('i1'), and use the logical vector to subset the rows of 'data2'.

i1 <- !duplicated(apply(sapply(strsplit(as.character(data2$City.Pair), "-"), 
                sort), 2, paste, collapse="-"))
data2[i1,]
#    City.Pair Origin.City Destination.City Total.Passengers Total.Revenue
#1   LIS-BRU      LISBON         BRUSSELS              100        100.66
#2   LIS-LHR      LISBON           LONDON             5000       5000.25
#3   LAD-LIS      LUANDA           LISBON              200        200.75
#5   FAO-MAN        FARO       MANCHESTER             4000        4000.1
#7   LIS-ORY      LISBON            PARIS             4000       4000.05

Or use separate() from tidyr with pmin()/pmax():

library(dplyr)
library(tidyr)
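separate(data2, City.Pair, into = c("City", "City2"), remove = FALSE) %>% 
  # paste both ends into one key: duplicated()'s second argument is
  # `incomparables`, so passing pmax() there would not compare it
  filter(!duplicated(paste(pmin(City, City2), pmax(City, City2)))) %>%
  select(-City, -City2)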
#  City.Pair Origin.City Destination.City Total.Passengers Total.Revenue
#1   LIS-BRU      LISBON         BRUSSELS              100        100.66
#2   LIS-LHR      LISBON           LONDON             5000       5000.25
#3   LAD-LIS      LUANDA           LISBON              200        200.75
#4   FAO-MAN        FARO       MANCHESTER             4000        4000.1
#5   LIS-ORY      LISBON            PARIS             4000       4000.05
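
Since 'data2' itself is not shown, here is a small base R sketch of the same order-independent key idea. The data frame pairs and its rows are hypothetical, made up for illustration only:

```r
# Hypothetical data: some pairs appear in both directions.
pairs <- data.frame(
  City.Pair = c("LIS-BRU", "BRU-LIS", "LIS-LHR", "FAO-MAN", "MAN-FAO", "BRU-LHR"),
  stringsAsFactors = FALSE
)

# Split each pair into its two airport codes.
parts <- strsplit(pairs$City.Pair, "-", fixed = TRUE)
a <- vapply(parts, `[`, character(1), 1)
b <- vapply(parts, `[`, character(1), 2)

# Build one key per row with the smaller code first, so that
# "LIS-BRU" and "BRU-LIS" both become "BRU-LIS".
key <- paste(pmin(a, b), pmax(a, b), sep = "-")

# Keep the first row for each unordered pair.
result <- pairs[!duplicated(key), , drop = FALSE]
```

The row "BRU-LHR" shares its smaller code with "LIS-BRU", which is why the key has to combine both ends: checking only pmin() for duplicates would drop it by mistake.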

Select groups with more than one distinct value

There are several possibilities; here's my favorite:

library(data.table)
setDT(df)[, if(+var(number)) .SD, by = from]
#    from number
# 1:    2      1
# 2:    2      2

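Basically, for each group we check whether there is any variance; if there is, we return that group's rows. Note that var() returns NA for a single-row group, which makes if() fail, so this assumes every group has at least two rows.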

With base R, I would go with

df[as.logical(with(df, ave(number, from, FUN = var))), ]
#   from number
# 3    2      1
# 4    2      2

Edit: for non-numeric data you could try the uniqueN function, which was new in the devel version of data.table at the time of writing (or use length(unique(number)) > 1 instead):

setDT(df)[, if(uniqueN(number) > 1) .SD, by = from]
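
The length(unique(...)) > 1 idea can also be sketched in base R without data.table. The data below is hypothetical (the original df is not shown), and it includes a single-row group to show that this check does not depend on var():

```r
# Hypothetical data: group 1 is constant, group 2 varies, group 3 has one row.
df <- data.frame(
  from   = c(1, 1, 2, 2, 3),
  number = c("a", "a", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# Number of distinct values per group, named by group.
counts <- tapply(df$number, df$from, function(x) length(unique(x)))

# Look up each row's group count and keep rows from groups with more than one.
keep <- as.vector(counts[as.character(df$from)] > 1)
result <- df[keep, ]
```

Only the rows of group 2 survive, since it is the only group with more than one distinct value.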
