Find duplicated elements with dplyr
I guess you could use filter() for this purpose:
mtcars %>%
group_by(carb) %>%
filter(n()>1)
Small example (note that I added summarize() to show that the resulting data set no longer contains any 'carb' value that occurs only once. I used 'carb' instead of 'cyl' because 'carb' has values that appear in a single row (6 and 8), whereas every 'cyl' value appears more than once):
mtcars %>% group_by(carb) %>% summarize(n=n())
#Source: local data frame [6 x 2]
#
# carb n
#1 1 7
#2 2 10
#3 3 3
#4 4 10
#5 6 1
#6 8 1
mtcars %>% group_by(carb) %>% filter(n()>1) %>% summarize(n=n())
#Source: local data frame [4 x 2]
#
# carb n
#1 1 7
#2 2 10
#3 3 3
#4 4 10
High-performance way to find duplicated rows (using dplyr) on big data set
For large-ish data, it's often useful to try a data.table approach. In this case you can find duplicate rows using:
library(data.table)
setDT(df1, key = c("valA", "valB", "Score"))
df1[, N := .N, by = key(df1)] # count rows per group
df1[N > 1]
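A minimal, self-contained sketch of the same steps follows. The contents of df1 here are made up for illustration (the original data is not shown); only the column names valA, valB and Score come from the answer above.

```r
library(data.table)

# Hypothetical data: rows 1 and 3 are identical across all three columns.
df1 <- data.frame(
  valA  = c(1, 2, 1, 3),
  valB  = c("x", "y", "x", "z"),
  Score = c(10, 20, 10, 30)
)

setDT(df1, key = c("valA", "valB", "Score"))  # convert by reference and sort by the key
df1[, N := .N, by = key(df1)]                 # N = number of rows sharing the same key values
dups <- df1[N > 1]                            # keep only rows that occur more than once
print(dups)
```

Because setDT() and := both modify df1 in place, the helper column N stays on df1 afterwards; drop it with df1[, N := NULL] if it is not wanted.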
Find duplicated character values in two columns with dplyr
You could try:
library(dplyr)
dat %>%
filter(duplicated(paste0(`First name`, `Last name`)))
Output on the basis of data below:
First name Last name
1 Peter Parker
If you'd like to have all the duplications returned, you could do:
dat %>%
group_by(`First name`, `Last name`) %>%
filter(n() > 1)
Output on the basis of data below:
# A tibble: 2 x 2
# Groups: First name, Last name [1]
`First name` `Last name`
<fct> <fct>
1 Peter Parker
2 Peter Parker
Example data:
dat <-
data.frame(
`First name` = c("Peter", "Peter", "John", "John"),
`Last name` = c("Parker", "Parker", "Biscuit", "Chocolate"),
check.names = FALSE
)
dat
First name Last name
1 Peter Parker
2 Peter Parker
3 John Biscuit
4 John Chocolate
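One caveat with the paste0() approach: without a separator, different name pairs can collapse to the same string (for example "ab" + "c" and "a" + "bc"). A possible base R sketch that sidesteps this by comparing whole rows with duplicated() is shown here; the helper names cols, is_dup and all_dups are introduced only for illustration.

```r
dat <- data.frame(
  `First name` = c("Peter", "Peter", "John", "John"),
  `Last name`  = c("Parker", "Parker", "Biscuit", "Chocolate"),
  check.names = FALSE
)

cols <- c("First name", "Last name")

# duplicated() on a data frame compares entire rows, so no pasted key is needed.
# Combining both directions flags every copy, not just the later ones.
is_dup <- duplicated(dat[cols]) | duplicated(dat[cols], fromLast = TRUE)
all_dups <- dat[is_dup, ]
print(all_dups)
```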
Finding ALL duplicate rows, including elements with smaller subscripts
duplicated has a fromLast argument. The "Example" section of ?duplicated shows you how to use it. Just call duplicated twice, once with fromLast=FALSE and once with fromLast=TRUE, and take the rows where either is TRUE.
Some late Edit:
You didn't provide a reproducible example, so here's an illustration kindly contributed by @jbaums:
vec <- c("a", "b", "c","c","c")
vec[duplicated(vec) | duplicated(vec, fromLast=TRUE)]
## [1] "c" "c" "c"
Edit: And an example for the case of a data frame:
df <- data.frame(rbind(c("a","a"),c("b","b"),c("c","c"),c("c","c")))
df[duplicated(df) | duplicated(df, fromLast=TRUE), ]
## X1 X2
## 3 c c
## 4 c c
Find duplicate rows in data frame based on multiple columns in r
We can keep one row per combination of the listed columns, which drops the duplicates:
library(data.table)
unique(setDT(data_concern_join2),
by = c('locid', 'stdate', 'sttime', 'charnam', 'valunit'))
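Note that unique() returns the de-duplicated rows. To list the duplicated rows themselves, one option is data.table's duplicated() method with the same by columns. The sketch below uses a small made-up table dt as a stand-in for data_concern_join2; the val column and all values are hypothetical.

```r
library(data.table)

# Hypothetical stand-in for data_concern_join2: rows 1 and 2 share all key columns.
dt <- data.table(
  locid   = c("A", "A", "B"),
  stdate  = c("2020-01-01", "2020-01-01", "2020-01-02"),
  sttime  = c("10:00", "10:00", "11:00"),
  charnam = c("pH", "pH", "pH"),
  valunit = c("None", "None", "None"),
  val     = c(7.1, 7.3, 6.9)
)

cols <- c("locid", "stdate", "sttime", "charnam", "valunit")

# Flag a row if it repeats an earlier row OR is repeated by a later one,
# so every member of a duplicate group is returned.
dup_rows <- dt[duplicated(dt, by = cols) | duplicated(dt, by = cols, fromLast = TRUE)]
print(dup_rows)
```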
find duplicates with grouped variables
We gather() the 'from' and 'to' columns into 'long' format, group by 'val', filter() the groups that have more than one distinct 'group' value, then pull() the unique 'val' elements:
library(dplyr)
library(tidyr)
df1 %>%
gather(key, val, from:to) %>%
group_by(val) %>%
filter(n_distinct(group) > 1) %>%
distinct(val) %>%
pull(val)
#[1] 1 4
Or, using base R, we can just use table() to find the frequencies and get the ids out of it:
out <- with(df1, colSums(table(rep(group, 2), unlist(df1[1:2])) > 0)) > 1
names(which(out))
#[1] "1" "4"
data
df1 <- structure(list(from = c(1L, 2L, 3L, 4L, 6L, 8L), to = c(2L, 4L,
4L, 5L, 1L, 7L), group = c("metro", "metro", "metro", "train",
"train", "train")), class = "data.frame", row.names = c(NA, -6L
))
Find unique entries in otherwise identical rows
A data.table alternative. Coerce the data frame to a data.table (setDT). Melt the data to long format (melt(df, id.vars = "ID")).
Within each group defined by 'ID' and 'variable' (corresponding to the columns in the wide format) (by = .(ID, variable)), count the number of unique values (uniqueN(value)) and check whether it equals the number of rows in the subgroup (== .N). If so (if), select the entire subgroup (.SD).
Finally, reshape the data back to wide format (dcast).
library(data.table)
setDT(df)
d = melt(df, id.vars = "ID")
dcast(d[ , if(uniqueN(value) == .N) .SD, by = .(ID, variable)], ID + rowid(ID, variable) ~ variable)
# ID ID_1 x2 x3 x5
# 1: 1 1 <NA> 7 x
# 2: 1 2 <NA> 10 p
# 3: 3 1 c 9 z
# 4: 3 2 d 11 q