How to remove unique entries and keep duplicates in R
Another option in base R, using duplicated:
dx[dx$ID %in% dx$ID[duplicated(dx$ID)],]
# ID Cat1 Cat2 Cat3 Cat4
# 1 A0001 358 11.2500 37428 0
# 2 A0001 279 14.6875 38605 0
# 5 A0020 367 8.8750 37797 0
# 6 A0020 339 9.6250 39324 0
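To try the one-liner above, a small data frame is enough. The sketch below rebuilds the four rows shown in the output and adds two rows whose IDs occur only once. Those two rows (IDs A0005 and A0008 and all their values) are made up for illustration and are not from the original data.

```r
# Toy data: rows 1, 2, 5, 6 match the output above; rows 3 and 4 are hypothetical
dx <- data.frame(
  ID   = c("A0001", "A0001", "A0005", "A0008", "A0020", "A0020"),
  Cat1 = c(358, 279, 100, 200, 367, 339),
  Cat2 = c(11.25, 14.6875, 1, 2, 8.875, 9.625),
  Cat3 = c(37428, 38605, 30000, 31000, 37797, 39324),
  Cat4 = 0,
  stringsAsFactors = FALSE
)

# IDs that occur more than once
dup_ids <- unique(dx$ID[duplicated(dx$ID)])

# Keep every row whose ID is in that set
kept <- dx[dx$ID %in% dup_ids, ]
kept
```

Splitting the expression into dup_ids and the final subset is only for readability; it does the same thing as the single line.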
With data.table, using duplicated and its fromLast argument, you get:
library(data.table)
setkey(setDT(dx),ID) # or with data.table 1.9.5+: setDT(dx,key="ID")
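# data.table 1.9.8+ compares all columns by default, so name the ID column explicitly
dx[duplicated(dx, by = "ID") | duplicated(dx, by = "ID", fromLast = TRUE)]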
# ID Cat1 Cat2 Cat3 Cat4
# 1: A0001 358 11.2500 37428 0
# 2: A0001 279 14.6875 38605 0
# 3: A0020 367 8.8750 37797 0
# 4: A0020 339 9.6250 39324 0
The same approach works in base R, but I prefer data.table here for its syntactic sugar.
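For comparison, here is a minimal base R sketch of the same idea. It assumes a data frame with an ID column; the toy values are hypothetical.

```r
# Hypothetical data: "a" occurs twice, "b" once, "c" three times
dx <- data.frame(ID = c("a", "a", "b", "c", "c", "c"), val = 1:6)

# TRUE for the second and any later occurrence of an ID
later <- duplicated(dx$ID)
# TRUE for every occurrence except the last one
earlier <- duplicated(dx$ID, fromLast = TRUE)

# A row belongs to a duplicated ID if either flag is set
dups_only <- dx[later | earlier, ]
dups_only
```

Only the row with ID "b" is dropped, because it is the only ID that neither scan flags.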
How to subset your dataframe to only keep the first duplicate?
You could use dplyr for this: after filtering on the max postDate, use distinct (the dplyr counterpart of unique) to remove all duplicate rows. Of course, if the rows with the max postDate differ in any other column, you will get all of those records.
library(dplyr)
occurrence <- occurrence %>%
group_by(userId) %>%
filter(postDate == max(postDate)) %>%
distinct()
occurrence
# A tibble: 1 x 6
# Groups: userId [1]
userId occurrence profile.birthday profile.gender postDate count
<dbl> <int> <int> <chr> <chr> <int>
1 100469891698 6 47 Female 583 days 0
Remove duplicated rows
Just restrict your data frame to the columns you need, then use the unique function :D
# in the above example, you only need the first three columns
deduped.data <- unique( yourdata[ , 1:3 ] )
# the fourth column no longer 'distinguishes' them,
# so they're duplicates and thrown out.
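A small self-contained sketch of that behaviour follows. The data frame yourdata and its column names here are hypothetical stand-ins, chosen so that only the fourth column tells the first two rows apart.

```r
# Hypothetical data: rows 1 and 2 differ only in the fourth column
yourdata <- data.frame(
  id    = c(1, 1, 2),
  name  = c("x", "x", "y"),
  group = c("g1", "g1", "g2"),
  stamp = c(10, 20, 30)
)

# With all four columns, every row is distinct and nothing is dropped
all_cols <- unique(yourdata)

# With only the first three columns, rows 1 and 2 collapse into one
deduped.data <- unique(yourdata[, 1:3])

nrow(all_cols)
nrow(deduped.data)
```

Note that the result no longer contains the fourth column; if it is needed, subsetting with !duplicated(yourdata[, 1:3]) keeps whole rows instead.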
Remove duplicate in a large list while keeping the named number in R
Try this:
df <- readRDS('MEPList.rds')
df1 <- as.data.frame(do.call(rbind,df))
df2 <- df1[!duplicated(df1$V1), , drop = FALSE]
Output:
head(df2)
V1
GUE.NGL.mepid 197701
GUE.NGL.mepid.1 197533
GUE.NGL.mepid.2 197521
GUE.NGL.mepid.3 187917
GUE.NGL.mepid.4 124986
GUE.NGL.mepid.5 197529
Then you could format the rownames() to get the names.
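One way to do that formatting, as a rough sketch: it assumes the wanted name is whatever precedes ".mepid" in each row name, which is a guess based on the output shown above. The small df2 below is a stand-in built from the first three rows of that output, and the group column name is hypothetical.

```r
# Stand-in for df2, using row names and values from the output above
df2 <- data.frame(
  V1 = c(197701, 197533, 197521),
  row.names = c("GUE.NGL.mepid", "GUE.NGL.mepid.1", "GUE.NGL.mepid.2")
)

# Strip ".mepid" and any trailing numeric suffix from the row names
df2$group <- sub("\\.mepid(\\.[0-9]+)?$", "", rownames(df2))
df2
```

If the names in the real list follow a different pattern, the regular expression would need adjusting.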
Filtering a dataframe showing only duplicates
Considering df as your input, you can use dplyr and try:
df %>% group_by(V1) %>% filter(n() > 1)
for the duplicates
and
df %>% group_by(V1) %>% filter(n() == 1)
for the unique entries.
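If dplyr is not available, a base R counterpart can be sketched with ave, which returns the group size for every row. This swaps in a different technique from the dplyr answer; the toy df and its columns are hypothetical.

```r
# Hypothetical data: "a" occurs three times, "b" twice, "c" once
df <- data.frame(V1 = c("a", "b", "a", "c", "b", "a"), V2 = 1:6)

# Size of the group each row belongs to
group_size <- ave(seq_along(df$V1), df$V1, FUN = length)

duplicates <- df[group_size > 1, ]   # counterpart of filter(n() > 1)
uniques    <- df[group_size == 1, ]  # counterpart of filter(n() == 1)

duplicates
uniques
```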
Remove duplicated row in column and keep last row in R
You could also solve this with aggregate, like below:
aggregate(. ~ COL3, data = df, FUN = tail, 1)
Or another way in dplyr:
library(dplyr)
df %>%
group_by(COL3) %>%
slice(n())
This of course assumes that you're only after duplicates in COL3; otherwise you'll need to rephrase the problem (as the example doesn't seem to be particularly complex).
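A third option, sketched here in base R, is duplicated with fromLast = TRUE. The toy data frame is hypothetical; only the column name COL3 is taken from the answer.

```r
# Hypothetical data: "x" and "y" each occur twice, "z" once
df <- data.frame(
  COL1 = 1:5,
  COL3 = c("x", "y", "x", "z", "y")
)

# duplicated(..., fromLast = TRUE) flags every occurrence except the last,
# so negating it keeps the last row for each COL3 value
last_rows <- df[!duplicated(df$COL3, fromLast = TRUE), ]
last_rows
```

Unlike the grouped versions, this keeps the surviving rows in their original order rather than sorting them by COL3.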
How can I remove all duplicates so that NONE are left in a data frame?
This will extract the rows which appear only once (assuming your data frame is named df):
df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]
How it works: The function duplicated tests, starting at the first line, whether a line has already appeared, so it flags the second and any later occurrence. If the argument fromLast = TRUE is used, the function starts at the last line instead. Both boolean results are combined with | (logical 'or') into a new vector that indicates all lines appearing more than once. This vector is then negated using !, thereby creating a boolean vector that indicates the lines appearing only once.
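The same steps can be traced on a plain vector, which makes the intermediate results easier to read. The vector x is a made-up example.

```r
x <- c("a", "b", "a", "c", "b", "a")

# Scanning from the first element: later repeats are flagged
from_first <- duplicated(x)

# Scanning from the last element: earlier repeats are flagged
from_last <- duplicated(x, fromLast = TRUE)

# TRUE wherever the value occurs more than once anywhere in x
appears_twice <- from_first | from_last

# Negate to keep only the values that occur exactly once
only_once <- x[!appears_twice]
only_once
```

Here only "c" survives, since "a" and "b" are each flagged by at least one of the two scans.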