Remove duplicates keeping entry with largest absolute value
First. Sort in the order putting the less desired items last within id
groups
aa <- a[order(a$id, -abs(a$value) ), ] #sort by id and reverse of abs(value)
Then: Remove items after the first within id
groups
aa[ !duplicated(aa$id), ] # take the first row within each id
id value
2 1 2
4 2 -4
5 3 -5
6 4 6
Remove duplicates, keeping most frequent row
Use table
and which.max
to extract the mode
:
df %>%
group_by(id, var) %>%
summarise(val = {t <- table(val); names(t)[which.max(t)] })
# A tibble: 2 x 3
# Groups: id [?]
# id var val
# <fct> <fct> <chr>
#1 a b d
#2 a c f
Another way to do this in base R: Create a three way contingency table directly, and then find the max column along the third axis:
apply(table(df), c(1, 2), function(v) names(v)[which.max(v)])
# var
#id b c
# a "d" "f"
Convert this to a data frame:
as.data.frame.table(
apply(table(df), c(1, 2), function(v) names(v)[which.max(v)])
)
# id var Freq
#1 a b d
#2 a c f
Choosing duplicates according to maximum (for multiple columns) in R
You can group by ID
, Gender
and Race
and summarise the Value
variables to get their max.
library(dplyr)
df %>%
group_by(ID, Gender, Race) %>%
summarise(across(starts_with('Value'), max, na.rm = TRUE), .groups = "drop")
# ID Gender Race Value1 Value2 Value3
# <int> <chr> <chr> <int> <int> <int>
#1 1 M B 45 76 95
#2 2 F W 67 100 92
#3 3 F A 34 95 90
#4 4 M W 78 37 13
#5 5 M B 80 105 45
Conditionally removing duplicates in Excel based on largest value in a Column
If you don not want to use VBA you can try this:
Select the table and from HOME -> Sort & Filter
select Custom Sort
.
First sort by word
with order : A to Z
then Add Level
and sort by value
with order: Smallest to Largest
.
Create a column Key
with the following formula (i am assuming you data have headers and start from cell A1
. Then in cell C2
put the formula =IF(B2<>B3,0,1)
and copy down.
Then copy and paste special as values column C
and sort on the column Key
Descending
or filter on 1
Finally, copy the filtered table and paste it elsewhere then select it all and Data -> Remove Duplicates
and your done. Or you can do this last part first and then run the steps above.
Remove duplicates from dataset based on criteria
There are many ways to get the expected output. One option with dplyr
would be to group by 'ID', 'Sub' columns, get the top Score observation with top_n
, and if there are duplicate rows, use distinct
.
library(dplyr)
df1 %>%
group_by(ID, Sub) %>%
top_n(1) %>%
distinct()
ID Sub Score
# (int) (chr) (int)
#1 2 Mat 34
#2 3 Mat 67
#3 4 Mat 22
#4 5 Sci 78
#5 6 Mat 32
#6 1 Mat 56
#7 1 Sci 40
Or with data.table
, we convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'ID', 'Sub', we order
the 'Score' in descending and subset the first row of each group combination (.SD[1L]
or head(.SD, 1)
can be used).
library(data.table)
setDT(df1)[order(-Score), .SD[1L] ,.(ID, Sub)]
Or another option is unique
after we order
by the columns so that it will only select the first observation for each duplicate.
unique(setDT(df1)[order(ID, Sub,-Score)], by = c('ID', 'Sub'))
Or with base R
, we order
the columns, and use duplicated
to remove the rows that are duplicates for the first two columns.
df2 <- df1[with(df1, order(ID, Sub, -Score)),]
df2[!duplicated(df2[1:2]),]
how to remove unique entry and keep duplicates in R
Another option in base R Using duplicated
dx[dx$ID %in% dx$ID[duplicated(dx$ID)],]
# ID Cat1 Cat2 Cat3 Cat4
# 1 A0001 358 11.2500 37428 0
# 2 A0001 279 14.6875 38605 0
# 5 A0020 367 8.8750 37797 0
# 6 A0020 339 9.6250 39324 0
data.table using duplicated
using duplicated
and fromLast
version you get :
library(data.table)
setkey(setDT(dx),ID) # or with data.table 1.9.5+: setDT(dx,key="ID")
dx[duplicated(dx) |duplicated(dx,fromLast=T)]
# ID Cat1 Cat2 Cat3 Cat4
# 1: A0001 358 11.2500 37428 0
# 2: A0001 279 14.6875 38605 0
# 3: A0020 367 8.8750 37797 0
# 4: A0020 339 9.6250 39324 0
This can be applied to base R also but I prefer data.table here for syntax sugar.
Related Topics
Linear Regression Loop for Each Independent Variable Individually Against Dependent
Remove/Collapse Consecutive Duplicate Values in Sequence
Min for Each Row in a Data Frame
Last Observation Carried Forward in a Data Frame
How to Get the Maximum Value by Group
Calculate Row-Wise Proportions
Using Rcpp Within Parallel Code via Snow to Make a Cluster
Why Does R Use Partial Matching
Reduce PDF File Size of Plots by Filtering Hidden Objects
Generate an Incrementally Increasing Sequence Like 112123123412345
How to Remove an Element from a List
How to Change the Formatting of Numbers on an Axis with Ggplot
Convert Column Classes in Data.Table
How to Add a Ggplot2 Subtitle with Different Size and Colour