Deleting Rows That Are Duplicated in One Column Based on the Conditions of Another Column

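Let's say your data is in a data frame df. Sort it by Date (ascending) and Depth (descending), then keep only the first row for each Date, which is the one with the greatest Depth: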

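# Sort by Date, putting the deepest row first within each Date
df <- df[order(df$Date, -df$Depth), ]
# Keep only the first (deepest) row for each Date
df <- df[!duplicated(df$Date), ]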
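A minimal reproducible sketch of the two steps above. The Date and Depth values are hypothetical sample data, not taken from the original question:

```r
# Hypothetical sample data: several depth readings per date
df <- data.frame(
  Date  = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02",
                    "2020-01-02", "2020-01-03")),
  Depth = c(5, 12, 7, 3, 9)
)

# Sort by Date ascending, then Depth descending within each Date
df <- df[order(df$Date, -df$Depth), ]

# duplicated() flags every occurrence after the first, so negating it
# keeps the first (i.e. deepest) row for each Date
df <- df[!duplicated(df$Date), ]
print(df)
```

Because duplicated() always keeps the first occurrence, the sort order is what decides which row survives; sorting by -Depth instead of Depth would keep the shallowest row.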

Deleting rows that are duplicated in one column based on value in another column

Here is an idea: group by id and drop every id that has more than one row and includes a group value of 998.

library(dplyr)

df %>%
  group_by(id) %>%
  filter(!any(n() > 1 & group == 998))

# A tibble: 3 x 2
# Groups: id [2]
id group
<int> <int>
1 2 2
2 2 3
3 3 998

If you want to remove only the 998 row itself, rather than the whole id, use:

df %>%
  group_by(id) %>%
  filter(!(n() > 1 & group == 998))
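A self-contained sketch comparing the two filters side by side. The id/group values are hypothetical sample data chosen so that id 1 has a 998 row among several rows and id 3 has a lone 998 row:

```r
library(dplyr)

# Hypothetical sample data
df <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 3L),
  group = c(1L, 998L, 2L, 3L, 998L)
)

# Drop the whole id if it has more than one row and any of them is 998
whole_id_removed <- df %>%
  group_by(id) %>%
  filter(!any(n() > 1 & group == 998)) %>%
  ungroup()

# Drop only the 998 row(s), and only within ids that have more than one row
only_998_removed <- df %>%
  group_by(id) %>%
  filter(!(n() > 1 & group == 998)) %>%
  ungroup()

print(whole_id_removed)
print(only_998_removed)
```

The only difference is any(): it collapses the per-row condition to a single TRUE/FALSE per id, so filter() keeps or drops the entire group at once. Without it, the condition is evaluated row by row.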

Remove duplicates in one column based on another column

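You can also apply duplicated() from both directions, so that every copy of a repeated X value is flagged, not just the later ones. A row is kept if its Y is not NA, or if its X value occurs only once: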

testDF %>%
  filter(!is.na(Y) | (!duplicated(X) & !duplicated(X, fromLast = TRUE)))

(Heavily influenced by the question "Find duplicated elements with dplyr"; I'll let others decide whether this is close enough to count as a duplicate.)

To make the code more readable, you can wrap this in a function (perhaps with a better name than mine):

all_duplicates <- function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}
testDF %>%
  filter(!is.na(Y) | !all_duplicates(X))
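A runnable sketch of the helper-function version on a small, hypothetical testDF (the X and Y values are made up for illustration). Here "a" appears twice, once with a missing Y, while "b" and "c" appear only once:

```r
library(dplyr)

# TRUE for every element whose value occurs more than once,
# including the first copy (duplicated() alone misses the first one)
all_duplicates <- function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}

# Hypothetical sample data
testDF <- data.frame(
  X = c("a", "a", "b", "c"),
  Y = c(1, NA, NA, 2),
  stringsAsFactors = FALSE
)

# Keep rows with a non-missing Y, plus rows whose X is not repeated
result <- testDF %>%
  filter(!is.na(Y) | !all_duplicates(X))
print(result)
```

The duplicated "a" row with NA in Y is dropped, while "b" survives despite its NA because its X value is unique. By De Morgan's law, !all_duplicates(X) is the same as !duplicated(X) & !duplicated(X, fromLast = TRUE), so this gives the same result as the inline version.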

Remove duplicate rows in one column based on another column and keep other columns intact

This creates a new data frame that meets your requirements.

You don't actually need to delete anything: group the rows by the values they share (here id and second) and summarise var1 and var2 within each group, in this case by taking their means.

library(tidyverse)

new_df <- df %>%
  group_by(id, second) %>%
  summarise(var1 = mean(var1),
            var2 = mean(var2))
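A self-contained sketch on hypothetical sample data (the id, second, var1 and var2 values below are invented). It loads only dplyr, which is the tidyverse package that provides group_by() and summarise(), and adds .groups = "drop" so the result comes back ungrouped:

```r
library(dplyr)

# Hypothetical sample data: the (1, "x") pair appears twice
df <- data.frame(
  id     = c(1, 1, 1, 2),
  second = c("x", "x", "y", "x"),
  var1   = c(2, 4, 5, 6),
  var2   = c(10, 20, 30, 40)
)

# One row per (id, second) pair, with var1/var2 averaged within each pair
new_df <- df %>%
  group_by(id, second) %>%
  summarise(var1 = mean(var1),
            var2 = mean(var2),
            .groups = "drop")   # return an ungrouped tibble
print(new_df)
```

Any column not listed in group_by() or summarise() is dropped, so if other columns must survive, they need their own summary (for example first()) inside summarise().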

Removing Duplicates from one Column based on conditions of another in R

You can use the distinct() function from dplyr:

library(dplyr)

df_cleaned <- df %>% distinct(PERSONUM, retained, .keep_all = TRUE)

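This keeps one row, the first occurrence, for each distinct combination of PERSONUM and retained; .keep_all = TRUE keeps the remaining columns instead of returning only PERSONUM and retained.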
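A small sketch to show which row survives. The PERSONUM, retained and visit values are hypothetical; visit is an extra column added only to show what .keep_all = TRUE carries along:

```r
library(dplyr)

# Hypothetical sample data: person 101 has two rows with retained = "Y"
df <- data.frame(
  PERSONUM = c(101, 101, 101, 102),
  retained = c("Y", "Y", "N", "Y"),
  visit    = c(1, 2, 3, 1)
)

# First row of each distinct (PERSONUM, retained) combination,
# with all other columns (here visit) kept
df_cleaned <- df %>% distinct(PERSONUM, retained, .keep_all = TRUE)
print(df_cleaned)
```

The second "101 / Y" row (visit 2) is dropped because an earlier row already has that combination. If a specific row should win, for example the latest visit, sort with arrange() before calling distinct().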

remove duplicate row based on conditional matching in another column

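One approach: within each county/mid pair, flag groups that have more than one row, then keep a row if its group is not duplicated or if its kpi is "B":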

library(dplyr)

df %>%
  group_by(county, mid) %>%
  mutate(duplicate = n() > 1) %>%
  filter(!duplicate | kpi == "B") %>%
  select(-duplicate)

# A tibble: 71 x 3
# Groups: county, mid [71]
county mid kpi
<chr> <chr> <chr>
1 Athens 1.1 A
2 Athens 1.2 A
3 Athens 1.3 A
4 Athens 1.4 A
5 Athens 1.5 A
6 Athens 1.6 A
7 Athens 2.1.1 A
8 Athens 2.1.2 A
9 Athens 2.1.3 A
10 Athens 2.1.4 A
# ... with 61 more rows
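A self-contained sketch of the same pipeline on a tiny, hypothetical data frame (three invented rows, not the original 71-row data). Athens/1.1 appears twice with kpi "A" and "B", while Athens/1.2 appears once:

```r
library(dplyr)

# Hypothetical sample data
df <- data.frame(
  county = c("Athens", "Athens", "Athens"),
  mid    = c("1.1", "1.1", "1.2"),
  kpi    = c("A", "B", "A")
)

result <- df %>%
  group_by(county, mid) %>%
  mutate(duplicate = n() > 1) %>%       # TRUE for every row of a repeated pair
  filter(!duplicate | kpi == "B") %>%   # equivalent to !duplicate | (duplicate & kpi == "B")
  select(-duplicate) %>%
  ungroup()
print(result)
```

The duplicated pair keeps only its "B" row, and the single 1.2 row is kept regardless of its kpi. Note that a duplicated pair with no "B" row at all would lose every row.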
