Deleting rows that are duplicated in one column based on the conditions of another column
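Let's say you have data in df. Sort by Date and then by descending Depth, so that the first row for each Date is the one with the greatest Depth, then drop the later duplicates: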
df = df[order(df[,'Date'],-df[,'Depth']),]
df = df[!duplicated(df$Date),]
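As a quick check of the sort-then-deduplicate idea, here is a minimal sketch on made-up data. The values below are hypothetical and only the column names Date and Depth come from the answer above.

```r
# Hypothetical toy data: two dates, two depths per date.
df <- data.frame(
  Date  = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02")),
  Depth = c(5, 12, 7, 3)
)

# Sort by Date ascending, then by Depth descending.
df <- df[order(df$Date, -df$Depth), ]

# duplicated() flags every repeat after the first occurrence,
# so the first (deepest) row of each Date is the one that survives.
df <- df[!duplicated(df$Date), ]

df$Depth
```

To keep the shallowest row per Date instead, drop the minus sign in order().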
Deleting rows that are duplicated in one column based on value in another column
Here is an idea: drop every id group that has more than one row and contains a 998 entry.
library(dplyr)
df %>%
  group_by(id) %>%
  filter(!any(n() > 1 & group == 998))
# A tibble: 3 x 2
# Groups:   id [2]
     id group
  <int> <int>
1     2     2
2     2     3
3     3   998
If you want to remove only the 998 entry from a group that has more than one row, then:
df %>%
  group_by(id) %>%
  filter(!(n() > 1 & group == 998))
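To see how the two filters differ, here is a small sketch. The input data is an assumption (the original question's data is not shown here), chosen so that the first variant reproduces the printed output above.

```r
library(dplyr)

# Hypothetical input: id 1 has two rows (one of them 998),
# id 2 has two ordinary rows, id 3 has a single 998 row.
df <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 3L),
  group = c(1L, 998L, 2L, 3L, 998L)
)

# With any(): the condition is collapsed to one value per group,
# so a multi-row group containing 998 is dropped as a whole.
drop_groups <- df %>%
  group_by(id) %>%
  filter(!any(n() > 1 & group == 998)) %>%
  ungroup()

# Without any(): the condition is evaluated row by row,
# so only the 998 rows of multi-row groups are dropped.
drop_rows <- df %>%
  group_by(id) %>%
  filter(!(n() > 1 & group == 998)) %>%
  ungroup()
```

In both variants the single-row group (id 3) keeps its 998 entry, because n() > 1 is FALSE there.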
Remove duplicates in one column based on another column
You can also just apply duplicated() from both directions:
testDF %>%
  filter(!is.na(Y) | (!duplicated(X) & !duplicated(X, fromLast = TRUE)))
(highly influenced by this: Find duplicated elements with dplyr - I'll let others decide if this is close enough to be a duplicate)
To make your code even more readable, you can even put this in a function (perhaps with a better function name than mine):
all_duplicates <- function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}
testDF %>%
  filter(!is.na(Y) | !all_duplicates(X))
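The helper can be tried on a small example. This is a base R sketch rather than the dplyr pipeline above (logical subsetting stands in for filter()), and the contents of testDF are made up for illustration.

```r
all_duplicates <- function(x) {
  # TRUE for every element whose value occurs more than once,
  # including the first occurrence.
  duplicated(x) | duplicated(x, fromLast = TRUE)
}

# Hypothetical data: "a" and "c" are repeated in X, "b" is unique.
testDF <- data.frame(
  X = c("a", "a", "b", "c", "c"),
  Y = c(1, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

flags <- all_duplicates(testDF$X)

# Keep a row if Y is present, or if its X value occurs only once.
kept <- testDF[!is.na(testDF$Y) | !flags, ]
```

Note that with this rule a repeated X value whose rows all have NA in Y (here "c") disappears entirely, which may or may not be what you want.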
Remove duplicate rows in one column based on another column and keep other columns intact
This will create a new dataframe with the requirements that you asked for.
To explain: you don't actually need to delete anything. You just need to group the rows by the common values, in this case id and second, and collapse var1 and var2 within each group (here by taking their means).
library(tidyverse)
new_df <- df %>%
  group_by(id, second) %>%
  summarise(var1 = mean(var1),
            var2 = mean(var2)
  )
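A minimal sketch of the same grouping on made-up values follows. The data is hypothetical, and the .groups = "drop" argument is an addition that assumes dplyr 1.0.0 or later; it only removes the leftover grouping from the result.

```r
library(dplyr)

# Hypothetical data: the first two rows share id and second.
df <- data.frame(
  id     = c(1, 1, 2),
  second = c("a", "a", "b"),
  var1   = c(2, 4, 6),
  var2   = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# One output row per (id, second) pair, with the means of var1 and var2.
new_df <- df %>%
  group_by(id, second) %>%
  summarise(var1 = mean(var1),
            var2 = mean(var2),
            .groups = "drop")
```

Columns that are neither grouping variables nor summarised are not carried over, so any other column you want to keep has to be added to group_by() or summarise().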
Removing Duplicates from one Column based on conditions of another in R
You can use the distinct() function from dplyr:
df_cleaned <- df %>% distinct(PERSONUM, retained, .keep_all = TRUE)
The code above keeps the first row for each distinct combination of "PERSONUM" and "retained" values; .keep_all = TRUE retains all the other columns of that row.
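For illustration, here is how that call behaves on a few made-up rows. Only the column names PERSONUM and retained come from the answer; the score column and all values are hypothetical.

```r
library(dplyr)

# Hypothetical data: rows 1 and 2 share the same PERSONUM/retained pair.
df <- data.frame(
  PERSONUM = c(1, 1, 1, 2),
  retained = c("yes", "yes", "no", "yes"),
  score    = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Keep the first row of each distinct (PERSONUM, retained) combination;
# .keep_all = TRUE carries the remaining columns along.
df_cleaned <- df %>% distinct(PERSONUM, retained, .keep_all = TRUE)
```

Because the first row of each combination wins, sort the data beforehand (for example with arrange()) if a particular row should be the one that is kept.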
Remove duplicate rows based on conditional matching in another column
I think the following solution will help you:
library(dplyr)
df %>%
  group_by(county, mid) %>%
  mutate(duplicate = n() > 1) %>%
  filter(!duplicate | (duplicate & kpi == "B")) %>%
  select(-duplicate)
# A tibble: 71 x 3
# Groups:   county, mid [71]
   county mid   kpi
   <chr>  <chr> <chr>
 1 Athens 1.1   A
 2 Athens 1.2   A
 3 Athens 1.3   A
 4 Athens 1.4   A
 5 Athens 1.5   A
 6 Athens 1.6   A
 7 Athens 2.1.1 A
 8 Athens 2.1.2 A
 9 Athens 2.1.3 A
10 Athens 2.1.4 A
# ... with 61 more rows
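The behaviour of this filter is easier to see on a handful of rows. The sketch below uses made-up data (only the column names match the answer) and includes one duplicated group without a "B" row to show an edge case.

```r
library(dplyr)

# Hypothetical data:
#   mid 1.1 is unique, mid 1.2 is duplicated with kpi A and B,
#   mid 1.3 is duplicated but has no B row.
df <- data.frame(
  county = rep("Athens", 5),
  mid    = c("1.1", "1.2", "1.2", "1.3", "1.3"),
  kpi    = c("A", "A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(county, mid) %>%
  mutate(duplicate = n() > 1) %>%                      # flag multi-row groups
  filter(!duplicate | (duplicate & kpi == "B")) %>%    # unique rows, or the B row
  select(-duplicate) %>%
  ungroup()
```

Here mid 1.3 is dropped completely, because it is duplicated and none of its rows has kpi == "B". The condition could also be written as n() == 1 | kpi == "B", which avoids the helper column.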