Remove duplicated rows
Restrict your data frame to the columns you need, then call the unique function:
# here, only the first three columns matter
deduped.data <- unique( yourdata[ , 1:3 ] )
# the fourth column no longer 'distinguishes' the rows,
# so rows that match on the first three are duplicates and are thrown out.
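Note that unique() on the subset returns only the selected columns. If the remaining columns should stay in the result, one option is duplicated(), which flags repeats without dropping anything. A minimal sketch, assuming a made-up yourdata (the column names a to d and the values are hypothetical):

```r
# Hypothetical data: rows 1 and 2 agree on the first three columns
yourdata <- data.frame(
  a = c(1, 1, 2),
  b = c("x", "x", "y"),
  c = c(TRUE, TRUE, FALSE),
  d = c(10, 20, 30)
)

# duplicated() marks rows whose first three columns repeat an earlier row
keep <- !duplicated(yourdata[, 1:3])

# Index the full data frame so the fourth column is kept
deduped.full <- yourdata[keep, ]
print(deduped.full)
```

The first occurrence of each combination is the one that survives, so sort the data beforehand if a particular row should win.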
Remove rows with duplicate values in any other adjacent column
You may use anyDuplicated for each row.
library(data.table)
setDT(df)
df[apply(df, 1, anyDuplicated) == 0]
#   V1 V2 V3
#1:  3  2  1
#2:  2  3  1
#3:  3  1  2
#4:  1  3  2
#5:  2  1  3
#6:  1  2  3
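The same idea also works without data.table. anyDuplicated() returns 0 when a vector has no repeats and the index of the first repeat otherwise, so comparing against 0 gives a per-row filter. A minimal base R sketch on hypothetical data (the values below are made up for illustration):

```r
# Hypothetical input: rows 1 and 3 contain a repeated value
df <- data.frame(
  V1 = c(1, 3, 2, 2),
  V2 = c(1, 2, 2, 3),
  V3 = c(2, 1, 3, 1)
)

# For each row: 0 if all values differ, otherwise the position of the first repeat
first_dup <- apply(df, 1, anyDuplicated)

# Keep only the rows without any repeated value
res <- df[first_dup == 0, ]
print(res)
```

Because apply() converts the data frame to a matrix first, columns of mixed types are coerced to a common type before the comparison.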
Remove duplicate values across a few columns but keep rows
A base R way, using apply:
cols <- grep('z_\\d+', names(dat))
dat[cols] <- t(apply(dat[cols], 1, function(x) replace(x, duplicated(x), 0)))
#  id z_1 z_2 z_3 z_4 z_5 z_6
#1  1 100  20   0   0  23   0
#2  2 290   0   0   0   0   0
#3  3  38   0   0   0  25   0
#4  4 129   0   0 127   0   0
#5  5   0   0   0  38   0   0
#6  6 290   0  98  78   0   9
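To see why the t() is needed, it can help to run the pieces on a tiny input. apply() over rows returns each row's result as a column, so the result has to be transposed before it is written back. A small sketch, assuming a hypothetical two-row dat (not the data from the question):

```r
# Hypothetical data with repeated values inside each row
dat <- data.frame(
  id  = 1:2,
  z_1 = c(100, 290),
  z_2 = c(20, 290),
  z_3 = c(100, 290)
)

cols <- grep("z_\\d+", names(dat))

# Each row's result comes back as a column: 3 values x 2 rows
res <- apply(dat[cols], 1, function(x) replace(x, duplicated(x), 0))
print(dim(res))

# Transpose back to 2 rows x 3 columns before assigning
dat[cols] <- t(res)
print(dat)
```

Only the second and later occurrences in a row are zeroed; the first occurrence keeps its value.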
A tidyverse way without reshaping can be done using pmap:
library(tidyverse)
dat %>%
  mutate(result = pmap(select(., matches('z_\\d+')), ~{
    x <- c(...)
    replace(x, duplicated(x), 0)
  })) %>%
  select(id, result) %>%
  unnest_wider(result)
Since tests performed by @thelatemail suggest that reshaping is a better option than handling the data row by row, you might want to consider it:
dat %>%
  pivot_longer(cols = matches('z_\\d+')) %>%
  group_by(id) %>%
  mutate(value = replace(value, duplicated(value), 0)) %>%
  pivot_wider()
Deleting rows that are duplicated in one column based on the conditions of another column
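Let's say you have your data in df. Sort by Date and then by descending Depth, so that dropping the repeated dates keeps the deepest row for each date:
# sort by Date (ascending), then Depth (descending)
df = df[order(df[,'Date'],-df[,'Depth']),]
# keep only the first row seen for each Date
df = df[!duplicated(df$Date),]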
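A worked run on a small input shows which rows survive. This is only a sketch: the Temp column and all values are hypothetical, and the minus sign in order() relies on Depth being numeric.

```r
# Hypothetical data: two measurements per date
df <- data.frame(
  Date  = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02")),
  Depth = c(5, 12, 8, 3),
  Temp  = c(10.1, 9.4, 11.0, 11.6)
)

# Sort so the deepest measurement comes first within each date
df <- df[order(df[, "Date"], -df[, "Depth"]), ]

# duplicated() flags every row after the first for a given date
df <- df[!duplicated(df$Date), ]
print(df)
```

Dropping the minus sign would keep the shallowest row per date instead.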
Remove duplicates based on conditions in rows in a dataframe
Use slice_max after grouping by 'Name':
library(dplyr)
data_people %>%
  group_by(Name) %>%
  slice_max(n = 1, order_by = X._Scoring) %>%
  ungroup()
Output:
# A tibble: 2 x 4
  Name          Information                    Height X._Scoring
  <chr>         <chr>                           <dbl>      <dbl>
1 John Doe      This is an information           1.88       0.89
2 Margarita Pan This is an information as well   1.47       0.78
Or, if we want to keep the minimum value, use slice_min:
data_people %>%
  group_by(Name) %>%
  slice_min(n = 1, order_by = X._Scoring) %>%
  ungroup()
# A tibble: 2 x 4
  Name          Information                    Height X._Scoring
  <chr>         <chr>                           <dbl>      <dbl>
1 John Doe      This is an information          NA          0.56
2 Margarita Pan This is an information as well   1.47       0.78
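One thing to watch for: slice_max and slice_min keep ties by default (with_ties = TRUE), so a name whose top score appears twice still comes back as two rows. Passing with_ties = FALSE guarantees exactly one row per group. A short sketch on hypothetical data (the names and scores below are made up):

```r
library(dplyr)

# Hypothetical data: "A" has two rows sharing the top score
data_people <- data.frame(
  Name = c("A", "A", "B"),
  X._Scoring = c(0.9, 0.9, 0.5)
)

# Default behaviour: both tied rows for "A" are kept
with_ties_kept <- data_people %>%
  group_by(Name) %>%
  slice_max(order_by = X._Scoring, n = 1) %>%
  ungroup()

# with_ties = FALSE: exactly one row per Name
one_per_name <- data_people %>%
  group_by(Name) %>%
  slice_max(order_by = X._Scoring, n = 1, with_ties = FALSE) %>%
  ungroup()

print(nrow(with_ties_kept))
print(nrow(one_per_name))
```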
How do I remove rows with duplicate values of columns in a pandas data frame?
Use drop_duplicates with subset set to the list of columns to check for duplicates, and keep='first' to keep the first of each set of duplicates.
If the dataframe is:
import pandas as pd

df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                   'Column2': ["'bat'", "'flower'", "'bat'"],
                   'Column3': ["'xyz'", "'abc'", "'lmn'"]})
print(df)
Result:
  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
2   'cat'     'bat'   'lmn'
Then:
result_df = df.drop_duplicates(subset=['Column1', 'Column2'], keep='first')
print(result_df)
Result:
  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
Delete rows in R data.frame based on duplicate values in one column only
I think you actually want to use a filter() operation for this, in combination with arrange(). For example:
df %>%
  arrange(desc(`Date Taken`)) %>%
  group_by(ID) %>%
  filter(row_number() == 1)
would get you the most recent observation for each ID. Note that row_number() is called without arguments: row_number(`Date Taken`) would rank by date in ascending order and return the oldest observation instead.
If you didn't care about Date Taken making it into the result, you could also reduce the data to one row per ID with distinct() (a summarise() cannot overwrite the grouping column itself):
df %>%
  distinct(ID)
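To check which row such a pipeline keeps, it can be run on a small input. The sketch below uses hypothetical data (the IDs and dates are made up) and calls row_number() with no arguments, so it counts positions in the already sorted order rather than ranking by date:

```r
library(dplyr)

# Hypothetical data: ID 1 was observed twice
df <- data.frame(
  ID = c(1, 1, 2),
  `Date Taken` = as.Date(c("2021-01-05", "2021-03-01", "2021-02-10")),
  check.names = FALSE  # keep the space in the column name
)

latest <- df %>%
  arrange(desc(`Date Taken`)) %>%  # newest rows first
  group_by(ID) %>%
  filter(row_number() == 1) %>%    # first row per ID = most recent
  ungroup()

print(latest)
```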