Removing Rows with Duplicated Values in All Columns of a Data Frame (R)

Remove duplicated rows

Just isolate your data frame to the columns you need, then use the unique function :D

# if, say, you only need the first three columns
deduped.data <- unique( yourdata[ , 1:3 ] )
# the fourth column no longer 'distinguishes' rows,
# so they're duplicates and thrown out.
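As a quick illustration (the data here is invented), with a fourth column that should not distinguish rows:

```r
# hypothetical data: columns a, b, c identify a record,
# column d is an incidental measurement
yourdata <- data.frame(a = c(1, 1, 2),
                       b = c("x", "x", "y"),
                       c = c(TRUE, TRUE, FALSE),
                       d = c(0.1, 0.9, 0.5))

# dropping column d makes rows 1 and 2 identical,
# so unique() collapses them into one
deduped.data <- unique(yourdata[, 1:3])
nrow(deduped.data)  # 2
```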

Remove rows with duplicate values in any other adjacent column

You can use anyDuplicated() on each row.

library(data.table)

setDT(df)
# keep only rows in which no value repeats within the row
df[apply(df, 1, anyDuplicated) == 0]

# V1 V2 V3
#1: 3 2 1
#2: 2 3 1
#3: 3 1 2
#4: 1 3 2
#5: 2 1 3
#6: 1 2 3
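For reference, the same row-wise filter works in base R without data.table; only the subsetting syntax differs. A sketch with invented data:

```r
# permutations of 1:3, plus one row with a repeated value
df <- data.frame(V1 = c(3, 2, 1, 1),
                 V2 = c(2, 3, 1, 3),
                 V3 = c(1, 1, 2, 2))

# anyDuplicated() returns 0 only when nothing in the row repeats
df[apply(df, 1, anyDuplicated) == 0, ]
# row 3 (1, 1, 2) is dropped; the other rows are kept
```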

Remove duplicate values across a few columns but keep rows

Base R way, using apply:

cols <- grep('z_\\d+', names(dat))
dat[cols] <- t(apply(dat[cols], 1, function(x) replace(x, duplicated(x), 0)))

# id z_1 z_2 z_3 z_4 z_5 z_6
#1 1 100 20 0 0 23 0
#2 2 290 0 0 0 0 0
#3 3 38 0 0 0 25 0
#4 4 129 0 0 127 0 0
#5 5 0 0 0 38 0 0
#6 6 290 0 98 78 0 9

A tidyverse way without reshaping can be done using pmap:

library(tidyverse)

dat %>%
  mutate(result = pmap(select(., matches('z_\\d+')), ~ {
    x <- c(...)
    replace(x, duplicated(x), 0)
  })) %>%
  select(id, result) %>%
  unnest_wider(result)

Since tests performed by @thelatemail suggest that reshaping is a better option than handling the data rowwise, you might want to consider it:

dat %>%
  pivot_longer(cols = matches('z_\\d+')) %>%
  group_by(id) %>%
  mutate(value = replace(value, duplicated(value), 0)) %>%
  pivot_wider()

Deleting rows that are duplicated in one column based on the conditions of another column

Let's say you have your data in df:

# sort by Date, then by Depth descending within each Date
df = df[order(df[,'Date'], -df[,'Depth']), ]
# keep only the first (deepest) row for each Date
df = df[!duplicated(df$Date), ]
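A minimal sketch of how those two lines behave, with made-up Date and Depth values:

```r
df <- data.frame(Date  = c("2021-01-01", "2021-01-01", "2021-01-02"),
                 Depth = c(5, 10, 3))

# sort by Date, then by Depth descending within each Date
df <- df[order(df[,'Date'], -df[,'Depth']), ]
# duplicated() flags every repeat of a Date after its first
# occurrence, so only the deepest row per Date survives
df <- df[!duplicated(df$Date), ]
# leaves the Depth-10 row for 2021-01-01 and the lone 2021-01-02 row
```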

Remove duplicates based on conditions in rows in a dataframe

Use slice_max after grouping by 'Name'

library(dplyr)
data_people %>%
  group_by(Name) %>%
  slice_max(n = 1, order_by = X._Scoring) %>%
  ungroup()

Output:

# A tibble: 2 x 4
  Name          Information                    Height X._Scoring
  <chr>         <chr>                           <dbl>      <dbl>
1 John Doe      This is an information           1.88       0.89
2 Margarita Pan This is an information as well   1.47       0.78

Or, if we want to keep the minimum value instead, use slice_min:

data_people %>%
  group_by(Name) %>%
  slice_min(n = 1, order_by = X._Scoring) %>%
  ungroup()
# A tibble: 2 x 4
  Name          Information                    Height X._Scoring
  <chr>         <chr>                           <dbl>      <dbl>
1 John Doe      This is an information          NA         0.56
2 Margarita Pan This is an information as well   1.47       0.78

How do I remove rows with duplicate values of columns in a pandas data frame?

Use drop_duplicates with subset set to the list of columns to check for duplicates, and keep='first' to keep the first of each group of duplicates.

If the data frame is:

import pandas as pd

df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                   'Column2': ["'bat'", "'flower'", "'bat'"],
                   'Column3': ["'xyz'", "'abc'", "'lmn'"]})
print(df)

Result:

  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
2   'cat'     'bat'   'lmn'

Then:

result_df = df.drop_duplicates(subset=['Column1', 'Column2'], keep='first')
print(result_df)

Result:

  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
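The keep parameter also accepts 'last' and False; keep=False drops every member of a duplicate group rather than keeping one. A quick sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                   'Column2': ["'bat'", "'flower'", "'bat'"],
                   'Column3': ["'xyz'", "'abc'", "'lmn'"]})

# keep='last' retains the final occurrence of each duplicate pair
print(df.drop_duplicates(subset=['Column1', 'Column2'], keep='last'))

# keep=False drops all duplicated rows, leaving only the 'toy' row
print(df.drop_duplicates(subset=['Column1', 'Column2'], keep=False))
```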

Delete rows in R data.frame based on duplicate values in one column only

I think you actually want to use a filter() operation for this, in combination with arrange().

For example:

df %>%
  arrange(desc(`Date Taken`)) %>%
  group_by(ID) %>%
  # rows are now sorted most recent first within each ID
  filter(row_number() == 1)

would get you the most recent observation for each ID.

You could also use a summarise():

df %>%
  arrange(desc(`Date Taken`)) %>%
  group_by(ID) %>%
  summarise(ID = first(ID))

This works if you don't care about Date Taken making it into the result.


