How to Remove Duplicated Column Names in R

How do I remove duplicated columns from a data frame in R?

An option is
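A minimal base R sketch of one such option, offered as an assumption since it may not be what the original answer showed: index the columns with a logical vector built from duplicated(). The data frame `df` and its column names are hypothetical.

```r
# Hypothetical data frame with a repeated column name;
# check.names = FALSE stops data.frame() from renaming the second "a"
df <- data.frame(a = 1:3, b = 4:6, a = 1:3, check.names = FALSE)

# Drop columns whose *name* has already appeared (keeps the first "a")
by_name <- df[, !duplicated(names(df)), drop = FALSE]

# Drop columns whose *contents* repeat an earlier column
by_value <- df[, !duplicated(as.list(df)), drop = FALSE]

names(by_name)
names(by_value)
```

The two variants answer slightly different questions: the first looks only at names, the second only at values, so which one fits depends on what "duplicated" means for your data.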




How to remove duplicate column names in R?

Your real dataframe is of class data.table, while your small example is not. You can try:

df[, !duplicated(colnames(df)), with = FALSE]
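As a rough illustration of that call, here is a sketch on a small hypothetical data.table (the table `dt` and its values are made up for the example). With `with = FALSE`, the logical vector in `j` is treated as a column selector rather than as an expression to evaluate.

```r
library(data.table)

# Hypothetical table with the name "a" used twice
dt <- data.table(a = 1:3, b = 4:6, a = 7:9)

# Keep only the first column for each name
dedup <- dt[, !duplicated(colnames(dt)), with = FALSE]
names(dedup)
```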

R remove duplicated columns

     X3    X4    X5    X6     X7
1 step1 step2 step3 step4 step10

remove first occurrence of duplicate column names data.table

Here is a more flexible way:

g <- as.integer(ave(names(dt), names(dt), FUN = length))

# for duplicated column names, keep the 1st occurrence
dt[, g == 1 | (rowid(names(dt)) == 1), with = FALSE]

# keep the 2nd occurrence
dt[, g == 1 | (rowid(names(dt)) == 2), with = FALSE]

# keep the 2nd and 3rd occurrences
dt[, g == 1 | (rowid(names(dt)) %in% c(2, 3)), with = FALSE]

# keep the last occurrence
dt[, g == rowid(names(dt)), with = FALSE]
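To make the selectors easier to follow, here is a small worked sketch on a hypothetical table whose names are a, b, a, a (the table and the helper name `occ` are assumptions for illustration). `g` holds how many times each name occurs, and `rowid()` numbers the occurrences of each name, so `g == occ` is true only at the last occurrence.

```r
library(data.table)

# Hypothetical table: "a" appears three times, "b" once
dt <- data.table(a = 1, b = 2, a = 3, a = 4)

g   <- as.integer(ave(names(dt), names(dt), FUN = length))  # count per name
occ <- rowid(names(dt))                                      # occurrence index per name

# First occurrence of each duplicated name, plus all unique names
first_kept <- dt[, g == 1 | occ == 1, with = FALSE]

# Last occurrence of each name
last_kept <- dt[, g == occ, with = FALSE]

unlist(first_kept)
unlist(last_kept)
```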

How to delete duplicated columns in a tibble in the tidyverse

Building on Ronak's answer: if you want to do this in dplyr, you can pass the same logical vector to select_if.


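library(dplyr)

# data.frame() converts the spaces in these names to dots (check.names = TRUE)
df <- data.frame("x" = runif(3),
                 "SYC SJ Equity...406" = c("a", "a", "b"),
                 "SYC SJ Equity...407" = c("a", "a", "b"),
                 "y" = runif(3))

# strip the "...<number>" suffix, then keep the first column for each base name
df %>%
  select_if(!duplicated(sub("\\.\\.\\..*", "", names(.))))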
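To see what the selector is doing, it may help to run the name-stripping step on its own. The sketch below uses a plain character vector of names as they would look after data.frame() has replaced the spaces with dots; the variable names `nms`, `base_nms` and `keep` are hypothetical.

```r
# Column names as data.frame() would store them (spaces replaced by dots)
nms <- c("x", "SYC.SJ.Equity...406", "SYC.SJ.Equity...407", "y")

# Remove everything from the first run of three literal dots onward
base_nms <- sub("\\.\\.\\..*", "", nms)

# TRUE for the first column of each base name, FALSE for later repeats
keep <- !duplicated(base_nms)

base_nms
keep
```

The resulting logical vector is what select_if receives, so the second "SYC.SJ.Equity" column is the one that gets dropped.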

How to remove duplicated (by name) column in data.tables in R?

How about

dt[, .SD, .SDcols = unique(names(dt))]

This selects the first occurrence of each name (I'm not sure how you want to handle this).

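As @DavidArenburg suggests in the comments above, you could also use check.names = TRUE in data.table() or fread(), which renames duplicated names so that every name is unique rather than dropping any columns.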
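A minimal sketch of the check.names option, assuming data.table is installed and using a hypothetical two-column table:

```r
library(data.table)

# Default (check.names = FALSE): the duplicated name is kept as-is
dup <- data.table(a = 1:2, a = 3:4)

# check.names = TRUE: names are adjusted so that no two columns share one
uniq <- data.table(a = 1:2, a = 3:4, check.names = TRUE)

names(dup)
names(uniq)
```

Both columns survive in `uniq`, so this is a fit when you want to keep the data and only need distinct names.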

