How to Remove Duplicated Column Names in R

How do I remove duplicated columns from a data frame in R?

One option, which drops columns whose contents duplicate an earlier column (whatever their names), is

df[!duplicated(as.list(df))]

Or

df[!duplicated(unclass(df))]
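Both forms compare column contents, not column names. A minimal sketch on a made-up data frame (the names a, b and c are illustrative only, not from the question):

```r
# Toy data frame: columns a and c hold identical values under different names.
df <- data.frame(a = 1:3, b = c("x", "y", "z"), c = 1:3)

# duplicated() on the list of columns flags any column whose contents
# match an earlier column.
dup_content <- duplicated(as.list(df))
dup_content
# FALSE FALSE TRUE

deduped <- df[!dup_content]
names(deduped)
# "a", "b"
```

If two columns share a name but hold different values, this approach keeps both, so it does not remove duplicated names.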

How to remove duplicate column names in R?

Your real data frame is of class data.table, while your small example is not. For a data.table, you can try:

df[, !duplicated(colnames(df)), with = FALSE]
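As a rough illustration, here is the same idea on a small made-up data.table. The column names id and value are hypothetical, and the sketch assumes the data.table package is installed:

```r
library(data.table)

# data.table() does not check names by default, so duplicates are allowed.
dt <- data.table(id = 1:2, value = c(10, 20), id = 3:4)

# TRUE for the first column carrying each name, FALSE for later repeats.
keep <- !duplicated(names(dt))

# with = FALSE makes j a column selector instead of an expression
# evaluated inside the table.
deduped <- dt[, keep, with = FALSE]
names(deduped)
# "id", "value"
```

The first id column (values 1 and 2) is the one that survives.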

R remove duplicated columns

> df[!duplicated(as.list(df))]
     X3    X4    X5    X6     X7
1 step1 step2 step3 step4 step10

Remove first occurrence of duplicate column names in data.table

Here is a more flexible way:

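# g[i] is the number of columns sharing the name of column i
# (rowid(), used below, comes from data.table)
g <- as.integer(ave(names(dt), names(dt), FUN = length))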

# for duplicated column names, keep the 1st occurrence
dt[, g == 1 | (rowid(names(dt)) == 1), with = FALSE]

# keep the 2nd occurrence
dt[, g == 1 | (rowid(names(dt)) == 2), with = FALSE]

# keep the 2nd and 3rd occurrences
dt[, g == 1 | (rowid(names(dt)) %in% c(2, 3)), with = FALSE]

# keep the last occurrence
dt[, g == rowid(names(dt)), with = FALSE]
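To see why these masks work, it may help to compute them on a plain vector of names. The sketch below uses a made-up name vector and, so that it runs without data.table, swaps rowid() for a base R stand-in built with ave() and seq_along():

```r
# Hypothetical column names: "a" occurs three times, "b" once.
nm <- c("a", "b", "a", "a")

# g: how many columns share each name (same expression as above, applied to nm).
g <- as.integer(ave(nm, nm, FUN = length))
g
# 3 1 3 3

# occ: running occurrence index within each name; a base R stand-in
# for data.table::rowid(nm).
occ <- ave(seq_along(nm), nm, FUN = seq_along)
occ
# 1 1 2 3

keep_first  <- g == 1 | occ == 1   # TRUE  TRUE  FALSE FALSE
keep_second <- g == 1 | occ == 2   # FALSE TRUE  TRUE  FALSE
keep_last   <- g == occ            # FALSE TRUE  FALSE TRUE
```

The g == 1 term keeps columns whose name is unique; the second term picks the wanted occurrence among the repeated names. For the last occurrence, the count and the occurrence index coincide, so a single comparison is enough.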

How to delete duplicated columns in a tibble in the tidyverse

Building on Ronak's answer: to do this in dplyr, pass the same logical vector to select_if. The regular expression strips everything from the first "..." onward, so columns are compared by their base name.

library(dplyr)

df <- data.frame("x" = runif(3),
                 "SYC SJ Equity...406" = c("a", "a", "b"),
                 "SYC SJ Equity...407" = c("a", "a", "b"),
                 "y" = runif(3))

df %>%
  select_if(!duplicated(sub("\\.\\.\\..*", "", names(.))))
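The name-stripping step can also be checked on its own in base R. This is only a sketch of the same idea without dplyr; it uses fixed integers instead of runif() and sets check.names = FALSE (an addition not in the answer above) so the names are kept verbatim:

```r
df <- data.frame("x" = 1:3,
                 "SYC SJ Equity...406" = c("a", "a", "b"),
                 "SYC SJ Equity...407" = c("a", "a", "b"),
                 "y" = 4:6,
                 check.names = FALSE)

# Drop everything from the first literal "..." to the end of each name.
base_names <- sub("\\.\\.\\..*", "", names(df))
base_names
# "x", "SYC SJ Equity", "SYC SJ Equity", "y"

# Keep only the first column for each base name.
deduped <- df[!duplicated(base_names)]
names(deduped)
# "x", "SYC SJ Equity...406", "y"
```

The surviving column keeps its original suffixed name; only the comparison uses the stripped name.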

How to remove duplicated (by name) column in data.tables in R?

How about

dt[, .SD, .SDcols = unique(names(dt))]

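This keeps the first occurrence of each name (I'm not sure which occurrence you want to keep).

As @DavidArenburg suggests in the comments above, you could also use check.names=TRUE in data.table() or fread(). That makes the names unique when the table is created instead of dropping any columns.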
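As a rough guide to what check.names=TRUE does to repeated names, the sketch below applies base R's make.names() to a made-up name vector. It assumes the option follows make.names(unique = TRUE), as the data.table documentation describes:

```r
# Hypothetical header with a repeated name.
nm <- c("id", "value", "id")

# Repeated names get a numeric suffix instead of being removed.
unique_names <- make.names(nm, unique = TRUE)
unique_names
# "id", "value", "id.1"
```

All columns are retained under distinct names, so this suits cases where the repeated columns hold different data that you still need.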


