Rename Columns in Multiple Dataframes, R

Changing Column names of multiple data frames using a for loop with data frames loaded into a List

L <- lapply(L, function(x) {
  colnames(x) <- c("NewName1", "NewName2")
  x
})
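Since the question asks about a for loop, here is a minimal sketch of the same rename done with an explicit loop over the list positions. The sample data frames are made up for illustration; only the new names come from the answer above.

```r
# Hypothetical sample data: two data frames with two columns each
L <- list(
  data.frame(a = 1:2, b = 3:4),
  data.frame(c = 5:6, d = 7:8)
)

# Loop over list positions and overwrite the column names in place
for (i in seq_along(L)) {
  colnames(L[[i]]) <- c("NewName1", "NewName2")
}

# Every data frame in L now carries the new names
sapply(L, colnames)
```

Either version assumes every data frame has exactly two columns; a data frame with a different column count would make the assignment fail.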

Rename specific columns across multiple data sets

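Put the data frames in a named list, rename the column inside lapply, and write the results back to the global environment with list2env:

dat <- lapply(list(df2 = df2, df3 = df3), function(x) {
  names(x)[names(x) == "help"] <- "var2"
  x
})
list2env(dat, .GlobalEnv)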
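To see what this does without touching your workspace, here is a small sketch with invented data. It writes the renamed copies into a fresh environment (my choice for the demo, not part of the answer) so the originals can be compared afterwards.

```r
# Hypothetical sample data; only the column called "help" should change
df2 <- data.frame(id = 1:3, help = c("a", "b", "c"))
df3 <- data.frame(help = c(TRUE, FALSE), other = c(1.5, 2.5))

dat <- lapply(list(df2 = df2, df3 = df3), function(x) {
  names(x)[names(x) == "help"] <- "var2"
  x
})

# Writing into a fresh environment instead of .GlobalEnv leaves df2 and df3 untouched
renamed <- list2env(dat, envir = new.env())

names(renamed$df2)  # renamed copy
names(renamed$df3)  # renamed copy
names(df2)          # original, still has "help"
```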

How to change the names of columns in multiple dataframes in a list based on conditions?

If all the 6 columns that you want are present in all the dataframes, the lookup approach sounds good to me.

Here is an example to do it with 4 columns.

Creating a lookup table and some fake data

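# stringsAsFactors = FALSE keeps the names as character vectors in R versions before 4.0
lookup_table <- data.frame(orignal_col = c('col1', 'col2', 'col3', 'col4'),
                           new_col = c("tribal_name", "st_usps_cd", "scc", "description"),
                           stringsAsFactors = FALSE)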

df1 <- data.frame(a = 1:3, col1 = 1:3, col2 = 3:5, col3 = 4:6, col4 = 2:4)
df2 <- data.frame(a = 1:3, col1 = 1:3, b = 1:3,col2 = 3:5, col3 = 4:6, col4 = 2:4)
all_input <- list(df1, df2)

all_input
#[[1]]
#  a col1 col2 col3 col4
#1 1    1    3    4    2
#2 2    2    4    5    3
#3 3    3    5    6    4

#[[2]]
#  a col1 b col2 col3 col4
#1 1    1 1    3    4    2
#2 2    2 2    4    5    3
#3 3    3 3    5    6    4

We can use lapply on the list and use match to replace column names.

lapply(all_input, function(x) {
  names(x)[match(lookup_table$orignal_col, names(x))] <- lookup_table$new_col
  x
})

#[[1]]
#  a tribal_name st_usps_cd scc description
#1 1           1          3   4           2
#2 2           2          4   5           3
#3 3           3          5   6           4

#[[2]]
#  a tribal_name b st_usps_cd scc description
#1 1           1 1          3   4           2
#2 2           2 2          4   5           3
#3 3           3 3          5   6           4

Notice that both data frames have additional columns besides the four common ones. Only those four are renamed; the rest remain unchanged.
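The approach above relies on every lookup column being present. If one is missing, match returns NA for it and, as far as I can tell, the subscripted assignment then fails. Here is a sketch of a tolerant variant; the helper name rename_from_lookup and the data frame df3 are made up for illustration.

```r
lookup_table <- data.frame(orignal_col = c("col1", "col2", "col3", "col4"),
                           new_col = c("tribal_name", "st_usps_cd", "scc", "description"),
                           stringsAsFactors = FALSE)

# Hypothetical input: df3 has no col3 or col4
df3 <- data.frame(a = 1:3, col1 = 1:3, col2 = 3:5)

# Hypothetical helper: rename only the lookup columns that actually exist in x
rename_from_lookup <- function(x, lookup) {
  idx <- match(lookup$orignal_col, names(x))  # NA where a column is absent
  found <- !is.na(idx)
  names(x)[idx[found]] <- lookup$new_col[found]
  x
}

names(rename_from_lookup(df3, lookup_table))
```

You would then call it as lapply(all_input, rename_from_lookup, lookup = lookup_table).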

Renaming multiple columns in different data frames combined into one object

The lapply call reads each of your Excel files into a data frame and collects them in a list called dfs. The col_names argument is where you provide the names of your columns. skip = 1 ignores the first row, which has the wrong header names (delete this if you want to include the first row of your Excel files).

dplyr::bind_rows will stack the list of dataframes into one tibble object.

sprintf("%s.xlsx", Sys.Date()) creates the file name using today's date. You can modify the output format using the format function (e.g. format(Sys.Date(), "%m-%d-%Y")). Then xlsx::write.xlsx outputs the dataframe. Note: it must be a dataframe, not a tibble object, which is why I used as.data.frame.

library(dplyr)
library(xlsx)
library(readxl)

# myfiles: character vector of Excel file paths (assumed to exist already)
# cols: character vector of the column names you want, passed to col_names
dfs <- lapply(myfiles, readxl::read_excel, col_names = cols, skip = 1)
df <- dplyr::bind_rows(dfs)

xlsx::write.xlsx(as.data.frame(df), sprintf("%s.xlsx", Sys.Date()))
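If you want to try the rename-then-stack idea without any Excel files or extra packages, here is a rough base R sketch. The two data frames and the names in cols are invented stand-ins, and rbind takes the place of dplyr::bind_rows, so treat it as an illustration of the idea rather than a drop-in replacement.

```r
# Hypothetical stand-ins for the data frames read from the Excel files
dfs <- list(
  data.frame(X1 = 1:2, X2 = c("a", "b")),
  data.frame(V1 = 3:4, V2 = c("c", "d"))
)
cols <- c("id", "label")  # assumed target column names

# Give every data frame the same names, then stack them
dfs <- lapply(dfs, setNames, cols)
df <- do.call(rbind, dfs)

# Date-stamped file name; a fixed date is used here so the result is reproducible
file_name <- sprintf("%s.xlsx", format(as.Date("2024-03-15"), "%m-%d-%Y"))
file_name
```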

R - renaming multiple columns in multiple dataframes, using nested loop

The problem is that paste() returns a string, so your code is effectively doing things like:

names("Student1")[names("Student1")==oldnames[i]] = newnames[i]

but, of course, the string "Student1" isn't the same as the variable Student1 that contains your data frame, so this doesn't get you very far. The error message is a little confusing but ultimately means that you're trying to assign to something that can't be assigned to.

The simplest solution is to make use of the functions get() and assign() which take a string naming a variable (like the string "Student1") and allow you to retrieve and assign the variable. For example, this will rename one of the columns of Student1:

dfname = "Student1"
df = get(dfname)
names(df)[names(df)=="Name.1"] = "Name"
assign(dfname, df)

So, you can write:

for (j in 1:29) {
  oldnames = c(paste('Name', j, sep="."),
               paste('Nationality', j, sep="."),
               paste('Membership.number', j, sep="."))
  newnames = c("Name", "Nationality", "Membership.number")
  dfname = paste("Student", j, sep="")
  df = get(dfname)
  for (i in 1:3) {
    names(df)[names(df) == oldnames[i]] = newnames[i]
  }
  assign(dfname, df)
}

Note that I fixed the oldnames definition to use j instead of i and moved the definitions that depended only on j out of the inner loop. One caveat here is that this only works at "top level" (i.e., entered at the R prompt). If you put it in a function, then assign() gets trickier because you need to specify where you want the variable assigned (at the top level with the rest of the global variables, within the function, etc.).
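To illustrate the caveat about functions, here is a sketch of one way to do the get/assign pair inside a function by passing the target environment explicitly. The function name rename_in_place and the sample data are hypothetical; envir is a real argument of both get() and assign().

```r
# Hypothetical data frame defined where the function will be called from
Student1 <- data.frame(Name.1 = "Ann", Nationality.1 = "NZ")

# Hypothetical helper: env defaults to the caller's environment
rename_in_place <- function(dfname, env = parent.frame()) {
  df <- get(dfname, envir = env)
  names(df)[names(df) == "Name.1"] <- "Name"
  # Without envir, assign() would only create a local copy inside the function
  assign(dfname, df, envir = env)
  invisible(NULL)
}

rename_in_place("Student1")
names(Student1)
```

Passing env = globalenv() instead would always target the global variables, whatever the call site.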

This code can still be improved. It turns out that your definition of oldnames can be rewritten as:

oldnames = paste(c("Name","Nationality","Membership.number"), j, sep=".")

which means that you can actually write:

newnames = c("Name","Nationality","Membership.number")
oldnames = paste(newnames, j, sep=".")

You can go one step further and use the function match. This function gets the index of each of the elements of its first argument within its second argument and can be used to retrieve the positions of all the oldnames in the names() vector simultaneously. Then, you don't even need the inner loop:

for (j in 1:29) {
  newnames = c("Name","Nationality","Membership.number")
  oldnames = paste(newnames, j, sep=".")
  dfname = paste("Student", j, sep="")
  df = get(dfname)
  names(df)[match(oldnames, names(df))] = newnames
  assign(dfname, df)
}

This sort of use of match to find and replace values in a vector is a very common R technique.
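Here is the match step on its own, using a plain character vector in place of names(df). The vector nms is made up for the demo.

```r
# Hypothetical column names, standing in for names(df) with j = 3
nms <- c("ID", "Name.3", "Nationality.3", "Membership.number.3")
newnames <- c("Name", "Nationality", "Membership.number")
oldnames <- paste(newnames, 3, sep = ".")

# Position of each old name within nms
idx <- match(oldnames, nms)
idx

# Replace all three names in one assignment; "ID" is left alone
nms[idx] <- newnames
nms
```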

Finally, if there aren't any other columns in the data frames (so you really just want to remove all suffixes that consist of a period and some digits from the end of all names), then a common trick in R is to use sub() to modify the names using regular expressions:

for (j in 1:29) {
  dfname = paste("Student", j, sep="")
  df = get(dfname)
  names(df) = sub("\\.[0-9]+$", "", names(df))
  assign(dfname, df)
}

Note that, in an R string literal, each backslash in a regular expression needs to be doubled up, so the "\\." above is the regex \. and matches a literal period. I use this sub-based technique all the time when cleaning up datasets that have unwanted prefixes and suffixes on a bunch of column names.
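A quick way to check what the pattern does and does not touch is to run it on a few made-up names:

```r
# Hypothetical names: three with a numeric suffix, two edge cases
nms <- c("Name.12", "Nationality.12", "Membership.number.12", "Score.1.2", "v1.0.beta")

# Strip one trailing ".digits" group; names without such a suffix stay as they are
cleaned <- sub("\\.[0-9]+$", "", nms)
cleaned
```

Only the last period-plus-digits group is removed, so "Score.1.2" becomes "Score.1", and "v1.0.beta" is unchanged because it does not end in digits.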

Happy R-ing!

Rename variables for multiple dataframe, using a loop, referencing the dataframe names from a list

We can get the objects named in df_list into a list with mget, loop over the list with lapply, and set every column name to 'VALUE' (not recommended at all, as data.frame column names should be unique):

lst1 <- lapply(mget(df_list), function(x) setNames(x, rep("VALUE", ncol(x))))
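If duplicate names turn out to be a problem, one possible variation (my suggestion, not part of the answer above) is to pass the repeated names through make.unique. The data frames below are invented for the demo.

```r
# Hypothetical data frames and the character vector naming them
dfA <- data.frame(x = 1:2, y = 3:4)
dfB <- data.frame(p = 5:6)
df_list <- c("dfA", "dfB")

# make.unique() appends .1, .2, ... to repeated names
lst1 <- lapply(mget(df_list), function(x) {
  setNames(x, make.unique(rep("VALUE", ncol(x))))
})

lapply(lst1, names)
```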

R - Replace pattern in colnames in multiple dataframes

Try using this:

result <- lapply(mget(import_names_vector), function(x)
  setNames(x, gsub("_01", "1", colnames(x))))

Now, to get the renamed columns back into your individual dataframes, use list2env.

list2env(result, .GlobalEnv)

Similar to your attempt, you can do:

for (i in import_names_vector) {
  assign(i, setNames(get(i), gsub("_01", "1", colnames(get(i)))))
}
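One thing to watch: gsub("_01", "1", ...) replaces the pattern wherever it occurs in a name. If you only want it replaced at the end of a name, anchoring the pattern may be safer. A small comparison on made-up column names:

```r
# Hypothetical column names
cols <- c("temp_01", "temp_010", "site_01_raw")

# Unanchored: every occurrence of "_01" is replaced
unanchored <- gsub("_01", "1", cols)
unanchored

# Anchored with $: only a trailing "_01" is replaced
anchored <- sub("_01$", "1", cols)
anchored
```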

Rename columns in multiple data sets R

We can name the output list and then use list2env to overwrite the objects that are present in the global env (not recommended, though):

names(l) <- c('A', 'B')
list2env(l, .GlobalEnv)

Check the objects:

A
#  var_1 var_2
#1     1     1
#2     2     2
#3     3     3
#4     4     4

B
#  var_3 var_4
#1     1     1
#2     3     2
#3     4     3
#4     7     4

Data

A <- data.frame(`Var 1` = c(1,2,3,4), `Var 2` = c(1,2,3,4), check.names = FALSE)
B <- data.frame(`Var 3` = c(1,3,4,7), `Var 4` = c(1,2,3,4), check.names = FALSE)
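The step that builds the list l is not shown above. Judging only from the input and the printed output, one way it could have been created is sketched here; the renaming rule (lower-case the names, replace spaces with underscores) is my assumption, not necessarily what the original answer used.

```r
A <- data.frame(`Var 1` = c(1, 2, 3, 4), `Var 2` = c(1, 2, 3, 4), check.names = FALSE)
B <- data.frame(`Var 3` = c(1, 3, 4, 7), `Var 4` = c(1, 2, 3, 4), check.names = FALSE)

# Assumed renaming rule: lower-case the names and replace spaces with underscores
l <- lapply(list(A, B), function(x) {
  setNames(x, gsub(" ", "_", tolower(names(x))))
})
names(l) <- c("A", "B")

lapply(l, names)
```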

