Changing Column names of multiple data frames using a for loop with data frames loaded into a List
L <- lapply(L, function(x) {
  colnames(x) <- c("NewName1", "NewName2")
  x
})
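As a minimal sketch of this pattern, assuming two small made-up data frames (df_a and df_b are hypothetical names, not from the question) that each have exactly two columns:

```r
# Hypothetical example data: two data frames with two columns each
df_a <- data.frame(x = 1:2, y = 3:4)
df_b <- data.frame(p = 5:6, q = 7:8)
L <- list(df_a = df_a, df_b = df_b)

# Rename the columns of every data frame in the list
L <- lapply(L, function(x) {
  colnames(x) <- c("NewName1", "NewName2")
  x  # return the modified copy
})

# Inspect the resulting column names
sapply(L, colnames)
```

The function works on a copy, so df_a and df_b themselves keep their old names; only the elements of L are renamed. The replacement vector also has to match the number of columns in each data frame.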
Rename specific columns across multiple data sets
This is what you are looking for:
dat <- lapply(list(df2 = df2, df3 = df3), function(x) {
  names(x)[names(x) == "help"] <- "var2"
  x
})
list2env(dat,.GlobalEnv)
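To see the effect end to end, here is a small sketch with invented data (the columns id and other are assumptions for illustration). It writes the results back with list2env into the current environment, which is the global environment when run at the prompt:

```r
# Hypothetical data frames; only df2 has a column called "help"
df2 <- data.frame(id = 1:2, help = c("a", "b"))
df3 <- data.frame(id = 3:4, other = c("c", "d"))

dat <- lapply(list(df2 = df2, df3 = df3), function(x) {
  names(x)[names(x) == "help"] <- "var2"  # does nothing when "help" is absent
  x
})

# Overwrite df2 and df3 with the renamed copies; the list names decide
# which variables are written
list2env(dat, envir = environment())

names(df2)
names(df3)
```

The list has to be named (df2 = df2, df3 = df3) for list2env to know which objects to replace.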
How to change the names of columns in multiple dataframes in a list based on conditions?
If all the 6 columns that you want are present in all the dataframes, the lookup approach sounds good to me.
Here is an example to do it with 4 columns.
Creating a lookup table and some fake data
lookup_table <- data.frame(orignal_col = c('col1', 'col2', 'col3', 'col4'),
new_col = c( "tribal_name", "st_usps_cd", "scc", "description"))
df1 <- data.frame(a = 1:3, col1 = 1:3, col2 = 3:5, col3 = 4:6, col4 = 2:4)
df2 <- data.frame(a = 1:3, col1 = 1:3, b = 1:3,col2 = 3:5, col3 = 4:6, col4 = 2:4)
all_input <- list(df1, df2)
all_input
#[[1]]
# a col1 col2 col3 col4
#1 1 1 3 4 2
#2 2 2 4 5 3
#3 3 3 5 6 4
#[[2]]
# a col1 b col2 col3 col4
#1 1 1 1 3 4 2
#2 2 2 2 4 5 3
#3 3 3 3 5 6 4
We can use lapply on the list and use match to replace column names.
lapply(all_input, function(x) {
names(x)[match(lookup_table$orignal_col, names(x))] <- lookup_table$new_col
x
})
#[[1]]
# a tribal_name st_usps_cd scc description
#1 1 1 3 4 2
#2 2 2 4 5 3
#3 3 3 5 6 4
#[[2]]
# a tribal_name b st_usps_cd scc description
#1 1 1 1 3 4 2
#2 2 2 2 4 5 3
#3 3 3 3 5 6 4
Notice that both data frames have additional columns beyond the 4 common ones, but only those 4 columns are renamed and the rest remain unchanged.
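The approach above relies on every lookup column being present in every data frame; if one is missing, match returns NA for it and the assignment cannot use that index. A possible variant, sketched here under that assumption with hypothetical names (lookup, df_partial, rename_present), matches in the other direction and renames only the columns that are found:

```r
# Hypothetical lookup table and a data frame that lacks col2
lookup <- data.frame(old = c("col1", "col2", "col3"),
                     new = c("tribal_name", "st_usps_cd", "scc"),
                     stringsAsFactors = FALSE)
df_partial <- data.frame(a = 1:2, col1 = 3:4, col3 = 5:6)

rename_present <- function(x, lookup) {
  idx <- match(names(x), lookup$old)     # NA for columns not in the lookup
  hit <- !is.na(idx)                     # columns that do have a new name
  names(x)[hit] <- lookup$new[idx[hit]]
  x
}

names(rename_present(df_partial, lookup))
```

Columns without an entry in the lookup table, such as a here, keep their names.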
Renaming multiple columns in different data frames combined into one object
The lapply function will read all of your Excel files into a list of data frames called dfs. The col_names argument is where you provide the names of your columns. skip = 1 ignores the first row, which has the wrong header names (delete this if you want to include the first row of your Excel files). dplyr::bind_rows will stack the list of data frames into one tibble object. sprintf("%s.xlsx", Sys.Date()) creates the file name using today's date. You can modify the output format using the format function (e.g. format(Sys.Date(), "%m-%d-%Y")). Then xlsx::write.xlsx outputs the data frame. Note: it must be a data frame, not a tibble object, which is why I used as.data.frame.
library(dplyr)
library(xlsx)
library(readxl)
# myfiles: character vector of Excel file paths
# cols: character vector of the column names to pass as col_names
dfs <- lapply(myfiles, readxl::read_excel, col_names = cols, skip = 1)
df <- dplyr::bind_rows(dfs)
xlsx::write.xlsx(as.data.frame(df), sprintf("%s.xlsx", Sys.Date()))
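The file-name step can be tried on its own. This sketch uses a fixed, made-up date instead of Sys.Date() so the result is reproducible:

```r
# Hypothetical fixed date standing in for Sys.Date()
today <- as.Date("2024-03-15")

# Default ISO format: year-month-day
default_name <- sprintf("%s.xlsx", today)

# Custom format: month-day-year
custom_name <- sprintf("%s.xlsx", format(today, "%m-%d-%Y"))

default_name
custom_name
```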
R - renaming multiple columns in multiple dataframes, using nested loop
The problem is that paste() returns a string, so your code is effectively doing things like:
names("Student1")[names("Student1")==oldnames[i]] = newnames[i]
but, of course, the string "Student1" isn't the same as the variable Student1 that contains your data frame, so this doesn't get you very far. The error message is a little confusing but ultimately means that you're trying to assign to something that can't be assigned to.
The simplest solution is to make use of the functions get() and assign(), which take a string naming a variable (like the string "Student1") and allow you to retrieve and assign the variable. For example, this will rename one of the columns of Student1:
dfname = "Student1"
df = get(dfname)
names(df)[names(df)=="Name.1"] = "Name"
assign(dfname, df)
So, you can write:
for (j in 1:29) {
  oldnames = c(paste('Name', j, sep="."),
               paste('Nationality', j, sep="."),
               paste('Membership.number', j, sep="."))
  newnames = c("Name", "Nationality", "Membership.number")
  dfname = paste("Student", j, sep="")
  df = get(dfname)
  for (i in 1:3) {
    names(df)[names(df) == oldnames[i]] = newnames[i]
  }
  assign(dfname, df)
}
Note that I fixed the oldnames definition to use j instead of i and moved the definitions that depended only on j out of the inner loop. One caveat here is that this only works at "top level" (i.e., entered at the R prompt). If you put it in a function, then assign() gets trickier because you need to specify where you want the variable assigned (at the top level with the rest of the global variables, within the function, etc.).
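One way to handle the function case, sketched here as an assumption rather than part of the original answer, is to pass the target environment explicitly through the envir argument of get() and assign(). The helper rename_in_env and the sample data are hypothetical:

```r
# Hypothetical data frame standing in for one of the Student data frames
Student1 <- data.frame(Name.1 = c("Ann", "Bo"),
                       Nationality.1 = c("NZ", "SE"))

# Rename one column of the data frame called `dfname`, reading from and
# writing to `env` (by default, the environment of the caller)
rename_in_env <- function(dfname, env = parent.frame()) {
  df <- get(dfname, envir = env)
  names(df)[names(df) == "Name.1"] <- "Name"
  assign(dfname, df, envir = env)
  invisible(NULL)
}

rename_in_env("Student1")
names(Student1)
```

Without envir, assign() would create a new Student1 inside the function and the caller's data frame would stay unchanged.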
This code can still be improved. It turns out that your definition of oldnames can be rewritten as:
oldnames = paste(c("Name","Nationality","Membership.number"), j, sep=".")
which means that you can actually write:
newnames = c("Name","Nationality","Membership.number")
oldnames = paste(newnames, j, sep=".")
You can go one step further and use the function match. This function gets the index of each of the elements of its first argument within its second argument and can be used to retrieve the positions of all the oldnames in the names() vector simultaneously. Then, you don't even need the inner loop:
for (j in 1:29) {
  newnames = c("Name","Nationality","Membership.number")
  oldnames = paste(newnames, j, sep=".")
  dfname = paste("Student", j, sep="")
  df = get(dfname)
  names(df)[match(oldnames, names(df))] = newnames
  assign(dfname, df)
}
This sort of use of match to find and replace values in a vector is a very common R technique.
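Stripped of the data frames, the match-based replacement looks like this on a plain character vector (the values are made up for illustration):

```r
# Hypothetical vector of column names
current <- c("id", "Name.3", "age", "Nationality.3")

old <- c("Nationality.3", "Name.3")
new <- c("Nationality", "Name")

# Position of each element of `old` within `current`
pos <- match(old, current)

# Replace all matched positions in one step
current[pos] <- new
current
```

The order of old does not have to follow the order of the names; match pairs each old name with its position, so the corresponding new name lands in the right place.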
Finally, if there aren't any other columns in the data frames (so you really just want to remove all suffixes that consist of a period and some digits from the end of all names), then a common trick in R is to use sub() to modify the names using regular expressions:
for (j in 1:29) {
  dfname = paste("Student", j, sep="")
  df = get(dfname)
  # strip a trailing period-plus-digits suffix from every column name
  names(df) = sub("\\.[0-9]+$", "", names(df))
  assign(dfname, df)
}
Note that, in R string literals, backslashes need to be doubled up, so the above "\\." will match a literal period. I use this sub-based technique all the time when cleaning up datasets that have unwanted prefixes and suffixes on a bunch of column names.
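A quick way to check such a pattern is to run it on a character vector first. The names below are invented for illustration:

```r
# Hypothetical column names
nms <- c("Name.12", "Membership.number.12", "raw_score", "v1.0.3")

# Remove a trailing period followed by digits (suffix)
no_suffix <- sub("\\.[0-9]+$", "", nms)

# Remove a leading "raw_" (prefix)
no_prefix <- sub("^raw_", "", nms)

no_suffix
no_prefix
```

Because the suffix pattern is anchored with $, only the final ".digits" group is removed, so "Membership.number.12" keeps its inner period and "v1.0.3" becomes "v1.0".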
Happy R-ing!
Rename variables for multiple dataframe, using a loop, referencing the dataframe names from a list
We can get the values of the object names with mget into a list, loop over the list with lapply, and set the column names to replicated 'VALUE' (not recommended at all, as data.frame column names should be unique):
lst1 <- lapply(mget(df_list), function(x) setNames(x, rep("VALUE", ncol(x))))
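If unique names are preferred, one option (an assumption on my part, not something the question asked for) is to wrap the repeated name in make.unique. The data frames df_x and df_y are hypothetical:

```r
# Hypothetical data frames and the vector holding their names
df_x <- data.frame(a = 1:2, b = 3:4)
df_y <- data.frame(c = 5:6)
df_list <- c("df_x", "df_y")

# Same approach as above, but make.unique appends .1, .2, ... to repeats
lst1 <- lapply(mget(df_list), function(x) {
  setNames(x, make.unique(rep("VALUE", ncol(x))))
})

lapply(lst1, names)
```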
R - Replace pattern in colnames in multiple dataframes
Try using this:
result <- lapply(mget(import_names_vector), function(x)
setNames(x, gsub("_01", "1", colnames(x))))
Now, to get these changes into your individual data frames, use list2env.
list2env(result, .GlobalEnv)
Similar to your attempt, you can do:
for (i in import_names_vector) {
assign(i, setNames(get(i), gsub("_01", "1", colnames(get(i)))))
}
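One thing to check before running this on real data: gsub("_01", "1", ...) replaces the pattern wherever it occurs in a name, not only at the end. The sketch below uses made-up column names to contrast that with an anchored sub, in case only a trailing "_01" should change:

```r
# Hypothetical column names
cols <- c("height_01", "weight_01", "code_012", "id")

# Unanchored: also rewrites the "_01" inside "code_012"
anywhere <- gsub("_01", "1", cols)

# Anchored with $: only a trailing "_01" is replaced
at_end <- sub("_01$", "1", cols)

anywhere
at_end
```

Which behaviour is right depends on the actual column names, so this is only a suggestion to verify the pattern first.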
Rename columns in multiple data sets R
We can name the output list and then use list2env to make changes in the objects that are present in the global env (not recommended though)
names(l) <- c('A', 'B')
list2env(l, .GlobalEnv)
Check the objects:
A
# var_1 var_2
#1 1 1
#2 2 2
#3 3 3
#4 4 4
B
# var_3 var_4
#1 1 1
#2 3 2
#3 4 3
#4 7 4
data
A <- data.frame(`Var 1` = c(1,2,3,4), `Var 2` = c(1,2,3,4), check.names = FALSE)
B <- data.frame(`Var 3` = c(1,3,4,7), `Var 4` = c(1,2,3,4), check.names = FALSE)
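The step that builds the list l is not shown above. One way to produce names like var_1 from `Var 1`, offered as a guess at the missing step rather than the original answer's code, is to lower-case the names and replace spaces with underscores:

```r
# Data as defined above
A <- data.frame(`Var 1` = c(1,2,3,4), `Var 2` = c(1,2,3,4), check.names = FALSE)
B <- data.frame(`Var 3` = c(1,3,4,7), `Var 4` = c(1,2,3,4), check.names = FALSE)

# Assumed renaming rule: lower case, spaces replaced by underscores
l <- lapply(list(A, B), function(x) {
  setNames(x, tolower(gsub(" ", "_", names(x))))
})
names(l) <- c('A', 'B')

lapply(l, names)
```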