Loop Through Data Frame and Variable Names

Loop through data frame and variable names

To further add to Beasterfield's answer, it seems like you want to do some number of complex operations on each of the data frames.

You can use an arbitrarily complex function inside an lapply call. So where you now have:

for (i in dflist) {
  # Do some complex things
}

This can be translated to:

lapply(dflist, function(df) {
  # Do some complex operations on each data frame, df
  # More steps

  # Make sure the last expression is NULL. The value of the last expression in the
  # function is returned to lapply, which collects these values into a list, one
  # element per data frame. You don't actually care about that list; you just want
  # to run the function for its side effects.
  NULL
})

A more concrete example using plot:

# Assuming each data frame has columns x and y holding the points to plot
lapply(dflist, function(df) {
  x2 <- df$x^2
  log_y <- log(df$y)
  plot(x2, log_y) # plot the transformed values computed above
  NULL
})

You can also write complex functions which take multiple arguments:

lapply(dflist, function(df, arg1, arg2) {
  # Do something on each data.frame, df
  # arg1 == 1, arg2 == 2 (see next line)
}, 1, 2) # extra arguments are passed in here

Hope this helps you out!

Loop through dataframe column names - R

To answer the exact question and fix the code given, see the example below:

df <- iris # data

for (i in colnames(df)) {
  print(class(df[[i]]))
}
# [1] "numeric"
# [1] "numeric"
# [1] "numeric"
# [1] "numeric"
# [1] "factor"

  1. You need to use colnames() to get the column names of df.
  2. To get the class of a column, access it with df[[i]]. df[i] returns a one-column data frame, so its class is always data.frame.

Make a dataframe column with names of variables in the for loop

You can use the built-in locals() function, which returns a dict of the current local variables keyed by name:

import pandas as pd

abc = [1, 2, 3, 4]
def1 = [5, 6, 7, 8] # renamed from 'def' to 'def1' since 'def' is a Python keyword

df = pd.DataFrame()

for i in ['abc', 'def1']:
    df[i] = locals()[i]

Here, locals()['abc'] looks up the variable named abc, which holds [1, 2, 3, 4]. So the first iteration of the for loop is effectively the same as running:

df['abc'] = [1, 2, 3, 4]

Result:

print(df)

   abc  def1
0    1     5
1    2     6
2    3     7
3    4     8
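
One caveat worth sketching: locals() only sees the names defined in the scope where it is called. The example below is a minimal illustration, not part of the original answer. The helper name columns_from_locals is hypothetical, and it returns a plain dict rather than a DataFrame so that it runs without pandas.

```python
def columns_from_locals(names):
    # Hypothetical helper: the two lists live inside this function,
    # so locals() called here can see them (plus the `names` argument).
    abc = [1, 2, 3, 4]
    def1 = [5, 6, 7, 8]
    scope = locals()  # dict mapping local variable names to their values
    return {name: scope[name] for name in names}


columns = columns_from_locals(["abc", "def1"])
print(columns)  # {'abc': [1, 2, 3, 4], 'def1': [5, 6, 7, 8]}
```

Passing a dict like this to pd.DataFrame(columns) builds the same two-column frame in one step. If the lists can be put in a dict to begin with, that is usually easier to follow than looking names up through locals().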

Use a variable in a loop to name a data frame in R

You're overwriting write_name on every iteration of the loop, so you're probably getting the contents of only the last lake name (Tikitapu). The lazy evaluation used in for loops probably isn't helping either. One way to fix this is to convert the for loop to an lapply:

library(tidyverse)

listOfTibbles <- lapply(
  1:8,
  function(i) {
    name <- Lake_names[i]
    read_file <- gsub(" ", "", paste("Data\\", name, "_daily_levels.csv"))
    write_name <- gsub(" ", "", paste(name, "_daily_levels"))
    # Note the edit to the next line: you need a return value.
    as_tibble(read.csv(read_file)) # read file with lake elevations
  }
)

lapply returns a list. Each element of that list is the result of applying the function given as its second argument to the corresponding element of its first argument. So you should get a list of eight tibbles from this code. To combine the list of tibbles into one tibble, you can use bind_rows():

oneBigTibble <- listOfTibbles %>% bind_rows()

This is untested code as I don't have access to your CSV files.

Run a loop to generate variable names in Python

Use the built-in glob module:

from glob import glob

import pandas as pd

# Raw string, so the backslashes in the Windows path are not read as escapes
fullpath = r'C:\Users\siddhn\Desktop\phone[1-6].csv'
dfs = [pd.read_csv(file) for file in glob(fullpath)]

print(dfs[0])
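
To see which files a pattern like phone[1-6].csv actually picks up, here is a small self-contained sketch. It makes some assumptions that are not in the original answer: the file names and contents are invented, they are written to a temporary directory, and the standard csv module stands in for pd.read_csv so the example runs without pandas.

```python
import csv
import glob
import os
import tempfile

# Throwaway directory with made-up files; only phone1..phone6 should match.
with tempfile.TemporaryDirectory() as folder:
    for name in ["phone1.csv", "phone2.csv", "phone7.csv", "notes.csv"]:
        with open(os.path.join(folder, name), "w", newline="") as handle:
            csv.writer(handle).writerows([["model", "price"], [name, 100]])

    # Escape the directory part so only the file-name part acts as a pattern.
    pattern = os.path.join(glob.escape(folder), "phone[1-6].csv")

    # glob() makes no promise about ordering, so sort for a stable result.
    matches = sorted(glob.glob(pattern))

    tables = []
    for path in matches:
        with open(path, newline="") as handle:
            tables.append(list(csv.DictReader(handle)))

matched_names = [os.path.basename(path) for path in matches]
print(matched_names)  # ['phone1.csv', 'phone2.csv']
```

The [1-6] part matches exactly one character in that range, which is why phone7.csv and notes.csv are skipped.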

How to iterate over columns of pandas dataframe to run regression

for column in df:
    print(df[column])
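
Iterating over a DataFrame yields its column labels, in the same way that iterating over a dict yields its keys. The loop above only prints each column, so here is a rough sketch of how the regression step might slot in. Everything in it is an assumption made for illustration: the data is invented, a plain dict of lists stands in for the DataFrame so the example runs without pandas, and fit_line is a hypothetical hand-written least-squares helper rather than a library call.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data: a plain dict of columns stands in for the DataFrame.
table = {
    "x1": [1, 2, 3, 4],
    "x2": [4, 3, 2, 1],
    "y": [2, 4, 6, 8],
}

fits = {}
for column in table:  # yields the keys, like a DataFrame yields its labels
    if column == "y":
        continue  # skip the response column itself
    fits[column] = fit_line(table[column], table["y"])

print(fits)  # {'x1': (2.0, 0.0), 'x2': (-2.0, 10.0)}
```

With a real DataFrame the loop body would stay the same shape: pick out df[column], fit it against the response column, and store the result under the column name.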

How to use a for loop to extract columns from a data frame

If you must do it with a for loop, you could work off this:

new <- list()      # construct as list -- data.frames are fancy lists
cols <- c(1, 5, 3) # use a vector of column indices
for (i in seq_along(cols)) {
  # append the list at each column
  new[[i]] <- mtcars[, cols[i], drop = FALSE]
}

new <- as.data.frame(new) # make list into data.frame
identical(new, mtcars[, cols]) # check that this produces the same thing
#> [1] TRUE
head(new)
#>                    mpg drat disp
#> Mazda RX4         21.0 3.90  160
#> Mazda RX4 Wag     21.0 3.90  160
#> Datsun 710        22.8 3.85  108
#> Hornet 4 Drive    21.4 3.08  258
#> Hornet Sportabout 18.7 3.15  360
#> Valiant           18.1 2.76  225
str(new)
#> 'data.frame': 32 obs. of 3 variables:
#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> $ disp: num 160 160 108 258 360 ...

Created on 2022-05-20 by the reprex package (v2.0.1)

Edits

With more information, the code below should work. However, the for loops don't seem necessary, and the apply family of functions seems good enough. If a for loop is necessary for your process, hopefully the combination of these will be enough to get you what you need.

data <- Reduce(
  cbind,
  lapply(
    1:20,
    function(i) {
      out <- data.frame(
        id = order(runif(5)),
        event = runif(5) < .5,
        other_col = runif(5)
      )
      colnames(out) <- paste0(colnames(out), i)
      out
    }
  )
)

# just a quick peek
str(data[, c(1:3, 9:12, 21:24)])
#> 'data.frame': 5 obs. of 11 variables:
#> $ id1 : int 3 2 1 4 5
#> $ event1 : logi FALSE FALSE TRUE TRUE FALSE
#> $ other_col1: num 0.617 0.951 0.511 0.185 0.667
#> $ other_col3: num 0.6856 0.0524 0.5786 0.9265 0.2291
#> $ id4 : int 4 2 1 5 3
#> $ event4 : logi TRUE TRUE FALSE FALSE FALSE
#> $ other_col4: num 0.0849 0.8345 0.8465 0.1958 0.2534
#> $ other_col7: num 0.656 0.353 0.604 0.973 0.381
#> $ id8 : int 2 3 5 4 1
#> $ event8 : logi TRUE FALSE FALSE TRUE TRUE
#> $ other_col8: num 0.646 0.693 0.534 0.624 0.625

result <- lapply(1:20, function(i) {
  # make pattern (must have letters before number)
  pattern <- paste0("[a-z]", i, "$")

  # find the column indices that match the pattern
  ind <- grep(pattern, colnames(data))

  # extract those columns
  res <- data[, ind]

  # optional: rename columns
  colnames(res) <- sub(paste0(i, "$"), "", colnames(res))
  res
})

head(result)
#> [[1]]
#>   id event other_col
#> 1  3 FALSE 0.6174577
#> 2  2 FALSE 0.9509916
#> 3  1  TRUE 0.5107370
#> 4  4  TRUE 0.1851543
#> 5  5 FALSE 0.6670226
#>
#> [[2]]
#>   id event other_col
#> 1  3  TRUE 0.8261719
#> 2  4 FALSE 0.4171351
#> 3  1  TRUE 0.5640345
#> 4  5  TRUE 0.6825371
#> 5  2 FALSE 0.4381013
#>
#> [[3]]
#>   id event  other_col
#> 1  4 FALSE 0.68559712
#> 2  3 FALSE 0.05241906
#> 3  2 FALSE 0.57857342
#> 4  1  TRUE 0.92649458
#> 5  5  TRUE 0.22908630
#>
#> [[4]]
#>   id event  other_col
#> 1  4  TRUE 0.08491369
#> 2  2  TRUE 0.83452439
#> 3  1 FALSE 0.84650621
#> 4  5 FALSE 0.19578470
#> 5  3 FALSE 0.25342999
#>
#> [[5]]
#>   id event other_col
#> 1  4 FALSE 0.8912857
#> 2  1 FALSE 0.1261470
#> 3  3 FALSE 0.7962369
#> 4  5  TRUE 0.3911494
#> 5  2 FALSE 0.6041862
#>
#> [[6]]
#>   id event other_col
#> 1  4  TRUE 0.8987728
#> 2  2  TRUE 0.2830371
#> 3  5 FALSE 0.6696249
#> 4  3 FALSE 0.6249742
#> 5  1 FALSE 0.4754757

Created on 2022-05-22 by the reprex package (v2.0.1)

Loop through variable names in R

Let's assume that df is your data and that its first 15 columns hold the answers.
In that case you can run a chi-squared test of each answer column against Sex:

lapply(df[,1:15], function(x) {chisq.test(x, df$Sex)}) 

