How Can R Loop Over Data Frames

Looping through list of data frames in R

> x <- y <- 1:5  # not defined in the original; values inferred from the output below
> df1 <- data.frame("Row One"=x, "Row Two"=y)
> df2 <- data.frame("Row Two"=y,"Row One"=x)
> dfList <- list(df1,df2)
> lapply(dfList, function(x) {
names(x)[ grep("One", names(x))] <- "R1"
names(x)[ grep("Two", names(x))] <- "R2"
x} )
[[1]]
R1 R2
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5

[[2]]
R2 R1
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
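
Note that lapply returns a new list and leaves df1 and df2 untouched, so the result has to be assigned if you want to keep it. A minimal sketch, assuming x and y are both 1:5 (inferred from the printed output) and naming the list elements, which the snippet above does not do:

```r
# Assumption: x and y are not defined in the original; 1:5 matches the output.
x <- 1:5
y <- 1:5
df1 <- data.frame("Row One" = x, "Row Two" = y)
df2 <- data.frame("Row Two" = y, "Row One" = x)

# Naming the elements keeps track of which data frame is which.
dfList <- list(df1 = df1, df2 = df2)

rename_cols <- function(d) {
  names(d)[grep("One", names(d))] <- "R1"
  names(d)[grep("Two", names(d))] <- "R2"
  d
}

# Assign the result; lapply does not modify dfList in place.
dfList <- lapply(dfList, rename_cols)

names(dfList$df1)  # column names of the renamed copy stored in the list
names(df1)         # the original object still has its old column names
```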

How to loop through numbered dataframes in R environment. I have to loop through 22 (potentially 22*6) dataframes in R

You can use get(object_name) to get an object by name

for (i in time) {
df <- get(paste0("y_V_", i))
}

This gets the data frame y_V_{i}, where i is the time index.
You can loop over the letter as well:

for (i in time) {
for (l in letter_vector) {
df <- get(paste0("y_", l, "_", i))
}
}

This writes y_{l}_{i} to df, provided those objects all exist; making sure they do is up to you.


Edit: use assign to write to a pasted object name

for (i in time) {
for (l in letter_vector) {
df <- get(paste0("y_", l, "_", i))
assign(paste0("df_", l, "_", i), df)
}
}

Second edit. You can write the dataframes to a list:

# first initialize the list
list_with_dfs <- list()

for (i in time) {
for (l in letter_vector) {
df <- get(paste0("y_", l, "_", i))
assign(paste0("df_", l, "_", i), df)

# Then write to the list (pick one of the two options, not both,
# otherwise every data frame is appended twice)
list_with_dfs[[length(list_with_dfs) + 1]] <- get(paste0("df_", l, "_", i))

# Or just use the df
# list_with_dfs[[length(list_with_dfs) + 1]] <- df
}
}
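
If the only goal is to collect the numbered data frames into a list, mget can do the lookup in one call. This is a sketch, not the approach from the answer above; the three y_V_* objects and their contents are hypothetical stand-ins for whatever is in your workspace:

```r
# Hypothetical stand-ins for the numbered data frames in the workspace.
y_V_1 <- data.frame(value = 1:2)
y_V_2 <- data.frame(value = 3:4)
y_V_3 <- data.frame(value = 5:6)

time <- 1:3
obj_names <- paste0("y_V_", time)

# mget() looks up several objects by name and returns them as a named list.
# envir = environment() means "search from where this code is running".
list_with_dfs <- mget(obj_names, envir = environment())

# Every data frame can now be processed without get()/assign().
row_counts <- vapply(list_with_dfs, nrow, integer(1))
```

The list elements keep the object names, so list_with_dfs$y_V_2 still tells you which data frame you are looking at.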

How can R loop over data frames?

Put all your data frames into a list, and then loop/lapply over them. It'll be much easier on you in the long run.

dfList <- list(df1=df1, df2=df2, ....)

dfList <- lapply(dfList, function(df) {
df$gender[df$prefix == "Mrs."] <- "F"
df
})

dfList$df1
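
To try this out without your own data, here is a small self-contained sketch. The two data frames and their contents are made up for illustration; only the prefix and gender column names come from the snippet above. It also shows the equivalent for loop:

```r
# Made-up example data; only the column names follow the answer above.
df1 <- data.frame(prefix = c("Mrs.", "Mr."), gender = NA_character_,
                  stringsAsFactors = FALSE)
df2 <- data.frame(prefix = c("Dr.", "Mrs.", "Mrs."), gender = NA_character_,
                  stringsAsFactors = FALSE)

dfList <- list(df1 = df1, df2 = df2)

# lapply version: returns a new list, which is assigned back to dfList.
dfList <- lapply(dfList, function(df) {
  df$gender[df$prefix == "Mrs."] <- "F"
  df
})

# Equivalent for loop over a second copy of the list, modifying it by index.
dfList2 <- list(df1 = df1, df2 = df2)
for (i in seq_along(dfList2)) {
  dfList2[[i]]$gender[dfList2[[i]]$prefix == "Mrs."] <- "F"
}
```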

R loop through columns in list of data frames

You need to define j and df_target inside the function and specify what it should return (as written, it computes df_target but doesn't return it):

fnc <- function(x){
df_target <- NULL
j <- 1
for(i in seq(2, 7, 2)) {
df_target[[j]] <- (x[i]*x[i+1])/(sum(x[i+1]))
j <- j+1
}
return(df_target)
}

But keep in mind that this will output a matrix of lists: for each element of df.list that sapply processes, you create a 3-element list df_target, so the output looks like this in the console:

> sapply(df.list, fnc)
df2010 df2011 df2012 df2013
[1,] List,1 List,1 List,1 List,1
[2,] List,1 List,1 List,1 List,1
[3,] List,1 List,1 List,1 List,1

To get a cleaner output, we can set df_target to create a data frame with the values from each year:

fnc <- function(x){
df_target <- as.data.frame(matrix(nrow=nrow(x), ncol=3))
for(i in seq(2, 7, 2)) {
df_target[,i/2] <- (x[i]*x[i+1])/(sum(x[i+1]))
}
return(df_target)}

This returns one data frame per year, but if we use sapply we still get a similar matrix of lists, so it's better to define the function so that it loops through every year itself:

fnc <- function(y){
df_target.list <- list()
k=1
for(j in y){
df_target <- as.data.frame(matrix(nrow=nrow(j), ncol=3))
for(i in seq(2, 7, 2)) {
df_target[,i/2] <- (j[i]*j[i+1])/(sum(j[i+1]))
}
df_target.list[[names(y)[k]]] = df_target
k=k+1
}
return(df_target.list)}

Output:

> fnc(df.list)
$df2010
V1 V2 V3
1 -0.10971160 0.01688244 -0.16339367
2 0.05440564 0.57554210 -0.06803244
3 0.03185178 0.90598561 -0.68692401

$df2011
V1 V2 V3
1 -0.43090055 0.007152131 0.3930606
2 0.15050644 0.329092942 -0.1367295
3 0.07336839 -0.423631930 -0.1504056

$df2012
V1 V2 V3
1 0.5540294 0.4561862 0.09169914
2 0.1153931 -1.1311450 0.81853691

$df2013
V1 V2 V3
1 0.4322934 0.5286973 0.2136495
2 -0.2412705 0.1316942 0.1455196
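
The nested loops can also be written with lapply, which keeps the list names automatically. This is only a sketch: the random df.list below is a made-up stand-in for your data (7 columns, so column pairs 2-3, 4-5 and 6-7 exist), and weighted is a hypothetical helper name:

```r
set.seed(1)
# Made-up stand-in for df.list: each data frame has 7 numeric columns.
df.list <- list(
  df2010 = as.data.frame(matrix(rnorm(21), nrow = 3)),
  df2011 = as.data.frame(matrix(rnorm(14), nrow = 2))
)

# Hypothetical helper: for column pairs (2,3), (4,5), (6,7) compute
# x[i] * x[i+1] / sum(x[i+1]), returning one data frame with 3 columns.
weighted <- function(x) {
  cols <- seq(2, 6, by = 2)
  out <- lapply(cols, function(i) x[[i]] * x[[i + 1]] / sum(x[[i + 1]]))
  names(out) <- paste0("V", seq_along(out))
  as.data.frame(out)
}

# lapply (not sapply) returns a plain named list of data frames.
res <- lapply(df.list, weighted)
```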

Filter data in loop over vector and bind data frames

I think you've got a typo/error in your filter; do you get the correct output when you change "block" to "value" in your grepl? E.g.

library(tidyverse)
area <- data.frame(
land = c("68N03E220090", "68N03E244635", "68N03E244352", "68N03E223241"),
type = c("home", "mobile", "home", "vacant"),
object_id = c(NA, 7, NA, 34)
)

block <- c("68N03E22", "68N03E24")

datalist = list()

for (value in block){
df <- area %>% filter(is.na(object_id) & grepl(paste0("^", value),land))
df$value <- value
datalist[[value]] <- df # add it to your list
}

df_filtered <- dplyr::bind_rows(datalist)

df_filtered
#> land type object_id value
#> 1 68N03E220090 home NA 68N03E22
#> 2 68N03E244352 home NA 68N03E24

For this example, you could also avoid the for-loop by using:

df_filtered_2 <- area %>%
filter(is.na(object_id) & grepl(pattern = paste0(block, collapse = "|"), x = land)) %>%
mutate(value = str_sub(land, 1, 8))

identical(df_filtered, df_filtered_2)
#> [1] TRUE

use dplyr mutate() inside for loop over multiple data frames

I'm not sure I understood your problem correctly, but I will try to help. Let's do it R-style, without unnecessary for loops, and keep things as simple as possible. Since this approach may be somewhat unfamiliar to you, let me guide you through it step by step.

Let's start by preparing the data in one data frame, or rather a tibble (a better data frame).

library(tidyverse)
df = tibble(
d = paste0("d", 1:4),
data = list(
tibble(P=c(1,5,2,3,4,7,5,6,7), E=c(4,5,6,4,5,6,4,5,6)),
tibble(P=c(0,9,8,5,4,7,5), E=c(6,5,4,6,5,4,5)),
tibble(P=c(6,5,4,6,5,4,6,5,4), E=c(3,2,1,5,5,5,5,5,5)),
tibble(P=c(5,9,9,5,2,2,1,8,5,7,6,5),E=c(8,8,8,8,8,8,8,8,8,8,8,8))
)
)
df

output

# A tibble: 4 x 2
d data
<chr> <list>
1 d1 <tibble [9 x 2]>
2 d2 <tibble [7 x 2]>
3 d3 <tibble [9 x 2]>
4 d4 <tibble [12 x 2]>

I know it can be a little confusing. But look at what df$data[[1]] is:

data=df$data[[1]]
data

output

# A tibble: 9 x 2
P E
<dbl> <dbl>
1 1 4
2 5 5
3 2 6
4 3 4
5 4 5
6 7 6
7 5 4
8 6 5
9 7 6

As you can see, it's a data frame inside a data frame.

Now let's write the first function, which adds your measurement time to such a nested data frame.

add_ts = function(data) data %>% mutate(tswm = seq(1, length(data$P))*30)

See how simple it can be. So let's test it.

add_ts(data) 
# A tibble: 9 x 3
P E tswm
<dbl> <dbl> <dbl>
1 1 4 30
2 5 5 60
3 2 6 90
4 3 4 120
5 4 5 150
6 7 6 180
7 5 4 210
8 6 5 240
9 7 6 270

Is that what you expected? I think so. So let's write a second function that adds the test time. This one is just a tad more difficult.

add_tsd = function(data){
tsdidx = (which(data$P==5)[1]):(which(data$P==5)[2]-1)
data = data %>% mutate(tstd = NA)
data$tstd[tsdidx]=seq(1,length(tsdidx))*30
data
}

Let's test it right away

add_tsd(data)
# A tibble: 9 x 3
P E tstd
<dbl> <dbl> <dbl>
1 1 4 NA
2 5 5 30
3 2 6 60
4 3 4 90
5 4 5 120
6 7 6 150
7 5 4 NA
8 6 5 NA
9 7 6 NA

Combine these two functions into one

add_ts_tsd = function(data) add_ts(data) %>% add_tsd()
add_ts_tsd(data)

# A tibble: 9 x 4
P E tswm tstd
<dbl> <dbl> <dbl> <dbl>
1 1 4 30 NA
2 5 5 60 30
3 2 6 90 60
4 3 4 120 90
5 4 5 150 120
6 7 6 180 150
7 5 4 210 NA
8 6 5 240 NA
9 7 6 270 NA

We're doing fantastic. Now let's apply the combined function to every nested data frame:

df %>% mutate(data = map(data, add_ts_tsd)) 
# A tibble: 4 x 2
d data
<chr> <list>
1 d1 <tibble [9 x 4]>
2 d2 <tibble [7 x 4]>
3 d3 <tibble [9 x 4]>
4 d4 <tibble [12 x 4]>

Hmm, can't see anything? Well, let's unnest these inner data frames.

df %>% mutate(data = map(data, add_ts_tsd)) %>% unnest(data)

# A tibble: 37 x 5
d P E tswm tstd
<chr> <dbl> <dbl> <dbl> <dbl>
1 d1 1 4 30 NA
2 d1 5 5 60 30
3 d1 2 6 90 60
4 d1 3 4 120 90
5 d1 4 5 150 120
6 d1 7 6 180 150
7 d1 5 4 210 NA
8 d1 6 5 240 NA
9 d1 7 6 270 NA
10 d2 0 6 30 NA

Bingo! Task completed. Simple, elegant and, above all, readable.
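
If you would rather stay in base R, roughly the same idea works with a named list, lapply and rbind. This is a sketch on shortened, made-up data; only the add_ts logic mirrors the function above:

```r
# Shortened, made-up versions of the nested tibbles, held in a named list.
dfs <- list(
  d1 = data.frame(P = c(1, 5, 2), E = c(4, 5, 6)),
  d2 = data.frame(P = c(0, 9), E = c(6, 5))
)

# Same idea as add_ts above: one measurement every 30 time units.
add_ts <- function(data) {
  data$tswm <- seq_len(nrow(data)) * 30
  data
}

# Apply the function to every data frame in the list.
with_ts <- lapply(dfs, add_ts)

# Tag each data frame with its list name, then stack them (like unnest).
tagged <- Map(function(data, id) cbind(d = id, data), with_ts, names(with_ts))
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```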

Loop over multiple data frames

We can loop through columns, bind them, and keep the resulting 8 dataframes in a list:

res <- lapply(1:8, function(i){ cbind(data1[i], data2[i], data3[i], data4[i]) })
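
Here is a small runnable sketch of the same pattern. The two data frames are made up and have only two columns each; putting the sources in a named list means each combined column is labelled with the data frame it came from:

```r
# Made-up inputs with identical column layouts.
data1 <- data.frame(a = 1:3, b = 4:6)
data2 <- data.frame(a = 7:9, b = 10:12)
sources <- list(data1 = data1, data2 = data2)

# For each column position, pull that column from every source and
# bind the pieces into one data frame.
res <- lapply(seq_len(ncol(data1)), function(i) {
  cols <- lapply(sources, function(d) d[[i]])
  as.data.frame(cols)
})

# Label each result with the column it was built from.
names(res) <- names(data1)
```

With this layout res$a holds column a from every source, side by side.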

Loop through a list of dataframes to create dataframes in R

You should definitely give your demo data frame an "ID" column as well! Then you don't have to hope that the demographics are correctly matched to the observations, especially if the script is still changing as you work. That is easily done using transform (in this example I simply use the consecutive IDs 1:3).

res <- lapply(list(df1, df2, df3, df4), merge, transform(demo, ID=1:3))
res
# [[1]]
# ID b c df sex age vital_sts
# 1 1 x gh z m 30 a
# 2 2 y fg x m 50 a
# 3 3 z xv y f 62 d
#
# [[2]]
# ID v hg fd sex age vital_sts
# 1 1 a yty z m 30 a
# 2 2 mm zc x m 50 a
# 3 3 xc cx y f 62 d
#
# [[3]]
# ID t j sd sex age vital_sts
# 1 1 ae ewr z m 30 a
# 2 2 yw zd x m 50 a
# 3 3 zs x y f 62 d
#
# [[4]]
# ID u k f sex age vital_sts
# 1 1 df df z m 30 a
# 2 2 y zs x m 50 a
# 3 3 z xf y f 62 d

If you have gazillions of data frames in your workspace, as it seems you do, you can collect them by pattern using mget(ls(pattern=)). (Or better yet, change your code to get them in a list in the first place.)

lapply(mget(ls(pat='^df\\d+')), merge, transform(demo, ID=1:3))

Edit

If I understand you correctly, according to your comment you have a large data frame DAT from which you want to assemble smaller data frames of variable groups and merge demo with each of them. In this case I would put the variable names of these groups in a named list vgroups. Next, lapply over it to simultaneously subset DAT (with "ID" added to each group's columns) and merge the result with demo.

demo should still have an "ID", because you don't want to trust that all rows are sorted in the same order; consider, for example, sort(c(3, 10, 1, 100)) vs. sort(as.character(c(3, 10, 1, 100))), or rows omitted for whatever reason.

demo <- transform(demo, ID=1:3)  ## identify demo observations

vgroups <- list(g1=c("b", "c", "df"), g2=c("v", "hg", "fd"), g3=c("t", "j", "sd"),
g4=c("u", "k", "f"))

res1 <- lapply(vgroups, \(x) merge(demo, DAT[, c('ID', x)], by="ID"))
## specifying by="ID" is even safer --^
res1
# $g1
# ID sex age vital_sts b c df
# 1 1 m 30 a x gh z
# 2 2 m 50 a y fg x
# 3 3 f 62 d z xv y
#
# $g2
# ID sex age vital_sts v hg fd
# 1 1 m 30 a a yty z
# 2 2 m 50 a mm zc x
# 3 3 f 62 d xc cx y
#
# $g3
# ID sex age vital_sts t j sd
# 1 1 m 30 a ae ewr z
# 2 2 m 50 a yw zd x
# 3 3 f 62 d zs x y
#
# $g4
# ID sex age vital_sts u k f
# 1 1 m 30 a df df z
# 2 2 m 50 a y zs x
# 3 3 f 62 d z xf y
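
To see why matching on a key matters, here is a minimal sketch with made-up data in which the observations are stored in a different row order than the demographics:

```r
# Made-up data: same three subjects, different row order.
demo <- data.frame(ID = 1:3, sex = c("m", "m", "f"),
                   stringsAsFactors = FALSE)
obs <- data.frame(ID = c(3L, 1L, 2L), b = c("z", "x", "y"),
                  stringsAsFactors = FALSE)

# cbind() pairs rows by position, so subject 3 gets the sex of subject 1.
by_position <- cbind(obs, sex = demo$sex)

# merge() pairs rows by the shared key and is unaffected by row order.
by_key <- merge(demo, obs, by = "ID")
```

In by_position the row with ID 3 carries "m" although demo says "f"; by_key has the correct pairing.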

Access individual data frames:

res1$g1
# ID sex age vital_sts b c df
# 1 1 m 30 a x gh z
# 2 2 m 50 a y fg x
# 3 3 f 62 d z xv y

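If you still want the individual data frames in your environment, use list2env. Pass envir explicitly; with the default envir=NULL the objects go into a newly created environment rather than your workspace:

list2env(res1, envir=.GlobalEnv)
ls()
# [1] "DAT" "demo" "g1" "g2" "g3" "g4" "res1" "vgroups"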

Data:

DAT <- structure(list(ID = 1:3, b = c("x", "y", "z"), c = c("gh", "fg", 
"xv"), df = c("z", "x", "y"), f = c("z", "x", "y"), fd = c("z",
"x", "y"), hg = c("yty", "zc", "cx"), j = c("ewr", "zd", "x"),
k = c("df", "zs", "xf"), sd = c("z", "x", "y"), t = c("ae",
"yw", "zs"), u = c("df", "y", "z"), v = c("a", "mm", "xc"
), x1 = c("gs", "gs", "gs"), x2 = c("cs", "cs", "cs"), x3 = c("tv",
"tv", "tv"), x4 = c("fb", "fb", "fb")), row.names = c(NA,
-3L), class = "data.frame")

demo <- data.frame(sex = c('m', 'm', 'f'), age = c('30', '50', '62'), vital_sts = c('a', 'a', 'd'))

Loop over data frames

As jogo said in the comments, it is much better to work on your data frames in a list. Otherwise, you can use get() and assign() like so:

years = c("2000","2001","2002")  # vector containing the years
for (i in years){
aux = get(paste0("df",i)) # get the variable from the environment (e.g. df2000)
aux["Year"] = i # update the "Year" field
assign(paste0("df",i),aux) # assign it again to the global environment
}
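
For comparison, the list-based version could look like the following sketch. The three data frames and the df_by_year name are made up for illustration; the list is named by year so that Map can pass each name along with its data frame:

```r
# Made-up data frames, stored in a list named by year.
df_by_year <- list(
  "2000" = data.frame(x = 1:2),
  "2001" = data.frame(x = 3:4),
  "2002" = data.frame(x = 5:6)
)

# Map() walks over the data frames and their names in parallel.
df_by_year <- Map(function(d, yr) {
  d$Year <- yr  # fill the "Year" field with the list name
  d
}, df_by_year, names(df_by_year))

df_by_year[["2001"]]
```

No get() or assign() is needed, and nothing is written to the global environment.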

loop over rows of a data.frame and use them as input for a function

Assuming that the foo function is more complicated than just b-a, you can use Map.

Map(foo, input$a, input$b)
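
A minimal runnable sketch, assuming a made-up input data frame and using b - a as a placeholder for the real foo:

```r
# Made-up input; foo is a placeholder for the real, more complex function.
input <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
foo <- function(a, b) b - a

# Map() calls foo once per row, pairing up the elements of a and b,
# and returns a list with one result per row.
res <- Map(foo, input$a, input$b)

# Flatten to a vector if foo returns a single value per row.
res_vec <- unlist(res)

# mapply() does the same and simplifies the result by default.
res_vec2 <- mapply(foo, input$a, input$b)
```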

