How to Loop Through List and Create Separate Dataframes in R


Your existing code creates an object called migr and assigns it a string holding the name of the data.frame you want to create. Then you overwrite the migr object with the data.frame that you pull from Census. Each iteration of the loop overwrites migr again, which is why only the data from the last iteration is kept, and only as a data.frame named migr.

Instead, use the assign() function to bind the data you pull from Census to the name stored in migr, as follows:

library(censusapi)
library(stringr)  # for str_glue()

states <- c("01","02")
for(i in seq_along(states)) {
   region = str_glue("state:{states[i]}")
   migr = str_glue("migr2010_{states[i]}")
   assign(
     x = migr,
     value = getCensus(name = "acs/flows", vintage = 2010,
                       key = "*myAPIkey*",
                       vars = c("MOVEDNET", "MOVEDIN", "MOVEDOUT", "AGE"),
                       region = "county:*", regionin = region)
   )
}
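To see what assign() does without calling the Census API, here is a minimal sketch that uses small placeholder data frames (made-up values standing in for the getCensus() result, not real Census output):

```r
states <- c("01", "02")
for (s in states) {
  # Build the object name as a string, then bind a data frame to that name
  obj_name <- paste0("migr2010_", s)
  assign(obj_name, data.frame(state = s, MOVEDNET = 0, stringsAsFactors = FALSE))  # placeholder data
}

ls(pattern = "^migr2010_")
# [1] "migr2010_01" "migr2010_02"

# get() retrieves an object when its name is stored as a string
get("migr2010_02")$state
# [1] "02"
```

Each iteration creates a new, separately named object instead of overwriting one variable.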

Edit

As others have mentioned, it may be easier to work with a list of data.frames than to create several in the global environment. The easiest way to build that list is with lapply, as follows:

migr2010 <- lapply(
  paste0("state:", c("01", "02")),  # one regionin value per state, as in the loop above
  function(region) getCensus(
    name = "acs/flows",
    vintage = 2010,
    key = "*myAPIkey*",
    vars = c("MOVEDNET", "MOVEDIN", "MOVEDOUT", "AGE"),
    region = "county:*",
    regionin = region
  )
)

Then, if you want to create a single data.frame out of those, you could use dplyr::bind_rows(migr2010), data.table::rbindlist(migr2010), or do.call(rbind, migr2010) (although do.call is much slower than the other two).
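If you also want to know which state each row came from, one option is to name the list (the lapply() result is unnamed, but setNames() can add names) and add that name as a column before stacking. A base-R sketch, with made-up data frames standing in for the getCensus() results; dplyr::bind_rows(migr2010, .id = "region") does the same thing in one call when the list is named:

```r
# Made-up stand-ins for the per-state data frames returned by getCensus()
migr2010 <- list(
  data.frame(county = c("001", "003"), MOVEDNET = c(10, -5), stringsAsFactors = FALSE),
  data.frame(county = "013", MOVEDNET = 7, stringsAsFactors = FALSE)
)
# Name each element after the region it was requested for
migr2010 <- setNames(migr2010, paste0("state:", c("01", "02")))

# Prepend the element name as a "region" column to each data frame
tagged <- Map(function(d, nm) data.frame(region = nm, d, stringsAsFactors = FALSE),
              migr2010, names(migr2010))

# Stack everything into one data frame and reset the row names rbind() builds
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
combined
```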

Loop through a list of dataframes to create dataframes in R

You should definitely give your demo data frame an "ID" column as well. Then you do not have to hope that the demographics are assigned to the correct observations, especially while the script is still changing during the work process. That is easily done using transform (here I simply use the consecutive IDs 1:3 as an example).

res <- lapply(list(df1, df2, df3, df4), merge, transform(demo, ID=1:3))
res
# [[1]]
#   ID b  c df sex age vital_sts
# 1  1 x gh  z   m  30         a
# 2  2 y fg  x   m  50         a
# 3  3 z xv  y   f  62         d
# 
# [[2]]
#   ID  v  hg fd sex age vital_sts
# 1  1  a yty  z   m  30         a
# 2  2 mm  zc  x   m  50         a
# 3  3 xc  cx  y   f  62         d
# 
# [[3]]
#   ID  t   j sd sex age vital_sts
# 1  1 ae ewr  z   m  30         a
# 2  2 yw  zd  x   m  50         a
# 3  3 zs   x  y   f  62         d
# 
# [[4]]
#   ID  u  k f sex age vital_sts
# 1  1 df df z   m  30         a
# 2  2  y zs x   m  50         a
# 3  3  z xf y   f  62         d

If, as it looks, you have gazillions of data frames in your workspace, you can collect them by name pattern using mget(ls(pattern=...)). (Better yet, change your code to put them in a list in the first place.)

lapply(mget(ls(pattern='^df\\d+')), merge, transform(demo, ID=1:3))
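A small self-contained sketch of the mget(ls(pattern = ...)) approach, using made-up df1, df2 and demo objects (they are not the question's data). mget() returns a named list, so the merged results keep the original object names:

```r
# Made-up data frames that share an ID column
df1 <- data.frame(ID = 1:3, b = c("x", "y", "z"))
df2 <- data.frame(ID = 1:3, v = c("a", "mm", "xc"))
demo <- data.frame(sex = c("m", "m", "f"), age = c(30, 50, 62))

# Collect every object named "df" followed by digits into a named list
dfs <- mget(ls(pattern = "^df\\d+$"))

# Merge the demographics onto each data frame by ID
res <- lapply(dfs, merge, transform(demo, ID = 1:3), by = "ID")
names(res)
# [1] "df1" "df2"
```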

Edit

If I understand your comment correctly, you have a large data frame DAT from which you want to assemble smaller data frames of variable groups and merge demo onto each of them. In this case I would put the variable names of these groups in a named list, vgroups. Then lapply over it, subsetting DAT to "ID" plus each group's variables and merging the result with demo in one step.

demo should still have an "ID", because you don't want to trust that all rows are sorted in the same order. Consider, for example, sort(c(3, 10, 1, 100)) vs. sort(as.character(c(3, 10, 1, 100))), or rows that were omitted for whatever reason.
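A short sketch of that pitfall with made-up data: the same numbers sort differently as text, and if two tables list the same IDs in different orders, binding columns side by side pairs the wrong rows while merge() matches on ID:

```r
sort(c(3, 10, 1, 100))
# [1]   1   3  10 100
sort(as.character(c(3, 10, 1, 100)))
# [1] "1"   "10"  "100" "3"

demo <- data.frame(ID = 1:3, sex = c("m", "m", "f"))
vars <- data.frame(ID = c(3, 1, 2), b = c("z", "x", "y"))  # same IDs, different row order

# Binding columns side by side silently pairs rows by position, not by ID
cbind(demo, b = vars$b)

# merge() matches rows on ID, so each b lands next to the right observation
merged <- merge(demo, vars, by = "ID")
merged
```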

demo <- transform(demo, ID=1:3)  ## identify demo observations

vgroups <- list(g1=c("b", "c", "df"), g2=c("v", "hg", "fd"), g3=c("t", "j", "sd"),
               g4=c("u", "k", "f"))

res1 <- lapply(vgroups, \(x) merge(demo, DAT[, c('ID', x)], by="ID"))
## by="ID" is even safer; \(x) is shorthand for function(x) (R >= 4.1.0)
res1
# $g1
#   ID sex age vital_sts b  c df
# 1  1   m  30         a x gh  z
# 2  2   m  50         a y fg  x
# 3  3   f  62         d z xv  y
# 
# $g2
#   ID sex age vital_sts  v  hg fd
# 1  1   m  30         a  a yty  z
# 2  2   m  50         a mm  zc  x
# 3  3   f  62         d xc  cx  y
# 
# $g3
#   ID sex age vital_sts  t   j sd
# 1  1   m  30         a ae ewr  z
# 2  2   m  50         a yw  zd  x
# 3  3   f  62         d zs   x  y
# 
# $g4
#   ID sex age vital_sts  u  k f
# 1  1   m  30         a df df z
# 2  2   m  50         a  y zs x
# 3  3   f  62         d  z xf y

Access individual data frames:

res1$g1
#   ID sex age vital_sts b  c df
# 1  1   m  30         a x gh  z
# 2  2   m  50         a y fg  x
# 3  3   f  62         d z xv  y

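If you still want the individual data frames in your environment, use list2env. Pass the target environment explicitly: with the default envir = NULL, list2env creates and returns a new environment and leaves your workspace unchanged.

invisible(list2env(res1, envir = .GlobalEnv))
ls(pattern = "^g\\d$")
# [1] "g1" "g2" "g3" "g4"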

Data:

DAT <- structure(list(ID = 1:3, b = c("x", "y", "z"), c = c("gh", "fg", 
"xv"), df = c("z", "x", "y"), f = c("z", "x", "y"), fd = c("z", 
"x", "y"), hg = c("yty", "zc", "cx"), j = c("ewr", "zd", "x"), 
    k = c("df", "zs", "xf"), sd = c("z", "x", "y"), t = c("ae", 
    "yw", "zs"), u = c("df", "y", "z"), v = c("a", "mm", "xc"
    ), x1 = c("gs", "gs", "gs"), x2 = c("cs", "cs", "cs"), x3 = c("tv", 
    "tv", "tv"), x4 = c("fb", "fb", "fb")), row.names = c(NA, 
-3L), class = "data.frame")

demo <- data.frame(sex = c('m', 'm', 'f'), age = c('30', '50', '62'), vital_sts = c('a', 'a', 'd'))

Using a loop to create multiple data frames in R

You can save your data.frames into a list by setting up the function as follows:

library(jsonlite)  # for fromJSON()

getstats <- function(games){

  listofdfs <- list() # Create a list in which you intend to save your df's.

  for(i in seq_along(games)){ # Loop through the positions of the IDs instead of the IDs

    # You are going to use games[i] instead of i to get the ID.
    # Build the URL from pieces so no line breaks or spaces end up inside it
    url <- paste0("http://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10",
                  "&EndRange=14400&GameID=", games[i],
                  "&RangeType=2&Season=2015-16&SeasonType=Regular+Season",
                  "&StartPeriod=1&StartRange=0000")
    json_data <- fromJSON(paste(readLines(url), collapse=""))
    df <- data.frame(json_data$resultSets[1, "rowSet"])
    names(df) <- unlist(json_data$resultSets[1, "headers"])
    listofdfs[[i]] <- df # save your dataframes into the list
  }

  return(listofdfs) # Return the list of dataframes.
}

# Numeric literals drop leading zeros, so build the 10-character IDs with sprintf()
gameids <- sprintf("%010d", 21500580:21500593)
getstats(games = gameids)

Please note that I could not test this because the URLs do not seem to be working properly. I get the connection error below:

Error in file(con, "r") : cannot open the connection
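Since the endpoint could not be reached, here is a network-free sketch of the same list-building pattern. fetch_game() is a hypothetical stand-in for the download-and-parse step, returning made-up rows; the list is preallocated and named by game ID so each data frame can be looked up directly:

```r
# Hypothetical stand-in for the download-and-parse step (no network access)
fetch_game <- function(game_id) {
  data.frame(GAME_ID = game_id, PLAYER = c("A", "B"), PTS = c(10L, 12L),
             stringsAsFactors = FALSE)
}

getstats <- function(games) {
  # Preallocate one slot per game and name the slots by game ID
  listofdfs <- setNames(vector("list", length(games)), games)
  for (i in seq_along(games)) {
    listofdfs[[i]] <- fetch_game(games[i])
  }
  listofdfs
}

gameids <- sprintf("%010d", 21500580:21500582)
stats <- getstats(gameids)
names(stats)
# [1] "0021500580" "0021500581" "0021500582"
stats[["0021500581"]]
```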

How to loop through a list and create a data frame

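This worked for me with a bunch of CSV files containing mock data. Add_5_Col(), Add_4_Col() and Add_3_Col() are user-defined helpers that pad a data frame to a common number of columns; they are not shown here.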

Team <- list.files("c:\\Test\\Teams\\", pattern = "\\.csv$", full.names=TRUE)

Team_Split <- data.frame()
print(Team)

for (Team_File in Team) {
  x1 <- read.csv(Team_File) # Reads the CSV at the current file path
  y <- ncol(x1) # Number of columns in this file
  # If statement to standardise the number of columns so the data can be bound
  if (y == 37) {
    x1 <- Add_5_Col(x1)
  } else if (y == 38) {
    x1 <- Add_4_Col(x1)
  } else if (y == 39) {
    x1 <- Add_3_Col(x1)
  }
  print(x1)

  # Sets Team_Split to x1 if it's the first set of data,
  # otherwise binds x1 onto Team_Split
  if (nrow(Team_Split) == 0) {
    Team_Split <- x1
  } else {
    Team_Split <- rbind(Team_Split, x1)
  }
}

print(Team_Split)
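A sketch of the same read-and-bind idea without growing Team_Split inside the loop: read every file with lapply(), pad each data frame to the full set of column names, then bind once. The padding step is an assumed generic replacement for the Add_*_Col() helpers (it fills missing columns with NA), and the two CSV files are made up and written to a temporary folder so the example runs anywhere:

```r
# Write two small made-up CSV files with different columns into a temp folder
team_dir <- file.path(tempdir(), "Teams")
dir.create(team_dir, showWarnings = FALSE)
write.csv(data.frame(team = "A", w = 1L), file.path(team_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(team = "B", w = 2L, l = 3L), file.path(team_dir, "b.csv"), row.names = FALSE)

Team <- list.files(team_dir, pattern = "\\.csv$", full.names = TRUE)
team_list <- lapply(Team, read.csv)

# Every column name that appears in any file, in first-seen order
all_cols <- unique(unlist(lapply(team_list, names)))

# Add any missing columns as NA and put the columns in a common order
pad_cols <- function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
}

# Bind all padded data frames in a single rbind() call
Team_Split <- do.call(rbind, lapply(team_list, pad_cols))
Team_Split
```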

Looping through list of data frames in R

> x <- 1:5
> y <- 1:5
> df1 <- data.frame("Row One"=x, "Row Two"=y)
> df2 <- data.frame("Row Two"=y,"Row One"=x)
> dfList <- list(df1,df2)
> lapply(dfList, function(x) {
+                     names(x)[ grep("One", names(x))] <- "R1"
+                     names(x)[ grep("Two", names(x))] <- "R2"
+                     x} )
[[1]]
  R1 R2
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5

[[2]]
  R2 R1
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5

How do I create multiple dataframes from a result in a for loop in R?

Don't do it in a loop! It is done completely differently. I'll show you step by step.
My first step is to prepare a function that will generate data similar to yours.

library(tidyverse)

dens = function(year, n) tibble(
  PLOT = paste("HI", sample(1:(n/7), n, replace = TRUE)),
  SIZE = runif(n, 0.1, 3), 
  DENSITY = sample(seq(50,200, by=50), n, replace = TRUE),
  SEEDYR = year-1,
  SAMPYR = year,
  AGE = sample(1:5, n, replace = TRUE),
  SHOOTS = runif(n, 0.1, 3)
)

Let's see how it works and generate some sample data frames

set.seed(123)
density.2007 = dens(2007, 120)
density.2008 = dens(2008, 88)
density.2009 = dens(2009, 135)
density.2010 = dens(2010, 156)

The density.2007 data frame looks like this

# A tibble: 120 x 7
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>
 1 HI 15 1.67      200   2006   2007     4  1.80 
 2 HI 14 0.270     150   2006   2007     2  2.44 
 3 HI 3  0.856      50   2006   2007     3  0.686
 4 HI 10 1.25      200   2006   2007     5  1.43 
 5 HI 11 0.673      50   2006   2007     5  1.40 
 6 HI 5  2.51      150   2006   2007     3  2.23 
 7 HI 14 0.543     150   2006   2007     2  2.17 
 8 HI 5  2.43      200   2006   2007     5  2.51 
 9 HI 9  1.69      100   2006   2007     4  2.67 
10 HI 3  2.02       50   2006   2007     2  2.86 
# ... with 110 more rows

Now combine them into one data frame:

df = density.2007 %>% 
  bind_rows(density.2008) %>% 
  bind_rows(density.2009) %>% 
  bind_rows(density.2010)

output

# A tibble: 499 x 7
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>
 1 HI 15 1.67      200   2006   2007     4  1.80 
 2 HI 14 0.270     150   2006   2007     2  2.44 
 3 HI 3  0.856      50   2006   2007     3  0.686
 4 HI 10 1.25      200   2006   2007     5  1.43 
 5 HI 11 0.673      50   2006   2007     5  1.40 
 6 HI 5  2.51      150   2006   2007     3  2.23 
 7 HI 14 0.543     150   2006   2007     2  2.17 
 8 HI 5  2.43      200   2006   2007     5  2.51 
 9 HI 9  1.69      100   2006   2007     4  2.67 
10 HI 3  2.02       50   2006   2007     2  2.86 
# ... with 489 more rows

In the next step, count how many times each value of the PLOT variable occurs

PLOT.count = df %>% 
  group_by(PLOT) %>% 
  summarise(PLOT.n = n()) %>% 
  arrange(PLOT.n)

output

# A tibble: 22 x 2
   PLOT  PLOT.n
   <chr>  <int>
 1 HI 20      3
 2 HI 22      5
 3 HI 21      7
 4 HI 18     12
 5 HI 2      19
 6 HI 1      20
 7 HI 15     20
 8 HI 17     21
 9 HI 6      22
10 HI 11     23
# ... with 12 more rows

In the penultimate step, let's append these counters to the original data frame

df = df %>% left_join(PLOT.count, by="PLOT")

output

# A tibble: 499 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 15 1.67      200   2006   2007     4  1.80      20
 2 HI 14 0.270     150   2006   2007     2  2.44      32
 3 HI 3  0.856      50   2006   2007     3  0.686     27
 4 HI 10 1.25      200   2006   2007     5  1.43      25
 5 HI 11 0.673      50   2006   2007     5  1.40      23
 6 HI 5  2.51      150   2006   2007     3  2.23      38
 7 HI 14 0.543     150   2006   2007     2  2.17      32
 8 HI 5  2.43      200   2006   2007     5  2.51      38
 9 HI 9  1.69      100   2006   2007     4  2.67      26
10 HI 3  2.02       50   2006   2007     2  2.86      27
# ... with 489 more rows
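The count-and-join steps above can also be collapsed into one step. In dplyr, add_count(PLOT, name = "PLOT.n") is one option; a base-R sketch with ave() on a small made-up data frame looks like this:

```r
# Small made-up data frame with a PLOT column
df <- data.frame(PLOT = c("HI 1", "HI 2", "HI 1", "HI 3", "HI 1"),
                 SIZE = c(0.5, 1.2, 2.0, 0.7, 1.1))

# ave() applies length() within each PLOT group and returns one value per row,
# so the counts line up with the original rows without a join
df$PLOT.n <- ave(seq_along(df$PLOT), df$PLOT, FUN = length)
df$PLOT.n
# [1] 3 1 3 1 3

# Keep only rows whose plot occurs more than once
df[df$PLOT.n > 1, ]
```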

Now filter it at will

df %>% filter(PLOT.n > 30)

output

# A tibble: 139 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 14 0.270     150   2006   2007     2  2.44      32
 2 HI 5  2.51      150   2006   2007     3  2.23      38
 3 HI 14 0.543     150   2006   2007     2  2.17      32
 4 HI 5  2.43      200   2006   2007     5  2.51      38
 5 HI 8  0.598      50   2006   2007     1  1.70      34
 6 HI 7  1.94       50   2006   2007     4  1.61      35
 7 HI 14 2.91       50   2006   2007     4  0.215     32
 8 HI 7  0.846     150   2006   2007     4  0.506     35
 9 HI 7  2.38      150   2006   2007     3  1.34      35
10 HI 7  2.62      100   2006   2007     3  0.167     35
# ... with 129 more rows

Or this way

df %>% filter(PLOT.n == min(PLOT.n))
df %>% filter(PLOT.n == median(PLOT.n))
df %>% filter(PLOT.n == max(PLOT.n))

output

> df %>% filter(PLOT.n == min(PLOT.n))
# A tibble: 3 x 8
  PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
  <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
1 HI 20 0.392     200   2009   2010     1  0.512      3
2 HI 20 0.859     150   2009   2010     5  2.62       3
3 HI 20 0.882     200   2009   2010     5  1.06       3
> df %>% filter(PLOT.n == median(PLOT.n))
# A tibble: 26 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 9  1.69      100   2006   2007     4  2.67      26
 2 HI 9  2.20       50   2006   2007     4  1.49      26
 3 HI 9  0.587     200   2006   2007     3  1.13      26
 4 HI 9  1.27       50   2006   2007     1  2.55      26
 5 HI 9  1.56      150   2006   2007     3  2.01      26
 6 HI 9  0.198     100   2006   2007     3  2.08      26
 7 HI 9  2.72      150   2007   2008     3  0.421     26
 8 HI 9  0.251     200   2007   2008     2  0.328     26
 9 HI 9  1.83       50   2007   2008     1  0.192     26
10 HI 9  1.97      100   2007   2008     1  0.900     26
# ... with 16 more rows
> df %>% filter(PLOT.n == max(PLOT.n))
# A tibble: 38 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 5  2.51      150   2006   2007     3   2.23     38
 2 HI 5  2.43      200   2006   2007     5   2.51     38
 3 HI 5  2.06      100   2006   2007     5   1.93     38
 4 HI 5  1.25      150   2006   2007     4   2.29     38
 5 HI 5  2.29      200   2006   2007     1   2.97     38
 6 HI 5  0.789     150   2006   2007     2   1.59     38
 7 HI 5  1.11      100   2007   2008     4   2.61     38
 8 HI 5  2.38      150   2007   2008     4   2.95     38
 9 HI 5  2.67      200   2007   2008     3   1.77     38
10 HI 5  2.63      100   2007   2008     1   1.90     38
# ... with 28 more rows

R for loop: creating data frames using split?

It is not recommended to create separate dataframes in the global environment; they are difficult to keep track of. Put them in a list instead. You have started off well by using split and creating a list of dataframes. You can then iterate over the dataframes in the list and apply the function to each one of them.

Using by(), this would look like:

result <- by(tss, tss$created_at, function(x) {
  bscore3 <- score.sentiment(x$cleaned_text, pos.words, neg.words, .progress='text')
  score3 <- as.integer(bscore3$score[[1]])
  return(score3)
})

result
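Since you already have the list from split(), the same idea can also be written with sapply() over that list. score.sentiment() is not defined here, so this sketch uses a hypothetical score_text() (positive minus negative word counts) and made-up texts; unlike the by() version above, it adds up the scores of all texts for each date:

```r
# Hypothetical stand-in for score.sentiment(): positive minus negative word count
score_text <- function(text, pos, neg) {
  words <- unlist(strsplit(tolower(text), "\\s+"))
  sum(words %in% pos) - sum(words %in% neg)
}

# Made-up data using the column names from the question
tss <- data.frame(
  created_at   = c("2020-01-01", "2020-01-01", "2020-01-02"),
  cleaned_text = c("good great day", "bad day", "great bad bad"),
  stringsAsFactors = FALSE
)
pos.words <- c("good", "great")
neg.words <- c("bad")

# One data frame per date, named by date
tss_list <- split(tss, tss$created_at)

# Score every text in each data frame and add the scores up per date
result <- sapply(tss_list, function(x) {
  sum(vapply(x$cleaned_text, score_text, integer(1), pos = pos.words, neg = neg.words))
})
result
```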

Usage of 'for loop' in R to split a dataframe into several dataframes

An easy way to do this is to create a grouping vector by prefixing the id numbers with the string sys, and to use it to split the data. There is no need to use a for() loop to produce the desired output, since the result of split() is a list of data frames when the input to be split is a data frame.

The value of the factor is used to name each element in the list generated by split(). In the case of the OP, since sysid is numeric and starts with 1, it's not obvious that the id numbers are being used to name the resulting data frames in the list, as explained in the help for split().
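A tiny sketch, with made-up rows, of how split() names the list elements after the values of the grouping vector:

```r
d <- data.frame(sysid = c(1, 1, 2), power = c(1000, 1200, 1500))

# Numeric ids: the elements are named "1" and "2", which is easy to overlook
names(split(d, d$sysid))
# [1] "1" "2"

# Prefixed ids give self-describing names that also work with $
names(split(d, paste0("sys", d$sysid)))
# [1] "sys1" "sys2"
```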

Using the data from the OP we'll illustrate how to use the sysid column to create a factor variable that combines the string sys with the id values, and split it into a list of data frames that can be accessed by name.

rawData <- "Date      sysid   power   temperature
 1.1.2018    1     1000       14
 2.1.2018    1     1200       16
 3.1.2018    1      800       18
 1.1.2018    2     1500        8
 2.1.2018    2      800       18
 3.1.2018    2     1300       11"

data <- read.table(text = rawData,header=TRUE)
sysidName <- paste0("sys",data$sysid)

splitData <- split(data,sysidName)

splitData

...and the output:

> splitData
$`sys1`
      Date sysid power temperature
1 1.1.2018     1  1000          14
2 2.1.2018     1  1200          16
3 3.1.2018     1   800          18

$sys2
      Date sysid power temperature
4 1.1.2018     2  1500           8
5 2.1.2018     2   800          18
6 3.1.2018     2  1300          11

>

At this point one can access individual data frames in the list by using the $ form of the extract operator:

> splitData$sys1
      Date sysid power temperature
1 1.1.2018     1  1000          14
2 2.1.2018     1  1200          16
3 3.1.2018     1   800          18
>

Also, by using the names() function one can obtain a vector of all the named elements in the list of data frames.

> names(splitData)
[1] "sys1" "sys2"
>

Reiterating the main point from the top of the answer: when split() is used with a data frame, the resulting list contains objects of class data.frame. For example:

> str(splitData["sys1"])
List of 1
 $ sys1:'data.frame':   3 obs. of  4 variables:
  ..$ Date       : Factor w/ 3 levels "1.1.2018","2.1.2018",..: 1 2 3
  ..$ sysid      : int [1:3] 1 1 1
  ..$ power      : int [1:3] 1000 1200 800
  ..$ temperature: int [1:3] 14 16 18
>

If you must use a `for()` loop...

Since the OP asked whether the problem could be solved with a for() loop, the answer is "yes."

# create a vector containing unique values of sysid
ids <- unique(data$sysid)
# initialize output data frame list 
dfList <- list() 
# loop thru unique values and generate named data frames in list() 
for(i in ids){
     dfname <- paste0("sys",i)
     dfList[[dfname]] <- data[data$sysid == i,]
}
dfList

...and the output:

> for(i in ids){
+      dfname <- paste0("sys",i)
+      dfList[[dfname]] <- data[data$sysid == i,]
+ }
> dfList
$`sys1`
      Date sysid power temperature
1 1.1.2018     1  1000          14
2 2.1.2018     1  1200          16
3 3.1.2018     1   800          18

$sys2
      Date sysid power temperature
4 1.1.2018     2  1500           8
5 2.1.2018     2   800          18
6 3.1.2018     2  1300          11

Choosing the "best" answer

Between split(), for() and the other answer using by(), how do we choose the best answer?

One way is to determine which version runs fastest, given that the real data will be much larger than the sample data from the original post.

We can use the microbenchmark package to compare the performance of the three different approaches.

`split()` performance

> library(microbenchmark)
> microbenchmark(splitData <- split(data,sysidName),unit="us")
Unit: microseconds
                                expr     min      lq     mean   median       uq     max neval
 splitData <- split(data, sysidName) 144.594 147.359 185.7987 150.1245 170.4705 615.507   100
>

`for()` performance

> microbenchmark(for(i in ids){
+      dfname <- paste0("sys",i)
+      dfList[[dfname]] <- data[data$sysid == i,]
+ },unit="us")
Unit: microseconds
                                                                                              expr      min       lq     mean
 for (i in ids) {     dfname <- paste0("sys", i)     dfList[[dfname]] <- data[data$sysid == i, ] } 2643.755 2857.286 3457.642
   median       uq      max neval
 3099.064 3479.311 8511.609   100
>

`by()` performance

> microbenchmark(df_list <- by(df, df$sysid, function(unique) unique),unit="us")
Unit: microseconds
                                                 expr     min       lq     mean   median       uq      max neval
 df_list <- by(df, df$sysid, function(unique) unique) 256.791 260.5445 304.9296 275.9515 309.5325 1218.372   100
>

...and the winner is:

split(), with a mean runtime of 186 microseconds, versus 305 microseconds for by() and a whopping 3,458 microseconds for the for() loop approach (measured on the six-row sample data).
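Because those timings come from the six-row sample, it may be worth re-checking on data closer to the real size. A rough sketch with made-up data (100,000 rows, 50 ids) that first confirms split() and the for() loop return the same data frames, then times each with base R's system.time(); timings vary by machine, so no results are shown:

```r
set.seed(1)
n <- 1e5
big <- data.frame(sysid = sample(1:50, n, replace = TRUE),
                  power = runif(n, 500, 1500))

via_split <- split(big, paste0("sys", big$sysid))

via_for <- list()
for (i in unique(big$sysid)) {
  via_for[[paste0("sys", i)]] <- big[big$sysid == i, ]
}

# Same elements, possibly in a different order: split() orders names as sorted
# factor levels ("sys1", "sys10", ...), the loop in order of first appearance
same_names <- setequal(names(via_split), names(via_for))
same_rows <- all(vapply(names(via_split), function(nm) {
  isTRUE(all.equal(via_split[[nm]], via_for[[nm]]))
}, logical(1)))
c(same_names = same_names, same_rows = same_rows)

# Time each approach once on the larger data
system.time(split(big, paste0("sys", big$sysid)))
system.time({
  l <- list()
  for (i in unique(big$sysid)) l[[paste0("sys", i)]] <- big[big$sysid == i, ]
})
```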
