Create Multiple Dataframes in Loop

Create multiple dataframes in loop

You can do this with exec (but use exec with extreme caution if this is going to be public-facing code):

for c in companies:
    exec('{} = pd.DataFrame()'.format(c))
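If the goal is just one empty dataframe per company, a dictionary keyed by name may avoid exec altogether. This is a minimal sketch, assuming companies is a list of strings; the names below are made up for illustration and the dictionary name frames is hypothetical.

```python
import pandas as pd

# Hypothetical company names; the original question does not show the list.
companies = ["acme", "globex", "initech"]

# One empty dataframe per company, looked up by key instead of by variable name.
frames = {name: pd.DataFrame() for name in companies}

print(sorted(frames))        # the names available as keys
print(frames["acme"].empty)  # each value is its own empty dataframe
```

Nothing is evaluated as code here, so an unexpected company name cannot execute anything.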

Create multiple dataframes inside a for loop - pandas

bases = [base1, base1_a, base1_b]
for base in bases:
    base = base.groupby(["Cliente_id", "Periodo_id"]).agg({"monto_2018": ["sum", "mean", "min", "max"]}).reset_index().round(1)
    base.columns = base.columns.get_level_values(1)
    base = base.set_axis(["Cliente_id", "Periodo_id", 'sum_v', 'mean_v', 'min_v', "max_v"], axis=1)  # the inplace argument was removed in pandas 2.0

When you assign base = base.groupby() in the loop, you are reassigning the variable base to refer to a new dataframe which is the result of the groupby() action. This does NOT modify the original bases list.
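The rebinding behaviour can be seen on a tiny example. This is a sketch with made-up data; the names first, second and frames are illustrative only.

```python
import pandas as pd

# Two tiny stand-in frames; the data is made up for illustration.
first = pd.DataFrame({"a": [1, 2]})
second = pd.DataFrame({"a": [3, 4]})
frames = [first, second]

for frame in frames:
    # Rebinds the local name `frame` to a new object; the list is untouched.
    frame = frame * 10

unchanged = frames[0]["a"].tolist()
print(unchanged)  # still the original values

# Assigning by position is what actually replaces the list entries.
for i, frame in enumerate(frames):
    frames[i] = frame * 10

print(frames[0]["a"].tolist())
```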

To get a list of your new data frames, you should just create a new list:

bases = [base1, base1_a, base1_b]
results = []  # create an empty list
for base in bases:
    base = base.groupby(["Cliente_id", "Periodo_id"]).agg({"monto_2018": ["sum", "mean", "min", "max"]}).reset_index().round(1)
    base.columns = base.columns.get_level_values(1)
    base = base.set_axis(["Cliente_id", "Periodo_id", 'sum_v', 'mean_v', 'min_v', "max_v"], axis=1)  # the inplace argument was removed in pandas 2.0
    results.append(base)  # add the dataframe to the new list

I chose the name results here as a generic name. I strongly encourage you to use a different name that is more descriptive for what you are doing.
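As a possible simplification, pandas named aggregation can produce the flat column names directly, which would make the get_level_values and set_axis steps unnecessary. The sketch below assumes the column names from the question; the sample values and the helper name summarise are made up.

```python
import pandas as pd

# Made-up sample using the column names from the question.
base1 = pd.DataFrame({
    "Cliente_id": [1, 1, 2],
    "Periodo_id": [201801, 201801, 201801],
    "monto_2018": [10.0, 20.0, 5.0],
})

def summarise(base):
    # Named aggregation yields flat column names directly,
    # so no MultiIndex flattening or set_axis call is needed.
    return (
        base.groupby(["Cliente_id", "Periodo_id"])
        .agg(
            sum_v=("monto_2018", "sum"),
            mean_v=("monto_2018", "mean"),
            min_v=("monto_2018", "min"),
            max_v=("monto_2018", "max"),
        )
        .reset_index()
        .round(1)
    )

# One summary per input dataframe, collected in a new list.
summaries = [summarise(base) for base in [base1]]
print(summaries[0])
```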

Create Multiple Dataframes using Loop & function

If I understand correctly, the following achieves what you want.

import pandas as pd
import numpy as np

# source data for the dataframe
data = {
"ID":["x","y","z","x","y","z","x","y","a","b","x"],
"Date":["May 01","May 02","May 04","May 01","May 01","May 02","May 01","May 05","May 06","May 08","May 10"],
"Amount":[10,20,30,40,50,60,70,80,90,100,110]
}

df = pd.DataFrame(data)

# convert the Date column to datetime and still maintain the format like "May 01"
df['Date'] = pd.to_datetime(df['Date'], format='%b %d').dt.strftime('%b %d')

# sort the values on ID and Date
df.sort_values(by=['ID', 'Date'], inplace=True)
df.reset_index(inplace=True, drop=True)

print(df)

Original Dataframe:

    Amount    Date ID
0       90  May 06  a
1      100  May 08  b
2       10  May 01  x
3       40  May 01  x
4       70  May 01  x
5      110  May 10  x
6       50  May 01  y
7       20  May 02  y
8       80  May 05  y
9       60  May 02  z
10      30  May 04  z

# create a list of unique ids
list_id = sorted(list(set(df['ID'])))

# create an empty list that would contain dataframes
df_list = []

# number of transactions per id to separate out.
# For example, to record 3 entries for each id,
# set iter to 3. This creates three new dataframes
# that hold the first, second, and third
# transactions respectively.
iter = 3
for i in range(iter):
    df_list.append(pd.DataFrame())

for val in list_id:
    tmp_df = df.loc[df['ID'] == val].reset_index(drop=True)

    # consider only the top iter(=3) values to be distributed
    counter = np.minimum(tmp_df.shape[0], iter)
    for idx in range(counter):
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        df_list[idx] = pd.concat([df_list[idx], tmp_df.loc[tmp_df.index == idx]])

for df in df_list:
    df.reset_index(drop=True, inplace=True)
    print(df)

Transaction #1:

   Amount    Date ID
0      90  May 06  a
1     100  May 08  b
2      10  May 01  x
3      50  May 01  y
4      60  May 02  z

Transaction #2:

   Amount    Date ID
0      40  May 01  x
1      20  May 02  y
2      30  May 04  z

Transaction #3:

   Amount    Date ID
0      70  May 01  x
1      80  May 05  y

Note that in your data there are four transactions for 'x'. If you also want to track the fourth transaction, all you need to do is change the value of 'iter' to 4, and you will get a fourth dataframe with the following value:

   Amount    Date ID
0     110  May 10  x
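An alternative that does not need a fixed count is to number each ID's rows with groupby().cumcount() and split on that number. This is a sketch of a different technique from the loop above, using the same sorted sample data; the names occurrence and part are illustrative.

```python
import pandas as pd

# The sorted sample data from above, typed in directly.
df = pd.DataFrame({
    "ID": ["a", "b", "x", "x", "x", "x", "y", "y", "y", "z", "z"],
    "Date": ["May 06", "May 08", "May 01", "May 01", "May 01", "May 10",
             "May 01", "May 02", "May 05", "May 02", "May 04"],
    "Amount": [90, 100, 10, 40, 70, 110, 50, 20, 80, 60, 30],
})

# 0 for each ID's first row, 1 for its second row, and so on.
occurrence = df.groupby("ID").cumcount()

# One dataframe per occurrence number, however many there turn out to be.
df_list = [
    part.reset_index(drop=True)
    for _, part in df.groupby(occurrence)
]

for part in df_list:
    print(part)
```

With this data the list ends up with four dataframes, the last one holding only the fourth 'x' transaction.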

Creating multiple dataframes using for loop with pandas

Use a container to store your dataframes, like a dictionary:

my_dfs = {}
for x in ['A', 'B', 'C']:
    my_dfs[x] = pd.read_csv(r"C:\HSTS\OB\ODO\%s\test.csv" % x, delimiter=';')

Then access the dataframes per key:

my_dfs['A']

It is a much better practice than having many variables floating around. You can then easily access your dataframes programmatically for downstream processing.
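To illustrate what programmatic access can look like, here is a small sketch. It assumes in-memory text in place of the CSV files (the contents are made up), and the names sources, row_counts and combined are hypothetical.

```python
import io
import pandas as pd

# In-memory stand-ins for the CSV files; the contents are made up.
sources = {
    "A": "id;value\n1;10\n2;20\n",
    "B": "id;value\n1;5\n",
    "C": "id;value\n1;7\n2;8\n3;9\n",
}

my_dfs = {
    key: pd.read_csv(io.StringIO(text), delimiter=";")
    for key, text in sources.items()
}

# Loop over every dataframe without naming each one by hand.
row_counts = {key: len(frame) for key, frame in my_dfs.items()}
print(row_counts)

# Tag each frame with its key and stack them into one table.
combined = pd.concat(my_dfs, names=["source"]).reset_index(level="source")
print(combined)
```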

Create multiple dataframes with a loop in Python

Here is one approach based on the code given. You should refrain from using it in practice, as it contains redundant code, which makes it hard to maintain. You'll find a more flexible approach below.

Based on your solution

import investpy
import pandas as pd

def _get_asset_data(ticker, country, state=False):
    return investpy.stocks.get_stock_recent_data(ticker, country, state)

df_1 = _get_asset_data('Eco','Colombia')
df_2 = _get_asset_data('JPM','United States')
df_3 = _get_asset_data('TSM','United States')
df_5 = _get_asset_data('CSCO','United States')
df_8 = _get_asset_data('NVDA','United States')
df_9 = _get_asset_data('BLK','United States')

final = pd.concat([df_1, df_2, df_3, df_5, df_8, df_9], axis=1)
final

More versatile solution:

import investpy
import pandas as pd

def _get_asset_data(ticker, country, state=False):
    return investpy.stocks.get_stock_recent_data(ticker, country, state)

stocks = [
    ('Eco', 'Colombia'),
    ('JPM', 'United States'),
    ('TSM', 'United States'),
    ('CSCO', 'United States'),
    ('NVDA', 'United States'),
    ('BLK', 'United States'),
    ]

results = []

for stock in stocks:
    result = _get_asset_data(*stock)
    results.append(result)

final = pd.concat(results, axis=1)
final
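One possible refinement: with axis=1, columns from different stocks share the same names, so passing keys= to pd.concat can label them by ticker. The sketch below uses a hypothetical stand-in function, fake_fetch, instead of investpy so that it runs offline; the prices are made up.

```python
import pandas as pd

def fake_fetch(ticker, country):
    # Hypothetical stand-in for _get_asset_data so the sketch runs offline;
    # it returns a tiny frame with made-up prices.
    return pd.DataFrame(
        {"Close": [1.0, 2.0]},
        index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
    )

stocks = [("Eco", "Colombia"), ("JPM", "United States")]

# keys= adds the ticker as an outer column level, so same-named
# columns from different stocks stay distinguishable after axis=1.
final = pd.concat(
    [fake_fetch(*stock) for stock in stocks],
    axis=1,
    keys=[ticker for ticker, _ in stocks],
)
print(final.columns.tolist())
```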

Creating multiple dataframes with a loop

I think your code is not doing what you expect it to do.

Specifically, this line: df = pd.read_csv(file)

You might expect that on each iteration of the for loop this line is rewritten, with df replaced by a string from dfs and file replaced by a filename from files. The latter is true; the former is not.

Each iteration of the for loop reads a csv file and stores it in the variable df, overwriting the dataframe that was read in during the previous iteration. In other words, df in your for loop is not being replaced with the variable names you defined in dfs.

The key takeaway here is that strings (e.g., 'df1', 'df2', etc.) cannot be substituted and used as variable names when executing code.

One way to achieve the result you want is to store each csv file read by pd.read_csv() in a dictionary, where the key is the name of the dataframe (e.g., 'df1', 'df2', etc.) and the value is the dataframe returned by pd.read_csv().

list_of_dfs = {}
for df, file in zip(dfs, files):
    list_of_dfs[df] = pd.read_csv(file)
    print(list_of_dfs[df].shape)
    print(list_of_dfs[df].dtypes)
    print(list(list_of_dfs[df]))

You can then reference each of your dataframes like this:

print(list_of_dfs['df1'])
print(list_of_dfs['df2'])
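If the keys can simply follow the file names, the separate list of names is not needed. This is a sketch under that assumption: it writes two made-up CSV files to a temporary folder, and the name dfs_by_name is hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write two small CSV files to a temporary folder (made-up data).
folder = Path(tempfile.mkdtemp())
(folder / "df1.csv").write_text("a,b\n1,2\n")
(folder / "df2.csv").write_text("a,b\n3,4\n5,6\n")

# Use each file's stem ("df1", "df2") as the dictionary key,
# so no separate list of names has to be kept in sync.
dfs_by_name = {
    path.stem: pd.read_csv(path)
    for path in sorted(folder.glob("*.csv"))
}

print(dfs_by_name["df2"].shape)
```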

You can learn more about dictionaries here:

https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries

How do I create multiple dataframes from a result in a for loop in R?

Don't do it in a loop! It is done completely differently. I'll show you step by step.
My first step is to prepare a function that will generate data similar to yours.

library(tidyverse)

dens = function(year, n) tibble(
  PLOT = paste("HI", sample(1:(n/7), n, replace = T)),
  SIZE = runif(n, 0.1, 3), 
  DENSITY = sample(seq(50,200, by=50), n, replace = T),
  SEEDYR = year-1,
  SAMPYR = year,
  AGE = sample(1:5, n, replace = T),
  SHOOTS = runif(n, 0.1, 3)
)

Let's see how it works and generate some sample data frames

set.seed(123)
density.2007 = dens(2007, 120)
density.2008 = dens(2008, 88)
density.2009 = dens(2009, 135)
density.2010 = dens(2010, 156)

The density.2007 data frame looks like this

# A tibble: 120 x 7
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>
 1 HI 15 1.67      200   2006   2007     4  1.80 
 2 HI 14 0.270     150   2006   2007     2  2.44 
 3 HI 3  0.856      50   2006   2007     3  0.686
 4 HI 10 1.25      200   2006   2007     5  1.43 
 5 HI 11 0.673      50   2006   2007     5  1.40 
 6 HI 5  2.51      150   2006   2007     3  2.23 
 7 HI 14 0.543     150   2006   2007     2  2.17 
 8 HI 5  2.43      200   2006   2007     5  2.51 
 9 HI 9  1.69      100   2006   2007     4  2.67 
10 HI 3  2.02       50   2006   2007     2  2.86 
# ... with 110 more rows

Now they need to be combined into one frame

df = density.2007 %>% 
  bind_rows(density.2008) %>% 
  bind_rows(density.2009) %>% 
  bind_rows(density.2010)

output

# A tibble: 499 x 7
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>
 1 HI 15 1.67      200   2006   2007     4  1.80 
 2 HI 14 0.270     150   2006   2007     2  2.44 
 3 HI 3  0.856      50   2006   2007     3  0.686
 4 HI 10 1.25      200   2006   2007     5  1.43 
 5 HI 11 0.673      50   2006   2007     5  1.40 
 6 HI 5  2.51      150   2006   2007     3  2.23 
 7 HI 14 0.543     150   2006   2007     2  2.17 
 8 HI 5  2.43      200   2006   2007     5  2.51 
 9 HI 9  1.69      100   2006   2007     4  2.67 
10 HI 3  2.02       50   2006   2007     2  2.86 
# ... with 489 more rows

In the next step, count how many times each value of the PLOT variable occurs

PLOT.count = df %>% 
  group_by(PLOT) %>% 
  summarise(PLOT.n = n()) %>% 
  arrange(PLOT.n)

output

# A tibble: 22 x 2
   PLOT  PLOT.n
   <chr>  <int>
 1 HI 20      3
 2 HI 22      5
 3 HI 21      7
 4 HI 18     12
 5 HI 2      19
 6 HI 1      20
 7 HI 15     20
 8 HI 17     21
 9 HI 6      22
10 HI 11     23
# ... with 12 more rows

In the penultimate step, let's append these counters to the original data frame

df = df %>% left_join(PLOT.count, by="PLOT")

output

# A tibble: 499 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 15 1.67      200   2006   2007     4  1.80      20
 2 HI 14 0.270     150   2006   2007     2  2.44      32
 3 HI 3  0.856      50   2006   2007     3  0.686     27
 4 HI 10 1.25      200   2006   2007     5  1.43      25
 5 HI 11 0.673      50   2006   2007     5  1.40      23
 6 HI 5  2.51      150   2006   2007     3  2.23      38
 7 HI 14 0.543     150   2006   2007     2  2.17      32
 8 HI 5  2.43      200   2006   2007     5  2.51      38
 9 HI 9  1.69      100   2006   2007     4  2.67      26
10 HI 3  2.02       50   2006   2007     2  2.86      27
# ... with 489 more rows

Now filter it at will

df %>% filter(PLOT.n > 30)

output

# A tibble: 139 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 14 0.270     150   2006   2007     2  2.44      32
 2 HI 5  2.51      150   2006   2007     3  2.23      38
 3 HI 14 0.543     150   2006   2007     2  2.17      32
 4 HI 5  2.43      200   2006   2007     5  2.51      38
 5 HI 8  0.598      50   2006   2007     1  1.70      34
 6 HI 7  1.94       50   2006   2007     4  1.61      35
 7 HI 14 2.91       50   2006   2007     4  0.215     32
 8 HI 7  0.846     150   2006   2007     4  0.506     35
 9 HI 7  2.38      150   2006   2007     3  1.34      35
10 HI 7  2.62      100   2006   2007     3  0.167     35
# ... with 129 more rows

Or this way

df %>% filter(PLOT.n == min(PLOT.n))
df %>% filter(PLOT.n == median(PLOT.n))
df %>% filter(PLOT.n == max(PLOT.n))

output

# A tibble: 3 x 8
  PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
  <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
1 HI 20 0.392     200   2009   2010     1  0.512      3
2 HI 20 0.859     150   2009   2010     5  2.62       3
3 HI 20 0.882     200   2009   2010     5  1.06       3
> df %>% filter(PLOT.n == median(PLOT.n))
# A tibble: 26 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 9  1.69      100   2006   2007     4  2.67      26
 2 HI 9  2.20       50   2006   2007     4  1.49      26
 3 HI 9  0.587     200   2006   2007     3  1.13      26
 4 HI 9  1.27       50   2006   2007     1  2.55      26
 5 HI 9  1.56      150   2006   2007     3  2.01      26
 6 HI 9  0.198     100   2006   2007     3  2.08      26
 7 HI 9  2.72      150   2007   2008     3  0.421     26
 8 HI 9  0.251     200   2007   2008     2  0.328     26
 9 HI 9  1.83       50   2007   2008     1  0.192     26
10 HI 9  1.97      100   2007   2008     1  0.900     26
# ... with 16 more rows
> df %>% filter(PLOT.n == max(PLOT.n))
# A tibble: 38 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 5  2.51      150   2006   2007     3   2.23     38
 2 HI 5  2.43      200   2006   2007     5   2.51     38
 3 HI 5  2.06      100   2006   2007     5   1.93     38
 4 HI 5  1.25      150   2006   2007     4   2.29     38
 5 HI 5  2.29      200   2006   2007     1   2.97     38
 6 HI 5  0.789     150   2006   2007     2   1.59     38
 7 HI 5  1.11      100   2007   2008     4   2.61     38
 8 HI 5  2.38      150   2007   2008     4   2.95     38
 9 HI 5  2.67      200   2007   2008     3   1.77     38
10 HI 5  2.63      100   2007   2008     1   1.90     38
# ... with 28 more rows
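For readers working in pandas rather than R, the same count-then-filter idea might be sketched as follows. The data is made up and much smaller than the tibbles above, and the column name PLOT_n is an illustrative stand-in for PLOT.n.

```python
import pandas as pd

# Made-up data; only two of the columns from the R example are kept.
df = pd.DataFrame({
    "PLOT": ["HI 1", "HI 1", "HI 2", "HI 3", "HI 3", "HI 3"],
    "SHOOTS": [0.5, 1.2, 2.0, 0.7, 1.9, 2.4],
})

# Per-row count of how often that row's PLOT value occurs.
df["PLOT_n"] = df.groupby("PLOT")["PLOT"].transform("size")

# Keep only the rows belonging to the most frequent plot.
largest = df[df["PLOT_n"] == df["PLOT_n"].max()]
print(largest)
```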
