Create a Pandas Dataframe by Appending One Row At a Time

You can assign to df.loc[i]: the row with index label i is set to the values you supply, and the row is added if that label does not exist yet.

>>> import pandas as pd
>>> from numpy.random import randint

>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
...     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))

>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6
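
When no loop counter is at hand, the same assignment can use len(df) as the next label. This is only a sketch: it assumes the dataframe keeps a default 0, 1, 2, ... integer index, and the rows list is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])

# Hypothetical rows, used only for illustration.
rows = [['name0', 3, 3], ['name1', 2, 4], ['name2', 2, 8]]

for row in rows:
    # With a 0, 1, 2, ... index, len(df) is the next unused label,
    # so this assignment adds a new row instead of overwriting one.
    df.loc[len(df)] = row

print(df)
```

If the index has gaps or non-integer labels, len(df) may collide with an existing label and overwrite that row, so this shortcut is best kept to freshly created dataframes.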

Appending a row to a dataframe with concat()

You can turn each entry of your dict into a one-row pandas DataFrame and concatenate it onto the existing one:

import pandas as pd
df = pd.DataFrame(columns=['Name', 'Weight', 'Sample'])
for key in my_dict:
  ...
  # transform your dict entry into a one-row DataFrame
  new_df = pd.DataFrame([row])
  df = pd.concat([df, new_df], axis=0, ignore_index=True)
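
The snippet above leaves out how each row is built, so here is one possible runnable version. The shape of my_dict (a name mapped to its weight and sample) is an assumption made for this sketch, not something the original shows, and the one-row frames are collected first and concatenated once at the end rather than inside the loop.

```python
import pandas as pd

# Hypothetical input: the original does not show my_dict, so this shape is assumed.
my_dict = {
    'a': {'Weight': 1.5, 'Sample': 'x'},
    'b': {'Weight': 2.0, 'Sample': 'y'},
}

frames = []
for key, values in my_dict.items():
    # Build one record per dict entry, then wrap it in a one-row DataFrame.
    row = {'Name': key, **values}
    frames.append(pd.DataFrame([row]))

# A single concat at the end; ignore_index=True renumbers the rows 0..n-1.
df = pd.concat(frames, axis=0, ignore_index=True)
print(df)
```

Concatenating once avoids starting from an empty dataframe and re-copying the accumulated rows on every iteration.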

How do I append all the rows in my pandas dataframe into one big row?

Pandas' unstack function will achieve this. The result has a MultiIndex for the column names; you can call to_flat_index on the columns if you only want a single-level index:

import pandas as pd

df = pd.DataFrame(
    columns=['index', 'ticker', 'date', 'time', 'vol', 'open', 'close', 'high', 'low'],
    data=[
        [0, 'AAPL', '2022-01-06', '09:00', 121611, 174.78, 174.00, 175.08, 173.76],
        [1, 'AAPL', '2022-01-06', '10:00', 83471, 174.11, 173.89, 174.64, 173.88],
        [2, 'AAPL', '2022-01-06', '11:00', 76327, 173.99, 173.55, 174.25, 173.16],
        [3, 'AAPL', '2022-01-06', '12:00', 83471, 174.11, 173.89, 174.64, 173.88],
    ],
)

df.set_index(['ticker','date','time'])[['open','close']].unstack()
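
As a sketch of the flattening step mentioned above, the two-level column names can be collapsed after unstacking. The trimmed-down data and the underscore-joined naming scheme are choices made for this example only.

```python
import pandas as pd

# A cut-down version of the data above: two time slots, two price columns.
df = pd.DataFrame(
    columns=['ticker', 'date', 'time', 'open', 'close'],
    data=[
        ['AAPL', '2022-01-06', '09:00', 174.78, 174.00],
        ['AAPL', '2022-01-06', '10:00', 174.11, 173.89],
    ],
)

# unstack() moves the innermost index level ('time') into the columns.
wide = df.set_index(['ticker', 'date', 'time'])[['open', 'close']].unstack()

# to_flat_index() yields plain tuples such as ('open', '09:00');
# joining each tuple gives single-level string column names.
wide.columns = ['_'.join(col) for col in wide.columns.to_flat_index()]
print(wide)
```

Each (ticker, date) pair ends up as one wide row, with one column per price field and time slot.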

Ideally, post your questions with a minimal working example; it makes the problem much easier to replicate :)

How to append rows in a pandas dataframe in a for loop?

Suppose your data looks like this:

import pandas as pd
import numpy as np

np.random.seed(2015)
df = pd.DataFrame([])
for i in range(5):
    data = dict(zip(np.random.choice(10, replace=False, size=5),
                    np.random.randint(10, size=5)))
    data = pd.DataFrame(data.items())
    data = data.transpose()
    data.columns = data.iloc[0]
    data = data.drop(data.index[[0]])
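    # Note: DataFrame.append was removed in pandas 2.0; on newer versions
    # this line has to be written as df = pd.concat([df, data]).
    df = df.append(data)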
print('{}\n'.format(df))
# 0   0   1   2   3   4   5   6   7   8   9
# 1   6 NaN NaN   8   5 NaN NaN   7   0 NaN
# 1 NaN   9   6 NaN   2 NaN   1 NaN NaN   2
# 1 NaN   2   2   1   2 NaN   1 NaN NaN NaN
# 1   6 NaN   6 NaN   4   4   0 NaN NaN NaN
# 1 NaN   9 NaN   9 NaN   7   1   9 NaN NaN

Then it could be replaced with

np.random.seed(2015)
data = []
for i in range(5):
    data.append(dict(zip(np.random.choice(10, replace=False, size=5),
                         np.random.randint(10, size=5))))
df = pd.DataFrame(data)
print(df)

In other words, do not form a new DataFrame for each row. Instead, collect all the data in a list of dicts, and then call df = pd.DataFrame(data) once at the end, outside the loop.

Each call to df.append requires allocating space for a new DataFrame with one extra row, copying all the data from the original DataFrame into the new DataFrame, and then copying data into the new row. All that allocation and copying makes calling df.append in a loop very inefficient: the time cost of copying grows quadratically with the number of rows. The same applies to calling pd.concat inside the loop. The call-DataFrame-once code is not only easier to write, it also performs much better, because its time cost of copying grows linearly with the number of rows.
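
To make the quadratic versus linear claim concrete, here is a rough cost model that only counts how many existing rows get copied. It is a simplification written for this explanation (the function names are made up, and real timings also depend on allocation and dtype handling), not a benchmark.

```python
def rows_copied_appending(n):
    """Rows copied when a frame is grown one row at a time, n times.

    Before the k-th append the frame already holds k rows (k = 0 .. n-1),
    and all of them are copied into the newly allocated frame.
    """
    return sum(range(n))  # 0 + 1 + ... + (n - 1) = n * (n - 1) / 2


def rows_copied_building_once(n):
    """Rows copied when n records are collected first and the frame is built once."""
    return n


for n in (10, 1000, 100000):
    print(n, rows_copied_appending(n), rows_copied_building_once(n))
```

Under this model, 1000 rows cost 499500 row copies when appended one by one, against 1000 when the DataFrame is built once from the collected list.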