Create a Pandas Dataframe by appending one row at a time
You can use df.loc[i]: assigning to it sets the row with index label i to the values you specify, creating that row if it does not exist yet.
>>> import pandas as pd
>>> from numpy.random import randint
>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
...     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))
...
>>> df
     lib  qty1  qty2
0  name0     3     3
1  name1     2     4
2  name2     2     8
3  name3     2     1
4  name4     9     6
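A small deterministic sketch of the same idea, assuming a default integer index: df.loc[len(df)] adds a row under the next unused label, while assigning to a label that already exists overwrites that row instead of appending. The column values here are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])

# With a default 0..n-1 index, len(df) is the next unused label,
# so each assignment enlarges the frame by one row.
for i in range(3):
    df.loc[len(df)] = ['name' + str(i), i, i * 10]

# Assigning to an existing label replaces that row; no row is added.
df.loc[1] = ['replaced', -1, -1]

print(df)
```

This is convenient for a handful of rows, but each enlargement may copy the existing data, so it does not scale well to large loops.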
Appending row to dataframe with concat()
You can convert each dict into a one-row pandas DataFrame and concatenate it:
import pandas as pd
df = pd.DataFrame(columns=['Name', 'Weight', 'Sample'])
for key in my_dict:
    ...
    # transform your dict into a one-row DataFrame
    new_df = pd.DataFrame([row])
    df = pd.concat([df, new_df], axis=0, ignore_index=True)
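As a concrete sketch of this approach: the source elides how each row is built, so the contents of my_dict and the layout of row below are hypothetical. Each entry becomes a one-row DataFrame, and here the pieces are concatenated in a single call after the loop rather than on every iteration.

```python
import pandas as pd

# Hypothetical input: name -> (weight, sample id).
my_dict = {
    'alice': (61.5, 'S1'),
    'bob': (82.0, 'S2'),
    'carol': (70.3, 'S3'),
}

frames = []
for key, (weight, sample) in my_dict.items():
    # Build one row as a dict, then wrap it in a list to get a one-row DataFrame.
    row = {'Name': key, 'Weight': weight, 'Sample': sample}
    frames.append(pd.DataFrame([row]))

# ignore_index=True renumbers the result 0..n-1 instead of repeating label 0.
df = pd.concat(frames, axis=0, ignore_index=True)
print(df)
```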
How do I append all the rows in my pandas dataframe into one big row?
Pandas' unstack function will achieve this. This will provide a multi-index for the column names; you can use to_flat_index if you only want a single-level index:
import pandas as pd

df = pd.DataFrame(
    columns=['index', 'ticker', 'date', 'time', 'vol', 'open', 'close', 'high', 'low'],
    data=[
        [0, 'AAPL', '2022-01-06', '09:00', 121611, 174.78, 174.00, 175.08, 173.76],
        [1, 'AAPL', '2022-01-06', '10:00', 83471, 174.11, 173.89, 174.64, 173.88],
        [2, 'AAPL', '2022-01-06', '11:00', 76327, 173.99, 173.55, 174.25, 173.16],
        [3, 'AAPL', '2022-01-06', '12:00', 83471, 174.11, 173.89, 174.64, 173.88],
    ],
)
# one row per (ticker, date); 'time' moves into the column labels
df.set_index(['ticker','date','time'])[['open','close']].unstack()
Ideally if you can post your questions with some minimum working code it makes it much easier to replicate :)
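For the single-level variant mentioned in the answer, a minimal sketch on a trimmed-down copy of the same data (two rows, 'open' and 'close' only). Joining the two levels with an underscore is just one possible naming choice, not something the original answer prescribes.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['ticker', 'date', 'time', 'open', 'close'],
    data=[
        ['AAPL', '2022-01-06', '09:00', 174.78, 174.00],
        ['AAPL', '2022-01-06', '10:00', 174.11, 173.89],
    ],
)

# unstack() moves the innermost index level ('time') into the columns,
# leaving one row per (ticker, date) and MultiIndex columns like ('open', '09:00').
wide = df.set_index(['ticker', 'date', 'time'])[['open', 'close']].unstack()

# to_flat_index() turns the MultiIndex into an Index of tuples,
# which can then be joined into plain string labels.
wide.columns = ['_'.join(col) for col in wide.columns.to_flat_index()]
print(wide)
```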
How to append rows in a pandas dataframe in a for loop?
Suppose your data looks like this:
import pandas as pd
import numpy as np
np.random.seed(2015)
df = pd.DataFrame([])
for i in range(5):
    data = dict(zip(np.random.choice(10, replace=False, size=5),
                    np.random.randint(10, size=5)))
    data = pd.DataFrame(data.items())
    data = data.transpose()
    data.columns = data.iloc[0]
    data = data.drop(data.index[[0]])
    # DataFrame.append was removed in pandas 2.0;
    # on newer versions write df = pd.concat([df, data]) instead
    df = df.append(data)
print('{}\n'.format(df))
# 0    0    1    2    3    4    5    6    7    8    9
# 1    6  NaN  NaN    8    5  NaN  NaN    7    0  NaN
# 1  NaN    9    6  NaN    2  NaN    1  NaN  NaN    2
# 1  NaN    2    2    1    2  NaN    1  NaN  NaN  NaN
# 1    6  NaN    6  NaN    4    4    0  NaN  NaN  NaN
# 1  NaN    9  NaN    9  NaN    7    1    9  NaN  NaN
Then it could be replaced with
np.random.seed(2015)
data = []
for i in range(5):
    data.append(dict(zip(np.random.choice(10, replace=False, size=5),
                         np.random.randint(10, size=5))))
df = pd.DataFrame(data)
print(df)
In other words, do not form a new DataFrame for each row. Instead, collect all the data in a list of dicts, and then call df = pd.DataFrame(data) once at the end, outside the loop.
Each call to df.append requires allocating space for a new DataFrame with one extra row, copying all the data from the original DataFrame into the new DataFrame, and then copying data into the new row. All that allocation and copying makes calling df.append in a loop very inefficient. The time cost of copying grows quadratically with the number of rows. Not only is the call-DataFrame-once code easier to write, its performance will be much better: the time cost of copying grows linearly with the number of rows.
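A small sketch of the two patterns side by side. Because DataFrame.append is no longer available in recent pandas, the per-row version below uses pd.concat inside the loop, which has the same copy-everything-each-time cost. The records are made up; keys missing from a record simply become NaN.

```python
import pandas as pd

# Made-up records with differing keys.
records = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}, {'a': 5, 'c': 6}]

# Pattern to avoid: grow the DataFrame inside the loop.
# Every iteration copies all rows accumulated so far.
slow = pd.DataFrame([records[0]])
for rec in records[1:]:
    slow = pd.concat([slow, pd.DataFrame([rec])], ignore_index=True)

# Preferred: collect plain dicts in a list, then build the DataFrame once.
rows = []
for rec in records:
    rows.append(rec)
fast = pd.DataFrame(rows)

print(fast)
```

With three rows the difference is invisible; it only starts to matter once the loop runs thousands of times.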
How to append rows with concat to a Pandas DataFrame
Create a one-row DataFrame, then concat:
insert_row = {
    "Date": '2022-03-20',
    "Index": 1,
    "Change": -2,
}
df = pd.concat([df, pd.DataFrame([insert_row])])
print(df)
# Output
         Date  Index  Change
0  2022-03-20    1.0    -2.0
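Note that without ignore_index=True the appended row keeps its own label 0, which can collide with a label already present in df. A short sketch, assuming df already holds one row (the existing row's values are made up):

```python
import pandas as pd

# Hypothetical existing data.
df = pd.DataFrame([{"Date": '2022-03-19', "Index": 3, "Change": 1}])

insert_row = {
    "Date": '2022-03-20',
    "Index": 1,
    "Change": -2,
}
new_row = pd.DataFrame([insert_row])

kept = pd.concat([df, new_row])                           # index labels: 0, 0
renumbered = pd.concat([df, new_row], ignore_index=True)  # index labels: 0, 1

print(kept.index.tolist())
print(renumbered.index.tolist())
```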
Efficiency of pandas dataframe append
DataFrame.append (deprecated in pandas 1.4 and removed in 2.0) is slow because every call creates an entirely new DataFrame and copies all existing rows into it.
If you just want to optimize a row-by-row loop, append each row to a plain list instead of to the DataFrame (appending to a list is fast), then create the DataFrame once outside the loop, passing it the list of data.
Similarly, if you need to combine many DataFrames, it is fastest to do so with a single call to pd.concat rather than many calls to DataFrame.append.
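For the many-DataFrames case, a minimal sketch: the chunks here are generated in a loop purely for illustration (in practice they might come from files or API pages), collected in a list, and combined with one pd.concat call.

```python
import pandas as pd

# Illustrative chunks; each stands in for a DataFrame obtained elsewhere.
chunks = []
for part in range(3):
    chunks.append(pd.DataFrame({
        'part': [part, part],
        'value': [part * 10, part * 10 + 1],
    }))

# One concat call copies each chunk once, instead of re-copying
# the growing result on every iteration.
combined = pd.concat(chunks, ignore_index=True)
print(combined)
```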
Append rows to a dataframe efficiently
I think iterrows is not necessary here; you can use:
def f(x):
    x['Timestamp'] = ...
    ....
    return x
df1 = df.groupby('Timestamp').apply(f)
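EDIT: Create a counter Series with GroupBy.cumcount, multiply it by the step, and add it to Timestamp:
import numpy as np
# if necessary, convert Timestamp to integers first
df['Timestamp'] = df['Timestamp'].astype(np.int64)
# seconds -> milliseconds, plus 30 ms for each repeat of the same second
df['Timestamp'] = df['Timestamp'] * 1000 + df.groupby('Timestamp').cumcount() * 30
print(df)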
        Timestamp  value
0   1642847484000     11
1   1642847484030     10
2   1642847484060     14
3   1642847484090     20
4   1642847487000      3
5   1642847487030      2
6   1642847487060      9
7   1642847487090     48
8   1642847487120      5
9   1642847487150     20
10  1642847487180     12
11  1642847487210     20
12  1642847489000     56
13  1642847489030     12
14  1642847489060      8
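A self-contained sketch of the cumcount step, on a shortened version of the data above. It assumes Timestamp holds integer seconds and that rows sharing a second should be spaced 30 ms apart, as in the output shown.

```python
import pandas as pd

df = pd.DataFrame({
    'Timestamp': [1642847484, 1642847484, 1642847484, 1642847487, 1642847487],
    'value': [11, 10, 14, 3, 2],
})

# cumcount() numbers the rows within each Timestamp group: 0, 1, 2, ...
offset_ms = df.groupby('Timestamp').cumcount() * 30

# Seconds -> milliseconds, then shift each duplicate by its offset.
df['Timestamp'] = df['Timestamp'] * 1000 + offset_ms
print(df)
```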
Python: Appending a row into all rows in a dataframe
You can use:
dff[dff2.columns] = dff2.squeeze()
print(dff)
# Output
    WA   WB   WC  stv_A  stv_B  stv_c
0  0.4  0.2  0.4    0.5    0.2    0.4
1  0.1  0.3  0.6    0.5    0.2    0.4
2  0.3  0.2  0.5    0.5    0.2    0.4
3  0.3  0.3  0.4    0.5    0.2    0.4
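An alternative sketch, not the method from the answer above: a cross join via merge(how='cross') (available in pandas 1.2 and later) pairs every row of dff with the single row of dff2. The input frames are reconstructed from the output shown, so treat them as an assumption about the original data.

```python
import pandas as pd

dff = pd.DataFrame({
    'WA': [0.4, 0.1, 0.3, 0.3],
    'WB': [0.2, 0.3, 0.2, 0.3],
    'WC': [0.4, 0.6, 0.5, 0.4],
})
dff2 = pd.DataFrame({'stv_A': [0.5], 'stv_B': [0.2], 'stv_c': [0.4]})

# Cross join: every row of dff is combined with every row of dff2.
# Because dff2 has exactly one row, the result keeps dff's four rows.
out = dff.merge(dff2, how='cross')
print(out)
```

Unlike the in-place assignment, this returns a new DataFrame and leaves dff unchanged.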