Inserting a new row in pandas dataframe
So, given the following toy dataframe:
import pandas as pd
df = pd.DataFrame(
{
"Prec": {
"2010-01-01": 0.585135,
"2012-12-30": 0.100535,
},
"Tmax": {
"2010-01-01": 3.901162,
"2012-12-30": -3.498832,
},
"Tmin": {
"2010-01-01": -2.057929,
"2012-12-30": -8.125136,
},
"Tmean": {
"2010-01-01": 0.921617,
"2012-12-30": -5.811984,
},
}
)
You can do it like this:
df.index = pd.to_datetime(df.index)
new_row = df.copy()[df.index == "2012-12-30"]
new_row.index = new_row.index + pd.Timedelta(days=1)
df = pd.concat([df, new_row]).sort_index()
print(df)
# Output
                Prec      Tmax      Tmin     Tmean
2010-01-01  0.585135  3.901162 -2.057929  0.921617
2012-12-30  0.100535 -3.498832 -8.125136 -5.811984
2012-12-31  0.100535 -3.498832 -8.125136 -5.811984
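As an alternative to concat, a single row can also be added by assigning to a label that is not yet in the index. The following is only a minimal sketch on a reduced version of the toy dataframe (two columns instead of four); the names last_day and next_day are introduced here for illustration.

```python
import pandas as pd

# Reduced version of the toy dataframe, with the index already parsed as dates.
df = pd.DataFrame(
    {"Prec": [0.585135, 0.100535], "Tmax": [3.901162, -3.498832]},
    index=pd.to_datetime(["2010-01-01", "2012-12-30"]),
)

last_day = df.index.max()
next_day = last_day + pd.Timedelta(days=1)

# Assigning through .loc to a label that does not exist yet enlarges the
# dataframe by one row; here the new row copies the values of the last day.
df.loc[next_day] = df.loc[last_day]
print(df)
```

This is convenient for one row at a time; for many rows, building them first and calling concat once is usually the better fit.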
Pandas: Conditionally insert rows into DataFrame while iterating through rows in the middle
You are trying to add a row beyond the dataframe's bounds (its current size, so to say). You can get the size of the dataframe from dataframe.shape, which is an attribute (not a method) holding a (rows, columns) tuple.
If you need to add a row at the end, after the last row, you first have to extend the index of the dataframe, for example with reindex(). Note that set_index() only promotes an existing column to the index; it does not add rows. This should solve your issue.
You could also use dataframe.append() to add new rows, but it was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() is the replacement.
Another possible solution would be to use integer slicing with iloc. iloc doesn't raise an error when a slice runs past the end, but going beyond the bounds of the dataframe is still an issue you have to fix before appending anything.
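A minimal sketch of these options, assuming a small made-up dataframe with columns a and b and a hypothetical condition (a == 2); none of these names come from the original question.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# shape is an attribute holding (number of rows, number of columns).
n_rows, n_cols = df.shape

new_row = pd.DataFrame({"a": [99], "b": [990]})

# Add a row after the last row.
appended = pd.concat([df, new_row], ignore_index=True)

# Insert a row in the middle: slice with iloc around the position.
pos = 1
inserted = pd.concat([df.iloc[:pos], new_row, df.iloc[pos:]], ignore_index=True)

# Conditional insertion while iterating: collect rows in a list and build
# the dataframe once at the end instead of growing it inside the loop.
rows = []
for record in df.to_dict("records"):
    rows.append(record)
    if record["a"] == 2:  # hypothetical condition
        rows.append({"a": 99, "b": 990})
conditional = pd.DataFrame(rows)

print(inserted)
print(conditional)
```

Collecting rows in a list avoids modifying the dataframe while it is being iterated, which is where out-of-bounds errors of this kind tend to come from.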
pandas: insert a row after a row where the column contains a specific value
Use concat with a helper DataFrame filtered for None or missing values by Series.isna, set the values in its columns with DataFrame.assign, then sort by index with DataFrame.sort_index and create a default index again:
df = (pd.concat([df, df[df.label.isna()].assign(text='new_val', label='new_val')])
        .sort_index(kind='mergesort')  # stable sort keeps each new row after its original
        .reset_index(drop=True))
print(df)
text label
0 open O
1 the B
2 door D
3 val None
4 new_val new_val
5 close C
6 the E
7 door N
8 val None
9 new_val new_val
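For a self-contained run, the input can be reconstructed from the output above by dropping the new_val rows. The sketch below assumes that reconstruction; the names new_rows and result are introduced here.

```python
import pandas as pd

# Input reconstructed from the printed result (an assumption, since the
# original dataframe is not shown).
df = pd.DataFrame({
    "text": ["open", "the", "door", "val", "close", "the", "door", "val"],
    "label": ["O", "B", "D", None, "C", "E", "N", None],
})

# Helper rows: one per missing label, keeping the original index values.
new_rows = df[df["label"].isna()].assign(text="new_val", label="new_val")

# After concat the index contains duplicates (3, 3 and 8, 8). A stable sort
# keeps the original row ahead of the helper row that shares its index.
result = (
    pd.concat([df, new_rows])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```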
Insert empty row after every Nth row in pandas dataframe
The following should scale well with the size of the DataFrame since it doesn't iterate over the rows and doesn't create intermediate DataFrames.
import pandas as pd
df = pd.DataFrame(columns=['a','b'],data=[[3,4],
[5,5],[9,3],[1,2],[9,9],[6,5],[6,5],[6,5],[6,5],
[6,5],[6,5],[6,5],[6,5],[6,5],[6,5],[6,5],[6,5]])
def add_empty_rows(df, n_empty, period):
    """ adds 'n_empty' empty rows every 'period' rows to 'df'.
    Returns a new DataFrame. """
    # to make sure that the DataFrame index is a RangeIndex(start=0, stop=len(df))
    # and that the original df object is not mutated.
    df = df.reset_index(drop=True)
    # length of the new DataFrame containing the NaN rows
    len_new_index = len(df) + n_empty*(len(df) // period)
    # index of the new DataFrame
    new_index = pd.RangeIndex(len_new_index)
    # add an offset (= number of NaN rows up to that row)
    # to the current df.index to align with new_index.
    df.index += n_empty * (df.index
                           .to_series()
                           .groupby(df.index // period)
                           .ngroup())
    # reindex by aligning df.index with new_index.
    # Values of new_index not present in df.index are filled with NaN.
    new_df = df.reindex(new_index)
    return new_df
Tests:
# original df
>>> df
a b
0 3 4
1 5 5
2 9 3
3 1 2
4 9 9
5 6 5
6 6 5
7 6 5
8 6 5
9 6 5
10 6 5
11 6 5
12 6 5
13 6 5
14 6 5
15 6 5
16 6 5
# add 2 empty rows every 3 rows
>>> add_empty_rows(df, 2, 3)
a b
0 3.0 4.0
1 5.0 5.0
2 9.0 3.0
3 NaN NaN
4 NaN NaN
5 1.0 2.0
6 9.0 9.0
7 6.0 5.0
8 NaN NaN
9 NaN NaN
10 6.0 5.0
11 6.0 5.0
12 6.0 5.0
13 NaN NaN
14 NaN NaN
15 6.0 5.0
16 6.0 5.0
17 6.0 5.0
18 NaN NaN
19 NaN NaN
20 6.0 5.0
21 6.0 5.0
22 6.0 5.0
23 NaN NaN
24 NaN NaN
25 6.0 5.0
26 6.0 5.0
# add 5 empty rows every 4 rows
>>> add_empty_rows(df, 5, 4)
a b
0 3.0 4.0
1 5.0 5.0
2 9.0 3.0
3 1.0 2.0
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 9.0 9.0
10 6.0 5.0
11 6.0 5.0
12 6.0 5.0
13 NaN NaN
14 NaN NaN
15 NaN NaN
16 NaN NaN
17 NaN NaN
18 6.0 5.0
19 6.0 5.0
20 6.0 5.0
21 6.0 5.0
22 NaN NaN
23 NaN NaN
24 NaN NaN
25 NaN NaN
26 NaN NaN
27 6.0 5.0
28 6.0 5.0
29 6.0 5.0
30 6.0 5.0
31 NaN NaN
32 NaN NaN
33 NaN NaN
34 NaN NaN
35 NaN NaN
36 6.0 5.0
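The offset arithmetic behind add_empty_rows can be checked in isolation: row i moves to position i + n_empty * (i // period). The sketch below is a simplified illustration with NumPy on a shorter, made-up dataframe; shifted_positions is a hypothetical helper, not part of the answer above.

```python
import numpy as np
import pandas as pd

def shifted_positions(n_rows, n_empty, period):
    """New position of each original row once the empty rows are inserted."""
    old = np.arange(n_rows)
    # Each completed block of `period` rows pushes later rows down by n_empty.
    return old + n_empty * (old // period)

df = pd.DataFrame({"a": [3, 5, 9, 1, 9, 6, 6], "b": [4, 5, 3, 2, 9, 5, 5]})
n_empty, period = 2, 3

positions = shifted_positions(len(df), n_empty, period)

# Relabel a copy with the shifted positions, then reindex onto the full
# range; positions that no original row occupies become NaN rows.
shifted = df.copy()
shifted.index = positions
total_len = len(df) + n_empty * (len(df) // period)
spaced = shifted.reindex(range(total_len))
print(spaced)
```

With 7 rows, 2 empty rows, and a period of 3, the original rows land on positions 0, 1, 2, 5, 6, 7, and 10, leaving positions 3, 4, 8, and 9 empty.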
Pandas: Inserting rows based on conditions?
You aren't changing your dataframe in the function at all. You are simply printing some statements. You don't really need a custom function for what you want to do.
Try:
- melt the dataframe to create the required structure.
- Filter to keep rows where the value is greater than 0.
- Re-format the "product" column as required (removing the "_count").
melted = df.melt(["id", "status", "type", "location"],
                 ["bb_count", "vo_count", "tv_count"],
                 var_name="product")
output = melted[melted["value"].gt(0)].drop("value", axis=1)
# parentheses are needed so the chained .replace() can start on a new line
output["product"] = (output["product"].str.replace("_count", "")
                     .replace({"bb": "broadband",
                               "vo": "fixedvoice",
                               "tv": "television"}))
>>> output
id status type location product
0 123 open r hongkong broadband
1 456 open r hongkong broadband
4 456 open r hongkong fixedvoice
5 456 closed p India fixedvoice
6 123 open r hongkong television
7 456 open r hongkong television
8 456 closed p India television
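Since the input dataframe is not shown, the following sketch uses a hypothetical input whose counts were chosen to be consistent with the output above. It follows the same three steps, using map with a lookup dictionary (here called names) for the final renaming.

```python
import pandas as pd

# Hypothetical input: the count values are assumptions chosen so that the
# filtered result matches the rows printed above.
df = pd.DataFrame({
    "id": [123, 456, 456],
    "status": ["open", "open", "closed"],
    "type": ["r", "r", "p"],
    "location": ["hongkong", "hongkong", "India"],
    "bb_count": [1, 2, 0],
    "vo_count": [0, 1, 1],
    "tv_count": [1, 1, 1],
})

# Step 1: one row per (original row, product) pair.
melted = df.melt(
    id_vars=["id", "status", "type", "location"],
    value_vars=["bb_count", "vo_count", "tv_count"],
    var_name="product",
)

# Step 2: keep only products with a positive count, then drop the count.
output = melted[melted["value"].gt(0)].drop(columns="value")

# Step 3: strip the "_count" suffix and map the short codes to full names.
names = {"bb": "broadband", "vo": "fixedvoice", "tv": "television"}
output["product"] = (
    output["product"].str.replace("_count", "", regex=False).map(names)
)
print(output)
```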