Pandas every nth row
I'd use iloc, which takes a row/column slice, both based on integer position and following normal Python syntax. If you want every 5th row:
df.iloc[::5, :]
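As a minimal sketch of the slice above, assuming a small made-up frame with a single column named value (both are illustrative only), the same syntax also takes a start offset:

```python
import pandas as pd

# Hypothetical 12-row frame, used only for illustration.
df = pd.DataFrame({"value": range(12)})

# Every 5th row, starting at position 0.
every_fifth = df.iloc[::5, :]

# Every 5th row, starting at position 4 (the 5th, 10th, ... rows).
every_fifth_from_4 = df.iloc[4::5, :]

print(every_fifth)
print(every_fifth_from_4)
```

Because iloc is purely positional, this works the same way whatever the index labels are.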
How to take the average of every nth row of Pandas dataframe?
(Note: rewritten following clarifications).
out = (
    df
    .set_axis(
        df.index.str.replace(r'_P\d', '', regex=True)
        .set_names('group')
    )
    .groupby(['patient', 'time', 'group'])
    .mean()
)
The replace part on the index eliminates '_P{n}' as requested. Further, patient and time are used as keys for the groupby. The result (on your sample data) is:
>>> out
DNAJA1 DNAJA1P5 DNAJA2 \
patient time group
P1 0h 0h_T1_TimeC1_PIDC4_Non-Survivor 0.378392 -0.191457 0.222613
P2 0h 0h_T1_TimeC2_PIDC2_Survivor 0.246673 -0.223132 0.255885
P3 0h 0h_T1_TimeC1_PIDC1_Survivor 0.327021 -0.212385 0.266633
P4 0h 0h_T1_TimeC1_PIDC1_Survivor 0.282316 -0.182006 0.245088
P5 0h 0h_T1_TimeC4_PIDC3_Survivor 0.200201 -0.220322 0.217304
DNAJA3 DNAJA4 DNAJB1 \
patient time group
P1 0h 0h_T1_TimeC1_PIDC4_Non-Survivor 0.180402 0.454774 0.579397
P2 0h 0h_T1_TimeC2_PIDC2_Survivor 0.123849 0.205732 0.331627
P3 0h 0h_T1_TimeC1_PIDC1_Survivor 0.104913 0.234903 0.380246
P4 0h 0h_T1_TimeC1_PIDC1_Survivor 0.144778 0.274043 0.350569
P5 0h 0h_T1_TimeC4_PIDC3_Survivor 0.133803 0.259557 0.302817
(...)
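For readers without the original data, here is a small self-contained sketch of the same chain. The frame, the index strings and the GENE_A column are invented for illustration; only the patient and time column names come from the question.

```python
import pandas as pd

# Hypothetical miniature data: the index embeds a '_P{n}' patient tag.
df = pd.DataFrame(
    {
        "patient": ["P1", "P1", "P2", "P2"],
        "time": ["0h", "0h", "0h", "0h"],
        "GENE_A": [1.0, 3.0, 10.0, 20.0],
    },
    index=["0h_P1_T1_x", "0h_P1_T1_x", "0h_P2_T1_y", "0h_P2_T1_y"],
)

out = (
    df
    # Strip '_P{n}' from the index labels and name the index 'group'.
    .set_axis(
        df.index.str.replace(r"_P\d", "", regex=True).set_names("group")
    )
    # Group on two columns plus the renamed index level, then average.
    .groupby(["patient", "time", "group"])
    .mean()
)
print(out)
```

Each (patient, time, group) combination ends up as one row holding the mean of the remaining numeric columns.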
Notes:
patient and time are now additional levels in the index. If that is not desired, simply add .reset_index(['patient', 'time'], drop=False) after .mean() above.
You could consider splitting your index into the parts that matter to you. A simple example (but you should add consistency tests instead of just dropping time and patient) could be:
idxcols = 'time patient t tc pidc survivor'.split()
out = (
    df
    .set_axis(df.index.str.split('_', expand=True).set_names(idxcols))
    .drop(['time', 'patient'], axis=1)
    .groupby(idxcols)
    .mean()
)
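The index-splitting variant can be tried on a toy frame as well. This is a sketch under the assumption that every index label has exactly six underscore-separated parts; the data and the GENE_A column are made up.

```python
import pandas as pd

# Hypothetical data: index labels with six '_'-separated fields.
df = pd.DataFrame(
    {
        "patient": ["P1", "P1", "P2", "P2"],
        "time": ["0h", "0h", "0h", "0h"],
        "GENE_A": [1.0, 3.0, 10.0, 20.0],
    },
    index=[
        "0h_P1_T1_TimeC1_PIDC4_Non-Survivor",
        "0h_P1_T1_TimeC1_PIDC4_Non-Survivor",
        "0h_P2_T1_TimeC2_PIDC2_Survivor",
        "0h_P2_T1_TimeC2_PIDC2_Survivor",
    ],
)

idxcols = "time patient t tc pidc survivor".split()
out = (
    df
    # expand=True turns the split pieces into a MultiIndex.
    .set_axis(df.index.str.split("_", expand=True).set_names(idxcols))
    # The columns duplicate information now held in the index.
    .drop(["time", "patient"], axis=1)
    .groupby(idxcols)
    .mean()
)
print(out)
```

If some labels had a different number of fields, the split would not line up with idxcols, which is why the answer suggests adding consistency checks first.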
Pandas: Optimal subtract every nth row
Code
# Create boolean mask for matching rows
# m = np.arange(len(df)) % 6 == 5 # for index match
m = df['Samples'].str.contains(r'_BL\d+') # for regex match
# mask the values and backfill to propagate the row
# values corresponding to match in backward direction
df['var1'] = df['var1'] - df['var1'].mask(~m).bfill()
# Delete the matching rows
df = df[~m].copy()
Samples var1 var1
0 something -90.0 -80.0
1 something -80.0 -70.0
2 something -60.0 -70.0
4 something -50.0 60.0
5 something -10.0 90.0
Note: The core logic is specified in the code, so I'll leave the function implementation up to the OP.
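To see the mask-and-backfill step in isolation, here is a small sketch on invented data. The sample names and values are assumptions; only the Samples and var1 column names and the regex come from the answer.

```python
import pandas as pd

# Hypothetical data: rows whose name ends in _BL<n> are baselines
# that close the block of rows above them.
df = pd.DataFrame(
    {
        "Samples": ["a_1", "a_2", "a_BL1", "b_1", "b_BL2"],
        "var1": [10, 20, 5, 7, 2],
    }
)

# True on baseline rows.
m = df["Samples"].str.contains(r"_BL\d+")

# Keep only baseline values, then propagate each one upwards
# so every row sees the baseline that follows it.
baseline = df["var1"].mask(~m).bfill()

df["var1"] = df["var1"] - baseline

# Drop the baseline rows themselves.
result = df[~m].copy()
print(result)
```

Here baseline is [5, 5, 5, 2, 2], so the remaining rows become 5, 15 and 5.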
Pandas expanding a dataframe's length and populating every nth row
Use Index.repeat with DataFrame.loc, then set the duplicated values to 0:
N = 3
df = df.loc[df.index.repeat(N)]
df['requests'] = df['requests'].mask(df.index.duplicated(), 0)
df = df.reset_index(drop=True)
print (df)
frame requests
0 0 214388438.0
1 0 0.0
2 0 0.0
3 1 194980303.0
4 1 0.0
5 1 0.0
6 2 179475934.0
7 2 0.0
8 2 0.0
9 3 165196540.0
10 3 0.0
11 3 0.0
12 4 154815540.0
13 4 0.0
14 4 0.0
15 5 123650671.0
16 5 0.0
17 5 0.0
18 6 119089045.0
19 6 0.0
20 6 0.0
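The same steps on a two-row frame, as a runnable sketch. The values are made up, and the explicit .copy() is an addition here to avoid chained-assignment warnings; it is not part of the answer.

```python
import pandas as pd

# Hypothetical two-row input.
df = pd.DataFrame({"frame": [0, 1], "requests": [10, 20]})

N = 3
# Repeat every index label N times and select those rows.
out = df.loc[df.index.repeat(N)].copy()

# Repeated labels (all but the first of each group) get 0.
out["requests"] = out["requests"].mask(out.index.duplicated(), 0)

out = out.reset_index(drop=True)
print(out)
```

Index.duplicated() marks every occurrence after the first, which is what leaves the original value on the first row of each group.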
Insert empty row after every Nth row in pandas dataframe
The following should scale well with the size of the DataFrame since it doesn't iterate over the rows and doesn't create intermediate DataFrames.
import pandas as pd

df = pd.DataFrame(columns=['a','b'],data=[[3,4],
    [5,5],[9,3],[1,2],[9,9],[6,5],[6,5],[6,5],[6,5],
    [6,5],[6,5],[6,5],[6,5],[6,5],[6,5],[6,5],[6,5]])

def add_empty_rows(df, n_empty, period):
    """ adds 'n_empty' empty rows every 'period' rows to 'df'.
    Returns a new DataFrame. """
    # to make sure that the DataFrame index is a RangeIndex(start=0, stop=len(df))
    # and that the original df object is not mutated.
    df = df.reset_index(drop=True)
    # length of the new DataFrame containing the NaN rows
    len_new_index = len(df) + n_empty*(len(df) // period)
    # index of the new DataFrame
    new_index = pd.RangeIndex(len_new_index)
    # add an offset (= number of NaN rows up to that row)
    # to the current df.index to align with new_index.
    df.index += n_empty * (df.index
                           .to_series()
                           .groupby(df.index // period)
                           .ngroup())
    # reindex by aligning df.index with new_index.
    # Values of new_index not present in df.index are filled with NaN.
    new_df = df.reindex(new_index)
    return new_df
Tests:
# original df
>>> df
a b
0 3 4
1 5 5
2 9 3
3 1 2
4 9 9
5 6 5
6 6 5
7 6 5
8 6 5
9 6 5
10 6 5
11 6 5
12 6 5
13 6 5
14 6 5
15 6 5
16 6 5
# add 2 empty rows every 3 rows
>>> add_empty_rows(df, 2, 3)
a b
0 3.0 4.0
1 5.0 5.0
2 9.0 3.0
3 NaN NaN
4 NaN NaN
5 1.0 2.0
6 9.0 9.0
7 6.0 5.0
8 NaN NaN
9 NaN NaN
10 6.0 5.0
11 6.0 5.0
12 6.0 5.0
13 NaN NaN
14 NaN NaN
15 6.0 5.0
16 6.0 5.0
17 6.0 5.0
18 NaN NaN
19 NaN NaN
20 6.0 5.0
21 6.0 5.0
22 6.0 5.0
23 NaN NaN
24 NaN NaN
25 6.0 5.0
26 6.0 5.0
# add 5 empty rows every 4 rows
>>> add_empty_rows(df, 5, 4)
a b
0 3.0 4.0
1 5.0 5.0
2 9.0 3.0
3 1.0 2.0
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 9.0 9.0
10 6.0 5.0
11 6.0 5.0
12 6.0 5.0
13 NaN NaN
14 NaN NaN
15 NaN NaN
16 NaN NaN
17 NaN NaN
18 6.0 5.0
19 6.0 5.0
20 6.0 5.0
21 6.0 5.0
22 NaN NaN
23 NaN NaN
24 NaN NaN
25 NaN NaN
26 NaN NaN
27 6.0 5.0
28 6.0 5.0
29 6.0 5.0
30 6.0 5.0
31 NaN NaN
32 NaN NaN
33 NaN NaN
34 NaN NaN
35 NaN NaN
36 6.0 5.0
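The offset arithmetic behind the function can be seen on a five-row frame. This sketch is a simplified variant, not the function above: it computes the offset directly as n_empty * (position // period) instead of going through groupby(...).ngroup(), and the data is invented.

```python
import pandas as pd

# Hypothetical five-row input with a default RangeIndex.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5]})
n_empty, period = 1, 2

shifted = df.copy()
# Each row moves down by the number of empty rows inserted before it.
shifted.index = df.index + n_empty * (df.index // period)

# Total length once the empty rows are counted in.
new_len = len(df) + n_empty * (len(df) // period)

# Positions that have no row in 'shifted' are filled with NaN.
out = shifted.reindex(pd.RangeIndex(new_len))
print(out)
```

The shifted index is [0, 1, 3, 4, 6], so positions 2 and 5 come out as the empty rows.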
How do you sample every nth row within a range in a pandas dataframe?
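First, we can create a test dataframe:
import numpy as np
import pandas as pd
tdf = pd.DataFrame(np.random.randn(30, 4), columns=list('ABCD'))  # util.testing.makeDataFrame is unavailable in recent pandas
Then, we can index it in the following way:
tdf[start_index:end_index:step_size]
So, getting every other row from position 10 up to (but not including) 20 would look like this:
tdf[10:20:2]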
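An equivalent and more explicit spelling uses iloc, which is positional whatever the index labels are. A minimal sketch, assuming a made-up 30-row frame with a column x:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose values equal their row positions.
tdf = pd.DataFrame({"x": np.arange(30)})

start, stop, step = 10, 20, 2
# Rows at positions 10, 12, ..., 18 (stop is exclusive).
sample = tdf.iloc[start:stop:step]
print(sample)
```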
Slicing Pandas DataFrame every nth row
You can do it with a for loop:
import math
for i in range(math.ceil(len(df) / 5)):  # ceil ensures a final partial chunk is captured
    df.iloc[i*5:(i+1)*5, :].to_csv('Stored_files_' + str(i) + '.csv')
So the first iteration stores rows 0 to 4 in "Stored_files_0.csv", the second iteration stores rows 5 to 9 in "Stored_files_1.csv", and so on. Note that iloc excludes the end position, so consecutive chunks do not overlap.
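To check the chunk boundaries without writing any files, the chunks can be collected in a list first. A sketch on an invented 12-row frame; chunk_size and the variable names are illustrative.

```python
import math

import pandas as pd

# Hypothetical 12-row frame: two full chunks of 5 and one of 2.
df = pd.DataFrame({"x": range(12)})
chunk_size = 5

n_chunks = math.ceil(len(df) / chunk_size)
chunks = [
    df.iloc[i * chunk_size:(i + 1) * chunk_size]
    for i in range(n_chunks)
]

sizes = [len(c) for c in chunks]
print(sizes)
```

Each element of chunks could then be passed to to_csv with a name built from its position in the list.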
Select nth rows every nth element in Python dataframe
You could use str.startswith() for this:
df = df[(df['Date'].str.startswith('Ene')) | (df['Date'].str.startswith('Feb'))]
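A small sketch of that filter on invented data, assuming the Date column holds strings that begin with a month abbreviation (the rows and the value column are made up):

```python
import pandas as pd

# Hypothetical data with Spanish month abbreviations.
df = pd.DataFrame(
    {
        "Date": ["Ene 2020", "Feb 2020", "Mar 2020", "Ene 2021"],
        "value": [1, 2, 3, 4],
    }
)

# Keep rows whose Date starts with either prefix.
mask = df["Date"].str.startswith("Ene") | df["Date"].str.startswith("Feb")
filtered = df[mask]
print(filtered)
```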