Pandas Every Nth Row

Pandas every nth row

I'd use iloc, which takes a row/column slice, both based on integer position and following normal Python slice syntax. If you want every 5th row:

df.iloc[::5, :]
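As a small illustration of the slice above, here is a sketch on a made-up 12-row frame (the column name x and the data are assumptions for the example only). It also shows that a start offset can be added to the same slice:

```python
import pandas as pd

# Hypothetical sample data: 12 rows with values 0..11
df = pd.DataFrame({"x": range(12)})

# Every 5th row starting from the first: positions 0, 5, 10
every_5th = df.iloc[::5, :]

# Every 5th row starting from the third: positions 2, 7
from_third = df.iloc[2::5, :]

print(every_5th)
print(from_third)
```

Because iloc is positional, this behaves the same whatever the index labels are.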

How to take the average of every nth row of Pandas dataframe?

(Note: rewritten following clarifications).

out = (
    df
    .set_axis(
        df.index.str.replace(r'_P\d', '', regex=True)
        .set_names('group')
    )
    .groupby(['patient', 'time', 'group'])
    .mean()
)

The replace part on the index eliminates '_P{n}' as requested. Further, patient and time are used as keys for the groupby. The result (on your sample data) is:

>>> out
                                                DNAJA1  DNAJA1P5    DNAJA2  \
patient time group                                                           
P1      0h   0h_T1_TimeC1_PIDC4_Non-Survivor  0.378392 -0.191457  0.222613   
P2      0h   0h_T1_TimeC2_PIDC2_Survivor      0.246673 -0.223132  0.255885   
P3      0h   0h_T1_TimeC1_PIDC1_Survivor      0.327021 -0.212385  0.266633   
P4      0h   0h_T1_TimeC1_PIDC1_Survivor      0.282316 -0.182006  0.245088   
P5      0h   0h_T1_TimeC4_PIDC3_Survivor      0.200201 -0.220322  0.217304   

                                                DNAJA3    DNAJA4    DNAJB1  \
patient time group                                                           
P1      0h   0h_T1_TimeC1_PIDC4_Non-Survivor  0.180402  0.454774  0.579397   
P2      0h   0h_T1_TimeC2_PIDC2_Survivor      0.123849  0.205732  0.331627   
P3      0h   0h_T1_TimeC1_PIDC1_Survivor      0.104913  0.234903  0.380246   
P4      0h   0h_T1_TimeC1_PIDC1_Survivor      0.144778  0.274043  0.350569   
P5      0h   0h_T1_TimeC4_PIDC3_Survivor      0.133803  0.259557  0.302817   

(...)

Notes:

patient and time are now additional levels in the index. If that is not desired, simply add .reset_index(['patient', 'time'], drop=False) after .mean() above.

You could consider splitting your index into the parts that matter to you. A simple example (though you should add consistency checks rather than just dropping time and patient) could be:

idxcols = 'time patient t tc pidc survivor'.split()
out = (
    df
    .set_axis(df.index.str.split('_', expand=True).set_names(idxcols))
    .drop(['time', 'patient'], axis=1)
    .groupby(idxcols)
    .mean()
)
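The answer above groups on index labels because that is what the question's data needed. If the goal is literally the mean of each consecutive block of n rows, a purely positional variant may be enough. The following is a minimal sketch of that different technique, assuming the rows are already in the order you want to block them in; the frame, the column names and n are made up for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 7 rows, so the last block is incomplete
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
})
n = 3

# Integer-divide the row position by n to get a block number per row:
# positions 0..6 become blocks 0, 0, 0, 1, 1, 1, 2
block_id = np.arange(len(df)) // n

# One mean per block; the trailing partial block is averaged over the rows it has
block_means = df.groupby(block_id).mean()
print(block_means)
```

Note that the final block here contains a single row, so its "mean" is just that row.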

Pandas: Optimal subtract every nth row

Code

# Create boolean mask for matching rows
# m = np.arange(len(df)) % 6 == 5 # for index match
m = df['Samples'].str.contains(r'_BL\d+') # for regex match

# mask the values and backfill to propagate the row
# values corresponding to match in backward direction
df['var1'] = df['var1'] - df['var1'].mask(~m).bfill()

# Delete the matching rows
df = df[~m].copy()

     Samples  var1  var1
0  something -90.0 -80.0
1  something -80.0 -70.0
2  something -60.0 -70.0
4  something -50.0  60.0
5  something -10.0  90.0

Note: The core logic is specified in the code, so I'll leave the function implementation up to the OP.
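To make the mask-and-backfill step easier to follow, here is a self-contained sketch on invented data. The sample names and values are assumptions; only the column names Samples and var1 and the _BL pattern come from the code above:

```python
import pandas as pd

# Hypothetical data: every third row is a baseline row marked with _BL<n>
df = pd.DataFrame({
    "Samples": ["s1", "s2", "x_BL1", "s3", "s4", "x_BL2"],
    "var1": [10.0, 20.0, 5.0, 30.0, 40.0, 15.0],
})

# True on the baseline rows
m = df["Samples"].str.contains(r"_BL\d+")

# Keep only baseline values, then back-fill them onto the rows before each one:
# NaN, NaN, 5, NaN, NaN, 15  ->  5, 5, 5, 15, 15, 15
baseline = df["var1"].mask(~m).bfill()

# Subtract the baseline that follows each row, then drop the baseline rows
df["var1"] = df["var1"] - baseline
df = df[~m].copy()
print(df)
```

Rows after the last baseline row, if any, would get NaN because there is nothing to back-fill from.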

Pandas expanding a dataframe's length and populating every nth row

Use Index.repeat with DataFrame.loc to repeat each row N times, then set the duplicated rows' values to 0:

N = 3
df = df.loc[df.index.repeat(N)]
df['requests'] = df['requests'].mask(df.index.duplicated(), 0)
df = df.reset_index(drop=True)
print(df)
    frame     requests
0       0  214388438.0
1       0          0.0
2       0          0.0
3       1  194980303.0
4       1          0.0
5       1          0.0
6       2  179475934.0
7       2          0.0
8       2          0.0
9       3  165196540.0
10      3          0.0
11      3          0.0
12      4  154815540.0
13      4          0.0
14      4          0.0
15      5  123650671.0
16      5          0.0
17      5          0.0
18      6  119089045.0
19      6          0.0
20      6          0.0

Insert empty row after every Nth row in pandas dataframe

The following should scale well with the size of the DataFrame since it doesn't iterate over the rows and doesn't create intermediate DataFrames.

import pandas as pd

df = pd.DataFrame(columns=['a','b'],data=[[3,4],
    [5,5],[9,3],[1,2],[9,9],[6,5],[6,5],[6,5],[6,5],
    [6,5],[6,5],[6,5],[6,5],[6,5],[6,5],[6,5],[6,5]])

def add_empty_rows(df, n_empty, period):
    """ adds 'n_empty' empty rows every 'period' rows  to 'df'. 
        Returns a new DataFrame. """
    
    # to make sure that the DataFrame index is a RangeIndex(start=0, stop=len(df)) 
    # and that the original df object is not mutated. 
    df = df.reset_index(drop=True)
    
    # length of the new DataFrame containing the NaN rows
    len_new_index = len(df) + n_empty*(len(df) // period)
    # index of the new DataFrame
    new_index = pd.RangeIndex(len_new_index)
    
    # add an offset (= number of NaN rows up to that row) 
    # to the current df.index to align with new_index. 
    df.index += n_empty * (df.index
                             .to_series()
                             .groupby(df.index // period)
                             .ngroup())
    
    # reindex by aligning df.index with new_index. 
    # Values of new_index not present in df.index are filled with NaN.
    new_df = df.reindex(new_index)
    
    return new_df

Tests:

# original df
>>> df

    a  b
0   3  4
1   5  5
2   9  3
3   1  2
4   9  9
5   6  5
6   6  5
7   6  5
8   6  5
9   6  5
10  6  5
11  6  5
12  6  5
13  6  5
14  6  5
15  6  5
16  6  5

# add 2 empty rows every 3 rows
>>> add_empty_rows(df, 2, 3)

      a    b
0   3.0  4.0
1   5.0  5.0
2   9.0  3.0
3   NaN  NaN
4   NaN  NaN
5   1.0  2.0
6   9.0  9.0
7   6.0  5.0
8   NaN  NaN
9   NaN  NaN
10  6.0  5.0
11  6.0  5.0
12  6.0  5.0
13  NaN  NaN
14  NaN  NaN
15  6.0  5.0
16  6.0  5.0
17  6.0  5.0
18  NaN  NaN
19  NaN  NaN
20  6.0  5.0
21  6.0  5.0
22  6.0  5.0
23  NaN  NaN
24  NaN  NaN
25  6.0  5.0
26  6.0  5.0

# add 5 empty rows every 4 rows
>>> add_empty_rows(df, 5, 4)

      a    b
0   3.0  4.0
1   5.0  5.0
2   9.0  3.0
3   1.0  2.0
4   NaN  NaN
5   NaN  NaN
6   NaN  NaN
7   NaN  NaN
8   NaN  NaN
9   9.0  9.0
10  6.0  5.0
11  6.0  5.0
12  6.0  5.0
13  NaN  NaN
14  NaN  NaN
15  NaN  NaN
16  NaN  NaN
17  NaN  NaN
18  6.0  5.0
19  6.0  5.0
20  6.0  5.0
21  6.0  5.0
22  NaN  NaN
23  NaN  NaN
24  NaN  NaN
25  NaN  NaN
26  NaN  NaN
27  6.0  5.0
28  6.0  5.0
29  6.0  5.0
30  6.0  5.0
31  NaN  NaN
32  NaN  NaN
33  NaN  NaN
34  NaN  NaN
35  NaN  NaN
36  6.0  5.0

How do you sample every nth row within a range in a pandas dataframe?

First, we can create a test dataframe:

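import numpy as np
import pandas as pd
# util.testing.makeDataFrame() is gone from recent pandas; build a random frame directly
tdf = pd.DataFrame(np.random.randn(30, 4), columns=list('ABCD'))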

then, we can index it in the following way:

tdf[start_index:end_index:step_size]

so, getting every other row from index 10 to 20 would look like this:

tdf[10:20:2]
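If you want the positional intent to be explicit regardless of what the index labels look like, the same start:stop:step slice can be passed to iloc. A minimal sketch on a made-up frame (the column x and its values are assumptions):

```python
import pandas as pd

# Hypothetical data: 30 rows with values 100..129
df = pd.DataFrame({"x": range(100, 130)})

# Every other row from position 10 up to (not including) position 20
subset = df.iloc[10:20:2]
print(subset)
```

The stop position is excluded, so this picks positions 10, 12, 14, 16 and 18.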

Slicing Pandas DataFrame every nth row

You can do it with a for loop:

n_chunks = -(-len(df) // 5)  # ceiling division, so a final partial chunk is not lost
for i in range(n_chunks):
    # iloc is positional and excludes the end, so chunks do not overlap
    df.iloc[i*5:(i+1)*5, :].to_csv('Stored_files_' + str(i) + '.csv')

On the first iteration, rows 0 to 4 are stored as "Stored_files_0.csv"; on the second, rows 5 to 9 as "Stored_files_1.csv"; and so on.
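Another way to write the chunking is to step through start positions directly, which avoids any rounding. The sketch below uses a hypothetical helper, iter_chunks, and made-up data; it only collects the chunks, and each one could be written out with to_csv as in the loop above:

```python
import pandas as pd

def iter_chunks(df, size):
    """Yield consecutive positional slices of at most `size` rows.

    Hypothetical helper for illustration, not part of pandas.
    """
    for start in range(0, len(df), size):
        # iloc excludes the end position, so slices never overlap
        yield df.iloc[start:start + size]

# Hypothetical data: 12 rows, so the last chunk has only 2 rows
df = pd.DataFrame({"x": range(12)})

chunks = list(iter_chunks(df, 5))
chunk_lengths = [len(c) for c in chunks]
print(chunk_lengths)
```

Concatenating the chunks in order gives back the original frame, which is a quick check that no row was skipped or duplicated.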

Select nth rows every nth element in Python dataframe

You could use str.startswith() for this:

df = df[(df['Date'].str.startswith('Ene')) | (df['Date'].str.startswith('Feb'))]
