Pandas Every Nth Row

Pandas every nth row

I'd use iloc, which takes row and column slices based on integer position and follows normal Python slice syntax. To get every 5th row:

df.iloc[::5, :]
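As a minimal sketch on made-up data (the frame `demo` and its column `x` are hypothetical, not from the question), the same slice syntax also accepts a start offset:

```python
import pandas as pd

# Hypothetical 12-row frame, used only for illustration.
demo = pd.DataFrame({"x": range(12)})

# Every 5th row, starting at position 0: positions 0, 5, 10.
every_fifth = demo.iloc[::5, :]

# Every 5th row, starting at position 2: positions 2, 7.
every_fifth_from_2 = demo.iloc[2::5, :]
```

The column slice `:` can be omitted; `demo.iloc[::5]` selects the same rows.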

How to take the average of every nth row of Pandas dataframe?

(Note: rewritten following clarifications).

out = (
    df
    .set_axis(
        df.index.str.replace(r'_P\d', '', regex=True)
        .set_names('group')
    )
    .groupby(['patient', 'time', 'group'])
    .mean()
)

The replace part on the index eliminates '_P{n}' as requested. Further, patient and time are used as keys for the groupby. The result (on your sample data) is:

>>> out
DNAJA1 DNAJA1P5 DNAJA2 \
patient time group
P1 0h 0h_T1_TimeC1_PIDC4_Non-Survivor 0.378392 -0.191457 0.222613
P2 0h 0h_T1_TimeC2_PIDC2_Survivor 0.246673 -0.223132 0.255885
P3 0h 0h_T1_TimeC1_PIDC1_Survivor 0.327021 -0.212385 0.266633
P4 0h 0h_T1_TimeC1_PIDC1_Survivor 0.282316 -0.182006 0.245088
P5 0h 0h_T1_TimeC4_PIDC3_Survivor 0.200201 -0.220322 0.217304

DNAJA3 DNAJA4 DNAJB1 \
patient time group
P1 0h 0h_T1_TimeC1_PIDC4_Non-Survivor 0.180402 0.454774 0.579397
P2 0h 0h_T1_TimeC2_PIDC2_Survivor 0.123849 0.205732 0.331627
P3 0h 0h_T1_TimeC1_PIDC1_Survivor 0.104913 0.234903 0.380246
P4 0h 0h_T1_TimeC1_PIDC1_Survivor 0.144778 0.274043 0.350569
P5 0h 0h_T1_TimeC4_PIDC3_Survivor 0.133803 0.259557 0.302817

(...)

Notes:

  1. patient and time are now additional levels in the index. If that is not desired, simply add .reset_index(['patient', 'time'], drop=False) after .mean() above.

  2. you could consider splitting your index into the parts that matter to you. A simple example (but you should add consistency tests instead of just dropping time and patient) could be:

    idxcols = 'time patient t tc pidc survivor'.split()
    out = (
        df
        .set_axis(df.index.str.split('_', expand=True).set_names(idxcols))
        .drop(['time', 'patient'], axis=1)
        .groupby(idxcols)
        .mean()
    )
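If the rows are instead to be averaged purely by position (each block of n consecutive rows), one common approach is to group on the integer-divided row position. This is a minimal sketch on hypothetical data, not the asker's dataset; the names `demo`, `n` and `block_means` are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with 5 rows.
demo = pd.DataFrame({
    "a": [1.0, 3.0, 5.0, 7.0, 9.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
})

n = 2  # block size

# Row positions 0..4 integer-divided by n give block labels 0, 0, 1, 1, 2.
block_labels = np.arange(len(demo)) // n

# Mean of each block; the last block holds only the single leftover row.
block_means = demo.groupby(block_labels).mean()
```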

Pandas: Optimal subtract every nth row

Code

# Create boolean mask for matching rows
# m = np.arange(len(df)) % 6 == 5 # for index match
m = df['Samples'].str.contains(r'_BL\d+') # for regex match

# mask the values and backfill to propagate the row
# values corresponding to match in backward direction
df['var1'] = df['var1'] - df['var1'].mask(~m).bfill()

# Delete the matching rows
df = df[~m].copy()


     Samples  var1  var1
0 something -90.0 -80.0
1 something -80.0 -70.0
2 something -60.0 -70.0
4 something -50.0 60.0
5 something -10.0 90.0

Note: The core logic is shown in the code above, so I'll leave wrapping it in a function up to the OP.
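To see the mask-and-backfill step in isolation, here is a small self-contained sketch. The sample names and values are invented for illustration; only the `Samples`/`var1` column names and the `_BL\d+` pattern come from the answer above.

```python
import pandas as pd

# Hypothetical data: every third row is a baseline ("_BL") row.
demo = pd.DataFrame({
    "Samples": ["s1", "s2", "s1_BL1", "s3", "s4", "s2_BL2"],
    "var1": [10.0, 20.0, 100.0, 30.0, 40.0, 50.0],
})

# True on the baseline rows.
m = demo["Samples"].str.contains(r"_BL\d+")

# Keep values only on baseline rows, then propagate each one backwards
# so that every row sees the next baseline below it.
baseline = demo["var1"].mask(~m).bfill()

# Subtract the baseline and drop the baseline rows themselves.
demo["var1"] = demo["var1"] - baseline
result = demo[~m].copy()
```

Here `baseline` is 100 for the first three rows and 50 for the last three, so `result["var1"]` ends up as -90, -80, -20, -10.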

Pandas expanding a dataframe's length and populating every nth row

Use Index.repeat with DataFrame.loc to repeat each row N times, then set the values in the duplicated rows to 0:

N = 3
df = df.loc[df.index.repeat(N)]
df['requests'] = df['requests'].mask(df.index.duplicated(), 0)
df = df.reset_index(drop=True)
print (df)
frame requests
0 0 214388438.0
1 0 0.0
2 0 0.0
3 1 194980303.0
4 1 0.0
5 1 0.0
6 2 179475934.0
7 2 0.0
8 2 0.0
9 3 165196540.0
10 3 0.0
11 3 0.0
12 4 154815540.0
13 4 0.0
14 4 0.0
15 5 123650671.0
16 5 0.0
17 5 0.0
18 6 119089045.0
19 6 0.0
20 6 0.0
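The same steps on a tiny, made-up frame (the values are hypothetical and the name `expanded` is introduced here only for illustration) make the intermediate state easier to follow:

```python
import pandas as pd

# Hypothetical two-row input.
demo = pd.DataFrame({"frame": [0, 1], "requests": [10.0, 20.0]})

N = 3

# Repeat each index label N times; .loc then returns each row N times.
expanded = demo.loc[demo.index.repeat(N)]

# The index is now 0, 0, 0, 1, 1, 1; duplicated() flags every repeat
# after the first occurrence, and those rows get 0.
expanded["requests"] = expanded["requests"].mask(expanded.index.duplicated(), 0)

# Replace the repeated labels with a fresh 0..n-1 index.
expanded = expanded.reset_index(drop=True)
```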

Insert empty row after every Nth row in pandas dataframe

The following should scale well with the size of the DataFrame since it doesn't iterate over the rows and doesn't create intermediate DataFrames.

import pandas as pd

df = pd.DataFrame(columns=['a','b'],data=[[3,4],
[5,5],[9,3],[1,2],[9,9],[6,5],[6,5],[6,5],[6,5],
[6,5],[6,5],[6,5],[6,5],[6,5],[6,5],[6,5],[6,5]])

def add_empty_rows(df, n_empty, period):
    """ adds 'n_empty' empty rows every 'period' rows to 'df'.
        Returns a new DataFrame. """

    # to make sure that the DataFrame index is a RangeIndex(start=0, stop=len(df))
    # and that the original df object is not mutated.
    df = df.reset_index(drop=True)

    # length of the new DataFrame containing the NaN rows
    len_new_index = len(df) + n_empty*(len(df) // period)
    # index of the new DataFrame
    new_index = pd.RangeIndex(len_new_index)

    # add an offset (= number of NaN rows up to that row)
    # to the current df.index to align with new_index.
    df.index += n_empty * (df.index
                             .to_series()
                             .groupby(df.index // period)
                             .ngroup())

    # reindex by aligning df.index with new_index.
    # Values of new_index not present in df.index are filled with NaN.
    new_df = df.reindex(new_index)

    return new_df

Tests:

# original df
>>> df

a b
0 3 4
1 5 5
2 9 3
3 1 2
4 9 9
5 6 5
6 6 5
7 6 5
8 6 5
9 6 5
10 6 5
11 6 5
12 6 5
13 6 5
14 6 5
15 6 5
16 6 5

# add 2 empty rows every 3 rows
>>> add_empty_rows(df, 2, 3)

a b
0 3.0 4.0
1 5.0 5.0
2 9.0 3.0
3 NaN NaN
4 NaN NaN
5 1.0 2.0
6 9.0 9.0
7 6.0 5.0
8 NaN NaN
9 NaN NaN
10 6.0 5.0
11 6.0 5.0
12 6.0 5.0
13 NaN NaN
14 NaN NaN
15 6.0 5.0
16 6.0 5.0
17 6.0 5.0
18 NaN NaN
19 NaN NaN
20 6.0 5.0
21 6.0 5.0
22 6.0 5.0
23 NaN NaN
24 NaN NaN
25 6.0 5.0
26 6.0 5.0

# add 5 empty rows every 4 rows
>>> add_empty_rows(df, 5, 4)

a b
0 3.0 4.0
1 5.0 5.0
2 9.0 3.0
3 1.0 2.0
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 9.0 9.0
10 6.0 5.0
11 6.0 5.0
12 6.0 5.0
13 NaN NaN
14 NaN NaN
15 NaN NaN
16 NaN NaN
17 NaN NaN
18 6.0 5.0
19 6.0 5.0
20 6.0 5.0
21 6.0 5.0
22 NaN NaN
23 NaN NaN
24 NaN NaN
25 NaN NaN
26 NaN NaN
27 6.0 5.0
28 6.0 5.0
29 6.0 5.0
30 6.0 5.0
31 NaN NaN
32 NaN NaN
33 NaN NaN
34 NaN NaN
35 NaN NaN
36 6.0 5.0

How do you sample every nth row within a range in a pandas dataframe?

First, we can create a test dataframe:

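import numpy as np
import pandas as pd
# util.testing.makeDataFrame() is gone from recent pandas releases; build the frame directly
tdf = pd.DataFrame(np.random.randn(30, 4), columns=list('ABCD'))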

then, we can index it in the following way:

tdf[start_index:end_index:step_size]

so, getting every other row from index 10 to 20 would look like this:

tdf[10:20:2]
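A minimal sketch of the same selection written with iloc, which makes the positional intent explicit whatever the index looks like. The frame `demo` and its column `x` are hypothetical:

```python
import pandas as pd

# Hypothetical 30-row frame whose values equal their row positions.
demo = pd.DataFrame({"x": range(30)})

# Positions 10 up to (but not including) 20, in steps of 2.
sampled = demo.iloc[10:20:2]
```

As with ordinary Python slices, the end position is excluded, so this selects positions 10, 12, 14, 16 and 18.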

Slicing Pandas DataFrame every nth row

You can do it with a for loop:

for i, start in enumerate(range(0, len(df), 5)):  # covers all rows, including a shorter final chunk
    df.iloc[start:start + 5].to_csv('Stored_files_' + str(i) + '.csv')

iloc slices exclude the end position, so the chunks do not overlap.
On the first iteration, rows 0 to 4 are stored as "Stored_files_0.csv".
On the second iteration, rows 5 to 9 are stored as "Stored_files_1.csv".
And so on...
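To check chunk boundaries without writing any files, the split can be sketched in memory. The frame `demo` and the names `chunk_size` and `chunks` are assumptions for illustration. Note that `.iloc` slices are half-open, whereas `.loc` slices include both end labels:

```python
import pandas as pd

# Hypothetical 12-row frame; 12 is deliberately not a multiple of 5.
demo = pd.DataFrame({"x": range(12)})

chunk_size = 5

# One positional, non-overlapping slice per chunk start: 0, 5, 10.
chunks = [
    demo.iloc[start:start + chunk_size]
    for start in range(0, len(demo), chunk_size)
]

sizes = [len(chunk) for chunk in chunks]

# For comparison: a label-based slice includes the end label as well.
loc_rows = len(demo.loc[0:5])
```

The three chunks hold 5, 5 and 2 rows, while `demo.loc[0:5]` returns 6 rows because both end labels are included.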

Select nth rows every nth element in Python dataframe

You could use a startswith() option for this

df = df[(df['Date'].str.startswith('Ene')) | (df['Date'].str.startswith('Feb'))]
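A self-contained sketch of the same filter, generalized to a list of prefixes. The `Date` strings, the `value` column and the name `prefixes` are hypothetical; the real data may be formatted differently:

```python
import pandas as pd

# Hypothetical data with month-abbreviation prefixes in the Date column.
demo = pd.DataFrame({
    "Date": ["Ene 2020", "Feb 2020", "Mar 2020", "Ene 2021"],
    "value": [1, 2, 3, 4],
})

prefixes = ["Ene", "Feb"]

# Combine one boolean mask per prefix with a logical OR.
mask = demo["Date"].str.startswith(prefixes[0])
for prefix in prefixes[1:]:
    mask |= demo["Date"].str.startswith(prefix)

filtered = demo[mask]
```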

