Select Every Nth Row from Dataframe

Pandas every nth row

I'd use iloc, which takes a row/column slice, both based on integer position and following normal Python slice syntax. If you want every 5th row, starting from the first:

df.iloc[::5, :]
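As a quick check of the slice semantics, here is a minimal sketch on a small made-up frame (the column name and row count are assumptions for illustration). It shows that the slice start decides which row the stride begins on.

```python
import pandas as pd

# Hypothetical 12-row frame, used only to illustrate the slicing.
df = pd.DataFrame({"value": range(12)})

every_5th = df.iloc[::5, :]          # positions 0, 5, 10
every_5th_from_4 = df.iloc[4::5, :]  # positions 4, 9 (the 5th, 10th, ... rows)

print(every_5th["value"].tolist())
print(every_5th_from_4["value"].tolist())
```

Use the second form if "every 5th row" should mean the 5th, 10th, 15th rows rather than the 1st, 6th, 11th.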

How to extract every nth row from dataframe?

You are close. With the default RangeIndex, index the DataFrame with a mask that compares the remainder with 1:

df1 = df[df.index % 100 == 1]

Solution for a general index, using row positions instead of labels:

df1 = df[np.arange(len(df)) % 100 == 1]

If you also want to omit the rows at positions 1 and 101:

df2 = df[(df.index % 100 == 1) & (df.index > 200)]

And the same for a general index:

a = np.arange(len(df))
df2 = df[(a % 100 == 1) & (a > 200)]

Sample:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(1000,3)), columns=list('ABC'))
#print (df)

a = np.arange(len(df))
df2 = df[(a % 100 == 1) & (a > 200)]
print (df2)
     A  B  C
201  4  4  4
301  1  3  2
401  0  3  5
501  5  8  4
601  3  7  9
701  5  5  7
801  4  1  0
901  4  7  6
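To illustrate why the position-based mask matters, the sketch below uses a made-up frame with string labels (the labels and column name are assumptions), where `df.index % 100` would not be usable. It also checks that the mask agrees with a plain `iloc` stride.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are strings, not a RangeIndex.
df = pd.DataFrame({"A": range(300)}, index=[f"row{i}" for i in range(300)])

pos = np.arange(len(df))      # 0-based row positions, independent of the labels
picked = df[pos % 100 == 1]   # rows at positions 1, 101, 201

# The same selection written as a positional slice.
same = df.iloc[1::100]

print(picked.index.tolist())
print(picked.equals(same))
```

The mask form is handy when further conditions need to be combined with `&`; the `iloc` form is shorter when only the stride is needed.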

Select nth rows every nth element in Python dataframe

You could use str.startswith() for this, combining one condition per prefix with |:

df = df[(df['Date'].str.startswith('Ene')) | (df['Date'].str.startswith('Feb'))]
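When the list of prefixes grows, chaining one condition per prefix gets unwieldy. A possible alternative, sketched here on made-up data (the 'Date' strings and 'Value' column are assumptions), compares the first three characters against a list. It assumes every prefix has the same length.

```python
import pandas as pd

# Hypothetical data: the 'Date' strings and values are made up.
df = pd.DataFrame({
    "Date": ["Ene 01", "Feb 14", "Mar 03", "Ene 20", "Abr 09"],
    "Value": [1, 2, 3, 4, 5],
})

prefixes = ["Ene", "Feb"]
# Compare the first three characters of each string against the allowed prefixes.
mask = df["Date"].str[:3].isin(prefixes)
filtered = df[mask]

print(filtered)
```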

Select every other nth row of data frame and add to a list of data frames in R

Use split with 1:5 as the grouping vector. It is recycled along the rows, so you get five data frames, each holding every 5th row:

split(df, 1:5)

output

$`1`
   X1 X2 X3       X4 X5
1   1  0  0 1.501990  0
6   6  0  0 2.186790  0
11 11  0  0 2.190029  0
16 16  0  0 1.842470  0

$`2`
   X1 X2 X3       X4 X5
2   2  0  0 1.883904  0
7   7  0  0 1.269592  0
12 12  0  0 0.000000  0
17 17  0  0 1.937999  0

$`3`
   X1 X2 X3       X4 X5
3   3  0  0 1.333195  0
8   8  0  0 1.458405  0
13 13  0  0 1.460534  0
18 18  0  0 0.000000  0

$`4`
   X1 X2 X3       X4 X5
4   4  0  0 0.000000  0
9   9  0  0 1.816493  0
14 14  0  0 1.470776  0
19 19  0  0 1.649926  0

$`5`
   X1 X2 X3       X4 X5
5   5  0  0 2.136760  0
10 10  0  0 0.000000  0
15 15  0  0 1.675406  0
20 20  0  0 2.067902  0

An alternative with dplyr::group_split is:

group_split(df, rep(1:5, nrow(df)/5), .keep = FALSE)

data

df <- structure(list(X1 = 1:20, X2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X3 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L), X4 = c(1.50199, 1.883904, 1.333195, 0, 2.13676, 
2.18679, 1.269592, 1.458405, 1.816493, 0, 2.190029, 0, 1.460534, 
1.470776, 1.675406, 1.84247, 1.937999, 0, 1.649926, 2.067902), 
    X5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-20L))
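For readers working in pandas rather than R, a rough analogue of the split above can be sketched by grouping on the row position modulo 5. The frame below is a made-up stand-in for the R data, and the names `labels` and `parts` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row frame standing in for the R data above.
df = pd.DataFrame({"X1": range(1, 21)})

# Group label cycles 0, 1, 2, 3, 4, 0, 1, ... over the row positions.
labels = np.arange(len(df)) % 5
parts = [group for _, group in df.groupby(labels)]

print(len(parts))
print(parts[0]["X1"].tolist())
```

Each element of `parts` holds every 5th row, starting from a different offset.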

python pandas how to get data every n and every nth rows?

Use a generator with iloc to select the desired rows:

def rows_generator(df):
    i = 0
    while (i+3) <= df.shape[0]:
        yield df.iloc[i:(i+3):1, :]
        i += 1

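# enumerate avoids a manual counter and keeps df from being overwritten by the loop variable
for i, window in enumerate(rows_generator(df), start=1):
    print(f'Time #{i}')
    print(window)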

Example output:

Time #1
    Group   Cat  Value
0  Group1  Cat1   1230
1  Group2  Cat2   4019
2  Group3  Cat3   9491
Time #2
    Group   Cat  Value
1  Group2  Cat2   4019
2  Group3  Cat3   9491
3  Group4  Cat4   9588
Time #3
    Group   Cat  Value
2  Group3  Cat3   9491
3  Group4  Cat4   9588
4  Group5  Cat5   6402
Time #4
    Group   Cat  Value
3  Group4  Cat4   9588
4  Group5  Cat5   6402
5  Group6   Cat   1923
Time #5
    Group   Cat  Value
4  Group5  Cat5   6402
5  Group6   Cat   1923
6  Group7  Cat7    492
Time #6
    Group   Cat  Value
5  Group6   Cat   1923
6  Group7  Cat7    492
7  Group8  Cat8   8589
Time #7
    Group   Cat  Value
6  Group7  Cat7    492
7  Group8  Cat8   8589
8  Group9  Cat9   8582
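The window length and step are hard-coded in the generator above. A minimal sketch of a parameterised variant follows; the function name `sliding_windows`, its parameters, and the sample data are assumptions for illustration.

```python
import pandas as pd

def sliding_windows(frame, size=3, step=1):
    """Yield consecutive `size`-row windows of `frame`, advancing `step` rows each time."""
    for start in range(0, len(frame) - size + 1, step):
        yield frame.iloc[start:start + size]

# Hypothetical data for illustration.
df = pd.DataFrame({"Value": [10, 20, 30, 40, 50]})

windows = list(sliding_windows(df, size=3, step=1))
print(len(windows))
print(windows[-1]["Value"].tolist())
```

If the frame has fewer rows than `size`, the generator yields nothing.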

How do you sample every nth row within a range in a pandas dataframe?

First, we can create a test dataframe:

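import numpy as np
import pandas as pd
# pandas.util.testing is not available in recent pandas releases, so build a random frame directly
tdf = pd.DataFrame(np.random.randn(30, 4), columns=list('ABCD'))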

Then we can slice it by row position as follows:

tdf[start_index:end_index:step_size]

So, getting every other row from position 10 up to (but not including) 20 would look like this:

tdf[10:20:2]
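Writing the same slice through `iloc` makes it explicit that the bounds are row positions rather than index labels. This is a small sketch on a made-up 30-row frame; the variable names mirror the placeholders used above.

```python
import pandas as pd

# Hypothetical 30-row frame.
tdf = pd.DataFrame({"A": range(30)})

start_index, end_index, step_size = 10, 20, 2
subset = tdf.iloc[start_index:end_index:step_size]  # positions 10, 12, 14, 16, 18

print(subset["A"].tolist())
```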

Slicing Pandas DataFrame every nth row

You can do it with a for loop:

for i, start in enumerate(range(0, len(df), 5)):  # stepping by 5 covers every row, including a short last chunk
    df.iloc[start:start + 5, :].to_csv('Stored_files_' + str(i) + '.csv')

So the first iteration stores rows 0 to 4 under the name "Stored_files_0.csv".
The second iteration stores rows 5 to 9 under the name "Stored_files_1.csv".
And so on...
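To check that a chunking loop covers every row exactly once, one option is to build the chunks in memory before writing anything to disk. The helper name `chunk_rows` and the sample frame below are assumptions for illustration.

```python
import pandas as pd

def chunk_rows(frame, size):
    """Return a list of consecutive row chunks, each with at most `size` rows."""
    return [frame.iloc[start:start + size] for start in range(0, len(frame), size)]

# Hypothetical 12-row frame: the last chunk is shorter than the others.
df = pd.DataFrame({"A": range(12)})
chunks = chunk_rows(df, 5)

print([len(c) for c in chunks])
```

Each chunk can then be passed to `to_csv` with whatever file name scheme is needed.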