Select Every Nth Row from Dataframe

Pandas every nth row

I'd use iloc, which takes a row/column slice, both based on integer position and following normal Python slice syntax. If you want every 5th row:

df.iloc[::5, :]
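As a minimal sketch of the slice above (the DataFrame here is made-up illustration data, not from the question), the start:stop:step form can also begin at an offset, which is handy when "every 5th row" should mean the 5th, 10th, 15th rows rather than the 1st, 6th, 11th:

```python
import pandas as pd

# Hypothetical 12-row frame used only for illustration.
df = pd.DataFrame({"value": range(12)})

every_fifth = df.iloc[::5, :]     # positions 0, 5, 10
offset_fifth = df.iloc[4::5, :]   # positions 4, 9 (the 5th and 10th rows)

print(every_fifth.index.tolist())
print(offset_fifth.index.tolist())
```

Because iloc is purely positional, this behaves the same whatever labels the index carries.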

How to extract every nth row from dataframe?

You are close. With the default RangeIndex, compare the remainder with 1 and use the resulting boolean mask to index df:

df1 = df[df.index % 100 == 1]

Solution for a general index, based on row position instead of index labels:

df1 = df[np.arange(len(df)) % 100 == 1]
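The difference between the label-based and the position-based mask only shows up when the index is not the default RangeIndex. A small sketch, assuming a made-up frame whose labels start at 1000 and using a step of 3 instead of 100 to keep it short:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default integer index (labels 1000..1008).
df = pd.DataFrame({"A": range(9)}, index=range(1000, 1009))

by_label = df[df.index % 3 == 1]               # depends on the label values
by_position = df[np.arange(len(df)) % 3 == 1]  # depends only on row order

print(by_label.index.tolist())
print(by_position.index.tolist())
```

The two selections pick different rows here, so the np.arange form is the safer choice whenever the index may have been filtered, sorted, or set from a column.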

If you also want to omit the rows at index 1 and 101:

df2 = df[(df.index % 100 == 1) & (df.index > 200)]

And the same for a general index:

a = np.arange(len(df))
df2 = df[(a % 100 == 1) & (a > 200)]

Sample:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(1000,3)), columns=list('ABC'))
#print (df)

a = np.arange(len(df))
df2 = df[(a % 100 == 1) & (a > 200)]
print (df2)
     A  B  C
201  4  4  4
301  1  3  2
401  0  3  5
501  5  8  4
601  3  7  9
701  5  5  7
801  4  1  0
901  4  7  6

Select nth rows every nth element in Python dataframe

You could use str.startswith() for this, combining one condition per prefix:

df = df[(df['Date'].str.startswith('Ene')) | (df['Date'].str.startswith('Feb'))]
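A minimal sketch of that filter, assuming a hypothetical Date column of strings that begin with Spanish month abbreviations (the sample values are invented for illustration):

```python
import pandas as pd

# Hypothetical data; only the 'Date' column name comes from the answer above.
df = pd.DataFrame({
    "Date": ["Ene 2020", "Feb 2020", "Mar 2020", "Ene 2021"],
    "Value": [1, 2, 3, 4],
})

# One boolean mask per prefix, combined with | (element-wise OR).
mask = df["Date"].str.startswith("Ene") | df["Date"].str.startswith("Feb")
filtered = df[mask]

print(filtered)
```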

Select every other nth row of data frame and add to a list of data frames in R

Use split with 1:5 to create dataframes with a 5-row interval.

split(df, 1:5)

output

$`1`
X1 X2 X3 X4 X5
1 1 0 0 1.501990 0
6 6 0 0 2.186790 0
11 11 0 0 2.190029 0
16 16 0 0 1.842470 0

$`2`
X1 X2 X3 X4 X5
2 2 0 0 1.883904 0
7 7 0 0 1.269592 0
12 12 0 0 0.000000 0
17 17 0 0 1.937999 0

$`3`
X1 X2 X3 X4 X5
3 3 0 0 1.333195 0
8 8 0 0 1.458405 0
13 13 0 0 1.460534 0
18 18 0 0 0.000000 0

$`4`
X1 X2 X3 X4 X5
4 4 0 0 0.000000 0
9 9 0 0 1.816493 0
14 14 0 0 1.470776 0
19 19 0 0 1.649926 0

$`5`
X1 X2 X3 X4 X5
5 5 0 0 2.136760 0
10 10 0 0 0.000000 0
15 15 0 0 1.675406 0
20 20 0 0 2.067902 0

An alternative with dplyr::group_split is:

group_split(df, rep(1:5, nrow(df)/5), .keep = F)

data

df <- structure(list(X1 = 1:20, X2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X3 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), X4 = c(1.50199, 1.883904, 1.333195, 0, 2.13676,
2.18679, 1.269592, 1.458405, 1.816493, 0, 2.190029, 0, 1.460534,
1.470776, 1.675406, 1.84247, 1.937999, 0, 1.649926, 2.067902),
X5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-20L))
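For readers working in pandas rather than R, a rough analogue of split(df, 1:5) is to group rows by their position modulo 5. This is only a sketch with a made-up 20-row frame, not a translation of the data above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with X1 = 1..20, mirroring the shape of the R example.
df = pd.DataFrame({"X1": range(1, 21)})

# Rows at positions 0, 5, 10, ... land in the first group, 1, 6, 11, ... in the second, etc.
groups = [g for _, g in df.groupby(np.arange(len(df)) % 5)]

for g in groups:
    print(g["X1"].tolist())
```

Each element of groups is itself a DataFrame, so the list plays the same role as the list returned by split in R.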

python pandas how to get data every n and every nth rows?

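Use a generator with iloc to select the desired rows:

def rows_generator(df):
    i = 0
    # Yield consecutive 3-row windows, advancing one row at a time
    while (i + 3) <= df.shape[0]:
        yield df.iloc[i:(i + 3):1, :]
        i += 1

i = 1
for window in rows_generator(df):  # separate name so df is not overwritten
    print(f'Time #{i}')
    print(window)
    i += 1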

Example output:

Time #1
Group Cat Value
0 Group1 Cat1 1230
1 Group2 Cat2 4019
2 Group3 Cat3 9491
Time #2
Group Cat Value
1 Group2 Cat2 4019
2 Group3 Cat3 9491
3 Group4 Cat4 9588
Time #3
Group Cat Value
2 Group3 Cat3 9491
3 Group4 Cat4 9588
4 Group5 Cat5 6402
Time #4
Group Cat Value
3 Group4 Cat4 9588
4 Group5 Cat5 6402
5 Group6 Cat 1923
Time #5
Group Cat Value
4 Group5 Cat5 6402
5 Group6 Cat 1923
6 Group7 Cat7 492
Time #6
Group Cat Value
5 Group6 Cat 1923
6 Group7 Cat7 492
7 Group8 Cat8 8589
Time #7
Group Cat Value
6 Group7 Cat7 492
7 Group8 Cat8 8589
8 Group9 Cat9 8582

How do you sample every nth row within a range in a pandas dataframe?

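First, we can create a test dataframe. Note that pandas.util.testing was deprecated in pandas 1.0 and removed in later releases, so on a recent version use any DataFrame with at least 20 rows instead:

from pandas import util
tdf = util.testing.makeDataFrame()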

Then we can index it in the following way:

tdf[start_index:end_index:step_size]

So getting every other row from position 10 up to (but not including) position 20 would look like this:

tdf[10:20:2]
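A sketch of the same slice without the test helper, using a made-up 30-row frame. Writing it with iloc makes it explicit that the bounds are integer positions and that the end position is excluded:

```python
import pandas as pd

# Hypothetical stand-in for the test dataframe.
tdf = pd.DataFrame({"value": range(30)})

subset = tdf.iloc[10:20:2]  # positions 10, 12, 14, 16, 18

print(subset.index.tolist())
```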

Slicing Pandas DataFrame every nth row

You can do it with a for loop:

for start in range(0, len(df), 5):  # stepping by 5 ensures all rows are captured
    # iloc slices exclude the end position, so consecutive chunks do not overlap
    df.iloc[start:start + 5, :].to_csv('Stored_files_' + str(start // 5) + '.csv')

In the first iteration, rows 0 to 4 are stored as "Stored_files_0.csv".
In the second iteration, rows 5 to 9 are stored as "Stored_files_1.csv".
And so on...


