How to Iterate Over Rows in a Dataframe in Pandas

How to iterate over rows in a DataFrame in Pandas

DataFrame.iterrows is a generator which yields both the index and row (as a Series):

import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
df = df.reset_index()  # make sure indexes pair with number of rows

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

10 100
11 110
12 120
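One consequence of each row arriving as a Series is that a row holds a single dtype, so mixed numeric columns get upcast. A minimal sketch of this, using a made-up two-column frame (the column names `n` and `x` are only for illustration):

```python
import pandas as pd

# Hypothetical sample data: one int64 column and one float64 column.
df = pd.DataFrame({'n': [1, 2], 'x': [0.5, 1.5]})

# Take just the first (index, row) pair from the generator.
index, row = next(df.iterrows())

print(type(row).__name__)  # Series
print(df['n'].dtype)       # int64
print(row.dtype)           # float64: the int was upcast to fit one Series
```

If the original dtypes matter, itertuples or column-wise operations are likely a safer choice.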

Iterate over rows in pandas DataFrame and create a dict

You could use groupby and a dictionary comprehension:

d = {k:list(v) for k,v in df.groupby('name')['val']}

output:

{'p1': [0.0, 1.0], 'p2': [nan, 1.0, 0.0]}

Using iterrows (not my favorite option):

NB: this will be considerably slower on large DataFrames.

from collections import defaultdict

d = defaultdict(list)

for _, row in df.iterrows():
    d[row['name']].append(row['val'])
    
dict(d)
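The question's data is not shown here, so the following is only a sketch with an invented frame that has the same `name` and `val` columns. It runs both approaches side by side and checks that they build the same dictionary:

```python
from collections import defaultdict

import pandas as pd

# Hypothetical sample data with the same column names as in the answer.
df = pd.DataFrame({
    'name': ['p1', 'p1', 'p2', 'p2', 'p2'],
    'val': [0.0, 1.0, 2.0, 1.0, 0.0],
})

# groupby + dict comprehension: one list of values per group key.
by_groupby = {k: list(v) for k, v in df.groupby('name')['val']}

# Row-by-row version with iterrows, collecting into a defaultdict.
by_loop = defaultdict(list)
for _, row in df.iterrows():
    by_loop[row['name']].append(row['val'])

print(by_groupby == dict(by_loop))  # True
```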

Python Pandas iterate over rows and access column names

I also like itertuples()

for row in df.itertuples():
    print(row.A)
    print(row.Index)

Since row is a namedtuple, accessing values on each row this way should be MUCH faster than with iterrows.

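Speed comparison:

import time

import pandas as pd

df = pd.DataFrame([x for x in range(1000*1000)], columns=['A'])

# iterrows: builds a Series for every row
st = time.time()
for index, row in df.iterrows():
    row.A
print(time.time() - st)
# 45.05799984931946

# itertuples: yields a lightweight namedtuple per row
st = time.time()
for row in df.itertuples():
    row.A
print(time.time() - st)
# 0.48400020599365234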
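For reference, a small sketch of what itertuples actually yields, on an invented two-column frame. The `index` and `name` parameters are part of the `DataFrame.itertuples` signature; the data is hypothetical:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})

# Default: namedtuples with the index as the first field, called Index.
first = next(df.itertuples())
print(first._fields)         # ('Index', 'A', 'B')
print(first.Index, first.A)  # 0 1

# index=False drops the index; name=None yields plain tuples instead.
plain = list(df.itertuples(index=False, name=None))
print(plain)
```

Plain tuples skip the namedtuple construction, which can help a little more when attribute access is not needed.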

Iterating over rows in a dataframe in Pandas: is there a difference between using df.index and df.iterrows() as iterators?

When you loop over df.index, fetching the data for each label requires an additional loc lookup:

for index in df.index:
    value = df.loc[index, 'col']

When you use df.iterrows, the row is already available:

for index, row in df.iterrows():
    value = row['col']

Since you are already using pandas, neither of them is recommended unless you need a function that cannot be vectorized.

That said, IMO, I prefer df.index.
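To make the comparison concrete, here is a sketch on an invented frame with string labels (the column name `col` matches the snippets above; the data is hypothetical). Both loops read the same values, and the last line shows the kind of vectorized expression that avoids the loop entirely:

```python
import pandas as pd

# Hypothetical sample data with a non-default index.
df = pd.DataFrame({'col': [1, 2, 3]}, index=['a', 'b', 'c'])

# Option 1: loop over index labels and look each value up with .loc.
via_index = [df.loc[index, 'col'] for index in df.index]

# Option 2: let iterrows hand over each row directly.
via_iterrows = [row['col'] for index, row in df.iterrows()]

print(via_index == via_iterrows)  # True

# Option 3: no Python-level loop, the operation applies to the whole column.
doubled = df['col'] * 2
print(doubled.tolist())  # [2, 4, 6]
```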

Iterate over rows polars rust

If you activate the rows feature in polars, you can try:

DataFrame::get_row and DataFrame::get_row_amortized.

The latter is preferred, as that reduces heap allocations by reusing the row buffer.

Anti-pattern

This will be slow. Asking for rows from columnar data storage incurs many cache misses and goes through several layers of indirection.

Slightly better

What would be slightly better is using Rust iterators. This involves less indirection than the get_row methods.

df.as_single_chunk_par();
let mut iters = df.columns(["foo", "bar", "ham"])?
    .iter().map(|s| s.iter()).collect::<Vec<_>>();

for row in 0..df.height() {
    for iter in &mut iters {
        let value = iter.next().expect("should have as many iterations as rows");
        // process value
    }
}

If your DataFrame consists of a single data type, you should downcast each Series to a ChunkedArray. This will speed up iteration.

In the snippet below, we'll assume the data type is Float64.

let mut iters = df.columns(["foo", "bar", "ham"])?
    .iter().map(|s| Ok(s.f64()?.into_iter())).collect::<Result<Vec<_>>>()?;

for row in 0..df.height() {
    for iter in &mut iters {
        let value = iter.next().expect("should have as many iterations as rows");
        // process value
    }
}

Iterate over rows of a dataframe based on index in python

If you just want to normalise, you can write the expression directly, using Series.min and Series.max:

m = df['time'].min()
df['normtime'] = (df['time'] - m) / (df['time'].max() - m)

However, if you want the difference between successive elements, you can use Series.diff:

df['difftime'] = df['time'].diff() / (df['time'].max() - df['time'].min())

Testing:

df = pd.DataFrame({'time': [0.000000, 0.020373, 0.040598], 'velocity': [0.136731, 0.244889, 0.386443]})
print(df)
#        time  velocity
# 0  0.000000  0.136731
# 1  0.020373  0.244889
# 2  0.040598  0.386443

m = df['time'].min()
df['normtime'] = (df['time'] - m) / (df['time'].max() - m)

df['difftime'] = df['time'].diff() / (df['time'].max() - df['time'].min())

print(df)
#        time  velocity  normtime  difftime
# 0  0.000000  0.136731  0.000000       NaN
# 1  0.020373  0.244889  0.501823  0.501823
# 2  0.040598  0.386443  1.000000  0.498177
