Update a Dataframe in Pandas While Iterating Row by Row

iterate over pandas dataframe and update the value - AttributeError: can't set attribute

First, iterating in pandas is possible but very slow, so a vectorized solution is usually preferred.

If you do need to iterate, you can use iterrows:

for idx, row in df.iterrows():
    if df.loc[idx,'Qty'] == 1 and df.loc[idx,'Price'] == 10:
        df.loc[idx,'Buy'] = 1

But it is better to use a vectorized solution: set the values through a boolean mask with loc:

mask = (df['Qty'] == 1) & (df['Price'] == 10)
df.loc[mask, 'Buy'] = 1

Or a solution with Series.mask:

df['Buy'] = df['Buy'].mask(mask, 1)

Or, if you need if...else logic, use numpy.where:

df['Buy'] = np.where(mask, 1, 0)

Sample:

Set values by conditions:

df = pd.DataFrame({'Buy': [100, 200, 50],
                   'Qty': [5, 1, 1],
                   'Name': ['apple', 'pear', 'banana'],
                   'Price': [1, 10, 10]})

print (df)
   Buy    Name  Price  Qty
0  100   apple      1    5
1  200    pear     10    1
2   50  banana     10    1

mask = (df['Qty'] == 1) & (df['Price'] == 10)

df['Buy'] = df['Buy'].mask(mask, 1)
print (df)
   Buy    Name  Price  Qty
0  100   apple      1    5
1    1    pear     10    1
2    1  banana     10    1

df['Buy'] = np.where(mask, 1, 0)
print (df)
   Buy    Name  Price  Qty
0    0   apple      1    5
1    1    pear     10    1
2    1  banana     10    1
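
The row-by-row loop and the boolean-mask assignment can be checked against each other on the same sample data. This is only a minimal sketch: the helper names make_df, update_with_loop and update_with_mask are made up for illustration and are not part of pandas.

```python
import pandas as pd

def make_df():
    # Same sample data as in the answer above
    return pd.DataFrame({'Buy': [100, 200, 50],
                         'Qty': [5, 1, 1],
                         'Name': ['apple', 'pear', 'banana'],
                         'Price': [1, 10, 10]})

def update_with_loop(df):
    # Row-by-row version: write through df.loc, not through `row`
    for idx, row in df.iterrows():
        if row['Qty'] == 1 and row['Price'] == 10:
            df.loc[idx, 'Buy'] = 1
    return df

def update_with_mask(df):
    # Vectorized version: one assignment covers every matching row
    mask = (df['Qty'] == 1) & (df['Price'] == 10)
    df.loc[mask, 'Buy'] = 1
    return df

looped = update_with_loop(make_df())
masked = update_with_mask(make_df())
print(looped.equals(masked))   # both approaches should agree
print(masked['Buy'].tolist())
```

Both versions leave the first row untouched and set Buy to 1 in the other two; the masked version does it without a Python-level loop.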

Update row values using other row values while iterating over a Pandas dataframe

Since there is no example data, I can't tell why the code gives you the error. But you can try using apply():

def generate_answer(row):
    if len(str(row.Year)) < 4:
        return ('Status on ' + row.Name + ' is ' + row.Status)
    else:
        return ('Status on ' + row.Name + ' ' + str(row.Year) + ' is ' + row.Status)

A['answer'] = A.apply(generate_answer, axis=1)
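
To see the apply() approach run end to end, here is a small sketch with made-up data. The original question included none, so the contents of A below are purely an assumption; only the column names Name, Year and Status come from the function above.

```python
import pandas as pd

# Hypothetical data: the original question did not include any
A = pd.DataFrame({'Name': ['alpha', 'beta'],
                  'Year': [2020, 0],
                  'Status': ['open', 'closed']})

def generate_answer(row):
    # A Year with fewer than 4 digits is treated as missing and left out
    if len(str(row.Year)) < 4:
        return 'Status on ' + row.Name + ' is ' + row.Status
    return 'Status on ' + row.Name + ' ' + str(row.Year) + ' is ' + row.Status

# axis=1 passes each row to the function as a Series
A['answer'] = A.apply(generate_answer, axis=1)
print(A['answer'].tolist())
```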

Pandas - iterate over dataframe rows and update df (one line of code)

You could do:

import pandas as pd
import requests
import numpy as np

d = {'ListOfURLs': ['https://stackoverflow.com/q/65060875/4001592',
                    'https://stackoverflow.com/q/65060875/4001592',
                    'https://stackoverflow.com/q/65060875/4001592']}
df = pd.DataFrame(data=d)

for index, row in df.iterrows():
    r = requests.get(row['ListOfURLs'])
    if r.status_code == 200:
        # .loc accepts a list of column labels; .at is meant for a single cell
        df.loc[index, ['Status Code', 'Result', 'Error']] = (r.status_code, '[OK]', np.nan)

print(df)

Output

                                     ListOfURLs  Status Code Result  Error
0  https://stackoverflow.com/q/65060875/4001592        200.0   [OK]    NaN
1  https://stackoverflow.com/q/65060875/4001592        200.0   [OK]    NaN
2  https://stackoverflow.com/q/65060875/4001592        200.0   [OK]    NaN

Don't use set_value:

Deprecated since version 0.21.0: Use .at[] or .iat[] accessors instead.

Note that some details from your original question were omitted in order to produce actual output.
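
To try the same update pattern without network access, the HTTP call can be swapped for a stand-in. This is a sketch under stated assumptions: fake_get and FakeResponse are hypothetical placeholders for requests.get and its response object, and the three result columns are created up front so that each row assignment only fills existing cells.

```python
from collections import namedtuple

import numpy as np
import pandas as pd

# Hypothetical stand-in for the response returned by requests.get
FakeResponse = namedtuple('FakeResponse', ['status_code'])

def fake_get(url):
    # Hypothetical placeholder: always reports success, no network involved
    return FakeResponse(status_code=200)

df = pd.DataFrame({'ListOfURLs': ['https://stackoverflow.com/q/65060875/4001592'] * 3})

# Create the target columns before the loop
df['Status Code'] = np.nan
df['Result'] = None
df['Error'] = np.nan

for index, row in df.iterrows():
    r = fake_get(row['ListOfURLs'])
    if r.status_code == 200:
        # One assignment fills three cells of the current row
        df.loc[index, ['Status Code', 'Result', 'Error']] = [r.status_code, '[OK]', np.nan]

print(df[['Status Code', 'Result', 'Error']])
```

The Status Code column stays float because it was initialised with NaN, which matches the 200.0 values in the output above.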

update a dataframe in pandas while iterating

First convert the dates to datetimes, then reshape with crosstab, using Series.dt.month so that the months come out in the correct order. Add DataFrame.reindex to include any missing months (if necessary), convert the column labels to month names, and finally turn the MultiIndex into the first two columns:

df['date'] = pd.to_datetime(df['date'])

df = (pd.crosstab([df['old value'],df['newvalue']], df['date'].dt.month)
        .reindex(columns=range(1, 13), fill_value=0)
        .rename(columns = lambda x: pd.to_datetime(x, format='%m').strftime('%b'))
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
  old value newvalue  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  \
0       aab      baa    2    0    0    0    0    0    0    0    0    0    0
1       aab      bca    0    1    0    0    0    0    0    0    0    0    0
2       abc      cba    1    0    0    0    0    0    0    0    0    0    0
3       abc      dca    0    0    0    1    0    0    0    0    0    0    0
4       acb      baa    0    0    0    1    0    0    0    0    0    0    0
5       acb      bca    0    1    0    0    0    0    0    0    0    0    0
6       acd      dca    0    1    0    0    0    0    0    0    0    0    0

   Dec
0    0
1    0
2    0
3    0
4    0
5    0
6    0

Replacing 0 with empty strings is possible, but you then get numeric data mixed with strings, and further processing is likely to be a problem:

df = (pd.crosstab([df['old value'],df['newvalue']], df['date'].dt.month)
        .replace(0, '')
        .reindex(columns=range(1, 13), fill_value='')
        .rename(columns = lambda x: pd.to_datetime(x, format='%m').strftime('%b'))
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
  old value newvalue Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
0       aab      baa   2
1       aab      bca       1
2       abc      cba   1
3       abc      dca               1
4       acb      baa               1
5       acb      bca       1
6       acd      dca       1
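
The reshaping steps can be tried on a small frame. The input below is hypothetical: the original data was not shown, so these four rows were simply chosen to be consistent with part of the output above. The month names are looked up in a plain list (MONTHS, an illustrative name) instead of going through pd.to_datetime, which keeps the sketch independent of parsing behaviour.

```python
import pandas as pd

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Hypothetical input data, assumed for illustration only
df = pd.DataFrame({
    'date': ['2020-01-05', '2020-01-20', '2020-02-03', '2020-04-11'],
    'old value': ['aab', 'aab', 'aab', 'abc'],
    'newvalue': ['baa', 'baa', 'bca', 'dca'],
})
df['date'] = pd.to_datetime(df['date'])

counts = (pd.crosstab([df['old value'], df['newvalue']], df['date'].dt.month)
            .reindex(columns=range(1, 13), fill_value=0)   # add months with no rows
            .rename(columns=lambda m: MONTHS[m - 1])        # 1 -> 'Jan', 2 -> 'Feb', ...
            .reset_index()                                  # MultiIndex -> two columns
            .rename_axis(None, axis=1))                     # drop the columns' axis name

print(counts[['old value', 'newvalue', 'Jan', 'Feb', 'Apr']])
```

Keeping the counts numeric, as in the first variant, means sums and comparisons on the month columns still work afterwards.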

Updating value in iterrow for pandas

The rows you get back from iterrows are copies that are no longer connected to the original data frame, so edits don't change your dataframe. Thankfully, because each item you get back from iterrows contains the current index, you can use that to access and edit the relevant row of the dataframe:

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        row = row.copy()
        target = row.address_chi
        dict_temp = geocoding(target)
        rche_df.loc[index, 'wgs1984_latitude'] = dict_temp['lat']
        rche_df.loc[index, 'wgs1984_longitude'] = dict_temp['long']

In my experience, this approach seems slower than using an approach like apply or map, but as always, it's up to you to decide how to make the performance/ease of coding tradeoff.
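
The difference between editing the row and editing the dataframe can be seen in a few lines. This is a minimal sketch on a made-up two-column frame with mixed dtypes (the column names name and lat are assumptions, not taken from the question):

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b'], 'lat': [1.5, 2.5]})

# Editing the row only changes the temporary Series handed out by iterrows
for index, row in df.iterrows():
    row['lat'] = 0.0
after_row_edit = df['lat'].tolist()
print(after_row_edit)          # the dataframe still holds the original values

# Writing through df.loc with the yielded index changes the dataframe itself
for index, row in df.iterrows():
    df.loc[index, 'lat'] = row['lat'] * 2
after_loc_edit = df['lat'].tolist()
print(after_loc_edit)
```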

In Pandas, how do I update the previous row in a iterator?

Without example data, it's unclear what you're trying to do. But judging by the operations in your for loop, it could probably be done like this instead, without any loop:

myValue = df['myCol']  # the column you wanted and other calculations
df['myCol'] = df['myCol'].shift() - myValue

Depending on what you're trying to do, one of these should be what you want:

# starting with this df
   myCol  otherCol
0      2         6
1      9         3
2      4         8
3      2         8
4      1         7

# next row minus current row
df['myCol'] = df['myCol'].shift(-1) - df['myCol']
df
# result:
   myCol  otherCol
0    7.0         6
1   -5.0         3
2   -2.0         8
3   -1.0         8
4    NaN         7

or

# previous row minus current row
df['myCol'] = df['myCol'].shift() - df['myCol']
df
# result:
   myCol  otherCol
0    NaN         6
1   -7.0         3
2    5.0         8
3    2.0         8
4    1.0         7

And myVal can be anything, such as mathematical operations vectorized over an entire column:

myVal = df['myCol'] * 2 + 3
# myVal is:
0     7
1    21
2    11
3     7
4     5
Name: myCol, dtype: int32

df['myCol'] = df['myCol'].shift(-1) - myVal
df
   myCol  otherCol
0    2.0         6
1  -17.0         3
2   -9.0         8
3   -6.0         8
4    NaN         7
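
For reference, the two shift variants can be reproduced as a self-contained script. The frame is the sample shown above; the variable names next_minus_current and prev_minus_current are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'myCol': [2, 9, 4, 2, 1],
                   'otherCol': [6, 3, 8, 8, 7]})

# shift(-1) pulls the next row's value up; the last row has no successor, so it is NaN
next_minus_current = df['myCol'].shift(-1) - df['myCol']

# shift() pushes the previous row's value down; the first row has no predecessor, so it is NaN
prev_minus_current = df['myCol'].shift() - df['myCol']

print(next_minus_current.tolist())
print(prev_minus_current.tolist())
```

Since Series.diff() computes the current row minus the previous one, prev_minus_current is the same as -df['myCol'].diff().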
