Modifying a Subset of Rows in a Pandas Dataframe

Modifying a subset of rows in a pandas dataframe

Use .loc for label-based indexing:

df.loc[df.A==0, 'B'] = np.nan

The df.A==0 expression creates a boolean Series that selects the rows, and 'B' selects the column. You can also use this to transform a subset of a column, e.g.:

df.loc[df.A==0, 'B'] = df.loc[df.A==0, 'B'] / 2

I don't know enough about pandas internals to say exactly why that works, but the basic issue is that indexing into a DataFrame sometimes returns a copy of the result and sometimes returns a view on the original object. According to the pandas documentation, this behavior depends on the underlying NumPy behavior. I've found that selecting everything in one operation (df.loc[rows, cols], rather than chained indexing of the form [one][two]) is more likely to work for setting.
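
To make the difference concrete, here is a minimal sketch on a made-up frame (the column values are assumptions for illustration). With a boolean mask, df[mask] builds a new object, so a chained assignment lands on that temporary, while the single .loc call writes to df itself. Depending on the pandas version, the chained form also emits a warning, which is silenced here.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data, not taken from the original question.
df = pd.DataFrame({"A": [0, 1, 0, 2], "B": [10.0, 20.0, 30.0, 40.0]})

# Chained indexing: df[mask] is evaluated first and yields a new object,
# so the assignment to ["B"] never reaches the original frame.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[df["A"] == 0]["B"] = np.nan
chained_changed = bool(df["B"].isna().any())

# One .loc call: rows and column are resolved together on df itself.
df.loc[df["A"] == 0, "B"] = np.nan
loc_changed = int(df["B"].isna().sum())

print(chained_changed, loc_changed)
```

If the first value printed is False and the second is 2, only the .loc form modified the frame.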

Modifying multiple columns in a subset of rows in pandas DataFrame

You can rename the columns so that they align with the target columns:

d = {'A_except':'finalA', 'B_except':'finalB'}
df.loc[exceptions.index, ['finalA', 'finalB']] = \
df.loc[exceptions.index, ['A_except', 'B_except']].rename(columns=d)

print (df)
   A  A_except  B  B_except  normal  finalA  finalB
0  1         0  5         0    True       1       5
1  2         0  6         0    True       2       6
2  3         9  7        10   False       9      10
3  4         9  8        10   False       9      10

Another solution is to convert the output to a NumPy array. In that case the columns are not aligned by label, and the values are assigned by position:

df.loc[exceptions.index, ['finalA', 'finalB']] = \
df.loc[exceptions.index, ['A_except', 'B_except']].values

print (df)
   A  A_except  B  B_except  normal  finalA  finalB
0  1         0  5         0    True       1       5
1  2         0  6         0    True       2       6
2  3         9  7        10   False       9      10
3  4         9  8        10   False       9      10
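
Both variants can be tried side by side with the sketch below. The frame is rebuilt from the printed output, and the definition of exceptions (the rows where normal is False) and the initial values of finalA and finalB are assumptions, since the original question is not shown here.

```python
import pandas as pd

# Hypothetical frame shaped like the printed output.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "A_except": [0, 0, 9, 9],
    "B": [5, 6, 7, 8],
    "B_except": [0, 0, 10, 10],
    "normal": [True, True, False, False],
})
df["finalA"] = df["A"]
df["finalB"] = df["B"]
exceptions = df[~df["normal"]]  # assumed definition of `exceptions`

src = df.loc[exceptions.index, ["A_except", "B_except"]]

# Option 1: rename so the source labels match the target columns.
by_label = df.copy()
by_label.loc[exceptions.index, ["finalA", "finalB"]] = src.rename(
    columns={"A_except": "finalA", "B_except": "finalB"}
)

# Option 2: drop the labels, so values are written by position.
by_position = df.copy()
by_position.loc[exceptions.index, ["finalA", "finalB"]] = src.to_numpy()

print(by_label[["finalA", "finalB"]])
print(by_position[["finalA", "finalB"]])
```

The rename variant keeps working if the source columns are listed in a different order; the array variant depends on the order matching.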

Efficient way to update column value for subset of rows on Pandas DataFrame?

This may be what you require:

df.loc[df.name.str.len() == 4, 'value'] *= 1000

df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)
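
A small runnable sketch of the same idea follows, on made-up data. The column named label is an assumption introduced here: the prefixed strings go into a separate text column because recent pandas releases may warn about, or reject, writing strings into an integer column.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["anna", "bob", "carl", "diana"],
                   "value": [1, 2, 3, 4]})
short = df["name"].str.len() == 4  # True for 'anna' and 'carl'

# Numeric update on the masked rows only.
df.loc[short, "value"] *= 1000

# String update: work on a text copy of the column so the integer
# column does not have to change dtype.
df["label"] = df["value"].astype(str)
df.loc[short, "label"] = "short_" + df.loc[short, "label"]

print(df)
```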

How to update a subset of Pandas DataFrame rows with new (different) values?

If I understand your problem correctly, you want to change the values in column C based on the values in column A, with the value assigned to C looked up in a dictionary. Rows whose value in A is not a key of the dictionary should be left untouched.

Dictionary m is used for mapping values from column A to the target value:

df = pandas.DataFrame({'A': [1,2,3,4,5,6,7,8,9], 'C': [0,0,0,0,0,0,0,0,0]})
m = {1:1,3:1,6:1,8:1}

Next, build a boolean mask, select, that marks all rows whose value in A matches a key of the dictionary. Then map the values of column A in those rows using m and assign the result to the same rows of column C. The other values remain as before.

select = df['A'].isin(m.keys())
df.loc[select, 'C'] = df.loc[select, 'A'].map(m)
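
Put together as a self-contained sketch, with a hypothetical mapping that uses distinct target values so the effect is easier to see:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "C": [0, 0, 0, 0, 0]})
m = {1: 10, 3: 30}  # hypothetical mapping, not the one from the question

# Rows whose value in A is a key of the mapping.
select = df["A"].isin(m.keys())

# Only those rows of C are overwritten; the rest keep their old value.
df.loc[select, "C"] = df.loc[select, "A"].map(m)

print(df)
```

Mapping only the selected rows avoids the NaN values that map would produce for keys missing from the dictionary.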

Modify a subset of a pandas column using data from another column

You are close; apply is not necessary:

m = df['A'] == 2
#short way
df.loc[m, 'B'] += df.loc[m, 'C']
#long way
df.loc[m, 'B'] = df.loc[m, 'B'] + df.loc[m, 'C']

Or:

df.loc[df['A'] == 2, 'B'] += df['C']
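
The last form works because pandas aligns the right-hand Series on the index before assigning, so only the masked rows receive a value. A minimal sketch with made-up float data comparing both forms:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": [1, 2, 2, 3],
                   "B": [10.0, 20.0, 30.0, 40.0],
                   "C": [1.0, 2.0, 3.0, 4.0]})
m = df["A"] == 2

# Mask applied on both sides.
masked = df.copy()
masked.loc[m, "B"] += masked.loc[m, "C"]

# Mask applied on the left only; the full column C is aligned by index.
full = df.copy()
full.loc[m, "B"] += full["C"]

print(masked["B"].tolist())
print(full["B"].tolist())
```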

Modifying a subset of a pandas MultiIndex

One way is to re-assign the index using pd.MultiIndex.from_tuples:

idx_to_change = {(0, 10), (9, 25)}

data.index = pd.MultiIndex.from_tuples([i if i not in idx_to_change else (i[0],i[1]+10) for i in data.index], names=("start","end"))
print (data)

           col1  col2
start end
0     20      a     1
12    20      b     1
9     35      a     2
24    32      d     2
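
For a version that runs on its own, the sketch below rebuilds a frame consistent with the printed result. The starting index values are inferred by subtracting 10 from the changed entries, so treat them as an assumption.

```python
import pandas as pd

# Assumed starting data, reconstructed from the printed output.
index = pd.MultiIndex.from_tuples(
    [(0, 10), (12, 20), (9, 25), (24, 32)], names=("start", "end")
)
data = pd.DataFrame({"col1": list("abad"), "col2": [1, 1, 2, 2]}, index=index)

idx_to_change = {(0, 10), (9, 25)}

# Rebuild the whole index, shifting "end" by 10 for the chosen entries.
data.index = pd.MultiIndex.from_tuples(
    [(s, e + 10) if (s, e) in idx_to_change else (s, e) for s, e in data.index],
    names=data.index.names,
)

print(data)
```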

Modifying dataframe subset isn't changing source

Assigning one dataframe to another name only copies the reference to the dataframe object, so that assumption is correct. Slicing the original dataframe, however, does not copy the reference: it creates a new object that holds only a subset of the original dataframe's data and is distinct from it. If you want to modify a subset of the original dataframe, assign the newly created data back to where it came from.

temp = df.loc[(df['sample']==s) & (df['pixel']==p),['pce']]
nval=temp.iloc[0]['pce']
temp['pce']=temp['pce']/nval
df.loc[temp.index, 'pce'] = temp['pce']
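
The same pattern as a runnable sketch. The data and the values chosen for s and p are hypothetical, and the explicit .copy() is an addition that makes the independence of the subset obvious.

```python
import pandas as pd

# Hypothetical measurements.
df = pd.DataFrame({
    "sample": ["s1", "s1", "s2"],
    "pixel": [1, 1, 1],
    "pce": [2.0, 4.0, 5.0],
})
s, p = "s1", 1  # hypothetical sample and pixel to normalise

mask = (df["sample"] == s) & (df["pixel"] == p)
temp = df.loc[mask, ["pce"]].copy()  # independent subset of df
temp["pce"] = temp["pce"] / temp["pce"].iloc[0]

before_write_back = df["pce"].tolist()  # df has not changed yet

# Write the normalised values back to the rows they came from.
df.loc[temp.index, "pce"] = temp["pce"]

print(before_write_back, df["pce"].tolist())
```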

String Modification On Pandas DataFrame Subset

You can do it with loc, but you had the arguments the wrong way around: the row indexer comes first and the column second. Also use [] rather than ().

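mask_time = ~df['action'].str.contains('TIME') # same as df.action.str.contains('TIME')==False
# regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default
df.loc[mask_time,'action'] = df.loc[mask_time,'action'].str.replace(r'([^a-z0-9\._]{2,})', '', regex=True)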

Example:

# dummy df
df = pd.DataFrame({'action': ['TIME 1', 'ABC 2']})
print (df)
   action
0  TIME 1
1   ABC 2

The result after using the above method:

   action
0  TIME 1
1       2
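
The whole answer as one self-contained sketch on the same dummy frame. Passing regex=True and using a raw string are choices made here so the pattern is treated as a regular expression regardless of the pandas version's default.

```python
import pandas as pd

df = pd.DataFrame({"action": ["TIME 1", "ABC 2"]})

# Rows that do NOT contain 'TIME' are the ones to clean up.
mask_time = ~df["action"].str.contains("TIME")

# Remove runs of two or more characters outside a-z, 0-9, '.' and '_'.
df.loc[mask_time, "action"] = df.loc[mask_time, "action"].str.replace(
    r"[^a-z0-9._]{2,}", "", regex=True
)

print(df)
```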

