Modifying a subset of rows in a pandas dataframe
Use .loc for label-based indexing:
df.loc[df.A==0, 'B'] = np.nan
The df.A==0 expression creates a boolean Series that selects the rows, and 'B' selects the column. You can also use this to transform a subset of a column, e.g.:
df.loc[df.A==0, 'B'] = df.loc[df.A==0, 'B'] / 2
I don't know enough about pandas internals to know exactly why that works, but the basic issue is that indexing into a DataFrame sometimes returns a copy of the result and sometimes returns a view on the original object. According to the pandas documentation, this behavior depends on the underlying numpy behavior. I've found that accessing everything in one operation (a single .loc[rows, cols] call rather than chained [one][two] indexing) is more likely to work for setting.
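To make the difference concrete, here is a minimal sketch. The DataFrame and its values are made up for illustration, and column B uses floats so that assigning NaN does not change the dtype. With a boolean mask, the chained form writes to a temporary copy, while the single .loc call writes to the original.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data, not taken from the question.
df = pd.DataFrame({"A": [0, 1, 0, 2], "B": [10.0, 20.0, 30.0, 40.0]})

# Chained indexing: df[mask] builds a new object first, so the
# assignment lands on that temporary object and df stays unchanged.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about this pattern
    df[df.A == 0]["B"] = np.nan
chained_changed = bool(df["B"].isna().any())

# Single .loc call: rows and column are resolved in one operation,
# so the assignment reaches the original DataFrame.
df.loc[df.A == 0, "B"] = np.nan
loc_changed = bool(df["B"].isna().any())

print(chained_changed, loc_changed)
```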
Modifying multiple columns in a subset of rows in pandas DataFrame
You can rename the source columns so that they align with the target columns:
d = {'A_except':'finalA', 'B_except':'finalB'}
df.loc[exceptions.index, ['finalA', 'finalB']] = \
df.loc[exceptions.index, ['A_except', 'B_except']].rename(columns=d)
print (df)
   A  A_except  B  B_except  normal  finalA  finalB
0  1         0  5         0    True       1       5
1  2         0  6         0    True       2       6
2  3         9  7        10   False       9      10
3  4         9  8        10   False       9      10
Another solution is to convert the output to a numpy array, but then the columns are not aligned by name, so the values are assigned by position:
df.loc[exceptions.index, ['finalA', 'finalB']] = \
df.loc[exceptions.index, ['A_except', 'B_except']].values
print (df)
   A  A_except  B  B_except  normal  finalA  finalB
0  1         0  5         0    True       1       5
1  2         0  6         0    True       2       6
2  3         9  7        10   False       9      10
3  4         9  8        10   False       9      10
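For reference, a self-contained sketch of both approaches. The input frame is reconstructed from the printed output, and the starting values of finalA and finalB and the definition of exceptions are assumptions, since the question's setup is not shown here.

```python
import pandas as pd

# Reconstructed input (assumption): finalA/finalB start as copies of A/B.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "A_except": [0, 0, 9, 9],
    "B": [5, 6, 7, 8],
    "B_except": [0, 0, 10, 10],
    "normal": [True, True, False, False],
})
df["finalA"] = df["A"]
df["finalB"] = df["B"]
exceptions = df[~df["normal"]]  # assumed definition of the exception rows

target = ["finalA", "finalB"]
source = ["A_except", "B_except"]

# Approach 1: rename the source columns so pandas can align them by label.
by_label = df.copy()
by_label.loc[exceptions.index, target] = (
    by_label.loc[exceptions.index, source]
    .rename(columns=dict(zip(source, target)))
)

# Approach 2: drop the labels entirely; values are assigned by position,
# so the order of the source columns must match the order of the targets.
by_position = df.copy()
by_position.loc[exceptions.index, target] = (
    by_position.loc[exceptions.index, source].to_numpy()
)

print(by_label[target].equals(by_position[target]))
```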
Efficient way to update column value for subset of rows on Pandas DataFrame?
This may be what you require:
df.loc[df.name.str.len() == 4, 'value'] *= 1000
df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)
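A small runnable sketch of the same idea, using made-up data. One deliberate difference from the lines above: the prefixed text goes into a separate, hypothetical label column, which avoids mixing strings into a numeric column.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["abcd", "xy", "wxyz"], "value": [1, 2, 3]})

# Boolean mask: rows whose name has exactly four characters.
short = df["name"].str.len() == 4

# In-place arithmetic on the selected rows only.
df.loc[short, "value"] *= 1000

# Write the prefixed text into a separate string column; the right-hand
# Series covers all rows and pandas aligns it to the selected ones.
df["label"] = df["value"].astype(str)
df.loc[short, "label"] = "short_" + df["value"].astype(str)

print(df)
```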
How to update a subset of Pandas DataFrame rows with new (different) values?
If I understand your problem correctly, you want to change the values in column C based on the values in column A, where the value assigned to C is looked up in a dictionary. Rows whose value in A is not a key of that dictionary should be left untouched.
Dictionary m is used for mapping values from column A to the target value:
import pandas
df = pandas.DataFrame({'A': [1,2,3,4,5,6,7,8,9], 'C': [0,0,0,0,0,0,0,0,0]})
m = {1:1,3:1,6:1,8:1}
Then build a boolean mask, select, that marks all rows whose value in A is one of the dictionary keys. Map the values of column A through m and assign the result to the selected rows of column C. The other values stay as they were.
select = df['A'].isin(m.keys())
df.loc[select, 'C'] = df.loc[select, 'A'].map(m)
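Put together as one runnable snippet (same data and mapping as above; list(m) is used instead of m.keys() only to pass a plain list to isin):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9], "C": [0] * 9})
m = {1: 1, 3: 1, 6: 1, 8: 1}

# True for rows whose A value is a key of the mapping.
select = df["A"].isin(list(m))

# Look up the new value for the selected rows only; all other rows keep C.
df.loc[select, "C"] = df.loc[select, "A"].map(m)

print(df)
```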
Modify a subset of a pandas column using data from another column
You are close, apply is not necessary:
m = df['A'] == 2
#short way
df.loc[m, 'B'] += df.loc[m, 'C']
#long way
df.loc[m, 'B'] = df.loc[m, 'B'] + df.loc[m, 'C']
Or:
df.loc[df['A'] == 2, 'B'] += df['C']
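A quick check of the short way on made-up data (the values are assumptions for illustration): only the rows where A equals 2 get C added to B.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": [1, 2, 2, 3], "B": [10, 20, 30, 40], "C": [1, 2, 3, 4]})

m = df["A"] == 2

# Add C to B on the masked rows; rows outside the mask are not touched.
df.loc[m, "B"] += df.loc[m, "C"]

print(df)
```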
Modifying a subset of a pandas MultiIndex
One way is to re-assign the index using pd.MultiIndex:
idx_to_change = {(0, 10), (9, 25)}
data.index = pd.MultiIndex.from_tuples(
    [i if i not in idx_to_change else (i[0], i[1] + 10) for i in data.index],
    names=("start", "end"))
print (data)
          col1  col2
start end
0     20     a     1
12    20     b     1
9     35     a     2
24    32     d     2
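A self-contained version of this approach. The starting frame is reconstructed from the printed result by undoing the +10 shift, so treat the input values as an assumption.

```python
import pandas as pd

# Reconstructed input (assumption): the index before the change.
index = pd.MultiIndex.from_tuples(
    [(0, 10), (12, 20), (9, 25), (24, 32)], names=("start", "end")
)
data = pd.DataFrame({"col1": list("abad"), "col2": [1, 1, 2, 2]}, index=index)

idx_to_change = {(0, 10), (9, 25)}

# Rebuild the whole index, shifting "end" by 10 for the selected entries.
data.index = pd.MultiIndex.from_tuples(
    [(s, e + 10) if (s, e) in idx_to_change else (s, e) for s, e in data.index],
    names=data.index.names,
)

print(data)
```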
Modifying dataframe subset isn't changing source
Assigning one DataFrame to another variable only copies the reference to the same object, so that assumption is correct. Slicing the original DataFrame is different: here it creates a new object that holds only a subset of the original data, so changes to it do not reach the original. If you want to modify a subset of the original DataFrame, assign the newly computed data back to where it came from.
temp = df.loc[(df['sample']==s) & (df['pixel']==p),['pce']]
nval=temp.iloc[0]['pce']
temp['pce']=temp['pce']/nval
df.loc[temp.index, 'pce'] = temp['pce']
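The same pattern as a runnable sketch. The data and the values of s and p are made up, and the explicit .copy() is an addition that makes it clear temp is an independent object.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "sample": ["s1", "s1", "s2"],
    "pixel": [1, 1, 1],
    "pce": [2.0, 4.0, 5.0],
})
s, p = "s1", 1

# Work on an explicit copy of the subset.
mask = (df["sample"] == s) & (df["pixel"] == p)
temp = df.loc[mask, ["pce"]].copy()

# Normalize by the first value of the subset.
temp["pce"] = temp["pce"] / temp["pce"].iloc[0]

# Write the result back to the matching rows of the original.
df.loc[temp.index, "pce"] = temp["pce"]

print(df)
```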
String Modification On Pandas DataFrame Subset
You can do it with loc, but you had it the wrong way around: the row indexer comes first and the column second, and loc takes [] rather than ().
mask_time = ~df['action'].str.contains('TIME') # same as df.action.str.contains('TIME')==False
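# regex=True is needed because newer pandas versions treat the pattern as a literal string by default
df.loc[mask_time,'action'] = df.loc[mask_time,'action'].str.replace(r'([^a-z0-9\._]{2,})', '', regex=True)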
example:
#dummy df
df = pd.DataFrame({'action': ['TIME 1', 'ABC 2']})
print (df)
   action
0  TIME 1
1   ABC 2
Result after applying the method above:
   action
0  TIME 1
1       2
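As a single runnable snippet, with the pattern written as a raw string and regex=True passed explicitly (an assumption about the pandas version in use; recent versions do not treat the pattern as a regex by default):

```python
import pandas as pd

df = pd.DataFrame({"action": ["TIME 1", "ABC 2"]})

# Rows that do NOT contain 'TIME' are the ones to clean up.
mask_time = ~df["action"].str.contains("TIME")

# Remove runs of two or more characters outside a-z, 0-9, '.' and '_'.
df.loc[mask_time, "action"] = df.loc[mask_time, "action"].str.replace(
    r"[^a-z0-9._]{2,}", "", regex=True
)

print(df)
```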