Pandas DataFrame: replace all values in a column, based on condition
You need to select that column:
In [41]:
df.loc[df['First Season'] > 1990, 'First Season'] = 1
df
Out[41]:
                 Team  First Season  Total Games
0      Dallas Cowboys          1960          894
1       Chicago Bears          1920         1357
2   Green Bay Packers          1921         1339
3      Miami Dolphins          1966          792
4    Baltimore Ravens             1          326
5  San Franciso 49ers          1950         1003
So the syntax here is:
df.loc[<mask>, <optional column(s)>]
Here the mask is a boolean Series that selects the row labels to index. You can check the docs, and also "10 minutes to pandas", which show the semantics.
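The difference between selecting one column and several can be seen in a minimal sketch. The three-row frame is hypothetical, loosely modelled on the table above, and the names one_col and two_cols are only for illustration:

```python
import pandas as pd

# Hypothetical data, loosely modelled on the table above
df = pd.DataFrame({
    "Team": ["Dallas Cowboys", "Chicago Bears", "Baltimore Ravens"],
    "First Season": [1960, 1920, 1996],
    "Total Games": [894, 1357, 326],
})

# Boolean Series: True for the rows that should change
mask = df["First Season"] > 1990

# Mask plus a single column label: only that column is modified
one_col = df.copy()
one_col.loc[mask, "First Season"] = 1

# Mask plus a list of labels: several columns are modified at once
two_cols = df.copy()
two_cols.loc[mask, ["First Season", "Total Games"]] = 0
```

Working on copies keeps the original df intact, so the two variants can be compared side by side.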
EDIT
If you want to generate a boolean indicator, you can use the boolean condition to generate a boolean Series and cast its dtype to int. This will convert True and False to 1 and 0 respectively:
In [43]:
df['First Season'] = (df['First Season'] > 1990).astype(int)
df
Out[43]:
                 Team  First Season  Total Games
0      Dallas Cowboys             0          894
1       Chicago Bears             0         1357
2   Green Bay Packers             0         1339
3      Miami Dolphins             0          792
4    Baltimore Ravens             1          326
5  San Franciso 49ers             0         1003
Pandas data frame replace values in column based on condition
The problem with your code is that df is changing to type string during the process.
pandas has a method for this use case, DataFrame.where(), which keeps the values where the condition is True and replaces the rest:
df = df.where(df['col'].isin(['a', 'b']),'other')
A similar result, avoiding where():
df[~df['col'].isin(['a', 'b'])] = 'other'
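Note that both snippets above write 'other' into every column of the non-matching rows. If only the one column should change, a column-level variant might look like the sketch below. The frame and the extra column n are hypothetical:

```python
import pandas as pd

# Hypothetical frame with a second column that should stay untouched
df = pd.DataFrame({"col": ["a", "b", "c", "d"], "n": [1, 2, 3, 4]})

# True for the values to keep
keep = df["col"].isin(["a", "b"])

# Series.where keeps values where the condition is True and
# substitutes 'other' everywhere else, in this column only
df["col"] = df["col"].where(keep, "other")
```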
Replacing values in a pandas dataframe based on multiple conditions
In general, you could use np.select on the values and rebuild the DataFrame:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(10*np.random.randn(10, 3))
df1.iloc[0, 0] = 0 # So we can check the == 0 condition
conds = [df1.values < 0 , df1.values > 0]
choices = ['down', 'up']
pd.DataFrame(np.select(conds, choices, default='zero'),
             index=df1.index,
             columns=df1.columns)
Output (the input is random, so the values differ between runs):
      0     1     2
0  zero  down    up
1    up  down    up
2    up    up    up
3  down  down  down
4    up    up    up
5    up    up    up
6    up    up  down
7    up    up  down
8  down    up  down
9    up    up  down
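The same idea works on a single column, which is probably the more common case. Here is a deterministic sketch; the column names change and direction are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column
df = pd.DataFrame({"change": [-2.5, 0.0, 3.1, 0.4]})

# Conditions are checked in order; the first one that is True wins
conds = [df["change"] < 0, df["change"] > 0]
choices = ["down", "up"]

# Rows matching no condition receive the default
df["direction"] = np.select(conds, choices, default="zero")
```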
Replace value in a pandas data frame column based on a condition
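Let us fix your code. The test y != y is True only when y is NaN, so missing names are replaced by the first three words of Text:
df['new'] = [' '.join(x.split()[:3]) if y != y else y for x, y in zip(df['Text'], df['Name'])]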
Out[599]: ['Paul', 'ee ff gg', 'xx yy', 'Anton']
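A vectorized alternative to the list comprehension could use Series.fillna. This is a sketch only: the original frame is not shown, so the data below is hypothetical and was chosen to reproduce the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen to match the output shown above
df = pd.DataFrame({
    "Text": ["aa bb cc dd", "ee ff gg hh", "xx yy", "zz"],
    "Name": ["Paul", np.nan, np.nan, "Anton"],
})

# First three whitespace-separated words of each Text value
first_three = df["Text"].str.split().str[:3].str.join(" ")

# Keep Name where present, otherwise fall back to the shortened Text
df["new"] = df["Name"].fillna(first_three)
```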
Conditional Replace Pandas
The .ix indexer works for pandas versions prior to 0.20.0, but it has been deprecated since pandas 0.20.0 (and was removed in pandas 1.0), so you should avoid using it. Instead, you can use the .loc or .iloc indexers. You can solve this problem with:
mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0
Or, in one line,
df.loc[df.my_channel > 20000, 'my_channel'] = 0
mask selects the rows in which df.my_channel > 20000 is True, while df.loc[mask, column_name] = 0 sets the value 0 in the column named column_name for the rows where mask holds.
Update:
In this case, you should use loc, because if you use iloc you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.
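If positional indexing is unavoidable, one possible workaround is to hand iloc a plain NumPy boolean array instead of a boolean Series. The frame below is hypothetical, and the column other exists only to show that it is left alone:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"my_channel": [10, 25000, 30000], "other": [1, 2, 3]})

mask = df["my_channel"] > 20000

# iloc needs an integer position for the column, not its label
col_pos = df.columns.get_loc("my_channel")

# .to_numpy() drops the index, leaving a purely positional boolean mask
df.iloc[mask.to_numpy(), col_pos] = 0
```

For label-based frames like this one, df.loc[mask, 'my_channel'] = 0 remains the simpler choice.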
Replace value in pandas dataframe based on where condition
You can use np.where or Series.mask:
df['Feature1'] = df['Feature1'].mask(df['Feature1'].eq(-9999999), df['Age'])
# or
df['Feature1'] = np.where(df['Feature1'].eq(-9999999), df['Age'], df['Feature1'])
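To see that the two forms agree, here is a small self-contained sketch. The data and the names SENTINEL, via_mask and via_where are assumptions for illustration:

```python
import numpy as np
import pandas as pd

SENTINEL = -9999999  # placeholder value taken from the snippet above

# Hypothetical data
df = pd.DataFrame({"Feature1": [5, SENTINEL, 7], "Age": [30, 41, 52]})

is_missing = df["Feature1"].eq(SENTINEL)

# Series.mask replaces values where the condition is True
via_mask = df["Feature1"].mask(is_missing, df["Age"])

# np.where(cond, a, b) picks a where cond is True, else b
via_where = np.where(is_missing, df["Age"], df["Feature1"])
```

Series.mask returns a Series that keeps the index, while np.where returns a plain NumPy array.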
Python: replace multiple column values based on values present in other columns
You can use boolean indexing:
c = ['code3', 'code4']
df.loc[df.drop(c, axis=1).eq(1).any(axis=1), c] = 0
    id  code1  code3  code4  code 5  code..n
0  ABC      1      0      0       1        1
1  CDE      0      0      0       0        1
2  EFG      1      0      0       0        1
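The one-liner can be unpacked into steps. The sketch below uses a hypothetical frame with id as the index, so that only numeric columns are compared, and the name has_one_elsewhere is an assumption:

```python
import pandas as pd

# Hypothetical data; 'id' is the index so only numeric columns are compared
df = pd.DataFrame(
    {
        "code1": [1, 0, 1],
        "code2": [0, 0, 0],
        "code3": [1, 1, 0],
        "code4": [0, 1, 1],
    },
    index=pd.Index(["ABC", "CDE", "EFG"], name="id"),
)

c = ["code3", "code4"]

# Step 1: look only at the columns that are not being overwritten
others = df.drop(columns=c)

# Step 2: True for rows where any of those columns equals 1
has_one_elsewhere = others.eq(1).any(axis=1)

# Step 3: zero out the target columns in those rows
df.loc[has_one_elsewhere, c] = 0
```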
Replace value in column based on a condition
np.where is your friend: https://numpy.org/doc/stable/reference/generated/numpy.where.html
df['Rate'] = np.where((df['Rate'] == 0) & (df['Imputed_rate'] < df['min_rate']), df['min_rate'], df['Rate'])
It is basically a vectorized if-then-else applied to every row.
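A worked example makes the if-then-else reading concrete. The numbers are hypothetical, and the condition is pulled out into a variable named cond for readability:

```python
import numpy as np
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "Rate": [0.0, 0.0, 2.5],
    "Imputed_rate": [0.5, 3.0, 1.0],
    "min_rate": [1.0, 1.0, 1.0],
})

# "if": Rate is zero AND the imputed rate is below the minimum
cond = (df["Rate"] == 0) & (df["Imputed_rate"] < df["min_rate"])

# "then": take min_rate, "else": keep the existing Rate
df["Rate"] = np.where(cond, df["min_rate"], df["Rate"])
```

Only the first row satisfies both parts of the condition, so it alone is replaced by min_rate.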
iterating over row and column and replace values based on condition
Using DataFrame.applymap is pretty slow on a big data set; it doesn't scale well. You should always look for a vectorized solution if possible.
In this case, you can mask the values between 10 and 100 and perform the conditional replacement using DataFrame.mask (or DataFrame.where if you negate the condition).
# select the numeric columns
num_cols = df1.select_dtypes(include="number").columns
# In DataFrame.mask, the lambda argument `df` is the calling DataFrame,
# in this case df1[num_cols]
df1[num_cols] = (
    df1[num_cols].mask(lambda df: (df > 10) & (df < 100),
                       lambda df: df // 10)
)
Output:
>>> df1
    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
Setup:
time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6, 7],
          'n3': [3, 70, 6],
          'n4': [40, 8, 500],
          }
df1 = pd.DataFrame(data=data_1)
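The DataFrame.where alternative mentioned above might look like the following sketch, which reuses the setup data. The names nums and in_range are assumptions. Since where keeps values where the condition is True, the range test is negated:

```python
import pandas as pd

# Same setup data as above
df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

num_cols = df1.select_dtypes(include="number").columns
nums = df1[num_cols]

# True for the values strictly between 10 and 100
in_range = (nums > 10) & (nums < 100)

# where() keeps values where the condition holds, so negate it:
# keep everything outside the range, floor-divide the rest by 10
df1[num_cols] = nums.where(~in_range, nums // 10)
```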