Pandas Dataframe: Replace All Values in a Column, Based on Condition

Pandas DataFrame: replace all values in a column, based on condition

You need to select that column:

In [41]:
df.loc[df['First Season'] > 1990, 'First Season'] = 1
df

Out[41]:
                 Team  First Season  Total Games
0      Dallas Cowboys          1960          894
1       Chicago Bears          1920         1357
2   Green Bay Packers          1921         1339
3      Miami Dolphins          1966          792
4    Baltimore Ravens             1          326
5  San Franciso 49ers          1950         1003

So the syntax here is:

df.loc[<mask>, <optional column(s)>]

where <mask> is a boolean condition that generates the row labels to index.

You can check the docs, and also the "10 Minutes to pandas" guide, which shows the semantics.
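
For reference, here is a self-contained sketch that rebuilds the sample data and applies the same line (the Baltimore Ravens' pre-replacement year of 1996 is an assumption, since only the post-replacement value is shown above):

import pandas as pd

# Rebuild the sample data from the output above
df = pd.DataFrame({
    'Team': ['Dallas Cowboys', 'Chicago Bears', 'Green Bay Packers',
             'Miami Dolphins', 'Baltimore Ravens', 'San Franciso 49ers'],
    'First Season': [1960, 1920, 1921, 1966, 1996, 1950],  # 1996 is assumed
    'Total Games': [894, 1357, 1339, 792, 326, 1003],
})

# The boolean mask selects the rows; 'First Season' restricts the assignment to that column
df.loc[df['First Season'] > 1990, 'First Season'] = 1
print(df)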

EDIT

If you want to generate a boolean indicator instead, you can use the boolean condition to generate a boolean Series and cast the dtype to int; this converts True and False to 1 and 0 respectively:

In [43]:
df['First Season'] = (df['First Season'] > 1990).astype(int)
df

Out[43]:
                 Team  First Season  Total Games
0      Dallas Cowboys             0          894
1       Chicago Bears             0         1357
2   Green Bay Packers             0         1339
3      Miami Dolphins             0          792
4    Baltimore Ravens             1          326
5  San Franciso 49ers             0         1003

Pandas data frame replace values in column based on condition

The problem with your code is that df changes to the string type during the process.

There is a pandas method for this use case: DataFrame.where().

df = df.where(df['col'].isin(['a', 'b']), 'other')

Similar result avoiding where():

df[~df['col'].isin(['a', 'b'])] = 'other'
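
If you only want to change that single column and leave the rest of each row intact, a column-level variant with Series.where works too. A minimal sketch, assuming a made-up 'col' column:

import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'col': ['a', 'b', 'c', 'd'], 'val': [1, 2, 3, 4]})

# Keep 'a' and 'b'; replace everything else in that column with 'other'
df['col'] = df['col'].where(df['col'].isin(['a', 'b']), 'other')
print(df['col'].tolist())  # ['a', 'b', 'other', 'other']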

Replacing values in a pandas dataframe based on multiple conditions

In general, you could use np.select on the values and rebuild the DataFrame:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(10*np.random.randn(10, 3))
df1.iloc[0, 0] = 0  # so the default 'zero' case (neither < 0 nor > 0) shows up

conds = [df1.values < 0, df1.values > 0]
choices = ['down', 'up']

pd.DataFrame(np.select(conds, choices, default='zero'),
             index=df1.index,
             columns=df1.columns)

Output:

      0     1     2
0  zero  down    up
1    up  down    up
2    up    up    up
3  down  down  down
4    up    up    up
5    up    up    up
6    up    up  down
7    up    up  down
8  down    up  down
9    up    up  down

Replace value in a pandas data frame column based on a condition

Let us fix your code:

In [599]:
df['new'] = [' '.join(x.split()[:3]) if y != y else y
             for x, y in zip(df['Text'], df['Name'])]
df['new'].tolist()

Out[599]: ['Paul', 'ee ff gg', 'xx yy', 'Anton']
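
The y != y comparison is just a NaN check (NaN is the only value that is not equal to itself). A minimal sketch with made-up 'Text' and 'Name' columns that reproduces the output above:

import numpy as np
import pandas as pd

# Hypothetical data: a missing Name means "fall back to the first 3 words of Text"
df = pd.DataFrame({
    'Text': ['aa bb cc dd', 'ee ff gg hh', 'xx yy', 'zz'],
    'Name': ['Paul', np.nan, np.nan, 'Anton'],
})

# y != y is only True for NaN, so NaN names are replaced by the truncated text
df['new'] = [' '.join(x.split()[:3]) if y != y else y
             for x, y in zip(df['Text'], df['Name'])]
print(df['new'].tolist())  # ['Paul', 'ee ff gg', 'xx yy', 'Anton']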

Conditional Replace Pandas

The .ix indexer worked for pandas versions prior to 0.20.0, but since pandas 0.20.0 it is deprecated, so you should avoid using it. Instead, use the .loc or .iloc indexers. You can solve this problem with:

mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0

Or, in one line,

df.loc[df.my_channel > 20000, 'my_channel'] = 0

mask selects the rows in which df.my_channel > 20000 is True, and df.loc[mask, column_name] = 0 sets the value 0 in the column named column_name for those selected rows.

Update:
In this case, you should use loc because if you use iloc, you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.
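
A runnable sketch of the same pattern, with made-up my_channel values:

import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'my_channel': [15000, 25000, 30000, 5000]})

mask = df.my_channel > 20000
df.loc[mask, 'my_channel'] = 0
print(df['my_channel'].tolist())  # [15000, 0, 0, 5000]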

Replace value in pandas dataframe based on where condition

You can use np.where or Series.mask:

df['Feature1'] = df['Feature1'].mask(df['Feature1'].eq(-9999999), df['Age'])
# or
df['Feature1'] = np.where(df['Feature1'].eq(-9999999), df['Age'], df['Feature1'])
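
For example, with made-up Feature1 and Age columns, both lines give the same result:

import numpy as np
import pandas as pd

# Hypothetical data: -9999999 marks a missing Feature1 value
df = pd.DataFrame({'Feature1': [10, -9999999, 30], 'Age': [21, 42, 63]})

via_mask = df['Feature1'].mask(df['Feature1'].eq(-9999999), df['Age'])
via_np = np.where(df['Feature1'].eq(-9999999), df['Age'], df['Feature1'])

print(via_mask.tolist())  # [10, 42, 30]
print(list(via_np))       # [10, 42, 30]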

Python: replace multiple column values based on values present in other columns

You can use boolean indexing:

c = ['code3', 'code4']
df.loc[df.drop(c, axis=1).eq(1).any(axis=1), c] = 0


    id  code1  code3  code4  code 5  code..n
0  ABC      1      0      0       1        1
1  CDE      0      0      0       0        1
2  EFG      1      0      0       0        1
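
A self-contained sketch of the same idea; the starting values here are invented, and only the column names echo the output above:

import pandas as pd

# Hypothetical indicator data
df = pd.DataFrame({
    'id': ['ABC', 'CDE', 'EFG'],
    'code1': [1, 0, 1],
    'code3': [1, 1, 1],
    'code4': [1, 1, 1],
    'code 5': [1, 0, 0],
})

c = ['code3', 'code4']
# Zero out code3/code4 wherever any of the remaining columns equals 1
df.loc[df.drop(c, axis=1).eq(1).any(axis=1), c] = 0
print(df)
#     id  code1  code3  code4  code 5
# 0  ABC      1      0      0       1
# 1  CDE      0      1      1       0
# 2  EFG      1      0      0       0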

Replace value in column based on a condition

np.where is your friend: https://numpy.org/doc/stable/reference/generated/numpy.where.html

df['Rate'] = np.where((df['Rate'] == 0) & (df['Imputed_rate'] < df['min_rate']),
                      df['min_rate'], df['Rate'])

It is basically an if-then-else over rows in pandas.
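
A minimal runnable sketch with made-up Rate, Imputed_rate and min_rate values:

import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'Rate': [0, 0, 5],
    'Imputed_rate': [2, 8, 4],
    'min_rate': [3, 6, 3],
})

# Where Rate is 0 and the imputed rate is below the minimum, fall back to min_rate
df['Rate'] = np.where((df['Rate'] == 0) & (df['Imputed_rate'] < df['min_rate']),
                      df['min_rate'], df['Rate'])
print(df['Rate'].tolist())  # [3, 0, 5]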

Iterating over rows and columns and replacing values based on a condition

Using DataFrame.applymap is pretty slow when working with a big data set; it doesn't scale well. You should always look for a vectorized solution if possible.

In this case, you can mask the values between 10 and 100 and perform the conditional replacement using DataFrame.mask (or DataFrame.where if you negate the condition).

# select the numeric columns
num_cols = df1.select_dtypes(include="number").columns

# In DataFrame.mask `df` is replaced by the calling DataFrame,
# in this case df = df1[num_cols]
df1[num_cols] = (
    df1[num_cols].mask(lambda df: (df > 10) & (df < 100),
                       lambda df: df // 10)
)

Output:

>>> df1

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500

Setup:

import pandas as pd

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6, 7],
          'n3': [3, 70, 6],
          'n4': [40, 8, 500],
          }

df1 = pd.DataFrame(data=data_1)
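
Starting from this df1, the negated-condition form with DataFrame.where mentioned above would look roughly like this sketch:

# select the numeric columns again
num_cols = df1.select_dtypes(include="number").columns

# DataFrame.where keeps values where the condition is True,
# so negate the (10, 100) range check and floor-divide the rest by 10
df1[num_cols] = (
    df1[num_cols].where(lambda df: ~((df > 10) & (df < 100)),
                        lambda df: df // 10)
)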

