Replacing Few Values in a Pandas Dataframe Column with Another Value

Replacing few values in a pandas dataframe column with another value

The easiest way is to use the replace method on the column. The arguments are a list of the things you want to replace (here ['ABC', 'AB']) and what you want to replace them with (the string 'A' in this case):

>>> df['BrandName'].replace(['ABC', 'AB'], 'A')
0 A
1 B
2 A
3 D
4 A

This creates a new Series of values so you need to assign this new column to the correct column name:

df['BrandName'] = df['BrandName'].replace(['ABC', 'AB'], 'A')

Efficiently replace values from a column to another column Pandas DataFrame

Using np.where is faster. Using a similar pattern as you used with replace:

df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])

However, using a nested np.where is slightly faster:

df['col1'] = np.where(df['col1'] == 0, 
np.where(df['col2'] == 0, df['col3'], df['col2']),
df['col1'])

Timings

Using the following setup to produce a larger sample DataFrame and timing functions:

df = pd.concat([df]*10**4, ignore_index=True)

def root_nested(df):
df['col1'] = np.where(df['col1'] == 0, np.where(df['col2'] == 0, df['col3'], df['col2']), df['col1'])
return df

def root_split(df):
df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])
return df

def pir2(df):
df['col1'] = df.where(df.ne(0), np.nan).bfill(axis=1).col1.fillna(0)
return df

def pir2_2(df):
slc = (df.values != 0).argmax(axis=1)
return df.values[np.arange(slc.shape[0]), slc]

def andrew(df):
df.col1[df.col1 == 0] = df.col2
df.col1[df.col1 == 0] = df.col3
return df

def pablo(df):
df['col1'] = df['col1'].replace(0,df['col2'])
df['col1'] = df['col1'].replace(0,df['col3'])
return df

I get the following timings:

%timeit root_nested(df.copy())
100 loops, best of 3: 2.25 ms per loop

%timeit root_split(df.copy())
100 loops, best of 3: 2.62 ms per loop

%timeit pir2(df.copy())
100 loops, best of 3: 6.25 ms per loop

%timeit pir2_2(df.copy())
1 loop, best of 3: 2.4 ms per loop

%timeit andrew(df.copy())
100 loops, best of 3: 8.55 ms per loop

I tried timing your method, but it's been running for multiple minutes without completing. As a comparison, timing your method on just the 6 row example DataFrame (not the much larger one tested above) took 12.8 ms.

Pandas replace Values in a Column with values from another column but keep some values

IIUC:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['dog', 'cat', 'mouse', 'spider', 'fish', 'dog'],
'B': ['New York', 'London', np.nan, 'Berlin', np.nan,
'Paris']})

df.loc[(~df["A"].str.contains("dog"))&(df["B"].notnull()),"A"] = df["B"]

print (df)
#
A B
0 dog New York
1 London London
2 mouse NaN
3 Berlin Berlin
4 fish NaN
5 dog Paris

Replace values of certain column and rows in a DataFrame with the value of another column in the same dataframe with Pandas

This can make it work

df.loc[155:305,'categori'] = df['keterangan']

This will copy the values from keterangan to categori only on the index from 155 to 305

Replace values in column based on same or closer values from another columns pandas

First we find the value if df1['Score1'] that is the closest to each value in df2['Score1'], and put it into df2['match']:

df2['match'] = df2['Score1'].apply(lambda s : min(df1['Score1'].values, key = lambda x: abs(x-s)))

df2 now looks like this


Score1 life match
0 3.033986 0 2.29100
1 9.103820 0 9.10382
2 9.103820 0 9.10382
3 7.350981 0 9.10382
4 1.443400 0 2.29100
5 9.103820 0 9.10382
6 -1.134486 0 -1.34432

Now we just merge on match, drop unneeded columns and rename others

(df2[['match', 'Score1']].merge(df1, how = 'left', left_on = 'match', right_on = 'Score1', suffixes = ['','_2'])
.rename(columns = {'Avg_life':'life'})
.drop(columns = ['match', 'Score1_2'])
)

output


Score1 life
0 3.033986 432.0
1 9.103820 758.0
2 9.103820 758.0
3 7.350981 758.0
4 1.443400 432.0
5 9.103820 758.0
6 -1.134486 68000.0

Replace values from one column with another column Pandas DataFrame

Using just the method pandas.replace():

df.old_id = df.old_id.fillna(0).astype('int')

list_old = list(map(str, df.old_id.tolist()))
list_new = list(map(str, df.external_id.tolist()))

df['new_claim'] = df.claim.replace(to_replace=['Claim ID: ' + e for e in list_old], value=['Claim ID: ' + e for e in list_new], regex=True)
df['new_description'] = df.description.replace(to_replace=['\* ' + e + '\\n' for e in list_old], value=['* ' + e + '\\n' for e in list_new], regex=True)

Produces the following output:

Sample Image

Replacing values in pandas dataframe column with same row value from another column

One other way is to use np.where for numpy.where(condtion,yes,no)

In this case, I use nested np.where so that

np.where(If Flag=2,take val_2,(take x)) where takex is another np.where

df['Flag']=np.where(df['Flag']==1,df['val_1'],(np.where(df['Flag']==2,df['val_2'],df['Flag'])))
df

Output

Sample Image

Replace values in one dataframe with values from another dataframe

You can use update after replacing 0 with np.nan and setting a common index between the two dataframes.

Be wary of two things:

  1. Use overwrite=False to only fill the null values
  2. update modifies inplace
common_index = ['Region','Product']
df_indexed = df.replace(0,np.nan).set_index(common_index)
df2_indexed = df2.set_index(common_index)

df_indexed.update(df2_indexed,overwrite=False)


print(df_indexed.reset_index())

Region Product Country Quantity Price
0 Africa ABC South Africa 500.0 1200.0
1 Africa DEF South Africa 200.0 400.0
2 Africa XYZ South Africa 110.0 300.0
3 Africa DEF Nigeria 150.0 450.0
4 Africa XYZ Nigeria 200.0 750.0
5 Asia XYZ Japan 100.0 500.0
6 Asia ABC Japan 200.0 500.0
7 Asia DEF Japan 120.0 300.0
8 Asia XYZ India 250.0 600.0
9 Asia ABC India 100.0 400.0
10 Asia DEF India 40.0 220.0

Change one value based on another value in pandas

One option is to use Python's slicing and indexing features to logically evaluate the places where your condition holds and overwrite the data there.

Assuming you can load your data directly into pandas with pandas.read_csv then the following code might be helpful for you.

import pandas
df = pandas.read_csv("test.csv")
df.loc[df.ID == 103, 'FirstName'] = "Matt"
df.loc[df.ID == 103, 'LastName'] = "Jones"

As mentioned in the comments, you can also do the assignment to both columns in one shot:

df.loc[df.ID == 103, ['FirstName', 'LastName']] = 'Matt', 'Jones'

Note that you'll need pandas version 0.11 or newer to make use of loc for overwrite assignment operations. Indeed, for older versions like 0.8 (despite what critics of chained assignment may say), chained assignment is the correct way to do it, hence why it's useful to know about even if it should be avoided in more modern versions of pandas.


Another way to do it is to use what is called chained assignment. The behavior of this is less stable and so it is not considered the best solution (it is explicitly discouraged in the docs), but it is useful to know about:

import pandas
df = pandas.read_csv("test.csv")
df['FirstName'][df.ID == 103] = "Matt"
df['LastName'][df.ID == 103] = "Jones"


Related Topics



Leave a reply



Submit