How to Write a Lambda Function That Is Conditional on Two Variables (Columns) in Python

Use Series.where:

df['dummyVar'] = df['x'].where((df['x'] > 100) & (df['y'] < 50), df['y'])

This will be much faster than performing an apply operation as it is vectorised.
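Here is a minimal self-contained sketch of the where approach. The small DataFrame is made up for illustration and is not from the original question. Series.where keeps x wherever the condition is True and takes y everywhere else:

```python
import pandas as pd

# Illustrative data (assumed): only row 0 satisfies x > 100 and y < 50
df = pd.DataFrame({'x': [150, 80, 200], 'y': [10, 30, 60]})

# Keep x where the condition holds, otherwise fall back to y
cond = (df['x'] > 100) & (df['y'] < 50)
df['dummyVar'] = df['x'].where(cond, df['y'])

print(df)
```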

Using lambda if condition on different columns in Pandas dataframe

Is that what you want?

In [300]: frame[['b','c']].apply(lambda x: x['c'] if x['c']>0 else x['b'], axis=1)
Out[300]:
0   -1.099891
1    0.582815
2    0.901591
3    0.900856
dtype: float64
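For this particular condition the row-wise apply can also be replaced by numpy.where. A sketch on a hypothetical frame (the numbers are made up, not the ones behind the output above) that checks both versions give the same values:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for frame
frame = pd.DataFrame({'b': [-1.5, 0.3, 2.0], 'c': [-0.2, 0.7, -3.0]})

# Row-wise version from the answer
via_apply = frame[['b', 'c']].apply(lambda x: x['c'] if x['c'] > 0 else x['b'], axis=1)

# Vectorised version: take c where c > 0, otherwise b
via_where = pd.Series(np.where(frame['c'] > 0, frame['c'], frame['b']), index=frame.index)

print(via_where)
```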

Issue with multiple conditionals within a lambda function being applied to multiple columns

Define a named function instead of trying to cram everything into a complex lambda.

There's no need to test x >= 2010 in the elif; if execution gets past the first if, that must already be true.

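def labelval(x, y):
    if x < 2010:
        return 1
    elif 26 <= y <= 40:
        return 2
    else:
        return 3

# axis=1 passes one row at a time; pull the two columns out of each row
CJ['label'] = CJ.apply(lambda row: labelval(row['WY'], row['WY Week']), axis=1)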
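On a large table the same three-way labelling can be vectorised with numpy.select, which returns the choice for the first condition that matches, just like the if/elif chain. A sketch with a made-up CJ frame (only the column names come from the question):

```python
import numpy as np
import pandas as pd

# Made-up sample; only the column names come from the question
CJ = pd.DataFrame({'WY': [2005, 2012, 2015, 2011],
                   'WY Week': [30, 30, 10, 45]})

conditions = [
    CJ['WY'] < 2010,                # first branch: label 1
    CJ['WY Week'].between(26, 40),  # only reached when WY >= 2010: label 2
]
# Rows matching neither condition get the default label 3
CJ['label'] = np.select(conditions, [1, 2], default=3)

print(CJ)
```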

Multiple conditionals in lambda function in pandas dataframe

You can use np.where on the whole column, without apply:

import numpy as np

ops['repair_location'] = np.where(
    ops['depot_name'] == 'Field', 'Field',
    np.where(ops['depot_name'] == 'Unknown Location', 'Unknown Location', 'Depot')
)
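Since two of the three outcomes simply copy the input label, the same column can also be built with Series.isin and Series.where. A sketch on a made-up ops frame (the depot names other than 'Field' and 'Unknown Location' are assumptions for illustration):

```python
import pandas as pd

# Made-up sample; 'North Depot' and 'South Depot' are hypothetical names
ops = pd.DataFrame({'depot_name': ['Field', 'North Depot', 'Unknown Location', 'South Depot']})

# Keep the value where it is one of the two special labels, otherwise use 'Depot'
keep = ops['depot_name'].isin(['Field', 'Unknown Location'])
ops['repair_location'] = ops['depot_name'].where(keep, 'Depot')

print(ops)
```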

pandas assign multiple columns with conditional lambda expression

Use a vectorized solution with numpy.where and numpy.select:

m1 = df.year <= 1985
m2 = df.sex == 'M'

a = np.where(m1, 'A', 'B')
b = np.select([m1 & m2, ~m1 & m2, m1 & ~m2], [1,2,3], default=4)

df = df.assign(cat_a = a, cat_b = b)
print (df)
     id sex  year cat_a  cat_b
0  3461   F  1983     A      3
1  8663   M  1988     B      2
2  6615   M  1986     B      2
3  5336   M  1982     A      1
4  3756   F  1984     A      3
5  8653   F  1989     B      4
6  9362   M  1985     A      1
7  3944   M  1981     A      1
8  3334   F  1986     B      4
9  6135   F  1988     B      4

Verify with a pure-Python solution using map:

a = list(map(lambda y: 'A' if y <= 1985 else 'B', df.year))
b = list(map(lambda s, y: 1 if s == 'M' and y <= 1985 else (2 if s == 'M' else (3 if y <= 1985 else 4)), df.sex, df.year))

df = df.assign(cat_a = a, cat_b = b)
print (df)
     id sex  year cat_a  cat_b
0  3461   F  1983     A      3
1  8663   M  1988     B      2
2  6615   M  1986     B      2
3  5336   M  1982     A      1
4  3756   F  1984     A      3
5  8653   F  1989     B      4
6  9362   M  1985     A      1
7  3944   M  1981     A      1
8  3334   F  1986     B      4
9  6135   F  1988     B      4
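To confirm the two approaches agree, here is a sketch that rebuilds the sample from the printed output (id, sex and year columns) and compares the new columns value by value. The map version uses y <= 1985 in every branch so that it matches the m1 mask:

```python
import numpy as np
import pandas as pd

# Sample rebuilt from the printed output
df = pd.DataFrame({
    'id': [3461, 8663, 6615, 5336, 3756, 8653, 9362, 3944, 3334, 6135],
    'sex': list('FMMMFFMMFF'),
    'year': [1983, 1988, 1986, 1982, 1984, 1989, 1985, 1981, 1986, 1988],
})

# Vectorised version
m1 = df.year <= 1985
m2 = df.sex == 'M'
vec_df = df.assign(cat_a=np.where(m1, 'A', 'B'),
                   cat_b=np.select([m1 & m2, ~m1 & m2, m1 & ~m2], [1, 2, 3], default=4))

# Pure-Python version with map
map_df = df.assign(
    cat_a=list(map(lambda y: 'A' if y <= 1985 else 'B', df.year)),
    cat_b=list(map(lambda s, y: 1 if s == 'M' and y <= 1985
                   else (2 if s == 'M' else (3 if y <= 1985 else 4)),
                   df.sex, df.year)),
)

# Compare values column by column (avoids dtype differences such as int32 vs int64)
same = all(vec_df[c].tolist() == map_df[c].tolist() for c in ['cat_a', 'cat_b'])
print(same)
```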

Performance is interesting: for small DataFrames (up to about 1k rows) mapping is faster, while for bigger DataFrames the numpy solution is better. The timings come from this perfplot benchmark:

import numpy as np
import pandas as pd
import perfplot

np.random.seed(999)

def mapping(df):
    a = list(map(lambda y: 'A' if y <= 1985 else 'B', df.year))
    b = list(map(lambda s, y: 1 if s == 'M' and y <= 1985 else (2 if s == 'M' else (3 if y <= 1985 else 4)), df.sex, df.year))

    return df.assign(cat_a = a, cat_b = b)

def vec(df):
    m1 = df.year <= 1985
    m2 = df.sex == 'M'
    a = np.where(m1, 'A', 'B')
    b = np.select([m1 & m2, ~m1 & m2, m1 & ~m2], [1,2,3], default=4)
    return df.assign(cat_a = a, cat_b = b)

def make_df(n):
    df = pd.DataFrame({'id': np.random.choice(range(10, 1000000), n, replace=False),
                       'sex': np.random.choice(list('MF'), n, replace=True),
                       'year': np.random.randint(1980, 1990, n)})
    return df

perfplot.show(
    setup=make_df,
    kernels=[mapping, vec],
    n_range=[2**k for k in range(2, 18)],
    logx=True,
    logy=True,
    equality_check=False,  # outputs contain string columns, which the default numeric check can't compare
    xlabel='len(df)')

Using Two Variables In Lambda Python

Try this:

import pandas as pd

def update_column(row):
    if (row['x'] >= .5 or row['y'] <= .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"

df['new_column'] = df.apply(update_column, axis=1)
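The same rule can be evaluated for the whole frame at once with boolean masks and numpy.where, swapping Python's or/and for the element-wise | and & operators. A sketch with made-up x and y values:

```python
import numpy as np
import pandas as pd

# Made-up sample values
df = pd.DataFrame({'x': [0.8, 0.2, 0.9, 0.1], 'y': [0.7, 0.3, 0.2, 0.6]})

# Element-wise version of update_column's condition
good = ((df['x'] >= .5) | (df['y'] <= .5)) & ((df['x'] < .5) | (df['y'] >= .5))
df['new_column'] = np.where(good, 'Good', 'Bad')

print(df)
```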

Using Apply in Pandas Lambda functions with multiple if statements

Here is a small example that you can build upon:

Basically, lambda x: ... is just a short, one-line anonymous function. What apply really needs is a function, and you can just as easily define one yourself with def.

import pandas as pd

# Recreate the dataframe
data = dict(Size=[80000,8000000,800000000])
df = pd.DataFrame(data)

# Create a function that returns desired values
# Only the upper bound needs checking: smaller values were already caught by an earlier branch
def func(x):
    if x < 1e6:
        return "<1m"
    elif x < 1e7:
        return "1-10m"
    elif x < 5e7:
        return "10-50m"
    # Add more elif statements here...
    else:
        return 'N/A'

df['Classification'] = df['Size'].apply(func)

print(df)

Returns:

        Size Classification
0      80000            <1m
1    8000000          1-10m
2  800000000            N/A
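For plain range bins like these, pandas.cut is a vectorised alternative to an if/elif function. A sketch assuming the same bin edges as func; right=False makes each bin include its lower edge and exclude its upper edge, which matches the x < ... tests. Note the result is a categorical column, so use astype(str) if you need plain strings:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Size': [80000, 8000000, 800000000]})

# Bins [-inf, 1e6), [1e6, 1e7), [1e7, 5e7), [5e7, inf) mirror func's branches
bins = [-np.inf, 1e6, 1e7, 5e7, np.inf]
labels = ['<1m', '1-10m', '10-50m', 'N/A']
df['Classification'] = pd.cut(df['Size'], bins=bins, labels=labels, right=False)

print(df)
```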

Python - Lambda function on multiple columns

You can use Series.mask with a boolean mask:

mask = (df['fruit_a'] == 'vegetable') | (df['fruit_b'] == 'vegetable')
print (mask)
0     True
1    False
2     True
3     True
4    False
dtype: bool


df.my_fruits = df.my_fruits.mask(mask, 'not_fruits')
print (df)
     fruit_a    fruit_b   my_fruits
0      apple  vegetable  not_fruits
1     banana      apple       fruit
2  vegetable  vegetable  not_fruits
3  vegetable  pineapple  not_fruits
4     cherry       pear       fruit

Another way to build the mask is to compare all the selected columns with 'vegetable' and then use any(axis=1) to flag rows with at least one True:

print ((df[['fruit_a', 'fruit_b']] == 'vegetable'))
   fruit_a  fruit_b
0    False     True
1    False    False
2     True     True
3     True    False
4    False    False

mask = (df[['fruit_a', 'fruit_b']] == 'vegetable').any(axis=1)
print (mask)
0     True
1    False
2     True
3     True
4    False
dtype: bool
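Putting the any-based mask together with Series.mask, here is a runnable sketch. The fruit columns are rebuilt from the output above; starting every row of my_fruits at 'fruit' is an assumption, since the original input is not shown:

```python
import pandas as pd

# fruit_a/fruit_b from the output above; the initial my_fruits value is assumed
df = pd.DataFrame({
    'fruit_a': ['apple', 'banana', 'vegetable', 'vegetable', 'cherry'],
    'fruit_b': ['vegetable', 'apple', 'vegetable', 'pineapple', 'pear'],
    'my_fruits': 'fruit',
})

# True where at least one of the two columns equals 'vegetable'
mask = (df[['fruit_a', 'fruit_b']] == 'vegetable').any(axis=1)
df['my_fruits'] = df['my_fruits'].mask(mask, 'not_fruits')

print(df)
```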

python pandas lambda with 2 and more variables

My guess is you want this:

df = pd.DataFrame({'A': [1,1,2,2,3,3],
'B': [2,2,2,3,3,3],
'TotalAmount': [10,20,30,40,50,60]})

df['NewColumn'] = df.groupby(['A', 'B'])['TotalAmount'].transform('sum')
df
#   A  B  TotalAmount  NewColumn
#0  1  2           10         30
#1  1  2           20         30
#2  2  2           30         30
#3  2  3           40         40
#4  3  3           50        110
#5  3  3           60        110
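If you do want a lambda here, transform also accepts a callable that receives each (A, B) group's TotalAmount Series. This sketch on the same sample data gives the same column; the built-in 'sum' string is typically faster on large frames:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [2, 2, 2, 3, 3, 3],
                   'TotalAmount': [10, 20, 30, 40, 50, 60]})

# The lambda is called once per (A, B) group and its result is broadcast back to the rows
df['NewColumn'] = df.groupby(['A', 'B'])['TotalAmount'].transform(lambda s: s.sum())

print(df)
```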

applying lambda row on multiple columns pandas

You need to use axis=1 to tell Pandas you want to apply a function to each row. The default is axis=0.

tp['col'] = tp.apply(lambda row: row['source'] if row['target'] in ['b', 'n'] else 'x',
                     axis=1)

However, for this specific task, you should use vectorised operations. For example, using numpy.where:

tp['col'] = np.where(tp['target'].isin(['b', 'n']), tp['source'], 'x')

pd.Series.isin returns a Boolean series which tells numpy.where whether to select the second or third argument.
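A small runnable sketch comparing the two versions. The tp frame is made up; only the column names source and target come from the answer:

```python
import numpy as np
import pandas as pd

# Made-up sample; only the column names come from the answer
tp = pd.DataFrame({'source': ['s1', 's2', 's3', 's4'],
                   'target': ['b', 'a', 'n', 'z']})

# Row-wise version: axis=1 passes each row to the lambda
tp['col_apply'] = tp.apply(lambda row: row['source'] if row['target'] in ['b', 'n'] else 'x',
                           axis=1)

# Vectorised version: isin builds the boolean mask that np.where selects on
tp['col'] = np.where(tp['target'].isin(['b', 'n']), tp['source'], 'x')

print(tp)
```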


