How to write a lambda function that is conditional on two variables (columns) in python
Use Series.where:
df['dummyVar'] = df['x'].where((df['x'] > 100) & (df['y'] < 50), df['y'])
This will be much faster than performing an apply operation as it is vectorised.
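A minimal sketch of the same idea on a small made-up frame (the values are hypothetical; only the column names x and y come from the answer), comparing the vectorised result with a row-wise apply:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({'x': [150, 90, 200, 120], 'y': [10, 20, 70, 49]})

# Keep x where both conditions hold, otherwise fall back to y.
cond = (df['x'] > 100) & (df['y'] < 50)
df['dummyVar'] = df['x'].where(cond, df['y'])

# Row-wise equivalent, shown only for comparison (slower on large frames).
via_apply = df.apply(
    lambda r: r['x'] if r['x'] > 100 and r['y'] < 50 else r['y'], axis=1
)

print(df)
```

Series.where keeps the original value where the condition is True and takes the replacement where it is False, which is why the "else" value goes in the second argument.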
Using lambda if condition on different columns in Pandas dataframe
Is this what you want?
In [300]: frame[['b','c']].apply(lambda x: x['c'] if x['c']>0 else x['b'], axis=1)
Out[300]:
0 -1.099891
1 0.582815
2 0.901591
3 0.900856
dtype: float64
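The original frame is not shown, so the sketch below assumes a small hypothetical one with columns b and c. It suggests that the same result can probably be had without apply, using Series.where:

```python
import pandas as pd

# Hypothetical data; the real 'frame' from the question is not available.
frame = pd.DataFrame({'b': [-1.1, 0.5, 0.9, 0.9],
                      'c': [-0.3, 0.6, -0.2, 1.4]})

# Row-wise version from the answer: take c when it is positive, else b.
row_wise = frame[['b', 'c']].apply(
    lambda x: x['c'] if x['c'] > 0 else x['b'], axis=1
)

# Vectorised version: keep c where c > 0, otherwise use b.
vectorised = frame['c'].where(frame['c'] > 0, frame['b'])

print(vectorised)
```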
Issue with multiple conditionals within a lambda function being applied to multiple columns
Define a named function instead of trying to cram everything into a complex lambda. There's no need to test x >= 2010 in the else; if it gets to the else, that must be true.
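def labelval(x, y):
    if x < 2010:
        return 1
    elif 26 <= y <= 40:
        return 2
    else:
        return 3
# Apply row by row (axis=1) so that each row's two values are passed as x and y
CJ['label'] = CJ.apply(lambda row: labelval(row['WY'], row['WY Week']), axis=1)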
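As a hedged illustration, the same labelling can be run on a hypothetical CJ frame (the values are invented; only the column names come from the question), alongside a vectorised np.select variant that is not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; only the column names 'WY' and 'WY Week' are from the question.
CJ = pd.DataFrame({'WY': [2009, 2010, 2011, 2012],
                   'WY Week': [30, 30, 41, 26]})

def labelval(x, y):
    if x < 2010:
        return 1
    elif 26 <= y <= 40:
        return 2
    else:
        return 3

# Row-wise: pass the two values of each row to the named function.
CJ['label'] = CJ.apply(lambda row: labelval(row['WY'], row['WY Week']), axis=1)

# Vectorised alternative: conditions are checked in order, first match wins.
CJ['label_vec'] = np.select(
    [CJ['WY'] < 2010, CJ['WY Week'].between(26, 40)],
    [1, 2],
    default=3,
)

print(CJ)
```

np.select mirrors the if/elif/else chain because the first matching condition determines the value and default plays the role of the final else.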
Multiple conditionals in lambda function in pandas dataframe
You can use np.where directly on the column, without apply:
import numpy as np
ops['repair_location'] = np.where(
    ops['depot_name'] == 'Field', 'Field',
    np.where(ops['depot_name'] == 'Unknown Location', 'Unknown Location', 'Depot')
)
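Since both special values are kept unchanged, the nested conditions can arguably be collapsed into a single isin test. A small sketch, assuming a hypothetical ops frame (the depot names other than 'Field' and 'Unknown Location' are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the answer.
ops = pd.DataFrame(
    {'depot_name': ['Field', 'Unknown Location', 'Leeds', 'Field', 'York']}
)

# Keep the name for the two special values, label everything else 'Depot'.
keep = ops['depot_name'].isin(['Field', 'Unknown Location'])
ops['repair_location'] = np.where(keep, ops['depot_name'], 'Depot')

print(ops)
```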
pandas assign multiple columns with conditional lambda expression
Use a vectorized solution with numpy.where and numpy.select:
m1 = df.year <= 1985
m2 = df.sex == 'M'
a = np.where(m1, 'A', 'B')
b = np.select([m1 & m2, ~m1 & m2, m1 & ~m2], [1,2,3], default=4)
df = df.assign(cat_a = a, cat_b = b)
print (df)
     id sex  year cat_a  cat_b
0  3461   F  1983     A      3
1  8663   M  1988     B      2
2  6615   M  1986     B      2
3  5336   M  1982     A      1
4  3756   F  1984     A      3
5  8653   F  1989     B      4
6  9362   M  1985     A      1
7  3944   M  1981     A      1
8  3334   F  1986     B      4
9  6135   F  1988     B      4
Verify:
a = list(map(lambda y: 'A' if y <= 1985 else 'B', df.year))
b = list(map(lambda s, y: 1 if s == 'M' and y <= 1985 else (2 if s == 'M' else (3 if y <= 1985 else 4)), df.sex, df.year))
df = df.assign(cat_a = a, cat_b = b)
print (df)
     id sex  year cat_a  cat_b
0  3461   F  1983     A      3
1  8663   M  1988     B      2
2  6615   M  1986     B      2
3  5336   M  1982     A      1
4  3756   F  1984     A      3
5  8653   F  1989     B      4
6  9362   M  1985     A      1
7  3944   M  1981     A      1
8  3334   F  1986     B      4
9  6135   F  1988     B      4
Performance is interesting: for small DataFrames (up to about 1k rows) mapping is faster, while for bigger DataFrames the numpy solution is better:
import numpy as np
import pandas as pd
import perfplot

np.random.seed(999)

def mapping(df):
    a = list(map(lambda y: 'A' if y <= 1985 else 'B', df.year))
    b = list(map(lambda s, y: 1 if s == 'M' and y <= 1985 else (2 if s == 'M' else (3 if y <= 1985 else 4)), df.sex, df.year))
    return df.assign(cat_a = a, cat_b = b)

def vec(df):
    m1 = df.year <= 1985
    m2 = df.sex == 'M'
    a = np.where(m1, 'A', 'B')
    b = np.select([m1 & m2, ~m1 & m2, m1 & ~m2], [1,2,3], default=4)
    return df.assign(cat_a = a, cat_b = b)

def make_df(n):
    df = pd.DataFrame({'id': np.random.choice(range(10, 1000000), n, replace=False),
                       'sex': np.random.choice(list('MF'), n, replace=True),
                       'year': np.random.randint(1980, 1990, n)})
    return df

perfplot.show(
    setup=make_df,
    kernels=[mapping, vec],
    n_range=[2**k for k in range(2, 18)],
    logx=True,
    logy=True,
    equality_check=False,  # rows may appear in different order
    xlabel='len(df)')
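If perfplot is not installed, a rough comparison can be sketched with the standard library's timeit instead. This is only an approximation of the benchmark above: the generator below is a hypothetical stand-in (ids are not guaranteed unique), both functions use y <= 1985 so that they agree on every row, and the timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

def mapping(df):
    a = list(map(lambda y: 'A' if y <= 1985 else 'B', df.year))
    b = list(map(lambda s, y: 1 if s == 'M' and y <= 1985
                 else (2 if s == 'M' else (3 if y <= 1985 else 4)),
                 df.sex, df.year))
    return df.assign(cat_a=a, cat_b=b)

def vec(df):
    m1 = df.year <= 1985
    m2 = df.sex == 'M'
    a = np.where(m1, 'A', 'B')
    b = np.select([m1 & m2, ~m1 & m2, m1 & ~m2], [1, 2, 3], default=4)
    return df.assign(cat_a=a, cat_b=b)

def make_df(n, seed=999):
    # Hypothetical generator for illustration; not the one used in the answer.
    rng = np.random.default_rng(seed)
    return pd.DataFrame({'id': rng.integers(10, 1_000_000, n),
                         'sex': rng.choice(['M', 'F'], n),
                         'year': rng.integers(1980, 1990, n)})

# Time both approaches on a small and a larger frame.
for n in (100, 20_000):
    df = make_df(n)
    t_map = timeit.timeit(lambda: mapping(df), number=5)
    t_vec = timeit.timeit(lambda: vec(df), number=5)
    print(f'n={n}: mapping {t_map:.4f}s, vec {t_vec:.4f}s')
```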
Using Two Variables In Lambda Python
Try this:
import pandas as pd
def update_column(row):
    if (row['x'] >= .5 or row['y'] <= .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"
df['new_column'] = df.apply(update_column, axis=1)
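For larger frames the same condition can likely be written with element-wise operators instead of a row-wise function. A sketch on hypothetical data (the x and y values are invented), replacing or/and with | and &:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'x': [0.7, 0.2, 0.7, 0.2], 'y': [0.9, 0.1, 0.1, 0.9]})

def update_column(row):
    if (row['x'] >= .5 or row['y'] <= .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"

# Row-wise version, as in the answer.
df['new_column'] = df.apply(update_column, axis=1)

# Vectorised version: | and & operate on whole columns at once.
x, y = df['x'], df['y']
good = ((x >= .5) | (y <= .5)) & ((x < .5) | (y >= .5))
df['new_column_vec'] = np.where(good, 'Good', 'Bad')

print(df)
```

The parentheses around each comparison are required, because | and & bind more tightly than >= and <.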
Using Apply in Pandas Lambda functions with multiple if statements
Here is a small example that you can build upon:
Basically, lambda x: x.. is the short one-liner form of a function. What apply really asks for is a function, which you can easily write yourself.
import pandas as pd
# Recreate the dataframe
data = dict(Size=[80000,8000000,800000000])
df = pd.DataFrame(data)
# Create a function that returns desired values
# You only need to check upper bound as the next elif-statement will catch the value
def func(x):
    if x < 1e6:
        return "<1m"
    elif x < 1e7:
        return "1-10m"
    elif x < 5e7:
        return "10-50m"
    else:
        return 'N/A'
    # Add elif statements....
df['Classification'] = df['Size'].apply(func)
print(df)
Returns:
        Size Classification
0      80000            <1m
1    8000000          1-10m
2  800000000            N/A
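Binning like this can also be expressed with pandas.cut, which is not what the answer uses but may be worth knowing. A sketch under the assumption that the bin edges match the thresholds in func, with right=False so that each bin includes its lower edge and excludes its upper edge (matching the x < ... tests):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Size': [80000, 8000000, 800000000]})

# Bin edges mirror the thresholds in func; the last bin catches everything else.
bins = [-np.inf, 1e6, 1e7, 5e7, np.inf]
labels = ['<1m', '1-10m', '10-50m', 'N/A']

# right=False makes intervals closed on the left: [edge, next_edge).
df['Classification'] = pd.cut(df['Size'], bins=bins, labels=labels, right=False)

print(df)
```

Note that pd.cut returns a categorical column rather than plain strings, which may or may not be what you want downstream.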
Python - Lambda function on multiple columns
You can use Series.mask with a boolean mask:
mask = (df['fruit_a'] == 'vegetable') | (df['fruit_b'] == 'vegetable')
print (mask)
0 True
1 False
2 True
3 True
4 False
dtype: bool
df['my_fruits'] = df['my_fruits'].mask(mask, 'not_fruits')
print (df)
     fruit_a    fruit_b   my_fruits
0      apple  vegetable  not_fruits
1     banana      apple       fruit
2  vegetable  vegetable  not_fruits
3  vegetable  pineapple  not_fruits
4     cherry       pear       fruit
Another way to build the mask is to compare all selected columns with 'vegetable' and then use any to check for at least one True per row:
print ((df[['fruit_a', 'fruit_b']] == 'vegetable'))
   fruit_a  fruit_b
0    False     True
1    False    False
2     True     True
3     True    False
4    False    False
mask = (df[['fruit_a', 'fruit_b']] == 'vegetable').any(axis=1)
print (mask)
0 True
1 False
2 True
3 True
4 False
dtype: bool
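Putting the pieces together, here is a self-contained sketch. The input frame is reconstructed from the printed output above, and the initial value 'fruit' in every row of my_fruits is an assumption:

```python
import pandas as pd

# Reconstructed from the printed output; the starting my_fruits values are assumed.
df = pd.DataFrame({
    'fruit_a': ['apple', 'banana', 'vegetable', 'vegetable', 'cherry'],
    'fruit_b': ['vegetable', 'apple', 'vegetable', 'pineapple', 'pear'],
    'my_fruits': ['fruit'] * 5,
})

# True for rows where at least one of the two columns equals 'vegetable'.
mask = (df[['fruit_a', 'fruit_b']] == 'vegetable').any(axis=1)

# Series.mask replaces values where the mask is True and keeps the rest.
df['my_fruits'] = df['my_fruits'].mask(mask, 'not_fruits')

print(df)
```

Series.mask is the mirror image of Series.where: mask replaces where the condition is True, where replaces where it is False.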
python pandas lambda with 2 and more variables
My guess is you want this:
df = pd.DataFrame({'A': [1,1,2,2,3,3],
'B': [2,2,2,3,3,3],
'TotalAmount': [10,20,30,40,50,60]})
df['NewColumn'] = df.groupby(['A', 'B'])['TotalAmount'].transform('sum')
df
# A B TotalAmount NewColumn
#0 1 2 10 30
#1 1 2 20 30
#2 2 2 30 30
#3 2 3 40 40
#4 3 3 50 110
#5 3 3 60 110
applying lambda row on multiple columns pandas
You need to use axis=1 to tell Pandas you want to apply a function to each row. The default is axis=0.
tp['col'] = tp.apply(lambda row: row['source'] if row['target'] in ['b', 'n'] else 'x',
                     axis=1)
However, for this specific task, you should use vectorised operations. For example, using numpy.where:
tp['col'] = np.where(tp['target'].isin(['b', 'n']), tp['source'], 'x')
pd.Series.isin returns a Boolean series which tells numpy.where whether to select the second or third argument.
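To see both versions side by side, here is a sketch on a hypothetical tp frame (the source and target values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the question.
tp = pd.DataFrame({'source': ['a', 'b', 'c', 'd'],
                   'target': ['b', 'x', 'n', 'y']})

# Row-wise version: axis=1 passes each row to the lambda.
row_wise = tp.apply(
    lambda row: row['source'] if row['target'] in ['b', 'n'] else 'x', axis=1
)

# Vectorised version: isin builds the Boolean series, np.where picks per element.
tp['col'] = np.where(tp['target'].isin(['b', 'n']), tp['source'], 'x')

print(tp)
```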