Applying function with multiple arguments to create a new pandas column
Alternatively, you can use the underlying NumPy function:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300
Or, in the general case, vectorize an arbitrary function:
>>> def fx(x, y):
...     return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300
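Note that np.vectorize is a convenience wrapper that still calls the Python function once per element, so it mainly helps when the function contains scalar-only logic such as an if statement. A minimal sketch, using a hypothetical function bigger_minus_smaller that is not part of the original answer, and comparing it with an equivalent built-in column operation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

def bigger_minus_smaller(x, y):
    # Scalar-only logic: a plain `if` cannot be applied to whole columns.
    if x > y:
        return x - y
    return y - x

# np.vectorize loops over the element pairs and calls the function on each one.
df["diff_vectorize"] = np.vectorize(bigger_minus_smaller)(df["A"], df["B"])

# The same result expressed with built-in column operations.
df["diff_native"] = (df["A"] - df["B"]).abs()

print(df)
```

When a built-in operation exists, as it does here, it is usually the faster choice.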
Passing a function with multiple arguments to DataFrame.apply
It's just the way you think it would be: apply accepts args and kwargs and passes them directly to some_func.
df.apply(some_func, var1='DOG', axis=1)
Or,
df.apply(some_func, args=('DOG', ), axis=1)
0 foo-x-DOG
1 bar-y-DOG
dtype: object
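For a version that can be run end to end, here is a minimal sketch. The DataFrame contents and the body of some_func are assumptions chosen to reproduce the output above; the original question's data is not shown here.

```python
import pandas as pd

# Hypothetical data: two string columns.
df = pd.DataFrame({"a": ["foo", "bar"], "b": ["x", "y"]})

def some_func(row, var1):
    # With axis=1, `row` is a Series holding one row of the DataFrame.
    return "{}-{}-{}".format(row["a"], row["b"], var1)

# Extra argument passed by keyword...
by_keyword = df.apply(some_func, var1="DOG", axis=1)

# ...or positionally through the args tuple.
by_position = df.apply(some_func, args=("DOG",), axis=1)

print(by_keyword)
```

Both calls give the same Series, so which one to use is mostly a matter of readability.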
python pandas- apply function with two arguments to columns
Why not just do this?
df['NewCol'] = df.apply(lambda x: segmentMatch(x['TimeCol'], x['ResponseCol']),
                        axis=1)
Rather than trying to pass the column as an argument as in your example, we now simply pass the appropriate entries in each row as arguments and store the result in 'NewCol'.
pandas apply function with multiple inputs to create a new column
After some fiddling around, I got it to work using that example:
df['country'] = df.apply(lambda x: airport_to_contry(x['a_code'],country_dict),axis = 1)
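A self-contained sketch of the same pattern follows. The contents of country_dict, the sample codes, and the body of airport_to_contry are all hypothetical, since the original post does not show them. If the function is only a dictionary lookup, Series.map gives the same result without a row-wise apply:

```python
import pandas as pd

# Hypothetical lookup table and data.
country_dict = {"JFK": "USA", "LHR": "UK"}
df = pd.DataFrame({"a_code": ["JFK", "LHR", "XXX"]})

def airport_to_contry(code, mapping):
    # Assumed behaviour: look the code up, fall back to "unknown".
    return mapping.get(code, "unknown")

# Row-wise apply, passing the dictionary as a second argument.
df["country"] = df.apply(lambda x: airport_to_contry(x["a_code"], country_dict), axis=1)

# Equivalent column-wise lookup with Series.map.
df["country_map"] = df["a_code"].map(country_dict).fillna("unknown")

print(df)
```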
pandas apply function to multiple columns with condition and create new columns
First, it is necessary to convert the string representations of lists to real lists with ast.literal_eval, and then to remove the casts to strings so that the length checks work. If you need one-element lists instead of scalars, wrap fruit[0] and fruit[1] in []. Finally, change the order of the conditions for len(fruit) == 1, and change len(fruit) > 3 to len(fruit) > 2 so that the first row matches:
import ast

def fruits_vegetable(row):
    # Parse the string representations into real lists
    fruit = ast.literal_eval(row['fruit_code'])
    vege = ast.literal_eval(row['vegetable_code'])
    if len(fruit) == 1 and len(vege) > 1: # write "all" in new_col_1
        row['new_col_1'] = 'all'
    elif len(fruit) > 2 and len(vege) == 1: # vegetable_code in new_col_1
        row['new_col_1'] = vege
    elif len(fruit) > 2 and len(vege) > 1: # write "all" in new_col_1
        row['new_col_1'] = 'all'
    elif len(fruit) == 2 and len(vege) >= 0: # fruit 1 new_col_1 & fruit 2 new_col_2
        row['new_col_1'] = [fruit[0]]
        row['new_col_2'] = [fruit[1]]
    elif len(fruit) == 1: # fruit_code in new_col_1
        row['new_col_1'] = fruit
    return row

df = df.apply(fruits_vegetable, axis=1)
print(df)
   ID        date        fruit_code new_col_1 new_col_2 supermarket  \
0   1  2022-01-01      [100,99,300]       all       NaN          xy
1   2  2022-01-01       [67,200,87]    [5000]       NaN        z, m
2   3  2021-01-01    [100,5,300,78]       all       NaN       wf, z
3   4  2020-01-01              [77]      [77]       NaN         NaN
4   5  2022-15-01  [100,200,546,33]       all       NaN       t, wf
5   6  2002-12-01            [64,2]      [64]       [2]           k
6   7  2018-12-01               [5]       all       NaN           p

  supermarkt    vegetable_code
0        NaN  [1000,2000,3000]
1        NaN            [5000]
2        NaN  [7000,2000,3000]
3         wf            [1000]
4        NaN  [4000,2000,3000]
5        NaN  [6000,8000,1000]
6        NaN  [6000,8000,1000]
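To see why the conversion matters, here is a small sketch with hypothetical sample strings: taking the length of the raw string counts characters, while taking the length after ast.literal_eval counts list elements.

```python
import ast
import pandas as pd

# Hypothetical column holding lists stored as strings.
s = pd.Series(["[100,99,300]", "[77]"])

# Length of the raw strings: number of characters, brackets and commas included.
string_lengths = s.apply(len)

# Parse each string into a real Python list, then count its elements.
parsed = s.apply(ast.literal_eval)
list_lengths = parsed.apply(len)

print(string_lengths.tolist())  # [12, 4]
print(list_lengths.tolist())    # [3, 1]
```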
Apply function with multiple argument to multiple columns to create a new column
Maybe this:
data['lower_bound_a']=data.apply(lambda x: ci_lower_bound(x['won_A'], x['lost_A']),axis=1)
print(data)
Apply pandas function to column to create multiple new columns?
Building off of user1827356's answer, you can do the assignment in one pass using df.merge:
df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})),
left_index=True, right_index=True)
    textcol  feature1  feature2
0  0.772692  1.772692 -0.227308
1  0.857210  1.857210 -0.142790
2  0.065639  1.065639 -0.934361
3  0.819160  1.819160 -0.180840
4  0.088212  1.088212 -0.911788
EDIT:
Please be aware that this approach consumes a lot of memory and is slow: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/
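Two possible alternatives that avoid building a pd.Series for every row are sketched below. The sample values and the helper name features are assumptions made for illustration; the first option uses the result_type="expand" argument of DataFrame.apply, and the second skips apply entirely.

```python
import pandas as pd

# Hypothetical numeric column standing in for textcol.
df = pd.DataFrame({"textcol": [0.5, 1.5, 2.5]})

def features(s):
    # Return both derived values as a tuple.
    return s + 1, s - 1

# Option 1: let apply expand the returned tuples into columns.
expanded = df.apply(lambda row: features(row["textcol"]), axis=1, result_type="expand")
expanded.columns = ["feature1", "feature2"]
out_apply = df.join(expanded)

# Option 2: fully vectorized, no apply at all.
out_vec = df.assign(feature1=df["textcol"] + 1, feature2=df["textcol"] - 1)

print(out_apply)
```

The vectorized form is only available when the per-row logic can be written as column operations, which is the case for this simple example.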
Pandas : How to apply a function with multiple column inputs and where condition
First, you should only use apply if necessary. Vectorized functions will be much faster, and the way you have it written now in the np.where statement makes use of these. If you really want to make your code more readable, at a probably small cost in time and memory, you could make an intermediate column and then use it in the np.where statement.
df["Share"] = ( df.B + df.C ) / ( df.B + df.C + df.D )
df["X"] = ( df.A + df.Share * df.E ).where( df.index >= 2020 )
To answer your question, however, you can create a custom function and then apply it to your DataFrame.
def my_func( year,a,b,c,d,e ):
    #This function can be longer and do more things
    return np.nan if year < 2020 else a + ( ( (b + c) / (b + c + d) ) * e )

df['X'] = df.apply( lambda x: my_func( x.name, x.A, x.B, x.C, x.D, x.E ), axis = 1 )
Note that to access the index of a row when using apply with axis = 1, you need to use the name attribute.
Also, since applying a function is relatively slow, it may be worth creating columns that take care of some of the intermediate steps (such as summing several columns) so that this work does not have to be repeated for every row.
Check out this answer for more examples of applying a custom function.
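As a quick check that the two approaches agree, here is a minimal sketch on made-up data. The column values and the year index are assumptions, since the original DataFrame is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data, indexed by year.
df = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0],
        "B": [1.0, 2.0, 3.0],
        "C": [1.0, 2.0, 1.0],
        "D": [2.0, 4.0, 4.0],
        "E": [10.0, 20.0, 30.0],
    },
    index=[2019, 2020, 2021],
)

# Vectorized version: rows before 2020 become NaN.
share = (df.B + df.C) / (df.B + df.C + df.D)
x_vectorized = (df.A + share * df.E).where(df.index >= 2020)

# Row-wise version using apply; x.name is the row's index label.
def my_func(year, a, b, c, d, e):
    return np.nan if year < 2020 else a + ((b + c) / (b + c + d)) * e

x_apply = df.apply(lambda x: my_func(x.name, x.A, x.B, x.C, x.D, x.E), axis=1)

print(x_vectorized.tolist())  # [nan, 12.0, 18.0]
```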