Applying Function with Multiple Arguments to Create a New Pandas Column

Applying function with multiple arguments to create a new pandas column

Alternatively, you can use numpy underlying function:

>>> import numpy as np
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
A B new_column
0 10 20 200
1 20 30 600
2 30 10 300

or vectorize arbitrary function in general case:

>>> def fx(x, y):
... return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
A B new_column
0 10 20 200
1 20 30 600
2 30 10 300

Passing a function with multiple arguments to DataFrame.apply

It's just the way you think it would be, apply accepts args and kwargs and passes them directly to some_func.

df.apply(some_func, var1='DOG', axis=1)

Or,

df.apply(some_func, args=('DOG', ), axis=1)

0    foo-x-DOG
1 bar-y-DOG
dtype: object

python pandas- apply function with two arguments to columns

Why not just do this?

df['NewCol'] = df.apply(lambda x: segmentMatch(x['TimeCol'], x['ResponseCol']), 
axis=1)

Rather than trying to pass the column as an argument as in your example, we now simply pass the appropriate entries in each row as argument, and store the result in 'NewCol'.

pandas apply function with multiple inputs to create a new column

after some fiddling around I got it to work using that example:

df['country'] = df.apply(lambda x: airport_to_contry(x['a_code'],country_dict),axis = 1)

pandas apply function to multiple columns with condition and create new columns

First is necessary convert strings repr of lists by ast.literal_eval to lists, then for chceck length remove casting to strings. If need one element lists instead scalars use [] in fruit[0] and fruit[1] and last change order of condition for len(fruit) == 1, also change len(fruit) > 3 to len(fruit) > 2 for match first row:

def fruits_vegetable(row):

fruit = ast.literal_eval(row['fruit_code'])
vege = ast.literal_eval(row['vegetable_code'])

if len(fruit) == 1 and len(vege) > 1: # write "all" in new_col_1
row['new_col_1'] = 'all'
elif len(fruit) > 2 and len(vege) == 1: # vegetable_code in new_col_1
row['new_col_1'] = vege
elif len(fruit) > 2 and len(vege) > 1: # write "all" in new_col_1
row['new_col_1'] = 'all'
elif len(fruit) == 2 and len(vege) >= 0:# fruit 1 new_col_1 & fruit 2 new_col_2
row['new_col_1'] = [fruit[0]]
row['new_col_2'] = [fruit[1]]
elif len(fruit) == 1: # fruit_code in new_col_1
row['new_col_1'] = fruit
return row

df = df.apply(fruits_vegetable, axis=1)


print (df)
ID date fruit_code new_col_1 new_col_2 supermarket \
0 1 2022-01-01 [100,99,300] all NaN xy
1 2 2022-01-01 [67,200,87] [5000] NaN z, m
2 3 2021-01-01 [100,5,300,78] all NaN wf, z
3 4 2020-01-01 [77] [77] NaN NaN
4 5 2022-15-01 [100,200,546,33] all NaN t, wf
5 6 2002-12-01 [64,2] [64] [2] k
6 7 2018-12-01 [5] all NaN p

supermarkt vegetable_code
0 NaN [1000,2000,3000]
1 NaN [5000]
2 NaN [7000,2000,3000]
3 wf [1000]
4 NaN [4000,2000,3000]
5 NaN [6000,8000,1000]
6 NaN [6000,8000,1000]

Apply function with multiple argument to multiple columns to create a new column

Maybe this:

data['lower_bound_a']=data.apply(lambda x: ci_lower_bound(x['won_A'], x['lost_A']),axis=1)
print(data)

Apply pandas function to column to create multiple new columns?

Building off of user1827356 's answer, you can do the assignment in one pass using df.merge:

df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})), 
left_index=True, right_index=True)

textcol feature1 feature2
0 0.772692 1.772692 -0.227308
1 0.857210 1.857210 -0.142790
2 0.065639 1.065639 -0.934361
3 0.819160 1.819160 -0.180840
4 0.088212 1.088212 -0.911788

EDIT:
Please be aware of the huge memory consumption and low speed: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/ !

Pandas : How to apply a function with multiple column inputs and where condition

First, you should only use apply if necessary. Vectorized functions will be much faster, and the way you have it written now in the np.where statement makes use of these. If you really want to make your code more readable (at the (probably small) expense of time and memory) you could make an intermediate column and then use it in the np.where statement.

df["Share"] = ( df.B + df.C ) / ( df.B + df.C + df.D )
df["X"] = ( df.A + df.Share * df.E ).where( df.index >= 2020 )

To answer your question, however, you can create a custom function and then apply it to your DataFrame.

def my_func( year,a,b,c,d,e ):
#This function can be longer and do more things
return np.nan if year < 2020 else a + ( ( (b + c) / (b + c + d) ) * e )

df['X'] = df.apply( lambda x: my_func( x.name, x.A, x.B, x.C, x.D, x.E ), axis = 1 )

Note that to access then index of a row when using apply with axis = 1 you need to use the name attribute.

Also, since applying a function is relatively slow, it may be worth creating columns that take care of some of the intermediate steps (such as summing several columns, etc.) so that that doesn't need to be done in each iteration.

Check out this answer for more examples of applying a custom function.



Related Topics



Leave a reply



Submit