Apply Pandas Function to Column to Create Multiple New Columns

Apply pandas function to column to create multiple new columns?

Building on user1827356's answer, you can do the assignment in one pass using df.merge:

df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})), 
    left_index=True, right_index=True)

    textcol  feature1  feature2
0  0.772692  1.772692 -0.227308
1  0.857210  1.857210 -0.142790
2  0.065639  1.065639 -0.934361
3  0.819160  1.819160 -0.180840
4  0.088212  1.088212 -0.911788

EDIT:
Be aware that this approach is slow and uses a lot of memory, because the lambda builds a new pd.Series for every value: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/
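For element-wise arithmetic like this example, you can skip the per-value Series entirely. The following is a minimal sketch that uses DataFrame.assign with whole-column arithmetic. This is a different technique from the merge answer above. The random textcol data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a numeric column named textcol, as in the example above
df = pd.DataFrame({"textcol": np.random.rand(5)})

# Whole-column arithmetic runs in vectorized code instead of calling a
# Python function (and building a Series) once per value
result = df.assign(feature1=df["textcol"] + 1, feature2=df["textcol"] - 1)
print(result)
```

The output has the same three columns as the merge version, but no intermediate Series objects are created.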

Pandas apply row-wise a function and create multiple new columns

To do this, you need to:

1. Transpose df2 so that the sic codes become its rows and the *_bar names become its columns, ready for concatenation.
2. Index it with the values of df1["sic"] to select the matching row for each row of df1.
3. Reset the index of the selected rows with .reset_index(drop=True) so that the two dataframes can be concatenated correctly. This replaces the current index (e.g. 5, 6, 3, 8, 12, 6) with a new one (e.g. 0, 1, 2, 3, 4, 5) while keeping the actual values the same, so the labels match df1's index and pandas lines the rows up correctly when concatenating.
4. Concatenate the two dataframes side by side (axis=1).

Note: I read the dataframes in with pd.read_csv, which gave df2 string column labels but parsed the values of df1's sic column as ints. Therefore I used .astype(str) to make step 2 work. If that is not the case for your data, you may need to remove the .astype(str).
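To see the four steps in isolation, here is a sketch on hypothetical miniature frames. It uses integer column labels, so no .astype(str) is needed:

```python
import pandas as pd

# Hypothetical miniature versions of df1 and df2 (values chosen for illustration)
df1 = pd.DataFrame({"sic": [2, 1, 2], "data1": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame({1: [10.0, 20.0], 2: [30.0, 40.0]}, index=["c_bar", "a1_bar"])

# Step 1: transpose so each sic code is a row and each *_bar name a column
lookup = df2.T

# Step 2: select one row per entry of df1["sic"]; the selected rows
# carry the sic values 2, 1, 2 as their index labels
rows = lookup.loc[df1["sic"]]

# Step 3: replace those labels with 0..n-1 to match df1's index; without
# this, concat would align on labels (0, 1, 2 vs. 2, 1, 2), not by position
rows = rows.reset_index(drop=True)

# Step 4: concatenate side by side
merged = pd.concat([df1, rows], axis=1)
print(merged)
```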

Here is the single line of code to do these things:

merged = pd.concat([df1, df2.T.loc[df1["sic"].astype(str)].reset_index(drop=True)], axis=1)

Here is the full code I used:

from io import StringIO
import pandas as pd

df1 = pd.read_csv(StringIO("""
sic data1   data2   data3   data4   data5
5   0.90783598  0.84722083  0.47149924  0.98724123  0.50654476
6   0.53442684  0.59730371  0.92486887  0.61531646  0.62784041
3   0.56806423  0.09619383  0.33846097  0.71878313  0.96316724
8   0.86933042  0.64965755  0.94549745  0.08866519  0.92156389
12  0.651328    0.37193774  0.9679044   0.36898991  0.15161838
6   0.24555531  0.50195983  0.79114578  0.9290596   0.10672607
"""), sep="\t")
df2 = pd.read_csv(StringIO("""
    1   2   3   4   5   6   7   8   9   10  11  12
c_bar   0.4955329   0.92970292  0.68049726  0.91325006  0.55578465  0.78056519  0.53954711  0.90335326  0.93986402  0.0204794   0.51575764  0.61144255
a1_bar  0.75781444  0.81052669  0.99910449  0.62181902  0.11797144  0.40031316  0.08561665  0.35296894  0.14445697  0.93799762  0.80641802  0.31379671
a2_bar  0.41432552  0.36313911  0.13091618  0.39251953  0.66249636  0.31221897  0.15988528  0.1620938   0.55143589  0.66571044  0.68198944  0.23806947
a3_bar  0.38918855  0.83689178  0.15838139  0.39943204  0.48615188  0.06299899  0.86343819  0.47975619  0.05300611  0.15080875  0.73088725  0.3500239
a4_bar  0.47201384  0.90874121  0.50417142  0.70047698  0.24820601  0.34302454  0.4650635   0.0992668   0.55142391  0.82947194  0.28251699  0.53170308
"""), sep="\t", index_col=[0])

merged = pd.concat([df1, df2.T.loc[df1["sic"].astype(str)].reset_index(drop=True)], axis=1)

print(merged)

which produces the output:

   sic     data1     data2     data3  ...    a1_bar    a2_bar    a3_bar    a4_bar
0    5  0.907836  0.847221  0.471499  ...  0.117971  0.662496  0.486152  0.248206
1    6  0.534427  0.597304  0.924869  ...  0.400313  0.312219  0.062999  0.343025
2    3  0.568064  0.096194  0.338461  ...  0.999104  0.130916  0.158381  0.504171
3    8  0.869330  0.649658  0.945497  ...  0.352969  0.162094  0.479756  0.099267
4   12  0.651328  0.371938  0.967904  ...  0.313797  0.238069  0.350024  0.531703
5    6  0.245555  0.501960  0.791146  ...  0.400313  0.312219  0.062999  0.343025

[6 rows x 11 columns]
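Another option is to merge on the key. This technique is not taken from the answer above. DataFrame.merge matches each sic value against the index of df2.T, so no index reset is needed. The frames below are hypothetical miniatures with integer labels. With the string column labels that read_csv produces in the full code, you would first convert one side, for example with df2.columns = df2.columns.astype(int):

```python
import pandas as pd

# Hypothetical miniature frames with integer sic codes on both sides
df1 = pd.DataFrame({"sic": [2, 1, 2], "data1": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame({1: [10.0, 20.0], 2: [30.0, 40.0]}, index=["c_bar", "a1_bar"])

# Match each df1 row to the df2.T row whose index label equals its sic value;
# how="left" keeps every df1 row, in df1's order
merged = df1.merge(df2.T, left_on="sic", right_index=True, how="left")
print(merged)
```

Because rows are matched by value rather than by position, this also works when df1 is later filtered or reordered.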

Return multiple columns from pandas apply()

You can return a Series from the applied function that contains the new data, so you only iterate over the rows once instead of three times. Passing axis=1 to apply calls sizes on each row of the dataframe, passing the row in as a Series, s. Because sizes adds the new fields to s and returns it, apply assembles the returned Series into a new dataframe that contains the original data plus size_kb, size_mb and size_gb.

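import locale
import pandas as pd

def sizes(s):
    # s is one row; locale.format was removed in Python 3.12, so use format_string
    s['size_kb'] = locale.format_string("%.1f", s['size'] / 1024.0, grouping=True) + ' KB'
    s['size_mb'] = locale.format_string("%.1f", s['size'] / 1024.0 ** 2, grouping=True) + ' MB'
    s['size_gb'] = locale.format_string("%.1f", s['size'] / 1024.0 ** 3, grouping=True) + ' GB'
    return s

# DataFrame.append was removed in pandas 2.0; rows_list (from the question) is assumed to be a list of dicts
df_test = pd.concat([df_test, pd.DataFrame(rows_list)])
df_test = df_test.apply(sizes, axis=1)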
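The row-wise apply still calls Python code for every row. As an alternative to the answer's approach, the following sketch divides the whole size column at once and formats only the results per value. It uses str.format's ',' thousands separator instead of the locale module, so grouping always uses a comma regardless of locale. The byte sizes are hypothetical:

```python
import pandas as pd

# Hypothetical data: file sizes in bytes
df_test = pd.DataFrame({"size": [512, 1_572_864, 3_221_225_472]})

# Divide the whole column at once, then format each value with one decimal
# place and ',' grouping (always a comma, unlike locale-aware grouping)
for col, power, unit in [("size_kb", 1, "KB"), ("size_mb", 2, "MB"), ("size_gb", 3, "GB")]:
    df_test[col] = (df_test["size"] / 1024.0 ** power).map("{:,.1f} ".format) + unit

print(df_test)
```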

Apply function on multiple columns and create new column based on condition

I first had to add the columns and fill them with zeros, then apply the function.

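def conditions(x, column1, column2):
    # Compare two columns within a single row
    if x[column1] != x[column2]:
        return "incorrect"
    else:
        return "correct"


lst1 = ["col1", "col2", "col3", "col4", "col5"]
lst2 = ["col1_1", "col2_2", "col3_3", "col4_4", "col5_5"]

# Add the five result columns (e.g. "col1_1_2") and fill them with zeros
for item in lst2:
    df[str(item) + "_2"] = 0

# The new columns are now the last five; fill each one from its column pair
for i, item in enumerate(df.columns[-5:]):
    df[item] = df.apply(lambda x: conditions(x, column1=lst1[i], column2=lst2[i]), axis=1)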
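Because the condition is a simple inequality, you can replace the row-wise apply by comparing whole columns. This technique is not the answer's. The following sketch uses np.where on hypothetical data with two of the column pairs. This version also does not need the zero-filled placeholder columns:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two of the column pairs from the question
df = pd.DataFrame({
    "col1": [1, 2, 3], "col1_1": [1, 0, 3],
    "col2": ["a", "b", "c"], "col2_2": ["a", "b", "x"],
})

lst1 = ["col1", "col2"]
lst2 = ["col1_1", "col2_2"]

# Compare each pair of columns for all rows at once; np.where picks the label
for c1, c2 in zip(lst1, lst2):
    df[c2 + "_2"] = np.where(df[c1] != df[c2], "incorrect", "correct")

print(df)
```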

pandas apply function to multiple columns with condition and create new columns

First, convert the string representations of the lists into real lists with ast.literal_eval, and then remove the casting to strings when checking their lengths. If you need one-element lists instead of scalars, wrap fruit[0] and fruit[1] in []. Finally, move the len(fruit) == 1 condition to the end, and change len(fruit) > 3 to len(fruit) > 2 so that the first row matches:

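import ast

def fruits_vegetable(row):
    # The columns hold string representations of lists, e.g. "[100,99,300]",
    # so parse them into real Python lists before checking their lengths
    fruit = ast.literal_eval(row['fruit_code'])
    vege = ast.literal_eval(row['vegetable_code'])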
    
    if len(fruit) == 1 and len(vege) > 1:   # write "all" in new_col_1 
        row['new_col_1'] = 'all'
    elif len(fruit) > 2 and len(vege) == 1: # vegetable_code in new_col_1
        row['new_col_1'] = vege
    elif len(fruit) > 2 and len(vege) > 1:  # write "all" in new_col_1
        row['new_col_1'] = 'all'
    elif len(fruit) == 2 and len(vege) >= 0:# fruit 1 new_col_1 & fruit 2 new_col_2
        row['new_col_1'] = [fruit[0]]
        row['new_col_2'] = [fruit[1]]
    elif len(fruit) == 1:                   # fruit_code in new_col_1
        row['new_col_1'] = fruit
    return row

df = df.apply(fruits_vegetable, axis=1)

print (df)
   ID        date        fruit_code new_col_1 new_col_2 supermarket  \
0   1  2022-01-01      [100,99,300]       all       NaN          xy   
1   2  2022-01-01       [67,200,87]    [5000]       NaN        z, m   
2   3  2021-01-01    [100,5,300,78]       all       NaN       wf, z   
3   4  2020-01-01              [77]      [77]       NaN         NaN   
4   5  2022-15-01  [100,200,546,33]       all       NaN       t, wf   
5   6  2002-12-01            [64,2]      [64]       [2]           k   
6   7  2018-12-01               [5]       all       NaN           p   

  supermarkt    vegetable_code  
0        NaN  [1000,2000,3000]  
1        NaN            [5000]  
2        NaN  [7000,2000,3000]  
3         wf            [1000]  
4        NaN  [4000,2000,3000]  
5        NaN  [6000,8000,1000]  
6        NaN  [6000,8000,1000]
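The row function has to parse the strings anyway. Another option, not the answer's approach, is to parse each column once and compute the lengths for all rows with .str.len(), which also works on list elements. You can then use those lengths in boolean masks instead of a row-wise apply. The three-row sample data is hypothetical, and only one branch of fruits_vegetable is shown:

```python
import ast
import pandas as pd

# Hypothetical sample: list columns stored as strings, as in the question
df = pd.DataFrame({
    "fruit_code": ["[100,99,300]", "[64,2]", "[77]"],
    "vegetable_code": ["[1000,2000,3000]", "[6000,8000,1000]", "[1000]"],
})

# Parse every string once into a real Python list
fruit = df["fruit_code"].map(ast.literal_eval)
vege = df["vegetable_code"].map(ast.literal_eval)

# .str.len() returns the length of each list element
fruit_len = fruit.str.len()
vege_len = vege.str.len()

# One branch of fruits_vegetable: more than two fruits and more than one
# vegetable is written as "all"; other rows are left as NaN here
all_mask = (fruit_len > 2) & (vege_len > 1)
df.loc[all_mask, "new_col_1"] = "all"
print(df)
```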

Pandas Apply Function That returns two new columns

Based on your latest error, you can avoid it by returning the new values as a pd.Series; its two entries are then assigned, in order, to columns C and D:

import pandas as pd

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return pd.Series([C, D])

df[['C', 'D']] = df.apply(myfunc1, axis=1)
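Another option is to return a plain tuple and let result_type="expand" spread it into columns. This assumes pandas 0.23 or later, where apply gained result_type. myfunc2 is a hypothetical variant of myfunc1. For arithmetic this simple, writing df["C"] = df["A"] + 10 directly is simpler still:

```python
import pandas as pd

# Hypothetical data with the column A from the question
df = pd.DataFrame({"A": [1, 2, 3]})

def myfunc2(row):
    # Return a plain tuple; result_type="expand" turns it into two columns
    return row["A"] + 10, row["A"] + 50

df[["C", "D"]] = df.apply(myfunc2, axis=1, result_type="expand")
print(df)
```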

Applying function with multiple arguments to create a new pandas column

Alternatively, you can use the underlying NumPy function:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300

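or, in the general case, wrap an arbitrary function with np.vectorize (a convenience wrapper that essentially runs a Python loop, not a performance optimization):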

>>> def fx(x, y):
...     return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300
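When the function needs extra arguments besides the row, DataFrame.apply can pass them through its args parameter. The following is a minimal sketch; fx_scaled and the factor of 2 are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

def fx_scaled(row, factor):
    # row is one DataFrame row (a Series); factor is supplied via args
    return row["A"] * row["B"] * factor

# args=(2,) is passed positionally after the row, so factor=2
df["scaled"] = df.apply(fx_scaled, axis=1, args=(2,))

# For plain arithmetic, column operations give the result without apply
df["new_column"] = df["A"] * df["B"]
print(df)
```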