Return Multiple Columns from Pandas Apply()

Return multiple columns from pandas apply()

You can return a Series containing the new data from the applied function, which avoids iterating over the dataframe three times. Passing axis=1 to apply calls the function sizes once for each row of the dataframe, and the returned Series are assembled into a new dataframe. Each Series, s, holds the new values as well as the original data.

import locale

def sizes(s):
    # locale.format was removed in Python 3.12; locale.format_string is its replacement
    s['size_kb'] = locale.format_string("%.1f", s['size'] / 1024.0, grouping=True) + ' KB'
    s['size_mb'] = locale.format_string("%.1f", s['size'] / 1024.0 ** 2, grouping=True) + ' MB'
    s['size_gb'] = locale.format_string("%.1f", s['size'] / 1024.0 ** 3, grouping=True) + ' GB'
    return s

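# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
# This assumes rows_list is a list of dicts, one per row.
df_test = pd.concat([df_test, pd.DataFrame(rows_list)])
df_test = df_test.apply(sizes, axis=1)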
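
As a minimal, self-contained sketch of the same pattern, the example below builds a small sample dataframe (the name and size values are made up for illustration) and formats the sizes with f-strings instead of locale, so it runs without any locale setup.

```python
import pandas as pd

# Hypothetical sample data; sizes are in bytes
df_test = pd.DataFrame({
    'name': ['a.txt', 'b.bin', 'c.iso'],
    'size': [1024, 1024 ** 2, 5 * 1024 ** 3],
})

def sizes(row):
    # Copy so the row handed in by apply is not modified in place
    row = row.copy()
    row['size_kb'] = f"{row['size'] / 1024:,.1f} KB"
    row['size_mb'] = f"{row['size'] / 1024 ** 2:,.1f} MB"
    row['size_gb'] = f"{row['size'] / 1024 ** 3:,.1f} GB"
    return row

# axis=1 calls sizes once per row; the returned Series become the rows of the result
result = df_test.apply(sizes, axis=1)
print(result)
```

The result keeps the original name and size columns and gains one formatted string column per unit.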

Apply pandas function to column to create multiple new columns?

Building on user1827356's answer, you can do the assignment in one pass using df.merge:

df.merge(df.textcol.apply(lambda s: pd.Series({'feature1': s + 1, 'feature2': s - 1})),
         left_index=True, right_index=True)

    textcol  feature1  feature2
0  0.772692  1.772692 -0.227308
1  0.857210  1.857210 -0.142790
2  0.065639  1.065639 -0.934361
3  0.819160  1.819160 -0.180840
4  0.088212  1.088212 -0.911788

EDIT:
Be aware that this approach is slow and uses a lot of memory; see https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/ for details.
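
Where the new columns are simple arithmetic on an existing one, as in this example, a vectorized alternative avoids apply entirely. The sketch below assumes a numeric textcol column like the one in the output above; the sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({'textcol': [0.5, 1.5, 2.5]})

# Column arithmetic operates on the whole column at once,
# so no per-row Series objects are created
result = df.assign(feature1=df['textcol'] + 1, feature2=df['textcol'] - 1)
print(result)
```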

Pandas Apply Function That returns two new columns

Based on your latest error, you can avoid it by returning the new columns as a Series:

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return pd.Series([C, D])

df[['C', 'D']] = df.apply(myfunc1, axis=1)
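
A variation on the same idea, sketched with made-up sample data: giving the returned Series an index lets apply name the new columns itself, so the result can be joined back onto the original dataframe.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

def myfunc1(row):
    # The Series index becomes the column names of the dataframe that apply returns
    return pd.Series({'C': row['A'] + 10, 'D': row['A'] + 50})

new_cols = df.apply(myfunc1, axis=1)
df = df.join(new_cols)
print(df)
```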

Dataframe Apply method to return multiple elements (series)

UPDATE

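Updated for pandas 0.23, which added the result_type argument to apply; refer to the documentation for further details. Note that result_type='broadcast' keeps the original shape and column labels of the dataframe, so the call below relies on df having exactly two columns. result_type='expand' is the more general choice.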

Redefine your function like this:

def divideAndMultiply(x, y):
    return [x / y, x * y]

Then do this:

df[['e','f']] = df.apply(lambda x: divideAndMultiply(x["a"], 2), axis=1, result_type='broadcast')

You should get the desired result:

In [118]: df
Out[118]:
   a  b  e  f
0  0  1  0  0
1  1  2  0  2
2  2  3  1  4
3  3  4  1  6
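
For comparison, here is a sketch of the same call with result_type='expand', which turns each returned list into columns without tying the result to the shape of the original frame. The function is renamed to snake_case and the sample data is reconstructed from the output above. Under Python 3 the division yields floats, so the e column will not match the integers shown above.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3], 'b': [1, 2, 3, 4]})

def divide_and_multiply(x, y):
    return [x / y, x * y]

# 'expand' converts each returned list into one column per element
df[['e', 'f']] = df.apply(
    lambda row: divide_and_multiply(row['a'], 2),
    axis=1,
    result_type='expand',
)
print(df)
```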

How to return multiple values including a list in pandas apply function?

You can pass result_type='expand' to apply with your existing function:

df[['c','d','e']] = df.apply(lambda x: funct(x['col1'], x['col2']),
                             axis=1, result_type='expand')

print(df)

   col1 col2   c   d         e
0     1    a  1a  a1  [1a, a1]
1     2    b  2b  b2  [2b, b2]
2     3    c  3c  c3  [3c, c3]
3     4    d  4d  d4  [4d, d4]
4     5    e  5e  e5  [5e, e5]
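
The question's funct is not shown here. A hypothetical version that would produce the output above, together with sample data reconstructed from that output, might look like this:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcde')})

def funct(a, b):
    # Hypothetical stand-in for the question's function:
    # two strings plus a list holding both of them
    x = f"{a}{b}"
    y = f"{b}{a}"
    return x, y, [x, y]

df[['c', 'd', 'e']] = df.apply(lambda row: funct(row['col1'], row['col2']),
                                axis=1, result_type='expand')
print(df)
```

The list in the third position ends up as a single cell value in column e rather than being expanded further.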

Return multiple values from a pandas rolling apply function

Rolling apply can only produce single numeric values. It does not support multiple return values, or even non-numeric ones (something as simple as a string). Any answer to this question will be a workaround.

That said, a viable workaround is to take advantage of the fact that rolling objects are iterable (as of pandas 1.1.0).

What’s new in 1.1.0 (July 28, 2020)

  • Made pandas.core.window.rolling.Rolling and pandas.core.window.expanding.Expanding iterable (GH11704)

This means you can take advantage of the faster grouping and indexing of rolling while getting more flexible behaviour from plain Python:

def some_fn(df_):
    """
    When iterating over a rolling window it disregards the min_periods
    argument of rolling and will produce DataFrames for all windows

    The input is also of type DataFrame not Series

    You are completely responsible for doing all operations here,
    including ignoring values if the input is not of the correct shape
    or format

    :param df_: A DataFrame produced by rolling
    :return: a column joined, and the max value within the window
    """
    return ','.join(df_['a']), df_['a'].max()

window = 5
results = pd.DataFrame([some_fn(df_) for df_ in df.rolling(window)])

Sample DataFrame and output:

df = pd.DataFrame({'a': list('abdesfkm')})

df:

   a
0  a
1  b
2  d
3  e
4  s
5  f
6  k
7  m

result:

           0  1
0          a  a
1        a,b  b
2      a,b,d  d
3    a,b,d,e  e
4  a,b,d,e,s  s
5  b,d,e,s,f  s
6  d,e,s,f,k  s
7  e,s,f,k,m  s
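
Because iteration ignores min_periods, partial windows have to be handled by hand. The sketch below uses made-up numeric data and returns the minimum and maximum of each window, emitting NaN for windows shorter than the requested size. The function name and the column names min and max are chosen for illustration. It requires pandas 1.1.0 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 1, 4, 1, 5, 9, 2, 6]})
window = 3

def min_and_max(window_df):
    # Iteration yields partial windows at the start, so skip them explicitly
    if len(window_df) < window:
        return np.nan, np.nan
    return window_df['a'].min(), window_df['a'].max()

# One tuple per window; each tuple becomes a row with two columns
results = pd.DataFrame(
    [min_and_max(w) for w in df.rolling(window)],
    index=df.index,
    columns=['min', 'max'],
)
print(results)
```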

Returning multiple variables with pandas.series.apply

You can convert the result of Series.apply to a list and then assign it to multiple columns:

df[['DOUBLE', 'TRIPLE']] = df['x'].apply(do_math).tolist()
print(df)

   x  DOUBLE  TRIPLE
0  1       2       3
1  2       4       6
2  3       6       9
3  4       8      12
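
The question's do_math is not reproduced here. A hypothetical version consistent with the output above, with sample data reconstructed from that output, could be:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def do_math(x):
    # Hypothetical stand-in: returns the double and the triple of x as a tuple
    return 2 * x, 3 * x

# apply yields a Series of tuples; tolist() turns it into a list of pairs,
# which pandas spreads across the two target columns
df[['DOUBLE', 'TRIPLE']] = df['x'].apply(do_math).tolist()
print(df)
```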

You can also try DataFrame.apply on rows with result_type='expand':

df[['DOUBLE', 'TRIPLE']] = df.apply(lambda row: do_math(row['x']), axis=1, result_type='expand')
print(df)

   x  DOUBLE  TRIPLE
0  1       2       3
1  2       4       6
2  3       6       9
3  4       8      12

Since your operation is simple, you can also try df.eval:

df = df.eval('''
double = 2 * x
triple = 3 * x
'''
)
print(df)

   x  double  triple
0  1       2       3
1  2       4       6
2  3       6       9
3  4       8      12

Pandas: apply function that return multiple new columns over Pandas DataFrame

Almost everything you are doing can be done directly on the dataframe, rather than per Series while iterating over the columns:

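import numpy as np

def differencing(df, per=1):
    # Difference every column at once; rows without a predecessor are filled with 0
    dif = df.diff(periods=per).fillna(0).add_suffix(f'_per{per}')
    # Sign of each difference: -1, 0 or 1
    ind = np.sign(dif).add_suffix('_ind')
    return df.join([dif, ind])

differencing(df)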

That’s roughly a 50% reduction in duration on a 5-column, 10_000-row dataframe. On a 5000-column, 10-row dataframe it reduced the time from 24 seconds to 0.016 seconds (caveat: both were measured on my machine, which runs a lot of other things simultaneously).
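
To make the result concrete, here is a small usage sketch on made-up data. It repeats the function so the block runs on its own.

```python
import numpy as np
import pandas as pd

def differencing(df, per=1):
    dif = df.diff(periods=per).fillna(0).add_suffix(f'_per{per}')
    ind = np.sign(dif).add_suffix('_ind')
    return df.join([dif, ind])

# Hypothetical sample data
df = pd.DataFrame({'a': [1, 3, 2], 'b': [5, 5, 9]})
out = differencing(df)
print(out)
```

Each original column gains a differenced column with the suffix _per1 and a sign column with the suffix _per1_ind.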


