Difference Between Map, Applymap and Apply Methods in Pandas


Comparing map, applymap and apply: Context Matters

First major difference: DEFINITION

  • map is defined on Series ONLY
  • applymap is defined on DataFrames ONLY
  • apply is defined on BOTH
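A quick way to check the first difference is to ask Python which of the three names each class exposes. This is a small sketch. Note that pandas 2.1 added DataFrame.map and deprecated DataFrame.applymap in its favour, so on recent versions DataFrame.map reports True and DataFrame.applymap may report False on the newest releases.

```python
import pandas as pd

# Report which of the three methods each pandas class defines.
for cls in (pd.Series, pd.DataFrame):
    for name in ("map", "applymap", "apply"):
        print(f"{cls.__name__}.{name}: {hasattr(cls, name)}")
```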

Second major difference: INPUT ARGUMENT

  • map accepts a dict, a Series, or a callable
  • applymap and apply accept callables only
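To make the second difference concrete, here is a minimal sketch passing each accepted kind of argument to Series.map, next to the callable form of Series.apply. The sample values and lookup tables are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# map accepts a dict: each value is looked up among the keys
by_dict = s.map({1: "a", 2: "b", 3: "c"})

# map also accepts a Series: each value is looked up in that Series' index
lookup = pd.Series(["a", "b", "c"], index=[1, 2, 3])
by_series = s.map(lookup)

# map also accepts a callable, called once per element
by_func = s.map(lambda x: x * 10)

# apply is documented to take a callable
by_apply = s.apply(lambda x: x * 10)

print(by_dict.tolist(), by_series.tolist(), by_func.tolist(), by_apply.tolist())
```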

Third major difference: BEHAVIOR

  • map is elementwise for Series
  • applymap is elementwise for DataFrames
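  • apply works elementwise on a Series (like map), but DataFrame.apply passes whole columns (or rows, with axis=1) to the function, which makes it suited to more complex operations and aggregation. The behaviour and return value depend on the function.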
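The three behaviours can be seen side by side in the following sketch on a small, made-up frame. It assumes DataFrame.map is the elementwise method on pandas 2.1 and later and falls back to DataFrame.applymap on older releases.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Series.map (and Series.apply) call the function once per element
doubled = df["A"].map(lambda x: x * 2)

# Elementwise over a whole DataFrame: DataFrame.map on pandas >= 2.1,
# DataFrame.applymap on older releases
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
as_strings = elementwise(str)

# DataFrame.apply hands each column (axis=0 by default) to the function as a
# Series, so an aggregating function yields one value per column
col_ranges = df.apply(lambda col: col.max() - col.min())

print(doubled.tolist())
print(as_strings)
print(col_ranges)
```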

Fourth major difference (the most important one): USE CASE

  • map is meant for mapping values from one domain to another, so it is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
  • applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
  • apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize)).
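The three use cases above can be sketched on a toy frame as follows. To avoid an nltk dependency, a hypothetical helper, naive_sent_split, stands in for nltk.sent_tokenize. It only splits on periods and is far cruder than a real sentence tokenizer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["  x ", "y  ", " z"],
    "C": [" p", "q ", " r "],
    "sentences": ["One. Two.", "Three.", "Four. Five. Six."],
})

# map: translate values from one domain to another with a lookup table
letters = df["A"].map({1: "a", 2: "b", 3: "c"})

# Elementwise transformation over several columns
# (DataFrame.map on pandas >= 2.1, DataFrame.applymap before that)
text = df[["B", "C"]]
strip_all = text.map if hasattr(pd.DataFrame, "map") else text.applymap
stripped = strip_all(str.strip)

def naive_sent_split(paragraph):
    """Hypothetical stand-in for nltk.sent_tokenize: split on periods."""
    return [part.strip() for part in paragraph.split(".") if part.strip()]

# apply: run an arbitrary Python function on each value
sentences = df["sentences"].apply(naive_sent_split)

print(letters.tolist())
print(stripped)
print(sentences.tolist())
```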

Also see "When should I (not) want to use pandas apply() in my code?" for a writeup I made a while back on the most appropriate scenarios for using apply (note that there aren't many, but there are a few; apply is generally slow).
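One way to see why apply is slow is to compare it with the equivalent vectorised expression, as in the sketch below. Both produce the same values. The timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.arange(100_000))

# apply calls the Python lambda once per element
via_apply = s.apply(lambda x: x * 2 + 1)

# The vectorised expression does the same arithmetic inside NumPy
vectorised = s * 2 + 1

# Same values either way; compare the printed timings on your own machine
print((via_apply == vectorised).all())
print("apply:", timeit.timeit(lambda: s.apply(lambda x: x * 2 + 1), number=3))
print("vectorised:", timeit.timeit(lambda: s * 2 + 1, number=3))
```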



Summarising

[Image: summary of map, applymap and apply (not available)]

Footnotes

  1. map, when passed a dictionary/Series, maps elements based on the keys in that dictionary (or the index of that Series). Values that are not found among the keys are recorded as
    NaN in the output.

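  2. applymap has been optimised for some operations in more recent versions, so you may find applymap slightly faster than apply in
    some cases. My suggestion is to test them both and use whatever works
    better. (In pandas 2.1 and later, DataFrame.applymap is deprecated in favour of the equivalent DataFrame.map.)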

  3. map is optimised for elementwise mappings and transformations. Passing a dictionary or Series lets pandas
    use faster code paths for better performance.

  4. Series.apply returns a scalar for aggregating operations and a Series otherwise. DataFrame.apply behaves similarly, returning a Series
    for aggregating operations and a DataFrame otherwise. Note that apply also has fast paths when called with certain NumPy functions
    such as mean, sum, etc.
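Footnotes 1 and 4 can be checked with a couple of lines each, as in this small sketch on made-up data.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Footnote 1: values that are not keys of the dict come back as NaN
mapped = s.map({1: "one", 2: "two"})

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Footnote 4: an aggregating function reduces each column to one value,
# so DataFrame.apply returns a Series indexed by column name
totals = df.apply(lambda col: col.sum())

# A non-aggregating function keeps the shape, so the result is a DataFrame
scaled = df.apply(lambda col: col * 10)

print(mapped.tolist())
print(type(totals).__name__, totals.tolist())
print(type(scaled).__name__, scaled.shape)
```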

What is the difference between Pandas Series.apply() and Series.map()?

The difference is subtle:

pandas.Series.map will substitute the values of the Series with what you pass into map.

pandas.Series.apply will apply a function (potentially with arguments) to the values of the Series.

The difference is what you can pass to the methods:

  • both map and apply can receive a function:
import pandas as pd

s = pd.Series([1, 2, 3, 4])

def square(x):
    return x**2

s.map(square)

0 1
1 4
2 9
3 16
dtype: int64

s.apply(square)

0 1
1 4
2 9
3 16
dtype: int64
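  • However, map does not pass extra arguments on to the function: its second positional parameter is na_action, so s.map(power, 3) raises a ValueError. apply, by contrast, forwards extra keyword arguments (and positional ones via args=):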
def power(x, p):
    return x**p

s.apply(power, p=3)

0 1
1 8
2 27
3 64
dtype: int64

s.map(power, 3)
---------------------------------------------------------------------------
ValueError

  • map can receive a dictionary (or even a pd.Series, in which case it uses the index as the keys), while apply cannot (it raises a TypeError):
dic = {1: 5, 2: 4}

s.map(dic)

0 5.0
1 4.0
2 NaN
3 NaN
dtype: float64

s.apply(dic)
---------------------------------------------------------------------------
TypeError


s.map(s)

0 2.0
1 3.0
2 4.0
3 NaN
dtype: float64


s.apply(s)

---------------------------------------------------------------------------
TypeError
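Going back to the power example: map only passes each element to the function, so one way to supply the exponent is to bind it beforehand, either with a lambda or with functools.partial. apply can also take extra positional arguments through its args parameter. A minimal sketch reusing the power function from above:

```python
import functools

import pandas as pd

s = pd.Series([1, 2, 3, 4])

def power(x, p):
    return x**p

# map calls func(element) only, so bind the exponent first
cubed_lambda = s.map(lambda x: power(x, 3))
cubed_partial = s.map(functools.partial(power, p=3))

# apply forwards extra positional arguments via args= (and keywords directly)
cubed_args = s.apply(power, args=(3,))

print(cubed_lambda.tolist(), cubed_partial.tolist(), cubed_args.tolist())
```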

Difference between Series.map and Series.apply

The "See also" section of the Series.map documentation describes Series.apply as being "for applying more complex functions on a Series".

Series.map is for a one-to-one relation that can be represented by a dictionary or by a function of one parameter returning one value.

Series.apply can use functions that return more than a single value (in fact, a whole Series). In that case, the result of Series.apply is a DataFrame.
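A small sketch of that expansion follows. The column names value and square are arbitrary choices for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# The function returns a Series for each element, so apply assembles the
# pieces into a DataFrame whose columns come from the returned index
expanded = s.apply(lambda x: pd.Series({"value": x, "square": x**2}))

print(expanded)
```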

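In other words, you can always use apply where you use map. If you pass a dict (say d) to map, you can instead pass a trivial lambda to apply: lambda x: d.get(x, np.nan). (A plain lambda x: d[x] raises a KeyError for values missing from d, whereas map returns NaN for them.) But if you use apply to transform a Series into a DataFrame, then map cannot be used.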
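The dict-to-lambda equivalence can be checked with the dic example from earlier. This is a sketch only. The d.get default of np.nan is one choice that mirrors map's handling of missing keys.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])
d = {1: 5, 2: 4}

via_map = s.map(d)

# d.get with a NaN default reproduces map's handling of missing keys;
# the plain lambda x: d[x] would raise KeyError on 3 and 4
via_apply = s.apply(lambda x: d.get(x, np.nan))

print(via_map.tolist())
print(via_apply.tolist())
```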

As a result, map is likely to be more optimized than apply for one-to-one transformations and should be used instead of apply wherever possible.

Python: pandas apply vs. map

Another solution is to use DataFrame.any to get at least one True per row:

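import numpy as np
import pandas as pd

# sample data matching the printed output below
df = pd.DataFrame({'h1': ['A', 'E', 'I'], 'h2': ['B', 'A', 'J'], 'h3': ['C', 'G', 'K'],
                   'h4': ['D', 'H', 'L'], 'h5': ['Z', 'Y', 'A']})

# True wherever a cell in h1 or h5 contains 'A'
print(df[['h1', 'h5']].apply(lambda x: x.str.contains('A')))
h1 h5
0 True False
1 False False
2 False True

# any(axis=1) is True for a row if at least one of its cells is True
print(df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1))
0 True
1 False
2 True
dtype: bool

df['new'] = np.where(df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1),
                     'Label', '')

print(df)
h1 h2 h3 h4 h5 new
0 A B C D Z Label
1 E A G H Y
2 I J K L A Label

# alternative, applied to the original df: label matching rows, leave NaN elsewhere
mask = df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1)
df.loc[mask, 'New'] = 'Label'
print(df)
h1 h2 h3 h4 h5 New
0 A B C D Z Label
1 E A G H Y NaN
2 I J K L A Label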
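When only two columns are involved, apply can be skipped altogether by combining the two vectorised str.contains checks with |. This is a swapped-in technique, not part of the answer above, and the sample df is rebuilt from the printed output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "h1": ["A", "E", "I"], "h2": ["B", "A", "J"], "h3": ["C", "G", "K"],
    "h4": ["D", "H", "L"], "h5": ["Z", "Y", "A"],
})

# Elementwise OR of the two boolean Series: True if h1 or h5 contains "A"
mask = df["h1"].str.contains("A") | df["h5"].str.contains("A")
df["new"] = np.where(mask, "Label", "")

print(df)
```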

Working of map vs applymap in pandas, python

Yes: apply works on a row or column basis of a DataFrame, while applymap works elementwise on a DataFrame.
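The row/column versus elementwise distinction is sketched below on a two-by-two frame. It assumes DataFrame.map as the elementwise method on pandas 2.1 and later, with DataFrame.applymap as the fallback on older releases.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [10, 20]})

# apply passes whole columns (axis=0, the default) or whole rows (axis=1)
col_sums = df.apply(lambda col: col.sum())
row_sums = df.apply(lambda row: row.sum(), axis=1)

# The elementwise method passes one scalar at a time
# (DataFrame.map on pandas >= 2.1, DataFrame.applymap before that)
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
negated = elementwise(lambda v: -v)

print(col_sums.tolist(), row_sums.tolist())
print(negated)
```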

Understand pandas' applymap argument

It seems that you are confusing pandas.DataFrame.applymap and df.style.applymap (where df is an instance of pd.DataFrame). For the latter, subset is a parameter in its own right and is not part of **kwargs.

Here is one way to find out (in your terminal or a Jupyter notebook cell) what the named parameters of this method (or of any other pandas method, for that matter) are:

import pandas as pd

df = pd.DataFrame()
help(df.style.applymap)

# Output

Help on method applymap in module pandas.io.formats.style:

applymap(func: 'Callable', subset: 'Subset | None' = None, **kwargs) -> 'Styler' method of pandas.io.formats.style.Styler instance
    Apply a CSS-styling function elementwise.

    Updates the HTML representation with the result.

    Parameters
    ----------
    func : function
        ``func`` should take a scalar and return a string.

    subset : label, array-like, IndexSlice, optional
        A valid 2d input to `DataFrame.loc[<subset>]`, or, in the case of a 1d input
        or single key, to `DataFrame.loc[:, <subset>]` where the columns are
        prioritised, to limit ``data`` to *before* applying the function.

    **kwargs : dict
        Pass along to ``func``.
    ...
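If you only need the parameter names, the standard library's inspect.signature is a lighter-weight alternative to help(). The sketch below uses it on Series.map and Series.apply rather than on the Styler method, because df.style needs the optional jinja2 dependency. Among other things, it shows that na_action is one of Series.map's named parameters, which is why passing an extra positional argument to map fails.

```python
import inspect

import pandas as pd

# inspect.signature lists a method's named parameters without the docstring
map_params = list(inspect.signature(pd.Series.map).parameters)
apply_params = list(inspect.signature(pd.Series.apply).parameters)

print(map_params)
print(apply_params)
```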