Difference between map, applymap and apply methods in Pandas
Comparing map, applymap and apply: Context Matters
First major difference: DEFINITION
- map is defined on Series ONLY
- applymap is defined on DataFrames ONLY
- apply is defined on BOTH
Second major difference: INPUT ARGUMENT
- map accepts dicts, Series, or callables
- applymap and apply accept callables only
Third major difference: BEHAVIOR
- map is elementwise for Series
- applymap is elementwise for DataFrames
- apply also works elementwise on a Series (and row-wise or column-wise on a DataFrame), but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.
Fourth major difference (the most important one): USE CASE
- map is meant for mapping values from one domain to another, so it is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
- applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
- apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))
Also see "When should I (not) want to use pandas apply() in my code?" for a writeup I made a while back on the most appropriate scenarios for using apply (note that there aren't many, but there are a few; apply is generally slow).
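As a rough illustration of the three use cases above, here is a minimal sketch. The DataFrame and its column contents are made up for the example, and a simple word count stands in for nltk.sent_tokenize so that nothing beyond pandas is needed. Note that DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so the sketch picks whichever of the two exists.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [" x ", "y ", " z"],
    "C": ["a b", "c d e", "f"],
})

# map: translate the values of one Series through a lookup table.
letters = df["A"].map({1: "a", 2: "b", 3: "c"})

# Elementwise transformation across several columns.
# Use DataFrame.map where available (pandas >= 2.1), else applymap.
sub = df[["B", "C"]]
elementwise = sub.map if hasattr(sub, "map") else sub.applymap
stripped = elementwise(str.strip)

# apply: run an arbitrary Python function on every value of a Series.
word_counts = df["C"].apply(lambda text: len(text.split()))

print(letters.tolist())
print(stripped)
print(word_counts.tolist())
```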
Summarising
Footnotes
- map, when passed a dictionary/Series, will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
- applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
- map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
- Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
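The last footnote, that the return type of apply depends on what the function returns, can be seen with DataFrame.apply. The following is only a sketch with made-up data: a function that reduces each column to a scalar yields a Series, while one that returns a same-length Series per column yields a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Aggregating function: each column collapses to one scalar,
# so the overall result is a Series indexed by column name.
col_ranges = df.apply(lambda col: col.max() - col.min())

# A NumPy reduction behaves the same way.
totals = df.apply(np.sum)

# Non-aggregating function: each column maps to a Series of the
# same length, so the overall result is a DataFrame.
doubled = df.apply(lambda col: col * 2)

print(type(col_ranges).__name__, col_ranges.tolist())
print(type(totals).__name__, totals.tolist())
print(type(doubled).__name__)
```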
What is the difference between Pandas Series.apply() and Series.map()?
The difference is subtle:
pandas.Series.map will substitute the values of the Series with what you pass into map.
pandas.Series.apply will apply a function (potentially with arguments) to the values of the Series.
The difference is what you can pass to the methods:
- Both map and apply can receive a function:
import pandas as pd
s = pd.Series([1, 2, 3, 4])
def square(x):
    return x**2
s.map(square)
0     1
1     4
2     9
3    16
dtype: int64
s.apply(square)
0     1
1     4
2     9
3    16
dtype: int64
- However, the function you pass into map cannot take more than one parameter (it will raise a ValueError):
def power(x, p):
    return x**p
s.apply(power, p=3)
0 1
1 8
2 27
3 64
dtype: int64
s.map(power,3)
---------------------------------------------------------------------------
ValueError
- map can receive a dictionary (or even a pd.Series, in which case it will use the index as the key), while apply cannot (it will raise a TypeError):
dic = {1: 5, 2: 4}
s.map(dic)
0 5.0
1 4.0
2 NaN
3 NaN
dtype: float64
s.apply(dic)
---------------------------------------------------------------------------
TypeError
s.map(s)
0 2.0
1 3.0
2 4.0
3 NaN
dtype: float64
s.apply(s)
---------------------------------------------------------------------------
TypeError
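On the multi-parameter point above: the second positional parameter of Series.map is na_action, so in s.map(power, 3) the 3 is taken as na_action rather than forwarded to power, which would explain the ValueError. If you still want map with a two-parameter function, one workaround is to bind the extra argument first. This is a sketch of that idea, not something the answer above prescribes:

```python
from functools import partial

import pandas as pd

s = pd.Series([1, 2, 3, 4])

def power(x, p):
    return x ** p

# Bind p up front so that map only ever sees a one-parameter callable.
cubed_with_lambda = s.map(lambda x: power(x, 3))
cubed_with_partial = s.map(partial(power, p=3))

# apply forwards extra keyword arguments to the function itself.
cubed_with_apply = s.apply(power, p=3)

print(cubed_with_lambda.tolist())
```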
Difference between Series.map and Series.apply
The "See also" paragraph of Series.map says that Series.apply is "for applying more complex functions on a Series".
Series.map is for a one-to-one relation that can be represented by a dictionary or by a function of one parameter returning one value.
Series.apply can use functions that return more than a single value (in fact, a whole Series). In that case, the result of Series.apply will be a DataFrame.
Said differently, you can always use apply where you use map. If you pass a dict (say d) to map, you can pass a trivial lambda to apply instead: lambda x: d[x] (provided every value in the Series is a key of d, since d[x] raises a KeyError otherwise). But if you use apply to transform a Series into a DataFrame, then map cannot be used.
As a result, map is likely to be better optimized than apply for one-to-one transformations, and should be used instead of apply wherever possible.
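Both claims can be checked with a short sketch. The dictionary d and the column names value and square are made up for the illustration, and the DataFrame result relies on the behaviour described above, which may differ between pandas versions:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Hypothetical lookup table covering every value in s.
d = {1: "one", 2: "two", 3: "three", 4: "four"}

# One-to-one relation: map with a dict and apply with a trivial lambda agree.
via_map = s.map(d)
via_apply = s.apply(lambda x: d[x])

# A function that returns a whole Series per element:
# apply expands the results into a DataFrame, one row per element.
expanded = s.apply(lambda x: pd.Series({"value": x, "square": x ** 2}))

print(via_map.equals(via_apply))
print(expanded)
```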
Python: pandas apply vs. map
Another solution is to use DataFrame.any to get at least one True per row:
print (df[['h1', 'h5']].apply(lambda x: x.str.contains('A')))
h1 h5
0 True False
1 False False
2 False True
print (df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1))
0 True
1 False
2 True
dtype: bool
df['new'] = np.where(df[['h1','h5']].apply(lambda x: x.str.contains('A')).any(axis=1),
                     'Label', '')
print (df)
h1 h2 h3 h4 h5 new
0 A B C D Z Label
1 E A G H Y
2 I J K L A Label
mask = df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1)
df.loc[mask, 'New'] = 'Label'
print (df)
h1 h2 h3 h4 h5 New
0 A B C D Z Label
1 E A G H Y NaN
2 I J K L A Label
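The snippets above assume an existing df. For a self-contained run, the sketch below rebuilds the frame from the printed output (so the construction itself is an assumption) and passes axis=1 by keyword, since newer pandas versions no longer accept it positionally in any():

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the printed output above.
df = pd.DataFrame({
    "h1": ["A", "E", "I"],
    "h2": ["B", "A", "J"],
    "h3": ["C", "G", "K"],
    "h4": ["D", "H", "L"],
    "h5": ["Z", "Y", "A"],
})

# apply runs str.contains on each selected column;
# any(axis=1) then reduces across columns, giving one boolean per row.
mask = df[["h1", "h5"]].apply(lambda col: col.str.contains("A")).any(axis=1)

# Label the rows where h1 or h5 contains "A".
df["new"] = np.where(mask, "Label", "")

print(df)
```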
Working of map vs applymap in pandas, python
Yes, apply works on a row or a column basis of a DataFrame, while applymap works element-wise on a DataFrame.
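A small sketch makes the distinction visible by calling len in all three ways. The data is made up, and DataFrame.map is used where available because applymap was deprecated in its favour in pandas 2.1:

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "first": ["Ann", "Bo", "Cy"],
    "last": ["Lee", "Smith", "Ng"],
})

# apply, default axis=0: len receives each column, so it counts rows.
per_column = df.apply(len)

# apply, axis=1: len receives each row, so it counts columns.
per_row = df.apply(len, axis=1)

# Elementwise: len receives each individual string.
elementwise = df.map if hasattr(df, "map") else df.applymap
per_cell = elementwise(len)

print(per_column.tolist())
print(per_row.tolist())
print(per_cell)
```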
Understand pandas' applymap argument
It seems that you are confusing pandas.DataFrame.applymap with df.style.applymap (where df is an instance of pd.DataFrame), for which subset stands on its own and is not part of the kwargs arguments.
Here is one way to find out (in your terminal or a Jupyter notebook cell) what the named parameters of this method (or any other pandas method, for that matter) are:
import pandas as pd
df = pd.DataFrame()
help(df.style.applymap)
# Output
Help on method applymap in module pandas.io.formats.style:
applymap(func: 'Callable', subset: 'Subset | None' = None, **kwargs) -> 'Styler' method of pandas.io.formats.style.Styler instance
Apply a CSS-styling function elementwise.
Updates the HTML representation with the result.
Parameters
----------
func : function
``func`` should take a scalar and return a string.
subset : label, array-like, IndexSlice, optional
A valid 2d input to `DataFrame.loc[<subset>]`, or, in the case of a 1d input
or single key, to `DataFrame.loc[:, <subset>]` where the columns are
prioritised, to limit ``data`` to *before* applying the function.
**kwargs : dict
Pass along to ``func``.
...