Scatter Plots in Pandas/Pyplot: How to Plot by Category

Scatter plots in Pandas/Pyplot: How to plot by category

You can use scatter for this, but that requires having numerical values for your key1, and you won't have a legend, as you noticed.
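As a rough sketch of what the scatter route involves, the labels first have to be turned into numbers, for example with pd.factorize. The small DataFrame below is made up for illustration, and the column name key1 is simply borrowed from the question. The Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustration data; 'key1' stands in for the categorical column
df = pd.DataFrame({
    "x": [0.1, 0.4, 0.5, 0.8, 0.9],
    "y": [0.3, 0.2, 0.9, 0.6, 0.1],
    "key1": ["a", "b", "a", "c", "b"],
})

# scatter needs numbers for c=, so map each label to an integer code
codes, uniques = pd.factorize(df["key1"])

fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], c=codes, cmap="viridis")

# No per-category legend has been created at this point
print(list(codes), list(uniques), ax.get_legend())
```

The points are colored by category, but nothing ties a color back to 'a', 'b' or 'c' without extra work.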

It's better to just use plot for discrete categories like this. For example:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()

plt.show()


If you'd like things to look like the old default pandas style, note that the helpers this originally relied on (pd.tools.plotting.mpl_stylesheet, the private pd.tools.plotting._get_standard_colors, and ax.set_color_cycle) are not available in current pandas and matplotlib releases. Matplotlib's built-in 'ggplot' style gives a broadly similar look and supplies its own color cycle. (I'm also tweaking the legend slightly):

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Plot
# The style sets the background, grid, and color cycle in one call
plt.style.use('ggplot')

fig, ax = plt.subplots()
ax.margins(0.05)
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend(numpoints=1, loc='upper left')

plt.show()


Scatter plots in Pandas/Pyplot: How to plot by category with different markers

While you iterate over your groups, you can iterate over a list of markers at the same time using zip. The code below pairs each group with the next element of the markers list and passes it as marker=marker in the ax.plot call.

I've also wrapped the list in itertools.cycle, which restarts the iteration from the beginning once the end is reached, so the loop won't run out of markers if you have more than 3 groups. With 4 groups, for example, the markers would be 'x', 'o', '^', 'x'.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

from itertools import cycle

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

markers = ['x', 'o', '^']

# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for (name, group), marker in zip(groups, cycle(markers)):
    ax.plot(group.x, group.y, marker=marker, linestyle='', ms=12, label=name)
ax.legend()

plt.show()
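The wrap-around behaviour of cycle can be checked without plotting anything. This is a minimal sketch using only the standard library; the four group names are hypothetical stand-ins for the keys a groupby would produce.

```python
from itertools import cycle

markers = ['x', 'o', '^']
group_names = ['a', 'b', 'c', 'd']  # hypothetical: one more group than markers

# zip stops when group_names is exhausted; cycle keeps supplying markers
assigned = dict(zip(group_names, cycle(markers)))
print(assigned)
# {'a': 'x', 'b': 'o', 'c': '^', 'd': 'x'}
```

Without cycle, zip would stop after the third group and the fourth would silently go unplotted.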


How to scatter plot each group of a pandas DataFrame

  • The correct way to do this with pandas is with pandas.DataFrame.groupby and pandas.DataFrame.plot.
  • Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3, seaborn 0.11.2

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# load data
df = sns.load_dataset('geyser')

# plot
fig, ax = plt.subplots(figsize=(6, 4))
colors = {'short': 'MediumVioletRed', 'long': 'Navy'}
for kind, data in df.groupby('kind'):
    data.plot(kind='scatter', x='waiting', y='duration', label=kind, color=colors[kind], ax=ax)

ax.set(xlabel='Waiting', ylabel='Duration')
fig.suptitle('Waiting vs Duration')
plt.show()
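Since sns.load_dataset fetches the data over the network, here is a sketch of the same groupby and DataFrame.plot loop on a small inline DataFrame. The numbers are made up for illustration and are not the real geyser data; the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values with the same column names as the geyser dataset
df = pd.DataFrame({
    "waiting": [54, 62, 79, 85, 51, 88],
    "duration": [1.8, 2.0, 3.6, 4.5, 1.9, 4.7],
    "kind": ["short", "short", "long", "long", "short", "long"],
})

colors = {"short": "mediumvioletred", "long": "navy"}

fig, ax = plt.subplots(figsize=(6, 4))
# groupby yields (key, sub-DataFrame) pairs in sorted key order
for kind, data in df.groupby("kind"):
    data.plot(kind="scatter", x="waiting", y="duration",
              label=kind, color=colors[kind], ax=ax)

# Each call adds one labelled collection to the shared axes
_, legend_labels = ax.get_legend_handles_labels()
print(legend_labels)
```

Passing the same ax to every call is what keeps all groups on one set of axes.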


  • The easiest way is with seaborn, a high-level API for matplotlib, where hue is used to separate groups by color.
    • sns.scatterplot: an axes-level plot
    • sns.relplot: a figure-level plot where kind='scatter' is the default plot style

fig, ax = plt.subplots(figsize=(6, 4))
colors = {'short': 'MediumVioletRed', 'long': 'Navy'}
sns.scatterplot(data=df, x='waiting', y='duration', hue='kind', palette=colors, ax=ax)

ax.set(xlabel='Waiting', ylabel='Duration')
fig.suptitle('Waiting vs Duration')
plt.show()

# figure-level version with sns.relplot
colors = {'short': 'MediumVioletRed', 'long': 'Navy'}
p = sns.relplot(data=df, x='waiting', y='duration', hue='kind', palette=colors, height=4, aspect=1.5)

ax = p.axes.flat[0] # extract the single subplot axes

ax.set(xlabel='Waiting', ylabel='Duration')
p.fig.suptitle('Waiting vs Duration', y=1.1)
plt.show()

scatter plot by category in pandas

This is essentially the same answer as @JoeCondron's, but as a two-liner:

cmap = {'a': 'red', 'b': 'blue', 'c': 'yellow'}
df.plot(x='cpu', y='wait', kind='scatter',
        colors=[cmap.get(c, 'black') for c in df.category])

If no color is mapped for the category, it defaults to black.
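The fallback comes from dict.get and can be seen in isolation. In this small sketch the category list is hypothetical, with 'd' deliberately left out of the mapping.

```python
cmap = {'a': 'red', 'b': 'blue', 'c': 'yellow'}
categories = ['a', 'c', 'd', 'b']  # hypothetical values; 'd' has no entry in cmap

# dict.get returns the second argument when the key is missing
point_colors = [cmap.get(c, 'black') for c in categories]
print(point_colors)
# ['red', 'yellow', 'black', 'blue']
```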

EDIT:

The above works for Pandas 0.14.1. For 0.16.2, 'colors' needs to be changed to 'c':

df.plot(x='cpu', y='wait', kind='scatter',
        c=[cmap.get(c, 'black') for c in df.category])

Scatter plots in Pandas: Plot by category with different color and shape combinations

You can try this code block, which maps one category to the marker shape and the other to the color:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Create the mapping dictionaries you want
marker_dict = {'a':'o','b':'^','c':'s'}
color_dict = {'I':'red', 'II':'green', 'III':'blue'}

np.random.seed(1983)
num = 10
x, y = np.random.random((2, num))
cat1 = np.random.choice(['a', 'b', 'c'], num)
cat2 = np.random.choice(['I', 'II', 'III'], num)
df = pd.DataFrame(dict(x=x, y=y, cat1=cat1, cat2=cat2))

groups = df.groupby(['cat1', 'cat2'])

fig, ax = plt.subplots()
ax.margins(0.05)
for name, group in groups:
    marker = marker_dict[name[0]]
    color = color_dict[name[1]]
    ax.plot(group.x, group.y, marker=marker, linestyle='', ms=12, label=name, color=color)
ax.legend()

plt.show()
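When grouping by two columns, each group key is a tuple, which is why name[0] and name[1] are used for the lookups. A minimal sketch without plotting, on a made-up four-row DataFrame:

```python
import pandas as pd

marker_dict = {'a': 'o', 'b': '^', 'c': 's'}
color_dict = {'I': 'red', 'II': 'green', 'III': 'blue'}

# Made-up rows; only the two category columns matter here
df = pd.DataFrame({
    'cat1': ['a', 'b', 'a', 'c'],
    'cat2': ['I', 'II', 'I', 'III'],
})

# Each key is a (cat1, cat2) tuple; only combinations present in the data appear
styles = {
    name: (marker_dict[name[0]], color_dict[name[1]])
    for name, _ in df.groupby(['cat1', 'cat2'])
}
print(styles)
```

Combinations that never occur in the data produce no group, so they get no legend entry either.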

Hope it helps.

Matplotlib scatter plot with different colors/label based on a category

You can use seaborn:

import seaborn as sns
import numpy as np

data = np.array([[1,1], [2,1], [0,1], [3,2], [3,3]])
labels = ['fruit', 'fruit', 'animal', 'animal', 'fruit']
sns.scatterplot(x=data[:, 0], y=data[:, 1], hue=labels)

This draws each label in its own color and adds a legend automatically.
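If you would rather stay in plain matplotlib, roughly the same result can be sketched by looping over the unique labels and calling scatter once per category with a boolean mask. This is one possible approach rather than what seaborn does internally; the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

data = np.array([[1, 1], [2, 1], [0, 1], [3, 2], [3, 3]])
labels = np.array(['fruit', 'fruit', 'animal', 'animal', 'fruit'])

fig, ax = plt.subplots()
# One scatter call per category, so each gets its own color and legend entry
for category in np.unique(labels):
    mask = labels == category
    ax.scatter(data[mask, 0], data[mask, 1], label=category)
legend = ax.legend()

legend_texts = [t.get_text() for t in legend.get_texts()]
print(legend_texts)
```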


