Why Is This Python Script With Matplotlib So Slow

Why is this Python script with Matplotlib so slow?

Your timing is too coarse to show where the time goes. The version below uses NumPy for the simulation and measures the simulation time separately from the plotting time:

import matplotlib.pyplot as plt
import numpy as np
import time


def run_sims(num_sims, num_flips):
    start = time.time()
    sims = [np.random.choice(coins, num_flips).cumsum() for _ in range(num_sims)]
    end = time.time()
    print(f"sim time = {end-start}")
    return sims


def plot_sims(sims):
    start = time.time()
    for line in sims:
        plt.plot(line)
    end = time.time()
    print(f"plotting time = {end-start}")
    plt.show()


if __name__ == '__main__':

    num_sims = 2000
    num_flips = 2000
    # run_sims reads coins as a module-level global
    coins = np.array([150, -100])

    plot_sims(run_sims(num_sims, num_flips))

Result:

sim time = 0.13962197303771973
plotting time = 6.621474981307983

As you can see, the simulation time is greatly reduced (it was on the order of 7 seconds before, on my 2011 laptop). The plotting time depends on Matplotlib.
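If the remaining simulation time still matters, the list comprehension can be replaced by a single NumPy call that draws all flips at once. This is only a minimal sketch: the function name run_sims_vectorized and the seed parameter are illustrative and not part of the original script.

```python
import numpy as np


def run_sims_vectorized(coins, num_sims, num_flips, seed=None):
    """Return an array of shape (num_sims, num_flips) of running totals."""
    rng = np.random.default_rng(seed)
    # Draw every flip of every simulation in one call, then accumulate
    # along each row so that row k is the running total of simulation k.
    flips = rng.choice(coins, size=(num_sims, num_flips))
    return flips.cumsum(axis=1)


coins = np.array([150, -100])
sims = run_sims_vectorized(coins, num_sims=2000, num_flips=2000, seed=0)
print(sims.shape)
```

Iterating over the resulting 2-D array yields one row per simulation, so it can be passed to plot_sims unchanged.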

Why is plotting with Matplotlib so slow?

First off (though this won't change the performance at all), consider cleaning up your code, similar to this:

import matplotlib.pyplot as plt
import numpy as np
import time

x = np.arange(0, 2*np.pi, 0.01)
y = np.sin(x)

fig, axes = plt.subplots(nrows=6)
styles = ['r-', 'g-', 'y-', 'm-', 'k-', 'c-']
lines = [ax.plot(x, y, style)[0] for ax, style in zip(axes, styles)]

fig.show()

tstart = time.time()
num_frames = 20
for i in range(1, num_frames + 1):
    for j, line in enumerate(lines, start=1):
        line.set_ydata(np.sin(j*x + i/10.0))
    fig.canvas.draw()

print('FPS:', num_frames/(time.time()-tstart))

With the above example, I get around 10fps.

Just a quick note: depending on your exact use case, matplotlib may not be a great choice. It's oriented towards publication-quality figures, not real-time display.

However, there are a lot of things you can do to speed this example up.

There are two main reasons why this is as slow as it is.

1) Calling fig.canvas.draw() redraws everything. It's your bottleneck. In your case, you don't need to re-draw things like the axes boundaries, tick labels, etc.

2) In your case, there are a lot of subplots with a lot of tick labels. These take a long time to draw.

Both of these can be fixed by using blitting.

To do blitting efficiently, you'll have to use backend-specific code. In practice, if you're really worried about smooth animations, you're usually embedding matplotlib plots in some sort of GUI toolkit anyway, so this isn't much of an issue.

However, without knowing a bit more about what you're doing, I can't help you there.

Nonetheless, there is a GUI-neutral way of doing it that is still reasonably fast.

import matplotlib.pyplot as plt
import numpy as np
import time

x = np.arange(0, 2*np.pi, 0.1)
y = np.sin(x)

fig, axes = plt.subplots(nrows=6)

fig.show()

# We need to draw the canvas before we start animating...
fig.canvas.draw()

styles = ['r-', 'g-', 'y-', 'm-', 'k-', 'c-']
def plot(ax, style):
    return ax.plot(x, y, style, animated=True)[0]
lines = [plot(ax, style) for ax, style in zip(axes, styles)]

# Let's capture the background of the figure
backgrounds = [fig.canvas.copy_from_bbox(ax.bbox) for ax in axes]

tstart = time.time()
num_frames = 2000
for i in range(1, num_frames + 1):
    items = enumerate(zip(lines, axes, backgrounds), start=1)
    for j, (line, ax, background) in items:
        fig.canvas.restore_region(background)
        line.set_ydata(np.sin(j*x + i/10.0))
        ax.draw_artist(line)
        fig.canvas.blit(ax.bbox)

print('FPS:', num_frames/(time.time()-tstart))

This gives me ~200fps.

To make this a bit more convenient, there's an animation module (matplotlib.animation) in recent versions of matplotlib.

As an example:

import matplotlib.pyplot as plt
import matplotlib.animation as animation
import numpy as np

x = np.arange(0, 2*np.pi, 0.1)
y = np.sin(x)

fig, axes = plt.subplots(nrows=6)

styles = ['r-', 'g-', 'y-', 'm-', 'k-', 'c-']
def plot(ax, style):
    return ax.plot(x, y, style, animated=True)[0]
lines = [plot(ax, style) for ax, style in zip(axes, styles)]

def animate(i):
    for j, line in enumerate(lines, start=1):
        line.set_ydata(np.sin(j*x + i/10.0))
    return lines

# We'd normally specify a reasonable "interval" here...
ani = animation.FuncAnimation(fig, animate, range(1, 200),
                              interval=0, blit=True)
plt.show()

MatPlotLib is very slow in python

I found this answer in another post. All credit to Luke.

Matplotlib makes great publication-quality graphics, but is not very
well optimized for speed. There are a variety of python plotting
packages that are designed with speed in mind:

http://pyqwt.sourceforge.net/ [edit: pyqwt is no longer
maintained; the previous maintainer is recommending pyqtgraph]

http://code.google.com/p/guiqwt/

http://code.enthought.com/projects/chaco/

http://www.pyqtgraph.org/

Very poor performance in my matplotlib script

Use a proper ODE solver like scipy.integrate.odeint for speed. You can then use larger time steps for the output. With an implicit solver like odeint or solve_ivp with method="Radau", the coordinate planes that are boundaries in the exact solution will also be boundaries in the numerical solution, so that the values never become negative.

Reduce the plotted data set to match the actual resolution of the plot image.
The difference between 300 and 1000 points may still be visible, but there will be no visible difference between 1000 and 5000 points, and probably not even an actual difference in the rendered image.

Matplotlib draws its images via a scene tree of objects, using slow Python iteration. This makes it very slow once there are more than a few tens of thousands of objects to draw, so it is best to keep the number of details below that.
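One way to reduce the number of objects when many curves are drawn is to put them into a single LineCollection instead of creating one Line2D per curve. The following is only a sketch with made-up random-walk data (not the SIR data from the question). The Agg backend is selected so that it runs without a display, and how much this helps depends on the backend and the data.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

# Illustrative data: 500 random walks of 200 points each.
rng = np.random.default_rng(0)
num_lines, num_points = 500, 200
x = np.arange(num_points)
ys = rng.standard_normal((num_lines, num_points)).cumsum(axis=1)

# LineCollection expects a sequence of (N, 2) arrays of x, y pairs.
segments = [np.column_stack([x, y]) for y in ys]
collection = LineCollection(segments, linewidths=0.5)

fig, ax = plt.subplots()
ax.add_collection(collection)  # a single artist holding all 500 curves
ax.autoscale()                 # rescale the axes to the collection's extent
fig.canvas.draw()
print(len(ax.lines), len(ax.collections))
```

All curves in the collection share one artist, so per-curve styling has to be passed as sequences (for example a list of colors) rather than set on individual line objects.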

Code for the ODE solver

To solve the ODE I used solve_ivp, but it makes no difference if odeint is used instead.

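import numpy as np
from scipy.integrate import solve_ivp

# p (presumably the population size) and s, i, r (the initial values)
# are globals defined elsewhere in the original script.

def SIR_prime(t, SIR, trans, recov):  # solver expects t argument, even if not used
    S, I, R = SIR
    dS = (-trans*I/p) * S
    dI = (trans*S/p-recov) * I
    dR = recov*I
    return [dS, dI, dR]

def genData(transRate, recovRate, maxT):
    SIR = solve_ivp(SIR_prime, [0, maxT], [s, i, r], args=(transRate, recovRate), method="Radau", dense_output=True)
    # Evaluate the dense output on 1001 points, independent of the solver's own steps
    time = np.linspace(0, SIR.t[-1], 1001)
    sVals, iVals, rVals = SIR.sol(time)
    return (time, sVals, iVals, rVals)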

Streamlined code for the plot update procedures

One can remove much of the duplicated code. I also added a line so that the time axis changes with the maxTime variable, so that one can really zoom in.

def updateTransmission(newVal):
    global trans_rate
    trans_rate = newVal
    updatePlot()

def updateRecovery(newVal):
    global recov_rate
    recov_rate = newVal
    updatePlot()

def updateMaxTime(newVal):
    global maxTime
    maxTime = newVal
    updatePlot()

def updatePlot():
    newData = genData(trans_rate, recov_rate, maxTime)

    susceptible.set_data(newData[0],newData[1])
    infected.set_data(newData[0],newData[2])
    recovered.set_data(newData[0],newData[3])

    ax.set_xlim(0, maxTime+1)

    r_o.set_text(r'$R_O$={:.2f}'.format(trans_rate/recov_rate))

    fig.canvas.draw_idle()

The code in-between and around remains the same.

Matplotlib plot excessively slow

There is no reason whatsoever to have a line plot of 20000000 points in matplotlib.

Let's consider printing first:
The maximum figure size in matplotlib is 50 inches. Even a high-tech plotter with 3600 dpi would give a maximum of 50*3600 = 180000 points that are resolvable.

For screen applications it's even less: even a high-tech 4k screen has a limited resolution of about 4000 pixels across. Even if one uses anti-aliasing effects, there are a maximum of ~3 points per pixel that would still be distinguishable to the human eye. Result: a maximum of 4000*3 = 12000 points makes sense.

Therefore the question you are asking should rather be: how do I subsample my 20000000 data points to a set of points that still produces the same image on paper or screen?

The solution to this strongly depends on the nature of the data. If it is sufficiently smooth, you can just take every nth list entry.

sample = data[::n]
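The stride n can be derived from the point budget estimated above. A minimal sketch, assuming the screen budget of 12000 points. The helper name stride_for_budget is made up for illustration, and the sine data is a smaller stand-in for the 20000000 points in the question.

```python
import math

import numpy as np


def stride_for_budget(num_points, max_points=12_000):
    """Smallest stride n such that data[::n] has at most max_points entries."""
    return max(1, math.ceil(num_points / max_points))


# Smooth stand-in data; the real data set would be loaded here instead.
data = np.sin(np.linspace(0, 20, 2_000_000))
n = stride_for_budget(len(data))
sample = data[::n]
print(n, len(sample))
```

Because data[::n] is a view, the subsampling itself copies nothing. Only the plotted sample is handed to matplotlib.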

If there are high-frequency components which need to be resolved, this would require more sophisticated techniques, which will again depend on what the data looks like.
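One common option for spiky data is min/max decimation: split the data into buckets and keep only the smallest and largest value of each bucket, so that narrow peaks survive. This is a sketch of that idea, not the technique referenced in this answer. The function name minmax_decimate and the test signal are illustrative.

```python
import numpy as np


def minmax_decimate(y, num_buckets):
    """Reduce y to at most 2 * num_buckets samples, keeping each bucket's extremes.

    Returns (indices into y, selected values), both in original order.
    """
    y = np.asarray(y)
    bucket_size = len(y) // num_buckets
    if bucket_size < 2:
        return np.arange(len(y)), y  # already small enough; nothing to do
    usable = bucket_size * num_buckets  # a trailing partial bucket is dropped
    buckets = y[:usable].reshape(num_buckets, bucket_size)
    offsets = np.arange(num_buckets) * bucket_size
    # Position of the minimum and maximum inside each bucket, sorted so
    # that the two points of a bucket stay in their original order.
    local = np.column_stack([buckets.argmin(axis=1), buckets.argmax(axis=1)])
    idx = np.sort(local + offsets[:, None], axis=1).ravel()
    return idx, y[idx]


t = np.linspace(0, 1, 1_000_000)
y = np.sin(2 * np.pi * 5 * t)
y[123_456] = 10.0  # single-sample spike that y[::500] skips
idx, y_small = minmax_decimate(y, num_buckets=2000)
print(len(y_small), y_small.max(), y[::500].max())
```

Plotting t[idx] against y_small then draws 4000 points instead of 1000000 while still showing the spike.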

One such technique might be the one shown in "How can I subsample an array according to its density? (Remove frequent values, keep rare ones)".
