How to Fit a Sine Curve to My Data with Pylab and Numpy

How do I fit a sine curve to my data with pylab and numpy?

You can use the least-square optimization function in scipy to fit any arbitrary function to another. In case of fitting a sin function, the 3 parameters to fit are the offset ('a'), amplitude ('b') and the phase ('c').

As long as you provide a reasonable first guess of the parameters, the optimization should converge well.Fortunately for a sine function, first estimates of 2 of these are easy: the offset can be estimated by taking the mean of the data and the amplitude via the RMS (3*standard deviation/sqrt(2)).

Note: as a later edit, frequency fitting has also been added. This does not work very well (can lead to extremely poor fits). Thus, use at your discretion, my advise would be to not use frequency fitting unless frequency error is smaller than a few percent.

This leads to the following code:

import numpy as np
from scipy.optimize import leastsq
import pylab as plt

N = 1000 # number of data points
t = np.linspace(0, 4*np.pi, N)
f = 1.15247 # Optional!! Advised not to use
data = 3.0*np.sin(f*t+0.001) + 0.5 + np.random.randn(N) # create artificial data with noise

guess_mean = np.mean(data)
guess_std = 3*np.std(data)/(2**0.5)/(2**0.5)
guess_phase = 0
guess_freq = 1
guess_amp = 1

# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = guess_std*np.sin(t+guess_phase) + guess_mean

# Define the function to optimize, in this case, we want to minimize the difference
# between the actual data and our "guessed" parameters
optimize_func = lambda x: x[0]*np.sin(x[1]*t+x[2]) + x[3] - data
est_amp, est_freq, est_phase, est_mean = leastsq(optimize_func, [guess_amp, guess_freq, guess_phase, guess_mean])[0]

# recreate the fitted curve using the optimized parameters
data_fit = est_amp*np.sin(est_freq*t+est_phase) + est_mean

# recreate the fitted curve using the optimized parameters

fine_t = np.arange(0,max(t),0.1)
data_fit=est_amp*np.sin(est_freq*fine_t+est_phase)+est_mean

plt.plot(t, data, '.')
plt.plot(t, data_first_guess, label='first guess')
plt.plot(fine_t, data_fit, label='after fitting')
plt.legend()
plt.show()

Sample Image

Edit: I assumed that you know the number of periods in the sine-wave. If you don't, it's somewhat trickier to fit. You can try and guess the number of periods by manual plotting and try and optimize it as your 6th parameter.

fit a sine curve to my data in python, matplotlib

Well, it does not seem to be a mathematical function, as for example for argument value 15 you may have multiple values (f(x) equals what?). Thus, it won't be a classical
interpolation in this case. If you could normalize the data somehow, ie make a function out of it, then you could use numpy.

Simplest approach would be to add some small disturbance where arguments' values are equal. Let's look at an example in your data:

4   0.0326
4   0.014
4   -0.0086
4   0.0067

So, as you can see, you can't tell what's the relation's value for f(4). If you'd disturb the arguments a bit, eg:

3.9     -0.0086
3.95    0.0067
4       0.014
4.05    0.0326

And so on for all such examples from your data file. Simplest approach would be to group these values by their x argument, sort and disturb.

That would obviously introduce some error, but, well...you are curve fitting anyway, right?

To formulate a sine, you have to know the amplitude, frequency and phase: f(x) = A * sin(F*x + p) where A is the amplitude, F is the frequency and p is the phase. Numpy has dedicated methods for this if you've got a proper data set prepared:
How do I fit a sine curve to my data with pylab and numpy?

Fitting sinusoidal data in Python

Your fitted curve will look like this


import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
def sinfunc(t, A, w, p, g, c):  return A * np.sin(w*t + p) * np.exp(-g*t) + c

tt = np.linspace(0, 10, 1000)
yy = sinfunc(tt, -1, 10, 2, 0.3, 2)
plt.plot(tt, yy)

Sample Image

g stretches the envelope horizontally, c moves the center vertically, w determines the oscillation frequency, A stretches the envelope vertically.

So it can't accurately model the data you have.

Also, you will not be able to reliably fit w, to determine the oscilation frequency it is better to try an FFT.

Of course you could adjust the function to look like your data by adding a few more parameters, e.g.

import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
def sinfunc(t, A, w, p, g, c1, c2, c3):  return A * np.sin(w*t + p) * (np.exp(-g*t) - c1) + c2 * np.exp(-g*t) + c3

tt = np.linspace(0, 10, 1000)
yy = sinfunc(tt, -1, 20, 2, 0.5, 1, 1.5, 1)
plt.plot(tt, yy)

Sample Image

But you will still have to give a good guess for the frequency.

Fitting sin curve using python

Well, if you used lmfit setting up and running your fit would look like this:

xdeg  = [70.434654, 37.147266, 8.5787086, 161.40877, -27.31284, 80.429482, -81.918106, 52.320129, 64.064552, -156.40771, 12.37026, 15.599689, 166.40984, 134.93636, 142.55002, -38.073524, -38.073524, 123.88509, -82.447571, 97.934402, 106.28793]

y = [86683.961, -40564.863, 50274.41, 80570.828, 63628.465, -87284.016, 30571.402, -79985.648, -69387.891, 175398.62, -132196.5, -64803.133, -269664.06, 36493.316, 22769.121, 25648.252, 25648.252, 53444.855, 684814.69, 82679.977, 103244.58]

import numpy as np
from lmfit import Model

import matplotlib.pyplot as plt

def sinefunction(x, a, b, c):
    return a + b * np.sin(x*np.pi/180.0 + c)

smodel = Model(sinefunction)
result = smodel.fit(y, x=xdeg, a=0, b=30000, c=0)

print(result.fit_report())

plt.plot(xdeg, y, 'o', label='data')
plt.plot(xdeg, result.best_fit, '*', label='fit')
plt.legend()
plt.show()

That is assuming your X data is in degrees, and that you really intended to convert that to radians (as numpy's sin() function requires).

But that just addresses the mechanics of how to do the fit (and I'll leave the display of results up to you - it seems like you may need the practice).

The fit result is terrible, because these data are not sinusoidal. They are also not well ordered, which isn't a problem for doing the fit, but does make it harder to see what is going on.

Sinusoidal regression line with scipy

The thing you are doing "wrong" is passing p0=None to curve_fit().

All fitting methods really, really require initial values. Unfortunately, scipy.optimize.curve_fit() has the completely unjustifiable option of allowing you to not set initial values and silently (not even a warning!!) making the absurd guess that all values have initial values of 1.0. It turns out that for your problem these impossible-to-justify-and-broken-by-design initial values are so bad that the fit fails to find a good answer. This is not uncommon. curve_fit is lying to you that p0=None is acceptable, and you are believing that lie.

The solution is to recognize that the offset is obviously around 11 and use p0=[1.0, 0.5, 0.5, 11.0].

You might consider using lmfit (https://lmfit.github.io/lmfit-py/). for this problem (disclaimer: I am a lead author). lmfit has a Model class for curve-fitting that has several useful features that might be useful here (not that curve_fit cannot solve this problem -- it can). With lmfit, your fit might look like:

import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model

def sinfunc(x, a, b, c, d):
    return a * np.sin(b*(x - c)) + d

year=np.arange(0,24,2)
population=np.array([10.2,11.1,12,11.7,10.6,10,
                     10.6,11.7,12,11.1,10.2,10.2])

# build model from your model function  
model  = Model(sinfunc)

# create parameters (with initial values!). Note that parameters 
# are named from the argument names of your model function
params = model.make_params(a=1, b=0.5, c=0.5, d=11.0)

# you can set min/max for any parameter to put bounds on the values
params['a'].min = 0
params['c'].min = -np.pi
params['c'].max = np.pi

# do the fit to your data with those parameters
result = model.fit(population, params, x=year)

# print out report of fit statistics and parameter values+uncertainties
print(result.fit_report())

# plot data and fit result
plt.scatter(year,population,label='Population')
plt.plot(year, result.best_fit, 'r-',label='Fitted function')
plt.title("Year vs Population")
plt.xlabel('Year')
plt.ylabel('Population')
plt.legend()
plt.show()

This will print out a report of

[[Model]]
    Model(sinfunc)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 26
    # data points      = 12
    # variables        = 4
    chi-square         = 0.00761349
    reduced chi-square = 9.5169e-04
    Akaike info crit   = -80.3528861
    Bayesian info crit = -78.4132595
[[Variables]]
    a:  1.00465520 +/- 0.01247767 (1.24%) (init = 1)
    b:  0.57528444 +/- 0.00198556 (0.35%) (init = 0.5)
    c:  1.80990367 +/- 0.03823815 (2.11%) (init = 0.5)
    d:  11.0250780 +/- 0.00925246 (0.08%) (init = 11)
[[Correlations]] (unreported correlations are < 0.100)
    C(b, c) =  0.812
    C(b, d) =  0.245
    C(c, d) =  0.234

and produce a plot of
Sample Image

But, again: the problem is that you were suckered into believing that p0=None is a reasonable use of curve_fit().

How can I fit this sinusoidal wave with my current data?

I'd use an FFT to get a first guess at parameters, as this sort of thing is highly non-linear and curve_fit is unlikely to get very far otherwise. the reason for using a FFT is to get an initial idea of the frequency involved, not much more. 3Blue1Brown has a great video on FFTs if you've not seem it

I used web plot digitizer to get your data out of your plots, then pulled into Python and made sure it looked OK with:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('sinfit2.csv')
print(df.head())

giving me:

       x     y
0  809.3   0.3
1  820.0   0.3
2  830.3  19.6
3  839.9  19.6
4  849.6   0.4

I started by doing a basic FFT with NumPy (SciPy has the full fftpack which is more complete, but not needed here):

import numpy as np
from numpy.fft import fft

d = fft(df.y)

plt.plot(np.abs(d)[:len(d)//2], '.')

the np.abs(d) is because you get a complex number back containing both phase and amplitude, and [:len(d)//2] is because (for real valued input) the output is symmetric about the midpoint, i.e. d[5] == d[-5].

this says the largest component was 18, I tried plotting this by hand and it looked OK:

x = np.linspace(0, np.pi * 2, len(df))

plt.plot(df.x, df.y, '.-', lw=1)
plt.plot(df.x, np.sin(x * 18) * 10 + 10)

I'm multiplying by 10 and adding 10 is because the range of a sine is (-1, +1) and we need to take it to (0, 20).

next I passed these to curve_fit with a simplified model to help it along:

from scipy.optimize import curve_fit

def model(x, a, b):
    return np.sin(x * a + b) * 10 + 10

(a, b), cov = curve_fit(model, x, df.y, [18, 0])

again I'm hardcoding the * 10 + 10 to get the range to match your data, which gives me a=17.8 and b=2.97

finally I plot the function sampled at a higher frequency to make sure all is OK:

plt.plot(df.x, df.y)
plt.plot(
    np.linspace(810, 1400, 501),
    model(np.linspace(0, np.pi*2, 501), a, b)
)

giving me:

validation plot

which seems to look OK. note you might want to change these parameters so they fit your original X, and note my df.x starts at 810, so I might have missed the first point.

How to Fit a Sine Curve to My Data with Pylab and Numpy

How do I fit a sine curve to my data with pylab and numpy?

fit a sine curve to my data in python, matplotlib

Fitting sinusoidal data in Python

Fitting sin curve using python

Sinusoidal regression line with scipy

How can I fit this sinusoidal wave with my current data?

Related Topics

Leave a reply