Computing Cross-Correlation Function

Computing cross-correlation function?

To cross-correlate 1d arrays, use numpy.correlate.

For 2d arrays, use scipy.signal.correlate2d.

There is also scipy.stsci.convolve.correlate2d in old SciPy releases (the scipy.stsci package has since been removed from SciPy).

There is also matplotlib.pyplot.xcorr, which is based on numpy.correlate.
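
As a minimal sketch of the two most common calls (the array contents below are arbitrary placeholders):

import numpy as np
from scipy import signal

# 1d cross-correlation of two short sequences
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])
c1 = np.correlate(a, b, mode='full')        # length len(a) + len(b) - 1

# 2d cross-correlation of an "image" with a small template
image = np.random.rand(32, 32)
template = np.random.rand(5, 5)
c2 = signal.correlate2d(image, template, mode='same')  # same shape as image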

See this post on the SciPy mailing list for some links to different implementations.

Edit: @user333700 added a link to the SciPy ticket for this issue in a comment.

Normalized Cross-Correlation in Python

Nice question. There is no direct way, but you can "normalize" the input vectors before using np.correlate like this, and reasonable values will be returned within a range of [-1, 1]:

Here I define the correlation as it is generally defined in signal-processing textbooks:

c'_{ab}[k] = sum_n a[n] conj(b[n+k])

CODE: If a and b are the vectors:

import numpy as np

# Remove the mean and scale by the standard deviations (and by len(a)) so that,
# for equal-length a and b, the zero-lag value of c is Pearson's correlation coefficient.
a = (a - np.mean(a)) / (np.std(a) * len(a))
b = (b - np.mean(b)) / np.std(b)
c = np.correlate(a, b, 'full')
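
For example, a quick check with synthetic data (purely illustrative) that the zero-lag element, i.e. the middle of the 'full' output, matches np.corrcoef:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = 0.5 * x + rng.standard_normal(100)

a = (x - np.mean(x)) / (np.std(x) * len(x))
b = (y - np.mean(y)) / np.std(y)
c = np.correlate(a, b, 'full')

# The zero-lag element (index len(x) - 1 in 'full' mode) equals Pearson's r.
print(c[len(x) - 1], np.corrcoef(x, y)[0, 1])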

References:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html

https://en.wikipedia.org/wiki/Cross-correlation


A question on cross-correlation & correlation coefficient

I seriously think you need to read up more on cross-correlation functions and the correlation coefficient in a statistics book, because your confusion here is more fundamental than anything MATLAB-related. Unless you know what you are dealing with, you cannot make sense of what MATLAB gives you, even if you get the program right.

CROSS-CORRELATION:

Here is what you do in a cross-correlation. Consider data A and B as follows:

    A                   B

    x
x   |       x           x
|   |       |       x   |
|   |   x   |       |   |   x
|   |   |   |       |   |   |
---------------     -----------
0   1   2   3       0   1   2

You then take B and slide it all the way to the end, so that the last point of B and the first point of A are aligned:

                x
            x   |       x
            |   |       |
            |   |   x   |
            |   |   |   |
----x---x------------------
   -2  -1   0   1   2   3

        x
    x   |
    |   |   x
    |   |   |
----------------x---x---x--
   -2  -1   0   1   2   3

You fill in zeros wherever the data does not exist, i.e., in this case, for B beyond 0 and for A before 0. Then you multiply the two pointwise and add, giving 0 + 0 + 3 + 0 + 0 + 0 = 3 as the first point in the cross-correlation.

Then you slide B one step to the right and repeat

            x
        x   |       x
        |   |       |
        |   |   x   |
        |   |   |   |
----x------------------
   -1   0   1   2   3

        x
    x   |
    |   |   x
    |   |   |
----------------x---x--
   -1   0   1   2   3

giving 0 + 9 + 4 + 0 + 0 = 13 as the second point in the cross-correlation. You keep doing this until you have slid B all the way to the other end of A.

The resulting vector has length(A) + length(B) - 1 points, the -1 being because the very first position already has one point of overlap, so there is one shift fewer than length(A) + length(B). So here you should get 3 + 4 - 1 = 6 points in the cross-correlation, and in your case you should get 73 + 73 - 1 = 145 points.

As you can see, the value of the cross-correlation vector at any point need not be within ±1. The cross-correlation has a maximum when the two data vectors are "most alike". The "offset" of the peak from zero gives an indication of the "lag" between the two datasets.
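
To see the same thing in code (a NumPy sketch using the A and B from the figures above), np.correlate with mode='full' reproduces exactly the numbers computed by hand:

import numpy as np

A = np.array([3, 4, 1, 3])
B = np.array([2, 3, 1])

c = np.correlate(A, B, mode='full')      # the sliding multiply-and-add described above
print(c)                                 # [ 3 13 19 14 11  6]
print(len(c))                            # len(A) + len(B) - 1 = 6

lags = np.arange(-(len(B) - 1), len(A))  # lag corresponding to each entry of c
print(lags[np.argmax(c)])                # 0 here: the peak sits at zero lag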

CORRELATION COEFFICIENT

The correlation coefficient (I'm assuming Pearson's) is a mere number defined as

    r = Covariance(A,B) / sqrt( Covariance(A,A) * Covariance(B,B) )

where Covariance(A,A) is better known as Variance(A). This is a quantity that ranges from -1 to 1 (as for why it has to be between ±1, look up the Cauchy-Schwarz inequality).
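
As a small numeric illustration (Python, with made-up data), computing r from the covariances and comparing it with the library routine:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal(200)
B = 0.7 * A + rng.standard_normal(200)

cov = np.cov(A, B)                              # 2x2 covariance matrix
r = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

print(r, np.corrcoef(A, B)[0, 1])               # identical, and always within [-1, 1]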

NOTE:

While you can most certainly calculate the cross-correlation of two data vectors with unequal numbers of data points, you cannot compute their correlation coefficient. Covariance is a measure of how two variables/datasets change together, and it is not defined for datasets of unequal length.
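
A quick illustration of that distinction (arbitrary data; the commented-out line is the call that is not defined):

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(73)
b = rng.standard_normal(60)

xc = np.correlate(a, b, mode='full')   # fine: cross-correlation of unequal-length vectors
print(len(xc))                         # 73 + 60 - 1 = 132

# np.corrcoef(a, b)                    # raises an error: the correlation coefficient
                                       # needs equal-length inputs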

Fast cross correlation method in Python

You're unlikely to get much faster than using an FFT-based correlation method.

import numpy
from scipy import signal

data_length = 8192

a = numpy.random.randn(data_length)
b = numpy.zeros(data_length * 2)

# Embed a in the middle of the zero-padded buffer.
# (// is integer division; this works for data_length being even.)
b[data_length // 2:data_length // 2 + data_length] = a

# Do an array flipped convolution, which is a correlation.
c = signal.fftconvolve(b, a[::-1], mode='valid')

# Use numpy.correlate for comparison
d = numpy.correlate(a, a, mode='same')

# c will be exactly the same as d, except for the last sample (which
# completes the symmetry)
numpy.allclose(c[:-1], d)  # Should be True

Now for a time comparison:

In [12]: timeit b[data_length // 2:data_length // 2 + data_length] = a; c = signal.fftconvolve(b, a[::-1], mode='valid')
100 loops, best of 3: 4.67 ms per loop

In [13]: timeit d = numpy.correlate(a, a, mode='same')
10 loops, best of 3: 69.9 ms per loop

If you can cope with a circular correlation, you can remove the copy into the zero-padded buffer. The time difference between the two approaches will increase as data_length increases.
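
For completeness, a sketch of the circular (auto)correlation mentioned above, done directly with FFTs so that no zero-padded copy of the data is needed (real-valued input assumed):

import numpy as np

data_length = 8192
a = np.random.randn(data_length)

# Circular autocorrelation: multiply the spectrum by its complex conjugate
# and transform back. Lag 0 is at index 0; other lags wrap around.
A = np.fft.rfft(a)
c_circ = np.fft.irfft(A * np.conj(A))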


