Correlation of Two Arrays in C#

Correlation of two arrays in C#

You can keep the values in two separate lists, paired by index, and combine them with a simple Zip.

var fitResult = new FitResult();
var values1 = new List<int>();
var values2 = new List<int>();

var correls = values1.Zip(values2, (v1, v2) =>
    fitResult.CorrelationCoefficient(v1, v2));
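
To make the pairing concrete, here is a minimal sketch of what Zip does on its own, leaving out FitResult (its API is not shown here). The names ZipSketch and PairwiseProducts are hypothetical, chosen only for illustration. Note that Zip stops at the end of the shorter sequence without raising an error, so a length mismatch goes unnoticed unless you check for it yourself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipSketch
{
    // Hypothetical helper: multiplies the elements of xs and ys that share an index.
    // Zip pairs by position and silently stops at the shorter sequence.
    public static List<double> PairwiseProducts(IEnumerable<double> xs, IEnumerable<double> ys)
    {
        return xs.Zip(ys, (x, y) => x * y).ToList();
    }
}
```

Calling ZipSketch.PairwiseProducts(new[] { 1.0, 2.0, 3.0 }, new[] { 4.0, 5.0 }) yields two elements, 4 and 10; the unpaired third value is dropped.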

A second way is to write your own custom implementation (mine isn't optimized for speed):

// Requires: using System; using System.Linq;
public double ComputeCoeff(double[] values1, double[] values2)
{
    if (values1.Length != values2.Length)
        throw new ArgumentException("values must be the same length");

    var avg1 = values1.Average();
    var avg2 = values2.Average();

    // Sum of the products of the deviations from the means
    var sum1 = values1.Zip(values2, (x1, y1) => (x1 - avg1) * (y1 - avg2)).Sum();

    // Sums of the squared deviations of each array
    var sumSqr1 = values1.Sum(x => Math.Pow((x - avg1), 2.0));
    var sumSqr2 = values2.Sum(y => Math.Pow((y - avg2), 2.0));

    var result = sum1 / Math.Sqrt(sumSqr1 * sumSqr2);

    return result;
}

Usage:

var values1 = new List<double> { 3, 2, 4, 5, 6 };
var values2 = new List<double> { 9, 7, 12, 15, 17 };

var result = ComputeCoeff(values1.ToArray(), values2.ToArray());
// 0.997054485501581

Debug.Assert(result.ToString("F6") == "0.997054");

Another way is to use the Excel function directly:

var values1 = new List<double> { 3, 2, 4, 5, 6 };
var values2 = new List<double> { 9, 7, 12, 15, 17 };

// Make sure to add a reference to Microsoft.Office.Interop.Excel.dll
// and import the namespace: using Microsoft.Office.Interop.Excel;

var application = new Application();

var worksheetFunction = application.WorksheetFunction;

var result = worksheetFunction.Correl(values1.ToArray(), values2.ToArray());

Console.Write(result); // 0.997054485501581

How to compute Pearson Correlation between 2 given vectors?

This is an adaptation for C# of my answer to the Java question

How to find correlation between two integer arrays in java

First, the Pearson correlation is described at

http://en.wikipedia.org/wiki/Correlation_and_dependence

Provided that both vectors (let them be IEnumerable<Double>) are of the same length:

private static double Correlation(IEnumerable<Double> xs, IEnumerable<Double> ys) {
  // sums of x, y, x squared etc.
  double sx = 0.0;
  double sy = 0.0;
  double sxx = 0.0;
  double syy = 0.0;
  double sxy = 0.0;

  int n = 0;

  using (var enX = xs.GetEnumerator()) {
    using (var enY = ys.GetEnumerator()) {
      while (enX.MoveNext() && enY.MoveNext()) {
        double x = enX.Current;
        double y = enY.Current;

        n += 1;
        sx += x;
        sy += y;
        sxx += x * x;
        syy += y * y;
        sxy += x * y;
      }
    }
  }

  // covariance
  double cov = sxy / n - sx * sy / n / n;
  // standard deviation of x
  double sigmaX = Math.Sqrt(sxx / n - sx * sx / n / n);
  // standard deviation of y
  double sigmaY = Math.Sqrt(syy / n - sy * sy / n / n);

  // correlation is just a normalized covariance
  return cov / sigmaX / sigmaY;
}

Test:

  // -0.539354840012899
  Double result = Correlation(
    new Double[] { 0.3, 0, 1.7, 2.2 },
    new Double[] { 0, 3.3, 1.2, 0 });

Correlate two signals in C# with different length

I think that you need to reconsider your data structure. It's not clear whether the signals:

  1. cover different time-spans;
  2. have different time-steps between readings;
  3. or both.

In the first case, you can simply crop the series so that they have the same length and cover the same time-span. However, a list of numbers can only contain the values, not the times at which they were recorded. If you do not store this information elsewhere in the program, you will not be able to do the cropping.

In the second case, you need to choose an appropriate series of times and conform both of your signals to it. This will likely take the form of a series of lerp (linear interpolation) operations to fill in the target points.

// x - the first value
// y - the second value
// t - the distance from the first to the second value, normalized to 0..1
public static float Lerp(float x, float y, float t) {
    return x + t * (y - x);
}

As you can see, performing the Lerp requires t, which can be computed from the time-values of the two known points.
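
As a rough sketch of that computation, assuming each reading carries a DateTime timestamp: t is the elapsed time from the first known point to the target time, divided by the time between the two known points. TimeLerp and ValueAt are hypothetical names, and doubles are used instead of floats purely for convenience.

```csharp
using System;

public static class TimeLerp
{
    // Same formula as the Lerp above, on doubles.
    public static double Lerp(double x, double y, double t)
    {
        return x + t * (y - x);
    }

    // Hypothetical helper: estimates the value at targetTime from two known
    // readings (t0, v0) and (t1, v1), where t0 is earlier than t1.
    public static double ValueAt(DateTime t0, double v0, DateTime t1, double v1, DateTime targetTime)
    {
        if (t1 <= t0)
            throw new ArgumentException("t1 must be later than t0");

        // Fraction of the way from t0 to t1; 0 at t0, 1 at t1.
        double t = (targetTime - t0).TotalSeconds / (t1 - t0).TotalSeconds;
        return Lerp(v0, v1, t);
    }
}
```

For example, with readings of 10 at 12:00:00 and 20 at 12:00:10, the estimate at 12:00:04 uses t = 0.4 and comes out at about 14. A target time outside the two readings gives t below 0 or above 1, which extrapolates rather than interpolates.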

A better data-structure might be a mapping of times to values:

var signal = new Dictionary<DateTime, double>();

This will allow you to keep track of when a reading happens more easily.
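
With such a mapping, the cropping from the first case becomes straightforward. The following is only a sketch, assuming both signals were sampled at exactly matching timestamps wherever they overlap; SignalAlign and AlignOnCommonTimes are hypothetical names. It keeps the readings whose timestamps appear in both signals and returns two equal-length arrays that can be passed to a correlation function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SignalAlign
{
    // Hypothetical helper: returns the values of both signals at the
    // timestamps they share, ordered by time.
    public static (double[] First, double[] Second) AlignOnCommonTimes(
        IDictionary<DateTime, double> first,
        IDictionary<DateTime, double> second)
    {
        // Timestamps present in both signals, in chronological order.
        var common = first.Keys
            .Where(time => second.ContainsKey(time))
            .OrderBy(time => time)
            .ToList();

        return (common.Select(time => first[time]).ToArray(),
                common.Select(time => second[time]).ToArray());
    }
}
```

If the time-steps differ, exact timestamp matches will be rare, and the signals would first need to be resampled onto a shared series of times.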

There is already a question about performing the actual correlation on StackOverflow.

As an aside, this is something which R makes considerably easier - take a look at the zoo package for inspiration.

C# Similarities of two arrays

You could use Enumerable.Intersect (the LINQ extension method available on any IEnumerable<T>).

var z = a.Intersect(b);

which will probably be more efficient than your current solution.
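
A small sketch of how this behaves, with hypothetical names (IntersectSketch, Common) and made-up sample data: Intersect performs a set intersection, so each common value appears once in the result, in the order in which it first occurs in the first sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectSketch
{
    // Hypothetical wrapper: distinct values that occur in both a and b.
    public static List<int> Common(IEnumerable<int> a, IEnumerable<int> b)
    {
        return a.Intersect(b).ToList();
    }
}
```

For instance, IntersectSketch.Common(new[] { 1, 2, 2, 3, 4 }, new[] { 2, 4, 6 }) returns 2 and 4; the repeated 2 is reported only once.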

Note that you left out one important piece of information: whether the lists happen to be ordered or not. If they are, then a couple of nested loops that pass over each input array exactly once may be faster, and a little more fun to write.

Edit
In response to your comment on ordering:

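A first stab at looping. It will need a little tweaking on your behalf but works for your initial data.

int j = 0;
foreach (var i in a)
{
    int x = b[j];
    while (x < i)
    {
        // Note: x == i can never hold inside while (x < i), and b[j] can run
        // past the end of b; the later version below addresses both.
        if (x == i)
        {
            z.Add(b[j]);
        }
        j++;
        x = b[j];
    }
}

This is where you need to add some unit tests ;)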

Edit
Final point: it may well be that LINQ can use a SortedList to perform this intersection very efficiently. If performance is a concern, it is worth testing the various solutions. Don't forget to take the sorting into account if you load your data in an unordered manner.

One final edit: because there has been some to and fro on this, and people may be using the above without properly debugging it, I am posting a later version here:

// Assumes a and b are sorted ascending, b is not empty,
// and a contains no duplicate values.
int j = 0;
int b1 = b[j];
foreach (var a1 in a)
{
    while (b1 <= a1)
    {
        if (b1 == a1)
            z1.Add(b[j]);
        j++;
        if (j >= b.Count)
            break;
        b1 = b[j];
    }
}
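
For comparison, here is a sketch of the same idea written as a two-index merge, which is not the code from the answer above but a common way to intersect two sorted sequences. It assumes both inputs are sorted ascending; SortedIntersect is a hypothetical name. Each index only moves forward, so both lists are traversed at most once, and empty inputs or repeated values do not need special handling.

```csharp
using System;
using System.Collections.Generic;

public static class SortedIntersect
{
    // Hypothetical helper: common values of two ascending-sorted lists,
    // each reported once.
    public static List<int> Intersect(IReadOnlyList<int> a, IReadOnlyList<int> b)
    {
        var result = new List<int>();
        int i = 0, j = 0;

        while (i < a.Count && j < b.Count)
        {
            if (a[i] < b[j])
            {
                i++;            // a is behind, advance it
            }
            else if (a[i] > b[j])
            {
                j++;            // b is behind, advance it
            }
            else
            {
                // Match: add it unless it repeats the previous match.
                if (result.Count == 0 || result[result.Count - 1] != a[i])
                    result.Add(a[i]);
                i++;
                j++;
            }
        }

        return result;
    }
}
```

As with the loop above, this is worth covering with unit tests and benchmarking against Intersect before relying on it.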

Cross-correlation Code in C#

The Meta.Numerics library supports the computation of correla

How to make MATLAB's corrcoef (correlation coefficient) in C#?

Why don't you use Math.NET Numerics?

Math.NET Numerics aims to provide methods and algorithms for numerical
computations in science, engineering and every day use. Covered topics
include special functions, linear algebra, probability models, random
numbers, interpolation, integration, regression, optimization problems
and more.

The class you are looking for is implemented like this:

namespace MathNet.Numerics.Statistics
{
    using System;
    using System.Collections.Generic;
    using Properties;

    /// <summary>
    /// A class with correlation measures between two datasets.
    /// </summary>
    public static class Correlation
    {
        /// <summary>
        /// Computes the Pearson product-moment correlation coefficient.
        /// </summary>
        /// <param name="dataA">Sample data A.</param>
        /// <param name="dataB">Sample data B.</param>
        /// <returns>The Pearson product-moment correlation coefficient.</returns>
        public static double Pearson(IEnumerable<double> dataA, IEnumerable<double> dataB)
        {
            int n = 0;
            double r = 0.0;
            double meanA = dataA.Mean();
            double meanB = dataB.Mean();
            double sdevA = dataA.StandardDeviation();
            double sdevB = dataB.StandardDeviation();

            IEnumerator<double> ieA = dataA.GetEnumerator();
            IEnumerator<double> ieB = dataB.GetEnumerator();

            while (ieA.MoveNext())
            {
                if (ieB.MoveNext() == false)
                {
                    throw new ArgumentOutOfRangeException("Datasets dataA and dataB need to have the same length.");
                }

                n++;
                r += (ieA.Current - meanA) * (ieB.Current - meanB) / (sdevA * sdevB);
            }
            if (ieB.MoveNext() == true)
            {
                throw new ArgumentOutOfRangeException("Datasets dataA and dataB need to have the same length.");
            }

            return r / (n - 1);
        }
    }
}

Math.NET Numerics works very well with C# and related .NET languages. When using Visual Studio or another IDE with built-in NuGet support, you can get started quickly by adding a reference to the MathNet.Numerics NuGet package. Alternatively, you can fetch the package from the command line with nuget.exe install MathNet.Numerics -Pre, or simply download the Zip package.

I have used this library intensively with good results. So you would use it like:

using MathNet.Numerics.Statistics;

double correlation = Correlation.Pearson(arrayOfDoubles1, arrayOfDoubles2);
