Function Which Returns the Least-Squares Solution to a Linear Matrix Equation

Function which returns the least-squares solution to a linear matrix equation

The Accelerate framework included the LAPACK linear algebra package,
which has a DGELS function to solve under- or overdetermined linear systems. From the documentation:

DGELS solves overdetermined or underdetermined real linear systems
involving an M-by-N matrix A, or its transpose, using a QR or LQ
factorization of A. It is assumed that A has full rank.

Here is an example how that function can be used from Swift.
It is essentially a translation of this C sample code.

func solveLeastSquare(A A: [[Double]], B: [Double]) -> [Double]? {
    precondition(A.count == B.count, "Non-matching dimensions")

    var mode = Int8(bitPattern: UInt8(ascii: "N")) // "Normal" mode
    var nrows = CInt(A.count)
    var ncols = CInt(A[0].count)
    var nrhs = CInt(1)
    var ldb = max(nrows, ncols)

    // Flattened columns of matrix A
    var localA = (0 ..< nrows * ncols).map {
        A[Int($0 % nrows)][Int($0 / nrows)]
    }

    // Vector B, expanded by zeros if ncols > nrows
    var localB = B
    if ldb > nrows {
        localB.appendContentsOf([Double](count: ldb - nrows, repeatedValue: 0.0))
    }

    var wkopt = 0.0
    var lwork: CInt = -1
    var info: CInt = 0

    // First call to determine optimal workspace size
    dgels_(&mode, &nrows, &ncols, &nrhs, &localA, &nrows, &localB, &ldb, &wkopt, &lwork, &info)
    lwork = Int32(wkopt)

    // Allocate workspace and do actual calculation
    var work = [Double](count: Int(lwork), repeatedValue: 0.0)
    dgels_(&mode, &nrows, &ncols, &nrhs, &localA, &nrows, &localB, &ldb, &work, &lwork, &info)

    if info != 0 {
        print("A does not have full rank; the least squares solution could not be computed.")
        return nil
    }
    return Array(localB.prefix(Int(ncols)))
}

Some notes:

dgels_() modifies the passed matrix and vector data, and expects
the matrix as "flat" array containing the columns of A.
Also the right-hand side is expected as an array with length max(M, N).
For this reason, the input data is copied to local variables first.
All arguments must be passed by reference to dgels_(), that's why
they are all stored in vars.
A C integer is a 32-bit integer, which makes some conversions between
Int and CInt necessary.

Example 1: Overdetermined system, from http://www.seas.ucla.edu/~vandenbe/103/lectures/ls.pdf.

let A = [[ 2.0, 0.0 ],
         [ -1.0, 1.0 ],
         [ 0.0, 2.0 ]]
let B = [ 1.0, 0.0, -1.0 ]
if let x = solveLeastSquare(A: A, B: B) {
    print(x) // [0.33333333333333326, -0.33333333333333343]
}

Example 2: Underdetermined system, minimum norm
solution to x_1 + x_2 + x_3 = 1.0.

let A = [[ 1.0, 1.0, 1.0 ]]
let B = [ 1.0 ]
if let x = solveLeastSquare(A: A, B: B) {
    print(x) // [0.33333333333333337, 0.33333333333333337, 0.33333333333333337]
}

Update for Swift 3 and Swift 4:

func solveLeastSquare(A: [[Double]], B: [Double]) -> [Double]? {
    precondition(A.count == B.count, "Non-matching dimensions")

    var mode = Int8(bitPattern: UInt8(ascii: "N")) // "Normal" mode
    var nrows = CInt(A.count)
    var ncols = CInt(A[0].count)
    var nrhs = CInt(1)
    var ldb = max(nrows, ncols)

    // Flattened columns of matrix A
    var localA = (0 ..< nrows * ncols).map { (i) -> Double in
        A[Int(i % nrows)][Int(i / nrows)]
    }

    // Vector B, expanded by zeros if ncols > nrows
    var localB = B
    if ldb > nrows {
        localB.append(contentsOf: [Double](repeating: 0.0, count: Int(ldb - nrows)))
    }

    var wkopt = 0.0
    var lwork: CInt = -1
    var info: CInt = 0

    // First call to determine optimal workspace size
    var nrows_copy = nrows // Workaround for SE-0176
    dgels_(&mode, &nrows, &ncols, &nrhs, &localA, &nrows_copy, &localB, &ldb, &wkopt, &lwork, &info)
    lwork = Int32(wkopt)

    // Allocate workspace and do actual calculation
    var work = [Double](repeating: 0.0, count: Int(lwork))
    dgels_(&mode, &nrows, &ncols, &nrhs, &localA, &nrows_copy, &localB, &ldb, &work, &lwork, &info)

    if info != 0 {
        print("A does not have full rank; the least squares solution could not be computed.")
        return nil
    }
    return Array(localB.prefix(Int(ncols)))
}

cv2.solve fail to return least-squares solution

Add an explicit solution method argument to cv2.solve, e.g.:

x = cv2.solve(a, b, flags=cv2.DECOMP_QR)

By default, opencv uses LU decomposition to solve linear systems (that's the default value of flags argument).
But LU decomposition is only used for square matrices, and is not applicable to your case (where the linear system is overdetermined). DECOMP_SVD, DECOMP_QR, and/or DECOMP_NORMAL are better suited for your case.

Least Squares: Python

There are three points here:

One is that it is generally better (faster, more accurate) to solve linear equations rather than to compute inverses.

The second is that it's always a good idea to use what you know about a system of equations (e.g. that the coefficient matrix is positive definite) when computing a solution, in this case you should use numpy.linalg.lstsq

The third is more specifically about polynomials. When using monomials as a basis, you can end up with a very poorly conditioned coefficient matrix, and this will mean that numerical errors tend to be large. This is because, for example, the vectors x->pow(x,11) and x->pow(x,12) are very nearly parallel. You would get a more accurate fit, and be able to use higher degrees, if you were to use a basis of orthogonal polynomials, for example https://en.wikipedia.org/wiki/Chebyshev_polynomials or https://en.wikipedia.org/wiki/Legendre_polynomials

Function Which Returns the Least-Squares Solution to a Linear Matrix Equation