How to Transpose an Array More Swiftly

How to transpose an array more Swiftly?

Here's an improvement on Shadow Of's answer:

extension Collection where Self.Iterator.Element: RandomAccessCollection {
    // PRECONDITION: `self` must be rectangular, i.e. every row has equal size.
    func transposed() -> [[Self.Iterator.Element.Iterator.Element]] {
        guard let firstRow = self.first else { return [] }
        return firstRow.indices.map { index in
            self.map{ $0[index] }
        }
    }
}

let matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
matrix.transposed().forEach{ print($0) }
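The extension above relies on its precondition. With array rows, a row shorter than the first one traps with an index out of range, and a longer row has its trailing elements silently dropped. Below is a minimal sketch of a checked variant. It assumes Swift 4.2 or later (for `allSatisfy` and the shorter `Element` spelling), and the name `transposedIfRectangular` is hypothetical:

```swift
extension Collection where Element: RandomAccessCollection {
    /// Hypothetical checked variant: returns nil instead of trapping or
    /// dropping elements when the rows differ in length.
    func transposedIfRectangular() -> [[Element.Element]]? {
        guard let firstRow = self.first else { return [] }
        let width = firstRow.count
        // Reject ragged input up front.
        guard self.allSatisfy({ $0.count == width }) else { return nil }
        return (0..<width).map { offset in
            // Step from each row's own startIndex, so slices whose
            // indices don't start at 0 are handled too.
            self.map { row in row[row.index(row.startIndex, offsetBy: offset)] }
        }
    }
}

let rectangular = [[1, 2, 3], [4, 5, 6]]
print(rectangular.transposedIfRectangular() ?? [])   // [[1, 4], [2, 5], [3, 6]]
print([[1, 2], [3]].transposedIfRectangular() == nil) // true
```

Indexing by offset costs nothing extra here because `RandomAccessCollection` guarantees that `index(_:offsetBy:)` is O(1).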

How to transpose a matrix of unequal array length in Swift 3

Thanks, @Pragnesh Vitthani! I just modified your answer so that shorter rows are padded with zeros.

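var array = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9, 10],
             [11, 12, 13],
             [14, 15, 16]]

// The widest row decides how many rows the transposed result has.
let columnCount = array.map { $0.count }.max() ?? 0

// Pad every shorter row with zeros so the matrix becomes rectangular.
for j in 0..<array.count
{
    while array[j].count < columnCount
    {
        array[j].append(0)
    }
}

var transposedArray = [[Int]]()

for i in 0..<columnCount
{
    var subArray = [Int]()
    for j in 0..<array.count
    {
        subArray.append(array[j][i])
    }
    transposedArray.append(subArray)
}
print(transposedArray)
// [[1, 4, 7, 11, 14], [2, 5, 8, 12, 15], [3, 6, 9, 13, 16], [0, 0, 10, 0, 0]]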
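If you would rather not mutate the input, the same padding idea fits in a small generic function. This is a sketch only: `transposedPadded` and its `padding` parameter are hypothetical names and not part of the answer above:

```swift
/// Hypothetical helper: transposes a ragged matrix without mutating it,
/// treating missing cells as `padding`.
func transposedPadded<T>(_ rows: [[T]], padding: T) -> [[T]] {
    // The widest row decides how many rows the result has.
    let columnCount = rows.map { $0.count }.max() ?? 0
    return (0..<columnCount).map { column in
        // Read one column, substituting `padding` where a row is too short.
        rows.map { row in column < row.count ? row[column] : padding }
    }
}

let ragged = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10], [11, 12, 13], [14, 15, 16]]
print(transposedPadded(ragged, padding: 0))
// [[1, 4, 7, 11, 14], [2, 5, 8, 12, 15], [3, 6, 9, 13, 16], [0, 0, 10, 0, 0]]
```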

How to transpose an array of strings

What you are trying to do is called transposition: turning an array that looks like:

[[1, 2, 3], [4, 5, 6]]

into an array that looks like:

[[1, 4], [2, 5], [3, 6]]

To do this, let's define a generic function for transposition and apply it to your problem:

import Foundation

// Import the text file from the bundle

guard
    let inputURL = Bundle.main.url(forResource: "input", withExtension: "txt"),
    let input = try? String(contentsOf: inputURL, encoding: .utf8)
    else { fatalError("Unable to get data") }

// Convert the input string into [[String]]
let strings = input.components(separatedBy: "\n").map { string -> [String] in
    string.components(separatedBy: ":")
}

// Define a generic transpose function.
// This is the key to the solution.

public func transpose<T>(_ input: [[T]]) -> [[T]] {
    if input.isEmpty { return [[T]]() }
    // The number of output rows is taken from the first input row;
    // a row longer than the first one would index past the end of `out`.
    let count = input[0].count
    var out = [[T]](repeating: [T](), count: count)
    for outer in input {
        for (index, inner) in outer.enumerated() {
            out[index].append(inner)
        }
    }

    return out
}

// Transpose the strings
let results = transpose(strings)

You can print the results of the transposition with:

for result in results {
    print("\(result)")
}

For your example, this generates:

["AYGA", "AYLA", "AYMD"]
["GKA", "LAE", "MAG"]
["GOROKA", "", "MADANG"]
["GOROKA", "LAE", "MADANG"]
["PAPUA NEW GUINEA", "PAPUA NEW GUINEA", "PAPUA NEW GUINEA"]
["06", "00", "05"]
["04", "00", "12"]
["54", "00", "25"]
["S", "U", "S"]
["145", "00", "145"]
["23", "00", "47"]
["30", "00", "19"]
["E", "U", "E"]
["5282", "0000", "0020"]

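This has the advantage of not depending on the number of arrays (rows) you have. The number of output arrays is taken from the count of the first input array, so no row may be longer than the first one, and a shorter row leaves the corresponding output arrays one element short.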
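One practical pitfall: if input.txt ends with a newline, `components(separatedBy: "\n")` produces a trailing empty line, and that line becomes an extra `[""]` row. Below is a sketch of an alternative parser that assumes the same colon-separated format. Both `parseRows` and `transposeRows` are hypothetical names, and the transpose assumes every row has the same number of fields:

```swift
/// Hypothetical parser: splits colon-separated lines into rows of fields.
func parseRows(_ text: String) -> [[String]] {
    return text
        .split(separator: "\n")  // omits empty lines, including a trailing one
        .map { line in
            // Keep empty fields: "a::b" yields "a", "", "b".
            line.split(separator: ":", omittingEmptySubsequences: false)
                .map { String($0) }
        }
}

/// Rectangular transpose: output row `column` collects that field from every row.
func transposeRows<T>(_ input: [[T]]) -> [[T]] {
    guard let first = input.first else { return [] }
    return first.indices.map { column in input.map { $0[column] } }
}

let rows = parseRows("AYGA:GKA:GOROKA\nAYLA:LAE:\n")
print(transposeRows(rows))  // [["AYGA", "AYLA"], ["GKA", "LAE"], ["GOROKA", ""]]
```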

You can download an example playground for this, which has the input as a file in the playground's resources.

What is the fastest way to transpose a matrix in C++?

This is a good question. There are many reasons you would want to actually transpose the matrix in memory rather than just swap coordinates, e.g. in matrix multiplication and Gaussian smearing.

First, let me list one of the functions I use for the transpose (EDIT: please see the end of my answer, where I found a much faster solution):

void transpose(float *src, float *dst, const int N, const int M) {
    #pragma omp parallel for
    for(int n = 0; n<N*M; n++) {
        int i = n/N;
        int j = n%N;
        dst[n] = src[M*j + i];
    }
}

Now let's see why the transpose is useful. Consider the matrix multiplication C = A*B, where A is NxM, B is MxK and C is NxK, all stored row-major. We could do it this way:

for(int i=0; i<N; i++) {
    for(int j=0; j<K; j++) {
        float tmp = 0;
        for(int l=0; l<M; l++) {
            tmp += A[M*i+l]*B[K*l+j];
        }
        C[K*i + j] = tmp;
    }
}

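That way, however, is going to have a lot of cache misses: the inner loop walks down a column of B, jumping K floats on every step. A much faster solution is to take the transpose of B first, so that the inner loop reads both operands along contiguous rows:

// BT is a scratch buffer of K*M floats; transpose() above turns the
// MxK matrix B into its KxM transpose.
transpose(B, BT, M, K);
for(int i=0; i<N; i++) {
    for(int j=0; j<K; j++) {
        float tmp = 0;
        for(int l=0; l<M; l++) {
            tmp += A[M*i+l]*BT[M*j+l];
        }
        C[K*i + j] = tmp;
    }
}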

Matrix multiplication is O(n^3) and the transpose is O(n^2), so taking the transpose should have a negligible effect on the computation time (for large n). For matrix multiplication, loop tiling is even more effective than taking the transpose, but it is much more complicated.
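To make the index arithmetic concrete, here is a small Swift sketch of the same idea on flat row-major arrays, without OpenMP or SIMD. The name `multiplyViaTranspose` is hypothetical:

```swift
/// Multiplies row-major A (n x m) by row-major B (m x k). B is first
/// copied into its k x m transpose so the inner loop reads both operands
/// contiguously.
func multiplyViaTranspose(_ a: [Float], _ b: [Float], n: Int, m: Int, k: Int) -> [Float] {
    precondition(a.count == n * m && b.count == m * k)
    // bt[j * m + l] == b[l * k + j]
    var bt = [Float](repeating: 0, count: k * m)
    for l in 0..<m {
        for j in 0..<k {
            bt[j * m + l] = b[l * k + j]
        }
    }
    var c = [Float](repeating: 0, count: n * k)
    for i in 0..<n {
        for j in 0..<k {
            var sum: Float = 0
            // Row i of A and row j of BT are both contiguous in memory.
            for l in 0..<m {
                sum += a[i * m + l] * bt[j * m + l]
            }
            c[i * k + j] = sum
        }
    }
    return c
}

let product = multiplyViaTranspose([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], n: 2, m: 3, k: 2)
print(product)  // [58.0, 64.0, 139.0, 154.0]
```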

I wish I knew a faster way to do the transpose (Edit: I found a faster solution, see the end of my answer). When Haswell/AVX2 comes out in a few weeks, it will have a gather instruction. I don't know whether that will help in this case, but I could imagine gathering a column and writing out a row. Maybe it will make the transpose unnecessary.

For Gaussian smearing, you smear horizontally and then vertically. But smearing vertically has the cache problem, so what you do is:

Smear the image horizontally
Transpose the output
Smear the output horizontally
Transpose the output

Here is an article by Intel explaining that approach:
http://software.intel.com/en-us/articles/iir-gaussian-blur-filter-implementation-using-intel-advanced-vector-extensions
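The four steps above can be sketched in Swift as follows. To keep the sketch short, a 3-tap box filter with clamped edges stands in for the real Gaussian kernel; that substitution is an assumption, not the filter from the article. The point is that both passes read rows contiguously:

```swift
/// Transposes a row-major rows x cols image into a cols x rows image.
func transposeImage(_ img: [Float], rows: Int, cols: Int) -> [Float] {
    var out = [Float](repeating: 0, count: rows * cols)
    for i in 0..<rows {
        for j in 0..<cols {
            out[j * rows + i] = img[i * cols + j]
        }
    }
    return out
}

/// Horizontal pass: a 3-tap box filter with clamped edges, standing in
/// for the Gaussian kernel. Each row is read contiguously.
func smearRows(_ img: [Float], rows: Int, cols: Int) -> [Float] {
    var out = img
    for i in 0..<rows {
        let row = i * cols
        for j in 0..<cols {
            let left = img[row + max(j - 1, 0)]
            let right = img[row + min(j + 1, cols - 1)]
            out[row + j] = (left + img[row + j] + right) / 3
        }
    }
    return out
}

/// Smear horizontally, transpose, smear horizontally again, transpose back.
func smear2D(_ img: [Float], rows: Int, cols: Int) -> [Float] {
    let h = smearRows(img, rows: rows, cols: cols)
    let t = transposeImage(h, rows: rows, cols: cols)  // now cols x rows
    let v = smearRows(t, rows: cols, cols: rows)       // the vertical pass, done on rows
    return transposeImage(v, rows: cols, cols: rows)   // back to rows x cols
}

var impulse = [Float](repeating: 0, count: 9)
impulse[4] = 9  // a single bright pixel in the centre of a 3x3 image
print(smear2D(impulse, rows: 3, cols: 3))  // every pixel becomes 1.0
```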

Lastly, what I actually do in matrix multiplication (and in Gaussian smearing) is not take the exact transpose, but take the transpose in widths of a certain vector size (e.g. 4 for SSE or 8 for AVX). Here is the function I use:

void reorder_matrix(const float* A, float* B, const int N, const int M, const int vec_size) {
    #pragma omp parallel for
    for(int n=0; n<M*N; n++) {
        int k = vec_size*(n/N/vec_size);
        int i = (n/vec_size)%N;
        int j = n%vec_size;
        B[n] = A[M*i + k + j];
    }
}
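To see what layout reorder_matrix produces, here is a direct Swift port of its index arithmetic. It makes the same assumption as the C++ code, namely that m is a multiple of vec_size, and the name `reorderMatrix` is hypothetical. For a 2x4 matrix and a vector size of 2, the two 2-column panels are stored one after the other:

```swift
/// Swift port of reorder_matrix: copies row-major A (n x m) into B as a
/// sequence of vecSize-wide column panels. Within a panel, the vecSize
/// values of row 0 come first, then those of row 1, and so on.
/// Assumes m is a multiple of vecSize, as the C++ version does.
func reorderMatrix(_ a: [Float], n: Int, m: Int, vecSize: Int) -> [Float] {
    precondition(a.count == n * m && m % vecSize == 0)
    var b = [Float](repeating: 0, count: n * m)
    for idx in 0..<(n * m) {
        let k = vecSize * (idx / n / vecSize)  // first column of the panel
        let i = (idx / vecSize) % n            // row within the panel
        let j = idx % vecSize                  // offset within the vector
        b[idx] = a[m * i + k + j]
    }
    return b
}

// Rows [0, 1, 2, 3] and [4, 5, 6, 7]; panels {0, 1 / 4, 5} and {2, 3 / 6, 7}.
print(reorderMatrix([0, 1, 2, 3, 4, 5, 6, 7], n: 2, m: 4, vecSize: 2))
// [0.0, 1.0, 4.0, 5.0, 2.0, 3.0, 6.0, 7.0]
```

Each run of vecSize consecutive floats in B is then a single aligned vector load, and walking down a panel reads B sequentially.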

EDIT:

I tried several functions to find the fastest transpose for large matrices. In the end, the fastest result is to use loop blocking with block_size=16 (Edit: I found a faster solution using SSE and loop blocking; see below). This code works for any NxM matrix (i.e. the matrix does not have to be square).

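inline void transpose_scalar_block(float *A, float *B, const int lda, const int ldb, const int block_size) {
    // No OpenMP pragma here: this runs inside the parallel loop of
    // transpose_block, and opening a nested parallel region for every
    // small block would only add overhead.
    for(int i=0; i<block_size; i++) {
        for(int j=0; j<block_size; j++) {
            B[j*ldb + i] = A[i*lda +j];
        }
    }
}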

inline void transpose_block(float *A, float *B, const int n, const int m, const int lda, const int ldb, const int block_size) {
    #pragma omp parallel for
    for(int i=0; i<n; i+=block_size) {
        for(int j=0; j<m; j+=block_size) {
            transpose_scalar_block(&A[i*lda +j], &B[j*ldb + i], lda, ldb, block_size);
        }
    }
}

The values lda and ldb are the row widths (leading dimensions) of A and B. These need to be multiples of the block size. To find the values and allocate the memory for, e.g., a 3000x1001 matrix, I do something like this:

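#include <xmmintrin.h>  // _mm_malloc, _mm_free

// Round x up to a multiple of s; s must be a power of two.
#define ROUND_UP(x, s) (((x)+((s)-1)) & -(s))
const int n = 3000;
const int m = 1001;
int lda = ROUND_UP(m, 16);
int ldb = ROUND_UP(n, 16);

// Both buffers hold the padded lda*ldb floats, 64-byte aligned; release them with _mm_free.
float *A = (float*)_mm_malloc(sizeof(float)*lda*ldb, 64);
float *B = (float*)_mm_malloc(sizeof(float)*lda*ldb, 64);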

For 3000x1001, this gives ldb = 3008 and lda = 1008.
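The padding is needed because transpose_block always processes full blocks. The Swift sketch below uses the same loop blocking but clips the edge blocks with min(), the way the SSE version further down does, so no padding is required. The name `blockedTranspose` is hypothetical, and 16 is simply the block size suggested above:

```swift
/// Cache-blocked out-of-place transpose of a row-major rows x cols matrix.
/// Edge blocks are clipped with min(), so no padding is required.
func blockedTranspose(_ src: [Float], rows: Int, cols: Int, blockSize: Int = 16) -> [Float] {
    precondition(src.count == rows * cols && blockSize > 0)
    var dst = [Float](repeating: 0, count: rows * cols)
    for i0 in stride(from: 0, to: rows, by: blockSize) {
        let iEnd = min(i0 + blockSize, rows)
        for j0 in stride(from: 0, to: cols, by: blockSize) {
            let jEnd = min(j0 + blockSize, cols)
            // Transpose one block; its source and destination tiles stay in cache.
            for i in i0..<iEnd {
                for j in j0..<jEnd {
                    dst[j * rows + i] = src[i * cols + j]
                }
            }
        }
    }
    return dst
}

let source = (0..<15).map { Float($0) }  // a 3x5 matrix
let result = blockedTranspose(source, rows: 3, cols: 5, blockSize: 2)
```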

Edit:

I found an even faster solution using SSE intrinsics:

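#include <xmmintrin.h>  // __m128, _mm_load_ps, _mm_store_ps, _MM_TRANSPOSE4_PS

// Transposes one 4x4 tile. _mm_load_ps and _mm_store_ps need 16-byte
// aligned addresses, so lda, ldb and the tile offsets must keep rows aligned.
inline void transpose4x4_SSE(float *A, float *B, const int lda, const int ldb) {
    __m128 row1 = _mm_load_ps(&A[0*lda]);
    __m128 row2 = _mm_load_ps(&A[1*lda]);
    __m128 row3 = _mm_load_ps(&A[2*lda]);
    __m128 row4 = _mm_load_ps(&A[3*lda]);
    _MM_TRANSPOSE4_PS(row1, row2, row3, row4);
    _mm_store_ps(&B[0*ldb], row1);
    _mm_store_ps(&B[1*ldb], row2);
    _mm_store_ps(&B[2*ldb], row3);
    _mm_store_ps(&B[3*ldb], row4);
}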

inline void transpose_block_SSE4x4(float *A, float *B, const int n, const int m, const int lda, const int ldb, const int block_size) {
    #pragma omp parallel for
    for(int i=0; i<n; i+=block_size) {
        for(int j=0; j<m; j+=block_size) {
            int max_i2 = i+block_size < n ? i + block_size : n;
            int max_j2 = j+block_size < m ? j + block_size : m;
            for(int i2=i; i2<max_i2; i2+=4) {
                for(int j2=j; j2<max_j2; j2+=4) {
                    transpose4x4_SSE(&A[i2*lda +j2], &B[j2*ldb + i2], lda, ldb);
                }
            }
        }
    }
}

How to transpose a row to a column array in R?

Make it a matrix with as.matrix(v); for a vector, this gives a one-column matrix. I'm curious, though: why do you need this format?

Transpose a large array without loading into memory

In your working but slow solution, you are reading the input file 5,000 times. That won't be fast, but the only easy way to minimize the reads is to read it all into memory.

You could try a compromise where you read, say, fifty columns at a time into memory (~50MB) and write them to the output file as rows. This way you would read the file "only" 100 times. Try a few different chunk sizes to find the performance/memory compromise you're satisfied with.

You would do this with three nested loops:

1. Loop over the number of chunks (100 in this case)
2. Loop over the lines of your input file
3. Loop over the number of columns in your chunk (50 here)

In the innermost loop, you append each column's value to the corresponding row of a two-dimensional array (one row per column in the chunk), so every iteration of the middle loop adds one element to each of those rows. In the outermost loop, you clear the array before entering the inner loops and write it out to the file as rows afterwards. For each iteration of loop 1, you will have written fifty rows of a million columns.
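The three loops can be sketched as follows. This is a hypothetical in-memory stand-in: `readLines` represents re-reading the input file on every pass, `writeRow` represents appending a line to the output file, and every line is assumed to hold exactly `columnCount` single-character values (e.g. '0' or '1'):

```swift
/// Hypothetical sketch of the chunked transpose: one pass over the input
/// per chunk of `chunkSize` columns.
func chunkedTranspose(columnCount: Int,
                      chunkSize: Int,
                      readLines: () -> [String],
                      writeRow: (String) -> Void) {
    var chunkStart = 0
    // Loop 1: one pass over the input per chunk of columns.
    while chunkStart < columnCount {
        let chunkEnd = min(chunkStart + chunkSize, columnCount)
        // Cleared before each pass: one buffer row per column in the chunk.
        var buffer = [[Character]](repeating: [], count: chunkEnd - chunkStart)
        // Loop 2: every line of the input.
        for line in readLines() {
            let chars = Array(line)
            // Loop 3: every column in the current chunk.
            for column in chunkStart..<chunkEnd {
                buffer[column - chunkStart].append(chars[column])
            }
        }
        // Each buffer row becomes one output row.
        for row in buffer {
            writeRow(String(row))
        }
        chunkStart = chunkEnd
    }
}

var output: [String] = []
var passes = 0
chunkedTranspose(columnCount: 3, chunkSize: 2,
                 readLines: { passes += 1; return ["101", "011"] },
                 writeRow: { output.append($0) })
print(output, passes)  // ["10", "01", "11"] 2
```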

You can't really insert into the middle of a normal file without loading the whole target file into memory; you would need to shift the trailing bytes forward manually. Since you know the exact output file size, however, you could pre-allocate it and seek to the right position before writing each byte, though 5 billion seeks probably won't be very fast either. If your ones and zeroes are fairly evenly distributed, you could initialize the file with all zeroes and then write only the ones (or the other way around) to halve the number of seeks.

Edit: Added details on how chunking could be implemented.

Get all values into dictionary and create a String with a specific format

Once you know the order of the keys (alphabetical?), you can use this:

let dict: [String: [Int]] = ["a": [1,2], "b": [3, 4], "c": [5, 6]]
let keys = dict.keys.sorted() //Or do whatever you want here to get your target order
var matrix: [[String]] = []
keys.forEach {
    guard let arrayAsInt = dict[$0] else { return }
    let arrayAsString = arrayAsInt.map{ "\($0)" }
    matrix.append( [$0] + arrayAsString)
}
print("Matrix: \(matrix)")
let transposed = matrix.transposed()
print("Transposed Matrix: \(transposed)")
let output = transposed.map { $0.joined(separator: ",")}.joined(separator: "\n")
print(output)

The output:

$>Matrix: [["a", "1", "2"], ["b", "3", "4"], ["c", "5", "6"]]
$>Transposed Matrix: [["a", "b", "c"], ["1", "3", "5"], ["2", "4", "6"]]
$>a,b,c
1,3,5
2,4,6

Obviously, each "\n" is printed as an actual line break, so

a,b,c
1,3,5
2,4,6

is the string

a,b,c\n1,3,5\n2,4,6

What's the idea behind it? Create a matrix and transpose it (transposition is one of the basic operations on a matrix in mathematics).

First, transform the [String: [Int]] into a [[String]] in which each element is a key followed by its values. I convert the values to String at this point to keep the later code simpler.

Why do that? Because the matrix is easy to build from your initial dict. The transposed form is harder (though not impossible) to get directly from the dict, but easy to get from the matrix, and it is quickly turned into your format.

So my thinking went in reverse:
Find a structure that matches your output, then ask how to get it. It's a transpose, so I need the input arranged as a matrix, and so on.

This uses code for transposing a matrix (which accepts String elements):

extension Collection where Self.Iterator.Element: RandomAccessCollection {
    // PRECONDITION: `self` must be rectangular, i.e. every row has equal size.
    func transposed() -> [[Self.Iterator.Element.Iterator.Element]] {
        guard let firstRow = self.first else { return [] }
        return firstRow.indices.map { index in
            self.map{ $0[index] }
        }
    }
}

Any working transpose code (there are various ones) should do the trick. I took this one from here.

As pointed out by @Leo Dabus, you can shorten Self.Iterator.Element to Element in the extension code (it appears twice). I kept it as is because I didn't want to modify the original answer, since it's not mine.
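For comparison, the "harder (not impossible)" route of building the output straight from the dictionary might look like the sketch below. It assumes every key has the same number of values and uses the shortest count to stay safe:

```swift
let dict: [String: [Int]] = ["a": [1, 2], "b": [3, 4], "c": [5, 6]]
let keys = dict.keys.sorted()
// Number of value rows; the shortest array wins if the counts differ.
let valueCount = keys.compactMap { dict[$0]?.count }.min() ?? 0
var lines = [keys.joined(separator: ",")]
for index in 0..<valueCount {
    // One output line per value position, read across all keys.
    // Force-unwrapping is safe because every key came from `dict`.
    lines.append(keys.map { String(dict[$0]![index]) }.joined(separator: ","))
}
let output = lines.joined(separator: "\n")
print(output)
```

It works, but the index bookkeeping is exactly what the matrix-plus-transpose approach hides.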

Get items with the same position from multidimensional array in Swift 5

You can use the reduce(into:_:) method of Array like this:

let arrayDeCortes = [["a","b","c","d"],["e","f","g","h"],["i","j","k","l"]]

let arrays = arrayDeCortes.reduce(into: [[String]]()) { (result, array) in
    array.enumerated().forEach {
        if $0.offset < result.count {
            result[$0.offset].append($0.element)
        } else {
            result.append([$0.element])
        }
    }
}

print(arrays)
// [["a", "e", "i"], ["b", "f", "j"], ["c", "g", "k"], ["d", "h", "l"]]

Edit: As @Alexander mentioned in the comments, there is a simpler way to achieve this: use the zip(_:_:) function twice.

The following returns an array of tuples:

let widths = ["a","b","c","d"]
let heights = ["e","f","g","h"]
let quantities = ["i","j","k","l"]

let result = zip(widths, zip(heights, quantities)).map { width, pair in
    (width, pair.0, pair.1)
}

print(result)
// [("a", "e", "i"), ("b", "f", "j"), ("c", "g", "k"), ("d", "h", "l")]
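One behavioral difference between the two approaches: zip(_:_:) stops at the end of its shortest input, whereas the reduce(into:_:) version keeps every element. A quick check with one array shortened (hypothetical data):

```swift
let widths = ["a", "b", "c", "d"]
let heights = ["e", "f", "g"]  // one element short
let quantities = ["i", "j", "k", "l"]

// zip stops at its shortest input, so "d" and "l" are silently dropped.
let triples = zip(widths, zip(heights, quantities)).map { width, pair in
    (width, pair.0, pair.1)
}
print(triples)  // [("a", "e", "i"), ("b", "f", "j"), ("c", "g", "k")]
```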

Fast matrix transposition in Python

Simple: Y = list(zip(*X)) (in Python 3, zip returns an iterator, hence the list() call)

>>> X=[[1,2,3], [4,5,6]]
>>> Y=list(zip(*X))
>>> Y
[(1, 4), (2, 5), (3, 6)]

EDIT: To answer questions in the comments about what zip(*X) means, here is an example from the Python manual:

>>> list(range(3, 6))            # normal call with separate arguments
[3, 4, 5]
>>> args = [3, 6]
>>> list(range(*args))           # call with arguments unpacked from a list
[3, 4, 5]

So, when X is [[1,2,3], [4,5,6]], zip(*X) is zip([1,2,3], [4,5,6]).
