Structure of Arrays VS Array of Structures

Structure of Arrays vs Array of Structures

The choice of AoS versus SoA for optimum performance usually depends on the access pattern. This is not limited to CUDA, however: similar considerations apply on any architecture where performance can be significantly affected by memory access pattern, e.g. where you have caches or where performance is better with contiguous memory access (such as coalesced memory accesses in CUDA).

E.g. for RGB pixels versus separate RGB planes:

struct {
    uint8_t r, g, b;
} AoS[N];

struct {
    uint8_t r[N];
    uint8_t g[N];
    uint8_t b[N];
} SoA;

If you are going to be accessing the R/G/B components of each pixel together, then AoS usually makes sense, since the successive reads of the R, G, B components will be contiguous and usually contained within the same cache line. For CUDA this also means memory read/write coalescing.

However, if you are going to process the color planes separately, then SoA might be preferred: e.g. if you want to scale all R values by some scale factor, SoA means that all the R components will be contiguous.
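
As a rough illustration, here is a minimal C sketch of that per-channel scaling (the Pixel/Planes struct names, N, and the num/den scale factor are placeholders invented for this example, not taken from the question): the SoA version walks one contiguous byte array, while the AoS version strides through memory three bytes at a time.

#include <stddef.h>
#include <stdint.h>

#define N 1024

struct Pixel  { uint8_t r, g, b; };            /* AoS: r, g, b interleaved        */
struct Planes { uint8_t r[N], g[N], b[N]; };   /* SoA: each channel is contiguous */

/* Scale all R values by num/den. */
void scale_r_aos(struct Pixel *p, unsigned num, unsigned den) {
    for (size_t i = 0; i < N; i++)             /* touches every 3rd byte          */
        p[i].r = (uint8_t)(p[i].r * num / den);
}

void scale_r_soa(struct Planes *p, unsigned num, unsigned den) {
    for (size_t i = 0; i < N; i++)             /* sequential, contiguous reads    */
        p->r[i] = (uint8_t)(p->r[i] * num / den);
}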

One further consideration is padding/alignment. For the RGB example above, each element in an AoS layout occupies 3 bytes, so successive elements start at multiples of 3 bytes, which may not be convenient for CUDA, SIMD, and the like; in some cases this may even call for padding within the struct to make alignment more convenient (e.g. adding a dummy uint8_t element to ensure 4-byte alignment). In the SoA case, however, each plane is a plain contiguous array that can be aligned as needed, which can be more convenient for certain algorithms/architectures.

For most image-processing applications the AoS scenario is much more common, but for other applications, or for specific image-processing tasks, this may not always be the case. When there is no obvious winner I would recommend AoS as the default choice.

See also this answer for more general discussion of AoS v SoA.

Array of Structs are always faster than Structs of arrays?

Much depends on how useful all the fields are. If you have a data structure where using one field means you are likely to use all of them, then an array of structs is more efficient, as it keeps together all the things you are likely to need.

Say you have time-series data where you only need a small selection of the many possible fields. You might have all sorts of data about an event or point in time, but you only need, say, 3-5 of them. In this case a structure of arrays is more efficient because (a) you don't need to cache the fields you don't use, and (b) you often access values in order, i.e. caching a field, its next value, and the one after that is useful.

For this reason, time-series information is often stored as a collection of columns.
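
As a minimal sketch (the Tick record, its field names, and the column layout below are made up for illustration), a query that only needs one field touches far less memory in the columnar (SoA) layout than when walking an array of wide records:

#include <stddef.h>
#include <stdint.h>

#define N 100000

/* AoS: every record drags all of its fields through the cache...        */
struct Tick {
    int64_t timestamp;
    double  price;
    double  volume;
    char    venue[8];
    char    flags[24];      /* ...even the ones this query never reads   */
};

/* SoA: one column per field; a query reads only the columns it needs.   */
struct TickColumns {
    int64_t timestamp[N];
    double  price[N];
    double  volume[N];
};

double sum_prices_aos(const struct Tick *t, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i].price;          /* strides over unused fields        */
    return sum;
}

double sum_prices_soa(const struct TickColumns *t, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t->price[i];         /* sequential reads of one column    */
    return sum;
}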

Structure of arrays and array of structures - performance difference

Structure of arrays is not cache friendly in this case.

You use both u and v together, but with two separate arrays they will not be loaded into the same cache line, and the resulting cache misses will cost a huge performance penalty.

_mm_prefetch can be used to make the AoS representation even faster.
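
For example (a hedged sketch: the Particle struct wrapping the u/v pair, the prefetch distance of 16 elements, and the _MM_HINT_T0 hint are illustrative choices, not taken from the question), issuing a prefetch a few iterations ahead while walking the AoS can hide part of the memory latency:

#include <stddef.h>
#include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_T0 */

struct Particle { float u, v; };

void advance(struct Particle *p, size_t n, float du, float dv) {
    for (size_t i = 0; i < n; i++) {
        /* Prefetch the element needed a few iterations from now; the
           distance (here 16 elements) is something to tune per machine. */
        if (i + 16 < n)
            _mm_prefetch((const char *)&p[i + 16], _MM_HINT_T0);
        p[i].u += du;     /* u and v sit in the same cache line (AoS)    */
        p[i].v += dv;
    }
}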

Array of structures vs structure of arrays in JSON

Not as part of JSON itself, no. What I've done on projects is a generic system where the JSON would look like this:

{
  "__keys__": ["name", "second_name"],
  "values": [
    ["Amrit", "Valentine"],
    ["Beatriz", "Carty"]
  ]
}

...where once I've parsed the JSON, I throw a utility function at it to consume that and turn it into an array of objects. Along these lines:

const json = `{
  "__keys__": ["name", "second_name"],
  "values": [
    ["Amrit", "Valentine"],
    ["Beatriz", "Carty"]
  ]
}`;

const parsed = JSON.parse(json);
const expanded = expand(parsed);
console.log(expanded);

function expand(data) {
  const keys = data.__keys__;
  return data.values.map(entry => {
    const obj = {};
    keys.forEach((key, index) => {
      obj[key] = entry[index];
    });
    return obj;
  });
}

Array of structs vs. Array of pointers to structs

Arrays of structures and arrays of pointers to structures are different ways to organize memory.

Arrays of structures have these strong points:

  • it is easy to allocate such an array dynamically in one step with struct s *p = calloc(n, sizeof(*p));.
  • if the array is part of an enclosing structure, no separate allocation code is needed at all. The same is true for local and global arrays.
  • the array is a contiguous block of memory; pointers to the previous and next elements can easily be computed as struct s *prev = p - 1, *next = p + 1;
  • accessing array element members may be faster as they are close in memory, increasing cache efficiency.

They also have disadvantages:

  • the size of the array must be passed explicitly as there is no way to tell from the pointer to the array how many elements it has.
  • the expression p[i].member generates a multiplication, which may be costly on some architectures if the size of the structure is not a power of 2.
  • changing the order of elements is costly as it may involve copying large amounts of memory.

Using an array of pointers has these advantages:

  • the size of the array could be determined by allocating an extra element and setting it to NULL. This convention is used for the argv[] array of command line arguments provided to the main() function.
  • if the above convention is not used, and the number of elements is passed separately, NULL pointer values could be used to specify missing elements.
  • it is easy to change the order of elements by just moving the pointers.
  • multiple elements could be made to point to the same structure.
  • reallocating the array is easier as only the array of pointers needs reallocation, optionally keeping separate length and size counts to minimize reallocations. Incremental allocation is easy too.
  • the expression p[i]->member generates a simple shift and an extra memory access, but it may be more efficient than the equivalent expression for an array of structures.

and the following drawbacks:

  • allocating and freeing this indirect array is more cumbersome. An extra loop is required to allocate and/or initialize the structures pointed to by the array (see the sketch after this list).
  • access to structure elements involves an extra memory indirection. Compilers can generate efficient code for this if multiple members are accessed in the same function, but not always.
  • pointers to adjacent structures cannot be derived from a pointer to a given element.
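
Those extra allocation and cleanup loops might look like this (a minimal sketch; struct s and the alloc_indirect/free_indirect helper names are placeholders invented for this example):

#include <stdlib.h>

struct s { int value; };

/* Allocate an argv-style, NULL-terminated array of pointers to structs. */
struct s **alloc_indirect(size_t n) {
    struct s **arr = calloc(n + 1, sizeof(*arr));   /* extra NULL slot    */
    if (!arr)
        return NULL;
    for (size_t i = 0; i < n; i++) {                /* extra loop: one    */
        arr[i] = calloc(1, sizeof(*arr[i]));        /* allocation each    */
        if (!arr[i]) {
            while (i > 0)
                free(arr[--i]);
            free(arr);
            return NULL;
        }
    }
    return arr;                                     /* arr[n] == NULL     */
}

void free_indirect(struct s **arr) {
    for (size_t i = 0; arr[i] != NULL; i++)         /* matching free loop */
        free(arr[i]);
    free(arr);
}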

EDIT: As hinted by David Bowling, one can combine some of the advantages of both approaches by allocating an array of structures and a separate array of pointers pointing into it. This is a handy way to implement a sort order, or even several simultaneous sort orders with separate arrays of pointers, much like database indexes, as sketched below.
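
A minimal sketch of that combination (the item struct, the comparison key, and the use of qsort are illustrative): the structs live in one contiguous array, and a separate array of pointers provides a sort order without moving the structs themselves.

#include <stdio.h>
#include <stdlib.h>

struct item { int key; char name[16]; };

static int cmp_by_key(const void *a, const void *b) {
    const struct item *x = *(const struct item *const *)a;
    const struct item *y = *(const struct item *const *)b;
    return (x->key > y->key) - (x->key < y->key);
}

int main(void) {
    struct item items[] = { {3, "three"}, {1, "one"}, {2, "two"} };
    size_t n = sizeof(items) / sizeof(items[0]);

    /* Index array: pointers into the contiguous array of structures. */
    struct item *index[3];
    for (size_t i = 0; i < n; i++)
        index[i] = &items[i];

    /* Sort only the pointers; the structs stay where they are. */
    qsort(index, n, sizeof(index[0]), cmp_by_key);

    for (size_t i = 0; i < n; i++)
        printf("%d %s\n", index[i]->key, index[i]->name);
    return 0;
}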

Array of Structures (AoS) vs Structure of Arrays (SoA) on random reads for vectorization

You can parallelize your program in two ways: horizontally and vertically. I think you are mixing those two approaches.

Horizontal parallelization treats each lane in your SIMD unit as a separate "thread" working on different data. Vertical parallelization uses the whole SIMD unit to work on the same data object, trying to benefit from its inner multi-dimensionality.

To give a concrete example: suppose you have two arrays X and Y of 3D vectors that you want to add.

  • Horizontal approach: each lane of the SIMD unit would do:

    for (idx = 0; idx < size; idx += SIMD_size) {
        ... = X[idx+laneid].x + Y[idx+laneid].x;
        ... = X[idx+laneid].y + Y[idx+laneid].y;
        ... = X[idx+laneid].z + Y[idx+laneid].z;
    }
  • Vertical approach: each lane of the SIMD unit takes a different component of the same vector:

    for (idx = 0; idx < size; idx += 1) {
        ... = X[idx].coord(laneid) + Y[idx].coord(laneid);
    }

The vertical approach is easier to implement; in fact, compilers already try to auto-vectorize this way. The problem is that as the SIMD unit grows wider, the implementation cannot benefit from it: if you switch from 4-wide to 16-wide SIMD, you are still only adding the 3 components of a 3D vector in parallel.

The horizontal approach is harder. You usually have to handle diverging branches, function calls, and so on, and you want to reorganize your data into a Structure of Arrays so that the corresponding fields of your different data objects are next to each other in memory.
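
To make the horizontal case concrete, here is a sketch assuming SSE with 4-wide floats (the Vec3SoA layout, the function name, and the assumption that size is a multiple of 4 are all invented for this example): adding the x components of four vectors at once becomes a plain contiguous load instead of a gather.

#include <stddef.h>
#include <immintrin.h>

/* SoA layout: each component of the 3D vectors is its own array. */
struct Vec3SoA { float *x, *y, *z; };

/* r = a + b, processing 4 vectors per iteration (horizontal SIMD). */
void add_soa_sse(struct Vec3SoA r, struct Vec3SoA a, struct Vec3SoA b, size_t size) {
    for (size_t i = 0; i < size; i += 4) {
        __m128 ax = _mm_loadu_ps(&a.x[i]);   /* 4 consecutive x's: one load */
        __m128 bx = _mm_loadu_ps(&b.x[i]);
        _mm_storeu_ps(&r.x[i], _mm_add_ps(ax, bx));

        __m128 ay = _mm_loadu_ps(&a.y[i]);
        __m128 by = _mm_loadu_ps(&b.y[i]);
        _mm_storeu_ps(&r.y[i], _mm_add_ps(ay, by));

        __m128 az = _mm_loadu_ps(&a.z[i]);
        __m128 bz = _mm_loadu_ps(&b.z[i]);
        _mm_storeu_ps(&r.z[i], _mm_add_ps(az, bz));
    }
}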


Now, back to your question: SoA makes sense only if you do horizontal parallelization. When each lane accesses the same field of a different object, SoA lets you replace an expensive gather instruction with a single, better-aligned memory fetch.
If you parallelize vertically, as in the example in your question, no one would consider SoA in the first place: accessing multiple fields of the same object would itself cause a gather.

However, with random access, SoA may not be the best option even if you do horizontal parallelization. First, you get no benefit from SoA because you still need the expensive gather. Worse, since the fields of the same object are spread across memory, each load hits a different cache line. Not only does this increase memory bandwidth usage, it may also cause cache thrashing.
This is why SoA is not that efficient with random access.

A better solution is a hybrid approach: pack your data as an Array of Structures of Arrays, with the inner arrays sized to the SIMD width (often called AoSoA). But that is another story...
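
A rough sketch of such a hybrid layout, assuming a 4-wide SIMD unit (the names and block size are illustrative): each block holds SIMD_WIDTH values of every field, so horizontal lanes still read contiguous memory, while a random access to one element stays within a single small block.

#define SIMD_WIDTH 4

/* AoSoA: an array of small SoA blocks, each holding SIMD_WIDTH elements.
   Element i lives in block i / SIMD_WIDTH at lane i % SIMD_WIDTH. */
struct Vec3Block {
    float x[SIMD_WIDTH];
    float y[SIMD_WIDTH];
    float z[SIMD_WIDTH];
};

struct Vec3AoSoA {
    struct Vec3Block *blocks;   /* ceil(n / SIMD_WIDTH) blocks */
};

static inline float get_x(const struct Vec3AoSoA *v, unsigned i) {
    return v->blocks[i / SIMD_WIDTH].x[i % SIMD_WIDTH];
}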

Why does javascript process an array of structures faster than a structure of arrays?

The tests used for benchmarking seem to have overlapped each other, causing undefined or undesired behavior. A cleaner test (https://www.measurethat.net/Benchmarks/Show/474/0/soa-vs-aos) shows little difference between the two, with SoA executing slightly (about 30%) faster.

However, none of this matters much to the bottom line when it comes to performance. This is an exercise in micro-optimization: what you are essentially comparing is O(n) to O(n), with only constant-factor differences. The small percentage difference will not matter overall, as O(n) is an acceptable time complexity either way.


