Why We Have Both Jagged Array and Multidimensional Array

Why we have both jagged array and multidimensional array?

  1. A jagged array is an array-of-arrays, so an int[][] is an array of int[], each of which can be of different lengths and occupy their own block in memory. A multidimensional array (int[,]) is a single block of memory (essentially a matrix).

  2. You can't create a MyClass[10][20] because each sub-array has to be initialized separately, as they are separate objects:

    MyClass[][] abc = new MyClass[10][];

    for (int i=0; i<abc.Length; i++) {
    abc[i] = new MyClass[20];
    }

    A MyClass[10,20] is ok, because it is initializing a single object as a matrix with 10 rows and 20 columns.

  3. A MyClass[][,][,] can be initialized like so (not compile tested though):

    MyClass[][,][,] abc = new MyClass[10][,][,];

    for (int i=0; i<abc.Length; i++) {
    abc[i] = new MyClass[20,30][,];

    for (int j=0; j<abc[i].GetLength(0); j++) {
    for (int k=0; k<abc[i].GetLength(1); k++) {
    abc[i][j,k] = new MyClass[40,50];
    }
    }
    }

Bear in mind, that the CLR is heavily optimized for single-dimension array access, so using a jagged array will likely be faster than a multidimensional array of the same size.

What are the differences between a multidimensional array and an array of arrays in C#?

Array of arrays (jagged arrays) are faster than multi-dimensional arrays and can be used more effectively. Multidimensional arrays have nicer syntax.

If you write some simple code using jagged and multidimensional arrays and then inspect the compiled assembly with an IL disassembler you will see that the storage and retrieval from jagged (or single dimensional) arrays are simple IL instructions while the same operations for multidimensional arrays are method invocations which are always slower.

Consider the following methods:

static void SetElementAt(int[][] array, int i, int j, int value)
{
array[i][j] = value;
}

static void SetElementAt(int[,] array, int i, int j, int value)
{
array[i, j] = value;
}

Their IL will be the following:

.method private hidebysig static void  SetElementAt(int32[][] 'array',
int32 i,
int32 j,
int32 'value') cil managed
{
// Code size 7 (0x7)
.maxstack 8
IL_0000: ldarg.0
IL_0001: ldarg.1
IL_0002: ldelem.ref
IL_0003: ldarg.2
IL_0004: ldarg.3
IL_0005: stelem.i4
IL_0006: ret
} // end of method Program::SetElementAt

.method private hidebysig static void SetElementAt(int32[0...,0...] 'array',
int32 i,
int32 j,
int32 'value') cil managed
{
// Code size 10 (0xa)
.maxstack 8
IL_0000: ldarg.0
IL_0001: ldarg.1
IL_0002: ldarg.2
IL_0003: ldarg.3
IL_0004: call instance void int32[0...,0...]::Set(int32,
int32,
int32)
IL_0009: ret
} // end of method Program::SetElementAt

When using jagged arrays you can easily perform such operations as row swap and row resize. Maybe in some cases usage of multidimensional arrays will be more safe, but even Microsoft FxCop tells that jagged arrays should be used instead of multidimensional when you use it to analyse your projects.

Multi-dimensional or jagged array when dealing with matrix in C#?

It's not about the lines of code that is the problem, but the efficiency of the code itself.

If you had a sparse matrix (matrix with almost all zeros), you want to use a jagged matrix because iterating through the two-dimensional matrix searching for non-zero elements would waste time.

However, if you had a matrix and you wanted to find its determinant, it would be simpler to use the method of co-factors on it. If you're not familiar with the method, it involves breaking up the matrix into smaller matrices, eventually to the 2x2 version where you can simply perform a*d-b*c. This isn't possible with jagged matrices.

MIPS Assembly for two-dimensional array and jagged array

These aren't C code, this looks like C#.

int[][] A; // C# declaration for jagged multidimensional array
int[,] B; // C# declaration for rectangular multidimensional array

First, let's look at the simpler and more regular rectangular array.

C code for rectangular multidimensional array would be:

int B[5][10];  // C declaration for rectangular array

This fragment request a total of 50 integer elements, structed as 5 rows, where the row size is 10 elements.

Because the logical data structure is two dimensional yet physical memory is only one dimensional, we must map the 2D structure onto 1D.  There are two ways of doing this, one is row-major and the other column-major.  If we consider the index having the 5 to be row and 10 the column, then we can say that the C language uses row-major ordering.

(That there are two choices in mapping the array is similar to the idea that there are two reasonable orderings for individual bytes in multi-byte (i.e. word) items: big endian and little endian.)

The array is immediate in some sense: there are no pointers or references stored in this structure, only the 50 integer elements.

Access to an element is by address computation:

In B[r][c] we compute addr = B + (r * 10 + c) * 4, where the 10 represents the size of the row, and the 4 is the size of one integer element.  The multiplications are what are called scaling.  Because each row refers to 10 elements, then row 1 starts 10 positions further down than row 0.  Because each element takes 4 bytes, and each of the 4 bytes takes a memory address, then column 1 starts 4 bytes further down than column 0.  In summary, element 0,0 is at B + (0*10+0)*4, which is the same address value as B, and element 1,2 is at B + (1*10+2)*4, which is B + 48 — the 3rd element (+8) of the 2nd row (+10*4).

That addr can be dereferenced to load the r,c element from the array, or to store an updated value at r,c.

(If we were to chose column-major ordering, then the formula would be addr = B + (c * 5 + r) * 4.  Row major ordering is better if the column index varies faster, i.e. if we use for ( r = .. ) for ( c = .. ) { }, while for column-major we would prefer the inverse, varying the row index faster — the index for an inner loop varies faster than the index for an outer loop.  The difference between them has to do with cache utilization (without a cache it wouldn't matter much).)


Jagged arrays are arrays of arrays.  Each array involved is one-dimensional, so while there is still a rectangular shape to each individual array, the rectangular shape does not apply to the two dimensional structure of the overall "2D" array.

In C a jagged array is declared as an array of pointers, where the pointers themselves are references to individual arrays of zero or more integer elements (or are null).

int *A[10];

This declares an array (the [] take precedence over the *, so this is taken as int *(A[10]), the type of each array element is pointer to int.

The total storage size of this array is 40 bytes, 10 * size of pointer, 4, on MIPS 32-bit system.  Each element of the array is a pointer type.  There are no integers reserved by this array declaration!  There are positions for 10 pointers to integer (arrays), and each of 10 for each position must be further specified, by an initializer or by allocation.  Because the elements of this array are pointers to integer, each such pointer can be null or refer to storage of a different number of integers.  If you wanted to know how many integers each one (row) had/has, you will have to store that information separately, because there is no built in mechanism to store that with this declaration.

In summary, jagged arrays in C have the following issues:

  • Only storage for the pointer array (array of pointer) is allocated by this declaration.
  • The storage (the pointer elements) will be initialized to zero, for a global declaration, and also when using calloc, but using malloc or as a local variable declaration, those pointers will be uninitialized (i.e. garbage).
  • Even when we populate the individual pointers with storage, there is no inherent recording of the length of the storage (the count of how many integer) are referred to by each element position.  The array can be jagged, but you would have to remember that shape in some other way (i.e. in other data structure, like a simple array of integer of the same element count as the pointer array.)

Accessing an element of a jagged array means obtaining the pointer element from the first array, and using that to compute the address of an integer element.  Both of these operations involve one-dimensional arrays, so in at least that sense, is conceptually simpler: there is no issue of row vs. column major ordering.

An element's address, A[r][c] is computed as follows:

addr = (*(A + r * sizeof(pointer))) + c * sizeof(int)

where the first * is indirection that fetches the pointer at index r.  This addr can be used the same as in the description after addr for rectangular array above.


The MIPS implementation of either of these is then fairly straightforward.  For global storage, reserve the appropriate amount.  In code, compute element addresses and use lw or sw once you've computed an addr.

Multi Dimensional Array vs Jagged Array Length method

Jagged arrays ([][]) arrays where the element type is an array too [], so basically is just a one-dimension array.

Multidimensional arrays ([,]) are arrays that have more than one dimension, but all elements are in the same array.

Length property from MSDN:

Gets the total number of elements in all the dimensions of the Array.

According to this Length in jagged array this property returns the number of arrays it contains.

In multidimensional arrays this property returns all elements in it, that is the multiplication of all dimension sizes.

GetLength(int) from MSDN:

Gets an integer that represents the number of elements in the specified dimension of the Array.

Using this method for jagged arrays, the only available dimension is 0, that returns the same value as Length property.

In multidimensional arrays you can pass the zero-based index of the desired dimension. For example in [,,] available dimensions are 0, 1 and 2.

Memory allocation of Jagged arrays in C# vs 2d arrays memory allocation in C++

Yes you are right. A Jagged array in C# is basically an single dimension array in memory where each element is just an reference. When you initialize the array in the for loop, it creates a new array somewhere else in memory and the reference points to it.

In case of multidimensional arrays ([,]), the situation is very different. When you initialize such array, a single block of memory is created and it's size is equal to the product of all dimensions of the array. Technically having a multidimensional array of size [M,N] is represented in memory the same way as a simple array of size [M * N]. The access by multiple indices is basically just a syntactic sugar where the framework calculates the actual position of the element by multiplication of the dimensions.

Cloning jagged array is slower than multidimensional array

Do this:

double[][] copy = new double[matSize][];
for (int j = 0; j < matSize; j++)
{
copy[j] = jaggedMat[j].Clone() as double[];
}

It will make the jagged array only 2x slower than the multidimensional array (on my machine 2450ms vs 1360ms). Simply put, creating 100 objects (the lines of the jagged array) has a cost. The GC will hate you a little :-) All these objects have to be allocated and then freed if the GC runs. This makes the jagged array cloning slower. I'll say that it is interesting that the cost is so slow (it seems that creating an array has the same cost as filling it, given that the multidimensional cloning is pure filling, while the jagged array cloning is half copying and half creating the array)



Related Topics



Leave a reply



Submit