May I Treat a 2D Array as a Contiguous 1D Array

May I treat a 2D array as a contiguous 1D array?

It's up to interpretation. While the contiguity requirements of arrays don't leave much to the imagination in terms of how to layout a multidimensional arrays (this has been pointed out before), notice that when you're doing p[1234] you're indexing the 1234th element of the zeroth row of only 80 columns. Some interpret the only valid indices to be 0..79 (&p[80] being a special case).

Information from the C FAQ which is the collected wisdom of Usenet on matters relevant to C. (I do not think C and C++ differ on that matter and that this is very much relevant.)

Is accessing a 2D array (T[N][M]) as a 1D array of size N*M guaranteed to work?

for example, padding being added to int[3]

There's no danger of that. Arrays are guaranteed to not have padding. The memory layout itself is guaranteed, while the pointer arithmetic is technically undefined, as per this quote mentioned in the comments: expr.add/4 Thus:

Is 2D array (T[N][M]) traversal by pointer (T*) guaranteed to work?

Technically not according to my understanding of the standard, unfortunately.


Here is a version of standard std::accumulate that iterates the innermost elements of a multi dimensional array of any dimension count as if it were a flat array:

template<class T, unsigned size, class BinOp = std::plus<>>
auto arr_accumulate(
T (&arr)[size], std::remove_all_extents_t<T> init = {}, BinOp op = {})
{
if constexpr (std::is_array_v<T>) {
return std::accumulate(std::begin(arr), std::end(arr), init,
[&](const auto& l, const auto& r) {
return arr_accumulate(r, l, op);
}
);
} else {
return std::accumulate(std::begin(arr), std::end(arr), init, op);
}
}

// usage
int sum = arr_accumulate(two_d_array, 0);

I haven't benchmarked this to see if there's any overhead. With compile time constant input such as in your example, the sum was calculated at compile time.

Another implementation; only for trivial types:

template<class T, unsigned size>
constexpr auto flat_size (T (&arr)[size]) {
using E = std::remove_all_extents_t<T>;
return sizeof arr / sizeof (E);
}

template<class T, unsigned size, class BinOp = std::plus<>>
auto arr_accumulate_trivial(
T (&arr)[size], std::remove_all_extents_t<T> init = {}, BinOp op = {})
{
using E = std::remove_all_extents_t<T>;
static_assert(std::is_trivially_copyable_v<E>);
std::array<E, flat_size(arr)> flat;
std::memcpy(flat.data(), arr, sizeof arr);
return std::accumulate(std::begin(flat), std::end(flat), init, op);
}

Why does a 2D array behave like a 1D array of pointers instead of a 1D array of integers?

When trying to pass a 2D array to a function, I learned that you cannot do so with dynamically allocated arrays, since the compiler needs to know array[][columns].

This is true, in the sense that you cannot pass any array to a function. You cannot even express such a concept in C, though you can write code that looks like that to the casual eye. In almost every context where an expression evaluating to an array appears -- including function call expressions -- the array value is replaced by a pointer to the first array element.

It is partially true in the sense that a 2D array is an array of arrays, and the dimension of the (array) element type is is part of the overall array's type, part of the type of every element, and part of the type of a pointer to the first element. As such, that dimension must be part of the type of any function parameter to which you want to pass (a pointer to the first element of) the array.

It is most accurately characterized as false, however, even for 2D arrays both of whose dimensions are determined at run time. Since 1999, C has supported variable-length arrays (though in C11 it was made optional), and these play very nicely indeed with dynamically-allocated multi-dimensional arrays and with pointers to arrays of varying dimension:

// Dynamically allocating a 2D array of runtime-determined dimensions:
unsigned rows = calculate_number_of_rows();
unsigned columns = calculate_number_of_columns();
int (*matrix)[columns] = malloc(rows * sizeof(*matrix));

They work well for functions accepting such pointers, too:

void do_something(unsigned rows, unsigned columns, int matrix[rows][columns]);

... or, equivalently ...

void do_something(unsigned rows, unsigned columns, int matrix[][columns]);

... or ...

void do_something(unsigned rows, unsigned columns, int (*matrix)[columns]);

Those three forms are completely equivalent.

However, I learned that a 2D array is stored a 1D array, where the elements of each new row just follows the elements of the previous row.

A 2D array is an array of 1D arrays. The elements of any array are stored contiguously in memory without padding, so the layout of a 2D array of dimensions (r, c) cannot be distinguished from the layout of a 1D array of dimension r * c, but I recommend against thinking of it in the terms
you used.

When I pass an array name to a function as a pointer to an array, this seems to be the case, and my code works fine.

Do not do that. In practice, it is very likely to work exactly as you say, but you should heed the warnings emitted by your compiler -- and it definitely should be emitting warnings about that.

However, in the function where the 2D array is declared, it behaves as an array of pointers instead.

You've not presented an example of a function that would fit your description. Certainly it is possible to pass an array of pointers, but it is quite possible to pass a pointer to an array instead. See above for examples.

A pointer to a multidimensional array

It really depends on what you are trying to do, but a 2D array cannot decay to a single pointer. As an example, pointer to array would work, letting the first dimension decay to pointer:

int (*plist)[COLS] = list;

You can then access the elements as plist[i][j].

Also beware of using names such as list, especially if you do something adventurous such as saying using namespace std;.

Multi-Dimensional Arrays in Memory

What you are looking for is found in [dcl.array]/6

An object of type “array of N U” contains a contiguously allocated non-empty set of N subobjects of type U, known as the elements of the array, and numbered 0 to N-1.

What this states is that if you have an array like int arr[10] then to have 10 int's that are contiguous in memory. This definition works recursively though so if you have

int arr[5][10]

then what you have is an array of 5 int[10] arrays. If we apply the definition from above then we know that the 5 int[10] arrays are contiguous and then int[10]'s themselves are contiguous so all 50 int's are contiguous. So yes, a 2d array look just like a 1d array in memory since really that is what they are.

This does not mean you can get a pointer to arr[0][0] and iterate to arr[4][9] with it. Per [expr.add]/4

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.

  • Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.

  • Otherwise, the behavior is undefined.

What this states is that if you have a pointer to an array, then the valid indices you can add to it are [0, array_size]. So if you did

int * it = &arr[0][0]

then what it points to is the first element of the first array which means you can legally only increment it to it + 10 since that is the past then end element of the first array. Going into the second array is UB even though they are contiguous.

Performance of 2-dimensional array vs 1-dimensional array

In C, 2-dimensional arrays are just a neat indexing scheme for 1-dimensional arrays. Just like with a 1D array, 2D arrays allocate a single block of contiguous memory, and the A[row][col] notation is similar to saying A[row*NCOLS+col].

Usually if you were to implement your own multidimensional arrays using single dimensional arrays, you'd write an indexing function:

int getIndex(int row, int col) { return row*NCOLS+col; }

Assuming your compiler inlines this function, the performance here would be precisely the same as if you used the built in 'indexing function' of 2D arrays.

To illustrate:

#define NROWS 10
#define NCOLS 20

This:

int main(int argc, char *argv[]) {
int myArr[NROWS*NCOLS];
for (int i=0; i<NROWS; ++i) {
for (int j=0; j<NCOLS; ++j) {
myArr[getIndex(i,j)] = i+j;
}
}
return 0;
}

Should perform the same as this:

int main(int argc, char *argv[]) {
int myArr[NROWS][NCOLS];
for (int i=0; i<NROWS; ++i) {
for (int j=0; j<NCOLS; ++j) {
myArr[i][j] = i+j;
}
}
return 0;
}

Though as AraK pointed out, if you are jumping around rows a lot, and the rows are very large, you could hit a lot of page faults... in that case the custom indexing function (with rows and cols switched around) could help, but so could simply changing which of the dimensions in a 2-dimensional array you treat as the rows and which you treat as the columns.



Related Topics



Leave a reply



Submit