Why Can't I Create an Array with a Large Size?

Theory

There are two possible exceptions:

  • OutOfMemoryError: Java heap space means your array does not fit into the Java heap. To solve this, you can increase the maximum heap size with the JVM option -Xmx. Also take into account that the maximum size of an object cannot be larger than the largest heap generation.
  • OutOfMemoryError: Requested array size exceeds VM limit means a platform-specific size limit was exceeded:

    • the upper bound is set by the type used for array indices: Java indexes arrays with int, so the theoretical maximum array size is 2^31 - 1 = 2147483647 elements.
    • the other limit is JVM/platform specific. According to chapter 10 (Arrays) of The Java Language Specification, Java SE 7 Edition, there is no strict limit on array length, so a JVM may reduce the maximum array size without violating the JLS.

Practice

In the HotSpot JVM, array size is limited by the internal representation. In the GC code, the JVM passes the size of an array around as an int in heap words and then converts back from heap words to a jint; this may cause an overflow. So, to avoid crashes and unexpected behavior, the maximum array length is limited to (max size - header size), where the header size depends on the C/C++ compiler that was used to build the JVM you are running (gcc for Linux, clang for macOS) and on runtime settings (like UseCompressedClassPointers). For example, on my Linux machine:

  • Java HotSpot(TM) 64-Bit Server VM 1.6.0_45: limit Integer.MAX_VALUE
  • Java HotSpot(TM) 64-Bit Server VM 1.7.0_72: limit Integer.MAX_VALUE-1
  • Java HotSpot(TM) 64-Bit Server VM 1.8.0_40: limit Integer.MAX_VALUE-2

Useful Links

  • https://bugs.openjdk.java.net/browse/JDK-8059914
  • https://bugs.openjdk.java.net/browse/JDK-8029587

Why can't I create an array with size determined by a global variable?

In C99, 6.7.8/3:

The type of the entity to be initialized shall be an array of unknown size or an object type that is not a variable length array type.

6.6/2:

A constant expression can be evaluated during translation rather than runtime

6.6/6:

An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts.

6.7.5.2/4:

If the size is an integer constant expression and the element type has a known constant size, the array type is not a variable length array type; otherwise, the array type is a variable length array type.

Here a has variable length array type, because size is not an integer constant expression (the question's code is presumably something like int size = 5; at file scope, followed by int a[size] = {1, 2, 3, 4, 5};). Thus, it cannot have an initializer list.

In C90, there are no VLAs, so the code is illegal for that reason.

In C++ there are also no VLAs, but you could make size a const int. That works because in C++ a const int variable initialized with a constant can be used in integer constant expressions; in C it can't.
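
As an aside, a minimal C++ sketch of that difference (the declaration below is valid C++ but would be rejected by a C compiler):

const int size = 5;              // in C++, usable in constant expressions
int a[size] = {1, 2, 3, 4, 5};   // fixed-size array, not a VLA

// In C, 'size' here is not an integer constant expression, so 'a'
// would be a VLA, and a VLA cannot have an initializer list.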

Presumably you didn't intend a to have variable length, so what you need is:

#define size 5

If you actually did intend a to have variable length, I suppose you could do something like this:

int a[size];
int initlen = size;
if (initlen > 5) initlen = 5;  /* copy at most the 5 values provided */
memcpy(a, (int[]){1,2,3,4,5}, initlen*sizeof(int));  /* memcpy needs <string.h> */

Or maybe:

int a[size];
for (int i = 0; i < size && i < 5; ++i) {
    a[i] = i+1;
}

It's difficult to say, though, what "should" happen here in the case where size != 5. It doesn't really make sense to specify a fixed-size initial value for a variable-length array.

Why can't I create an array of size n?

int n;
cin >> n;
int array[n];

This will work if you use g++, which supports VLAs as an extension. However, ISO C++ requires the size of an array to be a constant expression, i.e. the size must be known at compile time.

Why is it that C++ doesn't allow you to create an array of dynamic length on the stack?

Simple answer "Because the standard says so". Even the upcoming C++ Standard (C++0x) is not going to allow Variable Length Arrays.

BTW, we always have std::vector in C++, so there's no reason to worry. :)
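
For completeness, a minimal sketch of the std::vector route (assuming n is non-negative):

#include <iostream>
#include <vector>

int main() {
    int n;
    std::cin >> n;
    std::vector<int> array(n);   // storage lives on the heap, sized at run time
    for (int i = 0; i < n; ++i)
        array[i] = i;            // use it like a built-in array
}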

Why aren't arrays expandable?

This question didn't mention a language, so I'm going to base my answer on C-style arrays.

Arrays are allocated as a single chunk of memory. Growing an array is problematic because the only way to do it properly is to grow it at the end. For a growth of size N there must be at least N free bytes at the end of the array before the next allocated address.

Supporting this type of allocation necessitates that allocations be spread across the virtual address space. This both removes the benefits of having memory allocations closer to each other and serves to increase fragmentation. This flies in the face of most memory managers which try to pack memory together and reduce fragmentation.

Allocating a new array at a place in memory with sufficient space and copying the array there is simply not an option as a general solution. The reason why is that the previous location of the array is visible to consumers through pointers.

int* array = malloc(someSize * sizeof(int));
int* pointer1 = &(array[2]);
growArray(&array, 12); // Can't move the block: pointer1 still holds the old address
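
realloc makes the same trade-off visible: it may grow the block in place or move it elsewhere, and in the latter case every previously saved pointer is invalidated. A minimal sketch (the old address is stashed as an integer, because using the old pointer value itself after a move is undefined):

#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main() {
    int* array = static_cast<int*>(std::malloc(4 * sizeof(int)));
    if (!array) return 1;
    std::uintptr_t old_addr = reinterpret_cast<std::uintptr_t>(array);

    // realloc may extend in place or allocate a new block and copy.
    int* grown = static_cast<int*>(std::realloc(array, 100000 * sizeof(int)));
    if (!grown) { std::free(array); return 1; }

    if (reinterpret_cast<std::uintptr_t>(grown) != old_addr)
        std::puts("block moved: any saved interior pointers now dangle");

    std::free(grown);
    return 0;
}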

Why is the maximum size of an array too large?

The limit SIZE_MAX / 2 comes from the definitions of size_t and ptrdiff_t on your implementation, which gives the types ptrdiff_t and size_t the same width.

The C Standard mandates1 that type size_t is unsigned and type ptrdiff_t is signed.

The result of subtracting two pointers always2 has type ptrdiff_t. This means that, on your implementation, the size of an object must be limited to PTRDIFF_MAX; otherwise a valid difference of two pointers could not be represented in type ptrdiff_t, leading to undefined behavior.

Thus the value SIZE_MAX / 2 equals the value PTRDIFF_MAX. If the implementation chose to allow the maximum object size to be SIZE_MAX, the width of ptrdiff_t would have to be increased. It is much easier to limit the maximum object size to SIZE_MAX / 2 than to give ptrdiff_t a greater positive range than that of size_t.
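
You can see the relationship directly on a typical 64-bit (LP64) implementation; a small sketch:

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    // On an implementation where size_t and ptrdiff_t have the same
    // width, the largest representable pointer difference is half of
    // the size_t range, so SIZE_MAX / 2 == PTRDIFF_MAX.
    std::printf("SIZE_MAX     = %zu\n", static_cast<std::size_t>(SIZE_MAX));
    std::printf("SIZE_MAX / 2 = %zu\n", static_cast<std::size_t>(SIZE_MAX) / 2);
    std::printf("PTRDIFF_MAX  = %td\n", static_cast<std::ptrdiff_t>(PTRDIFF_MAX));
}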

The Standard offers these3 comments4 on the topic.


(Quoted from ISO/IEC 9899:201x)

1 (7.19 Common definitions 2)

The types are

ptrdiff_t

which is the signed integer type of the result of subtracting two pointers;

size_t

which is the unsigned integer type of the result of the sizeof operator;

2 (6.5.6 Additive operators 9)

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined.

3 (K.3.4 Integer types 3)

Extremely large object sizes are frequently a sign that an object’s size was calculated
incorrectly. For example, negative numbers appear as very large positive numbers when
converted to an unsigned type like size_t. Also, some implementations do not support
objects as large as the maximum value that can be represented by type size_t.

4 (K.3.4 Integer types 4)

For those reasons, it is sometimes beneficial to restrict the range of object sizes to detect
programming errors. For implementations targeting machines with large address spaces,
it is recommended that RSIZE_MAX be defined as the smaller of the size of the largest
object supported or (SIZE_MAX >> 1), even if this limit is smaller than the size of
some legitimate, but very large, objects. Implementations targeting machines with small
address spaces may wish to define RSIZE_MAX as SIZE_MAX, which means that there is no object size that is considered a runtime-constraint violation.

Create too large array in C++, how to solve?

A matrix of size 60,000 x 60,000 has 3,600,000,000 elements.

You're using type float so it becomes:

60,000 * 60,000 * 4 bytes = 14,400,000,000 bytes ~= 13.4 GB

Do you even have that much memory in your machine?


Note that the issue of stack vs heap doesn't even matter unless you have enough memory to begin with.


Here's a list of possible problems:

  • You don't have enough memory.
  • If the matrix is declared globally, you'll exceed the maximum size of the binary.
  • If the matrix is declared as a local array, then you will blow your stack.
  • If you're compiling for 32-bit, you have far exceeded the 2GB/4GB addressing limit.
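
If you do have that much RAM (and a 64-bit build), the usual fix is to put the matrix on the heap rather than on the stack or in a global; a minimal sketch using std::vector:

#include <cstddef>
#include <vector>

int main() {
    const std::size_t n = 60000;
    // n * n must be computed in a 64-bit type: 3,600,000,000 overflows
    // a 32-bit int. This allocation asks for ~13.4 GB and throws
    // std::bad_alloc if the memory isn't available.
    std::vector<float> matrix(n * n);
    matrix[42 * n + 7] = 1.0f;   // element (row 42, col 7), row-major
    return 0;
}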

