Array Overflow (Why Does This Work)

Array overflow (why does this work?)

It's undefined behavior. UB comes in many flavors. Here are a few:

1) It will kick your dog.

2) It will reformat your hard drive.

3) It will work without a problem.

In your case, with your compiler and on your platform and on this particular day, you are seeing (3). But try it elsewhere, and you might get (1), (2), or something else completely (most likely an access violation).

C++ weird array behaviour

Compiling it with g++ -Wall -Wextra, you get this warning:

rando_so.cpp: In function 'int main()':
rando_so.cpp:15:19: warning: iteration 7 invokes undefined behavior [-Waggressive-loop-optimizations]
count[i+1]+=count[i];
~~~~~~~~~~^~~~~~~~~~
rando_so.cpp:13:18: note: within this loop
for(int i=0;i<r+1;i++)

This hints that you should look a bit closer at that second loop. Your variable i goes up to the highest possible index of count, and then you add 1. That is undefined behaviour. In your case, you likely happen to be writing into the first element of arr, because of how the arrays are laid out on the stack. But as far as the language is concerned, anything could happen as a result of this.
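A minimal sketch of the same off-by-one pattern and its fix; the array size and initial values here are assumptions, only the names count and r and the loop shape come from the warning above:

#include <iostream>

int main() {
    const int r = 7;
    int count[r + 1] = {1, 1, 1, 1, 1, 1, 1, 1}; // valid indices are 0..r

    // Buggy version: when i == r, count[i + 1] is count[r + 1], one past the end.
    // for (int i = 0; i < r + 1; i++)
    //     count[i + 1] += count[i];

    // Fixed version: stop one iteration earlier so i + 1 never exceeds r.
    for (int i = 0; i < r; i++)
        count[i + 1] += count[i];

    for (int i = 0; i <= r; i++)
        std::cout << count[i] << ' ';
    std::cout << '\n';
}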

Getting a stack overflow exception when declaring a large array

Your array is way too big to fit on the stack; consider using the heap instead:

int *sieve = malloc(2000000 * sizeof(*sieve));

If you really want to change the stack size, take a look at this document.

Tip: Don't forget to free your dynamically allocated memory when it's no longer needed.
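If you are in C++ rather than C, a std::vector gives the same heap allocation without the manual free. A minimal sketch, reusing the sieve size from the answer above:

#include <vector>

int main() {
    // Element storage lives on the heap, so the 2,000,000 ints never touch the
    // stack limit, and the memory is released automatically when sieve goes out
    // of scope, so there is nothing to free by hand.
    std::vector<int> sieve(2000000, 0);
    sieve[0] = 1; // indexed access works just like a raw array
}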

Accessing an array out of bounds gives no error, why?

Welcome to every C/C++ programmer's bestest friend: Undefined Behavior.

There is a lot that is not specified by the language standard, for a variety of reasons. This is one of them.

In general, whenever you encounter undefined behavior, anything might happen. The application may crash, it may freeze, it may eject your CD-ROM drive or make demons come out of your nose. It may format your hard drive or email all your porn to your grandmother.

It may even, if you are really unlucky, appear to work correctly.

The language simply says what should happen if you access the elements within the bounds of an array. It is left undefined what happens if you go out of bounds. It might seem to work today, on your compiler, but it is not legal C or C++, and there is no guarantee that it'll still work the next time you run the program. Or that it hasn't overwritten essential data even now, and you just haven't encountered the problems that it is going to cause yet.

As for why there is no bounds checking, there are a couple of aspects to the answer:

  • An array is a leftover from C. C arrays are about as primitive as you can get. Just a sequence of elements with contiguous addresses. There is no bounds checking because it is simply exposing raw memory. Implementing a robust bounds-checking mechanism would have been almost impossible in C.
  • In C++, bounds-checking is possible on class types. But an array is still the plain old C-compatible one. It is not a class. Further, C++ is also built on another rule which makes bounds-checking non-ideal. The C++ guiding principle is "you don't pay for what you don't use". If your code is correct, you don't need bounds-checking, and you shouldn't be forced to pay for the overhead of runtime bounds-checking.
  • So C++ offers the std::vector class template, which allows both approaches (a short sketch follows this list). operator[] is designed to be efficient: the language standard does not require that it perform bounds checking (although it does not forbid it either). A vector also has the at() member function, which is guaranteed to perform bounds checking. So in C++ you get the best of both worlds if you use a vector: array-like performance without bounds checking, and the ability to use bounds-checked access when you want it.
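A minimal sketch of the two access styles on std::vector; the values are purely illustrative:

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v{10, 20, 30};

    // operator[]: no mandated bounds check; out-of-range access is undefined behaviour.
    std::cout << v[1] << '\n';    // fine, prints 20
    // std::cout << v[7] << '\n'; // undefined behaviour: might "work", might crash

    // at(): guaranteed bounds check; throws std::out_of_range on a bad index.
    try {
        std::cout << v.at(7) << '\n';
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
}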

Why doesn't my program crash when I write past the end of an array?

Something I wrote some time ago for educational purposes...

Consider the following C program:

int q[200];

int main(void) {
    int i;
    for (i = 0; i < 2000; i++) {
        q[i] = i;   /* deliberately writes far past the end of q */
    }
    return 0;
}

After compiling and executing it, a core dump is produced:

$ gcc -ggdb3 segfault.c
$ ulimit -c unlimited
$ ./a.out
Segmentation fault (core dumped)

Now use gdb to perform a post-mortem analysis:

$ gdb -q ./a.out core
Program terminated with signal 11, Segmentation fault.
[New process 7221]
#0 0x080483b4 in main () at s.c:8
8 q[i]=i;
(gdb) p i
$1 = 1008
(gdb)

Huh, the program didn't segfault when we wrote outside the 200 items allocated; instead it crashed at i=1008. Why?

Enter pages.

One can determine the page size in several ways on UNIX/Linux; one way is to use the sysconf() function, like this:

#include <stdio.h>
#include <unistd.h> // sysconf(3)

int main(void) {
    printf("The page size for this system is %ld bytes.\n",
           sysconf(_SC_PAGESIZE));

    return 0;
}

which gives the output:

The page size for this system is 4096 bytes.

Or one can use the command-line utility getconf, like this:

$ getconf PAGESIZE
4096

Post mortem

It turns out that the segfault occurs not at i=200 but at i=1008. Let's figure out why. Start gdb to do some post-mortem analysis:

$ gdb -q ./a.out core

Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
[New process 4605]
#0 0x080483b4 in main () at seg.c:6
6 q[i]=i;
(gdb) p i
$1 = 1008
(gdb) p &q
$2 = (int (*)[200]) 0x804a040
(gdb) p &q[199]
$3 = (int *) 0x804a35c

q ended at address 0x804a35c; more precisely, the last element q[199] started at that address and occupied the four bytes up to 0x804a35f. The page size is, as we saw earlier, 4096 bytes, and the 32-bit word size of the machine means that a virtual address breaks down into a 20-bit page number and a 12-bit offset.

q[] ended in virtual page number 0x804a = 32842, at offset 0x35c = 860. The last element occupies the four bytes at offsets 860 through 863, so the array ends at offset 864. That leaves

4096 - 864 = 3232

bytes on the page of memory on which q[] was allocated. That space can hold

3232 / 4 = 808

integers, and the code treated it as if it contained elements of q at positions 200 to 1007.

We all know that those elements don't exist, but the compiler didn't complain, and neither did the hardware, since we have write permission to that page. Only when i=1008 did q[i] refer to an address on a different page for which we didn't have write permission; the virtual-memory hardware detected this and triggered the segfault.

An integer is stored in 4 bytes, meaning that this page contains 808 (3232/4) additional fake elements, so accessing q[200], q[201], and so on all the way up to element 199+808=1007 (q[1007]) happens not to trigger a segfault (it is still undefined behaviour, of course). Only when accessing q[1008] do you enter a new page for which the permissions are different.
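The same arithmetic, written out as a small sketch; the addresses are the ones from this particular core dump and will differ on another run or system:

#include <cstdio>

int main() {
    const unsigned long page_size = 4096;          // from getconf PAGESIZE
    const unsigned long last_elem = 0x804a35c;     // &q[199] as reported by gdb
    const unsigned long end_of_q  = last_elem + 4; // one byte past the last int

    unsigned long page       = end_of_q / page_size; // virtual page number: 32842
    unsigned long offset     = end_of_q % page_size; // offset within the page: 864
    unsigned long bytes_left = page_size - offset;   // 3232 bytes left on the page
    unsigned long ints_left  = bytes_left / 4;       // room for 808 more ints

    std::printf("page %lu, offset %lu, %lu bytes / %lu ints left\n",
                page, offset, bytes_left, ints_left);
    std::printf("first index that touches the next page: %lu\n",
                200 + ints_left); // 1008, where the segfault occurred
}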

Why aren't variable-length arrays part of the C++ standard?

There was recently a discussion about this on Usenet: Why no VLAs in C++0x.

I agree with those people who seem to agree that having to create a potentially large array on the stack, which usually has only a little space available, isn't good. The argument is: if you know the size beforehand, you can use a static array; and if you don't know the size beforehand, you will write unsafe code.

C99 VLAs could provide the small benefit of being able to create small arrays without wasting space or calling constructors for unused elements, but they would introduce rather large changes to the type system (you need to be able to specify types depending on runtime values; this does not yet exist in current C++, except for the type specified in a new expression, which is treated specially so that the runtime-ness doesn't escape the scope of the new expression).

You can use std::vector, but it is not quite the same, as it uses dynamic memory, and making it use one's own stack-allocator isn't exactly easy (alignment is an issue, too). It also doesn't solve the same problem, because a vector is a resizable container, whereas VLAs are fixed-size. The C++ Dynamic Array proposal is intended to introduce a library based solution, as alternative to a language based VLA. However, it's not going to be part of C++0x, as far as I know.
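For comparison, a minimal sketch of the standard C++ alternatives touched on above; the sizes are illustrative:

#include <array>
#include <cstddef>
#include <vector>

void demo(std::size_t n) {
    // Size known at compile time: a fixed-size array that can live on the stack,
    // roughly the "static array" case in the argument above.
    std::array<int, 16> fixed{};

    // Size known only at run time: std::vector allocates its elements on the
    // heap, unlike a C99 VLA, so a large n does not risk a stack overflow.
    std::vector<int> dynamic(n, 0);

    fixed[0] = static_cast<int>(dynamic.size());
}

int main() {
    demo(1000);
}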

The effects of writing past the end of an array

It's undefined behavior, and anything can happen.

There are many more variables to take into account than just the compiler: its version, the OS, the hardware, the weather, what day of the week it is, and so on.

The standard says that undefined behavior can mean anything, so you can't really have any expectations, not even with the same compiler.

If, for example, you had a different variable located just after test.a, you could get an access violation. Or you could simply overwrite that variable. Anything goes.

Basically, it's not the writing part that is undefined in this case, but the access

test.a[i]

with i >= 128. It's simply not permitted.
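A hypothetical layout illustrating the point; the member names and sizes here are assumptions, not the questioner's actual code:

#include <iostream>

struct Test {
    int a[128]; // valid indices are 0..127
    int b;      // may happen to sit directly after a in memory
};

int main() {
    Test test{};
    test.b = 42;

    int i = 128;
    // Undefined behaviour: this might overwrite test.b, might crash, or might
    // appear to do nothing. No particular outcome can be relied on, which is
    // why the line is left commented out here.
    // test.a[i] = 7;
    (void)i;

    std::cout << test.b << '\n';
}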

Array vs. Object efficiency in JavaScript

The short version: Arrays are mostly faster than objects. But there is no 100% correct solution.

Update 2017 - Test and Results

var a1 = [{id: 29938, name: 'name1'}, {id: 32994, name: 'name1'}];

var a2 = [];
a2[29938] = {id: 29938, name: 'name1'};
a2[32994] = {id: 32994, name: 'name1'};

var o = {};
o['29938'] = {id: 29938, name: 'name1'};
o['32994'] = {id: 32994, name: 'name1'};

for (var f = 0; f < 2000; f++) {
    var newNo = Math.floor(Math.random() * 60000 + 10000);
    if (!o[newNo.toString()]) o[newNo.toString()] = {id: newNo, name: 'test'};
    if (!a2[newNo]) a2[newNo] = {id: newNo, name: 'test'};
    a1.push({id: newNo, name: 'test'});
}


Original Post - Explanation

There are some misconceptions in your question.

There are no associative arrays in JavaScript, only Arrays and Objects.

These are arrays:

var a1 = [1, 2, 3];
var a2 = ["a", "b", "c"];
var a3 = [];
a3[0] = "a";
a3[1] = "b";
a3[2] = "c";

This is an array, too:

var a3 = [];
a3[29938] = "a";
a3[32994] = "b";

It's basically an array with holes in it, because every array has continuous indexing. It's slower than an array without holes, but iterating manually through the array is even slower (mostly).

This is an object:

var a3 = {};
a3[29938] = "a";
a3[32994] = "b";

Here is a performance test of three possibilities:

Lookup Array vs Holey Array vs Object Performance Test

An excellent read about these topics at Smashing Magazine: Writing fast memory efficient JavaScript

Performance of Arrays vs. Lists

Very easy to measure...

In a small amount of tight-loop processing code where I know the length is fixed, I use arrays for that extra tiny bit of micro-optimisation; arrays can be marginally faster if you use the indexer/for form, but IIRC I believe it depends on the type of data in the array. But unless you need to micro-optimise, keep it simple and use List<T> etc.

Of course, this only applies if you are reading all of the data; a dictionary would be quicker for key-based lookups.

Here's my results using "int" (the second number is a checksum to verify they all did the same work):

(edited to fix bug)

List/for: 1971ms (589725196)
Array/for: 1864ms (589725196)
List/foreach: 3054ms (589725196)
Array/foreach: 1860ms (589725196)

Based on this test rig:

using System;
using System.Collections.Generic;
using System.Diagnostics;

static class Program
{
    static void Main()
    {
        List<int> list = new List<int>(6000000);
        Random rand = new Random(12345);
        for (int i = 0; i < 6000000; i++)
        {
            list.Add(rand.Next(5000));
        }
        int[] arr = list.ToArray();

        int chk = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            int len = list.Count;
            for (int i = 0; i < len; i++)
            {
                chk += list[i];
            }
        }
        watch.Stop();
        Console.WriteLine("List/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            for (int i = 0; i < arr.Length; i++)
            {
                chk += arr[i];
            }
        }
        watch.Stop();
        Console.WriteLine("Array/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in list)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("List/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in arr)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("Array/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        Console.ReadLine();
    }
}

