Why Is `arr = []` Faster Than `arr = new Array`?

Why is arr = [] faster than arr = new Array?

Further expanding on previous answers...

From a general compilers perspective and disregarding VM-specific optimizations:

First, we go through the lexical analysis phase where we tokenize the code.

By way of example, the following tokens may be produced:

[]: ARRAY_INIT
[1]: ARRAY_INIT (NUMBER)
[1, foo]: ARRAY_INIT (NUMBER, IDENTIFIER)
new Array: NEW, IDENTIFIER
new Array(): NEW, IDENTIFIER, CALL
new Array(5): NEW, IDENTIFIER, CALL (NUMBER)
new Array(5,4): NEW, IDENTIFIER, CALL (NUMBER, NUMBER)
new Array(5, foo): NEW, IDENTIFIER, CALL (NUMBER, IDENTIFIER)

Hopefully this provides enough of a visualization to show how much more (or less) processing each form requires.

  1. Based on the above tokens, we know for a fact that ARRAY_INIT will always produce an array, so we simply create an array and populate it. As for ambiguity: the lexical analysis stage has already distinguished ARRAY_INIT from an object property accessor (e.g. obj[foo]) and from brackets inside string or regex literals (e.g. "foo[]bar" or /[]/).

  2. This is minuscule, but we also have more tokens with new Array. Furthermore, it's not yet clear that we simply want to create an array. We see the "new" token, but "new" what? We then see the IDENTIFIER token, which signifies we want a new "Array" -- but JavaScript VMs generally do not distinguish an IDENTIFIER token from tokens for "native global objects." Therefore...

  3. We have to look up the scope chain each time we encounter an IDENTIFIER token. JavaScript VMs contain an "activation object" for each execution context, which may hold the arguments object, locally defined variables, and so on. If we cannot find the identifier in the activation object, we look up the scope chain until we reach the global scope. If nothing is found, we throw a ReferenceError.

  4. Once we've located the declaration, we invoke the constructor. new Array is an implicit function call, and the rule of thumb is that function calls are slower at execution time (which is why static C/C++ compilers allow "function inlining" -- something JS JIT engines such as SpiderMonkey have to do on the fly).

  5. The Array constructor is overloaded. It is implemented as native code, so it provides some performance enhancements, but it still needs to check the arguments length and act accordingly. Moreover, when only a single argument is supplied, we need to further check its type: new Array("foo") produces ["foo"], whereas new Array(1) produces an empty array of length 1 (see the sketch below).
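
A quick sketch of that overloaded behavior, easy to verify in any console:

// One numeric argument sets the length; anything else becomes an element.
new Array(3);      // array of length 3 with three empty slots
new Array("3");    // ["3"] -- a single non-numeric argument is the sole element
new Array(3, 4);   // [3, 4] -- two or more arguments become the elements
[3];               // [3] -- the literal is never ambiguous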

So to simplify it all: with array literals, the VM knows we want an array; with new Array, the VM needs to use extra CPU cycles to figure out what new Array actually does.

Is it better to write var arr = []; than var arr = new Array();?

There is not a big difference between these definitions, except that the first uses the array literal and the second the Array constructor.

The Array constructor may return different results depending on the number of arguments passed in. With two or more arguments, the arguments simply become the elements of the new array. For example:

// arr1 is the same as arr2
var arr1 = new Array(1, 2, 3, 4);
var arr2 = [1, 2, 3, 4];

alert(arr1.length == arr2.length); // true
alert(arr1[0]); // 1
alert(arr2[0]); // 1

But passing in a single numeric argument gives a different result: a new empty array is created whose length is that argument:

// arr3 has length 200 and is empty, while arr4 has length 1 and contains a number
var arr3 = new Array(200);
var arr4 = [200];

alert(arr3.length == arr4.length); // false
alert(arr3[0]); // undefined
alert(arr4[0]); // 200

The speediest way to define an array or object is, of course, the literal notation, because you don't need to call the constructor first. That said, the actual speed difference is negligible.

I did a speed test in Chrome 6 in which I defined the same array, [1, 2, 3], 10,000,000 times across 20 runs, which gave these results:

Average speed per 10,000,000 calls:

Array constructor: 226.55 ms
Array literal:     159.10 ms

As you can see, the array literal is 67.45 ms faster per 10,000,000 array definitions.
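
For reference, a minimal sketch of that kind of test (the harness here is my own; numbers will vary by browser, and modern JITs may optimize away unused allocations):

function time(label, fn) {
    var start = Date.now();
    for (var i = 0; i < 10000000; i++) fn();
    console.log(label + ': ' + (Date.now() - start) + ' ms');
}

time('Array constructor', function () { return new Array(1, 2, 3); });
time('Array literal',     function () { return [1, 2, 3]; });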

Why is $arr[] = 'x' faster than $arr[0] = 'x'?

With $arr[0] = 'x', PHP has to work out which index is being set and then check whether it is updating an existing element or adding a new one.

With $arr[] = 'x', the new element is always appended to the end of the array, so no index check is required. This is basically how a stack works.

Why use Lists, when Arrays are faster?

How can Array be so fast even though it has to copy around items continuously?

Arrays are faster for linear processing because array contents are stored contiguously in memory. When you access memory linearly, multiple objects are fetched into the processor cache at once. Linked list nodes, on the other hand, are scattered throughout memory, so processing them linearly results in more accesses to main memory. Reading from cache is much, much faster than reading from main memory.
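
The effect is easy to glimpse even from JavaScript, though boxing and GC muddy the picture compared to C or C++; a hedged sketch:

var n = 1000000;

// Contiguous storage: elements sit next to each other in memory.
var arr = new Array(n);
for (var i = 0; i < n; i++) arr[i] = i;

// Linked list: nodes are separate heap objects, scattered around.
var head = null;
for (var j = n - 1; j >= 0; j--) head = { value: j, next: head };

console.time('array traversal');
var sum1 = 0;
for (var k = 0; k < n; k++) sum1 += arr[k];
console.timeEnd('array traversal');

console.time('list traversal');
var sum2 = 0;
for (var node = head; node !== null; node = node.next) sum2 += node.value;
console.timeEnd('list traversal');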

And why even use Lists then?

One major reason to use a linked list is that inserting new elements, or removing existing ones, does not invalidate references (including iterators and pointers) to other elements in the list. An array cannot make such a guarantee.

Why is array.push sometimes faster than array[n] = value?

All sorts of factors come into play; most JS implementations use a flat array that converts to sparse storage if that becomes necessary later on.

Basically the decision to become sparse is a heuristic based on what elements are being set, and how much space would be wasted in order to remain flat.

In your case you are setting the last element first, which means the engine sees an array that needs a length of n but contains only a single element. If n is large enough, this immediately makes the array sparse -- and in most engines, all subsequent insertions will then take the slow sparse-array path.

You should add an additional test in which you fill the array from index 0 to index n-1 -- it should be much, much faster.
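
Something along these lines (a sketch only; n is arbitrary and timings vary by engine):

var n = 1000000;

console.time('last element first');
var a = [];
a[n - 1] = 0;                          // length jumps to n with one element: likely goes sparse
for (var i = 0; i < n - 1; i++) a[i] = i;
console.timeEnd('last element first');

console.time('index 0 upward');
var b = [];
for (var j = 0; j < n; j++) b[j] = j;  // stays dense/flat throughout
console.timeEnd('index 0 upward');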

In response to @Christoph and out of a desire to procrastinate, here's a description of how arrays are (generally) implemented in JS -- specifics vary from JS engine to JS engine but the general principle is the same.

All JS objects (so not strings, numbers, true, false, undefined, or null) inherit from a base object type -- the exact implementation varies; it could be C++ inheritance or done manually in C (there are benefits to either approach) -- and the base Object type defines the default property access methods, e.g.:

interface Object {
    put(propertyName, value)
    get(propertyName)
private:
    map properties; // a map (tree, hash table, whatever) from propertyName to value
}

This Object type handles all the standard property-access logic, the prototype chain, etc.
The Array implementation then becomes:

interface Array : Object {
    override put(propertyName, value)
    override get(propertyName)
private:
    map sparseStorage;   // a map between integer indices and values
    value[] flatStorage; // basically a native array of values with a 1:1
                         // correspondence between JS index and storage index
    value length;        // the `length` of the JS array
}

Now when you create an Array in JS, the engine creates something akin to the above data structure. When you insert a value into the Array instance, the Array's put method checks whether the property name is an integer (or can be converted into one, e.g. "121", "2341", etc.) between 0 and 2^32-1 (or possibly 2^31-1, I forget exactly). If it is not, the put method is forwarded to the base Object implementation, and the standard [[Put]] logic runs.

Otherwise the value is placed into the Array's own storage. If the data is sufficiently compact, the engine uses the flat array storage, in which case insertion (and retrieval) is just a standard array indexing operation; otherwise the engine converts the array to sparse storage, and put/get use a map to get from propertyName to value location.
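
That dispatch is observable from plain JS:

var arr = [];
arr[0] = 'a';     // integer index: handled by the Array's own storage
arr['1'] = 'b';   // "1" converts to an array index, so it behaves the same way
arr['foo'] = 'c'; // not an index: forwarded to the ordinary object [[Put]] path
console.log(arr.length);       // 2 -- 'foo' never touched the index storage
console.log(Object.keys(arr)); // ['0', '1', 'foo']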

I'm honestly not sure if any JS engine currently converts from sparse to flat storage after that conversion occurs.

Anyhoo, that's a fairly high-level overview of what happens, and it leaves out a number of the ickier details, but that's the general implementation pattern. The specifics of how the additional storage is handled, and of how put/get are dispatched, differ from engine to engine -- but this is the clearest way I can describe the design/implementation.

A minor additional point: while the ES spec refers to propertyName as a string, JS engines tend to specialize on integer lookups as well, so someObject[someInteger] will not convert the integer to a string if you're looking at an object that has integer properties, e.g. Array, String, and DOM types (NodeLists, etc.).
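
The spec-level behavior (as opposed to any engine-internal fast path, which isn't observable from script) looks like this:

var obj = {};
obj[1] = 'x';                  // the key 1 is canonicalized to the string "1"
console.log(obj['1']);         // 'x' -- same property
console.log(Object.keys(obj)); // ['1'] -- stored under a string key, per the spec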

What is faster in Ruby, `arr += [x]` or `arr << x`?

I believe this is down to how MRI allocates arrays (all of this answer is very MRI-specific). Ruby tries really hard to be efficient with arrays: small arrays (<= 3 elements), for example, are packed right into the RARRAY structure.

Another thing is that if you take an array and start appending values one at a time, ruby doesn't grow the buffer one element at a time; it grows it in chunks. This is more efficient, at the expense of a small amount of memory.

One tool for seeing all this is ObjectSpace.memsize_of:

ObjectSpace.memsize_of([])             #=> 40 (on 64-bit platforms)
ObjectSpace.memsize_of([1,2])          #=> 40 (on 64-bit platforms)
ObjectSpace.memsize_of([1,2,3,4])      #=> 72
ObjectSpace.memsize_of([]<<1<<2<<3<<4) #=> 200

In the first 2 cases, the array is packed within the RARRAY structure, so the memory size is just the base size of any object (40 bytes). In the third case, ruby had to allocate a buffer for 4 values (8 bytes each), so the size is 40 + 32 = 72. In the last case, ruby grew the storage to 20 elements.

This is relevant to the second case. The block inside the benchmark has a freshly created array which still has some spare capacity:

ObjectSpace.memsize_of((0..20).to_a) #=> 336, enough for nearly 40 elements

<< can just write its object to the appropriate slot, whereas += has to allocate a new array (both the object and its buffer) and copy all the data.

If I do

a = [1,2,3,4,5]
b = a.dup
ObjectSpace.memsize_of(b) #=> 40

Here b shares its buffer with a, so it is reported as using no memory beyond the basic object size. At the point where b gets written to, ruby will have to copy the data (copy-on-write): in the first benchmark, BOTH += and << are actually allocating a new buffer of sufficient size and copying all the data across.

Here is where I get handwavy: this would completely explain things if << and += performed identically, but that's not what is happening. My reading is that += is simpler. All it has to do, no matter what, is allocate a buffer and memcpy some data from 2 locations -- this is fast.

<<, on the other hand, is mutating the array, so it pays the copy-on-write overhead: it does extra work compared to +=. For example, ruby needs to track who is sharing buffers so that garbage collection of the original array is possible once no one is sharing it anymore.

A benchmark that goes some way toward convincing me that this interpretation is correct is as follows:

require 'benchmark/ips'
b = (0..20).to_a.dup
y = 21
Benchmark.ips do |x|
  x.report('<<')  { a = b.dup; a << y }
  x.report('+=')  { a = b.dup; a += [y] }

  x.report('<<2') { a = b.dup; a << y; a << y }
  x.report('+=2') { a = b.dup; a += [y]; a += [y] }
end

This is basically the same benchmark as the original, but now appending 2 elements. For <<, the copy-on-write overhead will only be incurred the first time. The results I get are:

<<     1.325M (± 7.6%) i/s -  6.639M
+=     1.742M (± 9.5%) i/s -  8.677M
<<2    1.230M (±10.3%) i/s -  6.079M
+=2    1.140M (±10.8%) i/s -  5.656M

So appending with << is back on top if you do it twice.

Why is filling an array faster when you specify its size?

If you don't specify the array size, the runtime has to keep allocating more space as the array grows. But if you specify the size at the beginning, it only allocates once.
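
A rough way to see the difference (a sketch only; whether preallocation helps, and by how much, varies a lot by language and engine):

var n = 1000000;

console.time('preallocated');
var pre = new Array(n);                    // backing store sized once, up front
for (var i = 0; i < n; i++) pre[i] = i;
console.timeEnd('preallocated');

console.time('grown');
var grown = [];
for (var j = 0; j < n; j++) grown.push(j); // buffer reallocated in chunks as it grows
console.timeEnd('grown');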


