Why Does Array.Slice Behave Differently for (Length, N)

Consider this:

a = [0, 1, 2, 3] #=> [0, 1, 2, 3]
a[0, 10] #=> [0, 1, 2, 3]
a[1, 10] #=> [1, 2, 3]
a[2, 10] #=> [2, 3]
a[3, 10] #=> [3]
a[4, 10] #=> []
a[5, 10] #=> nil

So a[4, 10] is the slice between the 3 and the end of the array, which is [].

Whereas a[4] and a[5, 10] access elements that aren't in the array.

It may help to think of the slice points as being between the elements, rather than the elements themselves.

[ <0> 0 <1> 1 <2> 2 <3> 3 <4> ]

Where <n> are the points between elements and the start/end of the array. a[4, 10] then becomes a selection of up to 10 elements, starting from point 4. Point 4 exists (it is the end of the array), so the result is an empty array. Whereas a[5, 10] starts from point 5, which is not part of the list, so the result is nil.
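The "slice points" idea above can be sketched in Python. This is only a rough model, not Ruby's implementation: `ruby_slice` is a hypothetical helper, it assumes non-negative `start` and `length`, and it uses `None` to stand in for Ruby's `nil`.

```python
def ruby_slice(items, start, length):
    """Hypothetical model of Ruby's a[start, length] for non-negative arguments."""
    # A list of n elements has slice points 0..n; point n is the end of the list.
    if start > len(items):
        return None  # start is not a valid slice point (Ruby returns nil)
    # Python slicing already clamps the end, so asking for too many items is fine.
    return items[start:start + length]


a = [0, 1, 2, 3]
for start in range(6):
    print(start, ruby_slice(a, start, 10))
# start 4 prints [] and start 5 prints None, matching the Ruby session above
```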

Why do Array#slice and Array#slice! behave differently?

#slice and #slice! behaviors are equivalent: both "return a subarray starting at the start index and continuing for length elements", the same way as #sort and #sort! return a sorted array or #reverse and #reverse! return a reversed array.

The difference is that the bang methods also modify the object itself.

a = [4,2,6,9,1,5,8]
b = a.dup
a.sort == b.sort! # => true
a == b # => false

b = a.dup
a.reverse == b.reverse! # => true
a == b # => false

b = a.dup
a.slice(2,2) == b.slice!(2,2) # => true
a == b # => false
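For readers more used to Python, the same "return the subarray, but also modify the receiver" idea might look roughly like this. `slice_bang` is a hypothetical helper written for illustration; Python lists have no built-in method with this name.

```python
def slice_bang(items, start, length):
    """Hypothetical analogue of Ruby's slice!: remove and return a sub-list."""
    removed = items[start:start + length]  # same value a plain slice returns
    del items[start:start + length]        # the extra step: mutate the list in place
    return removed


a = [4, 2, 6, 9, 1, 5, 8]
b = list(a)

print(a[2:4] == slice_bang(b, 2, 2))  # True: both return [6, 9]
print(a == b)                         # False: b lost two elements
print(b)                              # [4, 2, 1, 5, 8]
```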

Go Slice - difference between [:n] and [n:]

Subslicing in Go allows you to slice beyond the end of the slice, as long as it's still within range of the underlying array's capacity. You cannot slice before the start of that slice, but you can slice past its end so long as you don't go past the last allocated index.

As an example, s[3:] then s[:3] works, but s[4:] then s[:4] will panic, as you're requesting indexes 4 through 7 of the underlying array, which only has allocated indexes 0-5.

It's a bit of an oddity, but it does allow you to max out any slice simply by doing slice = slice[:cap(slice)].

https://play.golang.org/p/Gq5xoXc3Vd

The language specification notes this, btw. I've paraphrased it below for the simple slice notation you're using (there's an alternative that also specifies the maximum index for the new slice).

For a string, array, pointer to array, or slice a, the primary expression a[low : high] constructs a substring or slice.
The indices are in range if 0 <= low <= high <= cap(a),
otherwise they are out of range.
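To make the bounds rule concrete, here is a toy Python model of a Go slice as a window over a shared backing list. The `GoSlice` class and its method names are invented for this sketch, and `IndexError` stands in for a Go run-time panic; it assumes a six-element backing array, as in the s[3:] / s[4:] example above.

```python
class GoSlice:
    """Toy model of a Go slice: (offset, length) over a shared backing list."""

    def __init__(self, backing, offset=0, length=None):
        self.backing = backing
        self.offset = offset
        self.length = len(backing) - offset if length is None else length

    @property
    def cap(self):
        # Capacity runs from the slice's start to the end of the backing array.
        return len(self.backing) - self.offset

    def reslice(self, low=0, high=None):
        # Models s[low:high]; a missing high defaults to len(s), as in Go.
        high = self.length if high is None else high
        if not (0 <= low <= high <= self.cap):
            raise IndexError(f"slice bounds out of range [{low}:{high}], cap {self.cap}")
        return GoSlice(self.backing, self.offset + low, high - low)

    def to_list(self):
        return self.backing[self.offset:self.offset + self.length]


s = GoSlice([0, 1, 2, 3, 4, 5])
print(s.reslice(3).reslice(0, 3).to_list())  # [3, 4, 5]: s[3:] then [:3] fits in cap 3

try:
    s.reslice(4).reslice(0, 4)               # s[4:] has cap 2, so [:4] is out of range
except IndexError as err:
    print("panic:", err)

short = s.reslice(0, 2)
print(short.reslice(0, short.cap).to_list())  # [0, 1, 2, 3, 4, 5]: slice[:cap(slice)]
```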

JavaScript array.slice() behavior

Array.prototype.slice returns an array. And if you compare an array and a number, they will be different.

Instead, consider doing what Underscore does:

if (n == null /*|| guard*/) return array[array.length - 1];
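The same distinction (a single element versus a one-element array) can be sketched in Python. The `last` function below is loosely modeled on the Underscore line quoted above; it is an illustrative assumption, not a port of the library's code.

```python
def last(array, n=None):
    """Return the last element, or the last n elements as a list when n is given."""
    if n is None:
        # No count requested: return the element itself, not a one-element list.
        return array[-1] if array else None
    # A count was requested: always return a list, even when n == 1.
    return array[max(len(array) - n, 0):]


values = [1, 2, 3]
print(last(values))          # 3
print(last(values, 1))       # [3]
print(last(values, 1) == 3)  # False: a list never equals a number
```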

Understanding slicing

The syntax is:

a[start:stop]  # items start through stop-1
a[start:]      # items start through the rest of the array
a[:stop]       # items from the beginning through stop-1
a[:]           # a copy of the whole array

There is also the step value, which can be used with any of the above:

a[start:stop:step] # start through not past stop, by step

The key point to remember is that the :stop value represents the first value that is not in the selected slice. So, the difference between stop and start is the number of elements selected (if step is 1, the default).
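A quick check of that rule on a concrete list (the list contents and the index values are arbitrary choices for the example):

```python
a = ["a", "b", "c", "d", "e", "f"]
start, stop = 1, 4

selected = a[start:stop]
print(selected)                       # ['b', 'c', 'd']
print(len(selected) == stop - start)  # True
print(a[stop])                        # 'e': the first value NOT in the slice

# With a step other than 1, range() gives the number of selected items.
step = 2
print(a[start:stop:step])                                       # ['b', 'd']
print(len(a[start:stop:step]) == len(range(start, stop, step)))  # True
```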

The other feature is that start or stop may be a negative number, which means it counts from the end of the array instead of the beginning. So:

a[-1]   # last item in the array
a[-2:]  # last two items in the array
a[:-2]  # everything except the last two items

Similarly, step may be a negative number:

a[::-1]    # all items in the array, reversed
a[1::-1]   # the first two items, reversed
a[:-3:-1]  # the last two items, reversed
a[-3::-1]  # everything except the last two items, reversed
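Running the negative-index and negative-step forms on a ten-element list makes the results easy to compare (the list is just an example):

```python
a = list(range(10))  # [0, 1, 2, ..., 9]

print(a[-1])      # 9
print(a[-2:])     # [8, 9]
print(a[:-2])     # [0, 1, 2, 3, 4, 5, 6, 7]

print(a[::-1])    # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(a[1::-1])   # [1, 0]
print(a[:-3:-1])  # [9, 8]
print(a[-3::-1])  # [7, 6, 5, 4, 3, 2, 1, 0]
```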

Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.
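A small demonstration of that leniency, and of how it differs from plain indexing, which does raise:

```python
a = [1]

print(a[:-2])   # []: no error, even though there are not two items to drop
print(a[5:10])  # []: slicing past the end is also silently empty

try:
    a[5]        # indexing (not slicing) out of range raises
except IndexError as err:
    print("IndexError:", err)

# If you would rather get an error, check the length yourself before slicing.
if len(a) < 2:
    print("fewer than two items; a[:-2] would be empty")
```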

Relationship with the slice object

A slice object can represent a slicing operation, i.e.:

a[start:stop:step]

is equivalent to:

a[slice(start, stop, step)]

Slice objects also behave slightly differently depending on the number of arguments, similarly to range(), i.e. both slice(stop) and slice(start, stop[, step]) are supported.
To skip specifying a given argument, one might use None, so that e.g. a[start:] is equivalent to a[slice(start, None)] or a[::-1] is equivalent to a[slice(None, None, -1)].

While the :-based notation is very helpful for simple slicing, the explicit use of slice() objects simplifies the programmatic generation of slicing.
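As a sketch of what "programmatic generation" can look like, the snippet below builds slice objects in a loop to cut a list into chunks. The chunk size of 3 is an arbitrary choice for the example.

```python
a = list(range(10))

# slice(stop) and slice(None, stop) describe the same slice.
print(slice(3) == slice(None, 3, None))  # True
print(a[slice(None, None, -1)] == a[::-1])  # True

# Build the slices first, apply them later.
chunk_size = 3
chunks = [slice(i, i + chunk_size) for i in range(0, len(a), chunk_size)]
print([a[c] for c in chunks])  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

# slice.indices(length) resolves None and negative values for a given length.
print(slice(None, None, -1).indices(len(a)))  # (9, -1, -1)
```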

Go slice length is capacity -1, why?

This line:

fruits := [4]string{"apple", "orange", "mango"}

Creates an array, not a slice. It has 4 elements even though you only supplied 3. Output of fmt.Printf("%q", fruits):

["apple" "orange" "mango" ""]

Slicing it:

tasty_fruits := fruits[1:3]

Results in:

["orange" "mango"]

Length: obviously 2. Capacity?

The capacity is ... the sum of the length of the slice and the length of the [underlying] array beyond the slice.

Since there is one element after "mango" in the underlying array, capacity is 2 + 1 = 3.
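The arithmetic can be written down as a tiny Python helper. `len_and_cap` is a hypothetical function for illustration; it only mirrors the rule for slicing an array with array[low:high].

```python
def len_and_cap(array_len, low, high):
    """Length and capacity of array[low:high] for an array of array_len elements."""
    if not (0 <= low <= high <= array_len):
        raise IndexError("slice bounds out of range")
    length = high - low           # elements visible through the slice
    capacity = array_len - low    # from the slice's start to the end of the array
    return length, capacity


# fruits has 4 elements; tasty_fruits := fruits[1:3]
print(len_and_cap(4, 1, 3))  # (2, 3)
```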

As for indexing the slice (tasty_fruits), the spec's section on index expressions says:

For a of slice type S: a[x]

  • if x is out of range at run time, a run-time panic occurs

x is in range if 0 <= x < len(a), otherwise it is out of range. Since len(tasty_fruits) is 2, the index 2 is out of range, and therefore a run-time panic occurs.

You can't index the slice beyond the length of the slice, even if capacity would allow it. You can only reach the elements beyond the length if you reslice the slice, e.g.:

tasty_fruits2 := tasty_fruits[:3]
tasty_fruits2[2] = "nectarine" // This is ok, len(tasty_fruits2) = 3
fmt.Printf("%q", tasty_fruits2)

Output:

["orange" "mango" "nectarine"]

numpy.array slicing behaviour

Because NumPy is a high-performance data container. To slice a list, Python must construct a new list, copy the pointer to each selected element into it, increment each element's reference count, and then return the new list. NumPy (likely) simply offsets the start of the array and adjusts its length.

NumPy slicing

Think of a NumPy array as something like this (yes, this is highly oversimplified):

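typedef struct array
{
    size_t type_size;  /* size in bytes of one element */
    size_t length;     /* number of elements */
    void* start;       /* address of the first element */
} array;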

If you don't know C, this basically means that an array can be thought of as an address in memory designating the start of the array, plus the size of each element it stores and the number of elements in the buffer. For an integer array, we might have a type_size of 4 and, in this example, a length of 5 (for a buffer of 20 bytes).

When slicing, rather than copy the entire data, NumPy can simply increment the start and reduce the size.

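array slice(const array* src, size_t start, size_t end)
{
    array arr = *src;  /* copy the small header, not the data */
    /* advance by whole elements, so scale the offset by the element size */
    arr.start = (char*)arr.start + start * arr.type_size;
    arr.length = end - start;
    return arr;
}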

This is dramatically cheaper than allocating memory for a new list and then assigning (and incrementing, Python is reference counted) those pointers into the list.
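The "slice is just a new header over the same buffer" idea can be tried with the standard library alone. The sketch below uses `memoryview` over an `array.array` as a stand-in; it is an analogy rather than NumPy itself, though NumPy's basic slicing likewise returns a view onto the original data.

```python
from array import array

buf = array("i", [10, 20, 30, 40, 50])  # contiguous buffer of C ints

# Slicing a memoryview creates a new view object; no element data is copied.
view = memoryview(buf)[1:4]
print(view.tolist())  # [20, 30, 40]

# Writing through the view changes the original buffer, proving it is shared.
view[0] = 99
print(buf.tolist())   # [10, 99, 30, 40, 50]

# Slicing the array itself (like slicing a list) copies the selected items.
copied = buf[1:4]
copied[0] = -1
print(buf.tolist())   # [10, 99, 30, 40, 50]: unchanged by the copy
```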

Python Slicing

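Here's a simplified sketch of what slicing a Python list involves, written against the CPython C API:

#include <Python.h>

PyObject* slice(PyObject* list, Py_ssize_t start, Py_ssize_t end)
{
    Py_ssize_t length = end - start;
    PyObject* out = PyList_New(length);  /* new list with empty slots */
    if (out == NULL) return NULL;
    for (Py_ssize_t i = start; i < end; ++i) {
        PyObject* item = PyList_GetItem(list, i);  /* borrowed reference */
        Py_INCREF(item);                           /* the new list needs its own reference */
        PyList_SetItem(out, i - start, item);      /* steals that reference */
    }
    return out;
}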

Notice how much more involved this is? And a lot more goes on under the hood.
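From the Python side, the effect of that loop is observable: a list slice is a new list whose slots point at the same element objects (a shallow copy). A minimal sketch, using nested lists only so that object identity is easy to see:

```python
a = [[1], [2], [3], [4]]
b = a[1:3]

print(b is a)        # False: the slice is a separate list object
print(b[0] is a[1])  # True: but its slots refer to the same elements

# Rebinding a slot in the copy does not touch the original list...
b[0] = "changed"
print(a)             # [[1], [2], [3], [4]]

# ...while mutating a shared element is visible through both lists.
b[1].append(99)
print(a)             # [[1], [2], [3, 99], [4]]
```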

Rationale

Think performance: for a NumPy slice to behave like a Python list slice (an independent copy), it must occupy a new address in memory (since the data is contiguous in memory). This would mean copying the data, likely via memcpy(). This is expensive: say I have an array of 20,000 np.int32 (~80 KB), I would need to copy all this data to a new array. In the slice example above, I only copy ~24 bytes worth of memory (assuming 8-byte size_t and pointers).


