How to Return Subscript in Constant Time

The collection's Index does not have to be an Int. One possible approach
is to use a custom index type which holds a reference to the corresponding
list node. However, this requires the list nodes to be instances of a class
(a reference type).

Here is something that I came up with. It can probably be improved,
but hopefully demonstrates the idea.

class ListNode stores the element, a pointer to the next node, and an
increasing integer ordinal; the ordinal is what allows struct ListIndex
to adopt the Comparable protocol.

struct ListIndex holds a reference to a list node, or nil for endIndex.

struct LinkedListCollection<T>: Collection {

    class ListNode {
        let element: T
        let next: ListNode?
        let ordinal: Int

        init(element: T, next: ListNode?, ordinal: Int) {
            self.element = element
            self.next = next
            self.ordinal = ordinal
        }

        // Create a ListNode as the head of a linked list with elements from an iterator.
        convenience init?<I: IteratorProtocol>(it: inout I, ordinal: Int = 0) where I.Element == T {
            if let el = it.next() {
                self.init(element: el, next: ListNode(it: &it, ordinal: ordinal + 1), ordinal: ordinal)
            } else {
                return nil
            }
        }
    }

    struct ListIndex: Comparable {
        let node: ListNode?

        static func <(lhs: ListIndex, rhs: ListIndex) -> Bool {
            // Compare indices according to the ordinal of the referenced
            // node. `nil` (corresponding to `endIndex`) is ordered last.
            switch (lhs.node?.ordinal, rhs.node?.ordinal) {
            case let (l?, r?):
                return l < r
            case (_?, nil):
                return true
            default:
                return false
            }
        }

        static func ==(lhs: ListIndex, rhs: ListIndex) -> Bool {
            return lhs.node?.ordinal == rhs.node?.ordinal
        }
    }

    let startIndex: ListIndex
    let endIndex: ListIndex

    // Create the collection as a linked list from the given elements.
    init<S: Sequence>(elements: S) where S.Iterator.Element == T {
        var it = elements.makeIterator()
        startIndex = ListIndex(node: ListNode(it: &it))
        endIndex = ListIndex(node: nil)
    }

    func index(after i: ListIndex) -> ListIndex {
        guard let next = i.node?.next else {
            return endIndex
        }
        return ListIndex(node: next)
    }

    subscript(position: ListIndex) -> T {
        guard let node = position.node else {
            fatalError("index out of bounds")
        }
        return node.element
    }
}

Example usage:

let coll = LinkedListCollection(elements: [1, 1, 2, 3, 5, 8, 13])
for idx in coll.indices {
    print(coll[idx])
}
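
Since each ListIndex keeps a direct reference to its node, looking up an element through an index you already have is constant-time; only advancing to that index is linear, as with any forward-only collection. A small illustration (my own addition, using the Collection default index(_:offsetBy:)):

let third = coll.index(coll.startIndex, offsetBy: 2)   // walking to the index is O(k)
print(coll[third])                                     // prints 2; the subscript itself is O(1)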

Good behavior for subscript

If you're implementing a subscript on String, you might want to first think about why the standard library chooses not to.

When you call self.startIndex.advancedBy(index), you're effectively writing something like this:

var i = self.startIndex
for _ in 0..<index { i = i.successor() }

This occurs because String.CharacterView.Index is not a random-access index type (see the docs on advancedBy). String indices aren't random-access because each Character in a string may occupy a different number of bytes in the string's underlying storage — you can't just get character n by jumping n * characterSize into the storage like you can with a C string.
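
As a quick aside (my own illustration, not part of the original answer), you can see this variable-width storage directly by counting the UTF-8 bytes of a few one-character strings:

for ch in ["a", "€", "🇨🇦"] {
    print(ch, ch.utf8.count, "bytes")   // 1, 3 and 8 bytes respectively
}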

So, if one were to use your subscript operator to iterate through the characters in a string:

for i in 0..<string.characters.count {
    doSomethingWith(string[i])
}

... you'd have a loop that looks like it runs in linear time, because it looks just like an array iteration — each pass through the loop should take the same amount of time, because each one just increments i and uses a constant-time access to get string[i], right? Nope. The advancedBy call in the first pass through the loop calls successor once, the next calls it twice, and so on... if your string has n characters, the last pass through the loop calls successor n times (even though that recomputes a result that was already produced in the previous pass, which called successor n-1 times). In other words, you've just made an O(n²) operation that looks like an O(n) operation, leaving a performance-cost bomb for whoever else uses your code.

This is the price of a fully Unicode-aware string library.
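
If all you need is to visit every character, a linear-time alternative (a sketch in Swift 3 syntax, not from the original answer) is to iterate the characters or their indices directly, so each index is computed only once:

for character in string.characters {   // just `string` in Swift 4 and later
    doSomethingWith(character)
}

for i in string.characters.indices {
    doSomethingWith(string[i])
}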


Anyhow, to answer your actual question — there are two schools of thought for subscripts and domain checking:

  • Have an optional return type: subscript(index: Index) -> Element?

    This makes sense when there's no sensible way for a client to check whether an index is valid without performing the same work as a lookup — e.g. for a dictionary, finding out whether there's a value for a given key is the same work as finding out what the value for that key is. (One common formulation of this style is sketched after this list.)

  • Require that the index be valid, and make a fatal error otherwise.

    The usual case for this is situations where a client of your API can and should check for validity before accessing the subscript. This is what Swift arrays do, because arrays know their count and you don't need to look into an array to see if an index is valid.

    The canonical test for this is precondition: e.g.

    subscript(index: Index) -> Element {
        precondition(isValid(index), "index must be valid")
        // ... do lookup ...
    }

    (Here, isValid is some operation specific to your class for validating an index — e.g. making sure it's >= 0 and < count.)
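
For completeness, here is one common formulation of the optional-return style from the first bullet (a Swift 4 sketch of my own, not from the original answer): a "safe" subscript that returns nil for an out-of-bounds index.

extension Collection {
    /// Returns the element at `index`, or nil if the index is out of bounds.
    subscript(safe index: Index) -> Element? {
        return indices.contains(index) ? self[index] : nil
    }
}

let letters = ["a", "b", "c"]
letters[safe: 1]   // Optional("b")
letters[safe: 9]   // nil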

In just about any use case, it's not idiomatic Swift to return a "real" value in the case of a bad index, nor is it appropriate to return a sentinel value — separating in-band values from sentinels is the reason Swift has Optionals.

Which of these is more appropriate for your use case is... well, since your use case is problematic to begin with, it's sort of a wash. If you precondition that index < count, you still incur an O(n) cost just to check that (because a String has to examine its contents to figure out which sequences of bytes constitute each character before it knows how many characters it has). If you make your return type optional and return nil after calling advancedBy or count, you've still incurred that O(n) cost.

return an array from subscripting operator overload

I need to return an array from that method

int& PagedArray::operator [] (int position)

The return type that you've given for your function does not match what you need to return: this function returns a reference to a single integer, not an array.

A problem with what you need is that you cannot return an array in C++; that's just not allowed in the language. That is to say, the return type of a function cannot be an array. There's a fairly easy way to get around that, however: you can return instances of classes, and a class can store an array as a member. There is a template for such an array wrapper class in the standard library: std::array.

Here is an example of returning a sub-array:

constexpr std::size_t page_size = 256;

// Requires <array> and <algorithm>.
std::array<int, page_size>
PagedArray::operator[] (int position) {
    std::array<int, page_size> page;
    // Copy one page's worth of elements out of the big array.
    std::copy_n(completeArray + position * page_size, page_size, page.begin());
    return page;
}

Given that you originally tried to return a reference, you may be looking for a way to avoid copying the sub-array by using indirection. A problem is that the sub-array doesn't exist anywhere prior to calling the function, and you cannot return a reference to something that you create inside the function.

Instead of returning a reference to a small array, you can return a subrange that points into the big array. There is more than one option in the standard library: the general-purpose std::ranges::subrange, and std::span, which is specific to contiguous ranges. I recommend using the more specific type, assuming you aren't templatising the type of the big container. Example:

// Requires <span> (C++20).
std::span<int, page_size>
PagedArray::operator[] (int position) {
    // The span refers to the requested page inside the big array; nothing is copied.
    return std::span<int, page_size>{completeArray + position * page_size, page_size};
}

How can I create a subscript with a getter that returns an optional, but a setter that returns a non-optional

It is unfortunately impossible to do in Swift (and in other languages working with dynamic arrays), because the size of the array is dynamic and is not known at compile time (it can be initialized with any value while the program is running).

For example, if the TriangularArray<T> is of size 1, then [0, 0] is a valid element, and the compiler cannot know that in advance.

In the standard library, there's no compile-time error when you try to access an array like array[-1] - this will always compile but will always result in a runtime error.

I think the solution that you currently have, using optionals, is the best option here. You could also be consistent with how Array works, and trigger a fatalError if an invalid subscript is given.

An alternative would be to create a custom Index struct within TriangularArray that represents the triangle's indices and can be optionally built from an x and y value (though this could complicate things quite a bit); a rough sketch follows.
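
Here is that sketch (the names TriangularIndex, height, x and y are my own assumptions; the question's actual types aren't shown): the index can only be constructed when it is in bounds, so a subscript taking it can return a non-optional value.

struct TriangularIndex {
    let x: Int
    let y: Int

    // Failable initializer: returns nil for coordinates outside the triangle.
    init?(x: Int, y: Int, height: Int) {
        guard (0..<height).contains(y), (0...y).contains(x) else { return nil }
        self.x = x
        self.y = y
    }
}

// Bounds checking happens once, when the index is created:
if let index = TriangularIndex(x: 0, y: 2, height: 3) {
    print(index.x, index.y)   // triangularArray[index] could now return a non-optional element
}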

PS: This answer assumes that TriangularArray<T> can have a triangle of an arbitrary height (a height that can be specified at runtime), as it's not specified in the question. If the height is defined at compile time, then the bounds can be hardcoded and, as @Raul Mantilla mentioned, #error may be used.

Return a malloc’ed matrix while being able to use subscript notation

If you want to allocate a buffer of type T, the typical procedure is

T *ptr = malloc( sizeof *ptr * N ); // sizeof *ptr == sizeof (T)

You're allocating enough space for N elements of type T.

Now let's replace T with an array type, R [M]:

R (*ptr)[M] = malloc( sizeof *ptr * N  ); // sizeof *ptr == sizeof (R [M])

You're allocating enough space for N elements of type R [M] - IOW, you've just allocated enough space for an N by M array of R. Note that the semantics are exactly the same as for the array of T above; all that's changed is the type of ptr.

Applying that to your example:

int (*tab)[y] = malloc( sizeof *tab * x );

You can then index tab as you would any 2D array (with 0 <= i < x and 0 <= j < y):

tab[i][j] = new_value();

Edit

Answering the comment:

yet, still, I’m not sure to understand: what’s the meaning of the “(*tab)” syntax? it’s not a function pointer I guess, but why wouldn’t *tab without parenthesis work: what’s the actual different meaning? why doesn’t it work and what does change then?

The subscript [] and function call () operators have higher precedence than unary *, so a declaration like

int *a[N];

is parsed as

int *(a[N]);

and declares a as an array of pointers to int. To declare a pointer to an array, you must explicitly group the * operator with the identifier, like so:

int (*a)[N];

This declares a as a pointer to an array of int. The same rule applies to function declarations. Here's a handy summary:

T *a[N];     // a is an N-element array of pointers to T
T (*a)[N];   // a is a pointer to an N-element array of T
T *f();      // f is a function returning pointer to T
T (*f)();    // f is a pointer to a function returning T

In your code,

int *tab[x][y]=malloc(x*y*sizeof(int));

declares tab as a 2D array of pointers, not as a pointer to a 2D array, and a call to malloc(...) is not a valid initializer for a 2D array object.

The syntax

int (*tab)[x][y]=malloc(x*y*sizeof(int));

declares tab as a pointer to a 2D array, and a call to malloc is a valid initializer for it.

But...

With this declaration, you'll have to explicitly dereference tab before indexing into it, like so:

(*tab)[i][j] = some_value();

You're not indexing into tab, you're indexing into what tab points to.

Remember that in C, declaration mimics use - the structure of a declarator in a declaration matches how it will look in the executable code. If you have a pointer to an int and you want to access the pointed-to value, you use the unary * operator:

x = *ptr;

The type of the expression *ptr is int, so the declaration of ptr is written

int *ptr;

Same thing for arrays: if the ith element of an array has type int, then the expression arr[i] has type int, and thus the declaration of arr is written as

int arr[N];

Thus, if you declare tab as

int (*tab)[x][y] = ...;

then to index into it, you must write

(*tab)[i][j] = ...;

The method I showed avoids this. Remember that the array subscript operation a[i] is defined as *(a + i) - given an address a, offset i elements (not bytes!) from a and dereference the result. Thus, the following relationship holds:

*a == *(a + 0) == a[0]

This is why you can use the [] operator on a pointer expression as well as an array expression. If you allocate a buffer as

T *p = malloc( sizeof *p * N );

you can access each element as p[i].

So, given a declaration like

T (*a)[M];

we have the relationship

 (*a)[i] == (*(a + 0))[i] == (a[0])[i] == a[0][i];

Thus, if we allocate the array as

T (*a)[M] = malloc( sizeof *a * N );

then we can index each element of a as

a[i][j] = some_value();

Array.reduce cannot assign through subscript: 'x' is a 'let' constant

Closure parameters (unless declared with inout) are implicitly constants, as if declared with let.
If you want to modify one, then you have to make a mutable copy first:

let stats = fields.reduce([String:String]()) { (res, field) -> [String:String] in
    var res = res
    res[field] = json[field] ?? ""
    return res
}

As of Swift 4 you can use reduce(into:_:):

let stats = fields.reduce(into: [String:String]()) { (res, field) in
    res[field] = json[field] ?? ""
}

Here res is an "inout" parameter and can be mutated in the closure.
This is far more efficient, because no copies are made in each
iteration step.

See also SE-0171 Reduce with inout.


A different Swift 4 solution would be to create a new dictionary by mapping each field to a key/value pair:

let stats = Dictionary(uniqueKeysWithValues: fields.map { ($0, json[$0] ?? "") })
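
To make the variants above concrete, here is a hypothetical input (my own sample data, not from the question) and the dictionary each of them produces:

let json = ["name": "Ada", "city": "London"]   // assumed to be [String: String]
let fields = ["name", "city", "age"]

let stats = fields.reduce(into: [String:String]()) { (res, field) in
    res[field] = json[field] ?? ""
}
// stats == ["name": "Ada", "city": "London", "age": ""]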

Providing a default implementation for Collection conformance prevents additional subscript requirements

Two problems:

  1. Your declaration of the subscript in the SearchTree protocol needs to have { get } after it.

  2. Collection requires a subscript that returns its Element. You have two subscripts, one of which returns a String? and one of which returns a (key: Int, value: String), but neither of these is Element, which the compiler needs; therefore the type does not conform to Collection. If you define Element in either your protocol or in the extension, it should compile.

In the protocol:

associatedtype Element = (key: Int, value: String)

or:

associatedtype Element = String?

Or in the extension:

typealias Element = (key: Int, value: String)

or:

typealias Element = String?
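
For instance, a minimal Swift 4 version of the protocol combining both fixes might look like this (a sketch; the conforming tree type from the question is not shown):

protocol SearchTree: Collection {
    // Default the collection's Element to the key/value pair.
    associatedtype Element = (key: Int, value: String)

    // The protocol's subscript requirement needs { get }.
    subscript(key: Int) -> String? { get }
}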

EDIT:

The above is true for Swift 4; however, for Swift 3, you also need to define _Element in addition to Element. Copying and pasting your code into a project, the following declaration of the protocol causes everything to compile in Swift 3:

protocol SearchTree: Collection {
    associatedtype Element = (key: Int, value: String)
    associatedtype _Element = (key: Int, value: String)

    subscript(key: Int) -> String? { get }
}

How can I specify a generic constraint that enforces a subscript in Swift?

You can use a protocol that declares the subscript requirement, and constrain your generic parameter to that protocol. For example:

protocol Container {
    associatedtype ItemType
    mutating func append(_ item: ItemType)
    var count: Int { get }
    subscript(i: Int) -> ItemType { get }
}

func index<T: Container>(_ x: T) -> T.ItemType {
    return x[0]
}
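
As a hypothetical usage example (not in the original answer): Array already provides append, count, and an Int subscript with matching shapes, so it can be conformed retroactively and passed to the generic function.

// Retroactive conformance: Array satisfies every requirement,
// with ItemType inferred as Element.
extension Array: Container {}

let first = index([10, 20, 30])   // first == 10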


Related Topics



Leave a reply



Submit