Indirect Enums and Structs

indirect enums and structs

The indirect keyword introduces a layer of indirection behind the scenes.

You indicate that an enumeration case is recursive by writing indirect before it, which tells the compiler to insert the necessary layer of indirection.

From here

The important part of structs and enums is that they're a constant size. Allowing recursive structs or enums directly would violate this, as there would be an indeterminable number of recursions, hence making the size non constant and unpredictable. indirect uses a constant size reference to refer to a constant size enum instance.

There's a different between the two code snippets you show.

  1. The first piece of code makes BinaryTree<T> stored by a reference everywhere it's used.

  2. The second piece of code makes BinaryTree<T> stored by a reference only in the case of node. I.e. BinaryTree<T> generally has its value stored directly, except for this explicitly indirect node case.

When exactly do I need indirect with writing recursive enums?

I think part of the confusion stems from this assumption:

I thought arrays and tuples have the same memory layout, and that is why you can convert arrays to tuples using withUnsafeBytes and then binding the memory...

Arrays and tuples don't have the same memory layout:

  • Array<T> is a fixed-size struct with a pointer to a buffer which holds the array elements contiguously* in memory
    • Contiguity is promised only in the case of native Swift arrays [not bridged from Objective-C]. NSArray instances do not guarantee that their underlying storage is contiguous, but in the end this does not have an effect on the code below.
  • Tuples are fixed-size buffers of elements held contiguously in memory

The key thing is that the size of an Array<T> does not change with the number of elements held (its size is simply the size of a pointer to the buffer), while a tuple does. The tuple is more equivalent to the buffer the array holds, and not the array itself.

Array<T>.withUnsafeBytes calls Array<T>.withUnsafeBufferPointer, which returns the pointer to the buffer, not to the array itself. *(In the case of a non-contiguous bridged NSArray, _ArrayBuffer.withUnsafeBufferPointer has to create a temporary contiguous copy of its contents in order to return a valid buffer pointer to you.)


When laying out memory for types, the compiler needs to know how large the type is. Given the above, an Array<Foo> is statically known to be fixed in size: the size of one pointer (to a buffer elsewhere in memory).

Given

enum Foo {
case one((Foo, Foo))
}

in order to lay out the size of Foo, you need to figure out the maximum size of all of its cases. It has only the single case, so it would be the size of that case.

Figuring out the size of one requires figuring out the size of its associated value, and the size of a tuple of elements is the sum of the size of the elements themselves (taking into account padding and alignment, but we don't really care about that here).

Thus, the size of Foo is the size of one, which is the size of (Foo, Foo) laid out in memory. So, what is the size of (Foo, Foo)? Well, it's the size of Foo + the size of Foo... each of which is the size of Foo + the size of Foo... each of which is the size of Foo + the size of Foo...

Where Array<Foo> had a way out (Array<T> is the same size regardless of T), we're stuck in an infinite loop with no base case.

indirect is the keyword required to break out of the recursion and give this infinite reference a base case. It inserts an implicit pointer by making a given case the fixed size of a pointer, regardless of what it contains or points to. That makes the size of one fixed, which allows Foo to have a fixed size.

indirect is less about Foo referring to Foo in any way, and more about allowing an enum case to potentially contain itself indirectly (because direct containment would lead to an infinite loop).


As an aside, this is also why a struct cannot contain a direct instance of itself:

struct Foo {
let foo: Foo // error: Value type 'Foo' cannot have a stored property that recursively contains it
}

would lead to infinite recursion, while

struct Foo {
let foo: UnsafePointer<Foo>
}

is fine.

structs don't support the indirect keyword (at least in a struct, you have more direct control over storage and layout), but there have been pitches for adding support for this on the Swift forums.

enums - indirect case is reference for associated value, why/how is it persistent?

An indirect case is implemented as a reference to a heap allocated box containing the payload. There's no copy-on-write because enumerations are immutable (i.e you have to re-assign to self to actually do a 'mutation'), so value semantics is preserved even with such a reference.

You could re-write your example like this and the behaviour wouldn't change:

enum List<Element> {

final class NodeRef {

let element: Element
let next: List

init(element: Element, next: List) {
self.element = element
self.next = next
}
}

case end
case anotherEnd
case node(NodeRef)
}

var listA = List<Int>.end
var listB = List<Int>.node(List.NodeRef(element: 5, next: listA))
listA = .anotherEnd

if case let .node(n) = listB {
print(n.next) // end
}

You're just changing the value of listA, it has no bearing on the value of listB.

Working with typedef enums in structs and avoiding type mixing warnings

If you really, literally want to initialize all the memory of a struct to 0, then that's spelled

memset(&bar, 0, sizeof(bar_t));

That's actually fairly common, and it even tends to work, but technically it's incorrect for what most people actually want. It is not guaranteed to do the right thing for most element types, because C makes fewer guarantees than many people think about the representations of values of different types, and about the meaning of various bit patterns.

The correct way to initialize an aggregate object (such as a struct) as if assigning the value zero to every element is what you started with, though the canonical way to write it is simply

bar_t bar = { 0 };

(no need for the 'u' suffix). There is a remote possibility that that variant would avoid the compiler warning, though I wouldn't really expect so.

Ashalynd gives the answer I think is most correct (that is,

bar_t bar = { foo1 };

). If that somehow violates code conventions with which you must comply, then perhaps you could reorganize your struct so that the first element is of any type other than an enum type.

Typedef with or without typename on structs and enum, what are pro/cons for each and what is best practice?

In a typedef declaration similar to this

// 1
typedef struct {
int value;
} Foo;

you can not refer to the structure within the structure definition. So for example you can not declare a structure that denotes a node of a linked list because within the structure you need to refer to a pointer to the structure type that points to the next node.

In this typedef definition

// 3
typedef struct Foo_s {
int value;
} Foo_t;

two different names Foo_s and Foo_t only confuse readers of code. For example it can be written like

struct Foo_s f = malloc( sizeof( Foo_t ) );

or

Foo_t f = malloc( sizeof( struct Foo_s ) );

This typedef definition

// 4
typedef struct Foo {
int value;
} Foo_t;

has the same problem. For example if you will encounter two declarations like this

struct Foo f1;

and

Foo_t f2;

it is difficult to say whether the both types are the same.

So I prefer the following typedef definition

// 2
typedef struct Foo {
int value;
} Foo;

Now you know that the identifier Foo is reserved for the structure.



Related Topics



Leave a reply



Submit