What Is the In-Practice Difference Between Generic and Protocol-Typed Function Parameters

(I realise that the OP is asking less about the language implications and more about what the compiler does, but I feel it's also worthwhile to list the general differences between generic and protocol-typed function parameters.)

1. A generic placeholder constrained by a protocol must be satisfied with a concrete type

This is a consequence of protocols not conforming to themselves: you cannot call generic(some:) with a SomeProtocol-typed argument.

struct Foo : SomeProtocol {
    var someProperty: Int
}

// of course the solution here is to remove the redundant 'SomeProtocol' type annotation
// and let foo be of type Foo, but this problem is applicable anywhere an
// 'anything that conforms to SomeProtocol' typed variable is required.
let foo : SomeProtocol = Foo(someProperty: 42)

generic(some: foo) // compiler error (before Swift 5.7): cannot invoke 'generic' with an argument list
                   // of type '(some: SomeProtocol)'

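This is because the generic function expects an argument of some type T that conforms to SomeProtocol, but SomeProtocol is not a type that conforms to SomeProtocol. (Since Swift 5.7, SE-0352 lets the compiler implicitly open the existential at such a call site, so this specific call now compiles. The protocol type itself still does not conform to SomeProtocol.)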

A non-generic function however, with a parameter type of SomeProtocol, will accept foo as an argument:

nonGeneric(some: foo) // compiles fine

This is because it accepts 'anything that can be typed as a SomeProtocol', rather than 'a specific type that conforms to SomeProtocol'.
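The snippets above rely on declarations that are not shown, so here is a minimal self-contained sketch of the distinction. The protocol body and both function bodies are assumptions, and total(of:) is a hypothetical helper added only to show a case where the existential type still cannot stand in for T.

```swift
// Assumed declaration: the original snippets never show the protocol body.
protocol SomeProtocol {
    var someProperty: Int { get }
}

struct Foo: SomeProtocol {
    var someProperty: Int
}

// Generic: T is bound to exactly one concrete conforming type per call.
func generic<T: SomeProtocol>(some: T) -> Int {
    return some.someProperty
}

// Protocol-typed: accepts any value that can be boxed as a SomeProtocol.
func nonGeneric(some: SomeProtocol) -> Int {
    return some.someProperty
}

// Hypothetical helper: every element must share the same concrete type T.
func total<T: SomeProtocol>(of items: [T]) -> Int {
    return items.reduce(0) { $0 + $1.someProperty }
}

let concrete = Foo(someProperty: 42)
let boxed: SomeProtocol = Foo(someProperty: 7)

let fromGeneric = generic(some: concrete)                    // T == Foo
let fromNonGeneric = nonGeneric(some: boxed)                 // takes the existential as-is
let fromTotal = total(of: [concrete, Foo(someProperty: 1)])  // T == Foo

// Does not compile: the element type is the protocol type itself,
// and the protocol type does not conform to SomeProtocol.
// let mixed: [SomeProtocol] = [concrete, boxed]
// total(of: mixed)
```

The commented-out call is the part that fails in every Swift version, because no single concrete T can be chosen for an array of boxed values.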

2. Specialisation

As covered in this fantastic WWDC talk, an 'existential container' is used in order to represent a protocol-typed value.

This container consists of:

  • A value buffer to store the value itself, which is 3 words in length. Values larger than this will be heap allocated, and a reference to the value will be stored in the value buffer (as a reference is just 1 word in size).

  • A pointer to the type's metadata. Included in the type's metadata is a pointer to its value witness table, which manages the lifetime of the value in the existential container.

  • One or (in the case of protocol composition) multiple pointers to protocol witness tables for the given type. These tables keep track of the type's implementation of the protocol requirements available to call on the given protocol-typed instance.
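The layout described above can be checked with MemoryLayout. This is only a sketch: P, Q, Small and Large are hypothetical names, and the word counts assume ordinary (non-class-bound, non-@objc) protocols.

```swift
// Hypothetical protocols and types, used only to inspect sizes.
protocol P { var value: Int { get } }
protocol Q { func describe() -> String }

struct Small: P, Q {
    var value = 0                       // 1 word: fits in the inline buffer
    func describe() -> String { return "small" }
}

struct Large: P, Q {
    var value = 0
    var b = 0
    var c = 0
    var d = 0                           // 4 words: too big for the inline buffer
    func describe() -> String { return "large" }
}

let word = MemoryLayout<UnsafeRawPointer>.size

// 3-word value buffer + 1 metadata pointer + 1 witness table pointer.
let singleProtocolSize = MemoryLayout<P>.size

// A composition carries one witness table pointer per protocol.
let compositionSize = MemoryLayout<P & Q>.size

let smallFitsInline = MemoryLayout<Small>.size <= 3 * word
let largeFitsInline = MemoryLayout<Large>.size <= 3 * word
```

On a 64-bit platform singleProtocolSize is 40 bytes and compositionSize is 48 bytes, regardless of which conforming type is stored. A Large value therefore has to live on the heap, with only a reference kept in the buffer.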

By default, a similar structure is used in order to pass a value into a generic placeholder typed argument.

  • The argument is stored in a 3 word value buffer (which may heap allocate), which is then passed to the parameter.

  • For each generic placeholder, the function takes a metadata pointer parameter. The metatype of the type that's used to satisfy the placeholder is passed to this parameter when calling.

  • For each protocol constraint on a given placeholder, the function takes a protocol witness table pointer parameter.

However, in optimised builds, Swift is able to specialise the implementations of generic functions, allowing the compiler to generate a new function for each type that the generic placeholder is applied with. This allows arguments to always be simply passed by value, at the cost of increasing code size. However, as the talk then goes on to say, aggressive compiler optimisations, particularly inlining, can counteract this bloat.

3. Dispatch of protocol requirements

Because generic functions can be specialised, method calls on generic arguments can be statically dispatched (although obviously not for types that use dynamic polymorphism, such as non-final classes).

Protocol-typed functions, however, generally cannot benefit from this, as they are not specialised. Therefore method calls on a protocol-typed argument will be dynamically dispatched via the protocol witness table for that given argument, which is more expensive.

That being said, simple protocol-typed functions may be able to benefit from inlining. In such cases, the compiler is able to eliminate the overhead of the value buffer and the protocol and value witness tables (this can be seen by examining the SIL emitted in a -O build), allowing it to statically dispatch methods in the same way as generic functions. However, unlike generic specialisation, this optimisation is not guaranteed for a given function (unless you apply the @inline(__always) attribute, but usually it's best to let the compiler decide this).

Therefore in general, generic functions are favoured over protocol-typed functions in terms of performance, as they can achieve static dispatch of methods without having to be inlined.

4. Overload resolution

When performing overload resolution, the compiler will favour the protocol-typed function over the generic one.

struct Foo : SomeProtocol {
    var someProperty: Int
}

func bar<T : SomeProtocol>(_ some: T) {
    print("generic")
}

func bar(_ some: SomeProtocol) {
    print("protocol-typed")
}

bar(Foo(someProperty: 5)) // protocol-typed

This is because Swift favours an explicitly typed parameter over a generic one (see this Q&A).

5. Generic placeholders enforce the same type

As already said, using a generic placeholder allows you to enforce that the same type is used for all parameters/returns that are typed with that particular placeholder.

The function:

func generic<T : SomeProtocol>(a: T, b: T) -> T {
    return a.someProperty < b.someProperty ? b : a
}

takes two arguments and has a return of the same concrete type, where that type conforms to SomeProtocol.

However the function:

func nongeneric(a: SomeProtocol, b: SomeProtocol) -> SomeProtocol {
    return a.someProperty < b.someProperty ? b : a
}

carries no promises other than the arguments and return must conform to SomeProtocol. The actual concrete types that are passed and returned do not necessarily have to be the same.
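To make the contrast concrete, here is a small runnable sketch of both functions. The protocol body and the second conforming type Baz are assumptions introduced for illustration.

```swift
// Assumed protocol body, plus a hypothetical second conforming type (Baz).
protocol SomeProtocol {
    var someProperty: Int { get }
}

struct Foo: SomeProtocol { var someProperty: Int }
struct Baz: SomeProtocol { var someProperty: Int }

func generic<T: SomeProtocol>(a: T, b: T) -> T {
    return a.someProperty < b.someProperty ? b : a
}

func nongeneric(a: SomeProtocol, b: SomeProtocol) -> SomeProtocol {
    return a.someProperty < b.someProperty ? b : a
}

// Both arguments are Foo, so the result is statically known to be Foo.
let bigger: Foo = generic(a: Foo(someProperty: 1), b: Foo(someProperty: 2))

// Mixing concrete types is fine here, but the result is only known
// to be "some SomeProtocol value" at compile time.
let mixed = nongeneric(a: Foo(someProperty: 1), b: Baz(someProperty: 2))
let mixedIsBaz = mixed is Baz

// Does not compile: T cannot be both Foo and Baz.
// generic(a: Foo(someProperty: 1), b: Baz(someProperty: 2))
```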

Difference between using Generic and Protocol as type parameters, what are the pros and cons of implement them in a function

There is actually a video from this year's WWDC about that (it was about performance of classes, structs and protocols; I don't have a link but you should be able to find it).

In your second function, where you pass any value that conforms to the protocol, you are actually passing a container that has 24 bytes of storage for the passed value and 16 bytes for type-related information (used to determine which methods to call, hence dynamic dispatch). If the passed value is bigger than 24 bytes in memory, it is allocated on the heap and the container stores a reference to it. That is comparatively expensive and should be avoided if possible.

In your first function, where you use a generic constraint, the compiler can create another function that performs the function's operations on that specific type. (If you use this function with lots of different types, your code size may, however, increase significantly; see C++ code bloat for further reference.) The compiler can then statically dispatch the methods, inline the function if possible, and avoid the heap allocation described above. As stated in the video mentioned above, code size does not have to increase significantly because code can still be shared, so the function with the generic constraint is usually the way to go.

Differences generic protocol type parameter vs direct protocol type

The key confusion is that Swift has two concepts that are spelled the same, and so are often ambiguous. One of them is struct T: A {}, which means "T conforms to the protocol A," and the other is var a: A, which means "the type of variable a is the existential of A."

Conforming to a protocol does not change a type. T is still T. It just happens to conform to some rules.

An "existential" is a compiler-generated box that wraps up a value of a conforming type. It's necessary because types that conform to a protocol could be different sizes and have different memory layouts. The existential is a box that gives anything that conforms to a protocol a consistent layout in memory. Existentials and protocols are related, but not the same thing.

Because an existential is a run-time box that might hold any type, there is some indirection involved, and that can introduce a performance impact and prevent certain optimizations.

Another common confusion is understanding what a type parameter means. In a function definition:

func f<T>(param: T) { ... }

This defines a family of functions f<T>() which are created at compile time based on what you pass as the type parameter. For example, when you call this function this way:

f(param: 1)

a new function is created at compile time called f<Int>(). That is a completely different function than f<String>(), or f<[Double]>(). Each one is its own function, and in principle is a complete copy of all the code in f(). (In practice, the optimizer is pretty smart and may eliminate some of that copying. And there are some other subtleties related to things that cross module boundaries. But this is a pretty decent way to think about what is going on.)
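One way to observe which T a call binds to is to return the name of T itself. A minimal sketch follows, where boundType(param:) is a hypothetical stand-in for f(param:). It shows only the type binding, not whether the optimizer actually emitted a separate copy of the code.

```swift
// Hypothetical stand-in for f(param:) that reports which T the call bound to.
func boundType<T>(param: T) -> String {
    return String(describing: T.self)
}

let intCase = boundType(param: 1)             // T == Int
let stringCase = boundType(param: "one")      // T == String
let arrayCase = boundType(param: [1.5, 2.5])  // T == [Double]
```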

Since specialized versions of generic functions are created for each type that is passed, they can in theory be more optimized, since each version of the function will handle exactly one type. The trade-off is that they can add code-bloat. Do not assume "generics are faster than protocols." There are reasons that generics may be faster than protocols, but you have to actually look at the code generation and profile to know in any particular case.

So, walking through your examples:

func direct(a: A) {
    // Doesn't work
    let _ = A.init(someInt: 1)
}

A protocol (A) is just a set of rules that types must conform to. You can't construct "some unknown thing that conforms to those rules." How many bytes of memory would be allocated? What implementations would it provide to the rules?

func indirect<T: A>(a: T) {
    // Works
    let _ = T.init(someInt: 1)
}

In order to call this function, you must pass a type parameter, T, and that type must conform to A. When you call it with a specific type, the compiler will create a new copy of indirect that is specifically designed to work with the T you pass. Since we know that T has a proper init, we know the compiler will be able to write this code when it comes time to do so. But indirect is just a pattern for writing functions. It's not a function itself; not until you give it a T to work with.
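A runnable sketch of the generic case is below. The body of protocol A and the struct B are assumptions, since the question's declarations are not shown. This variant of indirect(a:) also returns the new value so the result can be inspected, which the original does not do.

```swift
// Assumed declarations: the question's protocol A and type B are not shown.
protocol A {
    var someInt: Int { get }
    init(someInt: Int)
}

struct B: A {
    let someInt: Int
    init(someInt: Int) {
        self.someInt = someInt
    }
}

// T is a concrete conforming type at every call site, so T.init is well defined.
// Unlike the original, this variant returns the value it constructs.
func indirect<T: A>(a: T) -> T {
    return T.init(someInt: a.someInt + 1)
}

let made = indirect(a: B(someInt: 0))   // T == B, so made is a B
```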

let a: A = B(someInt: 0)

// Works
direct(a: a)

a is an existential wrapper around B. direct() expects an existential wrapper, so you can pass it.

// Doesn't work
indirect(a: a)

a is an existential wrapper around B. Existential wrappers do not conform to protocols. They require things that conform to protocols in order to create them (that's why they're called "existentials;" the fact that you created one proves that such a value actually exists). But they don't, themselves, conform to protocols. If they did, then you could do things like what you've tried to do in direct() and say "make a new instance of an existential wrapper without knowing exactly what's inside it." And there's no way to do that. Existential wrappers don't have their own method implementations.

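There are cases where an existential could conform to its own protocol. As long as there are no init or static requirements, there isn't a problem in principle. But Swift can't currently handle that: because it can't work for init/static requirements, Swift forbids it in all cases. (Since Swift 5.7, SE-0352 lets the compiler implicitly open an existential that is passed to a generic parameter, so a call like indirect(a: a) now compiles, but the existential type itself still does not conform to the protocol.)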

Why use generics when you can just use types

This is just one benefit that I can immediately think of. I'm sure there are lots more.

Let's say we have two classes A and B that both conform to Amazing.

If we pass A() into this function:

func doMoreCoolThings<T: Amazing>(awesome: T) -> T { ... }

like this:

let val = doMoreCoolThings(awesome: A())

We are sure that val is of type A and the compiler knows that too. This means we can access A's members using val.

On the other hand if we pass A() into this function:

func doMoreCoolThings(awesome: Amazing) -> Amazing { ... }

like this:

let val = doMoreCoolThings(awesome: A())

val's compile time type is Amazing. The compiler does not know what type of Amazing it is. Is it A or is it B or is it something else? The compiler doesn't know. You will have to cast the result to A in order to access its members.

let a = val as! A

The cast is also not guaranteed to succeed.

If you put these casts everywhere your code will soon become very messy.
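Here is a minimal sketch of that difference. The member onlyOnA and the two distinct function names are hypothetical, chosen so that both versions can coexist in one file without relying on overload resolution.

```swift
protocol Amazing {}

final class A: Amazing {
    let onlyOnA = "hello from A"   // hypothetical member that only A has
}

final class B: Amazing {}

// Generic version: the return type is the caller's concrete type.
func genericThings<T: Amazing>(awesome: T) -> T {
    return awesome
}

// Protocol-typed version: the return type is erased to Amazing.
func existentialThings(awesome: Amazing) -> Amazing {
    return awesome
}

let viaGeneric = genericThings(awesome: A())
let greeting = viaGeneric.onlyOnA                 // no cast needed: viaGeneric is an A

let viaProtocol = existentialThings(awesome: A())
let castResult = (viaProtocol as? A)?.onlyOnA     // a cast is needed to reach A's members

// The cast can fail when the dynamic type is something else.
let castFails = (existentialThings(awesome: B()) as? A) == nil
```

Using as? rather than as! keeps a failed cast from crashing, but the need to cast at all is the cost the answer is pointing at.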

When to use generics in Swift

In the case you have provided you are correct. It doesn't necessarily add anything by making it generic.

But take the example where you have some protocol MyProtocol and you want to create a function that takes two of these and returns a third. But the function only works if first and second are of the same type...

func combine(first: MyProtocol, second: MyProtocol) -> MyProtocol {
    // do some combining here.
}

Now it's less well defined because first and second can be of different types here. The only thing that is required is that they conform to the protocol. And what is the return type?

Now consider...

func combine<T: MyProtocol>(first: T, second: T) -> T {
    // do some combining here
}

Now the function is generic. first and second must still conform to the protocol, but they must also be of the same type, and the function will return another item of that same type.

In this case you definitely benefit from using generics rather than just the protocol.

Is there a practical difference between a type constraint on a generic type directly vs using a 'where' clause?

There is no difference. The first form

func testX<T>(value: T) where T: StringProtocol

was introduced with SE-0081 Move where clause to end of declaration to increase readability, in particular for longer lists of constraints. The rationale was to move the where clause out of the generic parameter list, for example

func foo<S: Sequence where S.Element == Int>(seq: S)

became

func foo<S: Sequence>(seq: S) where S.Element == Int

in Swift 3. As a side-effect, even simple constraints such as your T: StringProtocol can be moved to the newly introduced where-clause.
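As a quick illustration that the two spellings behave identically, here is a sketch with two hypothetical functions that differ only in where the Sequence constraint is written:

```swift
// Conformance in the parameter list, same-type constraint in the where clause.
func sumInline<S: Sequence>(_ seq: S) -> Int where S.Element == Int {
    return seq.reduce(0, +)
}

// Every constraint moved into the trailing where clause.
func sumTrailing<S>(_ seq: S) -> Int where S: Sequence, S.Element == Int {
    return seq.reduce(0, +)
}

let fromArray = sumInline([1, 2, 3])
let fromRange = sumTrailing(1...3)
```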

Why can we not cast to protocol types with associated types but achieve the same effect using generics?

Protocol-typed values are represented using an 'existential container' (see this great WWDC talk on them), which consists of a value buffer of fixed size to store the value (if the value exceeds this size, it is heap allocated), a pointer to the protocol witness table to look up method implementations, and a pointer to the value witness table to manage the lifetime of the value.

Unspecialised generics use pretty much the same format (I go into this in slightly more depth in this Q&A) – when they're called, pointers to the protocol and value witness tables are passed to the function, and the value itself is stored locally inside the function using a value-buffer, which will heap allocate for values larger than that buffer.

Therefore, because of the sheer similarity in how these are implemented, we can draw the conclusion that not being able to talk in terms of protocols with associated types or Self constraints outside of generics is just a current limitation of the language. There's no real technical reason why it's not possible; it just hasn't been implemented (yet). (Swift 5.7 has since lifted much of this restriction through SE-0309.)

Here's an excerpt from the Generics Manifesto on "Generalized existentials", which discusses how this could work in practice:

The restrictions on existential types came from an implementation
limitation, but it is reasonable to allow a value of protocol type
even when the protocol has Self constraints or associated types. For
example, consider IteratorProtocol again and how it could be used as
an existential:

protocol IteratorProtocol {
    associatedtype Element
    mutating func next() -> Element?
}

let it: IteratorProtocol = ...
it.next() // if this is permitted, it could return an "Any?", i.e., the existential that wraps the actual element

Additionally, it is reasonable to want to constrain the associated
types of an existential, e.g., "a Sequence whose element type is
String" could be expressed by putting a where clause into
protocol<...> or Any<...> (per "Renaming protocol<...> to Any<...>"):

let strings: Any<Sequence where .Iterator.Element == String> = ["a", "b", "c"]

The leading . indicates that we're talking about the dynamic type,
i.e., the Self type that's conforming to the Sequence protocol.
There's no reason why we cannot support arbitrary where clauses within
the Any<...>.

And from being able to type a value as a protocol with an associated type, it's but a short step to allow for type-casting to that given type, and thus allow something like your first extension to compile.
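For what it's worth, the IteratorProtocol part of that excerpt is close to what Swift 5.7 ended up shipping in SE-0309. The sketch below assumes a Swift 5.7 or later toolchain and uses the any spelling introduced in Swift 5.6. It uses the standard library's IteratorProtocol rather than redeclaring it.

```swift
// Assumes Swift 5.7 or later (SE-0309).
var it: any IteratorProtocol = [10, 20].makeIterator()

// Element is unknown statically, so the result is erased to Any?.
let first = it.next()

// Recovering the concrete element type takes a cast.
let firstAsInt = first as? Int
```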

Understanding swift generics vs treating parameters as a protocol or base type

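In the case of protocols, it depends on the protocol itself. If the protocol uses Self or an associated type (declared with typealias in early Swift versions), it cannot be used directly as a type. For any other protocol, you can declare variables and parameters of the protocol type, written protocol<MyProtocol> in Swift 2 and plain MyProtocol from Swift 3 onwards, e.g., var o: protocol<MyProtocol>.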

The reason you can't say var o: protocol<Equatable> is because the Equatable protocol is designed in a way that certain constraints it declares (in this case Self) must be satisfied, and thus it can only be used as a generic type constraint. In other words, the compiler must be able to figure out at compile time what Self is in regards to anything that is Equatable, and it cannot (always) do that in var o: protocol<Equatable>.

Why use generics rather than protocols or base classes? Because generics can be even more general than those while still being type-safe. This is especially useful, for instance, with something like a callback. Here's a very contrived example:

class Useless<T> {
    private let o: T
    private let callback: (T, String) -> Void
    // The closure is stored, so it must be marked @escaping (Swift 3 and later).
    required init(o: T, callback: @escaping (T, String) -> Void) {
        self.o = o
        self.callback = callback
    }
    func publish(message: String) {
        callback(o, message)
    }
}

// myObject and someMethod are placeholders for any value and one of its methods.
let useless = Useless(o: myObject) { obj, message in
    // Here in the callback I get type safety.
    obj.someMethod(message)
}

(This code has never been run by anyone ever. It should be regarded as pseudo-code.)

Now, this is a pretty silly example for many reasons, but it illustrates the point. Thanks to generics, the obj parameter of the callback is entirely type-safe. This could not be done with base classes or protocols because we could never anticipate what code might get called in the callback. The Useless class can take any type as its T.


