What Is Monomorphisation with Context to C++

What is monomorphisation with context to C++?

Monomorphization means generating specialized versions of generic functions. If I write a function that extracts the first element of any pair:

fn first<A, B>(pair: (A, B)) -> A {
let (a, b) = pair;
return a;
}

and then I call this function twice:

first((1, 2));
first(("a", "b"));

The compiler will generate two versions of first(), one specialized to pairs of integers and one specialized to pairs of strings.

The name derives from the programming language term "polymorphism" — meaning one function that can deal with many types of data. Monomorphization is the conversion from polymorphic to monomorphic code.

What is reification?

Reification is the process of taking an abstract thing and creating a concrete thing.

The term reification in C# generics refers to the process by which a generic type definition and one or more generic type arguments (the abstract thing) are combined to create a new generic type (the concrete thing).

To phrase it differently, it is the process of taking the definition of List<T> and int and producing a concrete List<int> type.

To understand it further, compare the following approaches:

  • In Java generics, a generic type definition is transformed to essentially one concrete generic type shared across all allowed type argument combinations. Thus, multiple (source code level) types are mapped to one (binary level) type - but as a result, information about the type arguments of an instance is discarded in that instance (type erasure).

    1. As a side effect of this implementation technique, the only generic type arguments that are natively allowed are those types that can share the binary code of their concrete type; which means those types whose storage locations have interchangeable representations; which means reference types. Using value types as generic type arguments requires boxing them (placing them in a simple reference type wrapper).
    2. No code is duplicated in order to implement generics this way.
    3. Type information that could have been available at runtime (using reflection) is lost. This, in turn, means that specialization of a generic type (the ability to use specialized source code for any particular generic argument combination) is very restricted.
    4. This mechanism doesn't require support from the runtime environment.
    5. There are a few workarounds to retain type information that a Java program or a JVM-based language can use.
  • In C# generics, the generic type definition is maintained in memory at runtime. Whenever a new concrete type is required, the runtime environment combines the generic type definition and the type arguments and creates the new type (reification). So we get a new type for each combination of the type arguments, at runtime.

    1. This implementation technique allows any kind of type argument combination to be instantiated. Using value types as generic type arguments does not cause boxing, since these types get their own implementation. (Boxing still exists in C#, of course - but it happens in other scenarios, not this one.)
    2. Code duplication could be an issue - but in practice it isn't, because sufficiently smart implementations (this includes Microsoft .NET and Mono) can share code for some instantiations.
    3. Type information is maintained, which allows specialization to an extent, by examining type arguments using reflection. However, the degree of specialization is limited, as a result of the fact that a generic type definition is compiled before any reification happens (this is done by compiling the definition against the constraints on the type parameters - thus, the compiler has to be able "understand" the definition even in the absence of specific type arguments).
    4. This implementation technique depends heavily on runtime support and JIT-compilation (which is why you often hear that C# generics have some limitations on platforms like iOS, where dynamic code generation is restricted).
    5. In the context of C# generics, reification is done for you by the runtime environment. However, if you want to more intuitively understand the difference between a generic type definition and a concrete generic type, you can always perform a reification on your own, using the System.Type class (even if the particular generic type argument combination you're instantiating didn't appear in your source code directly).
  • In C++ templates, the template definition is maintained in memory at compile time. Whenever a new instantiation of a template type is required in the source code, the compiler combines the template definition and the template arguments and creates the new type. So we get a unique type for each combination of the template arguments, at compile time.

    1. This implementation technique allows any kind of type argument combination to be instantiated.
    2. This is known to duplicate binary code but a sufficiently smart tool-chain could still detect this and share code for some instantiations.
    3. The template definition itself is not "compiled" - only its concrete instantiations are actually compiled. This places fewer constraints on the compiler and allows a greater degree of template specialization.
    4. Since template instantiations are performed at compile time, no runtime support is needed here either.
    5. This process is lately referred to as monomorphization, especially in the Rust community. The word is used in contrast to parametric polymorphism, which is the name of the concept that generics come from.

How to use mail filter context data?

Thanks to glts on Github, the author of the crate, the problem was that the string slices passed into the handle_header method were not borrowed by the external code that stores the data pointer so by the time that handle_eom is called, the memory has been reused for something else.

All I had to do was change Context<&str> to Context<String> and convert the strings using mystr.to_owned() and in the reverse direction val = &*mystring

GHC fails to bind polymorphic function without monomorphising it

Part 1

The Good

What happens when you write

a :: forall f. Functor f => f ()
a = _

? Specifically, what type of expression is GHC looking for to fill the hole (_)? You might think it's looking for a forall f. Functor f => f (), but that's not quite right. Instead, a is actually a bit more like a function, and GHC implicitly inserts two arguments: one type argument named f with kind * -> *, and one instance of the constraint Functor f (which is unnamed, like all instances).

a :: forall f. Functor f => f ()
a @f {Functor f} = _
-- @-syntax is a currently proposed extension, {}-syntax is fake

GHC is looking for, in the context of type f :: * -> *; instance Functor f, a f (). This is the same difference as between looking for a (A -> B) -> C -> D and looking for a D in the context of f :: A -> B; c :: C. If I directly have solution :: (A -> B) -> C -> D, in the first case I can just write solution, but, in the second case, I must write out solution f c.

The Bad

This is not what happens when you write

a :: forall f. Functor f => f () = _

Because you've used a pattern type signature, not a normal one, GHC no longer implicitly binds the type variable and the instance for you. GHC now, honestly and truly, wants you to fill that _ with a forall f. Functor f => f (). This is hard, as we'll see soon...

(I really don't think the thing Daniel Wagner has quoted is relevant here. I believe it's just referring to the difference (when ScopedTypeVariables is on and type a is not in scope) between the way that 5 :: Num a => a implicitly means 5 :: forall a. Num a => a and the way g :: a -> a = id doesn't mean g :: forall a. a -> a = id.)

Part 2

What happens when you write

undefined

? Specifically, what is its type? You might think it's forall a. a, but that's not quite right. Yes, it's true that, all by itself, undefined has the type forall a. a, but GHC doesn't let you write undefined all by itself. Instead, every occurrence of undefined in an expression is always applied to a type argument. The above is implicitly rewritten to

undefined @_a0

and a new unification variable (which doesn't really have a name) _a0 is created. This expression has type _a0. If I use this expression in a context that requires an Int, then _a0 must be equal to Int, and GHC sets _a0 := Int, rewriting the expression to

undefined @Int

(Because _a0 can be set to something, it's called a unification variable. It's under "our", internal control. Above, f couldn't be set to anything. It's a given, and its under "their" (the user's), external control, which makes it a skolem variable.)

Part 3

The Good

Usually, the automatic binding of type variables and the automatic application of unification variables works well. E.g., both of these work out fine:

undefined :: forall a. a
bot :: forall a. a
bot = undefined

Because they respectively expand as

(\@a -> undefined @a) :: forall a. a
bot :: forall a. a
bot @a = undefined @a

The Medium

When you do

a :: forall f. Functor f => f () = undefined

, you make something very strange happen. As I said before, a pattern type signature with a forall doesn't introduce anything. The undefined on the RHS actually needs to be a forall f. Functor f => f (). The implicit application of unification variables still occurs:

a :: forall f. Functor f => f () = undefined @_a0

undefined @_a0 :: _a0, so _a0 ~ (forall f. Functor f => f ()) must hold. GHC thus has to set _a0 := (forall f. Functor f => f ()). Usually, this is a no-no, because this is impredicative polymorphism: setting a type variable to a type containing foralls. However, in sufficiently outdated GHCs, this is allowed for certain functions. That is, undefined is not defined with the type forall (a :: *). a, but forall (a :: OpenKind). a, where OpenKind allows for this impredicativity. That means your code goes through as

a :: forall f. Functor f => f () = undefined @(forall f. Functor f => f ())

If you write

a :: forall f. Functor f => f () = bot

, it won't work, as bot doesn't have the same magic sauce that undefined has. Additionally, this won't work in more recent GHCs, which have stamped out this weird kind of impredicative polymorphism. (Which I say is a very good thing).

The Bad

Your definition of a, even with the pattern signature, does indeed come out with the desired type forall f. Functor f => f (). The issue is now in g:

g :: forall f. Functor f => f () = a

g, again, doesn't bind type f :: * -> * or instance Functor f. Meanwhile, a gets applied to some implicit stuff:

g :: forall f. Functor f => f () = a @_f0 {_}

But... the RHS now has type _f0 (), and the LHS wants it to have type forall f. Functor f => f (). These don't look alike. Therefore, type error.

As you can't stop the implicit application of a to a type variable and just write g = a, you must instead allow the implicit binding of type variables in g:

g :: forall f. Functor f => f ()
g = a

This works.

Why do I get a size error when using a type alias of `dyn Trait`?

All functions need to know the size of their arguments at compile time. However, you are using w whose size cannot be determined at compile time. Instead to be able to perform dynamic dispatch, you need to use a trait object. You can accomplish this in 2 ways.

You can pass a pointer by using a Box:

fn edit(w: Box<Word>)

or a reference:

fn edit(w: &Word)

Another option you have is to avoid the use of trait objects altogether. You can instead make edit a generic method as follows:

fn edit(w: impl AsRef<str>)

or

fn edit<W: AsRef<str>>(w: W)

With this approach, the compiler performs what is known as "monomorphization" where it determines at compile time exactly which types are being passed in for your generic arguments and generates multiple non-generic versions for each of those calls with those specific types. Monomorphization is explained quite well in What is monomorphisation with context to C++?.



Related Topics



Leave a reply



Submit