What Are Reified Generics? How Do They Solve Type Erasure Problems and Why Can't They Be Added Without Major Changes

The whole point is that reified generics keep the type-argument information available at runtime, whereas type-erased generics discard it after compilation. AFAIK, the main reason for having type erasure in the first place was backwards compatibility (e.g. generic classes compile down to the same kind of bytecode that pre-generics code and older JVMs already understood).

You can add the type information explicitly in your own implementation, but that requires additional code every time the list is used, and is pretty messy in my opinion. Even then, you still don't get runtime type checking for all of the list methods unless you add the checks yourself, whereas reified generics enforce the runtime types for you.
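To make the "add the checks yourself" option a bit more concrete, here is a minimal Java sketch. The class and method names are made up for illustration; only the java.util calls are real JDK API. It compares a plain ArrayList with one wrapped by Collections.checkedList, which carries a Class token and verifies every inserted element at runtime.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not taken from the original answer.
class CheckedListDemo {

    // Tries to sneak an Integer into a List<String> through a raw reference.
    // Returns true if the list accepted the wrong element.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean acceptsWrongElement(List<String> strings) {
        List raw = strings;      // the raw type bypasses the compile-time check
        try {
            raw.add(42);         // succeeds silently on an erased, unchecked list
            return true;
        } catch (ClassCastException e) {
            return false;        // the list checked the element type at runtime
        }
    }

    public static void main(String[] args) {
        List<String> plain = new ArrayList<>();
        List<String> checked = Collections.checkedList(new ArrayList<>(), String.class);
        System.out.println("plain list accepts Integer:   " + acceptsWrongElement(plain));
        System.out.println("checked list accepts Integer: " + acceptsWrongElement(checked));
    }
}
```

Note that the wrapper has to be applied by hand at every creation site, which is exactly the extra code the answer complains about.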

Why should I care that Java doesn't have reified generics?

From the few times that I came across this "need", it ultimately boils down to this construct:

public class Foo<T> {

    private T t;

    public Foo() {
        this.t = new T(); // Help?
    }

}

This does work in C#, provided that T is declared with the new() constraint (i.e. it is guaranteed to have a public parameterless constructor). You can even get the runtime type by typeof(T) and get the constructors by Type.GetConstructor().

The common Java solution would be to pass the Class<T> as argument.

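public class Foo<T> {

    private T t;

    public Foo(Class<T> cls) throws Exception {
        // Class.newInstance() is deprecated since Java 9; go through the constructor instead.
        this.t = cls.getDeclaredConstructor().newInstance();
    }

}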

(The class does not necessarily need to be passed as a constructor argument; a method argument is also fine. The above is just an example, and the try-catch is omitted for brevity.)
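For completeness, here is a small self-contained sketch of that workaround next to a common alternative: passing a factory (Supplier) instead of a Class token. InstanceHolder and its two factory methods are hypothetical names chosen for this example; the checked reflection exceptions are wrapped so the calling code stays short.

```java
import java.util.function.Supplier;

// Hypothetical holder class illustrating two ways around "new T()".
class InstanceHolder<T> {

    private final T value;

    private InstanceHolder(T value) {
        this.value = value;
    }

    // Variant 1: class token plus reflection; requires an accessible no-arg constructor.
    static <T> InstanceHolder<T> fromClass(Class<T> cls) {
        try {
            return new InstanceHolder<>(cls.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    // Variant 2: the caller supplies the construction logic, so no reflection is needed.
    static <T> InstanceHolder<T> fromFactory(Supplier<T> factory) {
        return new InstanceHolder<>(factory.get());
    }

    T get() {
        return value;
    }

    public static void main(String[] args) {
        InstanceHolder<StringBuilder> viaClass = InstanceHolder.fromClass(StringBuilder.class);
        InstanceHolder<StringBuilder> viaFactory = InstanceHolder.fromFactory(StringBuilder::new);
        System.out.println(viaClass.get().length() + " " + viaFactory.get().length());
    }
}
```

The factory variant moves the "does T have a usable constructor?" question back to compile time at the call site, which is arguably closer to what the C# new() constraint gives you.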

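For most other generic type constructs, the actual type can be resolved with a bit of help from reflection, as long as the type argument is fixed in a declaration (a field, a method signature or a superclass). The Q&As below illustrate the use cases and possibilities: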

  • Get generic type of java.util.List
  • How to get the generic type at runtime?
  • Get actual type of generic type argument on abstract superclass
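As a rough idea of what those linked answers do, the following sketch reads a type argument back from a field declaration and from a superclass declaration. All class, field and method names here are invented for the example; the reflection calls (getGenericType, getGenericSuperclass, getActualTypeArguments) are standard java.lang.reflect API.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class GenericTypeLookup {

    // The declared type argument of a field is recorded in the class file.
    static List<String> names;

    // Subclasses fix T in their "extends" clause, which is also recorded.
    abstract static class TypeAware<T> {
        // Only works for direct subclasses that bind T to a concrete type.
        Type typeArgument() {
            ParameterizedType superType =
                    (ParameterizedType) getClass().getGenericSuperclass();
            return superType.getActualTypeArguments()[0];
        }
    }

    static class StringAware extends TypeAware<String> { }

    // Returns the type argument of the "names" field declaration.
    static Type fieldTypeArgument() {
        try {
            Field field = GenericTypeLookup.class.getDeclaredField("names");
            ParameterizedType type = (ParameterizedType) field.getGenericType();
            return type.getActualTypeArguments()[0];
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldTypeArgument());
        System.out.println(new StringAware().typeArgument());
    }
}
```

This only recovers what was written in a declaration. For a plain local `new ArrayList<String>()` there is nothing to look up, because the instance itself carries no type argument.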

Why not remove type erasure from the next JVM?

To some extent erasure will be removed in the future with Project Valhalla, to enable specialized implementations for value types.

Or, to put it more accurately: type erasure really means the absence of type specialization for generics, and Valhalla will introduce specialization over primitives.

Specifically I'm asking if there are any technical reasons why type erasure couldn't be removed in the next version of the JVM

Performance. With erasure, you don't have to generate specialized code for every combination of generic types, instances and generated classes don't have to carry type tags, and polymorphic inline caches and runtime type checks (compiler-generated instanceof checks) stay simple. We still get most of the type safety through compile-time checks.

Of course there are also plenty of downsides, but the tradeoff has already been made, and the question is what would motivate the JVM devs to change that tradeoff.

It might also be a compatibility issue: there could be code that performs unchecked casts to abuse generic collections by relying on type erasure, and that code would break if the type constraints were enforced at runtime.
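One fairly benign example of code that leans on erasure is the "one shared instance for every type argument" idiom, which the JDK's own Collections.emptyList() is commonly cited as using. The sketch below reproduces the idea with made-up names (SharedEmptyList, empty, sameInstance); it is only an illustration of the pattern, not a copy of any library code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical utility showing an unchecked cast that is only safe because of erasure.
class SharedEmptyList {

    // A single immutable instance, created once.
    private static final List<?> EMPTY = Collections.unmodifiableList(new ArrayList<Object>());

    // Hands out the same object as a List<T> for any T.
    // With erasure the cast is a no-op; a runtime that enforced type
    // arguments would have to reject it or create one list per T.
    @SuppressWarnings("unchecked")
    static <T> List<T> empty() {
        return (List<T>) EMPTY;
    }

    // True if two differently parameterized views are the very same object.
    static boolean sameInstance() {
        List<String> strings = empty();
        List<Integer> numbers = empty();
        Object a = strings;
        Object b = numbers;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println("shared instance: " + sameInstance());
    }
}
```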

How does the reified keyword in Kotlin work?

TL;DR: What is reified good for

fun <T> myGenericFun(c: Class<T>) 

In the body of a generic function like myGenericFun, you can't access the type T because it's only available at compile time but erased at runtime. Therefore, if you want to use the generic type as a normal class in the function body you need to explicitly pass the class as a parameter as shown in myGenericFun.

If you create an inline function with a reified T, the type of T can be accessed even at runtime, and thus you do not need to pass the Class<T> additionally. You can work with T as if it was a normal class - e.g. you might want to check whether a variable is an instance of T, which you can easily do then: myVar is T.

Such an inline function with reified type T looks as follows:

inline fun <reified T> myGenericFun()


How reified works

You can only use reified in combination with an inline function. By doing so, you instruct the compiler to copy the function's bytecode to every spot the function is invoked from (the compiler "inlines" the function). When you call an inline function with reified type, the compiler has to be able to know the actual type passed as a type argument so that it can modify the generated bytecode to use the corresponding class directly. Therefore a call like myVar is T becomes myVar is String in the bytecode (if the type argument is String).
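Java has no reified keyword, so the following is only an analogy, written in Java to show the shape of the two variants. The first method is what you are stuck with when T is erased: the caller passes a Class token. The second is roughly what an inlined call site with T = String boils down to once the compiler has substituted the concrete class. Class and method names are invented for the example.

```java
// Hypothetical illustration of "token passing" versus "type substituted at the call site".
class InlineAnalogy {

    // Erased generic method: T is unknown at runtime, so the check
    // has to go through an explicitly passed Class object.
    static <T> boolean isInstance(Object value, Class<T> type) {
        return type.isInstance(value);
    }

    // Roughly the code a reified call site for String amounts to after inlining:
    // the type parameter has been replaced by the concrete class.
    static boolean isString(Object value) {
        return value instanceof String;
    }

    public static void main(String[] args) {
        Object myVar = "hello";
        System.out.println(isInstance(myVar, String.class)); // token-based check
        System.out.println(isString(myVar));                 // substituted check
    }
}
```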



Example

Let's have a look at an example that shows how helpful reified can be.
We want to create an extension function for String called toKotlinObject that tries to convert a JSON string to a plain Kotlin object with a type specified by the function's generic type T. We can use com.fasterxml.jackson.module.kotlin for this and the first approach is the following:

a) First approach without reified type

fun <T> String.toKotlinObject(): T {
    val mapper = jacksonObjectMapper()
    //does not compile!
    return mapper.readValue(this, T::class.java)
}

The readValue method takes the type that it is supposed to parse the JSON into. If we try to get the Class of the type parameter T, the compiler complains: "Cannot use 'T' as reified type parameter. Use a class instead."

b) Workaround with explicit Class parameter

fun <T: Any> String.toKotlinObject(c: KClass<T>): T {
    val mapper = jacksonObjectMapper()
    return mapper.readValue(this, c.java)
}

As a workaround, the Class of T can be made a method parameter, which is then used as an argument to readValue. This works and is a common pattern in generic Java code. It can be called as follows:

data class MyJsonType(val name: String)

val json = """{"name":"example"}"""
json.toKotlinObject(MyJsonType::class)

c) The Kotlin way: reified

Using an inline function with reified type parameter T makes it possible to implement the function differently:

inline fun <reified T: Any> String.toKotlinObject(): T {
    val mapper = jacksonObjectMapper()
    return mapper.readValue(this, T::class.java)
}

There's no need to take the Class of T as an additional parameter; T can be used as if it were an ordinary class. For the client the code looks like this:

json.toKotlinObject<MyJsonType>()

Important Note: Working with Java

An inlined function with reified type is not callable from Java code.

What is reification?

Reification is the process of taking an abstract thing and creating a concrete thing.

The term reification in C# generics refers to the process by which a generic type definition and one or more generic type arguments (the abstract thing) are combined to create a new generic type (the concrete thing).

To phrase it differently, it is the process of taking the definition of List<T> and int and producing a concrete List<int> type.

To understand it further, compare the following approaches:

  • In Java generics, a generic type definition is transformed to essentially one concrete generic type shared across all allowed type argument combinations. Thus, multiple (source code level) types are mapped to one (binary level) type - but as a result, information about the type arguments of an instance is discarded in that instance (type erasure).

    1. As a side effect of this implementation technique, the only generic type arguments that are natively allowed are those types that can share the binary code of their concrete type; which means those types whose storage locations have interchangeable representations; which means reference types. Using value types as generic type arguments requires boxing them (placing them in a simple reference type wrapper).
    2. No code is duplicated in order to implement generics this way.
    3. Type information that could have been available at runtime (using reflection) is lost. This, in turn, means that specialization of a generic type (the ability to use specialized source code for any particular generic argument combination) is very restricted.
    4. This mechanism doesn't require support from the runtime environment.
    5. There are a few workarounds to retain type information that a Java program or a JVM-based language can use.
  • In C# generics, the generic type definition is maintained in memory at runtime. Whenever a new concrete type is required, the runtime environment combines the generic type definition and the type arguments and creates the new type (reification). So we get a new type for each combination of the type arguments, at runtime.

    1. This implementation technique allows any kind of type argument combination to be instantiated. Using value types as generic type arguments does not cause boxing, since these types get their own implementation. (Boxing still exists in C#, of course - but it happens in other scenarios, not this one.)
    2. Code duplication could be an issue - but in practice it isn't, because sufficiently smart implementations (this includes Microsoft .NET and Mono) can share code for some instantiations.
    3. Type information is maintained, which allows specialization to an extent, by examining type arguments using reflection. However, the degree of specialization is limited, as a result of the fact that a generic type definition is compiled before any reification happens (this is done by compiling the definition against the constraints on the type parameters - thus, the compiler has to be able to "understand" the definition even in the absence of specific type arguments).
    4. This implementation technique depends heavily on runtime support and JIT-compilation (which is why you often hear that C# generics have some limitations on platforms like iOS, where dynamic code generation is restricted).
    5. In the context of C# generics, reification is done for you by the runtime environment. However, if you want to more intuitively understand the difference between a generic type definition and a concrete generic type, you can always perform a reification on your own, using the System.Type class (even if the particular generic type argument combination you're instantiating didn't appear in your source code directly).
  • In C++ templates, the template definition is maintained in memory at compile time. Whenever a new instantiation of a template type is required in the source code, the compiler combines the template definition and the template arguments and creates the new type. So we get a unique type for each combination of the template arguments, at compile time.

    1. This implementation technique allows any kind of type argument combination to be instantiated.
    2. This is known to duplicate binary code but a sufficiently smart tool-chain could still detect this and share code for some instantiations.
    3. The template definition itself is not "compiled" - only its concrete instantiations are actually compiled. This places fewer constraints on the compiler and allows a greater degree of template specialization.
    4. Since template instantiations are performed at compile time, no runtime support is needed here either.
    5. This process is lately referred to as monomorphization, especially in the Rust community. The word is used in contrast to parametric polymorphism, which is the name of the concept that generics come from.
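The two most visible consequences of the Java approach described above (one shared binary type, and boxing for value types) can be observed directly. This is a minimal sketch with invented names; it uses nothing beyond java.util.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for the Java bullet above.
class ErasureDemo {

    // List<String> and List<Integer> are distinct at the source level
    // but share a single runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    // An int cannot be a type argument, so the value is stored
    // as a boxed Integer object.
    static boolean storesBoxedValue() {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(7);                  // autoboxing happens here
        Object stored = numbers.get(0);
        return stored instanceof Integer;
    }

    public static void main(String[] args) {
        System.out.println("same runtime class: " + sameRuntimeClass());
        System.out.println("stored as Integer:  " + storesBoxedValue());
    }
}
```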

Why have the reified keyword in Kotlin, isn't marking a function inline sufficient?

Reified type parameters require the type arguments passed to them to be reified as well. Sometimes that requirement is impossible to meet (for instance, type parameters of a class can't be reified), so making all type parameters of inline functions reified by default would make it impossible to call ANY inline function in such cases, whereas now it's only impossible to call the ones with reified type parameters:

inline fun<T> genericFun(x: T)  {}
inline fun<reified T> reifiedGenericFun(x: T) {}

class SimpleGenericClass<T>() {
    fun f(x: T) {
        genericFun<T>(x) //compiles fine
        reifiedGenericFun<T>(x) //compilation error
    }
}

UPDATE. Why not automatically infer "reifibility" based on the context?

  1. Approach 1 (suggested by @Tenfour04): Analyze the code of the inlined function and consider its type parameter reified if it has T::class calls (I'd also add is T checks).
  2. Approach 2 (suggested by @SillyQuestion): Consider all type parameters of inline functions reified by default; if that leads to a compilation error at the usage site, then fall back to a non-reified type.

Here is a counter-example to both: "a" as? T. Function with this body would have different semantics depending on whether or not its type parameter is declared (or, hypothetically, inferred) as reified:

inline fun<reified T> castToReifiedGenericType() = "a" as? T
inline fun<T> castToSimpleGenericType() = "a" as? T

fun main() {
    println(castToReifiedGenericType<Int>()) //null
    println(castToSimpleGenericType<Int>()) //a
}

/*P.S. unsafe cast ("a" as T) also has different semantics for reified and non-reified T,
causing `ClassCastException` in the first case and still returning "a" in the latter.*/

So, with the first approach, semantics would change if we add a meaningless call to T::class/is T somewhere inside the inline function. With the second, semantics would change if we call this function from a new site (where T can't be reified, while it was "reifiable" before), or, conversely, remove a call from such a site (allowing it to be reified).

Debugging problems that come from these actions (which at first glance are unrelated to the observed semantic changes) is way more complex and panic-inducing than adding/reading an explicit reified keyword.
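The non-reified half of this behaviour is the same thing Java programmers know as an unchecked cast, so it can be reproduced in Java. In the sketch below (all names are hypothetical), castUnchecked plays the role of the non-reified "as T": the cast inside the method checks nothing, and a failure only shows up later where the result is used as an Integer. castOrNull uses a Class token to approximate what "as?" does with a reified T.

```java
// Hypothetical demo of erased versus token-based casts.
class UncheckedCastDemo {

    // T is erased, so this cast performs no check at runtime.
    @SuppressWarnings("unchecked")
    static <T> T castUnchecked(Object value) {
        return (T) value;
    }

    // With a Class token the check really happens; returns null on mismatch.
    static <T> T castOrNull(Object value, Class<T> type) {
        return type.isInstance(value) ? type.cast(value) : null;
    }

    // True if the bad cast goes unnoticed inside castUnchecked and
    // only fails where the caller needs an actual Integer.
    static boolean failsOnlyAtCallSite() {
        Object unnoticed = UncheckedCastDemo.<Integer>castUnchecked("a"); // no exception here
        try {
            Integer number = UncheckedCastDemo.<Integer>castUnchecked("a"); // compiler-inserted cast fails
            return false;
        } catch (ClassCastException e) {
            return "a".equals(unnoticed);
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnlyAtCallSite());
        System.out.println(castOrNull("a", Integer.class));
    }
}
```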

We can make class Foo<T>, why can't I call new T()?

Because you cannot know whether T is even instantiable: it could, for example, have a private constructor.

Imagine:

class Foo<T> {

    public Foo() {
        new T();
    }
}

class Bar {
    private Bar() {}
}

class FooBar {
    public FooBar() {
        Foo<Bar> foo = new Foo<>();
    }
}
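The same uncertainty shows up at runtime if you try the reflective route with a Class token. The sketch below (InstantiationCheck and tryCreate are made-up names) attempts to create an instance of whatever class it is given and reports why that may fail. The JDK classes used in main are just convenient examples: Integer has no no-arg constructor, and Math has only a private one.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical helper showing why "new T()" cannot be guaranteed to work.
class InstantiationCheck {

    // Tries to create a T via its no-arg constructor and describes the outcome.
    static <T> String tryCreate(Class<T> cls) {
        try {
            cls.getDeclaredConstructor().newInstance();
            return "created";
        } catch (NoSuchMethodException e) {
            return "no no-arg constructor";
        } catch (IllegalAccessException e) {
            return "constructor not accessible";
        } catch (InstantiationException e) {
            return "class is abstract";
        } catch (InvocationTargetException e) {
            return "constructor threw an exception";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate(StringBuilder.class));
        System.out.println(tryCreate(Integer.class));
        System.out.println(tryCreate(Math.class));
    }
}
```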

