Java Generics Type Erasure: When and What Happens

What is the concept of erasure in generics in Java?

It's basically the way that generics are implemented in Java via compiler trickery. The compiled generic code actually just uses java.lang.Object wherever you talk about T (or some other type parameter) - and there's some metadata to tell the compiler that it really is a generic type.

When you compile some code against a generic type or method, the compiler works out what you really mean (i.e. what the type argument for T is) and verifies at compile time that you're doing the right thing, but the emitted code again just talks in terms of java.lang.Object - the compiler generates extra casts where necessary. At execution time, a List<String> and a List<Date> are exactly the same; the extra type information has been erased by the compiler.

Compare this with, say, C#, where the information is retained at execution time, allowing code to contain expressions such as typeof(T) which is the equivalent to T.class - except that the latter is invalid. (There are further differences between .NET generics and Java generics, mind you.) Type erasure is the source of many of the "odd" warning/error messages when dealing with Java generics.

Other resources:

  • Oracle documentation
  • Wikipedia
  • Gilad Bracha's Java generics guide (PDF - highly recommended; link may need to change periodically)
  • Angelika Langer's Java Generics FAQ

Java - What's the difference between type erasure and type inference?

Both of them serve completely different needs:

Type erasure is like you said and is needed because java byte code is not generic hence you need to remove the typing. This isn't a feature that helps you code it's just an automatic compile time change that has to happen for the jvm to understand your code.

Type inference on the other hand is the compiler being "smart" and knowing what type you were referring to even though you didn't actually write it. Just like in your example the compiler knows that Box<>() actually means Box<String>() and lets you continue coding with type safety as if you wrote Box<String>. This way you can write less verbose code and the compiler would still understand it.

You can understand from all of this that generics in Java are actually mostly a compile time thing that lets you code with more safety and helps you find errors in compile-time rather than run-time.

How Type Erasure work in java?

Type erasure happens at compile time. Java compiler removes those generic type information from source and adds casts as needed and delivers the byte code. Therefore the generated byte code will not have any information about type parameters and arguments. It will look like a old java code without generics. There is no way to determine the value of T at runtime, because that information is removed before the code is compiled.

What are the benefits of Java's types erasure?

Type Erasure Is Good

Let's stick to the facts

A lot of the answers thus far are overly concerned with the Twitter user. It's helpful to keep focused on the messages and not the messenger. There is a fairly consistent message with even just the excerpts mentioned thus far:

It's funny when Java users complain about type erasure, which is the only thing Java got right, while ignoring all the things it got wrong.

I get huge benefits (e.g. parametricity) and nil cost (alleged cost is a limit of imagination).

new T is a broken program. It is isomorphic to the claim "all propositions are true." I am not big into this.

A goal: reasonable programs

These tweets reflect a perspective that is not interested in whether we can make the machine do something, but more whether we can reason that the machine will do something we actually want. Good reasoning is a proof. Proofs can be specified in formal notation or something less formal. Regardless of the specification language, they must be clear and rigorous. Informal specifications are not impossible to structure correctly, but are often flawed in practical programming. We end up with remediations like automated and exploratory tests to make up for the problems we have with informal reasoning. This is not to say that testing is intrinsically a bad idea, but the quoted Twitter user is suggesting that there is a much better way.

So our goal is to have correct programs that we can reason about clearly and rigorously in a way that corresponds with how the machine will actually execute the program. This, though, is not the only goal. We also want our logic to have a degree of expressivity. For example, there's only so much we can express with propositional logic. It's nice to have universal (∀) and existential (∃) quantification from something like first-order logic.

Using type systems for reasoning

These goals can be very nicely addressed by type systems. This is especially clear because of the Curry-Howard correspondence. This correspondence is often expressed with the following analogy: types are to programs as theorems are to proofs.

This correspondence is somewhat profound. We can take logical expressions, and translate them through the correspondence to types. Then if we have a program with the same type signature that compiles, we have proven that the logical expression is universally true (a tautology). This is because the correspondence is two-way. The transformation between the type/program and the theorem/proof worlds are mechanical, and can in many cases be automated.

Curry-Howard plays nicely into what we'd like to do with specifications for a program.

Are type systems useful in Java?

Even with an understanding of Curry-Howard, some people find it easy to dismiss the value of a type system, when it

  1. is extremely hard to work with
  2. corresponds (through Curry-Howard) to a logic with limited expressivity
  3. is broken (which gets to the characterization of systems as "weak" or "strong").

Regarding the first point, perhaps IDEs make Java's type system easy enough to work with (that's highly subjective).

Regarding the second point, Java happens to almost correspond to a first-order logic. Generics give use the type system equivalent of universal quantification. Unfortunately, wildcards only give us a small fraction of existential quantification. But universal quantification is pretty good start. It's nice to be able to say that functions for List<A> work universally for all possible lists because A is completely unconstrained. This leads to what the Twitter user is talking about with respect to "parametricity."

An often-cited paper about parametricity is Philip Wadler's Theorems for free!. What's interesting about this paper is that from just the type signature alone, we can prove some very interesting invariants. If we were to write automated tests for these invariants we would be very much wasting our time. For example, for List<A>, from the type signature alone for flatten

<A> List<A> flatten(List<List<A>> nestedLists);

we can reason that

flatten(nestedList.map(l -> l.map(any_function)))
≡ flatten(nestList).map(any_function)

That's a simple example, and you can probably reason about it informally, but it's even nicer when we get such proofs formally for free from the type system and checked by the compiler.

Not erasing can lead to abuses

From the perspective of language implementation, Java's generics (which correspond to universal types) play very heavily into the parametricity used to get proofs about what our programs do. This gets to the third problem mentioned. All these gains of proof and correctness require a sound type system implemented without defects. Java definitely has some language features that allow us to shatter our reasoning. These include but are not limited to:

  • side-effects with an external system
  • reflection

Non-erased generics are in many ways related to reflection. Without erasure there's runtime information that's carried with the implementation that we can use to design our algorithms. What this means is that statically, when we reason about programs, we don't have the full picture. Reflection severely threatens the correctness of any proofs we reason about statically. It's no coincidence reflection also leads to a variety of tricky defects.

So what are ways that non-erased generics might be "useful?" Let's consider the usage mentioned in the tweet:

<T> T broken { return new T(); }

What happens if T doesn't have a no-arg constructor? In some languages what you get is null. Or perhaps you skip the null value and go straight to raising an exception (which null values seem to lead to anyway). Because our language is Turing complete, it's impossible to reason about which calls to broken will involve "safe" types with no-arg constructors and which ones won't. We've lost the certainty that our program works universally.

Erasing means we've reasoned (so let's erase)

So if we want to reason about our programs, we're strongly advised to not employ language features that strongly threaten our reasoning. Once we do that, then why not just drop the types at runtime? They're not needed. We can get some efficiency and simplicity with the satisfaction that no casts will fail or that methods might be missing upon invocation.

Erasing encourages reasoning.



Related Topics



Leave a reply



Submit