9-22-04 Puzzling Through Erasure II
The more I read through the conversations about Java Generics on the
Java Developer's Forum, the more it becomes clear that the dominant
influence in Java Generics is erasure. I say this because it seems like most of
the time, the answer to questions of "why can't I do this simple-seeming thing"
is "because of erasure." For example:
class AClass<T> {
  private final int SIZE = 100;
  public void aMethod(Object arg) {
    if(arg instanceof T) {} // Error
    T var = new T(); // Error
    T[] array = new T[SIZE]; // Error
    T[] array2 = (T[])new Object[SIZE]; // Unchecked warning
  }
}
So the new-to-Generics reader (that is to say, my readers in Thinking in
Java, 4th edition) will begin reading this code and the conversation will go
something like this:
What's T?
That's the type. You can put any type in as T.
So when I refer to T, I'm talking about the type.
Exactly!
So why can't I say new T();?
Because of erasure. The type has been erased.
So T has been erased. That's why it doesn't work with the other
expressions.
Wonderful! You're really catching onto this quickly.
No I'm not. First you tell me T is the type, then you tell me the type
gets erased. What's going on here? Who thought of this?
No, What's on second.
Who's on first.
So my question is, if we're going to have these T types, and then erase
them, what's the point? What can you do with them?
You pass them to other classes and methods.
So that those classes and methods can in turn forget what they actually
are?
Well... yes. But the erasure only happens inside the class or method. At the
boundaries, when the object is being passed in or out, the type-checking
actually happens.
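One way to watch a boundary at work is to sneak a wrong element in through a raw type. This is only a sketch, and it assumes the check in question is the cast the compiler inserts where a value comes back out of generic code; BoundaryCheck, first and failsAtBoundary are names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BoundaryCheck {
    // Inside the generic method T is erased, so nothing here can
    // notice that the list holds the wrong kind of element.
    static <T> T first(List<T> list) {
        return list.get(0);
    }

    // Returns true if the cast at the call site rejected the element.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean failsAtBoundary() {
        List<String> strings = new ArrayList<String>();
        List raw = strings;            // raw type: compile-time checking is bypassed
        raw.add(Integer.valueOf(42));  // no error here, the list is now polluted
        try {
            String s = first(strings); // compiler-inserted cast to String happens here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtBoundary()); // prints "true"
    }
}
```

The generic method itself never complains; the failure surfaces only where the result crosses back into code that knows the type is String.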
Except that could only be at the outermost boundary, if you have generics
that contain generics, because the type is erased as soon as it passes into the
first generic.
Correct. But all the generics inside the boundary are checked at compile time
to make sure they agree with the types that are passed to them. Here's an
example:
import java.util.*;
public class GenericsInsideGenerics<T> {
  List<T> lt = new ArrayList<T>();
  public static void main(String[] args) {
    GenericsInsideGenerics<Integer> gigi = new GenericsInsideGenerics<Integer>();
    GenericsInsideGenerics<String> gigs = new GenericsInsideGenerics<String>();
  }
}
When the class is compiled, the compiler checks that the properties of the
T passed into GenericsInsideGenerics are compatible
with the properties needed by the T passed into
ArrayList. And when you actually instantiate
GenericsInsideGenerics<Integer>, for example, the compiler
only needs to check to see that the type parameter is compatible with the
necessary properties for GenericsInsideGenerics. We can show this
by adding a bound to the outer T:
import java.util.*;
public class GenericsInsideGenerics2<T extends Number> {
  List<T> lt = new ArrayList<T>();
  public static void main(String[] args) {
    GenericsInsideGenerics2<Integer> gigi = new GenericsInsideGenerics2<Integer>();
    // Compile error:
    // GenericsInsideGenerics2<String> gigs = new GenericsInsideGenerics2<String>();
  }
}
Notice that the ArrayList has no bound, because the 'outer'
bound is what matters. Now you can no longer create a
GenericsInsideGenerics2<String>.
However, you can't go the other way by putting the bound on the
ArrayList and not the GenericsInsideGenerics2.
Actually, you can't put bounds on an instantiation.
OK, so you're telling me that I have to use extra brain cycles to remember
that T gets erased, so I can only pass it to other generics and I
can't really use it directly.
I think that sums it up pretty well, yes.
So erasure is clearly very important, since I'm having to do this extra
work.
They seem to have a death grip on the concept at Sun, yes. Apparently the
JCP vote for erasure was unanimous at every stage of the process. The release
has only one foot left in the doorway so it looks like erasure is it.
Fine, that's the way things are. I can live with it if there's an
excellent reason. What is it?
Backwards compatibility, that's the ticket. Yesirree, it's all
about backwards compatibility. You betcha, without erasure
backwards compatibility is pert nigh impossible.......
...and here's where I get stuck
There is a lot of published misinformation on this point. For example, a
fair number of people have firmly asserted that erasure allows Java 5.0
code to run under JDK 1.4. Neal Gafter has stated that this is incorrect.
And misinformation goes the other way, too. You can find Sun Java designers
confidently stating that C#/.NET 2.0 will break code from C# 1.0, because .NET
2.0 doesn't use erasure. I checked this out with Anders Hejlsberg (the
lead designer of C#) and he said:
.NET is backwards compatible in the sense that older .NET applications and
libraries continue to function on newer releases of the .NET Framework. To me,
this is the essential meaning of backwards compatible.
[Later] ... there is no need to recompile .NET 1.0
(or 1.1) libraries in order to run them on .NET 2.0. The old binaries
continue to work.
I think everyone is pretty much in agreement that this is what backwards
compatibility means. But C#/.NET 2.0 doesn't use erasure; it preserves the
type information at runtime, so you can know at runtime the type of
T, and you can even know in the above example that you have a
List<Integer>. So they did it without erasure. How was
the problem different in Java?
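For contrast, here is what a Java program can observe at runtime under erasure. This is a minimal sketch; the class name ErasedAtRuntime and the helper sameRuntimeClass are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ErasedAtRuntime {
    // Returns true when both lists share one runtime class, which is
    // the case under erasure regardless of their type arguments.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<Integer>();
        List<String> strings = new ArrayList<String>();
        System.out.println(sameRuntimeClass(ints, strings)); // prints "true"
        System.out.println(ints.getClass().getName());       // prints "java.util.ArrayList"
    }
}
```

Nothing in the runtime class distinguishes the list of Integer from the list of String; the type argument is simply not there to ask about.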
Perhaps it has to do with the kind of backward compatibility we're
talking about. I can see two "kinds" of backward compatibility:
- Source code compatibility. This would require you to recompile your code. I
don't think this would be too big a price to pay for a better generic system.
- Object compatibility. This would not require code recompilation. I
suspect this is what Sun's objective was, but the reason erasure was necessary
to accomplish this is still not clear.
I'm not arguing against erasure at this point; that particular train
has clearly left the station. It's what is (almost)
shipping, so I'm now in the position of understanding and explaining it. But I
need to be able to explain it, and so far I haven't been able to answer the
question: what made erasure necessary?
Here's a thread on the Java forums trying to find the answer to the same question.
Type injection
In the process of struggling with this question, and pondering the other ways
that generics might have been implemented in Java, I did have an idea. Now,
everyone spent a lot of time (years) thinking about all the
different possible ways to implement generics in Java, so I'm going to assume
that this idea has already appeared; at least I hope it has. But I haven't seen
it, so I'll just float it by here.
When you call a method, the this reference is quietly passed to that method.
When you create an instance of an inner class, a reference to the outer-class
object is quietly passed into and stored in that instance. So we have some
precedent for this technique. What if, when you instantiate a generic, a
reference to each of the type parameters is quietly passed in and stored in the
resulting object?
At runtime, the necessary class information would then be available. If you
wrote an expression that used a type parameter, the compiler could do something
reasonable with it.
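To make the idea concrete, here is a hand-written approximation in which the type parameter is passed in explicitly as a Class object. It is only a sketch of the effect, not a proposal for how the compiler would do it; the class name Injected and its methods are hypothetical, and create() assumes T has an accessible no-argument constructor.

```java
import java.lang.reflect.Array;

// Hypothetical stand-in for what "type injection" would do quietly.
class Injected<T> {
    private final Class<T> type; // the explicitly passed type parameter

    Injected(Class<T> type) {
        this.type = type;
    }

    // Stands in for: arg instanceof T
    boolean isInstance(Object arg) {
        return type.isInstance(arg);
    }

    // Stands in for: new T() (assumes an accessible no-arg constructor)
    T create() {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type, e);
        }
    }

    // Stands in for: new T[size]
    @SuppressWarnings("unchecked")
    T[] createArray(int size) {
        return (T[]) Array.newInstance(type, size);
    }

    public static void main(String[] args) {
        Injected<StringBuilder> inj = new Injected<StringBuilder>(StringBuilder.class);
        System.out.println(inj.isInstance("text"));      // prints "false"
        System.out.println(inj.create().length());       // prints "0"
        System.out.println(inj.createArray(3).getClass().getSimpleName()); // prints "StringBuilder[]"
    }
}
```

The sketch keeps the token in each instance for simplicity and makes the caller pass it by hand; the point of the idea in the text is that the compiler and runtime would take care of that quietly.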
This does not mean I am suggesting adding
this type information to every object. That would be needlessly redundant.
However, the first time you instantiate a generic for a particular set of type
parameters, the runtime system would create a class containing that type
information (and we know that Java is good at doing things like this at
runtime), and all the objects of that class would then have access to the type information.
This would not require any changes to the syntax, so if it was added to Java
6, Java 5 code would still work (since it disallows the use of the parameters in
ways that we saw at the beginning, anyway). And in Java 6, you could just start
using the type parameters in new ways (such as the illegal examples shown at the
beginning of this article) inside the body of your code, and it would start
working.
It might solve the problem.