
9-22-04 Puzzling Through Erasure II

The more I read through the conversations about Java Generics on the Java Developer's Forum, the more it becomes clear that the dominant influence in Java Generics is erasure. I say this because it seems like most of the time, the answer to questions of "why can't I do this simple-seeming thing" is "because of erasure." For example:
class AClass<T> {
  private static final int SIZE = 100;
  public void aMethod(Object arg) {
    if(arg instanceof T) {}             // Error
    T var = new T();                    // Error
    T[] array = new T[SIZE];            // Error
    T[] array2 = (T[])new Object[SIZE]; // Unchecked warning
  }
}

So the new-to-Generics reader (that is to say, my readers in Thinking in Java, 4th edition) will begin reading this code and the conversation will go something like this:

What's T?

That's the type. You can put any type in as T.

So when I refer to T, I'm talking about the type.

Exactly!

So why can't I say new T();?

Because of erasure. The type has been erased.

So T has been erased. That's why it doesn't work with the other expressions.

Wonderful! You're really catching onto this quickly.

No I'm not. First you tell me T is the type, then you tell me the type gets erased. What's going on here? Who thought of this?

No, What's on second. Who's on first.

So my question is, if we're going to have these T types, and then erase them, what's the point? What can you do with them?

You pass them to other classes and methods.

So that those classes and methods can in turn forget what they actually are?

Well... yes. But the erasure only happens inside the class or method. At the boundaries, when the object is being passed in or out, the type checking actually happens.

Except that could only be at the outermost boundary if you have generics that contain generics, because the type is erased as soon as it passes into the first generic.

Correct. But all the generics inside the boundary are checked at compile time to make sure they agree with the types that are passed to them. Here's an example:

import java.util.*;

public class GenericsInsideGenerics<T> {
  List<T> lt = new ArrayList<T>();
  public static void main(String[] args) {
    GenericsInsideGenerics<Integer> gigi = new GenericsInsideGenerics<Integer>();
    GenericsInsideGenerics<String> gigs = new GenericsInsideGenerics<String>();
  }
}

When the class is compiled, the compiler checks that the properties of the T passed into GenericsInsideGenerics are compatible with the properties needed by the T passed into ArrayList. And when you actually instantiate GenericsInsideGenerics<Integer>, for example, the compiler only needs to check that the type argument is compatible with the properties required by GenericsInsideGenerics. We can show this by adding a bound to the outer T:

import java.util.*;

public class GenericsInsideGenerics2<T extends Number> {
  List<T> lt = new ArrayList<T>();
  public static void main(String[] args) {
    GenericsInsideGenerics2<Integer> gigi = new GenericsInsideGenerics2<Integer>();
    // Compile error:
    // GenericsInsideGenerics2<String> gigs = new GenericsInsideGenerics2<String>();
  }
}

Notice that the ArrayList has no bound, because the 'outer' bound is what matters. Now you can no longer create a GenericsInsideGenerics2<String>.
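To see what "the properties of T" buys you inside the class, here is a minimal sketch. BoundedSum is a made-up class for illustration, not something from the examples above. It assumes the same kind of bound as GenericsInsideGenerics2, and shows that the body of the class can then call Number methods on a T:

```java
import java.util.*;

// Hypothetical class, for illustration only.
class BoundedSum<T extends Number> {
  // The inner List<T> needs no bound of its own; the outer bound covers it.
  private final List<T> items = new ArrayList<T>();
  void add(T item) { items.add(item); }
  // Legal because the compiler knows every T is at least a Number.
  double sum() {
    double total = 0;
    for(T item : items)
      total += item.doubleValue();
    return total;
  }
  public static void main(String[] args) {
    BoundedSum<Integer> bs = new BoundedSum<Integer>();
    bs.add(1);
    bs.add(2);
    bs.add(3);
    System.out.println(bs.sum()); // 6.0
  }
}
```

Without the bound, item.doubleValue() would not compile, because all the compiler could assume about T is that it is an Object.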

However, you can't go the other way by putting the bound on the ArrayList and not on GenericsInsideGenerics2. Actually, you can't put bounds on an instantiation at all; a bound belongs on the declaration of the type parameter.

OK, so you're telling me that I have to use extra brain cycles to remember that T gets erased, so I can only pass it to other generics and I can't really use it directly.

I think that sums it up pretty well, yes.

So erasure is clearly very important, since I'm having to do this extra work.

They seem to have a death grip on the concept at Sun, yes. Apparently the JCP vote for erasure was unanimous at every stage of the process. The release has only one foot left in the doorway so it looks like erasure is it.

Fine, that's the way things are. I can live with it if there's an excellent reason. What is it?

Backwards compatibility, that's the ticket. Yesirree, it's all about backwards compatibility. You betcha, without erasure backwards compatibility is pert nigh impossible.......

...and here's where I get stuck

There is a lot of published misinformation on this point. For example, a fair number of people have firmly asserted that erasure allows Java 5.0 code to run under JDK 1.4. Neal Gafter has stated that this is incorrect.

And misinformation goes the other way, too. You can find Sun Java designers confidently stating that C#/.NET 2.0 will break code from C# 1.0, because .NET 2.0 doesn't use erasure. I checked this out with Anders Hejlsberg (the lead designer of C#) and he said:

.NET is backwards compatible in the sense that older .NET applications and libraries continue to function on newer releases of the .NET Framework. To me, this is the essential meaning of backwards compatible.

[Later] ... there is no need to recompile .NET 1.0 (or 1.1) libraries in order to run them on .NET 2.0. The old binaries continue to work.

I think everyone is pretty much in agreement that this is what backwards compatibility means. But C#/.NET 2.0 doesn't use erasure – it preserves the type information at runtime, so you can know at runtime the type of T, and you can even know in the above example that you have a List<Integer>. So they did it without erasure – how was the problem different in Java?
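For contrast, here is a small sketch of the Java side of that comparison. ErasedTypeDemo and sameRuntimeClass are names I made up for the illustration. It only shows what can be observed at runtime, and says nothing about why the design went this way:

```java
import java.util.*;

// Hypothetical demo class, for illustration only.
class ErasedTypeDemo {
  // Compares the runtime classes of two differently parameterized lists.
  static boolean sameRuntimeClass() {
    List<Integer> ints = new ArrayList<Integer>();
    List<String> strings = new ArrayList<String>();
    return ints.getClass() == strings.getClass();
  }
  public static void main(String[] args) {
    System.out.println(sameRuntimeClass()); // true
    // The class name carries no trace of the type argument:
    System.out.println(new ArrayList<Integer>().getClass().getName());
  }
}
```

Both lists report the same class, java.util.ArrayList, so at runtime there is nothing to distinguish a List<Integer> from a List<String>.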

Perhaps it has to do with the kind of backward compatibility we're talking about. I can see two "kinds" of backward compatibility:

  1. Source code compatibility. This would require you to recompile your code. I don't think this would be too big a price to pay for a better generic system.

  2. Object compatibility. This would not require code recompilation. I suspect this is what Sun's objective was, but the reason erasure was necessary to accomplish this is still not clear.

I'm not arguing against erasure at this point; that particular train has clearly left the station. It's what is (almost) shipping, so I'm now in the position of understanding and explaining it. But I need to be able to explain it, and so far I haven't been able to answer the question: what made erasure necessary?

Here's a thread on the java forums trying to find the answer to the same question.

Type injection

In the process of struggling with this question, and pondering the other ways that generics might have been implemented in Java, I did have an idea. Now, everyone spent a lot of time (years) thinking about all the different possible ways to implement generics in Java, so I'm going to assume that this idea has already appeared (at least I hope it has). But I haven't seen it, so I'll just float it by here.

When you call a method, this is quietly passed to that method. When you create an instance of an inner class, a reference to the outer-class object is quietly passed into and stored in that instance. So we have some precedent for this technique. What if, when you instantiate a generic, a reference to each of the type parameters is quietly passed in and stored in the resulting object?

At runtime, the necessary class information would then be available. If you wrote an expression that used a type parameter, the compiler could do something reasonable with it.
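To make the idea concrete, here is roughly what it looks like if you do the passing by hand today. This is only a sketch of the manual equivalent, not the proposed mechanism itself: TypeCarrier and its methods are hypothetical names, the Class<T> argument is supplied explicitly by the caller rather than quietly by the compiler, and create() assumes T has a public no-argument constructor.

```java
import java.lang.reflect.Array;

// Hypothetical class: the "injected" type is passed in explicitly as a Class<T>.
class TypeCarrier<T> {
  private final Class<T> kind; // Stored type information, available at runtime
  TypeCarrier(Class<T> kind) { this.kind = kind; }
  // Stands in for "arg instanceof T"
  boolean isInstance(Object arg) { return kind.isInstance(arg); }
  // Stands in for "new T()"; assumes a public no-arg constructor
  T create() {
    try {
      return kind.getDeclaredConstructor().newInstance();
    } catch(Exception e) {
      throw new IllegalStateException("Cannot create " + kind.getName(), e);
    }
  }
  // Stands in for "new T[size]"; the array really has component type T
  @SuppressWarnings("unchecked")
  T[] createArray(int size) {
    return (T[])Array.newInstance(kind, size);
  }
  public static void main(String[] args) {
    TypeCarrier<StringBuilder> tc =
      new TypeCarrier<StringBuilder>(StringBuilder.class);
    System.out.println(tc.isInstance("hello")); // false
    StringBuilder sb = tc.create();
    StringBuilder[] array = tc.createArray(3);
    System.out.println(array.getClass().getSimpleName()); // StringBuilder[]
  }
}
```

Each of the three operations that failed at the beginning of this article has a working counterpart here, at the cost of making the caller state the type twice.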

This does not mean I am suggesting adding this type information to every object. That would be needlessly redundant. However, the first time you instantiate a generic for a particular set of type parameters, the runtime system would create a class containing that type information (and we know that Java is good at doing things like this at runtime), and all the objects of that class would then have access to the type information.

This would not require any changes to the syntax, so if it were added to Java 6, Java 5 code would still work (since Java 5 disallows the use of the parameters in the ways that we saw at the beginning, anyway). And in Java 6, you could just start using the type parameters in new ways (such as the illegal examples shown at the beginning of this article) inside the body of your code, and it would start working.

It might solve the problem.

©2003 MindView, Inc.