3-11-04 About Latent Typing
The very first question about yesterday's weblog entry (where we discovered that
"Java Generics" are actually not generics at all, but simply an autocasting
mechanism) was this:
Is latent typing (in a statically-typed language) desirable?
Or rather, does it "go against the grain" of the language?
Let's say I have classes Gun and Camera, both with
a shoot() method. If I had another method
void makeMyDay(T t) { t.shoot(); }
do I really want to be able to pass an
instance of either Gun or Camera to it, without
caring? Wouldn't it be better to use an interface like FireArm and
make that the parameter to makeMyDay()?
Doesn't it make it likely that I'll "shoot myself in the foot" (pun intended)
if I just call shoot() on any old object? In Java, the semantics of
a method are not bound to its name, so much as they are bound to the interface
the method belongs to. e.g.: print() means different things to a
Stream than it does to an AWT component.
This question reveals just about every issue at hand. Let's look at the
implications of this question and then address each one:
- Latent typing is somehow connected to whether the language is static or
dynamic.
- A true generic mechanism does something other than loosen the explicit
constraints on the use of a type.
- Java doesn't already have such a constraint-loosening mechanism, so the
issue doesn't exist without true generics.
And the big one:
- The syntax of a program is somehow connected to the semantics of that
program.
Before diving into this, I want to point out that Martin Fowler recently made
the brilliant
observation that the difference in attitudes about issues like "when do we
do type-correctness testing?" depends on whether you have a "directing" approach
(you want to provide guidance to prevent people from falling down) or an
"enabling" approach (you want to provide tools and abilities that allow people
to move forward faster). Both approaches are reasonable and neither is wrong. I
have been in both situations; for example, trying to prevent interns from
ignoring or even actively circumventing coding style guidelines (where more
"direction" was required), and on the other hand being frustrated by the loss of
productivity that comes from being forced to conform to constraints that I
wouldn't have violated anyway, and therefore was not benefiting from. Each
approach can be helpful in the appropriate situation, and neither is a
universal solution.
What is Latent Typing?
As I pointed out yesterday, when you use a latent typing mechanism (the core
concept in a true implementation of generics), the type is still there, it's
just implied. So on a very simple level, you could look at it as a device
that writes interfaces for you, so that you don't have to. It's an enabling
device that reduces the amount of code you have to write.
Why do we want to use latent typing? It's a code organization and reuse
mechanism. With it I can write a piece of code that can be reused more easily
than without it. Code organization and reuse are the foundational "levers" of
all computer programming: write it once, use it more than once, keep the code in
one place. The most fundamental organization-and-reuse mechanism is hard-coded
into the opcodes of your microprocessor: the subroutine. Procedural programming
builds upon and improves the idea of the subroutine, and object-oriented
programming collects structures and associated subroutines together to produce a
larger and more sensible code chunk with better organization and reuse factors
than independent structures and subroutines. Alongside developments that improve
organization and reuse come, we hope, better ways to discover errors in our
code.
Because I am not required to name an exact interface that my code operates
upon, with latent typing I can write less code and apply it more easily in more
places.
Why Can Latent Typing Appear "Dangerous"?
I believe that latent typing appears dangerous for two reasons. First,
because you don't see the type explicitly defined, it can seem like it's not
there and thus the type rules might not be enforced. However, in C++ the latent
type is clearly enforced, and at compile time. In dynamic languages like Python,
Smalltalk, etc., the latent type is also enforced: you still cannot successfully
send an improper message to an object.
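Java itself has no latent typing, but the kind of enforcement that dynamic
languages perform can be roughly simulated with reflection. The following is
only a sketch under that assumption: Camera, Gun, Rock and the LatentShooting
helper are hypothetical names made up for illustration, and none of the classes
shares an interface. The implied type is simply "has a public shoot() method."

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical classes for illustration: no shared interface anywhere.
class Camera {
    public String shoot() { return "Click!"; }
}

class Gun {
    public String shoot() { return "Bang!"; }
}

class Rock {
    // Deliberately has no shoot() method.
}

class LatentShooting {
    // Accepts any object; the latent type is "has a public shoot()".
    static String shootEmUp(Object target) {
        try {
            Method shoot = target.getClass().getMethod("shoot");
            return String.valueOf(shoot.invoke(target));
        } catch (NoSuchMethodException e) {
            // The implied type is still enforced, just at run time.
            throw new IllegalArgumentException(
                target.getClass().getSimpleName() + " has no shoot() method", e);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("shoot() could not be invoked", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(shootEmUp(new Camera()));
        System.out.println(shootEmUp(new Gun()));
        try {
            shootEmUp(new Rock());
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The Rock is rejected just as an improper message would be in Python or
Smalltalk. C++ templates make the same check, but at compile time.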
The second reason for the appearance of danger is the belief that a latent type
somehow makes it easier to "shoot yourself in the foot." To see why it's no
easier with latent typing than without it, it's important to see the distinction
between latent typing and weakening the type constraints on a function's
arguments.
The argument of a function can be exactly specified ("I will only accept a
String for this argument"), which is what a procedural language
does (one that has strong type checking, anyway: assembly-language and old-style
C simply took bits without discrimination). One of the great benefits of object-
oriented programming is that it weakens this constraint a bit: polymorphism
means that you can stick more than one type of object into a particular
variable. So now our function argument can be "a Shape or anything
derived from a Shape." Of course, it's possible that a "bad" kind
of Shape could then be passed into our function and we would
blithely operate upon it, and then be horribly surprised when our program
breaks. However, with experience we've come to believe that this rarely happens,
and when it does it comes from a gross misunderstanding of the paradigm.
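As a small illustration of that weakened constraint, here is a sketch in which
one function accepts a Shape or anything derived from it. The Shape, Circle,
Square and ShapeDemo classes are hypothetical, invented only for this example.

```java
// Hypothetical hierarchy, used only for illustration.
class Shape {
    String draw() { return "generic shape"; }
}

class Circle extends Shape {
    @Override
    String draw() { return "circle"; }
}

class Square extends Shape {
    @Override
    String draw() { return "square"; }
}

class ShapeDemo {
    // The parameter accepts a Shape or anything derived from Shape.
    static String render(Shape s) {
        return "Drawing a " + s.draw();
    }

    public static void main(String[] args) {
        System.out.println(render(new Shape()));
        System.out.println(render(new Circle()));
        System.out.println(render(new Square()));
    }
}
```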
Even saying that something expects a particular class, or subclass of that
class, can be overconstraining. Without a formal interface mechanism in C++
(although "pure abstract base classes" could be created by hand), programmers
tended to use classes everywhere and ended up with code that, for example, would
only accept a Shape, when that particular piece of code could also
have been used on anything that is "Drawable."
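In Java terms, the overconstrained and the relaxed versions might look
something like the sketch below. All of the names here (Drawable,
GeometricShape, Triangle, Watermark, Canvas) are hypothetical, and
GeometricShape merely plays the role of Shape.

```java
// Hypothetical types for illustration only.
interface Drawable {
    String draw();
}

abstract class GeometricShape implements Drawable {
    abstract double area();
}

class Triangle extends GeometricShape {
    public String draw() { return "triangle"; }
    double area() { return 0.5; }
}

// Not a shape at all, but it can still be drawn.
class Watermark implements Drawable {
    public String draw() { return "watermark"; }
}

class Canvas {
    // Overconstrained: demands a GeometricShape although it only calls draw().
    static String paintShape(GeometricShape s) {
        return "painted " + s.draw();
    }

    // Requires only what it actually uses, so anything Drawable is accepted.
    static String paint(Drawable d) {
        return "painted " + d.draw();
    }

    public static void main(String[] args) {
        System.out.println(paintShape(new Triangle()));
        System.out.println(paint(new Triangle()));
        System.out.println(paint(new Watermark()));
        // paintShape(new Watermark()); // would not compile: not a GeometricShape
    }
}
```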
By making the interface explicit, Java promoted this idea to first-class status.
But look at what the interface does: it weakens the type constraints in two
ways. First, it decouples the interface from the implementation, so you
only have an outline of what the type looks like, but no semantics attached. The
second way the constraints are weakened may have a bigger impact: you can easily
fragment groups of functions into interfaces according to what you want
and thus create code that is more generic by minimizing the constraints upon the
argument to be only those methods that you are actually going to call. So with
interfaces, you're able to say "I don't care what type you are as long as you
can perform these operations." Which is exactly what the questioner was
concerned about. He was worried that we could end up doing something like this:
interface Shootable {
    void shoot();
}

class Camera implements Shootable {
    public void shoot() { System.out.println("Click!"); }
}

class Gun implements Shootable {
    public void shoot() { System.out.println("Bang!"); }
}

public class Shooting {
    public static void shootEmUp(Shootable s) {
        System.out.println("Yee Haw!");
        s.shoot();
    }

    public static void main(String[] args) {
        Camera c = new Camera();
        Gun g = new Gun();
        shootEmUp(c);
        shootEmUp(g);
    }
}
Which of course we can, with interfaces. The questioner's solution was to try
to attach semantics to the interface in order to prevent its misuse: to
increase the constraints on the argument of shootEmUp() so that it
could only be a firearm, and not a camera. Certainly this is an appropriate
design in some cases, when you know that you will only ever want to pass a
firearm or something derived from it. And if you are really concerned that this
would be misused (that someone might pass in a gun that explodes),
you can make the Firearm class final. But that is not
the point of either classes or interfaces. Their point is to reduce the
constraints, to provide more flexible programming opportunities, and only to
increase the constraints when absolutely necessary.
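For comparison, here is roughly what the fully tightened version could look
like. This is only a sketch: the Firearm class body and the Range class are
hypothetical, written just to show the effect of final.

```java
// Hypothetical final class: no subclass can change what shoot() does.
final class Firearm {
    String shoot() { return "Bang!"; }
}

class Range {
    // Only an actual Firearm is accepted here, nothing derived and no Camera.
    static String shootEmUp(Firearm f) {
        return "Yee Haw! " + f.shoot();
    }

    public static void main(String[] args) {
        System.out.println(shootEmUp(new Firearm()));
        // class ExplodingGun extends Firearm { } // would not compile: Firearm is final
    }
}
```

The price of that safety is that shootEmUp() can no longer be reused for
anything else.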
So the constraint-loosening mechanism is already there, in the form of
interfaces. Latent typing simply takes this one step further, and makes that
interface latent so you don't have to express the interface, or to implement it
in every class that you want to use in the function. Since we already have
interfaces, latent typing is just a coding convenience.
The questioner's other concern is independent of whether you have interfaces
and/or latent typing. If the constraints are weakened this way, won't the code
be misused? This is also a concern that arose when large numbers of programmers
first began to program with an object-oriented language: because an object of a
type can be replaced with an object of its subtype, won't that open up the
potential for misuse? This is a fair question, but no one asks it anymore
because we have come, through experience, to know that it only happens in the
most egregious examples of misunderstanding, and in those cases the problems are
not isolated to type errors.
Basically, the questioner is saying: "we can stop problems from happening by
preventing the programmer from passing a Camera to
shootEmUp()." To which I reply: "if that's a bad thing and your
programmer is trying to do it, that programmer will find some other way to mess
up your program even if you prevent him/her from breaking it here." We learn
again and again that it's not possible to prevent people from doing something
bad with your programming system no matter how safe you attempt to make it. And
there's a boundary beyond which all the "directing" guidance will fail -- a
programmer must have a certain level of understanding and be able to buy into a
particular language, environment, framework, etc., up to a certain level in
order to use those tools properly. Less than that, and they need training, not
type-checking.
So to summarize, OO programming allows you to lessen the constraints on a
function argument. Interfaces reduce these constraints even more by explicitly
separating interface from implementation, and thus allowing you to require only
the smallest possible set of operations on the argument, opening up the
function's application to a larger set of possible classes. Latent typing simply
makes the interfaces implicit by, in effect, writing both the
interface declaration and the implements clause for you.
But all these mechanisms reduce the constraints so you can apply your code more
easily. (But type checking still happens!)
The Semantic Issue
An entirely separate discussion is "the meaning of an interface." Because
there are no operations attached to the collection of signatures in an
interface, the only thing that the compiler does when given an interface is to
ensure that the signatures conform. That is the intent: the interface is
separated from the implementation, thus reducing the coupling between the
function that accepts the interface and the object that it acts upon.
But I often hear the argument that "an interface implies a semantic
contract." For example, the questioner above states:
In Java, the semantics of a method are not bound to its name, so
much as they are bound to the interface the method belongs to.
Perhaps I misunderstand his meaning, but there are no semantics at all "bound
to the interface." If semantics can be said to exist in a program, they can only
exist in the methods that are part of a class, rather than the signatures that
are part of an interface (that is, a type, as distinguished from a class).
The only semantics that are associated with an interface are the ones that
you enforce. A class can be an implementation of a particular interface, and
that class has its own semantics. You may require that all classes that
implement an interface follow a certain set of rules, but you can only enforce
those rules using tests that you apply outside of the compiling environment.
Your tests may be code walkthroughs or a conformance test framework, but even if
you feel strongly that your interface implies a semantic contract, the only
contract that is enforced by the language is that any class that implements the
interface will include the signatures in that interface.
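To make that concrete, here is a sketch with a hypothetical Sorter interface
whose intended rule ("results come back in ascending order") lives only in a
comment. Both implementations compile; only a conformance check that you write
yourself, outside the compiler, can tell them apart. All names are invented
for the example.

```java
import java.util.Arrays;

// Hypothetical interface; the comment states a rule the compiler cannot check.
interface Sorter {
    /** Intended contract: returns the elements in ascending order. */
    int[] sort(int[] values);
}

class AscendingSorter implements Sorter {
    public int[] sort(int[] values) {
        int[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }
}

// Compiles without complaint, yet ignores the intended contract.
class LazySorter implements Sorter {
    public int[] sort(int[] values) {
        return values.clone();
    }
}

class ConformanceCheck {
    // A conformance test applied outside the compiler.
    static boolean honorsContract(Sorter s) {
        int[] result = s.sort(new int[] {3, 1, 2});
        return Arrays.equals(result, new int[] {1, 2, 3});
    }

    public static void main(String[] args) {
        System.out.println(honorsContract(new AscendingSorter()));
        System.out.println(honorsContract(new LazySorter()));
    }
}
```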
Again, that's the point: by rigorously decoupling interface and
implementation using the interface keyword, Java reduces the
constraints on an object that's passed into your function, and thus allows you
to apply your function more flexibly.
And we've found through experience that, just like polymorphism in OOP, our
anticipated dramatic increase of bugs because of this technology doesn't seem to
happen.