12-03-2004 JDK 5 Class File Format Puzzle

Jeremy Meyer (who's here from London "sprinting" on the Annotations chapter with me) and I did a marathon pair-programming session yesterday, writing a method to extract the qualified class name (including the package) from a class file, so that you can load a class using Class.forName() given only the class file.
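As a point of reference for that goal, here is a minimal sketch of the final step. Class files store names in "internal" form with '/' separators, while Class.forName() expects '.' separators, so a name pulled out of a class file needs converting before it can be loaded. The class name BinaryNameDemo and the helper toBinaryName() are hypothetical, invented for illustration.

```java
class BinaryNameDemo {
  // Hypothetical helper: convert the slash-separated internal name
  // found in a class file to the dotted form Class.forName() expects.
  static String toBinaryName(String internalName) {
    return internalName.replace('/', '.');
  }

  public static void main(String[] args) throws ClassNotFoundException {
    String internal = "java/util/HashMap"; // as stored in a class file
    Class<?> c = Class.forName(toBinaryName(internal));
    System.out.println(c.getName()); // java.util.HashMap
  }
}
```

The class being loaded must still be reachable through the class path (or a suitable class loader); the name alone is not enough.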

We used Bill Venners' "Inside the Java 2 Virtual Machine" as a reference (and called Bill at one point), and also found the source code for Chris Rathman's Jasper to be helpful. Here's what we came up with to extract the name:

import java.io.*;
import java.util.*;

public class ClassNameFinder {
  static final int
    UTF = 1,
    INTEGER = 3,
    FLOAT = 4,
    LONG = 5,
    DOUBLE = 6,
    CLASS = 7,
    STRING = 8,
    FIELD_REF = 9,
    METHOD_REF = 10,
    INTERFACE_METHOD_REF = 11,
    NAME_AND_TYPE = 12;
  public static String thisClass(String classFile) {
    Map<Integer, Integer> offsetTable =
      new HashMap<Integer, Integer>();
    Map<Integer, String> classNameTable =
      new HashMap<Integer, String>();
    try {
      DataInputStream data = new DataInputStream(
        new BufferedInputStream(
          new FileInputStream(classFile)));
      int magic = data.readInt();  // 0xcafebabe
      int minorVersion = data.readUnsignedShort();
      int majorVersion = data.readUnsignedShort();
      // Counts and indices in a class file are unsigned 16-bit values
      int constant_pool_count = data.readUnsignedShort();
      for(int i = 1; i < constant_pool_count; i++) {
        int tag = data.read();
        switch(tag) {
          case CLASS:
            int offset = data.readUnsignedShort();
            offsetTable.put(i, offset);
            break;
          case UTF:
            // readUTF() reads the 2-byte length and then decodes the
            // modified UTF-8 bytes, which is how UTF entries are stored
            String className = data.readUTF();
            classNameTable.put(i, className);
            break;
          case LONG:
          case DOUBLE:
            data.readLong(); // discard 8 bytes
            // Here's the fix (see wiki comments): a long or double
            // constant takes up two entries in the constant pool
            i++; // Skip the unused second entry
            break;
          case STRING:
            data.readShort(); // discard 2 bytes
            break;
          default:
            data.readInt(); // discard 4 bytes;
        }
      }
      int access_flags = data.readUnsignedShort();
      int this_class = data.readUnsignedShort();
      int super_class = data.readUnsignedShort();
      String thisClassName =
        classNameTable.get(offsetTable.get(this_class));
      return thisClassName;
    } catch(Exception e) {
      throw new RuntimeException(e);
    }
  }
  public static void main(String[] args) {
    System.out.println(thisClass(args[0]));
  }
}

This follows the class file information that we had, and we seemed to get it working, but we later discovered that it only worked on some files. Other files, which appeared to include some Java 5 constructs, would trip it up. Those same files fail with Jasper as well, so it appeared that the class file format had changed for JDK 5. Here's a Sun document that describes it, but nothing jumped out at us right away and we ran out of steam. (The listing above already includes the fix that later came out of the wiki comments: the extra i++ after a LONG or DOUBLE entry.)
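The effect of the i++ marked as the fix in the listing above can be checked in isolation. What follows is a minimal sketch, assuming a hand-built constant pool rather than real compiler output: entry #1 is a long (which also uses up #2), #3 is the UTF name, and #4 is the class entry pointing at #3. The class TwoSlotDemo, the method sampleHeader(), the flag skipSlotAfterWide and the name com/example/Demo are all hypothetical.

```java
import java.io.*;
import java.util.*;

class TwoSlotDemo {
  static final int UTF = 1, LONG = 5, DOUBLE = 6, CLASS = 7, STRING = 8;

  // Builds a tiny class-file prefix by hand (not a loadable class).
  static byte[] sampleHeader() {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      out.writeInt(0xCAFEBABE);
      out.writeShort(0);   // minor version
      out.writeShort(49);  // major version (JDK 5)
      out.writeShort(5);   // constant_pool_count: indices 1..4
      out.writeByte(LONG);  out.writeLong(42L);              // #1 (and #2)
      out.writeByte(UTF);   out.writeUTF("com/example/Demo"); // #3
      out.writeByte(CLASS); out.writeShort(3);               // #4 -> #3
      out.writeShort(0x0021); // access_flags
      out.writeShort(4);      // this_class
      out.writeShort(0);      // super_class (unused here)
      return buf.toByteArray();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  // Returns the internal name of this_class, or null if parsing goes astray.
  static String thisClass(byte[] bytes, boolean skipSlotAfterWide) {
    Map<Integer, Integer> nameIndex = new HashMap<Integer, Integer>();
    Map<Integer, String> utf = new HashMap<Integer, String>();
    try {
      DataInputStream data =
        new DataInputStream(new ByteArrayInputStream(bytes));
      data.skipBytes(8); // magic, minor version, major version
      int count = data.readUnsignedShort();
      for (int i = 1; i < count; i++) {
        int tag = data.readUnsignedByte();
        switch (tag) {
          case CLASS: nameIndex.put(i, data.readUnsignedShort()); break;
          case UTF: utf.put(i, data.readUTF()); break;
          case LONG:
          case DOUBLE:
            data.readLong(); // discard 8 bytes
            if (skipSlotAfterWide) i++; // entry also occupies index i + 1
            break;
          case STRING: data.readUnsignedShort(); break;
          default: data.readInt(); // discard 4 bytes
        }
      }
      data.readUnsignedShort(); // access_flags
      int thisClass = data.readUnsignedShort();
      Integer idx = nameIndex.get(thisClass);
      return idx == null ? null : utf.get(idx);
    } catch (IOException e) {
      return null; // ran off the end of the data
    }
  }

  public static void main(String[] args) {
    byte[] header = sampleHeader();
    System.out.println(thisClass(header, true));  // com/example/Demo
    System.out.println(thisClass(header, false)); // null
  }
}
```

Without the skip, every entry after the long is stored under an index that is one too low, and the loop tries to read one entry more than actually exists, so the parser runs into the access flags and beyond. That would be consistent with the original code failing only on class files that happen to contain long or double constants.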

If anyone has insights please post them to the wiki comments. Thanks.

My current fallback position is to first try the name of the class file as the name of the class to load and, if that fails, to use a regular expression to look through the class file. I've tried it with this Python program and it seems to work (it seems reasonable to assume that the first name in the package will be all letters, because it will typically be something like 'com'):

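import os, re

for root, dirs, files in os.walk('.'):
    for f in [f for f in files if f.endswith('.class')]:
        path = os.path.join(root, f)
        with open(path, 'rb') as class_file:
            data = class_file.read()
        # Escape regex metacharacters such as the '$' in inner class names
        name = re.escape(f.split('.')[0].encode('utf-8'))
        # Lowercase package path, then the class name, not followed by '$' or ';'
        qualified = re.findall(
            rb'([a-z]+/[a-z0-9_/]*' + name + rb')[^$;]', data)
        if qualified:
            print(path)
            print("     ", qualified[0].decode('utf-8').replace('/', '.'))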
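Since the result is ultimately needed on the Java side, the same search can be sketched in Java. This is only an illustration of the Python approach above, under the same assumption about lowercase package names; the class RegexNameFinder and the method findQualifiedName() are hypothetical, and the sample bytes are a stand-in for the contents of a real class file.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexNameFinder {
  // Looks for lowercase-package/.../SimpleName in the raw class-file
  // bytes and returns it in dotted form, or null if nothing matches.
  static String findQualifiedName(byte[] classBytes, String simpleName) {
    // ISO-8859-1 maps every byte to exactly one char, so no data is lost
    String text = new String(classBytes, StandardCharsets.ISO_8859_1);
    Pattern p = Pattern.compile(
      "([a-z]+/[a-z0-9_/]*" + Pattern.quote(simpleName) + ")[^$;]");
    Matcher m = p.matcher(text);
    return m.find() ? m.group(1).replace('/', '.') : null;
  }

  public static void main(String[] args) {
    // Stand-in bytes: a 2-byte length, the name, then a CLASS entry
    byte[] fake = "\0\20com/example/Demo\7\0\3"
      .getBytes(StandardCharsets.ISO_8859_1);
    System.out.println(findQualifiedName(fake, "Demo")); // com.example.Demo
  }
}
```

Pattern.quote() plays the role of the '$' escaping in the Python version, so inner class names such as Outer$Inner are matched literally.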

©2003 MindView, Inc.