12-28-04 Finding Unused Imports
For some reason it really bugs me to have an import line
that you don't need, especially in a book. Part of the issue, I guess, is
that it's confusing to see the import and then not see the imported element(s)
used. But it also feels slightly buggy to me.
I know a number of tools can detect unused imports (Eclipse, for example),
but I have two problems with the ones that I know about:
- They usually require interaction for each file. I want batch processing.
- They don't stand alone. Eclipse is not fully up to J2SE5, as far as I know,
and all the code in TIJ4 is using J2SE5 as fully as possible. Jalopy (the
TRIEMAX version) completely understands J2SE5, but
I can't make it do only the import checking by itself, and my code has some custom
formatting (to fit within 60 columns for the book) that I don't want Jalopy to change.
I hunted around a bit for batch import checkers but any that I came across seem to have vanished.
I thought of writing a brute-force one, but my brain kept prematurely optimizing until I realized that
I could go away and leave the thing running for as long as it needs, and that I only need to run it
occasionally. The final inspiration was the realization that I could use Python's new (to me) generator
functionality. (As you'll see in the next edition of TIJ, if you haven't discovered it already, I
am fascinated with generators and similar ideas.) This turns out to be very nifty indeed; I'm now
a Python generator convert and will probably be classified as a "two year old with a yield
keyword" (everything looks like a generator). For a while, anyway, until it's internalized the way
list comprehensions now are.
The program walks the directory tree from wherever you start it, looking for .java files.
For each Java file, it finds the block of import statements and comments each import out,
one at a time, writes the file back and runs the Java compiler on it. If the compiler succeeds (os.system() returns
zero), then that import must not have been necessary, so the file
is rewritten with the import removed. Otherwise the original version of the file is restored.
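To make the first step concrete, here is a minimal sketch of how a regular expression can isolate the block of imports and split the file around it. It uses the same pattern as the program below, but it is written for Python 3, and the Java source string is a made-up sample rather than anything from the book.

```python
import re

# Same pattern as the program below: a newline followed by one or
# more consecutive lines that start with "import ".
find_imports = re.compile(r"\n(?:import .*?\n)+")

# Hypothetical Java source, used only for illustration.
java = (
    "// Sample.java\n"
    "import java.util.*;\n"
    "import java.io.*;\n"
    "public class Sample {}\n"
)

match = find_imports.search(java)
# The matched block starts and ends with a newline, so drop the
# empty strings that split() produces at both ends.
imports = [line for line in match.group(0).split("\n") if line != ""]
# Everything before and after the import block.
before, after = find_imports.split(java)

print(imports)       # ['import java.util.*;', 'import java.io.*;']
print(repr(before))  # '// Sample.java'
print(repr(after))   # 'public class Sample {}\n'
```

Because the match swallows the newline on each side of the block, the pieces can be glued back together as before + block + after, as long as the block is rebuilt with a leading and trailing newline.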
Although this program works for me, caveat emptor: you should perform backups and make a copy
of your code before running the program -- it might have bugs in it that could damage your code. You've been
warned (however, you can examine the code fairly easily and decide that it probably won't have problems.
Of course, I keep multiple redundant backups of all my code).
"""RedundantImportDetector.py
Discover redundant java imports using brute force.
Requires Python 2.3"""
import os, re

reportFile = file("RedundantImports.txt", 'w')
# Regular expression to find the block of import statements:
findImports = re.compile("\n(?:import .*?\n)+")

def main():
    '''Walk the directory tree and test all Java files'''
    # Walk with absolute paths: checkImports() changes the working
    # directory, which would invalidate paths relative to '.'
    for root, dirs, files in os.walk(os.getcwd()):
        for javaFile in [os.path.join(root, f)
                         for f in files if f.endswith(".java")]:
            checkImports(javaFile)

def checkImports(javaFile):
    java = file(javaFile).read()
    imports = findImports.search(java)
    if imports:
        imports = [f for f in imports.group(0).split('\n') if f != '']
        fileParts = findImports.split(java)
        assert len(fileParts) == 2
        for mutated in mutateImports(imports):
            # Write the file with one import commented out:
            file(javaFile, 'w').write(
                fileParts[0] + mutated + fileParts[1])
            # Compile from the file's own directory:
            os.chdir(os.path.dirname(javaFile))
            if os.system("javac " + os.path.basename(javaFile)) == 0:
                print >>reportFile, javaFile + "\n" + mutated
                # Drop the commented-out import entirely:
                redundantRemoved = "\n".join(
                    [m for m in mutated.split("\n")
                     if not m.startswith("//")])
                print >>reportFile, redundantRemoved
                file(javaFile, 'w').write(fileParts[0] +
                    redundantRemoved + fileParts[1])
                return # No further attempts
        file(javaFile, 'w').write(java) # Restore original file

def mutateImports(imports):
    '''Generates different versions of imports, each with a
    different line commented out'''
    for i in range(len(imports)):
        mutated = imports[:]
        mutated[i] = '//' + mutated[i]
        yield "\n".join([''] + mutated + [''])

if __name__ == "__main__": main()
The generator is mutateImports(). Each time it reaches yield, it hands back a value
and retains the state of the function; when the next value is requested, it wakes up right
after the yield. Only when you exit the function via return (which
is implicit here) or raise a StopIteration exception does the generator terminate.
In addition, any function containing a yield becomes a generator. Here's a good definition
of generators from the online Python Cookbook:
Generators simplify writing iterators which produce data as needed rather than all at once. The
yield keyword freezes execution state, eliminating the need for instance variables and progress
flags. As well, generators automatically create the __iter__() and next() methods for the iterator interface.
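To watch the state retention happen, here is a small sketch that drives the generator by hand instead of with a for loop. It is a Python 3 rendering of mutateImports() (renamed mutate_imports here), so it uses the built-in next(), which calls the iterator's __next__() method; in Python 2.3 the same step was spelled gen.next(). The import lines are a made-up sample.

```python
def mutate_imports(imports):
    """Yield one version of the import block per import line,
    each with a different line commented out."""
    for i in range(len(imports)):
        mutated = imports[:]           # copy, so the caller's list is untouched
        mutated[i] = "//" + mutated[i]
        yield "\n".join([""] + mutated + [""])

# Hypothetical import lines, used only for illustration.
sample = ["import java.util.*;", "import java.io.*;"]

gen = mutate_imports(sample)  # nothing in the body has run yet
first = next(gen)             # runs up to the first yield
second = next(gen)            # resumes right after that yield, with i intact
leftover = list(gen)          # the loop is finished, so nothing is left

print(repr(first))   # '\n//import java.util.*;\nimport java.io.*;\n'
print(repr(second))  # '\nimport java.util.*;\n//import java.io.*;\n'
print(leftover)      # []
```

The loop counter i lives inside the suspended function between calls, which is why no instance variables or progress flags are needed.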
One limitation is that the program removes only one redundant import per file on each run. However, it reports
all the redundancies that it finds, so you know if you've had a clean pass. I figure I'll need
to run it regularly anyway, as I develop the book, so eventually all the extra imports will get
flushed out.
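If running the program repeatedly gets tedious, one possible variation is to keep retrying within a single file until no import can be dropped. The following is only a sketch of that idea in Python 3, not part of the program above: remove_all_redundant() and its compiles argument are hypothetical names, and the compiler is replaced by a stand-in function so the example runs without javac.

```python
def remove_all_redundant(imports, compiles):
    """Drop imports one at a time until none can be removed.

    `compiles` is a hypothetical callable that takes a list of
    import lines and returns True if the file would still compile
    with exactly those imports. In a real tool this is where javac
    would be run.
    """
    imports = list(imports)
    removed = []
    changed = True
    while changed:
        changed = False
        for i, line in enumerate(imports):
            candidate = imports[:i] + imports[i + 1:]
            if compiles(candidate):
                removed.append(line)
                imports = candidate
                changed = True
                break  # restart the scan on the shorter list
    return imports, removed


# Stand-in for the compiler: pretend only java.util is really needed.
def fake_compiles(import_lines):
    return "import java.util.*;" in import_lines


kept, removed = remove_all_redundant(
    ["import java.io.*;", "import java.util.*;", "import java.net.*;"],
    fake_compiles,
)
print(kept)     # ['import java.util.*;']
print(removed)  # ['import java.io.*;', 'import java.net.*;']
```

The cost is more compiler runs per file, which matters little for something you can leave running unattended.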