Tricks for Java interop

When I first started learning Java, it was very early in its evolution. It was a simple language with classes and methods. Over the years, they've added all sorts of stuff: inner classes, enums, generics, and varargs methods. These things continuously trip me up when writing Clojure. What's cool is that while Java has added lots of features as a language, the JVM bytecode has been very stable. These features are compiled into the same basic class/method scheme as everything else.

Luckily, Clojure gives us exactly what we need to consume classes and call methods. Over the years, I've had to deal with all of those Java features I listed above. It's not always obvious how they map. I still have to look these things up all the time. So I'm collecting these tricks here for myself and for you to refer to.

Java class and package names

Ok, this isn't really a Java language feature issue. But it's important. Clojure allows characters in names that are not legal in Java names. The most common problem with this is that hyphens in Clojure names get translated into underscores. But there are many more characters that need translating (?, !, etc.).

Clojure has a function called clojure.core/munge that will convert Clojure-style names into their equivalent Java-legal names. The reverse operation is clojure.main/demunge (also available as clojure.repl/demunge); there is no demunge in clojure.core. These fns actually wrap methods in the Clojure compiler, so they're the same logic the compiler uses itself. The compiler uses a lookup table (CHAR_MAP in clojure.lang.Compiler) to know which characters need to be translated and what they turn into. Refer to that table to see quickly how things will be translated.
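Here's a minimal sketch of what that looks like at the REPL. The names my-app.core, empty?, and swap! are just sample inputs I picked, and peeking at CHAR_MAP directly relies on a compiler implementation detail, so treat that last line as a curiosity rather than an API.

```clojure
(require 'clojure.main)

;; munge: Clojure-style name -> JVM-legal name
(def munged-ns   (munge "my-app.core"))  ;; => "my_app.core"
(def munged-pred (munge "empty?"))       ;; => "empty_QMARK_"
(def munged-bang (munge "swap!"))        ;; => "swap_BANG_"

;; demunge: JVM-legal name -> Clojure-style name
(def demunged (clojure.main/demunge "empty_QMARK_"))  ;; => "empty?"

;; The lookup table itself (implementation detail of the compiler)
(def qmark-entry (get clojure.lang.Compiler/CHAR_MAP \?))  ;; => "_QMARK_"
```

One caveat: demunge is not a perfect inverse. A name that really contains an underscore will come back with a hyphen, because every underscore is read as a munged hyphen.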

Inner classes

If you use a lot of Java interop, you'll often find a class like this one. It's called java.lang.Thread.UncaughtExceptionHandler. The package is java.lang, and the classname is Thread.UncaughtExceptionHandler. You know the Thread part is part of the classname because it starts with a capital T. But now it means there's a . in the name.

This type of class is called a nested class (people often loosely say "inner class"). It means that UncaughtExceptionHandler was defined inside of the Thread class. That's not a problem. The problem is that . (dot) is not a valid character in class names at the JVM level! So how is this thing named?

The answer is that the Java compiler replaces the . (dot) between the outer class name and the nested class name with a $ (dollar sign). The dots in the package name stay as they are. So, when you're referring to that class, you need to do the same translation:

(reify Thread$UncaughtExceptionHandler
  ...)
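To make that concrete, here's a small runnable sketch. The names seen and handler are mine, and stashing the message in an atom is just a stand-in for whatever you'd really do with the exception (logging, most likely).

```clojure
;; Holds the message of the last uncaught exception (illustrative only)
(def seen (atom nil))

;; Thread.UncaughtExceptionHandler in Java is Thread$UncaughtExceptionHandler here
(def handler
  (reify Thread$UncaughtExceptionHandler
    (uncaughtException [_ thread ex]
      (reset! seen (.getMessage ^Throwable ex)))))

;; Run a thread that throws, with our handler installed on it
(let [t (Thread. (fn [] (throw (RuntimeException. "boom"))))]
  (.setUncaughtExceptionHandler t handler)
  (.start t)
  (.join t))

@seen  ;; => "boom"

;; The same $ rule applies outside java.lang; here with a fully qualified name
(def entry? (instance? java.util.Map$Entry (first {:a 1})))  ;; => true
```

Since java.lang is imported automatically, Thread$UncaughtExceptionHandler works without a package. For other packages, you either import the $ name or spell it out in full, as with java.util.Map$Entry.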

Enums

Java added Enums in Java 5. Enums are a fancy way to represent a choice between different constants. You get a new type (the enum) and you have premade instances of that type made for you. Of course, there's no bytecode for enums. Instances of the enum are implemented as static fields. So if you've got a Java enum like this:

enum DaysOfWeek{ MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY; }

You can refer to those enum constants in Clojure like this (after importing the class, or by using its fully qualified name):

DaysOfWeek/TUESDAY

That's the standard static field access.
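DaysOfWeek is a made-up enum, so here's a sketch you can actually run, using java.time.DayOfWeek from the JDK instead (my substitution, assuming Java 8 or later). It also shows the static methods the Java compiler generates for every enum.

```clojure
(import 'java.time.DayOfWeek)

;; Enum constants are plain static fields
(def tuesday DayOfWeek/TUESDAY)

;; Every enum instance knows its own name
(def tuesday-name (.name tuesday))  ;; => "TUESDAY"

;; valueOf and values are static methods generated for each enum
(def by-name  (DayOfWeek/valueOf "TUESDAY"))
(def all-days (vec (DayOfWeek/values)))

;; There is exactly one instance per constant
(def same? (identical? tuesday by-name))  ;; => true

;; A nested enum combines this trick with the $ trick
(def new-state Thread$State/NEW)
```

Because there's only one instance of each constant, you can compare enum values with = or identical? and use them as map keys.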

Varargs methods

Java 5 also added varargs. That is, methods that can be called with different numbers of arguments. Specifically, the last parameter can have ... (three dots) after its type and Java will accept any number of arguments of that type in that position, including zero.

An example is java.util.Formatter.format(). Here's the type signature:

public Formatter format(String format, Object... args)

Of course, the JVM bytecode does not know anything about this. The Java compiler has to convert that into something the JVM can handle. What it does is package up all of the arguments into an array, and it passes that array as the argument. So, as far as the JVM is concerned, that format() method just has two arguments. A String and an array of Objects. Note that the array might be empty.

What that means to you, as a Clojure programmer, is that you have to build that array yourself—even if it's empty.

Building an empty array is easy with clojure.core/make-array:

(.format formatter "No need for args" (make-array Object 0))

But if you need some stuff in there, that's easy too with clojure.core/into-array:

(.format formatter "%d %d %d" (into-array Object [1 2 3]))
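The snippets above assume a formatter already exists. Here's a self-contained sketch that builds one, plus two other varargs methods from the JDK that I chose as examples: String/format and java.nio.file.Paths/get. The second one is there to show that the array's element type has to match the declared varargs type, which isn't always Object.

```clojure
(import '(java.util Formatter)
        '(java.nio.file Path Paths))

;; Zero varargs: still pass an (empty) Object array
(def no-args
  (str (.format (Formatter.) "No need for args" (make-array Object 0))))
;; => "No need for args"

;; Object... varargs: pack the arguments into an Object array
(def with-args
  (String/format "%d + %d = %d" (into-array Object [1 2 3])))
;; => "1 + 2 = 3"

;; String... varargs: the array must be a String array, not an Object array
(def ^Path path (Paths/get "a" (into-array String ["b" "c"])))
(def name-count (.getNameCount path))  ;; => 3
```

For Java's String.format in particular, clojure.core/format does the array packing for you, so you rarely need the interop form there.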

Generics

Java 5 also introduced generics. What a release! Generics are a static type system that lets you parameterize classes on other classes. They're very commonly used for collections, where you want to say you want a list of strings (written List<String>).

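But here's the thing: the type inference and checking is done entirely at compile time. The type parameters are erased when the code is compiled, so they play no part in how methods are called at runtime. (The class file does keep the generic signatures as metadata for the Java compiler and for reflection, but the JVM doesn't enforce them.) That means if you need to consume these classes from Clojure, you don't have to deal with the type in any different way. Just consider all the types to be Object, which is typically how they're represented in bytecode anyway.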
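A quick sketch to show what that means in practice. The var names are mine. In Java you might declare this list as List<String>, but from Clojure it's just an ArrayList that holds Objects.

```clojure
(import '(java.util ArrayList))

;; Java: List<String> names = new ArrayList<>();
(def names (ArrayList.))

(.add names "Ada")

;; javac would reject this for a List<String>; at runtime nothing checks it
(.add names 42)

(def contents (vec names))  ;; => ["Ada" 42]
```

That flexibility cuts both ways. If you hand a list like this to Java code that expects a List<String>, that code will typically fail with a ClassCastException when it pulls the 42 out, so it's still on you to put the right types in.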

Now, if you're creating a library for consumption by people writing Java, you might be used to using gen-class. But gen-class does not have any way to specify generic type signatures. If you really want to include that information, you can write an interface or class in Java directly. Chas Emerick has a good answer on Stack Overflow. I have a lesson on how to include Java code in your Clojure projects.

Conclusions

I'm glad I've documented these tricks because I'm likely to encounter them again. These are frustrating quirks. But writing these down has also helped me appreciate the stability of the JVM bytecode. Java has added significant language features, but the JVM is basically the same. Amazing.

The JVM is full of these kinds of quirks. That's why I created the JVM Fundamentals for Clojure course. You can buy it to view online or to download. Or you can buy a membership and get the JVM course and all of the other courses on the site.

After talking about some quirks, I really want to explore one of the big advantages Clojure gets by running on the JVM. That thing is the highly optimized garbage collector. Clojure generates a lot of garbage, but it's vacuumed up quickly by the JVM. That's what we'll explore next time.