Java Generational Garbage Collection

How the JVM makes Clojure's data structures possible.

Persistent data structures require really good garbage collection. Lisp has always had persistent data structures. The cons list is persistent because new lists can share structure with existing ones instead of changing them. When Clojure came out, it featured immutable persistent data structures. And not just the list. It also has vectors, maps, and sets.

Because they're immutable, once an object is instantiated, it is never changed. That means Clojure has to use a "copy-on-write" discipline. Instead of modifying the object, you make a modified copy. That means you've got lots of garbage. You're making copies all the time.
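To make the copy-on-write idea concrete, here is a minimal sketch in Java. The class and the `conj` helper are names I made up for illustration, and this naive version copies the whole list, which is not how Clojure's collections work internally. It only shows the discipline: the original is never touched, and every "modification" allocates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of copy-on-write; not Clojure's implementation.
final class CopyOnWriteSketch {
    // Returns a new unmodifiable list with `item` appended.
    // `original` is left untouched.
    static <T> List<T> conj(List<T> original, T item) {
        List<T> copy = new ArrayList<>(original); // full copy: O(n) new memory
        copy.add(item);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<String> v1 = List.of("a", "b");
        List<String> v2 = conj(v1, "c");
        System.out.println(v1); // [a, b]
        System.out.println(v2); // [a, b, c]
    }
}
```

Once nothing refers to `v1` any more, the whole old list is garbage.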

Luckily, because the data structures are persistent, they share a lot of their structure. The amount of garbage memory per modification can be quite low relative to the amount of memory that can be reused.
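Structural sharing is easiest to see with the cons list. The sketch below is a toy persistent list in Java. `Cons`, `prepend`, and `SharingSketch` are illustrative names, not anything from Clojure's source. Adding an element allocates exactly one new cell, and both new lists point at the same existing tail.

```java
// A minimal persistent cons list (illustrative names only).
final class Cons<T> {
    final T first;
    final Cons<T> rest; // null marks the end of the list

    Cons(T first, Cons<T> rest) {
        this.first = first;
        this.rest = rest;
    }

    // "Adding" allocates one new cell that points at the existing list.
    Cons<T> prepend(T item) {
        return new Cons<>(item, this);
    }
}

final class SharingSketch {
    public static void main(String[] args) {
        Cons<Integer> base = new Cons<>(2, new Cons<>(3, null));
        Cons<Integer> a = base.prepend(1); // (1 2 3)
        Cons<Integer> b = base.prepend(0); // (0 2 3)
        // Both lists reuse the same tail object instead of copying it.
        System.out.println(a.rest == b.rest); // true
    }
}
```

If `a` is later dropped, only its single head cell becomes garbage. The shared tail stays alive for as long as `b` or `base` needs it.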

But it's still a lot of garbage. Every seq allocates. Every time you modify a vector or a map, it allocates. Garbage is the name of the game in Clojure, and cleaning it up automatically is exactly what garbage collection was invented for. A lot of research has gone into garbage collection, particularly into reducing the amount of time your program has to pause while the GC runs.

One of the coolest ways to reduce that pause time is to consider the age of the objects. Objects that are old (they were instantiated a while ago) tend to stick around. So why check them frequently? It also turns out that most objects are discarded very soon after being created. If you check the young objects often and the old ones rarely, you get rid of most of your garbage very quickly. This type of memory management is called generational garbage collection.
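If you want to watch this on your own machine, the JDK exposes its collectors through the standard `java.lang.management` API. The sketch below creates a pile of short-lived garbage and then prints one line per collector. The class and method names are mine, and the batch sizes are arbitrary assumptions. What gets printed depends entirely on which collector and heap settings your JVM is using, so treat it as a way to poke around rather than a benchmark.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcReportSketch {
    // Allocates `count` small arrays in batches and drops each batch,
    // so most of them become garbage while they are still young.
    static void churn(int count) {
        List<byte[]> batch = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            batch.add(new byte[1024]);
            if (batch.size() == 1000) {
                batch = new ArrayList<>(); // the old batch is now unreachable
            }
        }
    }

    // Builds one line per collector that the running JVM exposes.
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        churn(200_000); // roughly 200 MB of short-lived allocation
        report().forEach(System.out::println);
    }
}
```

With a generational collector such as G1, I'd expect separate entries for young and old collections, with the young count climbing much faster. The names and numbers will differ from one JVM to the next.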

Generational GC is one of the reasons Clojure can be so fast. Clojure creates so much "ephemeral" garbage that it's important to be able to allocate and collect it quickly. Java's memory management has been tuned and worked on for years. I tried to find numbers to quantify how much effort has been put into it, but I couldn't. I think it's safe to say that it's in the millions of dollars. Allocating an object typically takes only a handful of instructions, and objects that die young cost almost nothing to collect.
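A big part of why allocation can be that cheap is a technique called bump-pointer allocation: the young generation hands out memory by moving a single pointer forward. Here is a toy model of the idea in Java. It is not JVM code. The class name, the integer "addresses", and the `reset` method are all simplifications I'm assuming for illustration.

```java
// Toy model of bump-pointer allocation (hypothetical names, not JVM internals).
final class BumpAllocatorSketch {
    private final int capacity;
    private int top = 0; // next free offset in the region

    BumpAllocatorSketch(int capacity) {
        this.capacity = capacity;
    }

    // Returns the offset of the new "object", or -1 if the region is full.
    // A real collector would run a young collection at that point.
    int allocate(int size) {
        if (size > capacity - top) {
            return -1;
        }
        int address = top;
        top += size; // the "bump": one comparison and one addition
        return address;
    }

    // Once survivors have been copied elsewhere, the whole region is
    // reclaimed by resetting one pointer, however many objects died in it.
    void reset() {
        top = 0;
    }

    public static void main(String[] args) {
        BumpAllocatorSketch young = new BumpAllocatorSketch(32);
        System.out.println(young.allocate(16)); // 0
        System.out.println(young.allocate(8));  // 16
        System.out.println(young.allocate(16)); // -1 (only 8 left)
        young.reset();
        System.out.println(young.allocate(16)); // 0
    }
}
```

The point of the model is that dead objects are never visited one by one. The work is proportional to what survives, and in a program that produces lots of ephemeral garbage, that is usually very little.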

Clojure's data structures exercise the JVM's fast GC. While using them is never going to be as fast as pure Java, the JVM does allow Clojure to be practical. Of course, the memory usage still needs to be configured and managed, but the JVM allows us to program at a much higher level and to take advantage of shared-memory parallelism.

If you're not experienced with the JVM, all of those details can be overwhelming. People have told me flat out that they would love to write Clojure but they're afraid of the JVM. This is a shame, because the JVM is what enables Clojure. I wanted to create something that would help people feel comfortable with the JVM without spending years gaining experience.

So I created a five-hour course called JVM Fundamentals for Clojure. It won't turn you into a JVM expert, but it will give you an in-depth tour. It teaches the things I do day to day that I've seen people get stuck on. How do you deal with out-of-memory errors? How do you navigate this huge standard library? All of that stuff is covered.

You can get the course by buying either a watch-online or a watch-online-and-download version. Both give you lifetime access. Another way to get it is to buy a membership. That way, you get the JVM course and more than 30 hours of other material on everything from web development to macros.

Of course, not everything is rosy on the JVM. Besides the complexity, there are a few things that are not so great that actually make it a less-than-ideal host. We're going to get into some of those next time.
