Clojure Gazette 1.10

Typed Clojure

Clojure Gazette

Issue 1.10 - May 17, 2012

editorial

Typed Clojure

I am happy to bring you an informative and mind-expanding interview with AmbroseBonnaire-Sergeant. Ambrose was accepted to work on a Google Summer of Code project to bring a static type analyzer to Clojure. You can read the proposal or follow the project's progress on GitHub. The literature review will give you some good background information.

I asked Ambrose about Typed Clojure and how static analysis will play with Clojure's dynamic tendencies. I had fun and learned a lot. Enjoy!

Eric Normand

Ambrose Bonnaire-Sergeant

Clojure Gazette:

What do you find most engaging about Clojure? You gave a presentation about core.logic and now you are working on Typed Clojure for Google Summer of Code. What drew you to those projects?

Ambrose:

Clojure has a lot going for it. Briefly, three elements that have kept me interested are the community, the constant innovation and the constant push for practicality. Clojure has attracted an awesome bunch of people together and it's a pleasure to be part of it!

Last year, members of the Clojure community donated over US$3000 to send me to Clojure Conj 2011, where I talked on core.logic, Clojure's logic engine. Logic programming always put stars in my eyes, but David Nolen's core.logic was my first opportunity to understand the mechanics behind it.

I spent a good 6 months just tinkering with logic programming:

Evolving a Logic Programming Language:\

  • Slowly evolving a Clojure-like language to a backtracking logic implementation, but with an imperative feel for backtracking instead of monadic (this was really a stream a of consciousness)

Relational Arithmetic:\

  • Having fun with reversible arithmetic

Relational Type Checker:\

  • A beginners introduction to core.logic, by building a reversible type checker (!)

So that all cumulated to my Conj talk, in which I attempted to give an intuitive introduction to logic programming with core.logic.

My interest in type checking Clojure code started with this exploration via logic programming. I talked with Rich Hickey at the Conj and he expressed interest in static analysis tools utilising Datalog, such as BDDBDDB. For example, he was interested in proving null pointer exceptions impossible statically.

I wanted a full static type system that supported Clojure idioms, like Typed Racket for Racket, but as an optional feature, like Qi/Shen's type system. Typed Clojure also has the (ambitious) goal of statically proving NPE impossible.

Clojure Gazette:

An annotated answer. I like it!

From a developer's perspective, how will gradual typing work? How much type information will the programmer need to provide? And how will you allow for interactive development (compiling and recompiling one function at a time) with static analysis that might need a broader perspective than a single function?

Ambrose:

Typed Clojure is very rough at the moment, but this is how I envisage it.

Every top-level binding in a typed namespace should be fully annotated. This also serves as valuable documentation that is currently missing from Clojure.

For example, sum takes a Seqable of numbers, and returns a number.

To annotate, we simply add the corresponding type signature with +T.

(+T sum [(Seqable Number) -> Number])
(defn sum [xs]
(apply + xs))

The programmer will almost never be required to annotate local bindings, as they will be inferred using local type inference.

;; no annotations necessary
...
(let [a (+ 1 2)
b (+ a 1.2)
c (/ a b)]
(= a c))
....

Local functions are the exception; local type inference cannot infer the type of myfn:

....
(let [myfn (fn [a b] (+ a b))]
(myfn 1 2))
...

Here, the parameter types would be required (fortunately the return type can be inferred).

Scala, which uses local type inference (LTI), also has this limitation.

In my opinion, LTI is a win overall:

  • fully documented top-levels
  • "obvious" type annotations are inferred
  • edge cases are well defined and simple to understand

Inferring type arguments to polymorphic HOFs is an example of a well defined edge case of LTI.

You can think of a polymorphic type as a type that can be instantiated in more than one way.For example, the identity function is polymorphic in its first argument and return type, typed as:

(+T identity (All [a] [a -> a]))

We can instantiate a polymorphic function by providing explicit type arguments.

identity[Long] :: [Long -> Long]
identity[Integer] :: [Integer -> Integer]

The type checker can infer the type arguments most function calls, so explicit type arguments will be relatively rare.

(identity 1) :: Long ; inferred type arguments
vs.
(identity[Long] 1) :: Long ;explicit type arguments

(Note the syntax for type arguments TBD. The final syntax would benefit from better communication to the Compiler. See: http://dev.clojure.org/display/design/Annotating+Analysis+Results )

If we pretend "map" is a two argument function, we can type it like so:

(+T map (All [a b]
[[a -> b] (Seqable a) -> (Seq b)]))

We can see that "map" is a polymorphic HOF.

So where is the edge case? If we call a polymorphic HOF and omit expected type arguments or type annotations, then

type synthesis fails.

This means the following form

(map #(inc %) [1 2])

either must provide an explicit type argument like so:

(map[Long] #(inc %) [1 2])

or annotate the function parameter:

(map (fn [a :: Long] (inc a)) [1 2])

(again, syntax TBC)

Scala has provided a solution to this problem via Colored Local Type Inference,but I haven't attempted to apply those ideas to Clojure's "map", which has quite a complex type signature.

I have not fleshed out REPL driven development in a typed environment. We would need a REPL that understands type checking, and some way of ensuring a namespace is type-checked before starting a REPL in it.

The important thing to notice with Typed Clojure is that typed code is expanded to normal Clojure code. Everything you can do normally will be available, but with the option of running the type checker over your code.

If you use an external var from an untyped namespace, you must provide a type annotation for that var, which fits with the requirement of annotating top-levels.

Another important note is that compilation does not mean type checking. Recompiling your namespace does not trigger any type checking, which must be performed manually. The interaction between compilation and type checking needs to be further explored before they are combined.

Clojure Gazette:

You mentioned in your proposal exploring typed hash maps. Clojure programs tend to use hash maps a lot as generic data structures, so making them a little safer would help a lot. What are the ways you could analyse hash maps? Specifically, is there any way to statically ensure that a given key exists in a hash map?

On a related note, will the type checker have an understanding of the Java type hierarchy or will it remain limited to Clojure? For instance, in your example with (map #(inc %) [1 2]) above, could the type inferencer not notice that inc is annotated as Number -> Number, and therefore [1 2] is Seqable Number?

Ambrose:

Answering your second question, in (map (fn [a] (inc a)) [1 2]), the tricky issue is inferring the type of "a" without providing the type arguments to map. (map inc [1 2]) would infer just fine, because inc is fully annotated. Problems like this highlights a tradeoff between "local" and "global" type inference. LTI cannot infer as many types, but is vastly simpler to understand, while global type inference doesn 't require annotations, can infer many more types, but requires a complicated system with backtracking.

To your first question: the problem with hash maps in Clojure, from a typing perspective, is that they're often used like ad-hoc objects. A hash map mapping Numbers to Strings could be represented by (Map Number String), but a hash map mapping the keys :a, :b and :c to Numbers, Strings and Characters is not very useful typed as (Map Keyword (U Number String Character)).

We need a hash map type that can store keys as values, and knows whether keys are optional.

The ClojureScript compiler's analysis output is a good example.

Analysis is a hash map that always has an :op key with a keyword, an :env key with a map, and other optional keys. The type could be sketched as:

(Map* [:op Keyword]
[:env (Map Any Any)]
:optional [[:exprs ...]])

Compiler analysis is interesting because it is a recursive type: it refers to itself inside the type. We could (hopefully) get even finer grained: https://gist.github.com/2481092

Tracking which keys are present in particular instances of a map will utilise an inference technique called occurrence typing, developed for Typed Racket.

If we know that the key :mykey is optional in mymap, then we can try and look it up at runtime.

(when (:mykey mymap)
...)

Using occurrence typing, we can infer that (:mykey mymap) is non-nil in the body of the "when", therefore it has an associated value (ie. is not optional) in the body.

Accumulating the keys of ad-hoc maps could be quite challenging, but I don't foresee any problems if keys are always compile-time literals.

(-> {}
(assoc :b (some-fn blah))
(assoc :c (really-complicated))
(assoc :d (when (pred)
(anything))))

We could easily track the keys here, and their potential values.

If a key is not a compile-time literal, we just fall back on the less informative Map type, which gives a single type for all keys and another for all values.

In practice, this shouldn't be an issue. If you are using a map as an ad-hoc object, you will know ahead of time which keys you will use, so there is no reason not to specify them as literals.

You mentioned Java types. Typed Clojure will be compatible with Java's type system, to the extent Clojure is currently. Java Generics and concrete inheritance will not be supported. You can mix Clojure types with Java types.

For example, the type of a Vector would be

(def-type-alias (Vector a)
(I clojure.lang.APersistentVector
(Seqable a)
[AnyInteger -> a]
(Coll a)
(Associative AnyInteger a)))

Clojure Gazette:

It sounds like an interesting challenge. One theme that keeps emerging is the limitations of the type checker. What are some of the things that cannot be checked? Will they be reported as errors, as Haskell does? Will there be a large set of uncheckable programs? Will there be any value in "partially checked" programs?

Ambrose:

Initially, I want most type errors to be reported as errors. There are a few glaring issues, however.

Clojure is dynamic, so it can define classes, extend protocols and add methods to multimethods on the fly. This is a grey area for type checking. It might be premature to throw an error if there isn't a matching dispatch for a protocol method; it could be added later. But is this common in code that would benefit from type checking? Some exploration is needed.

The treatment of nil is also interesting. In Typed Clojure, nil is not a subtype of Object. This differs from Java, where null is a subtype of all reference types. This is fine, as long as there is a systematic way of proving things cannot be nil.

In Clojure, we usually eliminate nil cases by using a conditional:

(let [a (.someInterop MyInterop)]
(when a
(a)))

We can capture this intuition using occurrence typing. To my knowledge, occurrence typing has not been applied to Java interop, so it will be interesting to see its flexibility.

For example, it would be essential to understand that (filter identity objs) returns a seq of non-nil values.

Scala solve s this with Option types. In Typed Clojure, an Option type looks something like this:

(def-type-alias (Option a)
(Case a nil))

where Case is a disjoint union. The main difference is that Scala uses pattern matching to extract a non-null value, and here we can (potentially) use existing Clojure idioms.

Personally, I'm very excited about the potential for null-safe interop with Typed Clojure!

Regarding partially checked programs: I don't expect everything to be statically checkable. I will probably add some ability to define untyped regions that rely on (user provided) runtime contracts. I feel this is a satisfactory compromise, and is certainly just as good as anything Clojure provides today.

An example:

(+T unchecked-fn [Number -> Number])

(untyped-region
(defn unchecked-fn [n]
{:pre [(number? n)]
:post [number?]}
(inc n))
)

Typed Clojure will eventually be integrated with a contracts library to help generate these contracts, like Typed Racket, but this is not in the scope of early versions.

Clojure Gazette:

So let's say I have some code like this:

(.otherMethod (.method javaobject))

It is clear that there is at least the possibility of a NullPointerException (.method could return null).

How will the type checker handle this? Will it suggest you wrap it in a when, like this:

(let [a (.method javaobject)]
(when a
(.otherMethod a)))

Ambrose:

You will be able to override the types of Java methods, probably in the form of non-nullable annotations to parameters/return values.

In your example, Typed Clojure will assume the return value of .method is nullable, so it will throw a type error, because nil is not a reference type.

Clojure Gazette:

This has been a great interview. I have learned a lot.

Two questions to wrap up:

Are there any other things you would like to talk about that I did not ask about?

If someone would like to follow the progress of Typed Clojure, or to help out, where should they go?

Thank you so much for doing this interview. It was very informative!

Am brose:

6 months ago I barely knew what a type system was, I'd like to thank a few people.

I'd like to thank my supervisor Rowan Davies for his support and expertise.

Daniel Spiewak and Sam Tobin-Hochstadt have been very helpful and patient, thank you!

David Nolen has been a constant source of inspiration for over a year, it's always fun to show him cool stuff.

And thanks to everyone who has supported me along the way!

If you enjoyed my ramblings, you can get more by following @ambrosebs
That is the best way to follow development of Typed Clojure too.

Email me at abonnairesergeant@gmail.com if you want to discuss Typed Clojure.

Thanks Eric, I enjoyed myself :-)