PurelyFunctional.tv Newsletter 365: why not more convenience routines for CSV parsing?

Issue 365 - February 17, 2020 · Archives · Subscribe

Follow-up 🙃

why not more convenience routines for CSV parsing?

A couple of people commented on my advice in the last issue. I said that I liked that the clojure.data.csv parser does the bare minimum to parse a CSV, and that we should strive for the same in our own libraries.

The comments were clear disagreements. Here's a summary:

Parsing CSVs into lists of maps (or any other common CSV task) is what you want to do most of the time. The library should do that, or at least provide routines for doing that.

Let me respond. I think it's fine for a library to provide routines to handle common cases. But it should definitely make a clear split between:

  1. parsing the CSV (the bare minimum), and
  2. transforming the parsed rows into maps.

Why? Because #1 is more timeless than #2. #1 is true and correct to the de facto standard and is very unlikely to change. #2 is more likely to change, and it's wrong in demonstrable cases. The two need to be separated because they change at different rates. I posit that this separation is objectively correct for the purposes of clojure.data.csv.
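To make the split concrete, here's what stage 1 looks like with clojure.data.csv (a minimal sketch; read-csv accepts a String or a java.io.Reader and returns a lazy sequence of vectors of strings):

(require '[clojure.data.csv :as csv])

;; Stage 1: the bare minimum. Text in, vectors of strings out.
;; No opinions about headers, keywords, or maps.
(csv/read-csv "name,age\nAlice,30")
;; => (["name" "age"] ["Alice" "30"])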

But there is still an argument about whether #2 should be in the library at all. Clojure makes #2 really easy. You can write a routine to convert a list of vectors of strings into a list of maps in a few lines:

;; Stage 2: turn parsed rows into maps, using the first row as headers.
(defn rows->maps [csv]
  (let [headers (map keyword (first csv))
        rows    (rest csv)]
    (map #(zipmap headers %) rows)))
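Combined with stage 1, it does what most people want (a sketch, reusing the csv alias required above):

(rows->maps (csv/read-csv "name,age\nAlice,30\nBob,25"))
;; => ({:name "Alice", :age "30"} {:name "Bob", :age "25"})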

Wait, but do you want to convert the headers to keywords? Maybe this time, but what about next time? Better make it a parameter. And you might want to skip adding keys where the value is the empty string. Another option? I don't think so. Just let me write what I want instead of having to learn an API with bespoke option names and semantics.
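Writing the exact variant you need is about as quick as reading the docs for an option would be. For instance, here's a sketch that keeps the headers as strings and drops empty-string values (the name rows->maps-skip-empty is mine, not from any library):

(defn rows->maps-skip-empty [csv]
  (let [headers (first csv)]   ; keep headers as strings this time
    (map (fn [row]
           (into {} (remove (fn [[_ v]] (= v ""))
                            (zipmap headers row))))
         (rest csv))))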

Clojure has a great standard library. What's great about it is that it has found a powerful set of operations that are even more timeless than CSV. It would be impossible to model 100% of the use cases for a CSV parser. But it is possible to model 95% of the use cases of maps and sequences. Clojure's library is that model, and it only leaves out 5% because persistence is not always the answer. Clojure's standard library is built with the same minimalism: each operation does only the bare minimum.
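That generic vocabulary is why you rarely miss CSV-specific conveniences. A sketch (the name and state columns are invented for illustration):

;; Everyday sequence functions handle the "uncommon" CSV use cases:
(->> (csv/read-csv "name,state\nAlice,WA\nBob,OR")
     rows->maps
     (filter #(= "WA" (:state %)))
     (map :name))
;; => ("Alice")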

Other languages don't make sequences of maps so easy to work with. Imagine trying to do this in JavaScript. To make rows->maps possible, your library would have to provide zipmap and maybe even implement lazy lists. Then you'd have to give them good names and documentation and train people how to use them. Not likely.

Or you could recommend Lodash and have everyone add yet another dependency (to import and to learn). No thanks.

Or you could just provide routines for a few common use cases, give them options, and leave a backdoor for the cases you don't cover, where people are on their own. In a language like JavaScript, that's the practical thing to do.
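That style tends to look something like this (a purely hypothetical API; neither the function nor the option names exist in clojure.data.csv or any real library):

;; Hypothetical options-laden routine; every name here is invented.
(read-csv-as-maps reader
                  {:keywordize-headers? true
                   :skip-empty-values?  false
                   :header-fn           clojure.string/lower-case})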

The problem is that such a library never converges. It's never done. There are always new use cases, new bugs, new corner cases, and new questions in the forum about how to handle the uncommon cases. Those libraries don't find the model. They're looking in the wrong place: in the use cases. They need to look deeper.

So, yeah, I would be happy if clojure.data.csv had some routines for the most common use cases. But I am even happier that the library hasn't had major changes in years. It's a matter of taste, I suppose.

Clojure's approach is more humble. It says "I don't want to guess about what you want." That's what I like about Clojure and its ecosystem. We find good models. We don't always get it perfect, but we get it better than I've seen elsewhere. And I believe that if there were a good universal model of how to parse CSVs, we would be the ones to find it.

Book update 📖

Chapter 6 is out! Buy it now: Grokking Simplicity.

Also, the liveBook is ready for prime time! This means you can preview the content of the book before buying and you can read it online after you buy. Amazing!

You can buy the book and use the coupon code TSSIMPLICITY for 50% off.

Podcast episode 🎙

In my most recent episode, we are reading from The Early History of Smalltalk by Alan Kay. Listen/watch/read here. Be sure to subscribe and tell your friends.

A new episode is coming soon. In it, I read Lambda: The Ultimate GOTO by Guy Steele, Jr.

Clojure Challenge 🤔

Last week's challenge

The challenge in Issue 364 was to calculate the tax and total of a shopping list. You can see the submissions.

You can leave comments on these submissions in the gist itself. Please leave comments! You can also hit the Subscribe button to keep abreast of the comments. We're all here to learn.

This week's challenge

model a Starbucks coffee

Starbucks famously has 80,000+ ways to make coffee, due to the combinations of options you have when ordering. For instance, there are different coffee roasts (dark, medium, light) and different sizes (tall, grande, venti). Your task is to find a succinct way to model those drinks in Clojure.

One way that is not practical is to name each of the combinations with keywords. Find a practical way using Clojure data.

Here are the three use cases:

  1. The model should make it easy to make the drink. Someone will read this data (through an appropriate UI) to know how to make it.
  2. The model should make it easy to calculate the price.
  3. The model should be human-readable when printed on a ticket.

Make it fun for yourself and use your imagination! It could be about any coffee shop if you want.
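If you want a seed for your thinking, here is one possible shape for a single order (an illustration only, not the intended answer; every key and value is made up):

;; A drink order as plain Clojure data:
{:drink  :coffee
 :roast  :dark
 :size   :venti
 :extras [{:add :cream} {:add :sugar, :amount 2}]}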

As usual, please reply to this email and let me know what you tried. I'll collect them up and share them in the next issue. If you don't want me to share your submission, let me know.

Rock on!
Eric Normand