PurelyFunctional.tv Newsletter 315: Use the correct data structure for the job

Issue 315 - February 25, 2019 · Archives · Subscribe

Clojure Tip 💡

Use the correct data structure for the job. Use multiple data structures if there are multiple jobs.

While I was iterating on the Sieve of Eratosthenes (last week's puzzle), I kept realizing that I was using the wrong data structure. I wanted to remove numbers from a collection. I was using a list with filter. It was very natural in Clojure.

But at some point I realized that removing something from a list was never going to be as fast as removing something from a set. I switched to a set and it was easily ten times faster.

The point is, each data structure has its fast and slow operations. It's good to step back and look at how you're using a collection and select the right one. And even us old hands sometimes forget---even after I wrote a whole guide to this way of thinking about collections.

But I still needed them in order! I wound up using both a set and a range, which is just a seq. Each had its job. In each iteration, I was calling disj on the set and drop-while on the seq, both of which are efficient for those collections. It was fast (though of course, not the fastest). The point is that the two jobs (1. remember what to work on later and 2. maintaining their order) weren't satisfied by a single collection, so I used a different data structure for each job.

I call this technique hybrid collection. I've written about it here. I also recorded a paid video lesson about it.

Brain skills 😎

When you're learning something new, try to look at it from multiple perspectives. Draw a diagram of the problem. Make a table comparing and contrasting aspects. Anthropomorphize the parts of an algorithm. Write it in natural language. Imagine how you'd do it without a computer. Explain it to a colleague. Explain it to a five-year-old.

Each of these perspective wil l give you insights into the problem. And one might "click" and give you the last piece of the puzzle to make it real.

Clojure Puzzle 🤔

Answer to last week's puzzle

Last week's puzzle was to implement the Sieve of Eratosthenes. My goal with including this puzzle was to encourage the skill of implementing algorithms described in an imperative style, as you often see on Wikipedia, using Clojure. It's not always obvious, but it is always possible and often more elegant.

I've set up a little test bench to make sure the submissions were correct, they worked for n=100,000, and to test how fast they were.

Check out the test bench and all submissions.

Interesting Stuff

I went through about 10 iterations, trying different combinations of data structures and transients. My fastest wound up getting about 127ms for the mean execution time. I learned a ton each time.

Peter Stromberg, creator of Calva, the Clojure plugin for VS Code, went through his iterations in public. His final version was faster than mine by an order of magnitude!

I thought mine was fast, but wow, Val's version beat mine by 2 orders of magnitude! How'd he do it? Using a mutable, Java boolean array! Nice creative solution to get more speed. He also mentioned that instead of going through Java interop in Clojure, it's often easier to use Zach Tellman's Virgil, which automatically compiles Java code into class files and loads them. It lets you write your algorithm in Java for speed, without sacrificing interactive development. Thanks, Val!

Steve Miner's version uses transients to get a lot more speed.

There were two that took a few seconds to generate primes under 10,000, but they get points for straightforward, idiomatic Clojure implementations: Josh Reighley's and Mathieu Viel's versions.

Finally, James Elliott submitted an article explaining how to do it in Clojure.

Congrats to all of the submissions. ⭐️! Check out the test bench and all submissions.

This week's challenge

I was reading the book How to Measure Anything. It talks about the Rule of Fives, which states that you can learn a lot by getting a random sample of 5 from any sized population. The median value for a measurement is 93.7% likely to fall within the min and max for the 5-point sample.

Stated another way, if you have ten million people in your country, you can figure out something about the median height of the population by randomly choosing five people. Choose those five people randomly, measure their heights. Take the min and max of those heights. The median for the entire population is 93.7% likely to fall between the min and the max.

This is is kind of surprising, but it makes sense. This article does a good job of explaining it. What I want to do is test it.

I've always thought a programming system should let you test these kinds of assertions quickly and easily. You learn something, you run an experiment, and conf irm it. To do that, you need something without much boilerplate---a language that lets you get down to business quickly. Clojure is pretty good at that and it's one of the reasons I like it.

So here's your puzzle: develop an experiment to empirically test the Rule of Fives.

Rock on!
Eric Normand

PS ✍️ The best way to learn Clojure

Clojure is my favorite language (and has been for over ten years), and I assume it's near the top of your list, too. When I first started, I had plenty of time at work to learn the language. I did simple problems and personal projects. I was extremely fortunate.

But now, I've got two kids. There are plenty of things I'd love to learn, but wow, I just don't have the time. I really sympathize with how much time it takes to learn something new---and how little time family life leaves for learning. That's why I want to make it as easy as possible to learn Clojure.

I think more people would like Clojure if they could see what I see, after years of learning. Clojure is a beautiful language. And it's extremely practical---once you learn how to use it. But it's not practical for you or your company if you don't know it. And that's why I created PurelyFunctional.tv. I want Clojure, and functional programming, to be a practical choice---as practical as imperative and OO languages---for millions of programmers.

I've been hard at work since 2013 making Clojure tutorials to make Clojure a practical choice for the industry as a whole. Since then, there are 80+ hours of paid Clojure video tutorials. You can purchase a monthly or yearly membership to those videos, or you can buy individual courses outright. Or, if it's not for you, you can read books-worth of written guides for free.

But if you do choose to pay, it helps me a lot. It helps me continue to produce the high-quality materials I am hard at work on. It helps me keep the old material up-to-date. It helps me write this weekly newsletter. Please consid e r paying because it helps more than you can know.

Another thing to consider is to convince your boss to pay for a PurelyFunctional.tv membership. Team memberships are available. The teams that are subscribed tell me how valuable it is that their whole team is able to learn from and refer to the same material. They share links to lessons on Slack and discuss the ideas as a group.

And even if you don't get a team membership, often there are budgets for training that go unused at the end of the year. If you're getting paid $8,000/month, what's $49/month to make you 10% more effective? Try that argument with your manager and tell me how it goes.

However you choose to support PurelyFunctional.tv, thanks. Even an email telling me how you like it keeps me going.

Rock on!
Eric