Functional Programming Without Feeling Stupid, Part 3: Higher-order functions
Welcome to the third installment of Functional programming without feeling stupid! I originally started to describe my own learnings about FP in general, and Clojure in particular, and soon found myself writing a kind of Clojure tutorial or introduction. It may not be as comprehensive as others out there, and I still don’t think of it as a tutorial — it’s more like a description of a process, and the documented evolution of a tool.
I wanted to use Clojure “in anger”, and found out that I was learning new and interesting stuff quickly. I wanted to share what I’ve learned in the hope that others may find it useful.
Some of the stuff I have done and described here might not be the most optimal, but I see nothing obviously wrong with my approach. Maybe you do; if that is the case, tell me about it in the comments, or contact me otherwise. But please be nice and constructive, because…
…in Part 0 I wrote about how some people may feel put off by the air of “smarter than thou” that sometimes floats around functional programming. I’m hoping to present the subject in a friendly way, because much of the techniques are not obvious to someone (like me) conditioned with a couple of decades of imperative, object-oriented programming. Not nearly as funny as Learn You a Haskell For Great Good, and not as zany as Clojure for the Brave and True — just friendly, and hopefully lucid.
xkcd 1270: Functional.
Licensed under Creative Commons Attribution-Non-Commercial License.
This is a company blog, so it is kind of commercial by definition. Is that a problem?
In Part 1 we played around with the Clojure REPL, and in Part 2 we started making definitions and actually got some useful results. In this third part we’re going to take a look at Clojure functions and how to use them, and create our own — because that’s what functional programming is all about.
We already have a function that turns a map value containing an offset and a character into a string representing a line of output:
(defn character-line [pair] (format "%08d: U+%06X %s" (:offset pair) (int (:character pair)) (character-name (:character pair))))
How do we apply that function to each character in the input string? In the imperative style you would whip up a for loop at this point (or maybe a list comprehension in Python).
Clojure does have a for
macro (very much like the Python list comprehension), but what we want to achieve
is easily done using a higher-order function. This is common to many things that traditionally are done using a for loop.
In functional programming, functions are values just like anything else. You can create them, store them, apply them and
pass them around. This opens up a whole new world of flexibility in program construction. Particularly, you can reduce
many problems that require “doing something” to every member of a collection into a mapping, where you apply
a function that does the “something”. Clojure has a map
function which does exactly that:
(def short-test-str "Naïve") #'user/short-test-str user=> (map int short-test-str) (78 97 239 118 101)
We just applied Clojure’s int
function to all the characters in the short test string. The result is a
sequence of numbers representing the code points of the characters in the string. (For a comparison between for and map,
see FOR vs MAP by John Lawrence Aspden.)
Can we do this with one of our functions? Yes we can:
user=> (map character-name short-test-str) ("LATIN CAPITAL LETTER N" "LATIN SMALL LETTER A" "LATIN SMALL LETTER I WITH DIAERESIS" "LATIN SMALL LETTER V" "LATIN SMALL LETTER E")
Since our character-name
function returns a string, we get back a sequence of strings representing
the names of the characters in short-test-str
.
Clojure’s map
function always returns a sequence. (If you look up the documentation
using (doc map)
, you will see that it actually returns a lazy sequence, but we’ll happily
gloss over the laziness for now, mainly because I’m also lazy.) A sequence, or seq for short,
is actually an abstraction which provides us with a sequential view over some collection. It is quite widely
used in Clojure programming: Clojure strings are actually sequences, and you can turn a vector into a sequence
with the seq
function.
Can we use the map
function with our character-line
function? Well, that function takes
a map value, so it’s not going to be that straight-forward. Also, there is a high potential for confusion
here: on one hand, we’re working with the map function, but on the other, we’re working with map
literals. You need to get these concepts straight in your head!
So, this probably won’t work, but let’s try it anyway:
user=> (map character-line short-test-str) NullPointerException clojure.lang.RT.intCast (RT.java:1087)
I don’t even know what that means. Well, OK, I sort of know — since the character-line
function expects a map, and we’re giving it a string (which is a sequence, really), it’s trying to
pull out the offset using the keyword function :offset
, and there is no way that will succeed.
But if I could create a map value for each of the characters in the string, put them in a collection, and then map
the character-line
function to each one of them, I would be much closer to my solution. At least I think so.
By using the Clojure map
function together with an anonymous function, I can quickly generate the
offset/character pairs I want. But what will the anonymous function do, and how can I define it?
Clojure’s into
function creates a new collection by conjoining items from two collections. So if I have
two maps — an empty map, and a map with the offset and character values — I can conjoin them like this:
user=> (into {} {:offset 0 :character \N}) {:offset 0, :character \N}
That doesn’t seem like much, but I have neglected to mention something quite important: the map function can take more than one collection. So, if I have two collections, one for the offsets and one for the characters, I can traverse them at the same time, and apply a function which takes the corresponding value from both collections.
Since we still don’t have the real offsets of the characters, let’s construct a collection of dummy offsets:
user=> (repeat (count short-test-str) 0) (0 0 0 0 0)
This piece of code uses two simple Clojure functions, repeat
and count
.
Whereas count
simply counts the number of elements in a collection, repeat
creates a new
sequence (a lazy one — there’s that word again) with the specified number of given elements.
So we get back as many zeros as there are characters in short-test-str
. We can use this as the collection
of dummy offsets since it has one item for each character.
The two collections we want to traverse at the same time are the dummy offsets and the characters of the string.
Our usage of the map
function would be something like:
map something (repeat (count short-test-str) 0) short-test-str)
What is that elusive “something“? It needs to be a function which takes the current items from both collections and conjoins them into a map:
#(into {} {:offset %1 :character %2})
The #()
notation stands for an anonymous function, and the %1
and %2
are placeholders
for the items from the two collections. When we put this together, we have:
(map #(into {} {:offset %1 :character %2}) (repeat (count short-test-str) 0) short-test-str)
And if we actually execute this in the REPL, here’s what we’ll see:
user=> (map #(into {} {:offset %1 :character %2}) (repeat (count short-test-str) 0) short-test-str) ({:offset 0, :character \N} {:offset 0, :character \a} {:offset 0, :character \ï} {:offset 0, :character \v} {:offset 0, :character \e})
The offsets are still zeros, but it seems to be working! What if… yes, let’s do it:
user=> (map character-line (map #(into {} {:offset %1 :character %2}) (repeat (count short-test-str) 0) short-test-str)) ("00000000: U+00004E LATIN CAPITAL LETTER N" "00000000: U+000061 LATIN SMALL LETTER A" "00000000: U+0000EF LATIN SMALL LETTER I WITH DIAERESIS" "00000000: U+000076 LATIN SMALL LETTER V" "00000000: U+000065 LATIN SMALL LETTER E")
Well, well — that’s what we have been looking for: mapping the character-line
function to all
the characters in the string! But we’re not done yet, oh no!
I bet your head is spinning from the nested functions and the parentheses. I know mine is, but turning this into a function will alleviate that condition. It is also a great opportunity to introduce Clojure’s local bindings.
Our first stab at the function could be something like this:
(defn character-lines [s] (map character-line (map #(into {} {:offset %1 :character %2}) (repeat (count s) 0) s)))
How is that better? Well, I’ve made this piece of code reusable for any string, not just
the short-test-str
defined earlier. Now any string that will be passed in will be
known as s
inside the function. But we can do better. The dummy offsets are easy
to pull out of the mess, and it will be easy to replace it with the actual offsets, if we ever
get around to actually computing them… For this we can use a local binding,
achieved with let
:
(defn character-lines [s] (let [offsets (repeat (count s) 0)] (map character-line (map #(into {} {:offset %1 :character %2}) offsets s))))
Local bindings in Clojure are vectors, containing pairs of items. The first item in the pair is
the name that will be used inside the function, and the second item is whatever the name
stands for. So in the local binding above, we say that we want the name offsets
to
refer to a collection generated using the repeat
function — we have bound
the name to the value locally, and that’s why it’s called a local binding.
If you throw this function at the REPL, you should be able to use it with whatever string you want:
user=> (defn character-lines [s] #_=> (let [offsets (repeat (count s) 0)] #_=> (map character-line (map #(into {} {:offset %1 :character %2}) #_=> offsets s)))) #'user/character-lines user=> (character-lines "Yay!") ("00000000: U+000059 LATIN CAPITAL LETTER Y" "00000000: U+000061 LATIN SMALL LETTER A" "00000000: U+000079 LATIN SMALL LETTER Y" "00000000: U+000021 EXCLAMATION MARK")
We can make this even cleaner by introducing another local binding for the offset/character pairs. Note that you can just redefine the function in the REPL by pasting it in. The new definition will replace the old one.
(defn character-lines [s] (let [offsets (repeat (count s) 0) pairs (map #(into {} {:offset %1 :character %2}) offsets s)] (map character-line pairs)))
Now there are two local bindings, offsets
and pairs
, and the actual body
of the function is reduced to (map character-line pairs)
.
By the way, we can make our original character-line
function a little better by using
a local binding. We have (:character pair)
twice in there, so let’s make it once only:
(defn character-line [pair] (let [ch (:character pair)] (format "%08d: U+%06X %s" (:offset pair) (int ch) (character-name ch))))
Or is that better? I’ll let you decide.
After that, all that is really missing are the offsets, but this is more than enough for one post.
UPDATE 2014-11-21: If you want to read more about Clojure’s higher-order functions or “HOFs”, try Christopher Maier’s Writing Elegant Clojure Code Using Higher-Order Functions.
Also, earlier I seem to have occasionally used a misleading term out of negligence: higher-order functions or HOFs are not “high-order” functions (those would be functions with a high order, I guess). They are indeed higher-order functions, or “functions of a higher order”.