Functional Programming Without Feeling Stupid, Part 2: Definitions
In this installment of Functional Programming Without Feeling Stupid I would like to show you how to define things in Clojure. Values and function applications are all well and good, but if we can’t give them symbolic names, we need to keep repeating them over and over.
Before we start naming things, let’s have a look at how Clojure integrates with Java. I’m assuming you are still in the REPL, or have started it again with lein repl.
Track 1: “If anyone should ask / We are mated”
As it happens, Clojure’s core library is lean and focused on manipulating the data structures of the language, so many things are deferred to the underlying Java machinery as a rule. For example, mathematical computation is typically done using the static methods in the java.lang.Math class:
user=> (java.lang.Math/sqrt 5.0)
2.23606797749979
As you can see, this is a function application like we have already seen in Part 1, but this time the function we are using is the sqrt static method in the java.lang.Math class.
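A few more static calls in the same style may help the pattern sink in. This is only a sketch: the particular methods are my own picks, not something the article relies on later. Since java.lang is imported by default, the short class name Math works as well as the fully qualified one.

```clojure
;; Class/method calls a static method; Class/FIELD reads a static field.
(Math/sqrt 16.0)            ; => 4.0
(Math/pow 2 10)             ; => 1024.0 (always a double)
(Math/max 3 7)              ; => 7
(java.lang.Math/sqrt 16.0)  ; => 4.0, the fully qualified form used above
Math/PI                     ; a static field, no parentheses needed
```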
Java 7 acquired Unicode character names, and they are accessed through the getName method in the java.lang.Character class. This is no mean feat, since there are over 110,000 characters in the Unicode standard, and each of them has a name (although some of them are algorithmically generated). To find out the canonical character name of a Unicode character, such as the euro currency symbol, you would use the getName static method:
user=> (java.lang.Character/getName \u20ac)
ClassCastException java.lang.Character cannot be cast to java.lang.Number  user/eval703 (NO_SOURCE_FILE:1)
Hey, what’s wrong? Well, if you look up the documentation of java.lang.Character.getName, you will find out that it takes an int value as an argument, not a character. You can actually do this check from inside the REPL:
user=> (javadoc java.lang.Character)
true
The REPL doesn’t seem to do much, but you should now have a new web browser window or tab open, with the JavaDoc of the java.lang.Character class loaded up. That’s what the REPL meant when it said Javadoc: (javadoc java-object-or-class-here) when it started up. The getName method does need an int value, so let’s try something else:
user=> (java.lang.Character/getName (int \u20ac))
"EURO SIGN"
All right! How about another one:
user=> (java.lang.Character/getName 67)
"LATIN CAPITAL LETTER C"
Well, some say that Clojure is the new C.
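The character-to-codepoint round trip is worth trying on its own. The sketch below uses only Clojure’s core int and char functions plus the getName call shown above; the characters are arbitrary examples.

```clojure
(int \C)                          ; => 67, the codepoint of the character
(char 67)                         ; => \C, and back again
(int \u20ac)                      ; => 8364, which is 20AC in hexadecimal
(Character/getName (int \u20ac))  ; => "EURO SIGN"
```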
Track 2: “So think about this little scene / Apply it to your life”
Speaking of definitions, let’s do some, to make it easier to test the stuff we’ll be developing. Giving a symbolic name to the string we used in Part 1 is easy using def:
user=> (def test-str "Na\u00EFve r\u00E9sum\u00E9s... for 0 \u20AC? Not bad!")
#'user/test-str
Clojure tells us that there is now a name test-str in the user namespace. We don’t have to worry about namespaces now, we just need to know that when you start up the REPL, user will be the default namespace until it is changed. And since it is the default, you don’t even have to specify it:
user=> test-str
"Naïve résumés... for 0 €? Not bad!"
Of course, user/test-str would also work, but it would be redundant.
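Here is a small sketch of what def gives you. The name sample-str is hypothetical and chosen only for this illustration; it is not used elsewhere in the series.

```clojure
(def sample-str "caf\u00E9")  ; hypothetical name, bound in the current namespace

sample-str          ; => "café", evaluating the symbol yields the value
(count sample-str)  ; => 4, the name can be used anywhere the string could
(var sample-str)    ; the var itself; the default REPL prints it as #'user/sample-str
```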
Our invocation of java.lang.Character.getName could be wrapped into something more comfortable. Let’s create a function with defn (short for “define function”):
user=> (defn character-name [x]
  #_=>   (java.lang.Character/getName (int x)))
#'user/character-name
In the REPL that looks a little weird, because it continues over more than one line. In a Clojure source file it would look like this:
(defn character-name [x]
  (java.lang.Character/getName (int x)))
This function is called character-name, and it takes one argument, called x. We’ll talk about that in a little while.
Typically Clojure source code is indented only two spaces, and all the trailing parentheses are placed on the same line, not on separate lines. You can find more information about source code formatting and coding conventions in The Clojure Style Guide. One thing to note right away is the naming style: Clojure programmers use lisp-style-names, not camelCasing like Java, and there are no explicit get prefixes in the names of functions that get something. So, while in Java you would call some method getCharacterName, in Clojure it would be called character-name, as we just did.
Let’s exercise this new function:
user=> (character-name \u20ac)
"EURO SIGN"
Hey, how come this worked with a character literal, but when we called java.lang.Character.getName like that, it barfed? Well, that’s because we essentially created a mini-API of our own with the character-name function. We can pass a character to it, and it will apply Clojure’s int function to the character to get its codepoint before invoking Java’s java.lang.Character.getName on it.
You don’t need to explicitly return a result from the function. The last value computed is the return value. You just need to arrange things so that the function returns what it should.
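To see the “last value wins” rule in action, here is a sketch with a hypothetical function, codepoint-and-name, that has two expressions in its body. Only the value of the final one comes back to the caller.

```clojure
;; codepoint-and-name is a made-up helper for this illustration.
(defn codepoint-and-name [ch]
  (println "Looking up" ch)                 ; run for its side effect; its value (nil) is discarded
  [(int ch) (Character/getName (int ch))])  ; last expression: this vector is the return value

(codepoint-and-name \C)
;; prints: Looking up C
;; => [67 "LATIN CAPITAL LETTER C"]
```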
Does character-name return what we think it does?
user=> (type (character-name \u20ac))
java.lang.String
Yes, it returns a java.lang.String instance.
What about the function’s argument, which we imaginatively called x? In Clojure, functions are often so short that it doesn’t seem worth it to assign any fancy name to arguments, but it doesn’t hurt. We could have called it ch, character, etc.
Why is the argument in square brackets? Because the argument list is written as a Clojure vector, a data structure which allows efficient random access, unlike a list. It is the closest analogy to Java’s ArrayList or Python’s list.
The character-name function only takes one argument, but maybe we’ll deal with multiple arguments later.
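Vectors are useful well beyond argument lists. The following sketch pokes at one with a few core functions; the name offsets and its contents are made up for the example.

```clojure
(def offsets [0 1 2 4 5])  ; hypothetical data: a vector literal

(nth offsets 3)     ; => 4, random access by index
(get offsets 3)     ; => 4, same lookup; get returns nil for a missing index
(offsets 3)         ; => 4, a vector can also be called as a function of its index
(count offsets)     ; => 5
(conj offsets 6)    ; => [0 1 2 4 5 6], a new vector; offsets itself is unchanged
```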
Track 3: “Learn to love me / Assemble the ways”
So far, so good. We can now get the name of any Unicode character. With the help of some more core functions we can construct most of the output we want. Remember that we wanted to get from this:
Naïve
To this:
00000000: U+00004E LATIN CAPITAL LETTER N
00000001: U+000061 LATIN SMALL LETTER A
00000002: U+0000EF LATIN SMALL LETTER I WITH DIAERESIS
00000004: U+000076 LATIN SMALL LETTER V
00000005: U+000065 LATIN SMALL LETTER E
I made the input string a little shorter this time, to better make the point. The most important thing here is that we want one line of output per character, and each line has a certain format.
In addition to the character name, we actually also have another output component already available: the codepoint of the character. Let’s construct a facsimile of the output line with the format function:
user=> (format "U+%06X %s" (int \N) (character-name \N))
"U+00004E LATIN CAPITAL LETTER N"
The U+xxxxxx notation is the customary way of presenting Unicode codepoints, here padded to six hexadecimal digits. If the placeholders in the format string look kind of familiar, that’s because they are the same used by java.lang.String.format (and actually Clojure’s format function is just a wrapper around it; try (source format) in the REPL to see for yourself).
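If the placeholders are new to you, it may help to try them one at a time. This sketch uses only the three specifiers that appear in this article; the sample values are arbitrary.

```clojure
(format "%08d" 42)               ; => "00000042", decimal, zero-padded to 8 digits
(format "U+%06X" (int \u20ac))   ; => "U+0020AC", uppercase hex, zero-padded to 6 digits
(format "%s has %d letters" "Na\u00EFve" (count "Na\u00EFve"))
;; => "Naïve has 5 letters", %s inserts a string as-is
```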
We could even throw in a dummy offset to make the result more realistic:
user=> (format "%08d: U+%06X %s" 0 (int \N) (character-name \N))
"00000000: U+00004E LATIN CAPITAL LETTER N"
We apply the format function to four arguments:
Argument | Meaning |
---|---|
"%08d: U+%06X %s" | the format string |
0 | the dummy offset |
(int \N) | the codepoint of the character |
(character-name \N) | the name of the character |
The last two of the arguments are the results of applying two different functions to the same value, the character.
Let’s pull in another Clojure data structure, the map. Since we have two related values, we can construct a map and use keywords to get at the values, like this:
user=> { :offset 0 :character \u20ac }
{:offset 0, :character \€}
The curly brackets are used to make a map literal in Clojure.
Let’s give this ephemeral map literal a name, so we can practice using a Clojure map:
user=> (def test-ch { :offset 0 :character \u20ac })
#'user/test-ch
user=> test-ch
{:offset 0, :character \€}
The names with a colon in front of them are keywords. Keywords can also be called like functions: applied to a map, a keyword looks itself up in that map. So, if you want to pull the character out of test-ch, you say:
user=> (:character test-ch)
\€
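There are a few equivalent ways to get values out of a map, and it is useful to know what happens when a key is missing. A sketch, using a hypothetical map called sample-ch:

```clojure
(def sample-ch {:offset 4 :character \v})  ; hypothetical map for this illustration

(:offset sample-ch)          ; => 4, keyword called as a function
(get sample-ch :character)   ; => \v, the explicit lookup function
(sample-ch :offset)          ; => 4, a map can be called as a function too
(:missing sample-ch)         ; => nil, no error for an absent key
(:missing sample-ch "n/a")   ; => "n/a", an optional default value
(assoc sample-ch :offset 5)  ; => {:offset 5, :character \v}, a new map; sample-ch is unchanged
```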
Armed with this knowledge, let’s construct another function that takes a map with an offset and a character, and returns a string with the formatted output we want:
user=> (defn character-line [pair]
  #_=>   (format "%08d: U+%06X %s"
  #_=>     (:offset pair) (int (:character pair))
  #_=>     (character-name (:character pair))))
#'user/character-line
Once again, doing this in the REPL is a little messy. In a source file the function would look like this:
(defn character-line [pair]
  (format "%08d: U+%06X %s"
    (:offset pair) (int (:character pair))
    (character-name (:character pair))))
How do you call this thing? You make a map literal and pass it in:
user=> (character-line {:offset 0 :character \N})
"00000000: U+00004E LATIN CAPITAL LETTER N"
The character-line function expects one argument, which we call pair inside the function. It is supposed to be a map value. Before calling the function you pack the offset (a dummy one at this point) and the character into the map. The function then picks the values it wants from the map, and applies some functions to them. There is some redundancy, but we’ll fix that later.
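Putting the two definitions together in one place, here is a self-contained sketch you could paste into a fresh REPL. The maps are hand-built, with offsets taken from the target output shown earlier; producing them automatically from a string is not attempted here.

```clojure
(defn character-name [x]
  (Character/getName (int x)))

(defn character-line [pair]
  (format "%08d: U+%06X %s"
          (:offset pair)
          (int (:character pair))
          (character-name (:character pair))))

;; Two hand-built maps, one per character.
(character-line {:offset 2 :character \u00EF})
;; => "00000002: U+0000EF LATIN SMALL LETTER I WITH DIAERESIS"
(character-line {:offset 4 :character \v})
;; => "00000004: U+000076 LATIN SMALL LETTER V"
```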
Why are we making all these functions? Couldn’t we just do the stuff in place? Yes, and Clojure also has anonymous functions, but it’s a fine line between one-off functions and reusable functions. There is an epigram by the late, great Alan J. Perlis which touches on this:
It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.
For more, see the collection of Perlis' epigrams in programming.
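For the curious, this is roughly what the anonymous functions mentioned above look like. Treat it as a sketch rather than a recommendation; the name codepoint-label is hypothetical.

```clojure
;; (fn [args] body) creates a function without naming it; here it is applied on the spot.
((fn [ch] (Character/getName (int ch))) \N)  ; => "LATIN CAPITAL LETTER N"

;; #(...) is a shorthand for the same thing, with % standing for the argument.
(#(format "U+%06X" (int %)) \N)              ; => "U+00004E"

;; Giving an anonymous function a name with def has much the same effect as defn.
(def codepoint-label (fn [ch] (format "U+%06X" (int ch))))
(codepoint-label \N)                         ; => "U+00004E"
```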
Phew! It’s time for a breather. Next time we’ll try to figure out how to apply the character-line function to all the characters in a string. That will introduce us to Clojure sequences. We’ll also look at local bindings, which are sort of like variables, only not. Remember, this stuff is different.
Don’t forget the Clojure from the ground up series. It presents Clojure in much more detail.
Or maybe you want to hit the books:
- Clojure Programming by Chas Emerick, Brian Carper and Christophe Grand
- Clojure Cookbook by Luke VanderHart and Ryan Neufeld
Tell me what you think of this in the comments, or even just call out the tracks quoted in the subheadings.
UPDATE 2014-12-14: After nearly a month, I’m closing the comments on all parts of this series, because nothing but SPAM appeared.