Functional Programming Without Feeling Stupid, Part 2: Definitions

In this installment of Functional Programming Without Feeling Stupid I would like to show you how to define things in Clojure. Values and function applications are all well and good, but if we can’t give them symbolic names, we need to keep repeating them over and over.

Before we start naming things, let’s have a look at how Clojure integrates with Java. I’m assuming you are still in the REPL, or have started it again with lein repl.

Track 1: “If anyone should ask / We are mated”

As it happens, Clojure’s core library is lean and focused on manipulating the language’s own data structures, so as a rule many other tasks are left to the underlying Java platform. For example, mathematical computation is typically done using the static methods of the java.lang.Math class:

    user=> (java.lang.Math/sqrt 5.0)
    2.23606797749979

As you can see, this is a function application like we have already seen in Part 1, but this time the function we are using is the sqrt static method in the java.lang.Math class.
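Other static members of java.lang.Math are called the same way. The short sketch below uses a few of them; the particular methods are just examples we picked. Because Clojure imports the java.lang classes automatically, the package prefix can also be left out.

```clojure
;; java.lang classes are imported automatically, so Math/sqrt is java.lang.Math/sqrt
(println (Math/sqrt 5.0))      ; prints 2.23606797749979
(println (Math/pow 2.0 10.0))  ; prints 1024.0
(println (Math/abs -42))       ; prints 42
;; static fields use the same slash syntax, without parentheses
(println Math/PI)              ; prints 3.141592653589793
```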

Java 7 acquired Unicode character names, and they are accessed through the getName method in the java.lang.Character class. This is no mean feat, since there are over 110,000 characters in the Unicode standard, and each of them has a name (although some of them are algorithmically generated). To find out the canonical character name of a Unicode character, such as the euro currency symbol, you would use the getName static method:

    user=> (java.lang.Character/getName \u20ac)
    ClassCastException java.lang.Character cannot be cast to java.lang.Number user/eval703 (NO_SOURCE_FILE:1)

Hey, what’s wrong? Well, if you look up the documentation of java.lang.Character.getName, you will find out that it takes an int value as an argument, not a character. You can actually do this check from inside the REPL:

    user=> (javadoc java.lang.Character)
    true

The REPL doesn’t seem to do much, but you should now have a new web browser window or tab open, with the JavaDoc of the java.lang.Character class loaded up. That’s what the REPL meant when it said

    Javadoc: (javadoc java-object-or-class-here)

when it started up. The getName method does need an int value, so let’s try something else:

    user=> (java.lang.Character/getName (int \u20ac))
    "EURO SIGN"

All right! How about another one:

    user=> (java.lang.Character/getName 67)
    "LATIN CAPITAL LETTER C"

Well, some say that Clojure is the new C.
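Integer literals such as 67 work because Clojure converts the long value to the int parameter that getName expects. The sketch below shows two more things, assuming a Java 7 or later runtime. Hexadecimal literals let you look up a character straight from its U+ notation. According to the JavaDoc, getName returns null (nil in Clojure) for an unassigned codepoint. U+0378 is used below on the assumption that it is unassigned in the Unicode version your JVM ships with.

```clojure
;; hexadecimal literals mirror the U+ notation: U+20AC is 0x20AC
(println (Character/getName 0x20AC))        ; prints EURO SIGN
(println (Character/getName 0x00EF))        ; prints LATIN SMALL LETTER I WITH DIAERESIS
;; assumption: U+0378 is unassigned, so it has no name and getName returns nil
(println (nil? (Character/getName 0x0378))) ; prints true
```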

Track 2: “So think about this little scene / Apply it to your life”

Speaking of definitions, let’s do some, to make it easier to test the stuff we’ll be developing. Giving a symbolic name to the string we used in Part 1 is easy using def:

    user=> (def test-str "Na\u00EFve r\u00E9sum\u00E9s... for 0 \u20AC? Not bad!")
    #'user/test-str

Clojure tells us that there is now a name test-str in the user namespace. We don’t have to worry about namespaces now, we just need to know that when you start up the REPL, user will be the default namespace until it is changed. And since it is the default, you don’t even have to specify it:

    user=> test-str
    "Naïve résumés... for 0 €? Not bad!"

Of course, user/test-str would also work, but it would be redundant.
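Once a name is defined, it can be used anywhere its value could be. Here is a small sketch in source-file form that reuses the test string. The name euro is a hypothetical extra definition, added only for illustration.

```clojure
(def test-str "Na\u00EFve r\u00E9sum\u00E9s... for 0 \u20AC? Not bad!")

;; wherever the name appears, its value is used
(println (count test-str))                      ; prints 34
(println (= "Na\u00EFve" (subs test-str 0 5)))  ; prints true

;; hypothetical name for a single character value
(def euro \u20ac)
(println (int euro))                            ; prints 8364
```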

Our invocation of java.lang.Character.getName could be wrapped into something more comfortable. Let’s create a function with defn (short for define function):

    user=> (defn character-name [x]
    #_=> (java.lang.Character/getName (int x)))
    #'user/character-name

In the REPL that looks a little weird, because it continues over more than one line. In a Clojure source file it would look like this:

    (defn character-name [x]
      (java.lang.Character/getName (int x)))

This function is called character-name, and it takes one argument, called x. We’ll talk about that in a little while.

Typically Clojure source code is indented by only two spaces, and the closing parentheses are gathered at the end of the last line rather than placed on separate lines. You can find more information about source code formatting and coding conventions in The Clojure Style Guide. One thing to note right away is the naming style: Clojure programmers use lisp-style-names, not camelCasing like Java, and there are no explicit get prefixes in the names of functions that get something. So, while in Java you would call some method getCharacterName, in Clojure it would be called character-name, as we just did.
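defn also accepts an optional documentation string between the function name and the parameter vector. The sketch below gives character-name a docstring; the wording of the docstring is our own.

```clojure
(defn character-name
  "Returns the Unicode name of the character x."
  [x]
  (java.lang.Character/getName (int x)))

;; the docstring is stored in the metadata of the var named character-name
(println (:doc (meta #'character-name)))  ; prints Returns the Unicode name of the character x.
(println (character-name \u20ac))         ; prints EURO SIGN
```

In the REPL, (doc character-name) prints the function’s documentation, including this string.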

Let’s exercise this new function:

    user=> (character-name \u20ac)
    "EURO SIGN"

Hey, how come this worked with a character literal, but when we called java.lang.Character.getName like that, it barfed? Well, that’s because we essentially created a mini-API of our own with the character-name function. We can pass a character to it, and it will apply Clojure’s int function to the character to get its numeric code before invoking Java’s java.lang.Character.getName on it. (A Java char is a 16-bit UTF-16 code unit, so this numeric code is the codepoint for every character in the Basic Multilingual Plane, which includes all the characters we use here.)

You don’t need to explicitly return a result from the function: the value of the last expression in the function body is the return value. You just need to arrange things so that this last expression computes what the function should return.
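The following sketch uses a hypothetical variant named noisy-character-name to show that only the last expression counts. It first prints a message, and then computes the name. The println call runs, but its value (nil) is discarded, and the function returns the name.

```clojure
;; hypothetical function, for illustration only
(defn noisy-character-name [x]
  ;; println runs for its side effect; its return value (nil) is discarded
  (println "looking up" (int x))
  ;; the value of this last expression is the function's return value
  (java.lang.Character/getName (int x)))

(println (noisy-character-name \N))
;; prints:
;; looking up 78
;; LATIN CAPITAL LETTER N
```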

Does character-name return what we think it does?

    user=> (type (character-name \u20ac))
    java.lang.String

Yes, it returns a java.lang.String instance.
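The type function is handy for the other values involved, too. The sketch below also shows why the direct getName call failed: a character literal is a java.lang.Character, not a number, until int converts it.

```clojure
(defn character-name [x]
  (java.lang.Character/getName (int x)))

(println (type \u20ac))                  ; prints java.lang.Character
;; int produces a primitive int, which type sees boxed as an Integer
(println (type (int \u20ac)))            ; prints java.lang.Integer
(println (int \u20ac))                   ; prints 8364
;; integer literals such as 67 are longs in Clojure
(println (type 67))                      ; prints java.lang.Long
(println (type (character-name \u20ac))) ; prints java.lang.String
;; char converts a number back into a character
(println (= \u20ac (char 8364)))         ; prints true
```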

What about the function’s argument, which we imaginatively called x? In Clojure, functions are often so short that it doesn’t seem worth it to give arguments fancy names, but a more descriptive name doesn’t hurt either. We could have called it ch, character, etc.

Why is the argument in square brackets? Because the parameter list is written as a Clojure vector, a data structure which allows efficient random access by index, unlike a list. It is the closest analogy to Java’s ArrayList or Python’s list, with the difference that Clojure vectors are immutable.
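The same square brackets also create vector values outside parameter lists. The sketch below shows indexed access on a vector. The name letters is hypothetical, chosen for illustration.

```clojure
;; hypothetical name: a vector literal holding five characters in order
(def letters [\N \a \u00EF \v \e])

(println (count letters))             ; prints 5
;; nth and get look up an element by its index
(println (= \u00EF (nth letters 2)))  ; prints true
(println (get letters 0))             ; prints N
;; a vector can also be applied to an index, like a function
(println (letters 4))                 ; prints e
;; conj returns a new, longer vector; letters itself is unchanged
(println (count (conj letters \!)))   ; prints 6
(println (count letters))             ; prints 5
```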

The character-name function only takes one argument, but maybe we’ll deal with multiple arguments later.

Track 3: “Learn to love me / Assemble the ways”

So far, so good. We can now get the name of any Unicode character. With the help of some more core functions we can construct most of the output we want. Remember that we wanted to get from this:

    Naïve

To this:

    00000000: U+00004E LATIN CAPITAL LETTER N
    00000001: U+000061 LATIN SMALL LETTER A
    00000002: U+0000EF LATIN SMALL LETTER I WITH DIAERESIS
    00000004: U+000076 LATIN SMALL LETTER V
    00000005: U+000065 LATIN SMALL LETTER E

I made the input string a little shorter this time, to better make the point. The most important thing here is that we want one line of output per character, and each of the lines has a certain format.

In addition to the character name, we actually also have another output component already available: the codepoint of the character. Let’s construct a facsimile of the output line with the format function:

    user=> (format "U+%06X %s" (int \N) (character-name \N))
    "U+00004E LATIN CAPITAL LETTER N"

The U+ notation is the customary way of presenting Unicode codepoints in hexadecimal. Here we use six hexadecimal digits, which is enough for every codepoint up to the maximum, U+10FFFF. If the placeholders in the format string look kind of familiar, that’s because they are the same ones used by java.lang.String.format (and actually Clojure’s format function is just a wrapper around it; try (source format) in the REPL to see for yourself).
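Here is a small sketch of that wrapper relationship: calling java.lang.String/format directly with the arguments packed into an Object array gives the same string as Clojure’s format. The names via-clojure and via-java are hypothetical.

```clojure
;; hypothetical names for the two results being compared
(def via-clojure (format "U+%06X %s" 8364 "EURO SIGN"))
;; String.format takes its arguments as an Object array, which to-array builds
(def via-java (String/format "U+%06X %s" (to-array [8364 "EURO SIGN"])))

(println via-clojure)                ; prints U+0020AC EURO SIGN
(println (= via-clojure via-java))   ; prints true
;; six hex digits are enough for the largest codepoint
(println (format "U+%06X" 0x10FFFF)) ; prints U+10FFFF
```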

We could even throw in a dummy offset to make the result more realistic:

    user=> (format "%08d: U+%06X %s" 0 (int \N) (character-name \N))
    "00000000: U+00004E LATIN CAPITAL LETTER N"

We apply the format function to four arguments:

Argument               Meaning
"%08d: U+%06X %s"      the format string
0                      the dummy offset
(int \N)               the codepoint of the character
(character-name \N)    the name of the character

The last two of the arguments are the results of applying two different functions to the same value, the character.
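Clojure’s core juxt function captures exactly this pattern. It takes several functions and returns a new function that applies all of them to the same argument, collecting the results in a vector. The sketch below combines juxt with apply, which spreads that vector into format’s trailing arguments. This is just one alternative we chose to illustrate the idea, not the approach this series follows. The name code-and-name is hypothetical.

```clojure
(defn character-name [x]
  (java.lang.Character/getName (int x)))

;; hypothetical name: a function returning [(int x) (character-name x)]
(def code-and-name (juxt int character-name))

(prn (code-and-name \N))   ; prints [78 "LATIN CAPITAL LETTER N"]

;; apply passes 0 and then each element of the vector as arguments to format
(println (apply format "%08d: U+%06X %s" 0 (code-and-name \N)))
;; prints 00000000: U+00004E LATIN CAPITAL LETTER N
```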

Let’s pull in another Clojure data structure, the map. Since we have two related values, we can construct a map and use keywords to get at the values, like this:

    user=> { :offset 0 :character \u20ac }
    {:offset 0, :character \€}

The curly brackets are used to make a map literal in Clojure.

Let’s give this ephemeral map literal a name, so we can practice using a Clojure map:

    user=> (def test-ch { :offset 0 :character \u20ac })
    #'user/test-ch
    user=> test-ch
    {:offset 0, :character \€}

The names with a colon in front of them are keywords, and here they serve as the map’s keys. Keywords are also functions: applied to a map, a keyword looks up its own value in it. So, if you want to pull the character out of test-ch, you say:

    user=> (:character test-ch)
    \€
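Keyword lookup is only one way to get at a map’s values. The sketch below shows a few others from Clojure’s core library, and what happens when a key is missing.

```clojure
(def test-ch {:offset 0 :character \u20ac})

;; a keyword applied to a map looks itself up
(println (:offset test-ch))                   ; prints 0
;; get does the same lookup
(println (= \u20ac (get test-ch :character))) ; prints true
;; a map can also be applied to a key
(println (test-ch :offset))                   ; prints 0
;; a missing key gives nil, or the default value when one is supplied
(println (:name test-ch))                     ; prints nil
(println (:name test-ch "unknown"))           ; prints unknown
;; assoc returns a new map; test-ch itself is unchanged
(println (:character (assoc test-ch :character \N))) ; prints N
```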

Armed with this knowledge, let’s construct another function that takes a map with an offset and a character, and returns a string with the formatted output we want:

    user=> (defn character-line [pair]
    #_=>     (format "%08d: U+%06X %s"
    #_=>       (:offset pair) (int (:character pair))
    #_=>       (character-name (:character pair))))
    #'user/character-line

Once again, doing this in the REPL is a little messy. In a source file the function would look like this:

    (defn character-line [pair]
      (format "%08d: U+%06X %s"
        (:offset pair) (int (:character pair))
        (character-name (:character pair))))

How do you call this thing? You make a map literal and pass it in:

    user=> (character-line {:offset 0 :character \N})
    "00000000: U+00004E LATIN CAPITAL LETTER N"

The character-line function expects one argument, which we call pair inside the function. It is supposed to be a map value. Before calling the function you pack the offset (a dummy one at this point) and the character into the map. The function then picks the values it wants from the map, and applies some functions to them. There is some redundancy, but we’ll fix that later.
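Any map with the right keys will do, including the one we named test-ch earlier. The following sketch collects the definitions in source-file form and calls character-line with two different maps. The second map reproduces the third line of the target output, using the offset shown there.

```clojure
(defn character-name [x]
  (java.lang.Character/getName (int x)))

(defn character-line [pair]
  (format "%08d: U+%06X %s"
    (:offset pair) (int (:character pair))
    (character-name (:character pair))))

;; a named map works as well as a literal
(def test-ch {:offset 0 :character \u20ac})
(println (character-line test-ch))
;; prints 00000000: U+0020AC EURO SIGN

(println (character-line {:offset 2 :character \u00EF}))
;; prints 00000002: U+0000EF LATIN SMALL LETTER I WITH DIAERESIS
```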

Why are we making all these functions? Couldn’t we just do the stuff in place? Yes, and Clojure also has anonymous functions, but it’s a fine line between one-off functions and reusable functions. There is an epigram by the late, great Alan J. Perlis which touches on this:

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.

For more, see Perlis’ collection, Epigrams on Programming.
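For comparison, here is a small sketch of the anonymous functions mentioned above. Each is written as a one-off version of character-name and applied directly to a character. The sketch ends with def plus fn, which is roughly what defn does for us, minus the extra metadata that defn attaches.

```clojure
;; fn creates a function without a name; here it is applied immediately
(println ((fn [x] (java.lang.Character/getName (int x))) \u20ac)) ; prints EURO SIGN

;; the #() reader shorthand does the same; % stands for the single argument
(println (#(java.lang.Character/getName (int %)) \N))  ; prints LATIN CAPITAL LETTER N

;; naming an anonymous function with def is roughly what defn does
(def character-name (fn [x] (java.lang.Character/getName (int x))))
(println (character-name \a))                          ; prints LATIN SMALL LETTER A
```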

Phew! It’s time for a breather. Next time we’ll try to figure out how to apply the character-line function to all the characters in a string. That will introduce us to Clojure sequences. We’ll also look at local bindings, which are sort of like variables, only not. Remember, this stuff is different.

Don’t forget the Clojure from the ground up series. It presents Clojure in much more detail.


Tell me what you think of this in the comments, or even just call out the tracks quoted in the subheadings.

UPDATE 2014-12-14: After nearly a month, I’m closing the comments on all parts of this series, because nothing but SPAM appeared.