Functional Programming Without Feeling Stupid, Part 5: Project
In the last four installments of Functional Programming Without Feeling Stupid I’ve slowly built up a small utility called ucdump with Clojure.
Experimenting and developing with the Clojure REPL is fun, but now it’s time to give
some structure to the utility. I’ll package it up as a Leiningen project and create
a standalone JAR for executing with the Java runtime.
Creating a new project with Leiningen
You can use Leiningen to create a skeleton project quickly. In my project’s root directory, I’ll say:
lein new app ucdump
Leiningen will respond with:
Generating a project called ucdump based on the 'app' template.
The result is a directory called ucdump, which contains:
.gitignore README.md project.clj src/ LICENSE doc/ resources/ test/
For now I’m most interested in the project file, project.clj, which is actually a Clojure source file, and the src directory, which is intended for the app’s actual source files.
Leiningen creates a directory called src/ucdump and seeds it with a core.clj file, but that’s not actually what I want, for two reasons:
- I want ucdump to be a good Clojure citizen, so I’m going to put it in a namespace called com.coniferproductions.ucdump.
- My Git repository for ucdump also contains the original Python version of the application, which is in projectroot/python, and I want the Clojure version to live in projectroot/clojure.
So first I’ll rename the ucdump directory created by Leiningen to clojure:
mv ucdump clojure
Then I’ll make the namespace directories and rename core.clj to ucdump.clj:
mkdir -p clojure/src/com/coniferproductions
mv clojure/src/ucdump/core.clj clojure/src/com/coniferproductions/ucdump.clj
rmdir clojure/src/ucdump
mkdir -p clojure/test/com/coniferproductions
mv clojure/test/ucdump/core_test.clj clojure/test/com/coniferproductions/ucdump_test.clj
rmdir clojure/test/ucdump
This method of having each namespace in a separate file was suggested in the book Clojure Programming. The result looks like this:
clojure
├── LICENSE
├── README.md
├── doc
│   └── intro.md
├── project.clj
├── resources
├── src
│   └── com
│       └── coniferproductions
│           └── ucdump.clj
└── test
    └── com
        └── coniferproductions
            └── ucdump_test.clj
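The layout above follows the convention Clojure uses to find a namespace's source file: dots in the namespace name become directory separators and hyphens become underscores. As a rough illustration, here is a small sketch of that mapping. The helper name ns->path is hypothetical and is not part of ucdump or Leiningen.

```clojure
(require '[clojure.string :as str])

(defn ns->path
  "Hypothetical helper: relative source path for a namespace symbol."
  [ns-sym]
  (-> (str ns-sym)
      (str/replace "-" "_")   ; hyphens in names become underscores on disk
      (str/replace "." "/")   ; each dot is a directory level
      (str ".clj")))

(ns->path 'com.coniferproductions.ucdump)
;; => "com/coniferproductions/ucdump.clj"

(ns->path 'com.coniferproductions.ucdump-test)
;; => "com/coniferproductions/ucdump_test.clj"
```

This is also why the test file is named ucdump_test.clj even though its namespace would normally end in ucdump-test.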
There are some namespace references in the source files created by Leiningen which are now obsolete, so I’ll fix them eventually, but I’ll focus first on the project file. At this point it looks like this:
(defproject ucdump "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]]
  :main ^:skip-aot ucdump.core
  :target-path "target/%s"
  :profiles {:uberjar {:aot :all}})
You can read up on the settings in the Leiningen tutorial. These are suitable for a standalone application, but the actual values still need to be fixed. When I’m done with the project file, it looks like this:
(defproject ucdump "0.1.0-SNAPSHOT"
  :description "Unicode character dump for UTF-8 encoded files"
  :url "https://github.com/jerekapyaho/ucdump"
  :license {:name "MIT License"
            :url "http://opensource.org/licenses/MIT"}
  :dependencies [[org.clojure/clojure "1.6.0"]]
  :main ^:skip-aot com.coniferproductions.ucdump
  :target-path "target/%s"
  :profiles {:uberjar {:aot :all}})
Putting the source code in its place
The source file created by Leiningen, which we moved to src/com/coniferproductions/ucdump.clj, initially looks like this:
(ns ucdump.core
  (:gen-class))

(defn -main
  "I don't do a whole lot ... yet."
  [& args]
  (println "Hello, World!"))
I won’t bother running that now (but I’ve done that with other projects before; it’s a useful smoke test). Instead it’s time to pour all the code we wrote in the earlier parts of this series into the ucdump.clj source file. I’ll also fix the namespace definition at the top of the file, and add some comments to the functions:
(ns com.coniferproductions.ucdump
  (:gen-class))

;; Test data from the earlier parts of the series.
(def test-str "Na\u00EFve r\u00E9sum\u00E9s... for 0 \u20AC? Not bad!")
(def test-ch {:offset 0 :character \u20ac})
(def short-test-str "Na\u00EFve")

(defn character-name
  "Returns the Unicode name of the character x."
  [x]
  (java.lang.Character/getName (int x)))

(defn character-line
  "Formats one output line from a map with :offset and :character keys."
  [pair]
  (let [ch (:character pair)]
    (format "%08d: U+%06X %s" (:offset pair) (int ch) (character-name ch))))

(defn octet-count
  "Determines the length of a Unicode codepoint when encoded in UTF-8.
  See RFC 3629 for the details."
  [cp]
  (cond
    (and (>= cp 0x000000) (<= cp 0x00007F)) 1
    (and (>= cp 0x000080) (<= cp 0x0007FF)) 2
    (and (>= cp 0x000800) (<= cp 0x00FFFF)) 3
    (and (>= cp 0x010000) (<= cp 0x10FFFF)) 4
    :else 0))

(defn octet-counts
  "Returns the UTF-8 octet count of each character in s."
  [s]
  (map octet-count (map int s)))

(defn character-lines
  "Returns one formatted line per character in s, each with its byte offset."
  [s]
  (let [offsets (butlast (cons 0 (reductions + (octet-counts s))))
        pairs (map #(into {} {:offset %1 :character %2}) offsets s)]
    (map character-line pairs)))

(defn -main [& args]
  (doseq [line (character-lines test-str)]
    (println line)))
The main program creates a line for each character in test-str, and prints them to the standard output. Leiningen knows from the project file’s :main setting that the function to call when starting the program is in the com.coniferproductions.ucdump namespace, so the -main function from there is the one to use.
Time for a test run!
The application can be tested by changing to the project root directory and saying:
lein run
The result should be:
00000000: U+00004E LATIN CAPITAL LETTER N
00000001: U+000061 LATIN SMALL LETTER A
00000002: U+0000EF LATIN SMALL LETTER I WITH DIAERESIS
00000004: U+000076 LATIN SMALL LETTER V
00000005: U+000065 LATIN SMALL LETTER E
00000006: U+000020 SPACE
00000007: U+000072 LATIN SMALL LETTER R
00000008: U+0000E9 LATIN SMALL LETTER E WITH ACUTE
00000010: U+000073 LATIN SMALL LETTER S
00000011: U+000075 LATIN SMALL LETTER U
00000012: U+00006D LATIN SMALL LETTER M
00000013: U+0000E9 LATIN SMALL LETTER E WITH ACUTE
00000015: U+000073 LATIN SMALL LETTER S
00000016: U+00002E FULL STOP
00000017: U+00002E FULL STOP
00000018: U+00002E FULL STOP
00000019: U+000020 SPACE
00000020: U+000066 LATIN SMALL LETTER F
00000021: U+00006F LATIN SMALL LETTER O
00000022: U+000072 LATIN SMALL LETTER R
00000023: U+000020 SPACE
00000024: U+000030 DIGIT ZERO
00000025: U+000020 SPACE
00000026: U+0020AC EURO SIGN
00000029: U+00003F QUESTION MARK
00000030: U+000020 SPACE
00000031: U+00004E LATIN CAPITAL LETTER N
00000032: U+00006F LATIN SMALL LETTER O
00000033: U+000074 LATIN SMALL LETTER T
00000034: U+000020 SPACE
00000035: U+000062 LATIN SMALL LETTER B
00000036: U+000061 LATIN SMALL LETTER A
00000037: U+000064 LATIN SMALL LETTER D
00000038: U+000021 EXCLAMATION MARK
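Notice that the offsets in this output skip a number after each non-ASCII character, because those characters take more than one octet in UTF-8. To see where the numbers come from, here is a small REPL-style walk-through of the offset calculation for the shorter string "Naïve". It repeats the ranges from octet-count so that it stands alone (written with a chained <= only for brevity); the names counts, running and offsets are just for this illustration.

```clojure
;; Same ranges as octet-count in ucdump.clj, repeated so the snippet stands alone.
(defn octet-count [cp]
  (cond
    (<= 0x000000 cp 0x00007F) 1
    (<= 0x000080 cp 0x0007FF) 2
    (<= 0x000800 cp 0x00FFFF) 3
    (<= 0x010000 cp 0x10FFFF) 4
    :else 0))

(def short-test-str "Na\u00EFve")

;; UTF-8 length of each character: N, a, i-with-diaeresis, v, e
(def counts (map (comp octet-count int) short-test-str))
;; => (1 1 2 1 1)

;; Running totals: the offset just past each character.
(def running (reductions + counts))
;; => (1 2 4 5 6)

;; Prepending 0 and dropping the final total turns them into start offsets.
(def offsets (butlast (cons 0 running)))
;; => (0 1 2 4 5)
```

The jump from 2 to 4 matches the first lines of the output above, where LATIN SMALL LETTER V starts at offset 4.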
However, I want to read the text from a UTF-8 encoded file, so let’s make the -main function do just that:
(defn -main [& args]
  (let [characters (slurp (nth args 0) :encoding "UTF-8")]
    (doseq [line (character-lines characters)]
      (println line))))
The slurp function reads the contents of the file, and here I specify the encoding of the file as “UTF-8”. (See the slurp documentation for details.) The args vector contains the command-line arguments supplied to the application, so I take the first argument with (nth args 0) (the index of the first argument is zero) and use it as the filename.
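To get a feel for why the :encoding option matters, here is a small experiment you could try at the REPL. It is only a sketch: it writes "Naïve" to a temporary scratch file (the file name prefix is arbitrary) and reads it back with two different encodings.

```clojure
(import 'java.io.File)

;; Scratch file for the experiment; removed when the JVM exits.
(def tmp (File/createTempFile "ucdump-demo" ".txt"))
(.deleteOnExit tmp)

;; "Naïve" is 5 characters, but 6 octets when written as UTF-8.
(spit tmp "Na\u00EFve" :encoding "UTF-8")

(def as-utf8 (slurp tmp :encoding "UTF-8"))
(def as-latin1 (slurp tmp :encoding "ISO-8859-1"))

(count as-utf8)
;; => 5, the original characters
(count as-latin1)
;; => 6, each octet was decoded as its own character
```

Read with the wrong encoding, the two octets of the ï come back as two separate characters, and ucdump would then report wrong names and offsets.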
For a very detailed look at running Clojure applications with Leiningen, see How Clojure Babies Are Made: Understanding lein run by Flying Machine Studios.
If I now specify the filename:
lein run ~/tmp/testfile-utf8.txt
then the application will produce the same output as above, because my testfile-utf8.txt contains the same text as test-str in the code.
Put it in a JAR
Leiningen has already equipped the project file with the means to make a standalone application. That is done by creating an “uberjar”, which packages up the application and all its dependencies so that it can be run using the Java VM. So if, in the project directory, I say:
lein uberjar
Leiningen responds with:
Compiling com.coniferproductions.ucdump
Created /Users/Jere/Projects/ucdump/clojure/target/uberjar/ucdump-0.1.0-SNAPSHOT.jar
Created /Users/Jere/Projects/ucdump/clojure/target/uberjar/ucdump-0.1.0-SNAPSHOT-standalone.jar
Now I can take this JAR and run it as a normal Java application:
cp target/uberjar/ucdump-0.1.0-SNAPSHOT-standalone.jar ~/tmp
java -jar ~/tmp/ucdump-0.1.0-SNAPSHOT-standalone.jar ~/tmp/testfile-utf8.txt
The output is the same as above. However, if you neglect to provide the filename when you run the application, you will get an ugly error message:
Exception in thread "main" java.lang.IllegalArgumentException: No implementation of method: :make-reader of protocol: #'clojure.java.io/IOFactory found for class: nil
and a stack trace, which might make no sense at all. There is no need to add extensive command-line argument handling to the application (if you need that, take a look at the tools.cli library), but it’s good to do a quick check for the missing argument. This requires one little change in the -main function:
(defn -main [& args]
  (when (not= (count args) 0)
    (let [characters (slurp (nth args 0) :encoding "UTF-8")]
      (doseq [line (character-lines characters)]
        (println line)))))
If the argument count is not zero, read from the file specified in the first argument; otherwise do nothing.
To make ucdump a proper UNIX-style tool, it should read from standard input if there is no filename. Maybe I’ll update it to do so when I find out how. For the latest version of the source, see the ucdump GitHub repository.
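For what it's worth, one possible way to get the standard-input fallback is to slurp from *in*, the reader Clojure binds to standard input. The sketch below is an assumption on my part, not the version in the repository, and the function name input-text is hypothetical.

```clojure
(defn input-text
  "Hypothetical helper: returns the contents of the file named by the
  first argument, or everything on standard input if there are no arguments."
  [args]
  (if (seq args)
    (slurp (first args) :encoding "UTF-8")
    ;; *in* is already a character reader, so the JVM's default charset
    ;; applies here. Assumption: slurping System/in with :encoding "UTF-8"
    ;; instead should force UTF-8 regardless of platform settings.
    (slurp *in*)))

;; Trying it at the REPL without a real pipe:
(with-in-str "Na\u00EFve" (input-text nil))
;; => "Naïve"
```

With a helper like this, -main could pass (input-text args) to character-lines and print the result whether or not a filename was given.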
Onwards
This concludes the series. I realise I have perhaps irrevocably managed to combine the words “functional”, “programming” and “stupid”, but the real intent is in the “without feeling” part. I’ve sometimes felt that I would need to be some sort of genius programmer to understand Clojure, and certainly some proponents make Clojure sound so obvious that I can’t help wondering if there’s really something wrong with me. There must be something in the air (and not just Clojure/conj coinciding, which I honestly didn’t know about), since I just found out that Adam Bard had published Clojure is not for geniuses on 18 November 2014, a day after I started this series. That’s parallel evolution at work!
I wanted to tease out some practical aspects of Clojure without theory or condescension, and hope that this series helps you learn a little more about Clojure programming.
UPDATE 2014-12-14: After nearly a month, I’m closing the comments on all parts of this series, because nothing but SPAM appeared.
UPDATE 2020-02-23: Corrected typos and formatting.
UPDATE 2021-09-19: Corrected an embarrassing logic error in combining offsets and characters.