Sunday, June 2, 2019

Clojure for non Clojure programmers - Chapter 2, Our first program

In this series, I try to teach how to program in Clojure. My target audience is people who already know how to program in more traditional imperative, procedural, OOP, and garbage-collected languages such as Python, Java, C#, etc. I assume the reader is proficient in at least one of these. I try to be succinct and to the point, and to use familiar terms and comparisons.

Table of Contents

Our first program

Preface

Clojure programs are actually Java programs. That will feel a bit strange if you are coming from a language like Python, Ruby, C#, etc., which have their own interpreters/compilers and VMs, and whose whole ecosystem is designed specifically for them. What this means is that Clojure is a parasitic (or hosted) language: it needs a host to thrive. I hope you'll come to realize that this is actually a great strength of Clojure, but I'm sure that at first it will be a pain point for most of you, which I will try to minimize. Just keep in mind that you want to embrace the host as well as Clojure, and that learning Clojure implies learning to leverage the host just as much.

A Timer application

Our first Clojure program will be a very simple Timer application. The timer starts at a certain time, in seconds, and counts down to 0.

(def start-time
  "Starting time in seconds, for our timer."
  10)

(defn start-timer
  "Starting from start-time-seconds, loop until the countdown reaches 0,
   printing the count at every second."
  [start-time-seconds]
  (loop [countdown start-time-seconds]
    (println countdown) ; Print the current countdown on its own line.
    (Thread/sleep 1000) ; Sleep for 1000 milliseconds.
    (if (> countdown 1) ; If we're not yet at the last second.
      ;; Decrement the countdown and keep looping.
      (recur (dec countdown))
      ;; Otherwise, print that we're done on its own line.
      (println "Done!"))))

;; Call start-timer with our start-time to start our timer.
(start-timer start-time)

There we have it, our first program. A full implementation of a timer application which counts down from 10 seconds and notifies us when the count reaches 0.

We did quite a few things here:

We wrote a fully runnable Clojure script (not to be confused with ClojureScript)
We defined a global variable: start-time
We defined a custom function: start-timer
We ran a recursive loop using: loop and recur
We used Java interop to make the JVM’s main thread sleep
We made a top-level call to our start-timer function
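If you want something to tinker with right away, here is a small variation on start-timer. It is a sketch under one assumption: you only care about the numbers, not the waiting. The hypothetical countdown-values function uses the same loop/recur shape, but collects each number into a vector instead of printing it and sleeping, so you can see exactly which values the loop goes through.

```clojure
;; countdown-values is a hypothetical name, not part of our timer program.
;; It mirrors start-timer's loop/recur, minus the printing and sleeping.
(defn countdown-values
  "Returns a vector of the countdown, from start-time-seconds down to 1."
  [start-time-seconds]
  (loop [countdown start-time-seconds
         values []]
    (if (> countdown 1)
      ;; Remember the current number, decrement, and keep looping.
      (recur (dec countdown) (conj values countdown))
      ;; Last second: add it and return everything we collected.
      (conj values countdown))))

(println (countdown-values 3)) ; prints [3 2 1]
```

Calling (countdown-values 10) gives you the same numbers the timer prints, without the ten-second wait.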

Now let’s learn how to run our program, so we can see it in action, and so you can start messing with it!

Running our program

In order for you to run our Timer application, you will need to:

Get a working JVM/JDK installed on your box.
Install the Clojure CLI.
Copy our code into a source file.
Run a command using the Clojure CLI that will launch our app.

Getting a working JVM/JDK installed on your box

As I mentioned in the preface, Clojure is a hosted language, and Clojure programs are actually Java programs. It goes without saying, then, that you need to have Java installed to use Clojure. If you're unfamiliar with the JVM and the JDK, I recommend reading The Definitive Guide to Clojure on the JVM, though I'd suggest you read it after you are done with this chapter.

First, you may already have a Java JDK, so run the following command in your terminal shell to check:

java -version

What you want to see is that the command exists, and that the printed result mentions JDK (for example, openjdk) with a version of 1.8 or above, such as:

$ java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (IcedTea 3.9.0) (build 1.8.0_181-b13 suse-1.1-x86_64)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

Finally, you need to make sure that the JAVA_HOME environment variable is set, by running:

$ echo $JAVA_HOME
/usr/lib64/jvm/java

On Windows, run echo $Env:JAVA_HOME in PowerShell, or echo %JAVA_HOME% in the Command Prompt.

If you have all that, skip what follows and move on to Install the Clojure CLI.

A small note about the JDK and Oracle…

The JDK is a piece of software that contains many applications. The main one is the Java Virtual Machine (JVM), which is used to run Java, and thus Clojure, applications. It also contains things like the Java compiler, profiling tools, debugging tools, documentation generating tools, etc.

It used to be that some of those tools, and certain features of the JVM, were not open source, but in 2018, with JDK 11, Oracle open sourced the remaining pieces, so it is all available under the GPL with the Classpath Exception (code you link to it doesn't need to be GPL).

Thus, the source code for the JVM and all JDK tools is fully open source, free to use under the GPL, and maintained by the OpenJDK project.

That said, the JDK is a large C, C++ and Java code base, and building it yourself is a huge pain, so you're going to want pre-built binaries. This is where vendors come into the picture: several of them offer pre-built binaries of the OpenJDK source code for various platforms, and not all of them offer those binaries for free. This is often a source of confusion for people new to the Java ecosystem.

Specifically, Oracle offers binaries known as the OracleJDK, and as of 2019, they are no longer free to use for commercial purposes, though they are still free for personal or development use. Some of the guides I link below, to help you install and set up a JDK on your machine, point you to the OracleJDK binaries. As those are not free for commercial purposes, it's important for me to warn you about it. If your goal is just to learn for now, go ahead. That said, there are free alternatives, listed below, and as you follow the guides for installing and setting up a JDK, I recommend you swap out the OracleJDK download for one of these free (for all usage, even commercial) ones instead:

Oracle’s OpenJDK builds

Oracle also offers free (even for commercial use) binaries of OpenJDK, known as Oracle's OpenJDK. The caveat with these is that older versions don't get back-ported security fixes or critical bug fixes, so to stay up to date with those fixes, you have to upgrade to the latest version. For example, as soon as JDK 12 is released, Oracle stops releasing updated builds of JDK 11, so if you want new fixes, you have to upgrade to JDK 12. A new version is released every 6 months, so to get all fixes, you have to upgrade your JDK every 6 months. In a commercial setting, this might be annoying, and a little too fast, which is why, in general, I'd recommend you first try one of the alternatives such as AdoptOpenJDK, Amazon Corretto, or the OpenJDK from your Linux distro.

AdoptOpenJDK

The Java community also offers free (for any use) binary builds of OpenJDK, known as AdoptOpenJDK. If you're used to CPython, Ruby MRI, Haskell GHC, Node.js, and other mostly community-maintained language runtimes, you will feel right at home here, since it's basically the same idea: community volunteers do the work of packaging pre-built binaries and offer them for free. Unless your Linux distro packages its own OpenJDK, or you know better, I'd go for this one.

Also, I'd suggest getting an LTS release with the HotSpot JVM (AdoptOpenJDK also offers builds with the OpenJ9 JVM), unless you know better.

Amazon Corretto

Amazon also offers free (for any use, including commercial) binary builds of OpenJDK, known as Amazon Corretto. They only offer LTS releases, and have good documentation. The binaries are of high quality: they are certified as Java SE compatible, and Amazon uses them in production. This is a solid alternative to AdoptOpenJDK, especially for production deployments.

OpenJDK from your Linux distro

Many Linux distros offer their own binary builds as well. Search your package manager for openjdk, and if you find version 8 or above, installing that is probably the quickest and most painless way to get started. The downside is that, depending on your distro, it might not be updated as regularly.

Whichever you choose, remember that they are all built from the same OpenJDK source code; these are not alternative implementations. The differences are which commits they are built from, whether they cherry-picked extra security or bug fixes from later releases, how much testing was done on them afterwards, which platforms they build for, etc. Often, the biggest difference is how long they keep offering patched builds of older versions. Otherwise, they all give you the same set of features: your Clojure code will be identical and will run on all of them.

Installing and setting up Java and the JDK

With that out of the way, here are the guides to help you install a JDK and/or set JAVA_HOME. I recommend you go for either JDK version 8 or 11, as they both offer long-term support (or any newer LTS release if you are reading this in the far future). Also, the current Clojure version (1.10 at the time of writing) requires Java 8 or above. I will personally be using version 8 as I work through the series.

Installing a JDK

Setting up JAVA_HOME

Install the Clojure CLI

One thing that might be surprising is that Clojure is actually just a Java library. For those of you that know Java, that means Clojure is just a jar. It is published to Maven Central like any other Java library.

This is a bit inconvenient, because it means you always have to launch java with Clojure as a dependency to do anything with Clojure. To address that problem, there is an official Clojure CLI which wraps Java and Clojure together and makes it much easier to interact with Clojure without worrying too much about the details of Java. That's what we'll be using throughout this series. It is known as the Clojure CLI, or sometimes as tools.deps, after the library it is built on.

To install the Clojure CLI, follow the instructions in the Clojure installer and CLI tools sections of the official getting started guide.
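Once the CLI is installed, a tiny script is a quick way to check that Java and Clojure are wired together. This is a sketch: the file name check.clj is only an example. clojure-version is a core Clojure function, and java.version is a standard JVM system property.

```clojure
;; check.clj (example file name).
;; Prints the version of Clojure the CLI loaded, e.g. 1.10.1.
(println "Clojure version:" (clojure-version))
;; Prints the version of the JVM running this script, read from a JVM system property.
(println "Java version:" (System/getProperty "java.version"))
```

You can run it the same way as the timer program, with clj check.clj.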

Create a source file and run our program

Clojure code is contained within files that have the .clj or .cljc extension. For more complex programs, the source files have to be organized in a particular folder structure and follow a naming convention very similar to Java's. That said, for our first program, we only wrote a script: a Clojure program which is fully contained within a single file. So for this chapter, we will focus on Clojure scripts only.

Create a file anywhere you want and call it timer.clj.
Copy/paste or type our program code into the file.
From your terminal shell, in the same directory as your source file, run clj timer.clj.

clj is the Clojure CLI. If you give it the relative or absolute path to a Clojure script file, it will run it for you.

Now admire:

$ clj timer.clj 
10
9
8
7
6
5
4
3
2
1
Done!

Understanding the execution model

Clojure is a compiled language. This is unlike most other dynamic languages, such as Python, JavaScript, and Ruby, which are usually interpreted. As such, Clojure has a compiler which takes Clojure source code and compiles it to Java byte code.

For those who don't know Java, it runs on a virtual machine called the JVM, which takes its instructions from an instruction set known as Java byte code. Java source code is compiled to Java byte code using the Java compiler. The resulting byte code is then shipped to end users, who run it with a JVM for their platform. When it runs, the JVM interprets the byte code at first, and compiles the frequently executed parts of it into machine code for the current hardware (this is known as just-in-time compilation, or JIT).

It's similar for Clojure, except that Clojure source code takes the place of Java source code. You write Clojure source code and use the Clojure compiler to compile it to Java byte code. You can then ship the compiled byte code along with a JVM to an end user, and they can run your compiled Clojure program. This is effectively what you've done in this chapter. You installed a JVM, which is bundled as part of the JDK you installed. Then you used the Clojure CLI to launch a JVM process with Clojure loaded as a dependency, and had it execute your Clojure script. So why didn't you have to compile your script first, you ask?

You see, another unfamiliar thing about Clojure is that the compiler is part of the Clojure runtime. This means you don't have to explicitly compile your source code: Clojure performs the compilation itself as your program runs. When your script is run, Clojure reads it one top-level form at a time, compiles each form into Java byte code, loads that byte code into its own process (which is already running inside a JVM instance), and executes it before moving on to the next form. This is how Clojure, even though it is a compiled language, maintains the feel of an interpreted scripting language.
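You can watch the compiler at work from inside a script. The sketch below uses two core functions: read-string turns source text into Clojure data, and eval compiles that data to byte code and runs it. It also defines a hypothetical square function and prints its class. When run as a script with clj, code lives in the user namespace, so expect a class name like user$square; the exact name depends on the namespace.

```clojure
;; Some Clojure source code, held as plain text.
(def source "(* 6 7)")

;; read-string parses the text into a Clojure list,
;; and eval compiles that list to byte code and runs it.
(def answer (eval (read-string source)))
(println answer) ; prints 42

;; Every function you define is compiled to a JVM class.
(defn square [x] (* x x))
(println (class square)) ; prints something like user$square
```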

Let me walk through this once more. The Clojure CLI first starts a JVM process using the java command. It sets it up so that the Clojure runtime (which is just a Java library bundled as a .jar file) is on the classpath (Java parlance for the places the JVM loads code from, which is how a dependency gets loaded). It also sets it up so that clojure.main, the Clojure runtime's main entry point, is run on startup. Once the JVM process has started, it calls the main method of clojure.main. This receives your script as an argument, at which point it compiles it into Java byte code, loads the byte code into itself, and executes your program.

It sounds more complicated than it is. Bottom line: if you use the Clojure CLI, Clojure feels like an interpreted language, yet it gives you the performance of a JIT-compiled one. This is a great strength of Clojure: it is one of the most performant dynamic programming languages around today, if not the most, and generally faster than Python, Ruby, or JavaScript. In my experience, Clojure is usually within 10% of Java's performance, and with some more advanced knowledge, it can be made to match it. This comes at the expense of startup time, since Clojure has to compile the code and dynamically load all of the byte code when the program first starts. Don't despair though, there are solutions to that as well, such as relying on one of its dialects (ClojureScript, Joker) or doing native image compilation using GraalVM.
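A rough way to watch the JIT at work is to time the same computation several times in one JVM process. The sketch below uses the core time macro, which prints how long its body took; sum-of-squares is a hypothetical example function. Your numbers will depend on your machine and JVM, but typically the first runs are slower, while the JVM is still interpreting the byte code, and later runs get faster once the hot code has been compiled to machine code.

```clojure
(defn sum-of-squares
  "Adds up the squares of the integers from 0 up to, but not including, n."
  [n]
  (loop [i 0
         acc 0]
    (if (< i n)
      (recur (inc i) (+ acc (* i i)))
      acc)))

;; Run the same computation 5 times; time prints "Elapsed time: ... msecs" for each run.
(dotimes [_ 5]
  (time (sum-of-squares 1000000)))
```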

Thus, when people refer to Clojure compilation, or to Clojure being a compiled language, they don't mean compiled to native machine code, but to the JVM's instruction set: Java byte code, also called JVM byte code, or just byte code for short.

Postface

Awesome work! You now have Clojure installed on your machine and ready to be used. You've already run your first Clojure program, a timer application. You have a better understanding of the JVM, the JDK, and Java byte code. You have a rough idea of how Clojure source code goes from code, to compiled byte code, to machine code, to a running program. Most importantly, you can start messing with Clojure on your own, by writing Clojure scripts and running them with the Clojure CLI (the clj command) to try them out.

In the next chapter, we’re going to revisit our program, because now that we know how to run it, it is time to understand how it works.

Feel free to ask questions in the comments, and be patient, I’m working on the next chapters!

Friday, May 24, 2019

Clojure for non Clojure programmers - Chapter 1, Syntax

Table of Contents

Syntax

Preface

Clojure takes its syntax from Lisp, but enhances it with a few additions. It should take you less than 30 minutes to learn the entire syntax and its rules, but a lot longer to understand code written in it. Think of syntax as the grammar and alphabet of a language. In that regard, Clojure's syntax rules (think grammar) are very simple, consistent, and minimal. Its syntax (think alphabet) is quite complete, yet familiar and succinct. That said, Clojure has a large vocabulary (functions), and, like many mature spoken languages, it has a lot of exceptions to its rules (macros). A big part of learning Clojure will be learning its vocabulary (functions) and the exceptions to its rules (macros). This process is very similar to learning a spoken language, and quite different from how you might have learned other programming languages, which tend to lack the denser vocabulary you'll find in Clojure. In addition, Clojure is read inside out, and uses a lot of bracketing symbols like parentheses, square brackets, etc. Your eyes will need to adjust to these, the same way they would if you were learning to read a new alphabet. Just keep at it, and it'll all become second nature soon enough. I'd say it takes a good 80 to 120 hours for your eyes to adjust to Clojure's syntax.

Clojure’s basic syntax rules

Clojure is an expression-only language, which means it has no statements or declarations: everything is a function call which runs some procedure and returns a value. Thus, the fundamental syntax block is the function call.

(function arg1 arg2 ... argN)

To call a function, you use parentheses. The first form after the opening parenthesis is the function you want to call, followed by zero or more arguments to the function, up to the closing parenthesis.
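Here are a few calls to core functions you'll meet all the time, as a quick illustration of that shape. Everything after a ; is a comment (covered below), and the ; => comments show the value each call returns.

```clojure
;; Call + with three arguments.
(+ 1 2 3)                 ; => 6
;; Call str, which concatenates its arguments into one string.
(str "Hello, " "World!")  ; => "Hello, World!"
;; Call max with three arguments.
(max 3 7 2)               ; => 7
;; Call vector with zero arguments: it returns an empty vector.
(vector)                  ; => []
```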

The arguments can themselves be function calls, in which case you do the same thing: you wrap them in parentheses, with the function to call as the first form, followed by its arguments.

(function arg1 (other-function arg1 arg2 ... argN) ... argN)

In this way, you can keep on nesting functions in functions in functions.

When a function is called, Clojure first evaluates its arguments, in order from left to right. If those are function calls themselves, their arguments are also evaluated first, in order from left to right. This is why Clojure must be read left to right and inside out.

(f x (g (h y) z))

The argument x is evaluated and returns its value.
The argument y is evaluated and returns its value.
The function h is called with the value of y.
The argument z is evaluated and returns its value.
The function g is called with the return value of h and the value of z.
The function f is called with the value of x and the return value of g.
The return value of f is the value of the whole expression.
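To see this order for yourself, the sketch below defines hypothetical f, g, and h functions that print their name and record it in a log when called. The log is an atom, which you can think of as a mutable box for now; atoms come in a later chapter. x, y, and z are plain values, so evaluating them has no visible effect.

```clojure
;; A log of which functions were called, in order.
(def call-log (atom []))

(defn log-call! [fn-name]
  (println "calling" fn-name)
  (swap! call-log conj fn-name))

(defn h [a] (log-call! "h") (* a 10))
(defn g [a b] (log-call! "g") (+ a b))
(defn f [a b] (log-call! "f") (- b a))

(def x 1)
(def y 2)
(def z 3)

;; Prints "calling h", then "calling g", then "calling f".
(def result (f x (g (h y) z)))
;; h returns 20, g returns 23, and f returns 23 - 1.
(println result) ; prints 22
```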

Those are the basic syntax rules of Clojure. You should now be able to follow along with most Clojure code. There are exceptions to them, which the next section explains.

Clojure’s advanced syntax rules

In Clojure, a pair of parentheses, together with what it encloses, is known as an s-expression, or sexpr for short. It delimits a block of code. In general, sexprs are evaluated as explained in the Clojure's basic syntax rules section. That's true every time the first form is a function, but sometimes it is a macro instead.

(macro arg1 arg2 ... argN)

You'll learn about macros in depth much later, but you need to know that macros can be called in place of functions, in which case you have no idea in what order the arguments are evaluated, if at all. Macros are the exceptions to Clojure's syntax rules. Each macro is its own exception, and you need to read its documentation to learn how to use it. The important thing to remember is: know whether what you are calling is a function or a macro, and if it's a macro, stop and go read its docs.
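and is one such macro: it stops evaluating its arguments as soon as one of them is logically false. The sketch below shows that the division by zero is never evaluated, along with one way to check whether something is a macro: a macro's var carries :macro true in its metadata. The #' syntax refers to the var itself and is covered in a later chapter; at the REPL, the doc function also tells you when something is a macro.

```clojure
;; and is a macro: once it sees false, it stops, so (/ 1 0) is never
;; evaluated and no divide-by-zero error is thrown.
(and false (/ 1 0))     ; => false

;; A macro's var has :macro true in its metadata; a function's does not.
(:macro (meta #'and))   ; => true
(:macro (meta #'+))     ; => nil
```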

You might sometimes come across special forms as well.

(special-form arg1 arg2 ... argN)

For now, consider those similar to macros, as the distinction isn't very important. Just like macros, they are exceptions to the basic syntax rules, and you need to go read their documentation.
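if is a good example of a special form. Core Clojure also provides special-symbol?, which tells you whether a symbol names a special form. In the sketch below, the leading ' (quote) passes the symbol itself instead of looking it up; quoting is explained in a later chapter.

```clojure
;; if only evaluates the branch it picks, so (/ 1 0) never runs.
(if false (/ 1 0) :ok)   ; => :ok

;; special-symbol? tells you whether a symbol names a special form.
(special-symbol? 'if)    ; => true
;; when looks similar, but it is a macro, not a special form.
(special-symbol? 'when)  ; => false
```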

The good news is that, in general, Clojure macros and special forms follow one of a few similar patterns. So most of them should become intuitive quickly enough.

Clojure’s syntax

At this point, I’ve taught you the syntax rules, which is like the grammar. Now I’ll cover the syntax itself, which is like the alphabet.

In Clojure, each syntactical construct is known as a reader form, and you can find them all here, but I’ll go over and explain the main ones.

Symbols

Symbols are the main identifiers for variables, functions, namespaces, classes and other such things. They’re just a string of characters without quotes. They support Unicode characters.

Here’s an unqualified symbol:

person

Symbols can be qualified as well, which basically groups them inside a namespace.

com.my-app.entities/person

The namespace qualifier comes before the /.
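Core Clojure can also build and take apart symbols at runtime, which makes the two parts of a qualified symbol easy to see: symbol creates one, namespace returns the qualifier, and name returns the rest.

```clojure
;; Build the qualified symbol com.my-app.entities/person from its two parts.
(def person-sym (symbol "com.my-app.entities" "person"))

(namespace person-sym)        ; => "com.my-app.entities"
(name person-sym)             ; => "person"
;; An unqualified symbol has no namespace.
(namespace (symbol "person")) ; => nil
```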

Comments

You have line comments, which start with one or more ; and run to the end of the line.

; This line is commented out.
;; So is this one.

You don't have multi-line comments, but you do have something called a form comment. It might be a little tricky to understand at first. Basically, a form is any syntactic unit or block, which could span many lines, and you can comment out a form by putting #_ in front of it.

#_person
#_1234
#_"This is
commented out"
#_(+ 2 2)

All of the above are examples of forms which are commented out. They are not line comments, which is why the following returns 6:

(+ 2 #_3 4)

Strings

Strings in Clojure are represented as characters between double quotes. Single quotes are used for something else.

"Hi"

They support the full Unicode range (they are stored as variable-length UTF-16) and multi-line strings.

"Hello Wörld!
I'm also a multi-line string."
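Because Clojure strings are Java strings, count returns the number of UTF-16 code units, which is not always the number of characters you see. A quick sketch, using the Java String method codePointCount through interop to count actual Unicode code points:

```clojure
(count "Hi")      ; => 2
(count "Wörld")   ; => 5
;; This emoji is outside the 0-65535 range, so UTF-16 stores it as two code units.
(def smiley "😀")
(count smiley)    ; => 2
;; Java's String.codePointCount counts Unicode code points instead.
(.codePointCount smiley 0 (count smiley)) ; => 1
```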

Numbers

Most of what you expect is there:

;; Small numbers will be of type Long
1234
;; Big numbers will be of type BigInt
34787823974987498374982748749237498273498273498237498237492387492387492
;; Floating points will be of type Double
12.99
;; Or BigDecimal if you want arbitrary precision (notice the M suffix)
12.3423534634544234534634645342363462443649886756676645466767687M
;; Ratios can also be represented (this is not dividing 22 by 7, but storing the numerator and denominator in a ratio object)
22/7
;; Different radix (from 2 to 36) can be specified, radix comes first
2r011
;; Octal numbers
010
;; Hexadecimal numbers
0xff
;; Scientific notation
1.15e-4
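You can check what each literal reads as with class, which returns the Java class of a value. The radix, octal, and hexadecimal literals are just other ways of writing ordinary Longs.

```clojure
(class 1234)   ; => java.lang.Long
(class 34787823974987498374982748749237498273498273498237498237492387492387492)
               ; => clojure.lang.BigInt
(class 12.99)  ; => java.lang.Double
(class 12.5M)  ; => java.math.BigDecimal
(class 22/7)   ; => clojure.lang.Ratio
;; Dividing two integers that don't divide evenly also gives a Ratio.
(= 22/7 (/ 22 7)) ; => true
2r011          ; => 3
010            ; => 8
0xff           ; => 255
```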

Characters

Clojure has the concept of characters, and they are not just single-character strings. They are written by prefixing the character with a \, or with \u for a Unicode code point in hexadecimal and \o for one in octal. They don't support the full Unicode range, as they are UCS-2 encoded, that is, they are a fixed length of 2 bytes, so they only support code points between 0 and 65535.

;; The char a
\a
;; A tab
\tab
;; A newline
\newline
;; Unicode white smiling face
\u263a
;; Unicode copyright sign as octal (maxes out at 0377)
\o251
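A few quick checks: int gives a character's code point, char goes the other way, and str turns a character into a one-character string.

```clojure
(class \a)                ; => java.lang.Character
(int \a)                  ; => 97
;; A character is not a one-character string...
(= \a "a")                ; => false
;; ...but str converts one into the other.
(= "a" (str \a))          ; => true
;; \o251 is code point 169, the copyright sign.
(int \o251)               ; => 169
;; \u263a and (char 0x263a) are the same character.
(= \u263a (char 0x263a))  ; => true
```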

Nil

Clojure has the concept of null, but it’s called nil instead. In Clojure, nil is logically false as well. It is represented as a reserved symbol.

nil

Booleans

Clojure has proper boolean true and false representation. They’re represented as reserved symbols.

true
false
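Since only nil and false are logically false in Clojure, everything else counts as true, including values such as 0 and the empty string, which Python and JavaScript treat as false. The sketch below uses if, which evaluates its second argument when the first is logically true, and its third otherwise:

```clojure
(if nil   "truthy" "falsey") ; => "falsey"
(if false "truthy" "falsey") ; => "falsey"
(if true  "truthy" "falsey") ; => "truthy"
;; Zero, the empty string, and empty collections are all truthy in Clojure.
(if 0     "truthy" "falsey") ; => "truthy"
(if ""    "truthy" "falsey") ; => "truthy"
(if []    "truthy" "falsey") ; => "truthy"
```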

Keywords

Clojure has the concept of keywords, which are kind of like an alternative to Strings, but specifically designed to be used as keys, such as for indexing. Keywords are interned: two identical keywords actually point to the same object in memory, and therefore they can be compared for equality very efficiently. They are represented like symbols, but with a : prefix.

:person
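identical? checks whether two values are the very same object in memory. Here is a quick sketch of keyword interning, contrasted with a string built at runtime (str builds a new string each time it is called):

```clojure
;; Two :person keywords are the same object, however you obtain them.
(identical? :person :person)            ; => true
(identical? :person (keyword "person")) ; => true

;; A string built at runtime is equal to the literal, but a separate object.
(def built (str "per" "son"))
(= built "person")                      ; => true
(identical? built "person")             ; => false
```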

As I said, a common use case is to use them for keys, such as in a map:

;; The hash-map function takes as arguments alternating key/value pairs.
(hash-map :person1 "John Doe"
          :person2 "Mike Michel")
;; In other languages, Strings would have been used instead, but this is not idiomatic in Clojure
(hash-map "person1" "John Doe"
          "person2" "Judy Michel")
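Looking values up is just as short. get works on any map and, as a convenience, a keyword can also be called like a function on a map to look itself up:

```clojure
(def people (hash-map :person1 "John Doe"
                      :person2 "Mike Michel"))

(get people :person1)  ; => "John Doe"
;; A keyword called as a function looks itself up in the map.
(:person2 people)      ; => "Mike Michel"
;; Missing keys return nil.
(get people :person3)  ; => nil
```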

Other uses include runtime identifiers (where symbols are normally used as static identifiers), enum elements, or options. You can use them for anything though; they're just a kind of value.

Just like symbols, they can be namespace qualified:

:animal/dog

Vectors

A Vector is an ordered collection of elements indexed by position (zero-based). Think of it as a Java ArrayList or a Python tuple. Vectors are represented by a pair of enclosing square brackets, with each element separated by whitespace or commas.

;; No commas needed, just whitespace between elements suffice
[1 :a "Hello"]
;; I said whitespace, and not space, this means newlines and multiple spaces all work as well
[1
 :a      "Hello"]
;; But if you prefer, commas can be used as well
[1,:a,"Hello"]
;; Or a mix of commas and whitespace
[1, :a, "Hello"]
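Elements are looked up by their zero-based index with the core functions get or nth. The difference is that get returns nil for an index that's out of range, while nth throws an exception.

```clojure
(def v [1 :a "Hello"])

(get v 1)    ; => :a
(nth v 2)    ; => "Hello"
(count v)    ; => 3
;; Indexes start at 0, so 3 is past the end and get returns nil.
(get v 3)    ; => nil
;; The comma-separated form is the same vector.
(= v [1,:a,"Hello"]) ; => true
```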

Whitespace

Clojure's whitespace is peculiar in that commas are considered whitespace as well. That's why Vector elements can also be comma separated: to Clojure, that's just whitespace anyway.

;; Just whitespace
,
;; Just more whitespace
,,,
;; This is a Vector of two elements separated by a bunch of whitespace
[1,,,,,,,,,,,,,,,,2]

Keep that in mind, as I’ll stop pointing out commas specifically like I did in Vectors, since anywhere I say whitespace in the context of Clojure’s syntax it includes commas.

Maps

A map is an associative collection of key/value pairs. Maps are like Java HashMaps or Python dictionaries. They're represented by a pair of enclosing curly brackets, containing an even number of elements as alternating keys and values, separated by whitespace.

{:key "value"
 :another-key "another value"}
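get looks up a key in a map; an optional third argument is returned when the key is missing.

```clojure
(def m {:key "value"
        :another-key "another value"})

(get m :key)                ; => "value"
;; A missing key returns nil, or the default you pass as a third argument.
(get m :missing)            ; => nil
(get m :missing "default")  ; => "default"
(count m)                   ; => 2
```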

There's a special syntax for namespaced maps. These are maps whose keys are qualified keywords from the same namespace. In such cases, instead of repeating the namespace qualifier for every key, you can specify a default one for the whole map.

#:animal{:dog "woof"
         :cat "meow"
         :sheep "beh"}
;; It acts as a default, which is why this works:
#:animal{:dog "woof"
         :vehicle/car "vroom"}
;; And applies only to keywords, which is why:
#:animal{"dog" "woof"} ; returns {"dog" "woof"}
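The reader expands this syntax into an ordinary map, which you can confirm by comparing it with the fully written-out version:

```clojure
;; The namespaced map syntax is only a shorthand: it reads as a regular map.
(= #:animal{:dog "woof" :cat "meow"}
   {:animal/dog "woof" :animal/cat "meow"}) ; => true

(get #:animal{:dog "woof"} :animal/dog) ; => "woof"
;; The unqualified key was never added.
(get #:animal{:dog "woof"} :dog)        ; => nil
```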

Sets

Sets are unordered collections of distinct elements. They're like Java HashSets or Python sets. They're represented by enclosing curly brackets prefixed with #, with whitespace-separated elements.

#{1 "hey" :foo}
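contains? checks whether a value is in a set, and the set function builds a set from any collection:

```clojure
(def s #{1 "hey" :foo})

(contains? s :foo)  ; => true
(contains? s 2)     ; => false
;; set builds a set from a collection, dropping duplicates.
(count (set [1 1 2 2 3]))  ; => 3
```

Note that a set literal containing duplicates, such as #{1 1}, is rejected with an error when the code is read, whereas set quietly drops them.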

Regexes

Clojure has first-class support for regexes, which are written as a regex string between double quotes, prefixed with #.

#"^hello.[world]+$"
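These are Java regexes (java.util.regex.Pattern), so they follow Java's regex syntax. re-matches returns the string if the whole string matches, or nil otherwise; re-find returns the first match found inside a string. Note that [world] in the pattern above is a character class matching any one of w, o, r, l, or d, not the word world:

```clojure
(def pattern #"^hello.[world]+$")

(class pattern)                      ; => java.util.regex.Pattern
(re-matches pattern "hello world")   ; => "hello world"
;; Any run of the letters w, o, r, l, d matches [world]+.
(re-matches pattern "hello_lord")    ; => "hello_lord"
;; t and e are not in the character class, so this doesn't match.
(re-matches pattern "hello there")   ; => nil
;; re-find finds the first match anywhere in the string.
(re-find #"\d+" "order 66 executed") ; => "66"
```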

Symbolic values

Some values are symbolic: they don't have a concrete value, but stand for the idea of one. Some of those exist in Clojure as well, such as:

;; Positive infinity
##Inf
;; Negative infinity
##-Inf
;; Not a number (for example, the square root of a negative number)
##NaN
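These are the special floating-point values of Java's double type, and you can produce them with ordinary math. The sketch below calls the Java static methods Math/log and Math/sqrt through interop:

```clojure
;; The logarithm of 0 is negative infinity.
(Math/log 0.0)                     ; => ##-Inf
;; The square root of a negative number is not a number.
(Math/sqrt -1.0)                   ; => ##NaN
;; ##Inf is the same value as Java's Double/POSITIVE_INFINITY.
(= ##Inf Double/POSITIVE_INFINITY) ; => true
(Double/isNaN ##NaN)               ; => true
;; Infinity is larger than any finite number.
(> ##Inf Long/MAX_VALUE)           ; => true
```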

Postface

With the syntax and syntax rules I just covered, you should be in a pretty good place to start making sense of Clojure code. That said, there’s still some more advanced syntax I haven’t covered. The reader page from the official Clojure reference goes over it all. I’ve chosen to omit the more advanced parts, because they require understanding more advanced concepts which I’ll cover in future chapters, at which point, I’ll present the relevant syntax.

Feel free to ask questions in the comments, and be patient, I’m working on the next chapters!