Friday, May 24, 2019

Clojure for non Clojure programmers - Chapter 1, Syntax

In this series, I try and teach how to program in Clojure. My target audience are people who already know how to program in more traditional imperative, procedural, OOP, and garbage-collected languages such as Python, Java, C#, etc. I assume the reader is proficient in at least one of these. I try to be succinct, to the point, and use more familiar terms and comparisons.

Table of Content

Syntax

Preface

Clojure takes its syntax from Lisp, but enhances it with a few additions. It should take you less than 30 minutes to learn the entire syntax and its rules, but a lot longer to understand code written in it. Think of syntax as the grammar and alphabet for a language. In that regard, Clojure’s syntax rules (think grammar) are very simple, consistent, and minimal. Its syntax (think alphabet) is quite complete, yet familiar and succinct. That said, Clojure has a large set of vocabulary (functions), and, like many mature spoken languages, it has a lot of exceptions to its rules (macros). A big part of learning Clojure will be learning its vocabulary (functions) and the exceptions to its rules (macros). This process is very similar to learning a spoken language, and quite different from how you might have learned other programming languages, which tend to lack a lot of the denser vocabulary you’ll find in Clojure. In addition to that, Clojure is read inside out, and uses a lot of bracketed symbols like parenthesis, square brackets, etc. Your eyes will need to adjust to these, the same way it would if you were learning to read a new alphabet. Just keep at it, and it’ll all become second-nature soon enough. I’d say it takes a good 80 to 120 hours for your eyes to adjust to Clojure’s syntax.

Clojure’s basic syntax rules

Clojure is an expression only language, which means it has no declarations, everything is a function call which runs some procedure and returns a value. Thus, the fundamental syntax block is a function call.

To run a function, you use parenthesis. The first form after the open parenthesis is the function you want to run, followed by one or more arguments to the function until the closing parenthesis.

The arguments can themselves be function calls, in which case, you do the same thing, you wrap them in parenthesis, with the function to call as the first form followed by the arguments.

In this way, you can keep on nesting functions in functions in functions.

When a function is called, Clojure will first call the arguments to it in order from left to right. If those are function calls themselves, their arguments will also first be called in order from left to right. This is why Clojure must be read left to right and inside and out.

  1. The argument x is evaluated and returns its value.
  2. The argument y is evaluated and returns its value.
  3. The function h is called with the value of y passed to it.
  4. The argument z is evaluated and returns its value.
  5. The function g is called with the value of h and z passed to it.
  6. The return value of g is returned.
  7. The function f is called with the value of x and g passed to it.
  8. The return value of f is returned.

That’s the basic syntax rules of Clojure. You should now be able to follow along most Clojure code. There will be exceptions to this which is what the next section explains.

Clojure’s advanced syntax rules

In Clojure, a pair of parenthesis is known as an s-expression, or sexpr for short. It delimits a block of code. In general, sexprs are evaluated as explained in the Clojure’s basic syntax rules section. That’s true every time the first form is a function, but sometimes, it is a macro instead.

You’ll learn about macros in-depth much later, but you need to know that macros can be called in place of functions, in which case, you have no idea in what order the arguments are evaluated, if at all. Macros are the exceptions to Clojure’s syntax rules. Each macro is its own exception, and you need to go read its documentation to learn how to use it. The important thing to remember is, know if what you are calling is a function or macro, if it’s a macro, stop and go read its doc.

You might sometimes come across special forms as well.

For now, consider those similar to macros, as the distinction isn’t very important, but just like macros, they are exceptions to the basic syntax rules, and you need to go read their documentation.

The good news is that, in general, Clojure macros and special forms follow one of a few similar patterns. So most of them should become intuitive quickly enough.

Clojure’s syntax

At this point, I’ve taught you the syntax rules, which is like the grammar. Now I’ll cover the syntax itself, which is like the alphabet.

In Clojure, each syntactical construct is known as a reader form, and you can find them all here, but I’ll go over and explain the main ones.

Symbols

Symbols are the main identifiers for variables, functions, namespaces, classes and other such things. They’re just a string of characters without quotes. They support Unicode characters.

Here’s an unqualified symbol:

Symbols can be qualified as well, which basically groups them inside a namespace.

The namespace qualifier comes before the /.

Comments

You have line comments, which are lines beginning with one or more ;

You don’t have multi-line comments, but you do have something called a form comment. It might be a little tricky to understand at first. Basically, a form is any syntactical unit or block, which in theory could span many lines. And you can comment out a form by putting #_ in front of it.

All the above are examples of forms which are commented out. They are not line comments, because the following returns 6:

Strings

Strings in Clojure are represented as characters between double quotes. Single quotes are used for something else.

They support full Unicode characters (variable length UTF 16) and multi-line strings.

Numbers

Most of what you expect is there:

Characters

Clojure has the concept of characters, and they are not just single character strings. They are written by appending a \ before the character or \u for Unicode and \o for octal. They don’t support the full Unicode range, as they are UCS-2 encoded, that is, they are fixed length 2 bytes, so only support code points between 0 and 65535.

Nil

Clojure has the concept of null, but it’s called nil instead. In Clojure, nil is logically false as well. It is represented as a reserved symbol.

Booleans

Clojure has proper boolean true and false representation. They’re represented as reserved symbols.

Keywords

Clojure has the concept of keywords, which are kind of like an alternative to Strings, but specifically designed to be used for keys, such as for indexing. Just like Strings, they are unique across the program, in that two identical keywords are actually pointing to the same memory address, and therefore they can be compared for equality very efficiently. They are represented like symbols, but with a : prefix.

As I said, a common use case is to use them for keys, such as in a map:

Other use of them are sometimes as a runtime identifier (where symbols are normally used as static identifiers), as enum elements, or as options. You can use them for anything though, they’re just a kind of value.

Just like symbols, they can be namespace qualified:

Vectors

A Vector is an ordered collection of elements indexed by position (zero based). Think of it as a Java ArrayList or a Python Tuple. They are represented by a pair of enclosing square brackets with each elements separated by whitespace or commas.

Whitespace

Clojure is peculiar in its whitespace in that commas are considered whitespace as well. That’s why Vector elements could also be comma separated, because to Clojure, that’s just whitespace anyways.

Keep that in mind, as I’ll stop pointing out commas specifically like I did in Vectors, since anywhere I say whitespace in the context of Clojure’s syntax it includes commas.

Maps

A map is an associative collection of key/values. They’re like Java HashMaps, or Python Dictionaries. They’re represented by a pair of enclosing curly brackets, with an even number of elements as alternating key value pairs separated by whitespace.

There’s a special syntax for namespaced maps. These are maps whose keys are qualified keywords from the same namespace. In such case, instead of repeating the namespace qualifier for every key, you can specify a default one for the whole map.

Sets

Sets are unordered collections of distinct elements. They’re like Java HashSets or Python’s Sets. They’re represented by enclosed curly brackets prefixed with # and with whitespace separated elements.

Regexes

Clojure has first class support for regexes, and they are represented as a regex string between double quotes and prefixed with #.

Symbolic values

Some values are symbolic, and don’t really have a value, but the idea of one. Some of those exists in Clojure as well, such as:

Postface

With the syntax and syntax rules I just covered, you should be in a pretty good place to start making sense of Clojure code. That said, there’s still some more advanced syntax I haven’t covered. The reader page from the official Clojure reference goes over it all. I’ve chosen to omit the more advanced parts, because they require understanding more advanced concepts which I’ll cover in future chapters, at which point, I’ll present the relevant syntax.

Feel free to ask questions in the comments, and be patient, I’m working on the next chapters!

1 comment:

  1. The Java VM uses UTF-16 encoding for strings in memory. That uses 2 consecutive 16-bit numbers to represent characters outside of the BMP.

    I do not recall in which Java VM versions this started, but there is a memory saving optimization for the common case of ASCII strings where the Java VM will only use 8 bits per character instead of 16, if the entire string contains only ASCII characters.

    ReplyDelete