Sunday, June 2, 2019

Clojure for non Clojure programmers - Chapter 2, Our first program

In this series, I try to teach how to program in Clojure. My target audience is people who already know how to program in more traditional imperative, procedural, OOP, and garbage-collected languages such as Python, Java, C#, etc. I assume the reader is proficient in at least one of these. I try to be succinct, to the point, and use more familiar terms and comparisons.

Table of Contents

Our first program

Preface

Clojure programs are actually Java programs. That’s something which will be a bit strange if you are coming from a language like Python, Ruby, C#, …, which have their own interpreters/compilers and VMs, and where the whole ecosystem around them is designed specifically for them. What that means is that Clojure is a hosted language (some would say parasitic): it needs a host to thrive. I hope you’ll come to realize that this is actually a great strength of Clojure, but I’m sure that at first it will be a pain point for most of you, which I will try to minimize. Just keep in mind that you want to embrace the host as well as Clojure, and that learning Clojure implies learning to leverage the host just as much.

A Timer application

Our first Clojure program will be a very simple Timer application. The timer starts at a certain time, in seconds, and counts down to 0.

(def start-time
  "Starting time in seconds, for our timer."
  10)

(defn start-timer
  "Starting from start-time-seconds, loop until the countdown reaches 0,
   printing the count at every second."
  [start-time-seconds]
  (loop [countdown start-time-seconds]
    (println countdown) ; Print the current countdown on its own line.
    (Thread/sleep 1000) ; Sleep for 1000 milliseconds.
    (if (> countdown 1) ; If we're not yet at the last second.
      ;; Decrement the countdown and keep looping.
      (recur (dec countdown))
      ;; Otherwise, print that we're done on its own line.
      (println "Done!"))))

;; Call start-timer with our start-time to start our timer.
(start-timer start-time)

There we have it, our first program. A full implementation of a timer application which counts down from 10 seconds and notifies us when the count reaches 0.

We did quite a few things here:

  1. We wrote a fully runnable Clojure script (not to be confused with ClojureScript)
  2. We defined a global variable: start-time
  3. We defined a custom function: start-timer
  4. We ran a recursive loop using: loop and recur
  5. We used Java interop to make the JVM’s main thread sleep
  6. We made a top-level call to our start-timer function

Now let’s learn how to run our program, so we can see it in action, and so you can start messing with it!

Running our program

In order for you to run our Timer application, you will need to:

  1. Get a working JVM/JDK installed on your box.
  2. Install the Clojure CLI.
  3. Copy our code into a source file.
  4. Run a command using the Clojure CLI that will launch our app.

Getting a working JVM/JDK installed on your box

As I mentioned in the preface, Clojure is a hosted language, and Clojure programs are actually Java programs. It stands to reason, then, that you need to have Java installed to use Clojure. If you’re unfamiliar with the JVM and the JDK, I recommend reading The Definitive Guide to Clojure on the JVM, though I’d suggest you read it after you are done with this chapter.

First, it is possible you already have a Java JDK, so run the following command in your terminal shell to check if you do:

java -version

What you want to see is that the command exists, and that the word JDK appears somewhere in the printed result, with a version of 1.8 or above, such as:

$ java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (IcedTea 3.9.0) (build 1.8.0_181-b13 suse-1.1-x86_64)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

Finally, you need to make sure that the JAVA_HOME environment variable is set, by running:

$ echo $JAVA_HOME
/usr/lib64/jvm/java

Or if you are on Windows with PowerShell: echo $Env:JAVA_HOME or with the command prompt: echo %JAVA_HOME%.

If you have all that, move to Install the Clojure CLI, and skip what follows.

A small note about the JDK and Oracle…

The JDK is a piece of software which contains many applications. The main one is the Java Virtual Machine (JVM), which is used to run Java, and thus Clojure, applications. It also contains things like the Java compiler, profiling tools, debugging tools, documentation generating tools, etc.

It used to be that some of those tools, and certain features of the JVM, were not open source, but in 2018, with JDK 11, Oracle made it all Open Source under GPL with classpath exception (code you link to it doesn’t need to be GPL).

Thus, the source code for the JVM and all JDK tools is fully Open Source and free to use under GPL, and maintained at OpenJDK.

That said, as the code is mostly in C and C++, and building it yourself is a huge pain, you’re going to want pre-built binaries. This is where multiple vendors come into the picture, offering pre-built binaries of the OpenJDK source code for various platforms. Not all vendors offer the binaries for free. This is often a source of confusion to people new to the Java ecosystem.

Specifically, Oracle offers binaries known as the OracleJDK, and as of 2019, they are no longer free to use for commercial purposes, but are still free for personal or development use. Some of the guides I link below, to help you install and set up a JDK on your machine, point you to the OracleJDK binaries. As they are not free for commercial purposes, it’s important for me to warn you about it. If your goal is just to learn for now, go ahead. That said, there are free alternatives, which I list below, so as you follow the guides for installing and setting up a JDK, I recommend you swap out the download of the OracleJDK for one of the free (for all usage, even commercial) ones instead:

  1. Oracle’s OpenJDK builds

Oracle also offers free (even for commercial use) binaries of OpenJDK, known as Oracle’s OpenJDK. The caveat with these is that older versions don’t get back-ported security fixes or critical bug fixes. Thus to stay up to date with security and bug fixes, you have to upgrade to the latest version. For example, as soon as JDK 12 releases, after JDK 11, Oracle will stop releasing newer updated binary builds of JDK 11. So if you want new fixes, you have to upgrade to JDK 12. A new version is released every 6 months, thus to be up to date on all fixes, you have to upgrade the JDK version every 6 months. In a commercial setting, this might be annoying, and a little too fast. Which is why in general, I’d recommend you first try out one of the alternatives such as #2, #3 or #4.

  2. AdoptOpenJDK

The Java community also offers free (for any use) binary builds of OpenJDK, known as AdoptOpenJDK. If you’re used to CPython, Ruby MRI, Haskell GHC, NodeJS, and other mostly community maintained language runtimes, you will feel right at home here, since it’s basically the same idea. The community volunteers do the work of packaging pre-built binaries and offer them for free. Unless #4 is available to you, or you know better, I’d go for this one.

Also, I’d suggest getting an LTS release, with the HotSpot JVM, unless you know better.

  3. Amazon Corretto

Amazon also offers free (for any use including commercial) binary builds of OpenJDK, known as Amazon Corretto. They only offer LTS releases, and have good documentation. The binaries are pretty high quality, as they are certified Java SE compatible, and are used by Amazon in production. This is a solid alternative to #2, especially for production deployments.

  4. OpenJDK from your Linux distro

Many Linux distros offer their own binary builds as well. Search your package manager for openjdk and if you find a version of 8 or above, installing that might be the easiest. To get started, if your distro offers this, it’s the way to go, quick and painless, but might not be updated as regularly depending on your distro.

Whichever you choose, remember that they are all built from the same OpenJDK source code. These are not alternative implementations. The difference is which commits they are built from, whether they cherry-picked some extra security or bug fixes from later releases, how much testing was done on them afterwards, what platforms they are built for, etc. Oftentimes, the biggest difference is how long the vendor keeps offering patched builds of older versions. Otherwise, they are all going to give you the same set of features. Your Clojure code will be identical and will run on all of them.

Installing and setting up Java and the JDK

With that out of the way, here are the guides to help you install a JDK and/or set JAVA_HOME. I recommend you go for either JDK version 8 or 11, as they both offer long term support (or any newer LTS release if you are reading this in the far future). Note that the current Clojure version (1.10 as of writing) requires version 8 minimum for the JVM. I will be using version 8 personally as I work through the series.

Installing a JDK
Setting up JAVA_HOME

Install the Clojure CLI

One thing that might be surprising is that Clojure is actually just a Java library. For those of you that know Java, that means Clojure is just a jar. It is published to Maven Central like any other Java library.

This is a bit inconvenient, because it means you always have to launch java with Clojure as a dependency to do anything with Clojure. To address that problem, there is an official Clojure CLI which wraps Java and Clojure together and makes it way easier to interact with Clojure without needing to worry too much about the details of Java. That’s what we’ll be using throughout this series. It is known as the Clojure CLI or tools.deps.

To install the Clojure CLI, follow the instructions in the Clojure installer and CLI tools sections of the official getting started guide.

Create a source file and run our program

Clojure code is contained within files that have the .clj or .cljc extension. For more complex programs, the source files have to be organized in a particular folder structure and with a certain naming convention very similar to that of Java’s. That said, for our first program, we only wrote a script. A script is a Clojure program which is fully contained within a single file. So for this chapter, we will focus on Clojure scripts only.

  1. Create a file anywhere you want and call it timer.clj.
  2. Copy/paste or hand type into the file our program code.
  3. From your terminal shell, within the same directory as where your source file is, run clj timer.clj.

clj is the Clojure CLI. If you give it the path, relative or absolute, to a Clojure script file, it will run it for you.

Now admire:

$ clj timer.clj 
10
9
8
7
6
5
4
3
2
1
Done!

Understanding the execution model

Clojure is a compiled language. This is unlike most other dynamic languages, such as Python, JavaScript, Ruby, etc., which are all interpreted languages, not compiled. As such, Clojure has a compiler which takes Clojure source code and compiles it to Java byte code.

For those who don’t know Java, it runs on a virtual machine called the JVM, and that machine takes its instructions from an assembly language which is known as the Java byte code. When it runs, the Java virtual machine (JVM) will take the Java byte code and compile it to compatible machine code for the currently running hardware (this is known as just in time compilation - JIT). Java source code is thus compiled to Java byte code using the Java compiler. The resulting Java byte code is then shipped to end users along with a JVM for their platform. Using the JVM, they can then run the compiled Java byte code on their respective machine.

It’s similar for Clojure, except it takes the place of Java source code. You write Clojure source code, and use the Clojure compiler to compile it to Java byte code. You can then ship the compiled byte code along with a JVM to an end user, and they can run your compiled Clojure program. This is effectively what you’ve done in this chapter. You installed a JVM, which is bundled as part of the JDK you installed. After which, you’ve used the Clojure CLI to launch a JVM process, with Clojure loaded as a dependency, and had it execute your Clojure script. So why didn’t you have to first compile your script you ask?

You see, another unfamiliar thing about Clojure is that the compiler is a part of the Clojure runtime. What this means is that you don’t have to explicitly compile your source code; Clojure will perform the compilation itself as your program launches. When your script is run, Clojure first compiles all of the Clojure source code into Java byte code, and then loads the byte code into its own process, which is already running inside a JVM instance. This is how, even though it is a compiled language, it maintains the feel of an interpreted scripting language.
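To make the idea of a compiler living inside the runtime a bit more concrete, here is a minimal sketch you could paste into a script. It relies on eval, a standard Clojure function; the names source-form and double-it are made up for this illustration.

```clojure
;; A quoted form is plain data until the runtime compiles it.
(def source-form '(fn [x] (* x 2)))

;; eval hands the form to the compiler that ships inside the Clojure
;; runtime; the result is a regular function backed by JVM byte code.
(def double-it (eval source-form))

(println (double-it 21))    ; prints 42
(println (class double-it)) ; prints the name of a generated JVM class
```

No separate compile step was involved: the function was compiled and loaded while the program was already running.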

Let me walk through this once more. The Clojure CLI first starts a JVM process using the java command. It sets it up so that the Clojure runtime (which is just a Java library bundled as a .jar file) is on the classpath (Java parlance to mean it is a loaded dependency). It also sets it up so that clojure.main is run on load. That’s the main Clojure runtime entry point. Once the JVM process is loaded, it thus calls the Clojure runtime’s main method. This receives your script as an argument, at which point, it will compile it into Java byte code, load the byte code into itself, and execute your program.

It sounds more complicated than it is. Bottom line, if you use the Clojure CLI, Clojure feels like an interpreted language, yet it gives you the performance of a JIT compiled one. This is a great strength of Clojure: it is one of the most performant dynamic programming languages around today, if not the most. It is definitely faster than Python, Ruby, or JavaScript. In fact, most of the time, Clojure is within 10% of Java’s performance, and with some more advanced knowledge, it can be made to match it. Now, this comes at the expense of start time, since Clojure has to compile the code and dynamically load all of the byte code when the program first starts. Don’t despair though, there are solutions to that as well, such as relying on one of its dialects (ClojureScript, Joker) or doing native image compilation using GraalVM.

Thus, when people refer to Clojure compilation, or Clojure being a compiled language, it doesn’t mean compiled to native machine code, but to JVM machine code, aka, Java byte code, aka, JVM byte code, or just byte code for short.

Postface

Awesome work! You now have Clojure installed on your machine and ready to be used. You’ve already run your first Clojure program, a timer application. You have a better understanding of the JVM, JDK and Java byte code. You have a rough idea of how Clojure source code goes from code to compiled byte code to machine code to running program. Most importantly, you can start messing with Clojure on your own, by writing Clojure scripts, and running them with the Clojure CLI (clj command) to try them out.

In the next chapter, we’re going to revisit our program, because now that we know how to run it, it is time to understand how it works.

Feel free to ask questions in the comments, and be patient, I’m working on the next chapters!

Friday, May 24, 2019

Clojure for non Clojure programmers - Chapter 1, Syntax

In this series, I try to teach how to program in Clojure. My target audience is people who already know how to program in more traditional imperative, procedural, OOP, and garbage-collected languages such as Python, Java, C#, etc. I assume the reader is proficient in at least one of these. I try to be succinct, to the point, and use more familiar terms and comparisons.

Table of Contents

Syntax

Preface

Clojure takes its syntax from Lisp, but enhances it with a few additions. It should take you less than 30 minutes to learn the entire syntax and its rules, but a lot longer to understand code written in it. Think of syntax as the grammar and alphabet for a language. In that regard, Clojure’s syntax rules (think grammar) are very simple, consistent, and minimal. Its syntax (think alphabet) is quite complete, yet familiar and succinct. That said, Clojure has a large vocabulary (functions), and, like many mature spoken languages, it has a lot of exceptions to its rules (macros). A big part of learning Clojure will be learning its vocabulary (functions) and the exceptions to its rules (macros). This process is very similar to learning a spoken language, and quite different from how you might have learned other programming languages, which tend to lack a lot of the denser vocabulary you’ll find in Clojure. In addition to that, Clojure is read inside out, and uses a lot of bracketed symbols like parentheses, square brackets, etc. Your eyes will need to adjust to these, the same way they would if you were learning to read a new alphabet. Just keep at it, and it’ll all become second nature soon enough. I’d say it takes a good 80 to 120 hours for your eyes to adjust to Clojure’s syntax.

Clojure’s basic syntax rules

Clojure is an expression-only language, which means it has no statements: everything is a function call which runs some procedure and returns a value. Thus, the fundamental syntax block is a function call.

To run a function, you use parentheses. The first form after the open parenthesis is the function you want to run, followed by zero or more arguments to the function until the closing parenthesis.

The arguments can themselves be function calls, in which case, you do the same thing: you wrap them in parentheses, with the function to call as the first form followed by the arguments.

In this way, you can keep on nesting functions in functions in functions.

When a function is called, Clojure will first evaluate its arguments in order from left to right. If those are function calls themselves, their arguments will also first be evaluated in order from left to right. This is why Clojure must be read left to right and inside out. For example, the expression (f x (g (h y) z)) is evaluated as follows:

  1. The argument x is evaluated and returns its value.
  2. The argument y is evaluated and returns its value.
  3. The function h is called with the value of y passed to it.
  4. The argument z is evaluated and returns its value.
  5. The function g is called with the return value of h and the value of z passed to it.
  6. The return value of g is returned.
  7. The function f is called with the value of x and the return value of g passed to it.
  8. The return value of f is returned.
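If you want to see this order for yourself, here is a small sketch. The functions f, g and h, the helper traced, and the call-log atom are hypothetical stand-ins I define only to record when each evaluation happens; they are not part of Clojure.

```clojure
(def call-log (atom []))

(defn traced
  "Records label in call-log, then returns value unchanged."
  [label value]
  (swap! call-log conj label)
  value)

(defn h [a]   (traced :h (inc a)))
(defn g [a b] (traced :g (+ a b)))
(defn f [a b] (traced :f (* a b)))

;; The arguments are wrapped in traced so we also see when they are evaluated.
(def result
  (f (traced :x 2)
     (g (h (traced :y 3))
        (traced :z 4))))

(println @call-log) ; prints [:x :y :h :z :g :f]
(println result)    ; prints 16
```

The log follows the same order as the steps listed above: arguments first, left to right, from the innermost call outwards.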

Those are the basic syntax rules of Clojure. You should now be able to follow along with most Clojure code. There are exceptions to these rules, which is what the next section explains.

Clojure’s advanced syntax rules

In Clojure, a pair of parentheses is known as an s-expression, or sexpr for short. It delimits a block of code. In general, sexprs are evaluated as explained in the Clojure’s basic syntax rules section. That’s true every time the first form is a function, but sometimes, it is a macro instead.

You’ll learn about macros in-depth much later, but you need to know that macros can be called in place of functions, in which case you cannot assume in what order the arguments are evaluated, if at all. Macros are the exceptions to Clojure’s syntax rules. Each macro is its own exception, and you need to go read its documentation to learn how to use it. The important thing to remember is to know whether what you are calling is a function or a macro; if it’s a macro, stop and go read its doc.

You might sometimes come across special forms as well.

For now, consider those similar to macros, as the distinction isn’t very important, but just like macros, they are exceptions to the basic syntax rules, and you need to go read their documentation.

The good news is that, in general, Clojure macros and special forms follow one of a few similar patterns. So most of them should become intuitive quickly enough.

Clojure’s syntax

At this point, I’ve taught you the syntax rules, which is like the grammar. Now I’ll cover the syntax itself, which is like the alphabet.

In Clojure, each syntactical construct is known as a reader form, and you can find them all here, but I’ll go over and explain the main ones.

Symbols

Symbols are the main identifiers for variables, functions, namespaces, classes and other such things. They’re just a string of characters without quotes. They support Unicode characters.

Here’s an unqualified symbol:

Symbols can be qualified as well, which basically groups them inside a namespace.

The namespace qualifier comes before the /.
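As a minimal sketch of both kinds of symbols (the names plain and qualified are just illustrative), quoting a symbol with ' lets us look at the symbol itself instead of whatever it refers to:

```clojure
;; Quoting with ' keeps the symbol from being looked up, so we can inspect it.
(def plain 'start-timer)             ; an unqualified symbol
(def qualified 'clojure.string/join) ; the namespace comes before the /

(println (name qualified))      ; prints join
(println (namespace qualified)) ; prints clojure.string
(println (namespace plain))     ; prints nil
```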

Comments

You have line comments, which are lines beginning with one or more ;

You don’t have multi-line comments, but you do have something called a form comment. It might be a little tricky to understand at first. Basically, a form is any syntactical unit or block, which in theory could span many lines. And you can comment out a form by putting #_ in front of it.

All the above are examples of forms which are commented out. They are not line comments, because the following returns 6:
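As an illustrative sketch (the forms are my own choice), here are line comments and commented-out forms, followed by an addition that returns 6 because #_ only removes the single form right after it, not the rest of the line:

```clojure
; A line comment: everything up to the end of the line is ignored.
;; Two semicolons work just as well.

#_(println "this whole form is skipped")
#_ [1 2
    3 4] ; a form comment can span multiple lines

;; Only the form 100 is discarded; 3 is still read as an argument.
(def total (+ 1 2 #_ 100 3))

(println total) ; prints 6
```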

Strings

Strings in Clojure are represented as characters between double quotes. Single quotes are used for something else.

They support full Unicode characters (variable length UTF 16) and multi-line strings.
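Here is a small sketch of string literals; the names greeting and two-lines are only for illustration.

```clojure
(require '[clojure.string :as str])

(def greeting "Hello, 世界")

;; A string literal may contain raw line breaks.
(def two-lines "first line
second line")

(println (count greeting))                   ; prints 9
(println (count (str/split-lines two-lines))) ; prints 2
```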

Numbers

Most of what you expect is there:
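The sketch below collects the number literals I would expect you to run into most often. The map and its keys are just my own labels for the example.

```clojure
(def examples
  {:integer     42      ; a 64-bit integer (long)
   :negative    -7
   :double      3.14    ; a 64-bit floating point number
   :scientific  1.5e3   ; the same as 1500.0
   :ratio       22/7    ; an exact fraction
   :hex         0xFF    ; 255
   :octal       017     ; 15
   :binary      2r1010  ; radix notation, here base 2, so 10
   :big-integer 42N     ; arbitrary precision integer
   :big-decimal 3.14M}) ; arbitrary precision decimal

;; Print each literal's value along with the JVM class that represents it.
(doseq [[label n] examples]
  (println label n (class n)))
```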

Characters

Clojure has the concept of characters, and they are not just single character strings. They are written by putting a \ before the character, or \u for Unicode and \o for octal. They don’t support the full Unicode range, as they are UCS-2 encoded, that is, they are fixed length 2 bytes, so they only support code points between 0 and 65535.
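A quick sketch of character literals, with names chosen only for this example:

```clojure
(def letters [\a \A \space \newline])

(def unicode-a \u0041) ; Unicode escape for code point 65
(def octal-a   \o101)  ; octal 101 is also 65

(println (= unicode-a \A)) ; prints true
(println (= octal-a \A))   ; prints true
(println (= \a "a"))       ; prints false, a character is not a string
(println (= (str \a) "a")) ; prints true
```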

Nil

Clojure has the concept of null, but it’s called nil instead. In Clojure, nil is logically false as well. It is represented as a reserved symbol.

Booleans

Clojure has proper boolean true and false representation. They’re represented as reserved symbols.
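To see nil, true and false in action, here is a tiny sketch. The function truthiness is a hypothetical helper I wrote for the example.

```clojure
(defn truthiness
  "Returns a label describing how Clojure treats x in a condition."
  [x]
  (if x "truthy" "falsey"))

(println (truthiness nil))   ; prints falsey
(println (truthiness false)) ; prints falsey
(println (truthiness true))  ; prints truthy
(println (truthiness 0))     ; prints truthy, only nil and false are logically false
```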

Keywords

Clojure has the concept of keywords, which are kind of like an alternative to Strings, but specifically designed to be used for keys, such as for indexing. Unlike Strings in general, keywords are interned: they are unique across the program, in that two identical keywords actually point to the same object in memory, and therefore they can be compared for equality very efficiently. They are represented like symbols, but with a : prefix.

As I said, a common use case is to use them for keys, such as in a map:

Other uses for them are as runtime identifiers (where symbols are normally used as static identifiers), as enum elements, or as options. You can use them for anything though, they’re just a kind of value.

Just like symbols, they can be namespace qualified:
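Here is a short sketch of keywords used as map keys, as an enum-like value, and with a namespace qualifier. The map person and its contents are made up for the example.

```clojure
(def person {:name "Ada" :role :admin}) ; :admin is used like an enum element

(println (get person :name))       ; prints Ada
(println (identical? :name :name)) ; prints true, both are the same object
(println (namespace :user/id))     ; prints user
(println (name :user/id))          ; prints id
```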

Vectors

A Vector is an ordered collection of elements indexed by position (zero based). Think of it as a Java ArrayList or a Python Tuple. Vectors are represented by a pair of enclosing square brackets, with the elements separated by whitespace or commas.
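A minimal sketch, with illustrative names:

```clojure
(def scores [10 20 30])
(def mixed  [1, "two", :three]) ; the commas are optional

(println (get scores 0)) ; prints 10
(println (nth scores 2)) ; prints 30
(println (count mixed))  ; prints 3
```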

Whitespace

Clojure is peculiar in its whitespace in that commas are considered whitespace as well. That’s why Vector elements could also be comma separated, because to Clojure, that’s just whitespace anyways.

Keep that in mind, as I’ll stop pointing out commas specifically like I did in Vectors, since anywhere I say whitespace in the context of Clojure’s syntax it includes commas.

Maps

A map is an associative collection of key/values. They’re like Java HashMaps, or Python Dictionaries. They’re represented by a pair of enclosing curly brackets, with an even number of elements as alternating key value pairs separated by whitespace.

There’s a special syntax for namespaced maps. These are maps whose keys are qualified keywords from the same namespace. In such case, instead of repeating the namespace qualifier for every key, you can specify a default one for the whole map.
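As a sketch of both the plain and the namespaced map syntax (the maps config and person are made up for the example):

```clojure
(def config {:host "localhost" :port 8080})

;; Namespaced map syntax: the prefix applies to every unqualified keyword key.
(def person #:person{:name "Ada" :age 36})

(println (get config :port))    ; prints 8080
(println (:person/name person)) ; prints Ada
(println (= person {:person/name "Ada" :person/age 36})) ; prints true
```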

Sets

Sets are unordered collections of distinct elements. They’re like Java HashSets or Python’s Sets. They’re represented by enclosed curly brackets prefixed with # and with whitespace separated elements.
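A small sketch of sets, with illustrative names:

```clojure
(def colors #{:red :green :blue})

(println (contains? colors :red))    ; prints true
(println (contains? colors :purple)) ; prints false
(println (count (conj colors :red))) ; prints 3, adding an existing element changes nothing
(println (count (set [1 1 2 2 3])))  ; prints 3, duplicates collapse
```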

Regexes

Clojure has first class support for regexes, and they are represented as a regex string between double quotes and prefixed with #.
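Here is a minimal sketch using two standard functions, re-find and re-seq. The name digits is just for the example.

```clojure
(def digits #"\d+") ; the backslash does not need doubling as in a Java string

(println (re-find digits "order 66 shipped")) ; prints 66
(println (re-seq digits "a1 b22 c333"))       ; prints (1 22 333)
(println (class digits))                      ; prints java.util.regex.Pattern
```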

Symbolic values

Some values are symbolic, and don’t really have a value, but the idea of one. Some of those exist in Clojure as well, such as:
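As a sketch, these are the symbolic values I know of in the reader (they need Clojure 1.9 or later); the var names are my own:

```clojure
(def positive-infinity ##Inf)
(def negative-infinity ##-Inf)
(def not-a-number      ##NaN)

(println (> positive-infinity 1e308))  ; prints true
(println (< negative-infinity -1e308)) ; prints true
(println (Double/isNaN not-a-number))  ; prints true
```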

Postface

With the syntax and syntax rules I just covered, you should be in a pretty good place to start making sense of Clojure code. That said, there’s still some more advanced syntax I haven’t covered. The reader page from the official Clojure reference goes over it all. I’ve chosen to omit the more advanced parts, because they require understanding more advanced concepts which I’ll cover in future chapters, at which point, I’ll present the relevant syntax.

Feel free to ask questions in the comments, and be patient, I’m working on the next chapters!

Sunday, May 6, 2018

The Various Kinds of IO - Blocking, Non-blocking, Multiplexed and Async.

I’ve found it particularly hard to demystify the various kinds of IO that are now offered to software programmers. There is a lot of confusion out there about the differences between blocking, non-blocking, multiplexed and async IO. So I thought I’d give a shot at clarifying what each of these kinds of IO entails.

In Hardware

On modern operating systems, IO, for input/output, is a way for data to be exchanged with peripherals. This includes reading or writing to disk, or SSD, sending or receiving data over the network, displaying data to the monitor, receiving keyboard and mouse input, etc.

The ways in which modern operating systems communicate with peripherals depend on the specific type of peripheral and its firmware and hardware capabilities. In general, you can assume that the peripherals are quite advanced, and that they can handle multiple concurrent requests to write or read data. That is, gone are the days of serial communication. In that sense, all communication between the CPU and peripherals is asynchronous at the hardware level.

This asynchronous mechanism is called a hardware interrupt. Think of the simple case, where the CPU would ask the peripheral to read something, and it would then go in an infinite loop, where each time it would check with the peripheral if the data is now available for it to consume, looping until the peripheral gives it the data. This method is known as polling, since the CPU needs to keep checking on the peripheral. On modern hardware, what happens instead is that the CPU asks the peripheral to perform an action, and then forgets about it, continuing to process other CPU instructions. Once the peripheral is done, it will signal the CPU through a circuit interrupt. This happens in hardware, and makes it so the CPU never waits or checks on the peripheral, thus freeing it to perform other work, until the peripheral itself says it’s done.

In Software

So now that we have an understanding of what happens in hardware, we can move on to the software side. This is where the IO is exposed in various kinds, such as blocking, non-blocking, multiplexed and async. Let’s go over them one by one.

Blocking

Remember how a user program runs inside a process, and code is executed within the context of a thread? This is always the case, and thus say you are writing a program that needs to read from a file. With blocking IO, what you do is ask the operating system, from your thread, to put your thread to sleep, and wake it up only once the data from the file is available to be consumed by your thread.

That is, blocking IO is called blocking because the thread which uses it will block and sleep until the IO is done.
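As a minimal sketch of blocking IO in Clojure, the standard slurp function reads a whole file and only returns once it is done. The temporary file and the names tmp-file and content are my own setup for the example.

```clojure
(import '(java.io File))

;; Set up a small temporary file to read back.
(def ^File tmp-file (File/createTempFile "blocking-io" ".txt"))
(spit tmp-file "some file content")

;; slurp does not return until the whole file has been read; the calling
;; thread simply waits (blocks) inside this call.
(def content (slurp tmp-file))

(println content) ; prints some file content
(.delete tmp-file)
```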

Non-blocking

The problem with blocking IO is that your thread is sleeping until the IO is done, and thus, it can’t do anything else while it’s waiting for the IO. Sometimes, there’s nothing else your program could be doing, but if there was, it would be nice to be able to do that work concurrently, as it waits for the IO.

One way to do that is with what is called non-blocking IO. The idea is that when you make the call to read the file, instead of putting your thread to sleep, the OS simply returns to you either the file’s content that it read, or a pending status that tells you the IO is not done. It will not block your thread, but it’s still your job to check back at a later time to see if the IO is done. This means you are free to branch on the pending status, and go perform some other work, and when you need the IO again, you can call read for it once more, at which point if the IO is done, it will return the file’s content, otherwise it will once again return a pending status, and you can again choose to go perform some other work.
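Here is a rough sketch of that pending-or-data pattern using Java NIO from Clojure. I use an in-process pipe rather than a file, because the JVM only exposes non-blocking mode on selectable channels such as pipes and sockets. The function try-read and the :pending marker are names I made up; the NIO classes are real.

```clojure
(import '(java.nio ByteBuffer)
        '(java.nio.channels Pipe Pipe$SourceChannel))

(defn try-read
  "Makes one read attempt. Returns :pending when no data was available,
   otherwise the number of bytes read. Assumes source is in non-blocking mode."
  [^Pipe$SourceChannel source ^ByteBuffer buffer]
  (let [bytes-read (.read source buffer)]
    (if (zero? bytes-read) :pending bytes-read)))

(def ^Pipe pipe (Pipe/open))
(def source (doto (.source pipe) (.configureBlocking false)))
(def buffer (ByteBuffer/allocate 16))

;; Nothing has been written yet, so the call returns immediately.
(def first-attempt (try-read source buffer))
(println first-attempt) ; prints :pending

;; ... this is where the thread would go do some other work ...

(.write (.sink pipe) (ByteBuffer/wrap (.getBytes "hi")))

;; Checking back later: this now reports the bytes that arrived.
(def second-attempt (try-read source buffer))
(println second-attempt)
```

Notice that it is still up to the program to decide when to call try-read again.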

Multiplexed

The problem with non-blocking IO is that it gets strange if the other work that you want to be doing while waiting for the IO is itself more IO.

In a good scenario, you ask the OS to read content from file A, and then you go perform some heavy computation, once you’re done, you check if file A is done reading, if so, you do whatever you needed that file’s content for, otherwise, you go do some more heavy processing, rinse and repeat. But in the bad scenario, you don’t have heavy processing to do, in fact, you want to read file A, and you also want to read file B. So while you wait for file A’s IO, you make a non-blocking call to read file B’s. Now what will you do while waiting for both? There’s nothing for you left to do, so your program has to go into an infinite polling loop, where you keep checking if A is done, and then if B is done, over and over. This will either consume the CPU simply to poll for the status of your non-blocking IO calls, or you’ll have to manually add some arbitrary sleep time, meaning that you’ll probably realize the IO is ready a little after it really was, slowing your program’s throughput.

To avoid this problem, you can use multiplexed IO instead. What this does is that you will once again block on the IO, but instead of blocking on a single IO operation to be done and then the other, you are able to queue up all the IO operations you need done, and then block on all of them. The OS will wake you up when any one of them is done. Some implementations of multiplexed IO allow even more control, where you can specify that you want to be woken up only if some specified set of IO operations are done, like when file A and C or file B and D are done.

So you would make a non-blocking call to read file A, and then a non-blocking call to read file B, and finally you would tell the OS: put my thread to sleep, and wake it up when A and B’s IO are both done, or when any one of them is done.
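On the JVM, this style of IO is exposed through the NIO Selector. The sketch below, which again uses two in-process pipes as stand-ins for A and B (my assumption, since regular files cannot be registered with a Selector), queues up interest in both and then blocks until any one of them is ready.

```clojure
(import '(java.nio ByteBuffer)
        '(java.nio.channels Pipe Selector SelectionKey))

(def ^Pipe pipe-a (Pipe/open))
(def ^Pipe pipe-b (Pipe/open))
(def ^Selector selector (Selector/open))

;; Register interest in "readable" for both sources. A channel must be in
;; non-blocking mode before it can be registered with a Selector.
(doseq [^Pipe pipe [pipe-a pipe-b]]
  (doto (.source pipe)
    (.configureBlocking false)
    (.register selector SelectionKey/OP_READ)))

;; Only pipe B receives data.
(.write (.sink pipe-b) (ByteBuffer/wrap (.getBytes "b")))

;; select blocks the thread until at least one registered channel is ready.
(def ready-count (.select selector))
(def ready-channels
  (set (map (fn [^SelectionKey k] (.channel k)) (.selectedKeys selector))))

(println ready-count)                                 ; prints 1
(println (contains? ready-channels (.source pipe-b))) ; prints true
```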

Async

The problem with multiplexed IO is that you’re still sleeping until IO is ready for you to handle. Once again, for many programs that’s fine, maybe you have nothing else to be doing while you wait for IO operations to be done. Sometimes though, you do have other things you could be doing. Maybe you’re computing the digits of PI, while also summing up the values in a bunch of files. What you’d like to do is queue up all your file reads, and while you wait for them to be read, you would compute digits of PI. When a file is done reading, you’d sum up its values, and then go back to computing more digits of PI until another file is done reading.

For this to work, you will need a way for your digits of PI computation to be interrupted by the IO as it completes, and you’d need the IO to perform the interrupt when it completes.

This is done through event callbacks. The call to perform a read takes a callback, and returns immediately. On the IO completing, the OS will suspend your thread, and execute your callback. Once the callback is done executing, it will resume your thread.

Multi-threaded vs Single Threaded?

You’d have noticed that all kinds of IO that I described only speak of one single thread, which is your main application thread. The truth is, IO does not require a thread to be performed, because as I explained in the beginning, the peripherals all perform the IO asynchronously within their own circuitry. Thus it’s possible to do blocking, non-blocking, multiplexed and async IO all within a single threaded model. Which is why concurrent IO can work without multi-threaded support.

Now, the processing that is done on the result of the IO operations, or which is requesting the IO operations can obviously be multi-threaded if you need it to be. This allows you to have concurrent computation on top of concurrent IO. So nothing prevents mixing multi-threaded and these IO mechanisms.

In fact, there is a very popular fifth kind of IO which does depend on multi-threading. It is often confusingly referred to as non-blocking IO or async IO as well, because it presents an interface similar to one or the other. In truth, it is faking true non-blocking or async IO. The way it works is simple: it uses blocking IO, but each blocking call is made in its own thread. Depending on the implementation, it either takes a callback or uses a polling model, such as returning a Future.
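To make this fifth kind concrete, here is a minimal sketch in Clojure on the JVM. `slurp` is a plain blocking read, and `future` simply runs it on another thread. The temporary file and the helper names `read-file-async` and `read-file-with-callback` are made up for the example; they are not part of any library.

```clojure
(import '(java.io File))

;; Demo setup: a temporary file to read, created only for this sketch.
(def demo-file
  (doto (File/createTempFile "io-demo" ".txt")
    (.deleteOnExit)
    (spit "1 2 3 4")))

;; Polling model: the blocking read runs on another thread, and the
;; caller immediately gets a future back.
(defn read-file-async
  [file]
  (future (slurp file)))

;; Callback model: same trick, but the result is handed to a callback,
;; which runs on the worker thread.
(defn read-file-with-callback
  [file callback]
  (future (callback (slurp file))))

(def pending (read-file-async demo-file))

;; The main thread is free to do other work in the meantime.
(def other-work (reduce + (range 1000)))

;; Dereferencing blocks only if the read has not finished yet.
(def content @pending)

;; The callback delivers its result through a promise so we can observe it.
(def callback-result (promise))
(read-file-with-callback demo-file #(deliver callback-result (count %)))
```

From the caller's side this looks asynchronous, but each read is still an ordinary blocking call that ties up a thread until it completes.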

In Closing

I hope this has clarified your understanding of the various kinds of IO. It’s important to keep in mind that these are not all supported by all operating systems and for all peripherals. Similarly, not all programming languages expose an API for all kinds of IO the operating system supports.

There you go. All various kinds of IO explained.

Hope you enjoyed!

Disclaimer

I am not a system level programmer, and so I’m not an expert on all kinds of IO operating systems offer. This post is my best effort to sum up what I know, which I would say is probably intermediate level knowledge. Thus please correct me in the comments if you find that anything here is wrong.

Monday, April 16, 2018

Overview of ClojureScript Features

ClojureScript is a Clojure compiler that targets JavaScript. With it, you can use a dialect of Clojure, also called ClojureScript, which is 99% identical in syntax and semantics to Clojure, to write your JavaScript code. So wherever you used to use JavaScript, you can now use Clojure instead!

Not sure what ClojureScript has to offer? Well this is the overview for you.

Please note that this list was put together at the time of ClojureScript 1.10. If you are using an older version, some features might be missing, and if you are using a newer version, there might be newer features not mentioned here. You can refer to the official changes list for all the gritty details.

ClojureScript offers some of the following features:

Proper modules

Language-level support for modules. This lets you define logical components with clear dependencies on one another. In ClojureScript, we call them namespaces, and you create them with ns. ClojureScript namespaces are compatible with Google Closure modules too, so you can use those as is. It also has native support for converting CommonJS, AMD, ES6 and Node modules to Google Closure modules. Only a few lines of config are required for the conversion; you can read more about it here.

(ns cljs-features-demo
  (:require [goog.string :as gstr]
            goog.string.format))

;; You can require other modules, such as the standard Google Closure
;; library, which is part of ClojureScript's standard library.

;; Other modules can now require the use of `cljs-features-demo`.

;; `:as` was used to create an alias, so we can use `gstr` instead of
;; `goog.string` to access functions in `goog.string`.

Lambda functions

Lambda functions are a way to declare a function inline, with minimal syntax. Generally they don't have names, though they can. They are similar to the arrow functions of C# and JavaScript, declared with =>, and to Java 8 lambdas, declared with ->. In ClojureScript they are declared with either #() or fn. The former is shorter to type, but the latter has more features. Both are full closures, and therefore capture the global and local context along with them.

;;; With shorthand `#()` syntax

(remove #(= % 10) [2 4 8 10 12])
;;=> (2 4 8 12)

;; %, %2, %3, %... are the arguments


;;; With custom named arguments using `fn` syntax

(remove (fn[v] (= v 10)) [2 4 8 10 12])
;;=> (2 4 8 12)

Protocol based OOP

ClojureScript is a functional programming language first and foremost, yet it allows you to do much the same things that you would do using JavaScript's objects and prototypes. Protocols define a new interface, and any type that implements them will have an implementation for the methods they define. Records are a named bundle of properties, much like JavaScript objects, which can implement Protocols. That said, they are immutable, and don't support inheritance, preferring composition over it.

;; All objects are interface based, extendable and immutable.


;; Notice how the interface is at the forefront, and is defined first
(defprotocol APerson
  (age-by [this years])
  (change-name [this new-name]))


;; The class is defined second
(defrecord Person
    [name age] ; The object fields

  APerson ; Define that Person implements the methods of APerson
  (age-by [this years] ; The first method implementation
    (assoc this :age (+ age years))) ; Adds years to the age of the object under this

  (change-name [this new-name] ; The second method implementation
    (assoc this :name new-name))) ; Replaces name of the object by new-name.


;; Create a new instance of Person
(def john (->Person "John" 22))
;;=> #cljs.user.Person{:name "John", :age 22}


;; Call methods on it
(age-by john 10)
;;=> #cljs.user.Person{:name "John", :age 32}


;; But everything is immutable, including objects
(:age john)
;;=> 22

Large standard library

ClojureScript comes bundled with more functions and objects than normal JavaScript. It always includes the vast set of functions and objects from the ClojureScript standard library, as well as the Google Closure library.

That's right, ClojureScript always includes the Google Closure library, making it an integral part of ClojureScript. Whenever you can't find what you want in the ClojureScript standard lib, look to the Google Closure library instead.

The Google Closure library is similar to jQuery and core.js. It includes a huge swath of functions for DOM manipulation, server communication, animation, data structures, unit testing, text editing and more. It also comes with UI widgets and controls. It is used by Google in all of their web products such as Gmail, Search, Maps, Docs, Photos, etc. It's robust, highly optimized, complete and well tested, and it also minifies down to take the least amount of space possible, making page loads fast. Oh, and it does all that in a way that works across all browsers.

Yup, that means with ClojureScript, you don't need jQuery, or core.js, or babel, or UglifyJS2, etc., since it's all handled through its first-class support for the Google Closure compiler.

Multiline and template strings

Strings can span multiple lines, which will result in newline characters being part of the string exactly where a new line was added when defining the string in code. Templates allow you to interpolate a string with variables, for safer string creation from user input.

;;; A multiline string
"This string spans
multiple lines."
;;=> "This string spans\nmultiple lines."


;;; Template strings
(let [first-name "John"
      last-name "White"]
  (gstr/format "Hello %s, %s!" first-name last-name))
;;=> "Hello John, White!"

Destructuring

Destructuring allows binding using pattern matching. It supports matching for all ClojureScript data structures and records. Failed destructuring does not throw, instead what was not matched is set to nil. See Type extension for a way to extend ClojureScript so that JavaScript objects can also be destructured.

;;; List matching
(let [[a _ b] [1 2 3]]
  (println a)  ;=> 1
  (println b)) ;=> 3


;;; Object matching
(let [{:keys [name age]} john]
  (println name) ;=> "John"
  (println age)) ;=> 22


;;; Associative Map matching
(let [{:keys [a b]} {:a 10, :b 20}]
  (println a)  ;=> 10
  (println b)) ;=> 20


;;; Can be used in parameter position
(defn g[{:keys [name]}]
  (println name))

(g {:name "Marie"})


;;; Fail-soft destructuring
(let [[a] []]
  (println a)) ;=> nil


;;; Default values in associative destructuring
(let [{:keys [a] :or {a 10}} {}]
  (println a)) ;=> 10

Rest, Named, Spread and Overloaded arguments

Rest arguments let you take a variable number of arguments. Named arguments allow you to pass arguments in any order. Spread arguments allow you to pass a sequence (list, vector, etc.) of elements as the arguments to a function. Finally, overloaded arguments allow you to dispatch to a different implementation based on the number of arguments.

;;; Rest arguments
(defn f[x & rest]
  ;; rest is a sequence
  (* x (count rest))) ; Multiplies x by the length of values in rest

(f 3 "hello" true)
;;=> 6


;;; Named arguments
(defn f[& {:keys [x y]}]
  (- x y))

(f :y 5 :x 10)
;;=> 5
(f :x 10 :y 5)
;;=> 5


;;; Named arguments with default values
(defn f[& {:keys [x y] :or {x 100}}]
  (- x y))

(f :y 5)
;;=> 95


;;; Spread function calls
(defn f[x y z]
  (+ x y z))

;; Pass each element of sequence as argument
(apply f [1 2 3])
;;=> 6


;;; Overloaded function based on number of args
(defn f
  ([a b] (+ a b))
  ([a b c] (- a b c)))

(f 10 10) ;=> 20
(f 10 10 10) ;=> -10

Proper scopes

ClojureScript has proper scope: everything is block scoped, scopes can be nested, and it all works as you'd expect. Inner scopes can see outer scopes, and they can also shadow outer scopes so names can be re-used, while outer scopes cannot access inner scopes.

;; Define a global scope variable x
(def x -50)

;; Define a global scope function f
(defn f[x] ;; The argument x is defined in function scope
  (let [x 100] ;; Define a block scope variable x
    (let [x 0] ;; Nested block scope variable x
      (println x))
    (println x))
  (println x)
  (println cljs-features-demo/x))

(f 1)
;;=> 0
;;=> 100
;;=> 1
;;=> -50

;; Everything is always properly scoped.

;; `let` is also always immutable, so it's similar to `const` in JS

;; Must wrap it in mutable container to get mutable variable

(let [a (atom 0)]
  (println @a)
  (reset! a 100)
  (println @a))
;;=> 0
;;=> 100

Iterators and Lazy Sequences

ClojureScript supports iterators, much like Python, Java, and ES6. Its iterators behave more like Generators in their ease of use. The difference being that ClojureScript iterators are immutable, thus carried over data must be passed along in a recursive style.

Lazy sequences allow computation to run only when needed. Pipelines can be created over collections, like in C#'s LINQ or Java 8 streams, making data manipulation a trivial task.

;;; Iterators
(defn fibonacci[]
  ;; Create an iterator that generates the next fib
  ;; and carries over the last fib to be used by the
  ;; next generation.
  (->> (iterate (fn [[a b]] [b (+ a b)]) [0 1])
       ;; Return the first element of all generated values
       ;; to drop all carried state used only for the next
       ;; generation
       (map first)))

(take 5 (fibonacci)) ; Return the first 5 generation of fib
;;=> (0 1 1 2 3)

;; Similar to Generators in Python, except they are immutable


;;; Lazy sequences
(->> (fibonacci)
     (take 10)
     (map inc)
     (remove even?))
;;=> (1 3 9 35)

;; Similar to C# LINQ expressions

Unicode support

ClojureScript supports Unicode to the same extent that the JavaScript target does.

;;; Unicode escape syntax for a single UTF-16 code unit
(println \u263A)
;;=> ☺


;;; `count` counts UTF-16 code units, not code points
(= 2 (count "𠮷"))
;;=> true


;;; Characters outside the BMP can be written as a surrogate pair
(= "\uD842\uDFB7" "𠮷")
;;=> true


;;; Iterating a string yields UTF-16 code units, not code points,
;;; so each half of a surrogate pair prints on its own
(for [code-point (seq "\uD842\uDFB7")]
  (println code-point))
;;=> �
;;=> �

Immutable data structures

If you hadn't clued in yet, almost everything in ClojureScript is immutable, and that includes its data structures. The trick is that they're also extremely fast, and consume a minimal amount of memory. Most of them also come with really convenient literal syntax to create them.

;;; Persistent list
'(1 2 3)
;;=> (1 2 3)

;; Conj (add in Clojure parlance) to the front
(conj '(1 2 3) 4)
;;=> (4 1 2 3)

;; Peek from the front
(peek '(1 2 3))
;;=> 1


;;; Persistent vector
[1 2 3]
;;=> [1 2 3]

;; Conj to the back
(conj [1 2 3] 4)
;;=> [1 2 3 4]

;; Peek from the back
(peek [1 2 3])
;;=> 3


;;; Persistent queue
#queue[1 2 3]
;;=> #queue [1 2 3]

;; Conj to the back
(conj #queue[1 2 3] 4)
;;=> #queue [1 2 3 4]

;; Peek from the front
(peek #queue[1 2 3])
;;=> 1


;;; Persistent hash Set
#{3 2 1}
;;=> #{3 2 1}


;;; Persistent sorted set
(sorted-set 3 2 1)
;;=> #{1 2 3}


;;; Persistent hash map
{:b 2 :a 1}
;;=> {:b 2, :a 1}


;;; Persistent sorted map
(sorted-map :b 2 :a 1)
;;=> {:a 1, :b 2}

Mutable data structures

ClojureScript also comes with mutable data structures, but their use is discouraged except when performance demands it. That's why it simply delegates to the host's JavaScript mutable data structures. Feel free to use all ES6 mutable data structures, since ClojureScript can polyfill them for older JavaScript versions through its use of the Google Closure compiler. You can also use the mutable data structures in the Google Closure library, such as AvlTree, Heap, Pool, Queue, Trie, etc.

You'll never be short on data structures in ClojureScript.

;; Simply re-uses the ones offered by JavaScript


;;; JavaScript Object
#js{:a 1 :b 2}
;;=> #js {:a 1, :b 2}

;; Read object properties by prepending `.-` to their name
(.-a #js{:a 1 :b 2})
;;=> 1

;; Can mutate using `set!`
(def my-obj #js{:a 1 :b 2})
my-obj
;;=> #js {:a 1, :b 2}

(set! (.-a my-obj) "Mutated!")
my-obj
;;=> #js {:a "Mutated!", :b 2}


;;; Arrays
(make-array 3)
;;=> #js [nil nil nil]
#js["first-array-element" "second-array-element"]
;;=> #js ["first-array-element" "second-array-element"]

;; Can mutate using `aset`
(def my-array #js[1 2 3])
my-array
;;=> #js [1 2 3]

(aset my-array 1 "Mutated!")
my-array
;;=> #js [1 "Mutated!" 3]

;; Can access indexed value using aget
(aget my-array 1)
;;=> "Mutated!"


;;; Map
(js/Map.)
;;=> #object[Map [object Map]]

;; Can set and get using host interop
(def my-mutable-map (js/Map.))
(.set my-mutable-map "a" 1)
(.get my-mutable-map "a")
;;=> 1


;;; Set
(js/Set.)
;;=> #object[Set [object Set]]

;; Can add and has using host interop
(def my-mutable-set (js/Set.))
(-> my-mutable-set (.add 1) (.add 2))
(.has my-mutable-set 2)
;;=> true

Controlled mutable values

Every value container that does not come from the Google Closure library, the host JavaScript, or a third-party library you have imported yourself is immutable in ClojureScript, except for Atoms. An Atom allows very explicit use of a mutable value, and supports validation on change as well as watches, which are events triggered when the value is mutated. This lets you put the proper controls in place so that mutability doesn't cause bugs in your code.

;;; Mutable values with `atom`
(def a-mutable-val (atom nil))
@a-mutable-val ; Get the current value with `@` prefix
;;=> nil
(reset! a-mutable-val "Not nil!")
@a-mutable-val
;;=> "Not nil!"

;; Atoms are the only mutable value objects, apart for JavaScript host
;; data-structures/objects seen in the previous section.


;;; Watches, trigger event callback every time value changes
(add-watch a-mutable-val
           :print-change-details
           (fn[key a old-val new-val]
             (println (gstr/format "Atom changed from %s to %s"
                                   old-val new-val))))

(reset! a-mutable-val "I damn well changed its value again!")
;;=> Atom changed from Not nil! to I damn well changed its value again!


;;; Validators, validate new values
(set-validator! a-mutable-val
                (fn[new-val]
                  (not= 0 new-val)))

(try
  (reset! a-mutable-val 0)
  (catch js/Error e
    (println e)))
;;=> #object[Error Error: Validator rejected reference state]

Type extension

Don't think ClojureScript has enough features out of the box for you? Well, you are in luck, because it lets you bend any type to your will, be it a ClojureScript type or a host JavaScript one, quite easily and simply.

;; You can extend any ClojureScript and JavaScript object at any time

;; Let's add a reverse method to JavaScript Strings
;; First we declare the protocol, remember, interface first!
(defprotocol Reversable
  (reverse [this]))

;; Now we implement it for JavaScript strings
(extend-type string
  Reversable
  (reverse [this]
    (-> this (.split "") (.reverse) (.join ""))))

(reverse "cool")
;;=> "looc"


;; Let's extend all JavaScript objects so they can be destructured
(extend-type object
  ILookup
  (-lookup 
    ([this key] 
     (goog.object/get this (name key)))
    ([this key not-found] 
     (goog.object/get this (name key) not-found))))

(let [{:keys [a b]} #js{:a 10 :b 20}]
  (+ a b))
;;=> 30

Thanks to Mike Fikes for the destructuring extension.

Syntax extension

It's also possible to extend the ClojureScript syntax by using macros and tagged literals. With macros, completely new syntax and semantics can be created and called upon by invoking the macro, so you don't need to wait for the next version of ClojureScript to add a missing language feature. Be careful though: macros are for advanced users. They are a powerful weapon that can easily be misused, so avoid them unless absolutely necessary. Tagged literals are a simple way to add new literal notations for constructing objects or data structures.

;;; Macros
;; Wish you could use infix notation for binary functions?
(defmacro infix
  [& code]
  `(~(second code)
    ~(first code)
    ~(last code)))

(infix 10 + 20)
;;=> 30


;;; Tagged literals
;; Wish there was a way to create JavaScript mutable Maps with literal syntax?
;; In a data_readers.cljc file, you add custom literal tags
{mut/map custom-tags/make-js-map}

;; Which calls the function associated with it at read time
(ns custom-tags)
(defn make-js-map
  [m]
  (js/Map.
   (clj->js (into [] m))))

;; Letting you then use a literal syntax to create anything
(let [js-map #mut/map{:a 1 :b 2}]
  (println (.get js-map "a")))
;;=> 1

Many more

ClojureScript supports many more features than the ones I highlighted here. It's a pretty complete language, so you probably won't find anything missing, and if you do, the extension capabilities will allow you to add it yourself, without waiting for a new version to come out.

So consider this list incomplete, but here's some of the added features I didn't cover:

  • Number literals: decimal, exponent, binary, octal, arbitrary base, etc.
  • Regex literal
  • EDN: JSON for ClojureScript
  • shebang support
  • Block comments
  • Default arguments
  • Asynchronous programming using communicating sequential processes (CSP), same as in Go.
  • Other literals: uuid, instant, etc.
  • Metadata: Data about your data
  • Reader conditionals: Allows code to work simultaneously with Clojure and ClojureScript
  • Tail Call optimized recursion
  • Custom types
  • React support
  • Spec: Declarative Data Specification with auto validation and generation. Can be used for generative testing, and powerful runtime validation
  • Transients: Mutable escape hatch for immutable data structures when performance is required.
  • Transducers: Efficient and composable data transformation
  • Zippers: Navigational iteration through Tree and Graph data structures
  • Output source minification
  • Output source dead code elimination
  • Polyfill of ECMAScript 6+ to ECMAScript 5 or 3, so you can use newer APIs in older JavaScript environments
  • And so much more...
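
To give a taste of a few items from this list, here is a short sketch of regex literals, default arguments via multiple arities, metadata, transients and transducers. It sticks to core functions that exist in both Clojure and ClojureScript; the names `greet`, `tagged`, `squares` and `xf` are made up for the example.

```clojure
;;; Regex literal
(def order-number (re-find #"\d+" "order-42"))
;; order-number is "42"

;;; Default arguments, using a shorter arity that calls the longer one
(defn greet
  ([] (greet "world"))
  ([who] (str "Hello, " who "!")))

(greet)        ;=> "Hello, world!"
(greet "John") ;=> "Hello, John!"

;;; Metadata: data about your data, ignored by equality
(def tagged (with-meta [1 2 3] {:source "sensor-a"}))

(meta tagged)      ;=> {:source "sensor-a"}
(= tagged [1 2 3]) ;=> true

;;; Transients: mutate locally, then hand back an immutable value
(defn squares
  [n]
  (persistent!
   (reduce (fn [acc i] (conj! acc (* i i)))
           (transient [])
           (range n))))

(squares 4) ;=> [0 1 4 9]

;;; Transducers: a transformation defined independently of its input
(def xf (comp (filter odd?) (map inc)))

(into [] xf (range 10))     ;=> [2 4 6 8 10]
(transduce xf + (range 10)) ;=> 30
```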

Saturday, March 3, 2018

My observations from evangelizing Clojure at work for the last year.

Learning Clojure is like going from driving a car on the right side of the road, to driving a car on the left side of the road.

If you can't figure it out in under a month, you're either a hopeless driver, or just didn't really care to try to adapt to the slightly unfamiliar.

I've evangelized Clojure at my work, and I've seen many people have to learn and try it in different contexts.

There are people who end up loving it just as much as I do, and people who end up hating it. In all cases, the lovers were willing to break their habits and figure out how to adapt to Clojure's syntax and paradigms. They were eager and happy to learn something different. The haters were not. They felt compelled and forced to drive on a side of the road they weren't comfortable with. It scared them, caused them additional stress, and clearly they didn't care to be there, as if they had been dragged along on a vacation to a place they had no interest in, when they would really have preferred to stay home, where things are easy and familiar.

The interesting part, though, is that neither group was slowed down or failed to deliver its tasks and stories because of Clojure. The overhead of having to learn Clojure is minimal. It's just that it's not painless: it's uncomfortable, and you need to think harder than you're used to, that's all.

So both groups end up successfully using Clojure professionally to deliver on tasks and stories. But the ones who never wanted to be there in the first place don't want to keep at it, because they associate stress and difficulty with it, while the ones who enjoyed learning and being challenged want to use it everywhere and can't go back to Java.

I've learned that you can push the people who aren't open to leaving their comfort zone over to the group that loves Clojure, or at least to a neutral place, if you give them a lot of initial support. In particular, help them set up a friendly IDE, and show them how to use the REPL first and foremost. Make sure they do use it. If they don't, remind them, show them how again, and address why they aren't. Then help guide them a little through their first month.

I say neutral, because generally the latter group tends to have less passion about programming in general. This is a career for them, not a hobby. They may never fall in love with any language, there's just the one they're familiar and comfortable with, and the ones they're not.

Now, one positive of adopting Clojure is that, if you are in a medium to large org, the passionate programmers on other teams who are curious to learn new things will be easy targets to recruit to yours.

Initially, our manager was worried it would hurt our ability to find talent, but it has actually become our biggest selling point, allowing us to attract really motivated, passionate, curious and knowledgeable devs from other teams to move to ours, because they wanted the opportunity to learn and use Clojure.

Now, I've seen people in the latter group adopt Scala or Kotlin afterwards. This is because, while Europe might be too far out of your comfort zone, a trip down to Disneyland or Las Vegas, where things are mostly the same and you still feel safe and at home while being only a little out of the ordinary, might still be enjoyable.

Also, languages like Scala and Kotlin can be treated like new versions of Java to some extent. They do make some of the more annoying parts of Java easier. So they're more like having bought a brand new car in a different category, like going to an SUV after a sedan.

My observation is that for a very long time they do not learn or use the parts of Kotlin and Scala that differ most from Java, but because of how linear and smooth that curve can be, they eventually end up learning them. So again, the learning was made easier and safer by being amortized over a really long period of time, 6 months to 2 years.

So I've observed that people who go through 1 month of Clojure learn as many new paradigms as people who go through 6 months of Scala or Kotlin.

Not everyone is willing to push themselves hard from the start. Those who are will pick up Clojure in no time and love it. Those who aren't will hate it, even though they managed to use it without issue. For those, you need to offer really good coaching and support to ease their pain. Otherwise, you're better off with an FP language that has a more gradual curve, like Kotlin or Scala, or even the FP features of Java 8.

Monday, January 30, 2017

When to use Elixir over Clojure?

Elixir and Clojure are very similar languages, almost akin to C# and Java, and you won't have much difficulty going from one to the other.

  • The syntax will change.
  • The libraries will not be the same.
  • The target VM will be different.

Apart from that, most things will be pretty similar.

So when should you choose to use Elixir over Clojure?

Whenever you want to use BEAM, or to put it in more detail:

  • When you need highly distributed parallel processes with fault tolerance and high availability.
  • When you want soft real-time guarantees.

When should you choose Clojure over Elixir?

  • When you need to use the JVM for compatibility or library reasons.
  • When you need a single process, and single-machine performance matters.
  • When you want a coherent server and front-end language ecosystem: Clojure/ClojureScript
  • When you want a functional language instead of Java.
  • When you want a strong dynamic language instead of Java.
  • When you want a very interactive REPL workflow.
  • When you want to make heavy use of macros, and maybe create a lot of custom DSLs.

There's one case where the choice will be harder, and that's when you need distributed processes, but don't need that many of them. In this case, I think you should pick the one whose syntax and libraries you prefer, since both can adequately deliver on that use case.

Friday, August 26, 2016

Common Naming Road-Blocks

As I program, I always encounter naming issues. What should I call this, and that? Sometimes, I spend more time thinking of a name than implementing. To help me let go of my name paralysis, I've come up with some naming patterns that seem to satisfy my endless pedantry. This way, I can simply refer to these rules, choose a name, and move on with my life.

You have re-factored a method, but you need to keep the old one around, what do you call the new method?

This always happens to me. I re-factor a method, but need to keep the old one around. It's possible my new one has changed the behavior slightly, and things still depend on the old behavior, but I want all new code to adapt and depend on the new one instead. Until you re-factor everything, you need to keep the old one around.

SOLUTION

Add V followed by a number to the name of the new method.

If you have method(), you simply call the new one methodV2(). If you later write an even newer one, you'd name it methodV3(). Don't rename the first one to methodV1(); that's just more confusing and will force you to change the call sites.
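
As a small illustration, here is how the rule might look in Clojure, where the camelCase `methodV2` becomes a kebab-case `-v2` suffix. The function `format-name` and its behavior are made up for the example.

```clojure
;; Original function: existing callers depend on its exact output,
;; so it keeps its name untouched.
(defn format-name
  [first-name last-name]
  (str first-name " " last-name))

;; Newer version with slightly different behavior.
;; Only the new one carries a version suffix.
(defn format-name-v2
  [first-name last-name]
  (str last-name ", " first-name))

(format-name "John" "White")    ;=> "John White"
(format-name-v2 "John" "White") ;=> "White, John"
```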

Pros

  1. It does not require you to change the name of the original, saving you some re-factoring work on the callers side.
  2. It's clear which method is the newest one.
  3. It adds minimal amount of extra characters to the method names, keeping it short, sweet and to the point.
  4. The V clearly indicates that it does the same thing as the older ones, but is just the newer version.

Cons

  1. If the old method ever becomes unused and you're able to get rid of it, you'll be stuck with methods that carry version numbers. Which raises the question: should I decrease the numbering of all the methods? I would say no, don't bother. And if you ever create an even newer one, give it the next number after the highest one, even though there's now room at the beginning.

You have a method that does more than one thing, how can the name reflect all the effects the method has?

I come upon this one occasionally. You've got a setter method for example, but it actually sets three fields from an Object you pass it. You could say it's just bad design, a method should not do more than one thing, but the truth is, these are often very convenient, and sometimes you have to do it for performance.

SOLUTION

Name the method based on its primary effect, and append AndMore to it. Then document what the AndMore is with a comment.

So if you have setFirstNameAndLastNameAndAgeAndEthnicity(User user), just name it setFirstNameAndMore(User user). Don't forget to document the AndMore in a comment!
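
Sketched in Clojure, under the assumption that the user is a plain map and the new values come from another map (both made up for the example), the rule could look like this. The docstring plays the role of the comment that documents the AndMore.

```clojure
(defn set-first-name-and-more
  "Sets :first-name on user from new-info.
   AndMore: also sets :last-name, :age and :ethnicity."
  [user new-info]
  ;; Only the documented fields are copied over; anything else is ignored.
  (merge user
         (select-keys new-info [:first-name :last-name :age :ethnicity])))

(def updated-user
  (set-first-name-and-more
   {:id 1 :first-name "Jon"}
   {:first-name "John" :last-name "White" :age 22
    :ethnicity "unknown" :password "ignored"}))
;; updated-user is
;; {:id 1, :first-name "John", :last-name "White", :age 22, :ethnicity "unknown"}
```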

Pros

  1. It keeps the name short.
  2. It makes it clear what the primary effect and intent is.
  3. It's very clear that the method has more effects, and gives a way for them to be further detailed if the caller cares.

Cons

  1. If you want to know what the other effects are, you need to read the doc.
  2. Maybe you shouldn't have a method with more than one intended effect; you should double check that there's no way for you to split it up.

You have a single implementation of an interface, what should you name the interface and its single implementation?

You've created an Interface and you feel good about it. You then create a first implementation of it. Slowly you realize, your use case doesn't need any other implementation quite yet. What should you call the Interface and its Implementation?

SOLUTION

IInterface and Interface or Interface and DefaultInterface

There are two ways to go about it. I prefer the first, where you prefix the interface name with a capital I and give the implementation the same name without the I. I like this because when you auto-complete, typing I easily finds all interfaces, and generally you want to program to the interface, not the implementation. The Java convention, however, is to not put an I on your interfaces. If you want to follow that convention, use the second way: name the interface as it is, Interface, and prefix the implementation with Default: DefaultInterface.

If you have more than one implementation, but one is used more commonly than the others, I believe it should still be named using the scheme above, while the other implementations should describe in their names what differs from the common one.

IDog Dog BigDog SmallDog HandicappedDog

Only when the implementations are equally common should every one of them have a name that describes what sets it apart, with none taking the plain name.

IAnimal Dog Cat Horse
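
Here is the first scheme sketched with a Clojure protocol and records standing in for the interface and its implementations. The `speak` method and the strings it returns are made up for the example.

```clojure
;; The interface carries the I prefix.
(defprotocol IDog
  (speak [this]))

;; The common, canonical implementation takes the plain name.
(defrecord Dog [name]
  IDog
  (speak [this] (str name " says woof")))

;; A less common implementation says in its name what sets it apart.
(defrecord BigDog [name]
  IDog
  (speak [this] (str name " says WOOF")))

(speak (->Dog "Rex"))    ;=> "Rex says woof"
(speak (->BigDog "Max")) ;=> "Max says WOOF"
```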

Pros

  1. Descriptive when it needs to be, undescriptive when it doesn't.
  2. Becomes a known convention for which implementation is the original canonical one.

Cons

  1. Some people find that having an interface when you only have one implementation adds unnecessary code complexity. In that case, you can argue you should only have an interface when you have two or more implementations, and then their names should always be more descriptive than the interface name.

This is an always evolving article, as I find more and more patterns, or improve on the ones already existing, I'll update this article to reflect that.