Sunday, May 6, 2018

The Various Kinds of IO - Blocking, Non-blocking, Multiplexed and Async.

I've found it particularly hard to demystify the various kinds of IO now offered to programmers. There is a lot of confusion out there about the differences between blocking, non-blocking, multiplexed and async IO. So I thought I'd take a shot at clarifying what each of these kinds of IO entails.

In Hardware

On modern operating systems, IO, short for input/output, is the way data is exchanged with peripherals. This includes reading from or writing to a disk or SSD, sending or receiving data over the network, displaying data on the monitor, receiving keyboard and mouse input, etc.

The way a modern operating system communicates with a peripheral depends on the specific type of peripheral and its firmware and hardware capabilities. In general, you can assume that peripherals are quite advanced and can handle multiple concurrent requests to read or write data. Gone are the days of serial communication. In that sense, all communication between the CPU and peripherals is asynchronous at the hardware level.

This asynchronous mechanism is called a hardware interrupt. Think of the simple alternative: the CPU asks the peripheral to read something, then goes into an infinite loop, checking with the peripheral on each iteration whether the data is available, looping until the peripheral hands over the data. This method is known as polling, since the CPU keeps checking on the peripheral. On modern hardware, the CPU instead asks the peripheral to perform an action and then forgets about it, continuing to execute other instructions. Once the peripheral is done, it signals the CPU through an interrupt. This happens in hardware, so the CPU never waits or checks on the peripheral, freeing it to perform other work until the peripheral itself says it's done.

In Software

So now that we understand what happens in hardware, we can move on to the software side. This is where IO is exposed in its various kinds: blocking, non-blocking, multiplexed and async. Let's go over them one by one.

Blocking

Remember that a user program runs inside a process, and that its code executes within the context of a thread. Say you are writing a program that needs to read from a file. With blocking IO, you ask the operating system, from your thread, to put that thread to sleep and wake it up only once the data from the file is available to be consumed.

That is, blocking IO is called blocking because the thread which uses it will block and sleep until the IO is done.
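To make this concrete, here's a minimal sketch in Python (my choice of language here, not something the post prescribes), using a pipe and a writer thread to stand in for the slow peripheral; the `os.read` call puts the thread to sleep until the data is available.

```python
import os
import threading
import time

# A pipe stands in for the file; the writer thread plays the
# role of the slow peripheral that produces the data later.
read_fd, write_fd = os.pipe()

def slow_peripheral():
    time.sleep(0.1)
    os.write(write_fd, b"hello")

threading.Thread(target=slow_peripheral).start()

# Blocking IO: the OS puts this thread to sleep until the data
# is available, then wakes it up with the result.
data = os.read(read_fd, 1024)
print(data)  # b'hello'
```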

Non-blocking

The problem with blocking IO is that your thread is sleeping until the IO is done, and thus it can't do anything else while waiting. Sometimes there's nothing else your program could be doing, but if there were, it would be nice to be able to do that work concurrently while waiting for the IO.

One way to do that is with what is called non-blocking IO. The idea is that when you make the call to read the file, instead of putting your thread to sleep, the OS returns either the file's content or a pending status that tells you the IO is not done. It does not block your thread, but it's still your job to check back later to see if the IO is done. You are free to branch on the pending status and go perform other work, and when you need the IO again, you call read once more; if the IO is done, it returns the file's content, otherwise it again returns a pending status, and you can again choose to go perform other work.
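Here's a hedged Python sketch of the same idea. Note that regular files on most operating systems don't honor non-blocking reads, so a pipe stands in for the IO source; the pending status shows up as a `BlockingIOError`.

```python
import os

# A pipe stands in for the IO source; regular files typically
# don't support non-blocking reads.
read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)  # mark the read end non-blocking

# No data yet: instead of sleeping, the call returns control
# immediately with a "pending" status we can branch on.
try:
    data = os.read(read_fd, 1024)
    pending = False
except BlockingIOError:
    pending = True  # IO not done; free to go do other work

# ... perform other work here, then check back later ...
os.write(write_fd, b"ready")
data = os.read(read_fd, 1024)  # this time the data is available
print(pending, data)  # True b'ready'
```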

Multiplexed

The problem with non-blocking IO is that it gets strange if the other work that you want to be doing while waiting for the IO is itself more IO.

In the good scenario, you ask the OS to read content from file A, then go perform some heavy computation; once you're done, you check if file A is done reading. If so, you do whatever you needed that file's content for; otherwise, you go do some more heavy processing, rinse and repeat. But in the bad scenario, you don't have heavy processing to do; you want to read file A, and you also want to read file B. So while you wait for file A's IO, you make a non-blocking call to read file B. Now what will you do while waiting for both? There's nothing left for you to do, so your program has to go into an infinite polling loop, where you keep checking if A is done, then if B is done, over and over. This will either consume the CPU simply to poll for the status of your non-blocking IO calls, or you'll have to manually add some arbitrary sleep time, meaning you'll probably notice the IO is ready a little after it really was, lowering your program's throughput.

To avoid this problem, you can use multiplexed IO instead. With it, you once again block on the IO, but instead of blocking on one IO operation and then the other, you queue up all the IO operations you need done and then block on all of them at once. The OS wakes you up when any one of them is done. Some implementations of multiplexed IO allow even more control, letting you specify that you want to be woken up only when some specified set of IO operations is done, like when files A and C, or files B and D, are done.

So you would make a non-blocking call to read file A, then a non-blocking call to read file B, and finally you would tell the OS: put my thread to sleep, and wake it up when A and B's IO are both done, or when any one of them is done.
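On POSIX systems, `select` is a classic implementation of this pattern. A sketch in Python, again using pipes to stand in for files A and B (which is an assumption on my part, since `select` on regular files behaves differently):

```python
import os
import select
import threading
import time

# Two pipes stand in for files A and B, written at different times.
a_read, a_write = os.pipe()
b_read, b_write = os.pipe()

def writer(fd, payload, delay):
    time.sleep(delay)
    os.write(fd, payload)

threading.Thread(target=writer, args=(b_write, b"B", 0.05)).start()
threading.Thread(target=writer, args=(a_write, b"A", 0.15)).start()

# Block on both descriptors at once; select wakes us when any
# one of them is ready, so there's no polling loop burning CPU.
results = {}
pending = {a_read, b_read}
while pending:
    ready, _, _ = select.select(list(pending), [], [])
    for fd in ready:
        results[fd] = os.read(fd, 1024)
        pending.discard(fd)

print(results[a_read], results[b_read])  # b'A' b'B'
```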

Async

The problem with multiplexed IO is that you're still sleeping until IO is ready for you to handle. Once again, for many programs that's fine; maybe you have nothing else to do while you wait for IO operations to complete. Sometimes though, you do. Maybe you're computing the digits of pi while also summing up the values in a bunch of files. What you'd like to do is queue up all your file reads, and while you wait for them, compute digits of pi. When a file is done reading, you'd sum up its values, then go back to computing more digits until another file is done reading.

For this to work, you need a way for your digits-of-pi computation to be interrupted by the IO as it completes, and you need the IO to perform that interruption.

This is done through event callbacks. The call to perform a read takes a callback and returns immediately. When the IO completes, the OS suspends your thread and executes your callback. Once the callback is done executing, it resumes your thread.
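True OS-level callback IO (for example POSIX AIO) varies a lot by platform, so as an approximation here's a Python sketch using `asyncio`'s readiness callbacks. It shows the same shape: register a callback, keep computing, and the callback fires when the IO completes; just note the interruption happens in the event loop rather than at the OS level.

```python
import asyncio
import os
import threading
import time

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)

results = []

def writer():
    time.sleep(0.05)
    os.write(write_fd, b"io done")

async def main():
    loop = asyncio.get_running_loop()
    done = asyncio.Event()

    def on_readable():
        # The callback: fired by the event loop when the IO completes.
        results.append(os.read(read_fd, 1024))
        loop.remove_reader(read_fd)
        done.set()

    loop.add_reader(read_fd, on_readable)
    threading.Thread(target=writer).start()

    # "Compute digits of pi" while the IO is in flight.
    digits = 0
    while not done.is_set():
        digits += 1  # stand-in for real computation
        await asyncio.sleep(0)  # yield so the callback can run

    print(results[0])  # b'io done'

asyncio.run(main())
```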

Multi-threaded vs Single Threaded?

You may have noticed that every kind of IO I described involves only a single thread: your main application thread. The truth is, IO does not require a thread at all, because as I explained at the beginning, the peripherals all perform the IO asynchronously within their own circuitry. Thus it's possible to do blocking, non-blocking, multiplexed and async IO all within a single-threaded model, which is why concurrent IO can work without multi-threading.

Now, the processing done on the results of the IO operations, or that requests the IO operations, can obviously be multi-threaded if you need it to be. This lets you have concurrent computation on top of concurrent IO, so nothing prevents mixing multi-threading with these IO mechanisms.

In fact, there is a very popular fifth kind of IO which does depend on multi-threading. It is often confusingly referred to as non-blocking or async IO as well, because it presents a similar interface to one or the other. In truth, it is faking true non-blocking or async IO. The way it works is simple: it uses blocking IO, but each blocking call is made in its own thread. Depending on the implementation, it then either takes a callback or uses a polling model, like returning a Future.
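A sketch of this fifth kind in Python, using a thread pool to turn a blocking read into a Future (the setup with a pipe and a writer thread is mine, not from any particular library):

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor

read_fd, write_fd = os.pipe()

def slow_io():
    time.sleep(0.05)
    os.write(write_fd, b"payload")

threading.Thread(target=slow_io).start()

# The blocking os.read happens in a worker thread; the caller
# gets a Future back immediately and is never blocked by the IO.
pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(os.read, read_fd, 1024)

# Poll the Future while doing other work...
print(future.done())  # may still be False while the read is in flight

# ...or block on .result() only when the value is actually needed.
print(future.result())  # b'payload'
```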

In Closing

I hope this has clarified your understanding of the various kinds of IO. It's important to keep in mind that these are not all supported by all operating systems and for all peripherals. Similarly, not all programming languages expose an API for all kinds of IO the operating system supports.

There you go. All various kinds of IO explained.

Hope you enjoyed!

Disclaimer

I am not a systems-level programmer, so I'm not an expert on all the kinds of IO operating systems offer. This post is my best effort to sum up what I know, which I would say is probably intermediate-level knowledge. So please correct me in the comments if you find that anything here is wrong.

Monday, April 16, 2018

Overview of ClojureScript Features

ClojureScript is the Clojure compiler for JavaScript. With it, you can use a dialect of Clojure, called ClojureScript, which is 99% identical in syntax and semantics to Clojure, to write your JavaScript code. So wherever you used to use JavaScript, you can now use Clojure instead!

Not sure what ClojureScript has to offer? Well this is the overview for you.

Please note that this list was put together at the time of ClojureScript 1.10. If you are using an older version, some features might be missing, and if you are using a newer version, there might be newer features not mentioned here. You can refer to the official changes list for all the gritty details.

ClojureScript offers some of the following features:

Proper modules

Language-level support for modules lets you define logical components with clear dependencies on one another. In ClojureScript, modules are called namespaces, and you create them with ns. ClojureScript namespaces are compatible with Google Closure modules too, so you can use those as is. It also has native support for converting CommonJS, AMD, ES6 and Node modules to Google Closure modules; only a few lines of config are required, and you can read more about it here.

(ns cljs-features-demo
  (:require [goog.string :as gstr]
            goog.string.format))

;; You can require other modules, such as the standard Google Closure
;; library, which is part of ClojureScript's standard library.

;; Other modules can now require the use of `cljs-features-demo`.

;; `:as` was used to create an alias, so we can use `gstr` instead of
;; `goog.string` to access functions in `goog.string`.

Lambda functions

Lambda functions are a way to declare a function inline, with minimal syntax. Generally they don't have names, though they can. They are similar to the lambda syntax declared with -> in Java 8, or with => in C# and JavaScript, except that in ClojureScript they are declared using either #() or fn. The former is shorter to type, but the latter has more features. Both are full closures, and will therefore capture their surrounding context.

;;; With shorthand `#()` syntax

(remove #(= % 10) [2 4 8 10 12])
;;=> [2 4 8 12]

;; %, %2, %3, %... are the arguments


;;; With custom named arguments using `fn` syntax

(remove (fn[v] (= v 10)) [2 4 8 10 12])
;;=> [2 4 8 12]

Protocol based OOP

ClojureScript is a functional programming language first and foremost, yet it lets you do much the same things you would do with JavaScript's objects and prototypes. Protocols define a new interface, and any type that implements one provides an implementation for the methods it defines. Records are named bundles of properties, much like JavaScript objects, and can implement protocols. That said, they are immutable and don't support inheritance, preferring composition instead.

;; All objects are interface based, extendable and immutable.


;; Notice how the interface is at the forefront, and is defined first
(defprotocol APerson
  (age-by [this years])
  (change-name [this new-name]))


;; The class is defined second
(defrecord Person
    [name age] ; The object fields

  APerson ; Define that Person implements the methods of APerson
  (age-by [this years] ; The first method implementation
    (assoc this :age (+ age years))) ; Adds years to the age of the object under this

  (change-name [this new-name] ; The second method implementation
    (assoc this :name new-name))) ; Replaces name of the object by new-name.


;; Create a new instance of Person
(def john (->Person "John" 22))
;;=> #cljs.user.Person{:name "John", :age 22}


;; Call methods on it
(age-by john 10)
;;=> #cljs.user.Person{:name "John", :age 32}


;; But everything is immutable, including objects
(:age john)
;;=> 22

Large standard library

ClojureScript comes bundled with more functions and objects than normal JavaScript. It always includes the vast set of functions and objects from the ClojureScript standard library, as well as the Google Closure library.

That's right, ClojureScript always includes the Google Closure library, making it an integral part of ClojureScript. Whenever you can't find what you want in the ClojureScript standard lib, look to the Google Closure library instead.

The Google Closure library is similar to jQuery or core-js; it includes a huge swath of functions for manipulating the DOM, server communication, animation, data structures, unit testing, text editing and more. It also comes with UI widgets and controls. It is used by Google in all of their web products, such as Gmail, Search, Maps, Docs, Photos, etc. It's robust, highly optimized, complete and well tested, and it minifies itself to take the least amount of space possible, making page loads fast. Oh, and it does all that in a way that works across all browsers.

Yup, that means with ClojureScript you don't need jQuery, core-js, Babel, UglifyJS2, etc., since it's all handled through its first-class support for the Google Closure compiler.

Multiline and template strings

Strings can span multiple lines, which will result in newline characters being part of the string exactly where a new line was added when defining the string in code. Templates allow you to interpolate a string with variables, for safer string creation from user input.

;;; A multiline string
"This string spans
multiple lines."
;;=> "This string spans\nmultiple lines."


;;; Template strings
(let [first-name "John"
      last-name "White"]
  (gstr/format "Hello %s, %s!" first-name last-name))
;;=> "Hello John, White!"

Destructuring

Destructuring allows binding using pattern matching. It supports matching for all ClojureScript data structures and records. Failed destructuring does not throw; instead, what was not matched is set to nil. See Type extension for a way to extend ClojureScript so that JavaScript objects can also be destructured.

;;; List matching
(let [[a _ b] [1 2 3]]
  (println a)  ;=> 1
  (println b)) ;=> 3


;;; Object matching
(let [{:keys [name age]} john]
  (println name) ;=> "John"
  (println age)) ;=> 22


;;; Associative Map matching
(let [{:keys [a b]} {:a 10, :b 20}]
  (println a)  ;=> 10
  (println b)) ;=> 20


;;; Can be used in parameter position
(defn g[{:keys [name]}]
  (println name))

(g {:name "Marie"})


;;; Fail-soft destructuring
(let [[a] []]
  (println a)) ;=> nil


;;; Default values in associative destructuring
(let [{:keys [a] :or {a 10}} {}]
  (println a)) ;=> 10

Rest, Named, Spread and Overloaded arguments

Rest arguments let you take a variable number of arguments. Named arguments allow you to pass arguments in any order. Spread arguments allow you to pass a sequence (list, vector, etc.) of elements as the arguments to a function. Finally, overloaded arguments let you dispatch to a different implementation based on the number of arguments.

;;; Rest arguments
(defn f[x & rest]
  ;; rest is a sequence
  (* x (count rest))) ; Multiplies x by the length of values in rest

(f 3 "hello" true)
;;=> 6


;;; Named arguments
(defn f[& {:keys [x y]}]
  (- x y))

(f :y 5 :x 10)
;;=> 5
(f :x 10 :y 5)
;;=> 5


;;; Named arguments with default values
(defn f[& {:keys [x y] :or {x 100}}]
  (- x y))

(f :y 5)
;;=> 95


;;; Spread function calls
(defn f[x y z]
  (+ x y z))

;; Pass each element of sequence as argument
(apply f [1 2 3])
;;=> 6


;;; Overloaded function based on number of args
(defn f
  ([a b] (+ a b))
  ([a b c] (- a b c)))

(f 10 10) ;=> 20
(f 10 10 10) ;=> -10

Proper scopes

ClojureScript has proper scoping: everything is block scoped, scopes support nesting, and it all works as you'd expect. Inner scopes can see outer scopes, and they can also shadow outer scopes so names can be reused, while outer scopes cannot access inner scopes.

;; Define a global scope variable x
(def x -50)

;; Define a global scope function f
(defn f[x] ;; The argument x is defined in function scope
  (let [x 100] ;; Define a block scope variable x
    (let [x 0] ;; Nested block scope variable x
      (println x))
    (println x))
  (println x)
  (println cljs-features-demo/x))

(f 1)
;;=> 0
;;=> 100
;;=> 1
;;=> -50

;; Everything is always properly scoped.

;; `let` is also always immutable, so it's similar to `const` in JS

;; Must wrap it in mutable container to get mutable variable

(let [a (atom 0)]
  (println @a)
  (reset! a 100)
  (println @a))
;;=> 0
;;=> 100

Iterators and Lazy Sequences

ClojureScript supports iterators, much like Python, Java and ES6, though its iterators behave more like generators in their ease of use. The difference is that ClojureScript iterators are immutable, so carried-over data must be passed along in a recursive style.

Lazy sequences allow computation to run only when needed. Pipelines can be created over collections, like C#'s LINQ or Java 8 streams, making data manipulation a trivial task.

;;; Iterators
(defn fibonacci[]
  ;; Create an iterator that generates the next fib
  ;; and carries over the last fib to be used by the
  ;; next generation.
  (->> (iterate (fn [[a b]] [b (+ a b)]) [0 1])
       ;; Return the first element of all generated values
       ;; to drop all carried state used only for the next
       ;; generation
       (map first)))

(take 5 (fibonacci)) ; Return the first 5 generations of fib
;;=> (0 1 1 2 3)

;; Similar to Generators in Python, except they are immutable


;;; Lazy sequences
(->> (fibonacci)
     (take 10)
     (map inc)
     (remove even?))
;;=> (1 3 9 35)

;; Similar to C# LINQ expressions

Unicode support

ClojureScript supports Unicode to the same extent as its JavaScript target.

;;; Character literal syntax (a single UTF-16 code unit)
(println \u263A)
;;=> ☺


;;; `count` is based on UTF-16 code units, not code points
(= 2 (count "𠮷"))
;;=> true


;;; Can define multibyte unicode chars
(= "\uD842\uDFB7" "𠮷")
;;=> true


;;; Iterating goes over UTF-16 code units, splitting surrogate pairs
(for [code-point (seq "\uD842\uDFB7")]
  (println code-point))
;;=> �
;;=> �

Immutable data structures

If you hadn't clued in yet, almost everything in ClojureScript is immutable, and that includes its data structures. The trick is that they're also extremely fast and consume a minimal amount of memory. Most of them also come with really convenient literal syntax to create them.

;;; Persistent list
'(1 2 3)
;;=> (1 2 3)

;; Conj (add in Clojure parlance) to the front
(conj '(1 2 3) 4)
;;=> (4 1 2 3)

;; Peek from the front
(peek '(1 2 3))
;;=> 1


;;; Persistent vector
[1 2 3]
;;=> [1 2 3]

;; Conj to the back
(conj [1 2 3] 4)
;;=> [1 2 3 4]

;; Peek from the back
(peek [1 2 3])
;;=> 3


;;; Persistent queue
#queue[1 2 3]
;;=> (1 2 3)

;; Conj to the back
(conj #queue[1 2 3] 4)
;;=> (1 2 3 4)

;; Peek from the front
(peek #queue[1 2 3])
;;=> 1


;;; Persistent hash Set
#{3 2 1}
;;=> #{3 2 1}


;;; Persistent sorted set
(sorted-set 3 2 1)
;;=> #{1 2 3}


;;; Persistent hash map
{:b 2 :a 1}
;;=> {:b 2, :a 1}


;;; Persistent sorted map
(sorted-map :b 2 :a 1)
;;=> {:a 1, :b 2}

Mutable data structures

ClojureScript also comes with mutable data structures, but their use is discouraged except when performance demands it. That's why it simply delegates to the host's JavaScript mutable data structures. Feel free to use all ES6 mutable data structures, since ClojureScript will polyfill them for older JavaScript versions through its use of the Google Closure compiler. You can also use the mutable data structures in the Google Closure lib, such as AvlTree, Heap, Pool, Queue, Trie, etc.

You'll never be short on data structures in ClojureScript.

;; Simply re-uses the ones offered by JavaScript


;;; JavaScript Object
#js{:a 1 :b 2}
;;=> #js {:a 1, :b 2}

;; Read object properties by prepending `.-` to their name
(.-a #js{:a 1 :b 2})
;;=> 1

;; Can mutate using `set!`
(def my-obj #js{:a 1 :b 2})
my-obj
;;=> #js {:a 1, :b 2}

(set! (.-a my-obj) "Mutated!")
my-obj
;;=> #js {:a "Mutated!", :b 2}


;;; Arrays
(make-array 3)
;;=> #js [nil nil nil]
#js["first-array-element" "second-array-element"]
;;=> #js ["first-array-element" "second-array-element"]

;; Can mutate using `aset`
(def my-array #js[1 2 3])
my-array
;;=> #js [1 2 3]

(aset my-array 1 "Mutated!")
my-array
;;=> #js [1 "Mutated!" 3]

;; Can access indexed value using aget
(aget my-array 1)
;;=> "Mutated!"


;;; Map
(js/Map.)
;;=> #object[Map [object Map]]

;; Can set and get using host interop
(def my-mutable-map (js/Map.))
(.set my-mutable-map "a" 1)
(.get my-mutable-map "a")
;;=> 1


;;; Set
(js/Set.)
;;=> #object[Set [object Set]]

;; Can add and has using host interop
(def my-mutable-set (js/Set.))
(-> my-mutable-set (.add 1) (.add 2))
(.has my-mutable-set 2)
;;=> true

Controlled mutable values

Every value container in ClojureScript is immutable, except for Atoms and anything coming from the Google Closure library, the JavaScript host, or third-party libraries you have imported yourself. An Atom allows very explicit use of a mutable value, and supports adding validation on change, as well as watch events that trigger on mutation of the value. This lets you put proper controls in place so the mutability doesn't cause bugs in your code.

;;; Mutable values with `atom`
(def a-mutable-val (atom nil))
@a-mutable-val ; Get the current value with `@` prefix
;;=> nil
(reset! a-mutable-val "Not nil!")
@a-mutable-val
;;=> "Not nil!"

;; Atoms are the only mutable value objects, apart for JavaScript host
;; data-structures/objects seen in the previous section.


;;; Watches, trigger event callback every time value changes
(add-watch a-mutable-val
           :print-change-details
           (fn[key a old-val new-val]
             (println (gstr/format "Atom changed from %s to %s"
                                   old-val new-val))))

(reset! a-mutable-val "I damn well changed its value again!")
;;=> Atom changed from Not nil! to I damn well changed its value again!


;;; Validators, validate new values
(set-validator! a-mutable-val
                (fn[new-val]
                  (not= 0 new-val)))

(try
  (reset! a-mutable-val 0)
  (catch js/Error e
    (println e)))
;;=> #object[Error Error: Validator rejected reference state]

Type extension

Don't think ClojureScript has enough features out of the box for you? Well, you're in luck, because it gives you the ability to extend any type, be it ClojureScript or host JavaScript, quite easily and simply.

;; You can extend any ClojureScript and JavaScript object at any time

;; Let's add a reverse method to JavaScript Strings
;; First we declare the protocol, remember, interface first!
(defprotocol Reversable
  (reverse [this]))

;; Now we implement it for JavaScript strings
(extend-type string
  Reversable
  (reverse [this]
    (-> this (.split "") (.reverse) (.join ""))))

(reverse "cool")
;;=> "looc"


;; Let's extend all JavaScript objects so they can be destructured
(extend-type object
  ILookup
  (-lookup 
    ([this key] 
     (goog.object/get this (name key)))
    ([this key not-found] 
     (goog.object/get this (name key) not-found))))

(let [{:keys [a b]} #js{:a 10 :b 20}]
  (+ a b))
;;=> 30

Thanks to Mike Fikes for the destructuring extension.

Syntax extension

It's also possible to extend the ClojureScript syntax using macros and tagged literals. With macros, completely new syntax and semantics can be created and invoked through the macro call. You don't need to wait for the next version of ClojureScript to add a missing language feature, but be careful: macros are for advanced users, a powerful weapon that can easily be misused, so avoid them unless necessary. Tagged literals are a simple way to add new literal notations for constructing objects or data structures.

;;; Macros
;; Wish you could use infix notation for binary functions?
(defmacro infix
  [& code]
  `(~(second code)
    ~(first code)
    ~(last code)))

(infix 10 + 20)
;;=> 30


;;; Tagged literals
;; Wish there was a way to create JavaScript mutable Maps with literal syntax?
;; In a data_readers.cljc file, you add custom literal tags
{mut/map custom-tags/make-js-map}

;; Which calls the function associated with it at read time
(ns custom-tags)
(defn make-js-map
  [m]
  (js/Map.
   (clj->js (into [] m))))

;; Letting you then use a literal syntax to create anything
(let [js-map #mut/map{:a 1 :b 2}]
  (println (.get js-map "a")))
;;=> 1

Many more

ClojureScript supports many more features than the ones I highlighted here. It's a pretty complete language; you probably won't find anything missing, and if you do, its extension capabilities will let you add it yourself, without waiting for a new version to come out.

So consider this list incomplete, but here's some of the added features I didn't cover:

  • Number literals: decimal, exponent, binary, octal, arbitrary base, etc.
  • Regex literal
  • EDN: JSON for ClojureScript
  • shebang support
  • Block comments
  • Default arguments
  • Asynchronous programming using communicating sequential processes, same as in Go.
  • Other literals: uuid, instant, etc.
  • Metadata: Data about your data
  • Reader conditionals: Allows code to work simultaneously with Clojure and ClojureScript
  • Tail Call optimized recursion
  • Custom types
  • React support
  • Spec: Declarative Data Specification with auto validation and generation. Can be used for generative testing, and powerful runtime validation
  • Transients: Mutable escape hatch for immutable data structures when performance is required.
  • Transducers: Efficient and composable data transformation
  • Zippers: Navigational iteration through Tree and Graph data structures
  • Output source minification
  • Output source dead code elimination
  • Polyfill of ECMAScript 6+ to ECMAScript 5 or 3, so you can use newer APIs in older JavaScript environments
  • And so much more...

Saturday, March 3, 2018

My observations from evangelizing Clojure at work for the last year.

Learning Clojure is like going from driving a car on the right side of the road, to driving a car on the left side of the road.

If you can't figure it out in under a month, you're either a hopeless driver, or just didn't really care to try to adapt to the slightly unfamiliar.

I've evangelized Clojure at work, and I've seen many people having to learn and try it in different contexts.

There are people who end up loving it just as much as me, and people who end up hating it. In all cases, the lovers were willing to break their habits and figure out how to adapt to Clojure's syntax and paradigms. They were eager and happy to learn something different. The haters were not; they felt compelled and forced to drive on a side of the road they weren't comfortable with. It scared them, caused them additional stress, and clearly they didn't care to be there, as if they had been dragged along on a vacation to a place they had no interest in, when they would really have preferred to stay home, where things are easy and familiar.

The interesting part, though: neither group was slowed down, or failed to deliver their tasks and stories, because of Clojure. The overhead of having to learn Clojure is minimal. It's just that it's not painless; it's uncomfortable, and you need to think harder than you're used to, that's all.

So even though both groups end up successfully using Clojure professionally to deliver on tasks and stories, the ones who never wanted to be there in the first place don't want to keep at it; they associate stress and difficulty with it. Meanwhile, the ones who enjoyed learning and being challenged want to use it everywhere and can't go back to Java.

I've learned that if you give a lot of initial support to the people who aren't open to being pushed out of their comfort zone, especially helping them set up a friendly IDE, showing them how to use the REPL first and foremost, making sure they do use it (if they don't, remind them, show them again, and address why they aren't), and guiding them a little through their first month, then you can push them over to the group that loves Clojure, or at least to a neutral place.

I say neutral because the latter group generally tends to be less passionate about programming in general. This is a career for them, not a hobby. They may never fall in love with any language; there's just the one they're familiar and comfortable with, and the ones they're not.

Now, one positive of adopting Clojure: if you are in a medium to large org, the passionate programmers on other teams, the ones curious to learn new things, will be easy targets to recruit to yours.

Initially, our manager was worried it would hurt our ability to find talent, but it has actually become our biggest selling point, allowing us to attract really motivated, passionate, curious and knowledgeable devs from other teams to move to ours, because they wanted the opportunity to learn and use Clojure.

Now, I've seen people in the latter group adopt Scala or Kotlin afterwards. While Europe might be too far outside their comfort zone, a trip down to Disneyland or Las Vegas, where things are mostly the same and you still feel safe and at home while being only a little outside the familiar, might still be enjoyable to them.

Also, languages like Scala and Kotlin can be treated like new versions of Java to some extent. They do make some of the more annoying parts of Java easier. So they're more like a brand new car in a different category, like going to an SUV from a sedan.

My observation is that for a very long time they do not learn about or use the parts of Kotlin and Scala that are more different, but because that learning curve is so linear and smooth, eventually they do learn. So again, the learning was made easier and safer by being amortized over a really long period of time, six months to two years.

So I've observed that people who go through one month of Clojure learn as many new paradigms as in six months of Scala or Kotlin.

Not everyone is willing to push themselves hard from the start. Those who are will pick up Clojure in no time and love it. Those who aren't will hate it, even though they managed to use it without issue. For the latter, you need to offer really good coaching and support to ease their pain. Or you're better off with an FP language that has a more gradual curve, like Kotlin or Scala, or even Java 8's FP features.

Monday, January 30, 2017

When to use Elixir over Clojure?

Elixir and Clojure are very similar languages, almost akin to C# and Java, and you won't have much difficulty going from one to the other.

  • The syntax will change.
  • The libraries will not be the same.
  • The target VM will be different.

Apart from that, most things will be pretty similar.

So when should you choose to use Elixir over Clojure?

Whenever you want to use BEAM, or to put it in more detail:

  • When you need highly distributed parallel processes with fault tolerance and high availability.
  • When you want soft real-time guarantees.

When should you choose Clojure over Elixir?

  • When you need to use the JVM for compatibility or library reasons.
  • When single-process, single-machine performance matters.
  • When you want a coherent server and front-end language ecosystem: Clojure/ClojureScript
  • When you want a functional language instead of Java.
  • When you want a strong dynamic language instead of Java.
  • When you want a very interactive REPL workflow.
  • When you want to make heavy use of macros, and maybe create a lot of custom DSLs.

There's one case where the choice will be harder, and that's when you need distributed processes but don't need that many of them. In this case, I think you should pick the one whose syntax and libraries you prefer, since both can adequately deliver on that use case.

Friday, August 26, 2016

Common Naming Road-Blocks

As I program, I always encounter naming issues. What should I call this, and that? Sometimes, I spend more time thinking of a name than implementing. To help me let go of my naming paralysis, I've come up with some naming patterns that seem to satisfy my endless pedantry. This way, I can simply refer to these rules, choose a name, and move on with my life.

You have refactored a method, but you need to keep the old one around. What do you call the new method?

This always happens to me. I refactor a method, but my new version may have changed the behavior slightly, and things still depend on the old behavior. I want all new code to depend on the new one instead, but until everything is refactored, I need to keep the old one around.

SOLUTION

Append V followed by a version number to the new method's name.

If you have method(), you simply call the new one methodV2(). An even newer one would be methodV3(). Don't rename the first one to methodV1(); that's just more confusing and will force you to change all the call sites.
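As a sketch of the pattern (RecordParser and parseRecord are made-up names, not from any real codebase):

```java
// Hypothetical sketch of the V-suffix pattern.
public class RecordParser {
    // Original method: legacy callers still depend on its lenient behavior.
    public String parseRecord(String raw) {
        return raw.trim();
    }

    // New version with slightly changed behavior: rejects blank records.
    // All new code should call this one.
    public String parseRecordV2(String raw) {
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty record");
        }
        return trimmed;
    }
}
```

Both versions coexist until every caller of parseRecord has been migrated over.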

Pros

  1. It does not require you to change the name of the original, saving you some refactoring work on the callers' side.
  2. It's clear which method is the newest one.
  3. It adds minimal amount of extra characters to the method names, keeping it short, sweet and to the point.
  4. The V clearly indicates that it does the same thing as the older ones, but is just the newer version.

Cons

  1. If the old method ever becomes unused and you're able to get rid of it, you'll be stuck with all your methods having version numbers. Which raises the question: should I decrease the numbering of all the methods? I would say no, don't bother. And if you ever create an even newer one, continue numbering from where you left off, even though there's now room at the beginning.

You have a method that does more than one thing. How can the name reflect all the effects the method has?

I come upon this one occasionally. You've got a setter method, for example, but it actually sets three fields from an object you pass it. You could say it's just bad design, that a method should not do more than one thing, but the truth is, these methods are often very convenient, and sometimes you have to write them for performance.

SOLUTION

Name the method based on its primary effect, and append AndMore to it. Then document what the AndMore is with a comment.

So if you have setFirstNameAndLastNameAndAgeAndEthnicity(User user), just name it setFirstNameAndMore(User user). Don't forget to document the AndMore in a comment!
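A minimal sketch of the pattern, assuming a simple User class (the constructors and getters here are just scaffolding for the example):

```java
// Hypothetical sketch of the AndMore naming pattern.
public class User {
    private String firstName;
    private String lastName;
    private int age;
    private String ethnicity;

    public User() {}

    public User(String firstName, String lastName, int age, String ethnicity) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
        this.ethnicity = ethnicity;
    }

    // Primary effect: sets firstName.
    // AndMore: also sets lastName, age, and ethnicity from the given user.
    public void setFirstNameAndMore(User user) {
        this.firstName = user.firstName;
        this.lastName = user.lastName;
        this.age = user.age;
        this.ethnicity = user.ethnicity;
    }

    public String getFirstName() { return firstName; }
    public int getAge() { return age; }
}
```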

Pros

  1. It keeps the name short.
  2. It makes it clear what the primary effect and intent is.
  3. It's very clear that the method has more effects, and gives a way for them to be further detailed if the caller cares.

Cons

  1. If you want to know what the other effects are, you need to read the doc.
  2. Maybe you shouldn't have a method with more than one intended effect; double-check that there's no way for you to split it out.

You have a single implementation of an interface, what should you name the interface and its single implementation?

You've created an Interface and you feel good about it. You then create a first implementation of it. Slowly you realize, your use case doesn't need any other implementation quite yet. What should you call the Interface and its Implementation?

SOLUTION

IInterface and Interface or Interface and DefaultInterface

There are two ways to go here. I prefer the first, where you prefix the interface with a capital I and call the implementation the same name without the I. I like this because when you auto-complete, typing I easily finds all interfaces, and generally you want to code against the interface, not the implementation. The Java convention, however, is to not put an I on your interfaces. If you want to follow convention, use the second way: call the interface as-is, Interface, and prefix Default to the implementation: DefaultInterface.

If you have more than one implementation, but one is more commonly used, I believe it should also be named using the above scheme, while the other implementations should describe in their names what makes them different from the common one.

IDog Dog BigDog SmallDog HandicappedDog

Only if all implementations are equally common should every one of them have a name describing what sets it apart.

IAnimal Dog Cat Horse
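In code, the I-prefix variant might look like this (a hypothetical sketch; the dog classes are just illustrative names):

```java
// Hypothetical sketch of the I-prefix naming scheme.
public interface IDog {
    String bark();
}

// The common, canonical implementation simply drops the I.
class Dog implements IDog {
    public String bark() { return "Woof"; }
}

// Less common implementations describe what sets them apart.
class SmallDog implements IDog {
    public String bark() { return "Yip"; }
}
```

Callers declare variables as IDog, so swapping in SmallDog requires no changes at the call sites.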

Pros

  1. Descriptive when it needs to be, undescriptive when it doesn't.
  2. Becomes a known convention for which implementation is the original canonical one.

Cons

  1. Some people find that having an interface when you only have one implementation is unnecessary code complexity. In that case, you can argue you should only have an interface when you have two or more implementations, and then each implementation name should always be more descriptive than the interface name.

This is an always evolving article, as I find more and more patterns, or improve on the ones already existing, I'll update this article to reflect that.

Sunday, June 12, 2016

Inversion of Control [IoC] Vs. Dependency Injection [DI]

Inversion of Control [IoC] and Dependency Injection [DI]. They're related, because DI is a pattern that applies the IoC principle, but they're not the same thing.

Inversion of Control [IoC]

This concept describes application control flow. Whenever the flow of control is inverted, meaning the responsibility for the order of execution of some code is relegated to a parent component, you are in the presence of an inversion of control.

An event-driven system is an example of IoC. Your methods are triggered by events, but you are not in control of the order of these events or when they trigger, meaning you have lost control of the flow of your methods and delegated it to another component.

It doesn't have to be fancy, here's an example:

public class Plugin {
  public void onAction() {
    System.out.println("Action occurred.");
  }
}

Now imagine you gave this class to some other component:

public void main() {
  Plugin plugin = new Plugin();
  MasterController master = new MasterController(plugin);
}

In this example, you don't know when onAction will execute; it's not under your control. The Plugin class is no longer in charge of its full flow; the control is now in the hands of MasterController.
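For completeness, here's one hypothetical way MasterController could look; a real one might instead drive onAction from a timer, an event queue, or user input:

```java
import java.util.ArrayList;
import java.util.List;

// Plugin duplicated here so the sketch is self-contained.
class Plugin {
    public void onAction() {
        System.out.println("Action occurred.");
    }
}

// Hypothetical controller: it, not the plugin, decides when onAction runs.
public class MasterController {
    private final List<Plugin> plugins = new ArrayList<>();

    public MasterController(Plugin plugin) {
        plugins.add(plugin);
    }

    // "Don't call us, we'll call you": the controller fires events
    // on its own schedule, and registered plugins merely react.
    public void fireAction() {
        for (Plugin p : plugins) {
            p.onAction();
        }
    }
}
```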

As in my example, IoC became prominent mostly as a way to produce plugin architectures, where functionality can be extended at run-time by dynamically loading new code that plugs into a framework at known extension points. Similarly, frameworks often rely completely on IoC; in fact, a lot of people, including myself, like to distinguish a library from a framework on this basis. A library has you calling into it, when you want, but a framework calls you, when it wants. This is often known as the Hollywood principle, and it is the basis of Inversion of Control.

Dependency Injection [DI]

This concept is even simpler to understand. Whenever the things a component depends on are passed to it, instead of having it acquire or create them, you've got DI.

public class Guy {
  private BestFriend bestFriend;

  public Guy(BestFriend bestFriend) {
    this.bestFriend = bestFriend;
  }

  public void makeImportantDecision(String about) {
    if(bestFriend.thinksItsGoodIdea(about)) {
      ...
    }
  }
}

As you can see, Guy depends on BestFriend, but he does not create an instance of BestFriend or fetch one from anywhere; instead, he expects it to be passed in.

It doesn't matter how the dependencies are passed in: it could be through the constructor, through setter methods, or by any other means.

Note that the point is for the class to have all direct dependencies passed in to it. So it's fine to have a Factory passed in, and then use the Factory to acquire instances of something else and use those, but it's not OK for the class to use a static factory method, since that would be a dependency that's not passed in and would go against DI.
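To make that concrete, here's a hypothetical sketch (FriendFactory is an invented interface, not something from a real library):

```java
// Hypothetical sketch: an injected factory still respects DI.
interface FriendFactory {
    BestFriend newFriend();
}

class BestFriend {
    public boolean thinksItsGoodIdea(String about) {
        return !about.isEmpty();
    }
}

public class Guy {
    private final FriendFactory factory;

    // The factory itself is a passed-in dependency.
    public Guy(FriendFactory factory) {
        this.factory = factory;
    }

    public boolean makeImportantDecision(String about) {
        // Fine: instances are acquired through the injected factory.
        BestFriend friend = factory.newFriend();
        return friend.thinksItsGoodIdea(about);
        // Not fine would be a static call like BestFriend.getInstance(),
        // a dependency that was never passed in.
    }
}
```

Since FriendFactory has a single abstract method, a caller can wire Guy up with a lambda: new Guy(() -> new BestFriend()).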

Why are they often mistaken for one another?

The confusion between IoC and DI stems from the fact that Dependency Injection is a form of Inversion of Control. Think how a square is a rectangle, but a rectangle is not a square. It's the same thing here, DI is IoC, but IoC is not DI.

Let's revisit the Guy example. Does Guy have any say as to who his best friend is? Nope. Guy has no control over which friend he's going to ask for advice; the BestFriend is going to be passed in, and something else gets to decide what concrete instance to pass. Guy has lost control over what code to call into for the thinksItsGoodIdea method. A parent component is now in charge, which means the control was inverted.

In frameworks like Spring, DI is used extensively as a method for implementing a plugin architecture and a form of IoC. You can choose between different components and wire them together as you wish inside Spring. Spring becomes responsible for a lot of the control-flow decisions, allowing you to plug components in and out.

Additional Readings

  1. https://github.com/google/guice/wiki/Motivation
  2. http://stackoverflow.com/questions/26378824/sample-ioc-that-is-not-di
  3. http://martinfowler.com/bliki/InversionOfControl.html

Saturday, June 11, 2016

Artificial Intelligence - Thoughts in Philosophy

Recently I've been reading more and more from prominent scientists and engineers who warn against the dangers of artificial intelligence. It got me wondering...

What is intelligence anyways?

When I dissect this to its minimal form, I feel like intelligence is simply the level at which one is able to change the physical world into a form that satisfies a certain need. Or at least to be able to conceive of a way to do so.

With this definition, we, as humans, definitely have a high level of intelligence, proven by how much we've managed to re-shape our world to our needs. Even a paralyzed man, with no means to physically alter the world, can still, if his mind is intact, conceive of ways to do so, Stephen Hawking is a good example.

So it could be said that intelligence is the level at which one can conceive of ways to physically alter the physical constructs of his environment.

Well then, could a computer ever be intelligent?

There's a missing piece of the puzzle here. You see, being able to conceive of ways to change your surroundings implies that you also have motives to do so. In fact, there would be no point in having this capacity, as without a need to fulfill, one would never exercise it.

You need an objective to decide what change must be made.

Without an objective, you'd be at best randomly conceiving ways to change things. There are things in our universe which seem to exhibit such properties: the wind, the planet's core, thunder, fire, etc. We tend to regard these as non-intelligent phenomena.

This means my definition is incomplete. Intelligence is the level at which one can conceive of ways to physically alter the physical constructs of his environment in the way he wants.

To want anything, one must have needs. Thus, you cannot be fully intelligent if you don't have needs. Similarly, you cannot be fully intelligent without the ability to conceive of ways to change the world.

Is that all?

Actually, no. I mentioned earlier that you need not have the capacity to alter the world, simply to conceive of ways to do so, but I don't think that's totally correct. To truly qualify as intelligent, you'd need to be observably intelligent. Maybe plants have a hundred fold our capacity to conceive of ways to change the world, and do possess needs of their own, but alas, with their limited ability to apply those conceptions, one would be hard pressed to ever qualify them as intelligent.

Finally I say, Intelligence is the level at which one can observably alter the physical constructs of his environment as to satisfy his needs.

Should we be worried about intelligent machines?

To create an intelligent machine, one would need to:

  1. Create a machine with needs.
  2. Create a machine that conceives of ways to fulfill needs.
  3. Create a machine that can physically apply preconceived ways to alter the environment.

With that perspective:

  • We should be very careful with #1.

The reason I say that is, if a machine had needs, chances are they would conflict with our own, and that's exactly when you get into a dangerous spot. Really, we should avoid that one completely; it does not even provide us with any benefit.

  • We should be very careful combining #2 and #3.

This one is more subtle. Ideally, we'd want to tell the machine what we need, and it would go off, think of a way to meet that need, and execute on it all by itself. That's when you get into those sci-fi movie tropes of the machine that enslaves all humans because it figured out that was the best way to create world peace. So I'd say we'd want to keep these two separate, so that we can audit all conceived ideas first, before giving them to another machine to execute.

Conclusion

In the end, I think that the scientists and engineers who are warning us of the potential dangers of AI are mostly right. If we ever do create a machine with full observable intelligence, it would definitely have the potential to put us all at risk. Having said that, I think there's great opportunity for improving our lives if we can build independent machines that each have partial intelligence, and thus it is probably worth keeping research going towards these goals.