Transducer: Composition, Abstraction, Performance
Higher-order functions like map, fold, and filter are indispensable in any
functional program. With their flexibility, they are the tool of choice for
operations on collections of all kinds. However, their scope of application is
not limited to classic lists or vectors. In this article, we examine more
fundamental properties of these operations and take a particular look at
so-called transducers in the Clojure programming language.
Additionally, we will see how we can achieve not only very good reusability but also higher performance with the help of meaningful abstraction.
Everything is a fold
Those who have engaged with functional programming in the past may already be
familiar with this statement. If you haven't had the opportunity yet (or want to
refresh your memory), let's first look at a variant of the usual definitions of
map and filter. (Note: All examples are written in Clojure, so from this
point on we'll call fold by its Clojure name reduce):
(defn map
  "Takes a function f and applies it to every element of xs."
  [f xs]
  (if (empty? xs)
    xs
    (cons (f (first xs)) (map f (rest xs)))))
(defn filter
  "Takes a predicate pred and returns a list with all elements of xs that
  satisfy pred."
  [pred xs]
  (if (empty? xs)
    xs
    (if (pred (first xs))
      (cons (first xs) (filter pred (rest xs)))
      (filter pred (rest xs)))))
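Used on a small example, these definitions behave as expected:
(map inc '(1 2 3))        ;; => (2 3 4)
(filter even? '(1 2 3 4)) ;; => (2 4)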
Definitions like these can be found in many standard libraries. As the
heading of this section already reveals, both functions can also be defined via
fold (or rather reduce). The signature of reduce, expressed in Haskell
notation, looks like this: (b -> a -> b) -> b -> [a] -> b (specialized for
lists in this case).
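As a quick reminder of how reduce behaves with an explicit initial value:
(reduce + 0 [1 2 3 4])    ;; => 10
(reduce conj [] '(1 2 3)) ;; => [1 2 3]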
(defn mapping
  "Takes a function f and returns a function. The returned function takes a
  collection acc and an element x and appends x applied to f to the end of
  acc."
  [f]
  (fn [acc x] (conj acc (f x))))
(defn map
  "Takes a function f and applies it to every element of xs."
  [f xs]
  (reduce (mapping f) [] xs))
(defn filtering
  "Takes a predicate pred and returns a function. The returned function takes
  a collection acc and an element x and appends x to the end of acc if it
  satisfies pred."
  [pred]
  (fn [acc x] (if (pred x) (conj acc x) acc)))
(defn filter
  "Takes a predicate pred and returns a list with all elements of xs that
  satisfy pred."
  [pred xs]
  (reduce (filtering pred) [] xs))
If our programs consisted exclusively of list processing, we could stop here,
satisfied. However, something catches the eye: apart from the call to conj,
the definitions of mapping and filtering tell us nothing about them working
"only" on collections!
From Collections to Processes
We have established that the connection to collections in our mapping and
filtering functions exists only through conj. Now let's go one step further
and see what happens when we abstract over conj. For this, it's helpful to
think not of lists but of sequences of steps. mapping and filtering take on
the role of "process modifications": they receive a step and deliver a modified
version of this step. So let's define a version of our functions parameterized
over this "step":
(defn mapping
  [f]
  (fn [step]
    (fn [acc x] (step acc (f x)))))
(defn map
  [f xs]
  (reduce ((mapping f) conj) [] xs))
(defn filtering
  [pred]
  (fn [step]
    (fn [acc x] (if (pred x) (step acc x) acc))))
(defn filter
  [pred xs]
  (reduce ((filtering pred) conj) [] xs))
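Note that the step function now decides what is being built. A small sketch (the values are purely for illustration): with conj as the step we build a vector, while with + we sum the elements directly, without any intermediate collection.
(reduce ((mapping inc) conj) [] [1 2 3]) ;; => [2 3 4]
(reduce ((mapping inc) +)    0  [1 2 3]) ;; => 9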
Looking at the resulting definitions of mapping and filtering, we notice
that the data structure on which they now operate is determined by the step
parameter. This means we are no longer dependent on classic collections, but can
(as we will see later) work with more or less arbitrary data structures and
process them using mapping and filtering. However, one consideration regarding
this step is still outstanding: while lists and vectors make it easy to think
about a "beginning" and an "end," which we rely on in map and filter, it's
different with data structures like streams or signals. Instead of relying on
the data structure, we take a different path:
- Our transformers (mapping, filtering) should have a concept of "beginning" and "end" by themselves.
- We expect from our step function not only that it is binary, but also that, when called with no or one argument, it produces a "null" value. Examples in Clojure would be (conj) => [], (conj [1]) => [1] or (+) => 0, (+ 1) => 1, etc.
We'll only address point 1 here. First, we'll propose an implementation of mapping and filtering. In the code example below, we make use of Clojure's syntax for "arity overloading." This means we can offer multiple implementations in one function for different numbers of arguments (more on this at Clojure - Functional Programming). Clojure will choose the corresponding implementation depending on the number of arguments passed.
(defn mapping
  [f]
  (fn [step]
    (fn
      ([] (step)) ; 1.
      ([acc] (step acc)) ; 2.
      ([acc x] (step acc (f x))) ; 3.
      )))
(defn filtering
  [pred]
  (fn [step]
    (fn
      ([] (step))
      ([acc] (step acc))
      ([acc x] (if (pred x)
        (step acc x)
        acc)))))
At first glance, this may look somewhat unintuitive, but it's quickly explained
using the example of mapping. The numbering here corresponds to the numbers in
the code example above.
1. This case covers the "beginning" of a process. Nothing has been calculated yet, and there is no previous result. In this case, we want a "neutral" element from our step function with which we can kick off the calculation.
2. Here we signal the "end" of a process. No more elements are coming, so we take the last step with the already available "result."
3. Finally, there's the case that processes the current result and an element together.
 
These functions are now parameterized to such an extent that we can express somewhat arbitrary processes with them and, probably more importantly, execute (or compose) multiple such process modifications in sequence! What does this look like in reality?
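As a quick check, the three arities can be exercised directly (the name inc-step is chosen purely for illustration):
(def inc-step ((mapping inc) conj))
(inc-step)          ;; => []       ; 1. "beginning": the neutral element from conj
(inc-step [2 3])    ;; => [2 3]    ; 2. "end": completion, conj simply returns acc
(inc-step [2 3] 3)  ;; => [2 3 4]  ; 3. step: apply inc, then conj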
Process Modifiers
In the following, we give two examples of how all this helps us in the real world and when we might prefer this approach to regular chaining of list operations.
In both examples, we deal with the following problem: The semester is over and
the professor wants to know how good the average performance of their master's
students was in the exercise sessions. Using
Clojure-Spec, we define some sample data. The
first four lines define the "shape" of our data. For example, ::title is a
value that satisfies the string?-predicate; in the fifth line, we specify
::exercise as a map with the keys defined above. The output of sample has
been prettified here a bit; the actual result would probably look somewhat more
obviously random. Those who want to learn more about Spec can do so in an older
blog article (German language
only).
(ns my.namespace
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.gen.alpha :as sgen]))
(s/def ::title string?)
(s/def ::student string?)
(s/def ::points (set (range 31)))
(s/def ::degree #{::msc ::bsc})
(s/def ::exercise (s/keys :req [::title ::student ::points ::degree]))
;; `sample` takes a generator for our `spec` and returns some random data that
;; matches this spec.
(sgen/sample (s/gen ::exercise))
;; =>
;; [{::student "Marco"
;;   ::title "Exercise 1"
;;   ::points 29
;;   ::degree ::msc}
;;  {::student "Mathias"
;;   ::title "Exercise 1"
;;   ::points 28
;;   ::degree ::bsc}
;;  ...]
Back to our task. This could be solved with regular list functions, for example, like this:
(defn sum-of-msc
  [exercises]
  (reduce + 0 (->> exercises
                   (filter #(= ::msc (::degree %)))
                   (map ::points))))
Applied to a list of exercises, it gives us the correct result. However, there
is a problem: each call to filter and map produces a new intermediate
sequence! With large amounts of data, this can, as we'll see shortly, lead to
real problems.
Next, we implement the same functionality expressed via our newly defined operators.
The first step will be a slightly modified version of reduce named
reduce-with. This works very similarly to reduce, except that in addition to
the reduction function (here step), it expects a parameter xf, which
represents a composition of our functions.
(defn reduce-with
  "Takes a function composed of process modifications xf, a step function, an 
  initial value and a sequence of operations. Applies xf with a step to every x
  in xs."
  [xf step init xs]
  (let [f (xf step)]
    (f (reduce f init xs))))
Inside the function, we first specialize the composition function xf with the
step parameter (itself a function) and bind the result to f. This f serves as
the reduction function for the usual call to reduce. After the reduction, f is
called once more on the result (its single-argument completion arity) to
produce the final value. But what must xf look like? This function, quite
analogously to sum-of-msc above, is also a composition of filtering and
mapping. We call it xform here (a convention in the Clojure world).
;; Compose two process modifications into one.
(def xform
  (comp (filtering #(= ::msc (::degree %)))
        (mapping ::points)))
You may rightly wonder why the order of the calls to filtering and mapping
doesn't change, since regular composition (as with Clojure's comp function) is
evaluated "from right to left," while the threading in our sum-of-msc happens
"from left to right" (or top to bottom). A complete answer would exceed the
scope of this article. For the moment, let's just note that comp still nests
"from right to left": (mapping ::points) is applied to the step first, and the
filtering wraps the result, so for each element the filter runs before the
mapping and the data flows "from left to right," just like in the threaded
version. You can easily verify this by reducing the composition of the
functions by hand once, as sketched below.
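Sketched out with placeholder names p, f, and step, the hand reduction looks roughly like this:
;; ((comp (filtering p) (mapping f)) step)
;; => ((filtering p) ((mapping f) step))   ; comp applies right-to-left, so
;;                                         ; (mapping f) receives step first
;; => (fn [acc x]
;;      (if (p x)
;;        (((mapping f) step) acc x)       ; filtering wraps the mapped step:
;;        acc))                            ; per element, first p, then f,
;;                                         ; then the original step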
(defn sum-of-msc-2
  [exercises]
  (reduce-with xform + 0 exercises))
This solution delivers the correct result as well, and it also reveals a major speedup:
(let [exercises (sgen/sample (s/gen ::exercise) 1000)]  ; generate a random sample of 1000 exercises
  (= (time (sum-of-msc exercises))
     (time (sum-of-msc-2 exercises))))
;; => true
;; "Elapsed time: 407.378092 msecs"
;; "Elapsed time: 2.144632 msecs"
Upon closer inspection, the reason becomes obvious: Instead of computing a new list for each step, the processes are executed sequentially for each element. This saves us long intermediate results and gains us throughput.
Note: Obviously, the timing varies a lot with the actual kind of inputs we
generate. Also, time is not really intended to be used as a benchmarking tool.
However, the tendency that the transducer version is vastly faster than the
"standard" implementation still stands, even over repeated observations.
This concept is so practical that it is now part of the Clojure standard library
under the name transducer. Many functions are already set up for use as
transducers (including the well-known map, filter, mapcat, …). Our
function reduce-with is defined there as transduce.
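For illustration, the computation from above could also be written with the built-in transducer arities directly (a sketch, assuming exercises is bound to sample data as before; we qualify filter and map with clojure.core/ because we shadowed those names with our own definitions earlier):
(transduce (comp (clojure.core/filter #(= ::msc (::degree %)))
                 (clojure.core/map ::points))
           + 0 exercises)
;; equivalent to (reduce-with xform + 0 exercises)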
More Than Lists
Finally, the promised second example. The problem is the same, but this time we
don‘t want to use lists but channels (via clojure.core.async). Here we write
all values to a channel c and observe how the same xform process
modification is applicable to these very channels. We also now use the built-in
function transduce, which takes over the work of sum-of-msc-2.
(ns my.namespace
  (:require
    [clojure.core.async :as async]
    [clojure.spec.alpha :as s]
    [clojure.spec.gen.alpha :as sgen]))
(let [exercises (sgen/sample (s/gen ::exercise))
      ;; Create a channel using the xform defined above.
      c (async/chan 1 xform)]
      ;; Put all our elements onto the channel.
    (async/go (async/onto-chan c exercises))
    (let [chan-res (loop [;; Read one element from the channel.
                          n (async/<!! c)
                          res 0]
                     (if-not n
                       res
                       (recur (async/<!! c)  ; Read the next element from the channel.
                              (+ res n)  ; Accumulate the result.
                              )))
          list-res (transduce xform + 0 exercises)]
      (= chan-res list-res)))
;; => true
Here we first put all elements into the channel (onto-chan). Then we read
step by step from this channel until no more elements are present in it (<!!).
As we see from the result, our xform does exactly the same here as it did with
lists. Since our xform doesn't care what data structure it operates on (as
long as it provides the infrastructure mentioned above), it can be reused
without any changes and without problems!
Conclusion
Transducers are the natural continuation of widely used list functions. In this article, we engaged with consistent abstraction and learned the concept of transducers along the way. Not only do transducers give us a powerful description of data transformations that is based purely on function composition and can be used in many ways, they also bring crucial performance improvements.