Funktionale Programmierung - Build Systems with Haskell

Welcome to the new year! We're starting 2014 with an article on shake, a build system written in Haskell.

Larger software projects (almost) all use a build system to automatically create a finished software product from source code. This includes, for example, compiling source files, linking object files, generating documentation, or assembling distribution archives.

This blog article provides an introduction to the shake build system, written in Haskell. This system has the advantage that dependencies between build artifacts can arise dynamically, i.e., while the build system is running. With make, probably the best-known build system, dependencies must be known before invoking the build system, which in practice often leads to limitations and problems.

We use shake at our company to compile our product Checkpad MED. Here shake plays to its full strengths, as an important component of the Checkpad infrastructure is code generation. Thanks to dynamic dependencies, it is possible to compile the program that generates the code, generate the code itself, and compile and link the generated code with a single invocation of the build system.

Neil Mitchell, the author of shake, developed a variant of the tool at Standard Chartered to compile very large software projects efficiently. More on this, along with a detailed description of shake's internal architecture, can be found in this article.

As an example, we develop a build system for a project written in C. To make the example more interesting, one of the .h files involved is generated, and the source code of the generator is itself part of the project. This roughly corresponds to the setup we have in the Checkpad project. For simplicity, we're using a C project here, since the dependencies that arise when compiling Haskell source code are significantly more complicated than those for C. shake itself makes no assumptions about the programming language(s) of the project being built. It is also possible, for example, to compile Java projects with shake. Such an endeavor is complicated, however, by the fact that it is hard to predict which .class files will be generated from a given .java file.

Let's start with a very simple helper function for extracting #include files from C source files. The function reads a .c file and returns the file names from all lines that begin with #include ", i.e., the headers included with double quotes rather than angle brackets.

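-- Imports assumed for all snippets in this article (the article does not list them)
import Development.Shake
import Development.Shake.FilePath
import qualified Data.List as List
import Data.Maybe (mapMaybe)
import System.Environment (getArgs)

cIncludes :: FilePath -> Action [FilePath]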
cIncludes x =
    do s <- readFile' x
       return $ mapMaybe parseInclude (lines s)
    where
      parseInclude line =
          do rest <- List.stripPrefix "#include \"" line
             return $ takeWhile (/= '"') rest
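
To see what the parsing step does in isolation, here is a minimal pure sketch that works on file contents instead of a file. The names quotedIncludes and sampleSource, as well as the sample C source, are made up for illustration and are not part of the article's build system.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- Pure counterpart of cIncludes: works on the file contents directly,
-- so neither Shake nor the file system is involved.
quotedIncludes :: String -> [FilePath]
quotedIncludes = mapMaybe parseInclude . lines
  where
    -- Nothing for lines that do not start with `#include "`.
    parseInclude line = do
      rest <- stripPrefix "#include \"" line
      return (takeWhile (/= '"') rest)

-- Hypothetical contents of a C file, used only for illustration.
sampleSource :: String
sampleSource = unlines
  [ "#include <stdio.h>"
  , "#include \"Hello.h\""
  , "  #include \"Indented.h\""
  , "#include \"Auto.h\""
  , "int main(void) { return 0; }"
  ]
```

Evaluating quotedIncludes sampleSource yields ["Hello.h", "Auto.h"]. The system header in angle brackets is skipped on purpose. The indented include is skipped as well, which shows a limitation of matching on the exact line prefix.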

The type Action is a monad defined in Shake in which all build actions are executed. An essential aspect of the Action monad is the tracking of dependencies. For example, the readFile' function used above is a wrapper around the standard library's readFile that, in addition to reading the file, records a dependency on it.

Now we come to the actual rules of our build system. Rules are defined in the Rules monad. Usually the Shake *> operator is used for this, which has the following type signature:

(*>) :: FilePattern -> (FilePath -> Action ()) -> Rules ()

We'll see this operator in action right away:

rules :: Rules ()
rules =
    do "*.o" *> \out ->
           do let c = replaceExtension out "c"
              -- cIncludes runs in Action, so bind its result before passing it to need
              includes <- cIncludes c
              need includes
              system' "gcc" ["-o", out, "-c", c]

The above rule describes how to create a .o file. The first argument of *> is the pattern that the output file must match; the second argument is a function that creates this output file. In the above rule, the .c file belonging to the .o file is compiled by gcc. Beforehand, dependencies on the header files referenced in the .c file are introduced dynamically using need. If you are familiar with make, you have surely noticed that something like this doesn't work directly in make, only with tricks, and these tricks often come at a price.
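
The mapping from output file to compiler invocation can be looked at on its own. The following sketch assumes nothing beyond the filepath library that ships with GHC. The helper names sourceFor and compileCommand are hypothetical and only mirror what the rule passes to system'.

```haskell
import System.FilePath (replaceExtension)

-- Source file that the "*.o" rule associates with an object file.
sourceFor :: FilePath -> FilePath
sourceFor out = replaceExtension out "c"

-- Program and argument list that the rule hands to system'.
compileCommand :: FilePath -> (String, [String])
compileCommand out = ("gcc", ["-o", out, "-c", sourceFor out])
```

For the output file Main.o, compileCommand returns ("gcc", ["-o", "Main.o", "-c", "Main.c"]). Because the function receives the concrete output name, one rule covers every object file in the project.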

Usually, dependencies on header files are satisfied trivially, since header files are typically not generated by the build system. For our example, however, we assume that there is a header file Auto.h that is generated by a program Codegen:

       "Auto.h" *> \out ->
           do need ["Codegen"]
              system' "./Codegen" [out]

Finally, there are two more rules for the two binaries Codegen and Main:

       "Main" *> \out -> buildBinary out ["Hello.c", "Main.c"]
       "Codegen" *> \out -> buildBinary out ["Codegen.c"]

Below is the buildBinary function that creates a binary from given .c files. First, a dependency on the corresponding .o files is specified using need, which causes the .c files to be compiled into .o files using our very first rule. Then the .o files are linked by invoking gcc.

buildBinary :: FilePath -> [FilePath] -> Action ()
buildBinary out cs =
    do let os = map (\c -> replaceExtension c "o") cs
       need os
       system' "gcc" (["-o", out] ++ os)
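
How a single request for Main unfolds under these rules can be traced with a small toy model. This is only a sketch of the idea of dynamic dependencies, not of how shake works internally: it has no caching, no change detection, and no cycle detection. The file contents in sources, including the header Hello.h, are assumptions made for illustration.

```haskell
import Data.List (isSuffixOf, stripPrefix)
import Data.Maybe (mapMaybe)

-- Hypothetical project contents: file name and source text.
sources :: [(FilePath, String)]
sources =
  [ ("Main.c",    "#include \"Hello.h\"\n#include \"Auto.h\"\nint main(void) { return 0; }\n")
  , ("Hello.c",   "#include \"Hello.h\"\n")
  , ("Hello.h",   "")
  , ("Codegen.c", "#include <stdio.h>\n")
  ]

-- Quoted includes of a source text, in the spirit of cIncludes.
includesOf :: String -> [FilePath]
includesOf = mapMaybe parse . lines
  where parse l = takeWhile (/= '"') <$> stripPrefix "#include \"" l

-- Direct dependencies of a target, mirroring the article's rules.
-- For .o files they are only known after looking into the .c file.
depsOf :: FilePath -> [FilePath]
depsOf "Main"    = ["Hello.o", "Main.o"]
depsOf "Codegen" = ["Codegen.o"]
depsOf "Auto.h"  = ["Codegen"]
depsOf t
  | ".o" `isSuffixOf` t =
      let c = take (length t - 2) t ++ ".c"
      in c : maybe [] includesOf (lookup c sources)
  | otherwise = []   -- plain source file: nothing to build

-- Depth-first traversal: dependencies first, each target only once.
buildOrder :: FilePath -> [FilePath]
buildOrder target = reverse (go [] target)
  where
    go done t
      | t `elem` done = done
      | otherwise     = t : foldl go done (depsOf t)
```

In this model, buildOrder "Main" places Codegen.o and Codegen before Auto.h, and Auto.h before Main.o. The dependency of Main.o on Auto.h is only discovered by scanning Main.c, which is exactly the kind of dependency that arises while the build is running.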

Note that, unlike make, for instance, shake doesn't use a language of its own to express dependencies and build rules. Instead, Haskell is simply used for both purposes! Two properties of Haskell are particularly helpful here: monads and user-defined operators like *>. In other words, build rules and dependencies in shake are specified in a small special-purpose language embedded in Haskell, i.e., a DSL. This language is special only insofar as it uses certain operators and monads; it is still Haskell. That is why it is easy to implement things such as the extraction of header files included via #include in "normal" Haskell.
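
The combination of a monad and a user-defined operator can be illustrated with a deliberately tiny toy DSL. This is not shake's implementation: the types Rule and RuleSet, the operator ==> and the function commandFor are invented for this sketch, and rules are matched by file suffix only.

```haskell
-- A toy embedded DSL, NOT Shake's implementation: rules are collected
-- in a hand-written writer-style monad, and (==>) is a user-defined operator.
import Data.List (isSuffixOf)

-- A rule pairs a file suffix with a function producing a command line.
data Rule = Rule String (FilePath -> String)

-- Collected rules together with a result value.
newtype RuleSet a = RuleSet ([Rule], a)

instance Functor RuleSet where
  fmap f (RuleSet (rs, a)) = RuleSet (rs, f a)

instance Applicative RuleSet where
  pure a = RuleSet ([], a)
  RuleSet (rs1, f) <*> RuleSet (rs2, a) = RuleSet (rs1 ++ rs2, f a)

instance Monad RuleSet where
  RuleSet (rs1, a) >>= k =
    let RuleSet (rs2, b) = k a in RuleSet (rs1 ++ rs2, b)

-- Register one rule; plays the role that *> plays in the article.
infix 1 ==>
(==>) :: String -> (FilePath -> String) -> RuleSet ()
suffix ==> action = RuleSet ([Rule suffix action], ())

-- Rules written in do-notation, just like ordinary Haskell code.
toyRules :: RuleSet ()
toyRules = do
  ".o" ==> \out -> "gcc -c -o " ++ out
  ".h" ==> \out -> "./Codegen " ++ out

-- Command of the first rule whose suffix matches the target, if any.
commandFor :: RuleSet () -> FilePath -> Maybe String
commandFor (RuleSet (rs, _)) target =
  case [act target | Rule suffix act <- rs, suffix `isSuffixOf` target] of
    (c : _) -> Just c
    []      -> Nothing
```

Here commandFor toyRules "Main.o" gives Just "gcc -c -o Main.o", while a target without a matching rule, such as "README", gives Nothing. Since the rule bodies are plain Haskell functions, any helper written in ordinary Haskell can be used inside them.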

Finally, here's the main function. It essentially specifies the targets to build by calling the want function and passes the set of rules just defined to shake. The targets are taken from the command line; if none are given, Main is built.

main :: IO ()
main =
    do args <- getArgs
       shake shakeOptions $
          do -- Fall back to "Main" when no targets are given on the command line
             let targets = if null args then ["Main"] else args
             rules
             want targets

If you want to try out our little build system, you can find an example project here. Have fun experimenting!

To all Haskell programmers among you who have built your projects with cabal so far: stick with cabal if its functionality has been sufficient for you. The use cases of shake and cabal are actually quite different. cabal builds Haskell projects and can invoke all the tools needed for that. As a developer, you have little work to do, although some things, such as code generation, are difficult to realize with cabal. shake, on the other hand, is a programming-language-independent framework for build systems, so you have to define all the rules yourself, which is not necessary with cabal. In return, shake gives you significantly more flexibility than cabal.

So, that's it for today. We have seen, at least to some extent, that shake makes it possible to create very powerful build systems. As long as your projects don't require complicated build rules, using shake is certainly overkill. But as soon as things get a bit more complicated, using it can be worthwhile. In the paper mentioned above, you can find many more details about shake, in particular a comparison with other build systems.