Build Systems with Haskell
Welcome to the New Year!
We'll start 2014 off right with the first article of the year, on shake, a build system written in Haskell.
Larger software projects (almost) all use a build system to automatically create a finished software product from source code. This includes, for example, compiling source files, linking object files, generating documentation, or assembling distribution archives.
This blog article provides an introduction to the shake build system, written in Haskell. This system has the advantage that dependencies between build artifacts can arise dynamically, i.e., while the build system is running. With make, probably the best-known build system, dependencies must be known before invoking the build system, which in practice often leads to limitations and problems.
We use shake at our company to compile our product Checkpad MED. Here shake plays to its full strengths, as an important component of the Checkpad infrastructure is code generation. Thanks to dynamic dependencies, it is possible to compile the program that generates the code, generate the code itself, and compile and link the generated code with a single invocation of the build system.
Neil Mitchell, the author of shake, developed a variant of the tool for use at Standard Chartered to compile very large software projects efficiently. Details on this, as well as on shake's internal architecture, can be found in this article.
As an example, we develop a build system for a project written in C. To make the example more interesting, one of the .h files involved is generated, and the source code of the generator is itself part of the project. This roughly corresponds to the setup we have in the Checkpad project. To keep things simple, we use a C project here, since the dependencies involved in compiling Haskell source code are significantly more complicated than those for C source code. shake itself makes no assumptions about the programming language(s) of the project being built. For example, it is also possible to compile Java projects with shake. However, this is made harder by the fact that, when compiling Java files, it is difficult to predict which .class files will be generated from a .java file.
Let's start with a very simple helper function for extracting #include files from C source files. The function reads a .c file and returns the file names from all lines beginning with #include ".
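import qualified Data.List as List
import Data.Maybe (mapMaybe)
import Development.Shake
import Development.Shake.FilePath (replaceExtension)
import System.Environment (getArgs)

-- Return the names of all files included via #include "..." in a C file.
cIncludes :: FilePath -> Action [FilePath]
cIncludes x =
    do s <- readFile' x   -- also records a dependency on x
       return $ mapMaybe parseInclude (lines s)
    where
      parseInclude line =
          do rest <- List.stripPrefix "#include \"" line
             return $ takeWhile (/= '"') rest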
The type Action is a monad defined in Shake in which all build actions are executed. An essential aspect of the Action monad is the tracking of dependencies. For example, the readFile' function used above is a wrapper around the readFile function from the standard library which, in addition to reading the file, also records a dependency on that file.
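To see what the parsing step yields without involving Shake at all, the pure part can be looked at in isolation. The following is a minimal sketch: parseInclude is the local function from above lifted to the top level, while includesOf is a hypothetical helper name introduced here only for illustration.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- Extract the file name from a line of the form: #include "Name.h"
-- Lines that do not start with exactly `#include "` yield Nothing.
parseInclude :: String -> Maybe FilePath
parseInclude line = do
    rest <- stripPrefix "#include \"" line
    return $ takeWhile (/= '"') rest

-- Hypothetical helper (not part of the article's build script):
-- all quoted includes found in the text of a source file.
includesOf :: String -> [FilePath]
includesOf = mapMaybe parseInclude . lines
```

Applied to a file containing the lines #include <stdio.h> and #include "Auto.h", includesOf returns only Auto.h. System headers in angle brackets are skipped, as are #include lines with leading whitespace. Headers that are themselves included by other headers are not followed either, so this simple approach only sees direct includes.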
Now we come to the actual rules of our build system. Rules are defined in the Rules monad. Usually the Shake operator *> is used for this, which has the following type signature:
(*>) :: FilePattern -> (FilePath -> Action ()) -> Rules ()
We'll see this operator in action right away:
rules :: Rules ()
rules =
    do "*.o" *> \out ->
           do let c = replaceExtension out "c"
              includes <- cIncludes c   -- run the action to get the header names
              need includes
              system' "gcc" ["-o", out, "-c", c]
The above rule describes how to create a .o file. The first argument of *> is the pattern that the output file must match; the second argument is a function that creates this output file. In the above rule, the .c file belonging to the .o file is compiled by gcc. Beforehand, dependencies on the header files referenced in the .c file are introduced dynamically using need. If you are familiar with make, you have surely noticed that something like this doesn't work directly in make but only with tricks, and these tricks often come at a price.
Usually, dependencies on header files are satisfied trivially, since header files are typically not generated by the build system. For our example, however, we assume that there is a header file Auto.h that is generated by a program Codegen:
       "Auto.h" *> \out ->
           do need ["Codegen"]
              system' "./Codegen" [out]
Finally, there are two more rules for the two binaries Codegen and Main:
       "Main" *> \out -> buildBinary out ["Hello.c", "Main.c"]
       "Codegen" *> \out -> buildBinary out ["Codegen.c"]
Below is the buildBinary function that creates a binary from the given .c files. First, a dependency on the corresponding .o files is specified using need, which causes the .c files to be compiled into .o files using our very first rule. Then the .o files are linked by invoking gcc.
buildBinary :: FilePath -> [FilePath] -> Action ()
buildBinary out cs =
    do let os = map (\c -> replaceExtension c "o") cs
       need os
       system' "gcc" (["-o", out] ++ os)
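The mapping from source files to object files, and the argument list that ends up being passed to gcc, can be checked without running a build. The sketch below uses only System.FilePath; the names objectsFor and linkArgs are hypothetical and introduced here purely for illustration.

```haskell
import System.FilePath (replaceExtension)

-- Hypothetical helper: the object files belonging to a list of C sources.
objectsFor :: [FilePath] -> [FilePath]
objectsFor = map (\c -> replaceExtension c "o")

-- Hypothetical helper: the arguments buildBinary would hand to gcc for linking.
linkArgs :: FilePath -> [FilePath] -> [String]
linkArgs out cs = ["-o", out] ++ objectsFor cs
```

For the Main binary, linkArgs "Main" ["Hello.c", "Main.c"] yields the list "-o", "Main", "Hello.o", "Main.o". Since the object file names also match the pattern of the *.o rule, calling need on them is what triggers the compilation of each .c file.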
Note that, unlike make, for example, shake doesn't use its own language to express dependencies and build rules. Instead, Haskell itself is used for both purposes. Two properties of Haskell are particularly helpful here: monads and user-defined operators like *>. In other words, to specify build rules and dependencies in shake we use a small special-purpose language embedded in Haskell, i.e., a DSL. This language is special only insofar as it uses certain operators and monads; it is still Haskell. It is therefore easy to implement things like the extraction of header files included via #include in "normal" Haskell.
Finally, here's the main function, which essentially specifies the targets to be built by calling the want function and passes the set of rules just defined to shake. If no targets are given on the command line, Main is built.
main :: IO ()
main =
    do args <- getArgs
       shake shakeOptions $
           do let targets = if null args then ["Main"] else args
              rules
              want targets
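The snippets in this article use the Shake API as it looked in early 2014. As far as we know, later Shake releases replaced the rule operator *> with %> and system' with cmd_. Under that assumption, the pieces can be assembled into a single file roughly as follows. This is a sketch of a port, not the code from the example project, and it assumes that the shake package and gcc are installed.

```haskell
-- Build.hs: the rules from this article in one file.
-- Assumption: a newer Shake release, where %> and cmd_ are used
-- in place of the *> and system' shown in the article.
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)
import Development.Shake
import Development.Shake.FilePath (replaceExtension)
import System.Environment (getArgs)

-- Names of all files included via #include "..." in a C source file.
cIncludes :: FilePath -> Action [FilePath]
cIncludes x = do
    s <- readFile' x                     -- also records a dependency on x
    return $ mapMaybe parseInclude (lines s)
  where
    parseInclude line = do
        rest <- stripPrefix "#include \"" line
        return $ takeWhile (/= '"') rest

-- Link a binary from the object files belonging to the given C sources.
buildBinary :: FilePath -> [FilePath] -> Action ()
buildBinary out cs = do
    let os = map (\c -> replaceExtension c "o") cs
    need os                              -- compiles each .c via the "*.o" rule
    cmd_ "gcc" (["-o", out] ++ os)

rules :: Rules ()
rules = do
    "*.o" %> \out -> do
        let c = replaceExtension out "c"
        includes <- cIncludes c
        need includes                    -- dynamic dependency on the headers
        cmd_ "gcc" ["-o", out, "-c", c]
    "Auto.h" %> \out -> do
        need ["Codegen"]                 -- the generator must be built first
        cmd_ "./Codegen" [out]
    "Main" %> \out -> buildBinary out ["Hello.c", "Main.c"]
    "Codegen" %> \out -> buildBinary out ["Codegen.c"]

main :: IO ()
main = do
    args <- getArgs
    shake shakeOptions $ do
        -- Build Main unless other targets are named on the command line.
        let targets = if null args then ["Main"] else args
        rules
        want targets
```

Assuming the file is saved as Build.hs in the project directory, it could be started with runghc Build.hs to build Main, or with runghc Build.hs Codegen to build only the generator.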
If you want to try out our little build system, you can find an example project here. Have fun experimenting!
For all Haskell programmers among you who have built your projects with cabal until now: stick with cabal if its functionality has been sufficient for you so far. The use cases of shake and cabal are actually quite different. cabal can build Haskell projects and invoke all the tools necessary for that. As a developer, you have little work to do, although some things, such as code generation, are difficult to realize with cabal. shake, on the other hand, is a language-independent framework for build systems, so you have to define all the rules yourself, which is not necessary with cabal. In return, shake gives you significantly more flexibility than cabal.
So, that's it for today. We have seen, at least to some extent, that shake allows the creation of very powerful build systems. As long as your projects don't require complicated build rules, using shake is certainly overkill. But as soon as things get a bit more complicated, its use can be worthwhile. In the paper mentioned above, you can find many more details about shake, in particular a comparison with other build systems.