For the last seven years, all my code has lived in a monorepo. This article documents some lessons Iʼve learned building software that way. The emphasis is build tools, not the cost-benefit tradeoffs of monorepos in general.
My monorepo has a single, monolithic `Makefile` at its root that describes: all the repositoryʼs outputs, the scripts that produce them, and the inputs they need. (Itʼs not literally a `Makefile`, but for the sake of this article we can pretend.) A program (specific to this repository) called `gen-make` produces the `Makefile`.
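To make this concrete, hereʼs a sketch of what one generated rule might look like. The paths and commands here are hypothetical illustrations, not the actual output of `gen-make`:

```make
# Hypothetical excerpt of a generated Makefile: one rule per output,
# with the full input list discovered by walking the tree.
bin/hello: cmd/hello/main.go internal/greet/greet.go
	go build -o bin/hello ./cmd/hello

man/hello.1.gz: doc/hello.1
	gzip -c doc/hello.1 > man/hello.1.gz
```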
I run `gen-make` manually whenever I add or remove files from the tree, or change dependencies between packages. It walks the repository, locates files of interest, analyzes dependencies between projects, and produces a `Makefile` describing what it found. `gen-make` takes 2 seconds to run.
When I change any file in the repository, I run `make` (letʼs pretend) to rebuild the entire repository. A typical no-op build takes about 300 ms. Thatʼs the overhead for checking which of 45,000 files have changed (including file checksums if necessary). Beyond that overhead, a build takes however long `go build` or `clang` would take if I had run them manually; usually less than 1 second.
I love that I can make a small change somewhere and automatically rebuild everything it touched. Itʼs especially helpful when updating libraries. If my change breaks a consumer, I know it right away.
I never build individual outputs. I always build the entire repository. This wouldnʼt be possible if I had to rely on `go build` or `clang` to check dependencies. The monolithic `Makefile` has a global view, letting it skip huge chunks of the build with just a few stat(2) calls. Itʼs amazing how much duplicate effort is performed when running two builds independently. A global view avoids most of that.
I also love how this build process works gracefully for every output I care about: executables, libraries, static HTML files, man pages, etc. A few changes to `gen-make` define a new output format.
Finally, file checksums (not part of make(1)) are vital. I can run scripts, like date(1) or `git fetch`, and only perform builds if the output changes, not the timestamp.
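Stock make(1) canʼt express this directly, but the idea can be approximated with a stamp file that is only touched when the fetched content actually differs. A sketch, with hypothetical script names:

```make
# Always run the fetch, but only update the stamp (and thereby trigger
# downstream rebuilds) when the downloaded content actually changed.
data.stamp: FORCE
	./fetch-data > data.new    # hypothetical download script
	cmp -s data.new data.json || { mv data.new data.json; touch data.stamp; }
	rm -f data.new

report.html: data.stamp
	./render-report data.json > report.html    # hypothetical consumer
FORCE:
```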
Unfortunately, `gen-make` has too much responsibility. When I first created `gen-make`, every project was the same: run `go build` to produce a single binary.
Thatʼs not the case anymore. Just within the Go ecosystem some subprojects need `go generate`, some want `go run`, some produce multiple executables, some output JavaScript via `gopherjs`. Of course, there are also C projects, OCaml projects, static site generators, scripts that check the Internet for updated data files, etc. This is how build systems grow into unwieldy monsters.
No matter how much effort I put into simplifying `gen-make`, its domain is inherently too complex, so its code remains complex. Configuration for one project sometimes interferes with configuration for another project because `gen-make` acts like shared, mutable state for them both. That leads to exponential complexity.
Wishlist: separate configuration for each subproject. I still want the benefits of a global `Makefile`, but each subproject should define its own needs. Maybe each subproject has its own `gen-make` script whose output is merged into the main `Makefile`. Presumably subprojects with a standard layout would call out to `gen-make-go` or `gen-make-c` from their local `gen-make`. This layout would avoid interference between Go rules and C rules, for example.
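One way the merge could work, sketched as make rules. The directory names and the `Makefile.frag` convention are my invention for illustration:

```make
# The root Makefile is assembled from per-subproject fragments, each
# produced by that subproject's own gen-make script.
Makefile: go/app/Makefile.frag c/libfoo/Makefile.frag
	cat go/app/Makefile.frag c/libfoo/Makefile.frag > Makefile

go/app/Makefile.frag: go/app/gen-make
	cd go/app && ./gen-make > Makefile.frag    # delegates to gen-make-go

c/libfoo/Makefile.frag: c/libfoo/gen-make
	cd c/libfoo && ./gen-make > Makefile.frag  # delegates to gen-make-c
```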
To nobodyʼs surprise, I often forget to run `gen-make` when I add a new file or change dependencies. Sometimes I notice this immediately because `make` insists that thereʼs nothing to build. Other times I notice it the next day when I run `gen-make` for something unrelated and see the `Makefile` change unexpectedly.
Wishlist: automatically run `gen-make` if doing so would change the `Makefile`. Since it takes 2 seconds, itʼs too slow to run on every build (like I did when the monorepo was small).
Recursive make is tempting here (pause to let redo fans have their moment). If the semantics were rich enough, a local `gen-make` script could produce a local `Makefile` that regenerates itself when directories change (new file, removed file). Then you could merge each subprojectʼs `Makefile` into a global `Makefile`. On a given build, most `gen-make` scripts wouldnʼt run, so this should be fast.
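GNU make already has a version of this: if a makefile named in an `include` directive is out of date, make rebuilds it and then restarts itself with the fresh rules. Listing the directory itself as a prerequisite catches added and removed files, since those update the directoryʼs mtime. A sketch with hypothetical paths:

```make
# go/app's rules are generated, included, and kept fresh by make itself:
# if rules.mk is older than the directory, make regenerates it and then
# re-executes with the new rules.
include go/app/rules.mk

go/app/rules.mk: go/app/gen-make go/app
	cd go/app && ./gen-make > rules.mk
```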
Building Hello World with `go build` depends on the obvious things like `hello.go`, but it also depends on `/usr/local/bin/go` and the absence (or presence) of `go.mod` in the current directory or its ancestors. Depending on how tools change between releases, these dependencies can change in unexpected ways. I try to teach `gen-make` about these dependencies, but I miss one often enough to notice.
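Some of these hidden inputs can be listed explicitly, though the *absence* of a file is exactly what a prerequisite list canʼt express. A sketch:

```make
# The compiler binary and go.mod are real inputs: upgrading Go or editing
# go.mod should rebuild the output. A *missing* go.mod can't be listed.
bin/hello: cmd/hello/main.go /usr/local/bin/go go.mod
	go build -o bin/hello ./cmd/hello
```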
Wishlist: use ktrace(1) and kdump(1) to see everything that a build touches or considers touching. If any of those system calls might answer differently, the build needs to run again. I have a prototype of this and it works better than I expected.
Itʼs common practice to have a single, standardized file describing all third-party libraries. When this file changes, a tool like `npm` downloads the updated libraries to a local cache. Unfortunately, a small change to `package.json` (for example) often triggers a rebuild of everything in the repository.
Wishlist: let a `Makefile` say something like: if this file is being rebuilt, wait for it to finish, but I donʼt actually care about its content.
Because my make tool does file checksums, I can fake it with an empty file `package.json.done` thatʼs created after downloading third-party libraries. A build rule lists `package.json.done` as an input. The build rule for `package.json.done` lists `package.json` as an input. This causes downstream builds to wait until new libraries are available but doesnʼt trigger a rebuild by itself.
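As make rules, the indirection looks roughly like this. The bundler command is hypothetical, and note that this relies on checksum-aware make, since the empty fileʼs timestamp still changes on every download:

```make
# Downstream targets depend on the empty .done file. Its content never
# changes, so a checksum-aware make won't rebuild them after a download;
# they merely wait for the download to finish.
package.json.done: package.json
	npm install
	: > package.json.done    # truncate to empty

bundle.js: src/index.js package.json.done
	./bundle src/index.js > bundle.js    # hypothetical bundler
```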
This feels like a hack, but maybe itʼs cleaner than adding these semantics to a build tool.