For the last seven years, all my code has lived in a monorepo. This article documents some lessons Iʼve learned building software that way. The emphasis is on build tools, not the cost-benefit tradeoffs of monorepos in general.
My monorepo has a single, monolithic
Makefile at its root that describes: all the repositoryʼs outputs, the scripts that produce them, and the inputs they need. (Itʼs not literally a
Makefile, but for the sake of this article we can pretend). A program (specific to this repository) called
gen-make produces the
Makefile. I run
gen-make manually whenever I add or remove files from the tree, or change dependencies between packages. It walks the repository, locates files of interest, analyzes dependencies between projects, and produces a
Makefile describing what it found.
gen-make takes 2 seconds to run.
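The walk-and-emit core of such a generator fits in a few lines of shell. This sketch assumes the simplest case (each directory of Go sources builds one binary) and invents its own rule format; the real gen-make also analyzes dependencies between packages:

```shell
#!/bin/sh
# Toy gen-make: emit one Makefile rule per directory of Go sources
# under $1. The rule format and layout here are illustrative, not
# the actual gen-make output.
gen_make() {
    find "$1" -name '*.go' | sed 's|/[^/]*$||' | sort -u |
    while read -r dir; do
        name=$(basename "$dir")
        printf '%s: %s\n' "$name" "$(echo "$dir"/*.go)"
        printf '\tgo build -o %s %s\n' "$name" "$dir"
    done
}
```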
When I change any file in the repository, I run
make (letʼs pretend) to rebuild the entire repository. A typical no-op build takes about 300 ms. Thatʼs the overhead for checking which of 45,000 files have changed (including file checksums if necessary). Beyond that overhead, a build takes however long
go build or
clang would take if I had run them manually; usually less than 1 second.
I love that I can make a small change somewhere and automatically rebuild everything it touched. Itʼs especially helpful when updating libraries. If my change breaks a consumer, I know it right away.
I never build individual outputs. I always build the entire repository. This wouldnʼt be possible if I had to rely on
go build or
clang to check dependencies. The monolithic
Makefile has a global view that lets it skip huge chunks of the build with just a few stat(2) calls. Itʼs amazing how much duplicate effort is performed when running two builds independently. A global view avoids most of that.
I also love how this build process works gracefully for every output I care about: executables, libraries, static HTML files, man pages, etc. A few changes to
gen-make define a new output format.
Finally, file checksums (not part of make(1)) are vital. I can run scripts, like date(1) or
git fetch, and only perform builds if the output changes, not the timestamp.
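My make tool handles this natively; with stock, mtime-based make(1), one rough equivalent is a wrapper that replaces a target only when its content actually changed (the refresh name is mine):

```shell
#!/bin/sh
# Run a command and update $1 only if its output differs from the
# current content, so downstream rules stay quiet on no-op changes.
refresh() {
    target=$1; shift
    "$@" > "$target.tmp"
    if cmp -s "$target.tmp" "$target" 2>/dev/null; then
        rm "$target.tmp"           # same content: leave target untouched
    else
        mv "$target.tmp" "$target"
    fi
}
```

Something like `refresh version.txt git describe --tags` would then rewrite version.txt only when the description changes.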
gen-make has too much responsibility. When I first created
gen-make, every project was the same: run
go build to produce a single binary.
No matter how much effort I put into simplifying
gen-make, its domain is inherently too complex, so its code remains complex. Configuration for one project sometimes interferes with configuration for another project because
gen-make acts like shared, mutable state for them both. That leads to exponential complexity.
Wishlist: separate configuration for each subproject. I still want the benefits of a global
Makefile, but each subproject should define its own needs. Maybe each subproject has its own
gen-make script whose output is merged into the main
Makefile. Presumably subprojects with a standard layout would call out to
gen-make-c from their local
gen-make. This layout would avoid interference between Go rules and C rules, for example.
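The merge step itself is simple to imagine. A sketch, assuming each subproject keeps an executable gen-make script at its root (the merge_gen_make name and layout are mine):

```shell
#!/bin/sh
# Run every subproject's gen-make from its own directory and
# concatenate the results into one stream, suitable for writing
# to the global Makefile.
merge_gen_make() {
    find "$1" -name gen-make -type f | sort |
    while read -r script; do
        ( cd "$(dirname "$script")" && sh ./gen-make )
    done
}
```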
To nobodyʼs surprise, I often forget to run
gen-make when I add a new file or change dependencies. Sometimes I notice this immediately because
make insists that thereʼs nothing to build. Other times I notice it the next day when I run
gen-make for something unrelated and see the
Makefile change unexpectedly.
Wishlist: automatically run
gen-make if doing so would change
Makefile. At 2 seconds, itʼs too slow to run on every build (as I did when the monorepo was small).
Recursive make is tempting here (pause to let redo fans have their moment). If the semantics were rich enough, a local
gen-make script could produce a local
Makefile that regenerates itself when directories change (new file, removed file). Then you could merge each subprojectʼs
Makefile into a global
Makefile. On a given build, most
gen-make scripts wouldnʼt run, so this should be fast.
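GNU makeʼs makefile-remaking feature gets partway there: before doing anything else, make rebuilds any out-of-date included file and restarts itself. A sketch of how that might look; the rules.mk name and layout are hypothetical:

```make
# Each subproject ships a gen-make that writes its own rules.mk.
# GNU make rebuilds stale included files and restarts automatically.
include $(wildcard */rules.mk)

# A directory's mtime changes when entries are added or removed,
# which is exactly the event that should regenerate the rules.
# (Fragile: it misses changes deeper in the tree, and a brand-new
# subproject isn't seen until its rules.mk exists once.)
%/rules.mk: %/gen-make %/.
	cd $(@D) && ./gen-make > rules.mk
```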
Building Hello World with
go build depends on the obvious things like
hello.go but it also depends on
/usr/local/bin/go and the absence (or presence) of
go.mod in the current directory or its ancestors. Depending on how tools change between releases, these dependencies can change in unexpected ways. I try to teach
gen-make about these dependencies, but I miss one often enough to notice.
Wishlist: use ktrace(1) and kdump(1) to see everything that a build touches or considers touching. If any of those system calls might answer differently, the build needs to run again. I have a prototype of this and it works better than I expected.
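On BSD, ktrace records name lookups as NAMI records, so the collection side of the prototype is a short pipeline. A sketch; the exact kdump column layout is an assumption, and on Linux `strace -f -e trace=%file` would play the same role:

```shell
#!/bin/sh
# Usage (BSD): ktrace -i make && kdump | paths_considered > deps.txt
# Every NAMI record is a path the build looked up, whether or not
# the file existed; if any would now resolve differently, rebuild.
paths_considered() {
    awk '$3 == "NAMI" { gsub(/"/, "", $4); print $4 }' | sort -u
}
```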
Itʼs common practice to have a single, standardized file describing all third-party libraries. When this file changes, a tool like
npm downloads the updated libraries to a local cache. Unfortunately, a small change to
package.json (for example) often triggers a rebuild of everything in the repository.
Wishlist: let a
Makefile say something like: if this file is being rebuilt, wait for it to finish but I donʼt actually care about its content.
Because my make tool does file checksums, I can fake it with an empty file
package.json.done thatʼs created after downloading third-party libraries. A build rule lists
package.json.done as an input. The build rule for
package.json.done lists
package.json as an input. This causes downstream builds to wait until new libraries are available but doesnʼt trigger a rebuild by itself.
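In Makefile form, the trick might look like this (the target names and the bundling command are illustrative):

```make
# package.json.done is truncated to empty after every install, so a
# checksum-based make sees no change downstream; the dependency only
# delays the build until the download finishes.
package.json.done: package.json
	npm install
	: > package.json.done

bundle.js: src/app.js package.json.done
	npx esbuild --bundle src/app.js --outfile=bundle.js
```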
This feels like a hack, but maybe itʼs cleaner than adding these semantics to a build tool.