New make --shuffle mode
TL;DR:
I implemented new --shuffle
option for GNU make to simulate
non-deterministic build order in parallel makefiles.
UPDATE: --shuffle
mode was released as part of GNU make 4.4 and
can be used as:
make --shuffle ...
from the command line.GNUMAKEFLAGS=--shuffle
via environment variables for wider scale builds.
Options Summary has more details.
It already found bugs in 30+ packages
like gcc
, vim
, ghc
, subversion
, strace
, ispell
and others.
Background
About 11 years ago I was a year old Gentoo dev who just started getting
downstream bug reports on mysterious ghc
build failures like
https://bugs.gentoo.org/326347.
The symptoms were seemingly simple: some file was inaccessible while it was being written to, or executed.
Years later I mastered the intricacies of ghc
’s build system on how
to debug it effectively. But at that time I did not really know what to
do. My main working machine was a Core 2 duo HP laptop which could not do
more than -j2
. And even that required a bit of swap for ghc
’s
linking stage. Throwing more parallelism was not really an option to
trigger such bugs.
Makefile target ordering
Makefile
s are fundamentally simple: it’s a graph of dependencies
with a sequence of shell commands attached to a node. There are numerous
caveats, but they should not break this model too much.
In theory you can topologically sort the graph and execute the dependencies in various conforming orders and expect the same result. Modulo missing dependencies in the graph.
In practice GNU make happens to traverse the graph in very specific topological order: it maintains syntactic order as much as possible.
Here is an example Makefile
:
# cat Makefile
all: a b c
b: b1 b2 b3
a: a1 a2 a3
c: c1 c2 c3
a b c a1 a2 a3 b1 b2 b3 c1 c2 c3:; @echo $@ && sleep 1
(I added sleep 1 to make it more visible when next goal schedules).
Here is the sequential execution by GNU make:
$ make
a1
a2
a3
a
b1
b2
b3
b
c1
c2
c3
c
The seen order is exactly all
’s prerequisites left-to-right
recursively.
Adding parallelism does not change the order too much: make
still
traverses prerequisites in the same order and starts as many targets
with satisfied dependencies as possible.
Parallel example:
$ make -j4
a1
a2
a3
b1
b2
b3
c1
c2
c3
a
b
c
I added newlines where 1-second pauses visibly happen.
Note that in this example a1
does not depend on
c2
. But c2
practically always starts execution after a1
finishes.
The “only” way to run a1
and c2
in parallel is to run
make
with at least -j8
. Which is a lot.
Or do something with the system that stalls task execution for indefinite amount of time (like, adding various nice levels or put system under high memory or CPU pressure).
Very occasionally already stressed system naturally gets into
unusual task execution order. You get the one-off failure and
struggle to repeat it ever again. Which makes it very hard to
test the fix unless you know where exactly to put the sleep
command to make it more reproducible.
An old idea
Even then it was clear that CPU count per device will only increase
and it will be increasingly painful to work with sequentially built
projects :) Bugs will come back again and again on you the more cores
you throw at the Makefile
s.
I had a silly idea back then (post in russian):
what if we arbitrarily reorder the prerequisites in Makefile
s? Or
maybe even trace spawned processes to know for sure what files targets
access? That might allow us to weed out most of the parallel bugs with
some sort of stress test on a low-core machine.
Fast forward 11 years I attempted to enable build parallelism by default in nixpkgs. A few packages still had some issues.
I recalled the idea and tried to implement target random shuffle within
GNU make
!
Better reproducer: make --shuffle
Initial idea was very simple: pick target order at Makefile
parse time and reshuffle the lists randomly. To pick an example
above one of the example shuffles would be:
$ cat Makefile
all: c b a
b: b2 b1 b3
a: a3 a2 a1
c: c1 c2 c3
a b c a1 a2 a3 b1 b2 b3 c1 c2 c3:; @echo $@ && sleep 1
I wrote the proof of concept and proposed it to GNU make community.
The example run of patched make
shows less determinism now:
$ ~/dev/git/make/make --shuffle -j4
c2
c3
c1
a2
a1
b2
a3
b3
b1
c
b
a
Paul did not seem to object too much to the idea and pointed out
that implementation will break more complex Makefile
s as there
is a simple way to refer to individual prerequisites by number.
To pick Paul’s example:
%.o : %.c
$(CC) $(CFLAGS) -c -o $@ $<
foo.o: foo.c foo.h bar.h baz.h
#
foo%: arg%1 arg%2 arg%3 arg%4
$< $(word 3,$^) $(word 2,$^) $(word 4,$^) bld
In both cases syntactic reshuffling breaks the build rules by passing wrong file name.
To fix it I came up with a way to store two orders at the same time: syntactic and shuffled and posted patch as https://lists.gnu.org/archive/html/bug-make/2022-02/msg00042.html.
Running make --shuffle
on real projects
While I was waiting for the feedback I ran the build tests against
nixpkgs
packages.
First, I almost instantly got build failures on the projects that
already explicitly disable parallel builds in nixpkgs
to avoid known
failures: groff
, source-highlight
, portaudio
, slang
, gnu-efi
,
bind
, pth
, libomxil
, dhcp
, directfb
, doxygen
, gpm
, judy
and a few others. That was a good sign.
A bit later I started getting failures I did not encounter before in
ghc
(!), gcc
(!!), automake
(!!!), pulseaudio
,
libcanberra
, many ocaml
and some perl
packages.
All the failures looked genuine missing dependencies. For example
gcc
’s libgfortran
is missing a libquadmath
build dependency.
It is natural not to encounter it in real world as libquadmath
is
usually built along with other small runtimes way before g++
or
gfortran
is ready.
Fun fact: while running the build I stumbled on a GNU make
bug
not related to my change:
https://lists.gnu.org/archive/html/bug-make/2022-02/msg00037.html.
The following snippet tricks GNU make
to loop for a while until
it crashes with argument list exhaustion (or inode exhaustion in
/tmp
):
$ printf 'all:\n\techo $(CC)' | ./make -sf -
<hung>
This bug is not present in any releases yet. And hopefully will not be.
I’d like to land the --shuffle
change upstream in some form before
sending bug reports and trivial fixes to upstream projects.
How you can test it
If you are keen to try this shuffling mode on your make
-based
projects (be it manually written, automake
-based or cmake
-based)
here is a rough instruction to do it:
- install GNU make 4.4
- use it as
make --shuffle <your-typical-make-arguments>
against your project (or set it viaGNUMAKEFLAGS=--shuffle
environment variable) - check if the build succeeds, run it a few times
Both sequential and parallel modes should work fine. I suggest trying both. The shuffling overhead should be negligible.
How do failures look like
When build fails it reports the shuffling mode and seed used. Let’s try
it on a concrete cramfsswap
example:
$ git clone https://github.com/julijane/cramfsswap.git
$ cd cramfsswap
$ ~/dev/git/make/make clean && ~/dev/git/make/make
rm -f cramfsswap
gcc -Wall -g -O -o cramfsswap -lz cramfsswap.c
strip cramfsswap
$ ~/dev/git/make/make clean && ~/dev/git/make/make
rm -f cramfsswap
strip cramfsswap
strip: 'cramfsswap': No such file
make: *** [Makefile:10: strip] Error 1 --shuffle=1645603370
Here we see a successful run and a failed run. Failed run reports
specific seed that might trigger the failure: --shuffle=1645603370
.
We can use this seed explicitly:
$ ~/dev/git/make/make --shuffle=1645603370
strip cramfsswap
strip: 'cramfsswap': No such file
make: *** [Makefile:10: strip] Error 1 --shuffle=1645603370
$ ~/dev/git/make/make --shuffle=1645603370
strip cramfsswap
strip: 'cramfsswap': No such file
make: *** [Makefile:10: strip] Error 1 --shuffle=1645603370
$ ~/dev/git/make/make --shuffle=1645603370
strip cramfsswap
strip: 'cramfsswap': No such file
make: *** [Makefile:10: strip] Error 1 --shuffle=1645603370
Note how ordering is preserved across the runs with fixed seed.
Parting words
Implementing the shuffling idea took a weekend. I should have tried it earlier :) The result instantly found existing and new missing dependencies in a small subset of real projects. Some of these failures are very hard to trigger otherwise.
UPDATE: --shuffle
was upstreamed and released as part of
GNU make 4.4 release (see announcement).
Have fun!