Bisects all the way down
Bisect is a great tool to nail down a regression in a project like
linux, where you usually don’t have the slightest idea what broke your
suspend, boot, video, more video, audio, TCP, tun, toaster and whatnot.
Examples of other complex projects are firefox, gcc, glibc
and … complete linux distributions!
Rolling release linux distributions tend to have frequent incremental
updates where each update produces a mostly working system. And when an
update breaks something, rolling back is cheap and the feedback loop to
upstream is fast.
The key for a rolling release system to work smoothly is the ability to
quickly narrow a regression down to the faulty component and isolate it.
A typical example of a distribution regression would be firefox failing
to render fonts correctly. Which package update caused the regression?
Sometimes we can guess easily if a font-related package was updated
recently, and we can try rolling it back to verify.
But sometimes it’s a compiler or even a build environment bug (like a
bash miscompilation caused by a gcc bug). In that case it will take a
while until we get to the culprit. Or it might be a glibc regression,
which is not trivial to roll back at all.
Wouldn’t it be nice to mechanically bisect a package repository the same
way we bisect projects like linux?
Today’s real-world example is the nixpkgs repository. nixpkgs is a
package tree available via the nix tool.
One of nix’s fancy features is the ability to install packages (and
their dependencies) as an unprivileged user, or even to fetch a package
into the local cache for one-off use without installing it. Another
feature is precise, hermetic dependency tracking.
Example one-off package usage session:
# existing systemwide package (installed outside nixpkgs)
$ re2c -v
re2c 2.2
# fetching version from nixpkgs:
$ nix-shell -p re2c
[nix-shell:~]$ re2c -v
re2c 2.1.1
nixpkgs repository has a few branches of different stability. Most
frequently encountered are:
- staging: things are truly bleeding edge there, no binary cache, very fresh versions
- staging-next: things are being stabilized here to build and pass tests before merge to master
- master: things mostly build there and have binary cache of most packages
More branching details are at https://github.com/NixOS/rfcs/blob/master/rfcs/0026-staging-workflow.md.
Back to the example at hand. I was foolish enough to try building
ccache out of the staging branch (I planned to update its version
there) and got a build failure:
# fetch repository:
$ cd /tmp
$ git clone https://github.com/NixOS/nixpkgs.git
$ cd nixpkgs
$ git checkout staging
# build ccache:
$ nix-build -A ccache
... <a few minutes later>
make[1]: *** [Makefile:998: fig2dev/Makefile] Error 1
make[1]: Leaving directory '/build/transfig.3.2.4'
make: [Makefile:1006: Makefiles] Error 2 (ignored)
make includes
including in ./fig2dev...
make[1]: Entering directory '/build/transfig.3.2.4/fig2dev'
make[1]: *** No rule to make target 'includes'. Stop.
make[1]: Leaving directory '/build/transfig.3.2.4/fig2dev'
make: *** [Makefile:1064: includes] Error 2
error: builder for '/nix/store/149n49648mzf1c9g199jhq9qi6x35c9v-transfig-3.2.4.drv' failed with exit code 2;
Here we fetched the git repository of the whole nixpkgs and tried
to build ccache along with all of its dependencies. One of them
(transfig) failed to build.
transfig happens to use the imake build system. I knew nothing about
it and had no idea how to debug it. I looked at the generated
Makefile and still had no idea why (or if) things were wrong there.
Having failed at understanding the failure mode, I checked if the master
branch was able to build transfig (most packages are normally expected
to build on master):
$ git checkout master
$ nix-build -A transfig
/nix/store/pfzhccslyzgl0wl127yahrk902gj54xs-transfig-3.2.4
$ nix-build -A transfig --check
... <build log>
/nix/store/pfzhccslyzgl0wl127yahrk902gj54xs-transfig-3.2.4
Built fine. --check forces a local rebuild instead of using the binary
available in the cache. I used it to get a build log from a successful
build and to make sure I don’t have something else horribly broken in my
build environment.
Now I could bisect the range between master (good) and staging (bad):
$ git bisect start staging master
$ git bisect run nix-build -A transfig
... < a few minutes later>
commit 8675ca0e947f7e847d82828e6bfd4d08822c489c
Date: Wed Aug 4 08:29:53 2021 +0000
xorg.xorgcffiles: 1.0.6 -> 1.0.7
https://lists.x.org/archives/xorg-announce/2021-August/003105.html
Just two shell commands and we are there! The commit looks vaguely
related to imake. Reverting:
$ git bisect reset
$ git checkout staging
$ git revert 8675ca0e947f7e847d82828e6bfd4d08822c489c # minor conflict fix
$ nix-build -A transfig
...
/nix/store/7z7q1a9176cy0adcs98l4dc8rh9ks4ki-transfig-3.2.4
Revert worked. I looked at the difference between 1.0.6 and
1.0.7 sources and found nothing obviously broken. I still had no
idea what I was looking at.
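A note on why git bisect run nix-build -A transfig works unattended: git bisect run looks only at the exit status of the command. Status 0 marks the commit as good, 125 means "cannot test, skip this commit", and any other status from 1 to 127 marks it as bad. A bare nix-build was enough here, but on ranges where some commits fail for unrelated reasons a small wrapper can help. The sketch below is only an illustration: the bisect_step name and its policy of skipping on status 128 and above are my assumptions, not part of the session above.

```shell
# Hypothetical wrapper for `git bisect run`: runs the given build command
# and maps its exit status to the codes that bisect understands.
#   0   -> good commit
#   1   -> bad commit
#   125 -> untestable commit, ask bisect to skip it
bisect_step() {
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        return 0
    elif [ "$status" -ge 128 ]; then
        # Assumption: the build was killed by a signal; don't blame the commit.
        return 125
    fi
    return 1
}
```

Saved in a script that ends with bisect_step "$@" (say, a hypothetical ./bisect-step.sh), it could be used as git bisect run ./bisect-step.sh nix-build -A transfig.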
We can bisect the xorg-cf-files project as well! For that we can point
our xorg.xorgcffiles package to a local checkout that we can modify:
--- a/pkgs/servers/x11/xorg/overrides.nix
+++ b/pkgs/servers/x11/xorg/overrides.nix
@@ -841,6 +841,7 @@ self: super:
});
xorgcffiles = super.xorgcffiles.overrideAttrs (attrs: {
+ src = /tmp/cf; # added line
postInstall = lib.optionalString stdenv.isDarwin ''
substituteInPlace $out/lib/X11/config/darwin.cf --replace "/usr/bin/" ""
'';
Let’s prepare source tree in /tmp/cf as if it was just from
tarball:
$ cd /tmp
$ git clone https://gitlab.freedesktop.org/xorg/util/cf.git
$ cd cf
$ ./autogen.sh
Now we can build transfig against local checkout:
$ nix-build /tmp/nixpkgs -A transfig
... a few seconds later
make: *** No rule to make target 'install'. Stop.
Same problem.
nix will rebuild xorg-cf-files from the local checkout and will then
automatically rebuild everything that depends on it. No need to
manually calculate the effect of the update. Sometimes it means a lot of
rebuilds (say, if you bisect gcc itself). But in our case the only
packages in the chain that depend on xorg-cf-files are imake and transfig:
$ nix why-depends -f . --derivation transfig xorg.xorgcffiles
/nix/store/...-transfig-3.2.4.drv
→ /nix/store/...-imake-1.0.8.drv
→ /nix/store/...-xorg-cf-files-1.0.7.drv
Both are tiny packages. Bisecting:
$ git bisect start xorg-cf-files-1.0.7 xorg-cf-files-1.0.6
$ git bisect run nix-build /tmp/nixpkgs -A transfig
... a second later
commit d47131ed97ee491bb883c29ec0b106e8d5acfcd3
Date: Thu Jul 5 10:42:09 2018 -0400
linux: Update LinuxDistribution == LinuxRedHat section
That was simpler than I thought! But still very confusing :) The
upstream commit is literally a few defines under a seemingly unrelated
#if:
--- a/linux.cf
+++ b/linux.cf
@@ -190,7 +190,13 @@ InstallNamedTargetNoClobber(install,file.ad,$(INSTAPPFLAGS),$(XAPPLOADDIR),class
#endif /* LinuxDebian */
#if LinuxDistribution == LinuxRedHat
-#define FSUseSyslog YES
+# define FSUseSyslog YES
+# define BuildRman NO
+# define BuildHtmlManPages NO
+# define ProjectRoot /usr
+# define ManPath /usr/share/man
+# define XAppLoadDir /usr/share/X11/app-defaults
+# define ConfigDir /usr/share/X11/config
#endif
#ifndef HasDevRandom
nix does not use the /usr hierarchy of the host OS (in my case the host
OS is gentoo) and always uses /nix/store paths instead. Thus I would expect
LinuxDistribution to be something different from LinuxRedHat
(unless it’s a way for cf to say “any linux”).
Let’s check how LinuxDistribution gets defined. It’s hidden in
imake itself. We can extract unpatched and patched imake right
from nixpkgs:
$ cd /tmp/nixpkgs
$ nix-shell -A xorg.imake
# unpack vanilla source:
$$ unpackPhase
unpacking source archive /nix/store/dfjcsfxf15zxrbcw62ml1zbczm8zf7d0-imake-1.0.8.tar.bz2
source root is imake-1.0.8
setting SOURCE_DATE_EPOCH to timestamp 1552778797 of file imake-1.0.8/INSTALL
# apply nixpkgs-specific patches:
$$ cd imake-1.0.8
$$ patchPhase
applying patch /nix/store/9hl5c2sg2n6yfia0hy06wdf7yiry4arq-imake.patch
patching file imake.c
applying patch /nix/store/kmhjr434iv05bgazd5xbzwygn59pl9k0-imake-cc-wrapper-uberhack.patch
patching file imake.c
Here is the unpatched bit of LinuxRedHat definition from
https://gitlab.freedesktop.org/xorg/util/imake/-/blob/master/imake.c#L1046:
#if defined CROSSCOMPILE || defined linux || defined(__GLIBC__)
static void
get_distrib(FILE *inFile)
{
struct stat sb;
static const char* suse = "/etc/SuSE-release";
static const char* redhat = "/etc/redhat-release";
static const char* debian = "/etc/debian_version";
fprintf (inFile, "%s\n", "#define LinuxUnknown 0");
fprintf (inFile, "%s\n", "#define LinuxSuSE 1");
fprintf (inFile, "%s\n", "#define LinuxCaldera 2");
fprintf (inFile, "%s\n", "#define LinuxCraftworks 3");
fprintf (inFile, "%s\n", "#define LinuxDebian 4");
fprintf (inFile, "%s\n", "#define LinuxInfoMagic 5");
fprintf (inFile, "%s\n", "#define LinuxKheops 6");
fprintf (inFile, "%s\n", "#define LinuxPro 7");
fprintf (inFile, "%s\n", "#define LinuxRedHat 8");
fprintf (inFile, "%s\n", "#define LinuxSlackware 9");
fprintf (inFile, "%s\n", "#define LinuxTurbo 10");
fprintf (inFile, "%s\n", "#define LinuxWare 11");
fprintf (inFile, "%s\n", "#define LinuxYggdrasil 12");
# ifdef CROSSCOMPILE
if (CrossCompiling) {
fprintf (inFile, "%s\n",
"#define DefaultLinuxDistribution LinuxUnknown");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName Unknown");
return;
}
# endif
if (lstat (suse, &sb) == 0) {
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxSuSE");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName SuSE");
return;
}
if (lstat (redhat, &sb) == 0) {
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxRedHat");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName RedHat");
return;
}
if (lstat (debian, &sb) == 0) {
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxDebian");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName Debian");
/* You could also try to get the version of the Debian distrib by looking
* at the content of /etc/debian_version */
return;
}
/* what's the definitive way to tell what any particular distribution is? */
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxUnknown");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName Unknown");
/* would like to know what version of the distribution it is */
}
Distribution flavor is defined by the presence of the /etc/redhat-release
file on disk. But I don’t have it! I should have gotten LinuxUnknown.
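To double-check what an unpatched imake would pick on a given machine, the probing order of get_distrib can be mirrored in a few lines of shell. This is a sketch rather than anything imake ships: the function names and the optional root-directory argument are made up for illustration.

```shell
# exists PATH: true if lstat() on PATH would succeed, which includes
# dangling symlinks (hence the extra -L test).
exists() {
    [ -e "$1" ] || [ -L "$1" ]
}

# detect_distrib [ROOT]: mirror get_distrib's order: SuSE, RedHat, Debian.
# ROOT is a hypothetical prefix for experiments; empty means the real /.
detect_distrib() {
    root=${1:-}
    if exists "$root/etc/SuSE-release"; then
        echo LinuxSuSE
    elif exists "$root/etc/redhat-release"; then
        echo LinuxRedHat
    elif exists "$root/etc/debian_version"; then
        echo LinuxDebian
    else
        echo LinuxUnknown
    fi
}
```

On a host with none of these three files, detect_distrib prints LinuxUnknown, which matches the expectation stated above.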
The culprit is that suspicious
/nix/store/9hl5c2sg2n6yfia0hy06wdf7yiry4arq-imake.patch patch we see
in the patchPhase log. It turns the code above into the following:
#if defined CROSSCOMPILE || defined linux || defined(__GLIBC__)
static void
get_distrib(FILE *inFile)
{
#if 0
struct stat sb;
static const char* suse = "/etc/SuSE-release";
static const char* redhat = "/etc/redhat-release";
static const char* debian = "/etc/debian_version";
fprintf (inFile, "%s\n", "#define LinuxUnknown 0");
fprintf (inFile, "%s\n", "#define LinuxSuSE 1");
fprintf (inFile, "%s\n", "#define LinuxCaldera 2");
fprintf (inFile, "%s\n", "#define LinuxCraftworks 3");
fprintf (inFile, "%s\n", "#define LinuxDebian 4");
fprintf (inFile, "%s\n", "#define LinuxInfoMagic 5");
fprintf (inFile, "%s\n", "#define LinuxKheops 6");
fprintf (inFile, "%s\n", "#define LinuxPro 7");
fprintf (inFile, "%s\n", "#define LinuxRedHat 8");
fprintf (inFile, "%s\n", "#define LinuxSlackware 9");
fprintf (inFile, "%s\n", "#define LinuxTurbo 10");
fprintf (inFile, "%s\n", "#define LinuxWare 11");
fprintf (inFile, "%s\n", "#define LinuxYggdrasil 12");
# ifdef CROSSCOMPILE
if (CrossCompiling) {
fprintf (inFile, "%s\n",
"#define DefaultLinuxDistribution LinuxUnknown");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName Unknown");
return;
}
# endif
if (lstat (suse, &sb) == 0) {
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxSuSE");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName SuSE");
return;
}
if (lstat (redhat, &sb) == 0) {
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxRedHat");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName RedHat");
return;
}
if (lstat (debian, &sb) == 0) {
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxDebian");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName Debian");
/* You could also try to get the version of the Debian distrib by looking
* at the content of /etc/debian_version */
return;
}
#endif
/* what's the definitive way to tell what any particular distribution is? */
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxUnknown");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName Unknown");
/* would like to know what version of the distribution it is */
}
Note how #if 0 removes not just #define DefaultLinuxDistribution LinuxRedHat but also #define LinuxUnknown 0 and #define LinuxRedHat 8. In diff form the change in imake output is:
@@ -1,3 +1 @@
-#define LinuxUnknown 0
-#define LinuxRedHat 8
#define DefaultLinuxDistName Unknown
Is it a big deal? How does it change the #if LinuxDistribution == LinuxRedHat condition? Let’s try the following two examples:
$ printf "#define a 1\n#define b 2\n#if a == b\n EQUAL\n#else\n DIFFERENT\n#endif\n"
#define a 1
#define b 2
#if a == b
EQUAL
#else
DIFFERENT
#endif
$ printf "#if a == b\n EQUAL\n#else\n DIFFERENT\n#endif\n"
#if a == b
EQUAL
#else
DIFFERENT
#endif
Running the preprocessor:
$ printf "#define a 1\n#define b 2\n#if a == b\n EQUAL\n#else\n DIFFERENT\n#endif\n" | gcc -E -
DIFFERENT
$ printf "#if a == b\n EQUAL\n#else\n DIFFERENT\n#endif\n" | gcc -E -
EQUAL
According to the great imake intro at
http://www.snake.net/software/imake-stuff/config-X11R4.pdf it’s one
of the common imake pitfalls: in integer evaluation contexts unknown
symbols get turned into zeros. With both a and b undefined, a == b
becomes 0 == 0 and the condition is true.
$ printf "#if undef == 0\n ZERO\n#endif\n"
#if undef == 0
ZERO
#endif
$ printf "#if undef == 0\n ZERO\n#endif\n" | gcc -E -
ZERO
Thus the fix is trivial: don’t omit any enum definition as other
packages using imake actually rely on them being present. Possible
fix:
--- a/imake.c
+++ b/imake.c
@@ -998,121 +998,121 @@ get_libc_version(FILE *inFile)
#if defined CROSSCOMPILE || defined linux || defined(__GLIBC__)
static void
get_distrib(FILE *inFile)
{
-#if 0
struct stat sb;
static const char* suse = "/etc/SuSE-release";
static const char* redhat = "/etc/redhat-release";
static const char* debian = "/etc/debian_version";
fprintf (inFile, "%s\n", "#define LinuxUnknown 0");
fprintf (inFile, "%s\n", "#define LinuxSuSE 1");
fprintf (inFile, "%s\n", "#define LinuxCaldera 2");
fprintf (inFile, "%s\n", "#define LinuxCraftworks 3");
fprintf (inFile, "%s\n", "#define LinuxDebian 4");
fprintf (inFile, "%s\n", "#define LinuxInfoMagic 5");
fprintf (inFile, "%s\n", "#define LinuxKheops 6");
fprintf (inFile, "%s\n", "#define LinuxPro 7");
fprintf (inFile, "%s\n", "#define LinuxRedHat 8");
fprintf (inFile, "%s\n", "#define LinuxSlackware 9");
fprintf (inFile, "%s\n", "#define LinuxTurbo 10");
fprintf (inFile, "%s\n", "#define LinuxWare 11");
fprintf (inFile, "%s\n", "#define LinuxYggdrasil 12");
+#if 0
# ifdef CROSSCOMPILE
if (CrossCompiling) {
fprintf (inFile, "%s\n",
"#define DefaultLinuxDistribution LinuxUnknown");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName Unknown");
return;
}
# endif
if (lstat (suse, &sb) == 0) {
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxSuSE");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName SuSE");
return;
}
if (lstat (redhat, &sb) == 0) {
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxRedHat");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName RedHat");
return;
}
if (lstat (debian, &sb) == 0) {
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxDebian");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName Debian");
/* You could also try to get the version of the Debian distrib by looking
* at the content of /etc/debian_version */
return;
}
#endif
/* what's the definitive way to tell what any particular distribution is? */
fprintf (inFile, "%s\n", "#define DefaultLinuxDistribution LinuxUnknown");
fprintf (inFile, "%s\n", "#define DefaultLinuxDistName Unknown");
/* would like to know what version of the distribution it is */
}
We move #if 0 below the list of distribution constants so that
#define LinuxRedHat 8 and friends are always emitted.
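One way to spot this class of problem earlier is gcc’s -Wundef warning, which reports identifiers that are evaluated in #if without being defined. The helper below is a hypothetical sketch (the check_undef name is mine) and assumes gcc is installed; it only wraps the same gcc -E invocation used in the experiments above.

```shell
# check_undef: read C preprocessor input on stdin and succeed only if
# every identifier evaluated in an #if condition is defined.
# -Werror=undef promotes gcc's -Wundef warning to a hard error.
check_undef() {
    gcc -E -Werror=undef - >/dev/null 2>&1
}

# Neither a nor b is defined here, so the check is expected to fail.
if printf '#if a == b\n EQUAL\n#endif\n' | check_undef; then
    echo "all identifiers defined"
else
    echo "undefined identifier used in #if"
fi
```

Real imake templates test many optional symbols on purpose, so this is more of a debugging aid for small experiments than something to enable globally.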
The original imake.patch was added in 2006. That makes it a 15-year-old bug.
The fix is pending as the https://github.com/NixOS/nixpkgs/pull/135414
pull request. Fixing imake immediately broke the xcruiser, xvkbd
and xxkb packages (reviewers++). They were failing for lack of path
overrides, a gap that is now exposed on non-LinuxRedHat systems. We will
probably see more subtle breakages. I hope future breakages will not be
as magical as this one.
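Since more breakage is likely, a quick loop over packages known to use imake can show what else a change like this affects. The sketch below is hypothetical: check_builds and the BUILD_CMD variable are names I made up so that the loop can be tried without nix. By default it calls nix-build with --no-out-link to avoid leaving result symlinks around.

```shell
# check_builds ATTR...: try to build each attribute and report the outcome.
# BUILD_CMD is a hypothetical override; by default nix-build is run against
# the nixpkgs checkout in the current directory.
check_builds() {
    for attr in "$@"; do
        # Intentionally unquoted: BUILD_CMD holds a command plus arguments.
        if ${BUILD_CMD:-nix-build --no-out-link -A} "$attr" >/dev/null 2>&1; then
            echo "ok   $attr"
        else
            echo "FAIL $attr"
        fi
    done
}
```

From /tmp/nixpkgs it could be called as check_builds transfig xcruiser xvkbd xxkb, assuming those are all top-level attribute names.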
Now I can test my ccache update against nixpkgs/staging.
The imake doc
shares a few amusing facts and subtle tips on how to work around certain
C preprocessor behaviours to force it to generate a valid Makefile. For
example, if you want cpp to print '#' you need to prepend it with …
a C comment!
$ printf '# Makefile comment\n'
# Makefile comment
$ printf '# Makefile comment\n' | gcc -traditional -E -
<stdin>:1:3: error: invalid preprocessing directive #Makefile
$ printf '/**/# Makefile comment\n' | gcc -traditional -E -
# Makefile comment
Parting words
nixpkgs makes it trivial to bisect faulty package updates at the
package level, just as you would normally do at the project level.
I found a few new things along the way:
- nix dependency “resolution” is instant. Constructing a dependency graph is so much faster than trying to search for a path in an existing graph (like gentoo’s portage does).
- nix-shell is a nice way to poke at package unpacking, building and installation steps.
- imake is both a fun and a scary way to (ab)use the C preprocessor to generate Makefiles.
Have fun!