Unexpected runtime dependencies in nixpkgs
Intro
nix detects runtime dependencies in a somewhat unusual way: it scans
the build result for the store path hashes of the inputs used during the
build, so the dependencies do not have to be specified manually.
For example, if we build a trivial package that just prints a string into
a file, nix automatically pulls the store path mentioned in that string
into the runtime closure:
$ nix build --impure --expr 'with import <nixpkgs> {}; runCommand "foo" {} \
"echo ${re2c}/bin/re2c > $out"'
This command builds a package that consists of a single $out file
(symlinked to ./result) which contains the absolute path to the re2c binary.
It’s not a complicated package:
$ cat result
/nix/store/pj9cdgj07iz3cj88rywapx2lfxfmdqd3-re2c-3.1/bin/re2c
And yet if we look at the full closure of its inferred dependencies, it
already has re2c and all its runtime dependencies:
$ nix path-info -r ./result
/nix/store/gnzwqa9df994g01yw5x75qnbl1rhp9ds-libunistring-1.1
/nix/store/h3aw16j1c54jv8s39yvdhpfcx3538jwi-libidn2-2.3.4
/nix/store/kv0v4h5i911gj39m7n9q10k8r8gbn3sa-xgcc-12.3.0-libgcc
/nix/store/905gkx2q1pswixwmi1qfhfl6mik3f22l-glibc-2.37-8
/nix/store/s2pgr9iqj60mfnmabixnqacxl4bzb408-gcc-12.3.0-libgcc
/nix/store/gi26p79iq8jrw51irq5x82c2cqlgicxi-gcc-12.3.0-lib
/nix/store/pj9cdgj07iz3cj88rywapx2lfxfmdqd3-re2c-3.1
/nix/store/amiqn0hvnmrfcz2s8b47fb770v8hy9ny-foo
This automatic scanning method has both false positives and false negatives. Say, if we compress the file, the reference might disappear:
$ nix build --impure --expr 'with import <nixpkgs> {}; runCommand "foo" {} \
"echo ${re2c}/bin/re2c | bzip2 > $out"'
$ nix path-info -r ./result
/nix/store/lwx722djnam7yjy439b9k6czb55h707q-foo
The missed reference is a bug here (a false negative). False negatives can be worked around by explicitly adding plain-text references to the output, just like we did in the original example.
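To make the false negative concrete, here is a minimal sketch of hash scanning in plain shell. It is not how nix implements it; scan_refs is a hypothetical helper, and base64 stands in for bzip2 only so that the sketch needs nothing beyond basic tools. The two hashes are taken from the closure listing above.

```shell
# scan_refs FILE HASH...: print every candidate hash that occurs verbatim in FILE.
# (Hypothetical helper; it only illustrates the idea of scanning for hashes.)
scan_refs() {
    scan_file=$1
    shift
    for candidate in "$@"; do
        if grep -q -F -- "$candidate" "$scan_file"; then
            printf '%s\n' "$candidate"
        fi
    done
}

demo_dir=$(mktemp -d)
re2c_hash=pj9cdgj07iz3cj88rywapx2lfxfmdqd3
glibc_hash=905gkx2q1pswixwmi1qfhfl6mik3f22l

# A "build result" that mentions re2c in plain text, and an encoded copy of it.
printf '/nix/store/%s-re2c-3.1/bin/re2c\n' "$re2c_hash" > "$demo_dir/plain"
base64 < "$demo_dir/plain" > "$demo_dir/encoded"

plain_refs=$(scan_refs "$demo_dir/plain" "$re2c_hash" "$glibc_hash")
encoded_refs=$(scan_refs "$demo_dir/encoded" "$re2c_hash" "$glibc_hash")

echo "plain:   ${plain_refs:-<none>}"
echo "encoded: ${encoded_refs:-<none>}"
rm -r "$demo_dir"
```

The plain file yields the re2c hash, while the encoded copy yields nothing, even though both carry the same information.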
But I would like to talk about false positives today.
The problem
Let’s start with the motivating example: right now the nix package
has the header-only nlohmann/json library in its runtime closure:
$ nix path-info -r nixpkgs#nix | fgrep nlohmann_json
/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2
There is nothing in nlohmann/json useful for nix runtime:
$ find /nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2 -type f | unnix
/<<NIX>>/nlohmann_json-3.11.2/include/nlohmann/detail/conversions/from_json.hpp
/<<NIX>>/nlohmann_json-3.11.2/include/nlohmann/detail/conversions/to_chars.hpp
/<<NIX>>/nlohmann_json-3.11.2/include/nlohmann/detail/conversions/to_json.hpp
...
/<<NIX>>/nlohmann_json-3.11.2/include/nlohmann/ordered_map.hpp
/<<NIX>>/nlohmann_json-3.11.2/share/cmake/nlohmann_json/nlohmann_jsonTargets.cmake
/<<NIX>>/nlohmann_json-3.11.2/share/cmake/nlohmann_json/nlohmann_jsonConfig.cmake
/<<NIX>>/nlohmann_json-3.11.2/share/cmake/nlohmann_json/nlohmann_jsonConfigVersion.cmake
/<<NIX>>/nlohmann_json-3.11.2/share/pkgconfig/nlohmann_json.pc
These are a few headers plus cmake and pkg-config plumbing.
So why does nix retain them?
Debugging the details
To figure out where the references come from we can grep the package for a raw string and see how it gets there:
$ LANG=C grep -R $(nix-build --no-link '<nixpkgs>' -A nlohmann_json) $(nix-build --no-link '<nixpkgs>' -A nix.out)
grep: /nix/store/vxx4c6gc2zgfw870b40f06dmli6ljp34-nix-2.17.0/bin/nix: binary file matches
grep: /nix/store/vxx4c6gc2zgfw870b40f06dmli6ljp34-nix-2.17.0/bin/nix-build: binary file matches
grep: /nix/store/vxx4c6gc2zgfw870b40f06dmli6ljp34-nix-2.17.0/bin/nix-channel: binary file matches
grep: /nix/store/vxx4c6gc2zgfw870b40f06dmli6ljp34-nix-2.17.0/bin/nix-collect-garbage: binary file matches
grep: /nix/store/vxx4c6gc2zgfw870b40f06dmli6ljp34-nix-2.17.0/bin/nix-copy-closure: binary file matches
...
Here we see that even the nix binary itself retains a nlohmann/json
reference. With the strings tool from GNU binutils we can check what the
reference looks like:
$ nix shell nixpkgs#binutils-unwrapped
$$ strings $(nix-build --no-link '<nixpkgs>' -A nix.out)/bin/nix | grep $(nix-build --no-link '<nixpkgs>' -A nlohmann_json)
/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2/include/nlohmann/json.hpp
/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2/include/nlohmann/detail/output/serializer.hpp
/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2/include/nlohmann/detail/conversions/to_chars.hpp
/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2/include/nlohmann/detail/input/lexer.hpp
/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2/include/nlohmann/detail/iterators/iter_impl.hpp
/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2/include/nlohmann/detail/input/json_sax.hpp
/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2/include/nlohmann/detail/iterators/iteration_proxy.hpp
/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2/include/nlohmann/detail/input/parser.hpp
Note how absolute header paths are embedded into the nix binary. It
happens via __FILE__ macro expansion in C++ code.
// in nlohmann_json/include/nlohmann/json.hpp
class basic_json {
// ...
template<typename T, typename... Args> static T* create(Args&& ... args)
{
// ...
JSON_ASSERT(obj != nullptr);
// ...
}
// ...
};
// in nlohmann_json/include/nlohmann/detail/macro_scope.hpp
// allow overriding assert
#if !defined(JSON_ASSERT)
#include <cassert> // assert
#define JSON_ASSERT(x) assert(x)
#endif
// in glibc/include/assert.h
# if defined __cplusplus
# define assert(expr) \
(static_cast <bool> (expr) \
? void (0) \
: __assert_fail (#expr, __FILE__, __LINE__, __ASSERT_FUNCTION))
// ...
In this case any code that happens to instantiate the basic_json::create()
function embeds the expansion of __FILE__ as part of the assert() call,
and with it the absolute store path of the header.
That absolute store path retains nlohmann/json in the runtime closure.
It’s a completely redundant runtime dependency.
The workaround
Initially I thought of using something like the nukeReferences
file-mangling tool to wipe out unexpected references. But I was not sure
where I should plug this hammer in: ideally any user of the nlohmann_json
package should run it just in case. And patching files after the fact is
always prone to breaking something: be it broken file checksums, broken
sort ordering, or unrelated string sharing due to identical code folding.
I wanted something milder: ideally tell gcc not to emit problematic
paths at all. And gcc provides exactly that mechanism!
gcc has a way to slightly mangle absolute paths used by __FILE__ via the
-fmacro-prefix-map=old=new set of options: https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html.
Its main use case is to untangle final binaries from the temporary
directory the sources are built in:
-fmacro-prefix-map=/tmp/autogenerated/foo=/usr/src/foo.
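The effect of the flag can be tried outside of nix with a small sketch. Everything here is made up for illustration: the dep.h header, the DEP_ASSERT macro (a hand-rolled stand-in for assert()), and the /mangled/dep replacement prefix. It assumes a cc on PATH that understands -fmacro-prefix-map; if there is none, the sketch just reports "skipped".

```shell
workdir=$(mktemp -d)
dep="$workdir/dep"                 # stands in for a dependency's store path
mkdir -p "$dep/include"

# A header-only "library" whose inline function expands __FILE__.
cat > "$dep/include/dep.h" <<'EOF'
#include <stdio.h>
#include <stdlib.h>
/* Hand-rolled assert so the demo does not depend on the libc's <assert.h>. */
#define DEP_ASSERT(x) \
    ((x) ? (void)0 : (fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, #x), abort()))
static inline int checked(int x) { DEP_ASSERT(x != 0); return x; }
EOF

cat > "$workdir/main.c" <<'EOF'
#include <dep.h>
int main(int argc, char **argv) { (void)argv; return checked(argc) - 1; }
EOF

# leaks BINARY: "yes" if the real header directory occurs verbatim in BINARY.
leaks() {
    if grep -q -F -- "$dep" "$1"; then echo yes; else echo no; fi
}

plain_leak=skipped
mapped_leak=skipped
if command -v cc >/dev/null 2>&1 &&
   cc -isystem "$dep/include" "$workdir/main.c" -o "$workdir/plain" &&
   cc -isystem "$dep/include" -fmacro-prefix-map="$dep=/mangled/dep" \
      "$workdir/main.c" -o "$workdir/mapped"
then
    plain_leak=$(leaks "$workdir/plain")
    mapped_leak=$(leaks "$workdir/mapped")
fi

echo "without the flag: $plain_leak"
echo "with the flag:    $mapped_leak"
rm -rf "$workdir"
```

Without the flag the binary should carry the real header path, and with the flag only the rewritten /mangled/dep prefix should remain.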
The problem feels vaguely similar: we want to avoid any mention of
source directories in the final output. I tried to inject
-fmacro-prefix-map= for every single build input used by nixpkgs as:
--- a/pkgs/build-support/cc-wrapper/setup-hook.sh
+++ b/pkgs/build-support/cc-wrapper/setup-hook.sh
@@ -65,15 +65,29 @@
# function is guaranteed to be exactly the same.
ccWrapper_addCVars () {
# See ../setup-hooks/role.bash
- local role_post
+ local role_post mangled_store map_flag var
getHostRoleEnvHook
+ var=NIX_CFLAGS_COMPILE${role_post}
+
if [ -d "$1/include" ]; then
- export NIX_CFLAGS_COMPILE${role_post}+=" -isystem $1/include"
+ export $var+=" -isystem $1/include"
fi
if [ -d "$1/Library/Frameworks" ]; then
- export NIX_CFLAGS_COMPILE${role_post}+=" -iframework $1/Library/Frameworks"
+ export $var+=" -iframework $1/Library/Frameworks"
+ fi
+
+ # Try hard to avoid hardcoding of -dev outputs via __FILE__.
+ # The typical examples are: asserts in nlohmann_json leaking into
+ # nix executable closure, asserts from lttng-ust leaking into
+ # pipewire.
+ mangled_store=$(printf "%s" "$1" | sed -e "s|$NIX_STORE/[a-z0-9]\{32\}-|$NIX_STORE/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-|g")
+ map_flag="-fmacro-prefix-map=$1=$mangled_store"
+ # As it's a long flag try hard not to introduce duplicates as
+ # environment gets exhausted otherwise for large packages like qemu.
+ if [[ ${!var-} != *" $map_flag"* ]]; then
+ export $var+=" $map_flag"
fi
}
The change adds a bunch of compiler options of the form:
-fmacro-prefix-map=/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-nlohmann_json-3.11.2
That way __FILE__ values still have a reasonable form: we can see which
package they come from. But the paths can’t be used in any meaningful way
as they don’t refer to real files anymore. That does not look too bad.
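The hash mangling can be tried in isolation. This is a minimal sketch built around the same sed expression as the patch; mangle_store_path, store_dir and placeholder_hash are names made up for the example, and store_dir is hard-coded here instead of being read from $NIX_STORE.

```shell
store_dir=/nix/store                               # assumption: default store location
placeholder_hash=$(printf '%032d' 0 | tr 0 e)      # 32 times the letter "e"

# mangle_store_path PATH: replace the 32-character hash of every store path
# in PATH with the placeholder, keeping the package name and version.
mangle_store_path() {
    printf '%s' "$1" |
        sed -e "s|$store_dir/[a-z0-9]\{32\}-|$store_dir/$placeholder_hash-|g"
}

input=/nix/store/5xih6daf5g3hpa0wc5vs2cgrhakn4s0j-nlohmann_json-3.11.2
mangled=$(mangle_store_path "$input")
echo "-fmacro-prefix-map=$input=$mangled"
```

The package name survives the rewrite, so error messages that print __FILE__ remain readable, while the hash that the reference scanner looks for is gone.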
I also had to avoid inserting multiple identical values into
the final NIX_CFLAGS_COMPILE variable. Otherwise, some particularly
large packages like qemu exhaust all the environment space and fail to
run executables. We might want to do the same for the rest of the variables:
we add most of the options thrice to NIX_CFLAGS_COMPILE.
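The duplicate check can be sketched on its own. The patch uses bash's [[ ]] with ${!var}; the variant below is a portable rewrite with case and eval, not the code from the patch. append_flag_once and DEMO_CFLAGS are hypothetical names.

```shell
# append_flag_once VAR FLAG: append " FLAG" to the variable named VAR
# unless that exact text is already part of its value.
append_flag_once() {
    eval "_current=\${$1-}"               # indirect read of the variable named by $1
    case "$_current" in
        *" $2"*) ;;                       # already present: leave the variable alone
        *)  eval "$1=\"\$_current \$2\""  # indirect write, then export
            export "$1" ;;
    esac
}

unset DEMO_CFLAGS
append_flag_once DEMO_CFLAGS "-fmacro-prefix-map=/a=/b"
append_flag_once DEMO_CFLAGS "-fmacro-prefix-map=/a=/b"   # duplicate, ignored
append_flag_once DEMO_CFLAGS "-isystem /a/include"
echo "$DEMO_CFLAGS"
```

The check is a plain substring match, so a flag that happens to be a prefix of an already present longer flag would be skipped as well. That seems acceptable for full store paths but is worth keeping in mind.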
I proposed the workaround as https://github.com/NixOS/nixpkgs/pull/255192.
The results
Did the workaround help? I placed the patch into a ~/n checkout of
the nixpkgs repo.
$ LANG=C grep -R $(nix-build --no-link ~/n -A nlohmann_json) $(nix-build --no-link ~/n -A nix.out)
<empty>
Yay! At the very least we got rid of the unneeded reference.
nix was a good example of a superfluous but small and harmless dependency.
Are there packages where this change would be more pronounced? My
ulterior motive was to fix something similar in lttng-ust.dev. That
one is not just a set of headers: it also pulls in a python interpreter
as a dependency:
$ nix path-info -r $(nix-build '<nixpkgs>' -A lttng-ust.dev) |& unnix
/<<NIX>>/libunistring-1.1
/<<NIX>>/libidn2-2.3.4
/<<NIX>>/xgcc-12.3.0-libgcc
/<<NIX>>/glibc-2.37-8
/<<NIX>>/zlib-1.2.13
/<<NIX>>/sqlite-3.42.0
/<<NIX>>/numactl-2.0.16
/<<NIX>>/lttng-ust-2.13.1
/<<NIX>>/expat-2.5.0
/<<NIX>>/ncurses-6.4
/<<NIX>>/liburcu-0.14.0
/<<NIX>>/xz-5.4.4
/<<NIX>>/openssl-3.0.10
/<<NIX>>/bzip2-1.0.8
/<<NIX>>/libffi-3.4.4
/<<NIX>>/libxcrypt-4.4.36
/<<NIX>>/liburcu-0.14.0-dev
/<<NIX>>/gcc-12.3.0-libgcc
/<<NIX>>/gcc-12.3.0-lib
/<<NIX>>/readline-8.2p1
/<<NIX>>/bash-5.2-p15
/<<NIX>>/mailcap-2.1.53
/<<NIX>>/gdbm-1.23
/<<NIX>>/tzdata-2023c
/<<NIX>>/python3-3.10.12
/<<NIX>>/lttng-ust-2.13.1-bin
/<<NIX>>/lttng-ust-2.13.1-dev
One of the frequent users of lttng-ust.dev is pipewire:
$ nix path-info -r $(nix-build '<nixpkgs>' -A pipewire.out) | fgrep lttng
/nix/store/23jh1m6irhvr16zjmrvy2cnpgz7yi6gj-lttng-ust-2.13.1
/nix/store/xznqsvr1la1xnfnzia2yvnicfz03yjqb-lttng-ust-2.13.1-bin
/nix/store/fckk2ncjnxdw1xsx8v8rxjnmhldbx8pr-lttng-ust-2.13.1-dev
And I’m glad to announce that it’s also gone with the patch above:
$ nix path-info -r $(nix-build ~/n -A pipewire.out) | fgrep lttng
/nix/store/bv3i3qjphzgfzmmdhws9nhwz76qscy61-lttng-ust-2.13.1
Here is the closure size difference. Before:
$ nix path-info -rsSh $(nix-build '<nixpkgs>' -A pipewire.out) | nl | tail -n1
219 /nix/store/ff0w34nr807in3b1swmqklxy9g9v5hg9-pipewire-0.3.79 1.4M 543.4M
After:
$ nix path-info -rsSh $(nix-build ~/n -A pipewire.out) | nl | tail -n1
207 /nix/store/hl4dffvc73nsh3zfbji0y7h9lcnrk14b-pipewire-0.3.79 1.3M 452.0M
12 dependencies and ~90MB are just gone! It’s a 543.4M -> 452.0M
reduction, about 17% of the whole closure.
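The reduction can be double-checked from the two closure sizes reported above. A small sketch, with awk doing the floating point arithmetic (the variable names are made up):

```shell
before=543.4   # closure size in MiB before the patch
after=452.0    # closure size in MiB after the patch

saved=$(LC_ALL=C awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f", b - a }')
percent=$(LC_ALL=C awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f", (b - a) * 100 / b }')

echo "saved ${saved}M, ${percent}% of the original closure"
```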
Parting words
Looking at the runtime closure size is always fun. There is a lot of other
low-hanging fruit in nixpkgs to pick. Most of the time just splitting out
an extra dev output is enough to slim down the output.
__FILE__ is a tricky macro that makes nix builds leak
unnecessary references into the final closure.
-fmacro-prefix-map= seems to be a robust workaround for __FILE__-induced
leaks. The flag has been supported by both gcc and clang for quite
a while.
Even the nix package itself had a redundant dependency in its final
closure.
https://github.com/NixOS/nixpkgs/pull/255192 should fix it in nixpkgs.
And for some packages like pipewire the closure size reduction is
substantial: about 17% of the whole closure. Not bad for a
single extra compiler option.
Have fun!