nixpkgs cross-compilation improvements
TL;DR
gcc
cross-compilers are now stripped!
For example nixpkgs/182513
decreases wine
closure (or any other pkgsCross.*.stdenv
)
by a ~1GB.
nixpkgs/182513 also fixes
stripping of static libraries. You can now remove existing dontStrip = true;
workarounds in nixpkgs
if you had to put them in to restore linkage.
For example nixpkgs/183484
decreases mingw
closure by 200MB.
With nixpkgs/181943 gcc
cross-compilers and cross-built gcc
s now enable expected features based
on target’s libc headers. Previously libc headers were not passed correctly.
That caused cross-gcc
and cross-build gcc
to assume too conservative
assumptions about libc like use of libssp
on targets or use of executable
stack support.
Story mode
I like cross-compilation. It’s a great way to peek at other CPU architectures’ properties without having to deal with real hardware.
Cross-compilation is fundamentally just a compilation. The compiler should emit code for that one CPU type. Should be a solved problem by now, right? If you ever tried to cross-compile something large you probably already know the complications that usually arise from it.
The problem
Scrolling through open nixpkgs
PRs I stopped on this one:
stdenv: lib{gmp,mpc,mpfr,isl}-stage3: isPower64 -> no -fstack-protector.
It looked like something I could review. A few month ago I fiddled with
stdenv bootstrap when I dealt
with glibc-2.35
update. If nothing else I knew stdenv
is a bit hard
to reason about when it comes to figuring out bootstrap dependency tower.
In the PR Adam Joseph shared the problem he was trying to address. Somewere
at bootstrap time on powerpc64le-linux
platform one of the intermediate
gcc
builds failed to link as:
/tmp/nix-build-gcc-10.3.0.drv-0/build/./prev-gcc/xg++ \
-o cc1plus \
cp/cp-lang.o ... main.o ... ../libdecnumber/libdecnumber.a ... -lz
/<<NIX>>/binutils-2.35.2/bin/ld: /<<NIX>>/mpfr-4.1.0/lib/libmpfr.a(mpfr-gmp.o):
(.toc+0x8): undefined reference to `__stack_chk_guard'
/<<NIX>>/binutils-2.35.2/bin/ld: /<<NIX>>/mpfr-4.1.0/lib/libmpfr.a(mpfr-gmp.o):
(.toc+0x8): undefined reference to `__stack_chk_guard'
...
Adam suggested disabling stack protector just for a few bootstrap
packages (mpfr
and the similar) to get past the errro. While it probably
gets the job done it also flags an assumption incompatibility between
compilers. It should not normally happen.
What is libssp?
What is that __stack_chk_guard
thing anyway? What is supposed to
provide it?
It has something to do with -fstack-protector*
set of options in
gcc
. Let’s pick a trivial void wr(char * p, char v){ *p = v; }
function and build it with and without stack protector to get a
feel of it:
"void wr(long * p, long v){ *p = v; }" | gcc -S -x c - -o - -fno-stack-protector -O2
$ printf
file "<stdin>"
.
.text4
.p2align
.globl wr, @function
.type wrwr:
.LFB0:
.cfi_startproc
movq %rsi, (%rdi) ; Our `*p = v;` code
ret
.cfi_endproc.LFE0:
, .-wr
.size wr"GCC: (GNU) 12.1.0"
.ident section .note.GNU-stack,"",@progbits .
The actual code takes 1 line here: movq %rsi, (%rdi)
. It stores
64-bit value at %rsi
register (long v
parameter) to memory pointed
by %rdi
(long * p
parameter). The rest is a bit of metadata to get
the code placed properly into the ELF
file.
Now let’s add stack protector code to it with -fstack-protector-all
option:
"void wr(long * p, long v){ *p = v; }" | gcc -S -x c - -o - -fstack-protector-all -O2
$ printf
file "<stdin>"
.
.text4
.p2align
.globl wr, @function
.type wrwr:
.LFB0:
.cfi_startproc$24, %rsp ; allocated a bit of space for canary on stack
subq 32
.cfi_def_cfa_offset movq %fs:40, %rax ; canary = %fs:40
movq %rax, 8(%rsp) ; store canary on stack
%eax, %eax ; clean registers up
xorl
movq %rsi, (%rdi) ; initial `*p = v;` code
movq 8(%rsp), %rax ; load canary value back from stack
%fs:40, %rax ; compare to the reference value
subq jne .L5 ; exit if canary comparison failed
$24, %rsp
addq
.cfi_remember_state8
.cfi_def_cfa_offset ret ; exit `wr()`
.L5:
.cfi_restore_statecall __stack_chk_fail ; handle failure
.cfi_endproc.LFE0:
, .-wr
.size wr"GCC: (GNU) 12.1.0"
.ident section .note.GNU-stack,"",@progbits .
Now our original code was diluted with 9(!) extra instructions related
to stack protector checks. To make the checking work the compiler uses
%fs:40
thread-local memory location as a canary value. At start of
each function code places canary on stack (with movq %rax, 8(%rsp)
)
and at the end of function code reads the canary value back from the
same location (with movq 8(%rsp), %rax
) and checks if it was unchanged
(with subq %fs:40, %rax
and jne .L5
). If the canary check
failed then __stack_chk_fail()
is called.
If we generalize the above to pseudo-code gcc
turned our program to
something like:
void wr(long * p, long v) {
long canary = __stack_chk_guard;
*p = v; // original code
if (canary != __stack_chk_guard)
();
__stack_chk_failreturn 0;
}
Something has to provide that __stack_chk_fail()
function. In case
of glibc
that function is provided by libc.so.6
library starting
from 2.4
version:
$ nm -D <<NIX>>/glibc-2.34-210/lib/libc.so.6 | fgrep __stack
0000000000116de0 T __stack_chk_fail@@GLIBC_2.4
Something also has to arrange addressable %fs:40
memory. In case of
glibc
that value is placed by glibc
itself
in the early startup code. %fs
is a TLS
segment register for a
segment maintained by kernel: kernel changes the segment address
on thread switch.
Thus the above assembly code generated by gcc
implies presence of
operating system and supporting libc.
Not all architectures have a way to address thread-local data in that
fashion. For targets without TLS
glibc
emulates a bit of stack
protection with a global variable uintptr_t __stack_chk_guard attribute_relro;
.
Turns out it’s not the only implementation of stack protector prologue
and epilogue even on x86_64
. What happens on glibc-2.0
? Or on other
libcs or kernels?
The implementation we saw above was the default case of --disable-libssp
mode of gcc
. We can also build gcc
in --enable-libssp
. In this
case we get a bit different code:
`gcc` build with `./configure --enable-libssp`:
# Locally built "void wr(long * p, long v){ *p = v; }" | gcc/xgcc -Bgcc -S -x c - -o - -fstack-protector-all -O2
$ printf
file "<stdin>"
.
.text4
.p2align
.globl wr, @function
.type wrwr:
.LFB0:
.cfi_startproc$24, %rsp ; allocated a bit of space for canary on stack
subq 32
.cfi_def_cfa_offset movq __stack_chk_guard(%rip), %rax ; canary = __stack_chk_guard
movq %rax, 8(%rsp) ; store canary on stack
%eax, %eax ; clean registers up
xorl
movq %rsi, (%rdi) ; initial `*p = v;` code
movq 8(%rsp), %rax ; load canary value back from stack
(%rip), %rax ; compare to the reference value
subq __stack_chk_guardjne .L5 ; exit if canary comparison failed
$24, %rsp
addq
.cfi_remember_state8
.cfi_def_cfa_offset ret ; exit `wr()`
.L5:
.cfi_restore_statecall __stack_chk_fail ; handle failure
.cfi_endproc.LFE0:
, .-wr
.size wr"GCC: (GNU) 13.0.0 20220724 (experimental)"
.ident section .note.GNU-stack,"",@progbits .
The assembly code is very close to --disable-libssp
case. The
difference is how canary is read:
instead of using thread-local %fs:40
location gcc
now resorts
to using a global __stack_chk_guard
variable.
Note that glibc
does not provde __stack_chk_guard
symbol. In gcc
’s
case expected to come from libssp
library we just enabled. gcc
’s spec
files add -lssp
(or equivalent) to all link commands.
This means that binaries produced by --enable-libssp
and by
--disable-libssp
are slightly incompatible: the final result needs
to be linked by --enable-libssp
gcc
. Otherwise we’ll get linker
failures:
$ printf "void wr(long * p, long v){ *p = v; }" | gcc/xgcc -Bgcc -c -x c - -fPIC -o a.o -fstack-protector-all -O2
$ gcc -shared a.o -o liba.so -Wl,-no-undefined
<<NIX>>/binutils-2.38/bin/ld: a.o: in function `wr':
<stdin>:(.text+0x7): undefined reference to `__stack_chk_guard'
collect2: error: ld returned 1 exit status
Looks familiar? That’s exactly the same failure we started with.
So why do we get a mix of gcc flavours?
Not all libc versions provide stack protector infrastructure. gcc
tries to guess at ./configure
time by
peeking
at target libc’s headers:
-f $target_header_dir/features.h \
[if test && glibc_version_major_define=`$EGREP '^[ ]*#[ ]*define[ ]+__GLIBC__[
+[0-9]' $target_header_dir/features.h` \
]&& glibc_version_minor_define=`$EGREP '^[ ]*#[ ]*define[ ]+__GLIBC_MIN
+[0-9]' $target_header_dir/features.h`; then
OR__[ ]=`echo "$glibc_version_major_define" | sed -e 's/.*__GLIBC__[
glibc_version_major*//'`
]=`echo "$glibc_version_minor_define" | sed -e 's/.*__GLIBC_MINOR
glibc_version_minor*//'`
__[ ]
fi]
...# Test for stack protector support in target C library.
(__stack_chk_fail in target C library,
AC_CACHE_CHECK,
gcc_cv_libc_provides_ssp=no
[gcc_cv_libc_provides_ssp
...
case "$target" in*-*-musl*)
# All versions of musl provide stack protector
=yes;;
gcc_cv_libc_provides_ssp*-*-linux* | *-*-kfreebsd*-gnu)
# glibc 2.4 and later provides __stack_chk_fail and
# either __stack_chk_guard, or TLS access to stack guard canary.
([2], [4], [gcc_cv_libc_provides_ssp=yes], [
GCC_GLIBC_VERSION_GTE_IFELSE
...*) gcc_cv_libc_provides_ssp=no ;;
esac) fi]
Here configure.ac
just greps glibc
’s features.h
header for library
version. It does not do usual linking probing as bootstrap frequently
starts from gcc
and glibc
headers alone.
In nixpkgs
’s case gcc
build in cross-compile
case
(host != target
) was looking at a wrong directory location by
attempting to add sysroot
prefix:
if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x ||
test x$build != x$host || test "x$with_build_sysroot" != x; then
if test "x$with_build_sysroot" != x; then
BUILD_SYSTEM_HEADER_DIR=$with_build_sysroot'$${sysroot_headers_suffix}$(NATIVE_SYSTEM_HEADER_DIR)'
else
BUILD_SYSTEM_HEADER_DIR='$(CROSS_SYSTEM_HEADER_DIR)'
fi
if test x$host != x$target
then
CROSS="-DCROSS_DIRECTORY_STRUCTURE"
ALL=all.cross
SYSTEM_HEADER_DIR=$BUILD_SYSTEM_HEADER_DIR
elif test "x$TARGET_SYSTEM_ROOT" != x; then
SYSTEM_HEADER_DIR='$(CROSS_SYSTEM_HEADER_DIR)'
fi
if test "x$with_build_sysroot" != "x"; then
target_header_dir="${with_build_sysroot}${native_system_header_dir}"
elif test "x$with_sysroot" = x; then
target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"
elif test "x$with_sysroot" = xyes; then
target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-root${native_system_header_dir}"
else
target_header_dir="${with_sysroot}${native_system_header_dir}"
fi
else
target_header_dir=${native_system_header_dir}
fi
Note how hard gcc
tries:
${buildsysroot}/${native_system_header_dir}
${exec_prefix}/${target}/sys-include
${exec_prefix}/${target}/sys-root${native_system_header_dir}
${sysroot}${native_system_header_dir}
nixpkgs
provided none of these directories and build was falling back
to outdated glibc-0.0
assumption.
Thus initial fix was simple: just add --with-build-sysroot=/
option to
gcc
’s ./configure
to trick it to use /${native_system_header_dir}
path.
One-liner change! This allowed me to cross-build gcc
for
powerpc64le-linux
and make sure stack protector is using glibc
support code. Are we done?
Pandora’s box
The --with-build-sysroot=/
now started enabling all sorts of
libc-specific features. That should be fine on it’s own, but for
nixpkgs
cross-build (build != host == target
) case it was
like that for the first time.
Varios linux targets just worked with the fix. Mostly because
we are compiling from glibc
to glibc
. Or from glibc
to musl
.
It’s usually not that bad to miss a feature or two.
I was confident of the fix, but ofborg
presubmit test told me that I broke x86_64-darwin
gcc
build:
impure path `//' used in link
collect2: error: ld returned 1 exit status
After a bit of debugging I found it to be just a false positive check
failure in nixpkgs
-specific ld
wrapper script. Wrapper complained
that -syslibroot //
refers outside /nix/store
path and thus breaks
the sandboxing. But in reality it’s a no-op flag. Thus I just skipped
this specific path in the wrapper.
I tried to cross-build gcc
. It failed again. This time ofborg
was
still unhappy and complained about missing sys/sdt.h
header~
That was surprising: darwin's
libc does provide sys/sdt.h
,
while glibc
does not. Why does it even try to use that header?
Normally gcc
’s configure.ac
probes it
as a target header as well:
# Test for <sys/sdt.h> on the target.
GCC_TARGET_TEMPLATE([HAVE_SYS_SDT_H])
AC_MSG_CHECKING(sys/sdt.h in the target C library)
have_sys_sdt_h=no
if test -f $target_header_dir/sys/sdt.h; then
have_sys_sdt_h=yes
AC_DEFINE(HAVE_SYS_SDT_H, 1,
[Define if your target C library provides sys/sdt.h])
fi
AC_MSG_RESULT($have_sys_sdt_h)
The answer was straightforward: nixpkgs
incorrectly used host’s
headers as target headers!
After I sorted this failure yet another failure came up: pkgsLLVM
bootstrap was broken because gcc
enables corss-compilation mode
for build != host || host != target
case. But nixpkgs
uses
x86_64-unknown-linux-gnu
for both gcc
(host) and llvm
(target) toolchains and bootstraps it as a proper cross-compiler.
That was easy to fix with nixpkgs/182666.
Is that it? I’m not sure. I think we have a few more workarounds
buried in nixpkgs
that stemmed from the fact that we used wrong
headers. One bug at a time.
Parting words
nixpkgs
makes it trivial to try various cross-compilers with a
single command. darwin
port was very useful to expose two bugs
in generic include layour scheme nixpkgs
was using.
Reproducible environment made it possible to debug early stage of
gcc
bootstrap when libc is not yet present for target. When I did
a similar work on Gentoo’s crossdev
I was frequently tricked by
the fact that building initiall cross-toolchain frequently results
in a different result than after a crossdev
rerun.
gcc
’s ./configure
is surprisingly resilient to all the invalid
configurations you throw at it. It always manages to produce something
that mostly works and gets you going as an initial porting effort.
I think it’s a good thing in the toolchain world as target environments
are so diverse. But it takes some time to debug it efficiently.
Have fun!