dynamic linking ABI is hard
Today on #gentoo-haskell Ke showed an example of subtle ABI breakage.
The nettle library exports, as part of its API, a NULL-terminated array of
hash descriptors, nettle_hashes:
// in nettle-meta.h
extern const struct nettle_hash * const nettle_hashes[];
and defines that array as
// nettle-meta-hashes.c
const struct nettle_hash * const nettle_hashes[] = {
&nettle_md2,
&nettle_md4,
&nettle_md5,
&nettle_ripemd160,
&nettle_sha1,
&nettle_sha224,
&nettle_sha256,
&nettle_sha384,
&nettle_sha512,
NULL
};
Quiz question!
Will the ABI change if we add or remove a few entries in the array (like in this patch)?
Would you expect existing binaries to start crashing after a library upgrade on your system?
TL;DR: yes, things will break.
Tiny trigger
To understand exactly how things break, let’s dive into a simpler example: a library that exports only constant strings and nothing else. The public library interface:
// l.h:
extern const char s1[];
extern const char s2[];
The full library implementation:
// l.c:
#include "l.h"
#ifdef V1
const char s1[] = "v1 s1";
const char s2[] = "v1 s2";
#endif
#ifdef V2
const char s1[] = "v2 s1 <V2 addition>";
const char s2[] = "v2 s2 <V2 addition>";
#endif
And the library user:
// exe.c:
#include <stdio.h>
#include "l.h"
int main() {
printf ("s1='%s'\n", s1);
printf ("s2='%s'\n", s2);
return 0;
}
Here we just print the array values from the executable. Nothing fancy. Now let’s try the following sequence of actions:
1. build the library in V1 mode (shorter strings)
2. build an executable against the V1 library
3. run the executable (linked against V1) against the V1 library
4. build the library in V2 mode (longer strings)
5. run the executable (linked against V1) against the V2 library
6. build an executable against V2
7. run the executable (linked against V2) against the V2 library
Steps 1-3:
$ gcc -O2 -DV1 -shared -fPIC l.c -o libl.so
$ gcc -O2 exe.c -o exe -L. -ll '-Wl,-rpath=$ORIGIN'
$ echo 'Running exe/V1'
Running exe/V1
$ ./exe
s1='v1 s1'
s2='v1 s2'
$ cp exe exe-v1
No surprises here. Let’s update the library to V2 (steps 4-5):
$ gcc -O2 -DV2 -shared -fPIC l.c -o libl.so
$ echo 'Running exe/V2'
Running exe/V2
$ ./exe
./exe: Symbol `s2' has different size in shared object, consider re-linking
./exe: Symbol `s1' has different size in shared object, consider re-linking
s1='v2 s1 '
s2='v2 s2 v2 s1 '
Aha! Data corruption! The glibc runtime dynamic linker even hints that
we should relink the executable. Let’s do that (steps 6-7):
$ gcc -O2 exe.c -o exe -L. -ll '-Wl,-rpath=$ORIGIN'
$ echo 'Running exe/V2 (relinked)'
Running exe/V2 (relinked)
$ ./exe
s1='v2 s1 <V2 addition>'
s2='v2 s2 <V2 addition>'
$ cp exe exe-v2
Recovered.
The clues
So how does the executable differ when linked against the V1 and V2
versions? The easiest way to see it is to dump all the ELF information
we have:
$ readelf -a exe-v1 > v1
$ readelf -a exe-v2 > v2
$ diff -u v1 v2
--- v1 2016-12-03 14:39:09.475769368 +0000
+++ v2 2016-12-03 14:39:11.510768031 +0000
@@ -1,3 +1,3 @@
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[24] .bss NOBITS 0000000000601030 00001030
- 0000000000000010 0000000000000000 WA 0 0 1
+ 0000000000000038 0000000000000000 WA 0 0 16
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
...
LOAD 0x0000000000000de8 0x0000000000600de8 0x0000000000600de8
- 0x0000000000000248 0x0000000000000258 RW 200000
+ 0x0000000000000248 0x0000000000000280 RW 200000
...
Relocation section '.rela.dyn' at offset 0x498 contains 4 entries:
Offset Info Type Sym. Value Sym. Name + Addend
...
000000601030 000900000005 R_X86_64_COPY 0000000000601030 s2 + 0
-000000601036 000600000005 R_X86_64_COPY 0000000000601036 s1 + 0
+000000601050 000600000005 R_X86_64_COPY 0000000000601050 s1 + 0
...
Symbol table '.dynsym' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
...
- 6: 0000000000601036 6 OBJECT GLOBAL DEFAULT 24 s1
+ 6: 0000000000601050 20 OBJECT GLOBAL DEFAULT 24 s1
...
- 9: 0000000000601030 6 OBJECT GLOBAL DEFAULT 24 s2
+ 9: 0000000000601030 20 OBJECT GLOBAL DEFAULT 24 s2
...
Symbol table '.symtab' contains 60 entries:
Num: Value Size Type Bind Vis Ndx Name
...
- 43: 0000000000601030 6 OBJECT GLOBAL DEFAULT 24 s2
+ 43: 0000000000601030 20 OBJECT GLOBAL DEFAULT 24 s2
...
- 54: 0000000000601036 6 OBJECT GLOBAL DEFAULT 24 s1
+ 54: 0000000000601050 20 OBJECT GLOBAL DEFAULT 24 s1
We see a lot of interesting facts here:
- s1 and s2 symbols have known sizes
- the sizes change from 6 bytes ("v1 s1\0") to 20 bytes ("v2 s1 <V2 addition>\0")
- both s1 and s2 have a mysterious R_X86_64_COPY relocation type
- the .bss section size increased by 40 bytes
- the LOAD read/write segment increased by 40 bytes
It means the array contents are copied from the library's read-only data
(.rodata, since the arrays are const) into the executable's .bss section
at every program startup.
Why does it behave like that?
But why copy? Arrays might be huge, and copying them would take a
while. Why not just map the library and use its symbols?
To answer that we need to understand what drives the process of binary
generation.
It all starts with the exe.c file being compiled to assembly.
Let’s look at it:
; gcc -O2 exe.c -S -o exe.S
.file "exe.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "s1='%s'\n"
.LC1:
.string "s2='%s'\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB23:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $s1, %edx
movl $.LC0, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk
movl $s2, %edx
movl $.LC1, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE23:
.size main, .-main
.ident "GCC: (Gentoo 6.2.0-r1 p1.1) 6.2.0"
.section .note.GNU-stack,"",@progbits
The relevant piece of code here is how s1 gets passed to the
printf call:
;
.file "exe.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "s1='%s'\n"
...
main:
...
movl $s1, %edx
movl $.LC0, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk
$s1 is the absolute address of the s1 symbol. It is not known at
exe link time because its storage lives in an external library. No
indirection is used.
One way of adjusting this address is to use a relocation in the code segment
(also known as a TEXTREL). But such relocations are unwelcome on Linux
systems. They have a few disadvantages:
- insecurity: .text sections that contain TEXTRELs need to be mapped with RWX permissions.
- inefficiency: Fixing up these relocations has to be done before the program takes control, even if the code is never executed.
- inefficiency: Each fixed-up relocation un-shares the code page where it was applied, increasing the runtime memory footprint.
The sizes of the s1 and s2 objects are known at link time: ld takes both
the object code compiled from exe.c and libl.so to resolve all symbols
used in the final exe. So, to avoid TEXTRELs, the linker decides to
provide storage for this data in exe’s own writable .bss section and
generates special COPY relocations, as if the external data were local to exe.
When we update libl.so with a new s1 object size, exe still
contains a COPY relocation for symbol s1 with the old size. This
leads to a partial copy of the symbol at exe startup.
In the case of nettle that means the NULL-terminated array will be
copied only partially (missing the last 4 elements, including the NULL),
which causes occasional SIGSEGVs.
A fun workaround
This absolute relocation problem is well known to anyone writing shared
libraries. The compiler has a special position-independent mode (-fPIC)
that generates non-absolute access to each symbol in the library.
We can work around the problem by building exe.c with -fPIC:
$ gcc -O2 -DV1 -shared -fPIC l.c -o libl.so
$ gcc -O2 -fPIC exe.c -o exe -L. -ll '-Wl,-rpath=$ORIGIN'
$ echo 'Running exe/V1'
Running exe/V1
$ ./exe
s1='v1 s1'
s2='v1 s2'
$ gcc -O2 -DV2 -shared -fPIC l.c -o libl.so
$ echo 'Running exe/V2'
Running exe/V2
$ ./exe
s1='v2 s1 <V2 addition>'
s2='v2 s2 <V2 addition>'
It just works. Let’s look at the changes in generated code for
exe.c:
; gcc -fPIC -O2 exe.c -S -o exe-fPIC.S
.file "exe.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "s1='%s'\n"
.LC1:
.string "s2='%s'\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB23:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movq s1@GOTPCREL(%rip), %rdx
leaq .LC0(%rip), %rsi
movl $1, %edi
xorl %eax, %eax
call __printf_chk@PLT
movq s2@GOTPCREL(%rip), %rdx
leaq .LC1(%rip), %rsi
movl $1, %edi
xorl %eax, %eax
call __printf_chk@PLT
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE23:
.size main, .-main
.ident "GCC: (Gentoo 6.2.0-r1 p1.1) 6.2.0"
.section .note.GNU-stack,"",@progbits
Or in diff form:
--- exe.S 2016-12-03 17:51:28.229898505 +0000
+++ exe-fPIC.S 2016-12-03 18:09:38.341060805 +0000
@@ -16,2 +16,2 @@
- movl $s1, %edx
- movl $.LC0, %esi
+ movq s1@GOTPCREL(%rip), %rdx
+ leaq .LC0(%rip), %rsi
@@ -20,3 +20,3 @@
- call __printf_chk
- movl $s2, %edx
- movl $.LC1, %esi
+ call __printf_chk@PLT
+ movq s2@GOTPCREL(%rip), %rdx
+ leaq .LC1(%rip), %rsi
@@ -25 +25 @@
- call __printf_chk
+ call __printf_chk@PLT
Access to s1 now goes through a separate global offset table (aka
.got). This way we get another layer of indirection (a memory
dereference) and reach the s1 contents without any copies.
A few takeaways
- Be careful when exporting any objects from libraries (arrays, structs, integral constants)
- Exporting a pointer (const char *) instead of an array (const char []) would not be as devastating
- Dynamic linking is hard :)
- To learn more it’s worth reading Ulrich Drepper’s DSOhowto
- Another good book is Linkers and Loaders by John R. Levine.
Have fun!