dynamic linking ABI is hard

Today on #gentoo-haskell Ke shown an example of subtle ABI breakage. nettle library exports as part of it’s API NULL-terminated array of functions nettle_hashes:

// in nettle-meta.h
extern const struct nettle_hash * const nettle_hashes[];

and defines that array as

// nettle-meta-hashes.c
const struct nettle_hash * const nettle_hashes[] = {
&nettle_md2,
&nettle_md4,
&nettle_md5,
&nettle_ripemd160,
&nettle_sha1,
&nettle_sha224,
&nettle_sha256,
&nettle_sha384,
&nettle_sha512,
NULL
};

Quiz question!

Will ABI change if we add or remove a few entries in array? (like this patch)

Would you expect existing binaries to start crashing after library upgrade on your system?

TL;DR: yes, things will break.

Tiny trigger

To understand how exactly things break let’s dive into simpler example of a library exporting only constant strings and nothing else.

Public library interface:

// l.h:
extern const char s1[];
extern const char s2[];

Full library implementation:

// l.c:
#include "l.h"
#ifdef V1
const char s1[] = "v1 s1";
const char s2[] = "v1 s2";
#endif
#ifdef V2
const char s1[] = "v2 s1 <V2 addition>";
const char s2[] = "v2 s2 <V2 addition>";
#endif

And library user:

// exe.c:
#include <stdio.h>
#include "l.h"
int main() {
printf ("s1='%s'\n", s1);
printf ("s2='%s'\n", s2);
return 0;
}

Here we just print array values from executable. Nothing fancy.

Now let’s try to do the following sequence of actions:

  1. build a library in V1 mode (shorter strings)
  2. build an executable against V1 library
  3. run executable (linked against V1) against V1 library
  4. build a library in V2 mode (longer strings)
  5. run executable (linked against V1) against V2 library
  6. build an executable against V2
  7. run executable (linked against V2) against V2 library

Doing 1-3 steps:

$ gcc -O2 -DV1 -shared -fPIC l.c -o libl.so
$ gcc -O2 exe.c -o exe -L. -ll '-Wl,-rpath=$ORIGIN'
$ echo 'Runnig exe/V1'
Runnig exe/V1
$ ./exe
s1='v1 s1'
s2='v1 s2'
$ cp exe exe-v1

No surprises here. Let’s update library to V2 (steps 4-5):

$ gcc -O2 -DV2 -shared -fPIC l.c -o libl.so
$ echo 'Runnig exe/V2'
Runnig exe/V2
$ ./exe
./exe: Symbol `s2' has different size in shared object, consider re-linking
./exe: Symbol `s1' has different size in shared object, consider re-linking
s1='v2 s1 '
s2='v2 s2 v2 s1 '

Hah! Data corruption! glibc’s runtime dynamic linker even hints us to relink an executable. Let’s do that (steps 6-7):

$ gcc -O2 exe.c -o exe -L. -ll '-Wl,-rpath=$ORIGIN'
$ echo 'Runnig exe/V2 (relinked)'
Runnig exe/V2 (relinked)
$ ./exe
s1='v2 s1 <V2 addition>'
s2='v2 s2 <V2 addition>'
$ cp exe exe-v2

Recovered.

The clues

So how could executable change when linked against V1 and V2 versions? The easiest way to see it is to dump all the ELF information we have:

$ readelf -a exe-v1 > v1
$ readelf -a exe-v2 > v2
$ diff -u v1 v2
--- v1 2016-12-03 14:39:09.475769368 +0000
+++ v2 2016-12-03 14:39:11.510768031 +0000
@@ -1,3 +1,3 @@
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[24] .bss NOBITS 0000000000601030 00001030
- 0000000000000010 0000000000000000 WA 0 0 1
+ 0000000000000038 0000000000000000 WA 0 0 16
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
...
LOAD 0x0000000000000de8 0x0000000000600de8 0x0000000000600de8
- 0x0000000000000248 0x0000000000000258 RW 200000
+ 0x0000000000000248 0x0000000000000280 RW 200000
...
Relocation section '.rela.dyn' at offset 0x498 contains 4 entries:
Offset Info Type Sym. Value Sym. Name + Addend
...
000000601030 000900000005 R_X86_64_COPY 0000000000601030 s2 + 0
-000000601036 000600000005 R_X86_64_COPY 0000000000601036 s1 + 0
+000000601050 000600000005 R_X86_64_COPY 0000000000601050 s1 + 0
...
Symbol table '.dynsym' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
...
- 6: 0000000000601036 6 OBJECT GLOBAL DEFAULT 24 s1
+ 6: 0000000000601050 20 OBJECT GLOBAL DEFAULT 24 s1
...
- 9: 0000000000601030 6 OBJECT GLOBAL DEFAULT 24 s2
+ 9: 0000000000601030 20 OBJECT GLOBAL DEFAULT 24 s2
...
Symbol table '.symtab' contains 60 entries:
Num: Value Size Type Bind Vis Ndx Name
...
- 43: 0000000000601030 6 OBJECT GLOBAL DEFAULT 24 s2
+ 43: 0000000000601030 20 OBJECT GLOBAL DEFAULT 24 s2
...
- 54: 0000000000601036 6 OBJECT GLOBAL DEFAULT 24 s1
+ 54: 0000000000601050 20 OBJECT GLOBAL DEFAULT 24 s1

We see here a lot of interesting facts:

It means array contents is copied from library .data section to an executable .bss section at each execution startup time.

Why does it behave like that?

But why copy? Arrays might be huge in size and copying them would take a while. Why not just map the library and use it’s symbols?

For that we need to understand what drives the process of binary generation.

All starts from exe.c file being converted to the assembly form. Let’s look at it:

; gcc -O2 exe.c -S -o exe.S
.file "exe.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "s1='%s'\n"
.LC1:
.string "s2='%s'\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB23:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $s1, %edx
movl $.LC0, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk
movl $s2, %edx
movl $.LC1, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE23:
.size main, .-main
.ident "GCC: (Gentoo 6.2.0-r1 p1.1) 6.2.0"
.section .note.GNU-stack,"",@progbits

The relevant piece of code here is how s1 gets propagated to printf call:

;
.file "exe.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "s1='%s'\n"
...
main:
...
movl $s1, %edx
movl $.LC0, %esi
movl $1, %edi
xorl %eax, %eax
call __printf_chk

$s1 is an absolute address to s1 symbol. It is not known at exe link time as it’s storage is in external library. There is no indirection used.

One way of adjusting this address is to use a relocation in code segment (also known at TEXTREL). But such relocations are unwelcome in linux systems. They have a few disadvantages:

s1 and s2 object size is known at link time: ld accepts both exe.c and libl.so files to resolve all used symbols in final exe. Thus linker decides to provide storage for such data in exe’s own writable .bss section and generates special COPY relocations as if external data would be local to exe.

When we update libl.so with new s1 object size exe still contains COPY relocation of symbol s1 of the old size. This leads to partial symbol copying at exe startup.

In case of nettle that means NULL-terminated array will be copied only partially (missing 4 last elements including NULL) which causes occasional SIGSEGVs.

A fun workaround

This absolute relocation problem is well known when writing shared libraries. Compiler has a special position independent mode (-fPIC) that generates non-absolute access to each symbol in the library.

We can workaround the problem by building exe.c with -fPIC:

$ gcc -O2 -DV1 -shared -fPIC l.c -o libl.so
$ gcc -O2 -fPIC exe.c -o exe -L. -ll '-Wl,-rpath=$ORIGIN'
$ echo 'Runnig exe/V1'
Runnig exe/V1
$ ./exe
s1='v1 s1'
s2='v1 s2'
$ gcc -O2 -DV2 -shared -fPIC l.c -o libl.so
$ echo 'Runnig exe/V2'
Runnig exe/V2
$ ./exe
s1='v2 s1 <V2 addition>'
s2='v2 s2 <V2 addition>'

It just works. Let’s look at the changes in generated code for exe.c:

; gcc -fPIC -O2 exe.c -S -o exe-fPIC.S
.file "exe.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "s1='%s'\n"
.LC1:
.string "s2='%s'\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB23:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movq s1@GOTPCREL(%rip), %rdx
leaq .LC0(%rip), %rsi
movl $1, %edi
xorl %eax, %eax
call __printf_chk@PLT
movq s2@GOTPCREL(%rip), %rdx
leaq .LC1(%rip), %rsi
movl $1, %edi
xorl %eax, %eax
call __printf_chk@PLT
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE23:
.size main, .-main
.ident "GCC: (Gentoo 6.2.0-r1 p1.1) 6.2.0"
.section .note.GNU-stack,"",@progbits

Or in a diff form:

--- exe.S 2016-12-03 17:51:28.229898505 +0000
+++ exe-fPIC.S 2016-12-03 18:09:38.341060805 +0000
@@ -16,2 +16,2 @@
- movl $s1, %edx
- movl $.LC0, %esi
+ movq s1@GOTPCREL(%rip), %rdx
+ leaq .LC0(%rip), %rsi
@@ -20,3 +20,3 @@
- call __printf_chk
- movl $s2, %edx
- movl $.LC1, %esi
+ call __printf_chk@PLT
+ movq s2@GOTPCREL(%rip), %rdx
+ leaq .LC1(%rip), %rsi
@@ -25 +25 @@
- call __printf_chk
+ call __printf_chk@PLT

Access to s1 is now done via separate global offset table (aka .got). This way we get another layer of indirection (memory dereference) and get our s1 contents without copies.

A few takeaways

Have fun!

Posted on December 3, 2016
comments powered by Disqus