How do shared library collisions break?


Shared libraries are fun. The concept is simple in theory: we move a piece of code out of the main application into a separate (dynamically loadable) binary and nothing changes.

In practice we get two moving parts where each could be updated separately. People still don’t agree whether shared libraries are a net win or a net loss as a concept :)

When you maintain both the application and the library as a single code base the difference does not really matter. Things get interesting when the library starts its own life as a separate project and gets its own dependencies over time.

What if we want to use two different versions of the same library project within a single application? Say, use gtk-2 and gtk-3 together or ffmpeg-4 and ffmpeg-5.

You might even do it by accident by including two dependencies that rely on different ffmpeg versions.

Is it a safe combination? Can we just link against both versions of a library and be done with it? Let’s try!

toy example

We’ll need an executable and two libraries to play with. The library API provides a single function to tell us its name:

// $ cat lib.h
const char * lib_name (void);

Library sources implement the API by returning a pointer to the source file name:

// $ cat lib1.c
#include "lib.h"

const char * lib_name (void) {  return __FILE__; }

// $ cat lib2.c
#include "lib.h"

const char * lib_name (void) {  return __FILE__; }

Main program:

// $ cat main.c
#include <stdio.h>
#include <stdlib.h>

#include <unistd.h>

#include "lib.h"

int main() {
    /* important part: */
    fprintf (stderr, "lib_name() = %s\n", lib_name());

    /* library loading introspection: */
    fprintf (stderr, "My address space:\n");
    char cmd[1000];
    /* search for code segments (should be at most one per loaded ELF) */
    snprintf(cmd, sizeof (cmd), "grep 'r-x' /proc/%u/maps", (unsigned)getpid());
    /* run the command to dump our own executable mappings */
    system(cmd);
    return 0;
}

The important bit here is fprintf (stderr, "lib_name() = %s\n", lib_name());. The rest is convenience debugging to see what libraries are loaded into the address space.

What happens if we link main.c dynamically against both lib1.c and lib2.c together as external shared libraries?

Would the linker complain? Would it pick the first library? Or maybe the second one? It depends!

Let’s build shared libraries the simplest way possible and link our program against them:

$ mkdir -p l1 l2


$ gcc -fPIC -shared lib1.c -o l1/
$ gcc -fPIC -shared lib2.c -o l2/

$ gcc main.c -o main1 -Ll1 -Ll2 -ll1 -ll2 -Wl,-rpath,'$ORIGIN/l1' -Wl,-rpath,'$ORIGIN/l2'
$ gcc main.c -o main2 -Ll1 -Ll2 -ll2 -ll1 -Wl,-rpath,'$ORIGIN/l1' -Wl,-rpath,'$ORIGIN/l2'

Quiz question: what would these ./main1 and ./main2 programs print when executed?

Now let’s compare the results:

$ ./main1 | unnix
lib_name() = lib1.c
My address space:
00401000-00402000 r-xp 00001000 00:1b 1404344872                         /home/slyfox/dev/c/shared-libs/main1
7f994d9e2000-7f994db4c000 r-xp 00028000 00:1b 1350927183                 /<<NIX>>/glibc-2.34-210/lib/
7f994dbb9000-7f994dbba000 r-xp 00001000 00:1b 1404344871                 /home/slyfox/dev/c/shared-libs/l2/
7f994dbbe000-7f994dbbf000 r-xp 00001000 00:1b 1404344870                 /home/slyfox/dev/c/shared-libs/l1/
7f994dbc5000-7f994dbea000 r-xp 00001000 00:1b 1350927176                 /<<NIX>>/glibc-2.34-210/lib/
7fff98b52000-7fff98b54000 r-xp 00000000 00:00 0                          [vdso]

$ ./main2 | unnix
lib_name() = lib2.c
My address space:
00401000-00402000 r-xp 00001000 00:1b 1404344873                         /home/slyfox/dev/c/shared-libs/main2
7f95c8773000-7f95c88dd000 r-xp 00028000 00:1b 1350927183                 /<<NIX>>/glibc-2.34-210/lib/
7f95c894a000-7f95c894b000 r-xp 00001000 00:1b 1404344870                 /home/slyfox/dev/c/shared-libs/l1/
7f95c894f000-7f95c8950000 r-xp 00001000 00:1b 1404344871                 /home/slyfox/dev/c/shared-libs/l2/
7f95c8956000-7f95c897b000 r-xp 00001000 00:1b 1350927176                 /<<NIX>>/glibc-2.34-210/lib/
7ffdbb255000-7ffdbb257000 r-xp 00000000 00:00 0                          [vdso]

Note: lib_name() returns two very different results. And that is for a program that is linked against the same set of libraries and headers in both cases!

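A few more observations from the output above: both libraries are loaded into the address space in both runs, and lib_name() comes from whichever library was listed first on the linker command line.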

Now let’s pretend that the two libraries have no material difference and implement identical ABI and semantics. On ELF platforms ABI and semantics are usually reflected by a DT_SONAME tag attached to a library. We can assign a SONAME to the built library with the -Wl,-soname,… flag. Let’s assign an identical SONAME to both libraries (I also had to create symlinks matching the SONAME):

$ mkdir -p l1 l2

# same SONAME

$ gcc -fPIC -shared lib1.c -o l1/ -Wl,-soname,
$ ln -s l1/
$ gcc -fPIC -shared lib2.c -o l2/ -Wl,-soname,
$ ln -s l2/

$ gcc main.c -o main1 -Ll1 -Ll2 -ll1 -ll2 -Wl,-rpath,'$ORIGIN/l1' -Wl,-rpath,'$ORIGIN/l2'
$ gcc main.c -o main2 -Ll1 -Ll2 -ll1 -ll2 -Wl,-rpath,'$ORIGIN/l2' -Wl,-rpath,'$ORIGIN/l1'

Quiz question: how would these results differ compared to the previous run?

Comparing the results again:

$ ./main1 | unnix
lib_name() = lib1.c
My address space:
00401000-00402000 r-xp 00001000 00:1b 1404345186                         /home/slyfox/dev/c/shared-libs/main1
7f3bf33c3000-7f3bf352d000 r-xp 00028000 00:1b 1350927183                 /<<NIX>>/glibc-2.34-210/lib/
7f3bf359a000-7f3bf359b000 r-xp 00001000 00:1b 1404345184                 /home/slyfox/dev/c/shared-libs/l1/
7f3bf35a1000-7f3bf35c6000 r-xp 00001000 00:1b 1350927176                 /<<NIX>>/glibc-2.34-210/lib/
7ffcb5934000-7ffcb5936000 r-xp 00000000 00:00 0                          [vdso]

$ ./main2 | unnix
lib_name() = lib2.c
My address space:
00401000-00402000 r-xp 00001000 00:1b 1404345187                         /home/slyfox/dev/c/shared-libs/main2
7f2c1a48d000-7f2c1a5f7000 r-xp 00028000 00:1b 1350927183                 /<<NIX>>/glibc-2.34-210/lib/
7f2c1a664000-7f2c1a665000 r-xp 00001000 00:1b 1404345185                 /home/slyfox/dev/c/shared-libs/l2/
7f2c1a66b000-7f2c1a690000 r-xp 00001000 00:1b 1350927176                 /<<NIX>>/glibc-2.34-210/lib/
7ffd7c1ba000-7ffd7c1bc000 r-xp 00000000 00:00 0                          [vdso]


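Note that only one of the two libraries is loaded this time: both share a SONAME, so the loader picks the first match in the -rpath search order.

Library order matters materially only if a symbol is present in multiple shared libraries (a symbol collision is present). Otherwise you don’t have to worry about it.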
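To make the "first one wins" rule more tangible, here is a toy model of symbol lookup. It is only a sketch: the real dynamic loader walks ELF symbol tables of loaded objects, while the `toy_lib` structure and the `toy_resolve()` / `toy_providers()` helpers below are made up for illustration and just scan arrays of strings in order.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SYMBOLS 4

/* A toy stand-in for a loaded shared library (hypothetical type). */
struct toy_lib {
    const char *name;
    const char *symbols[TOY_MAX_SYMBOLS]; /* unused slots are NULL */
};

/* Return the name of the first library in 'scope' that exports 'symbol'. */
const char *toy_resolve(const struct toy_lib *scope, size_t n_libs,
                        const char *symbol) {
    for (size_t i = 0; i < n_libs; i++)
        for (size_t j = 0; j < TOY_MAX_SYMBOLS && scope[i].symbols[j] != NULL; j++)
            if (strcmp(scope[i].symbols[j], symbol) == 0)
                return scope[i].name;
    return NULL; /* nobody exports it */
}

/* Count libraries exporting 'symbol'; more than one means a collision. */
size_t toy_providers(const struct toy_lib *scope, size_t n_libs,
                     const char *symbol) {
    size_t providers = 0;
    for (size_t i = 0; i < n_libs; i++)
        for (size_t j = 0; j < TOY_MAX_SYMBOLS && scope[i].symbols[j] != NULL; j++)
            if (strcmp(scope[i].symbols[j], symbol) == 0) {
                providers++;
                break; /* count each library at most once */
            }
    return providers;
}
```

With two toy libraries that both export lib_name, swapping their order in the array swaps the winner, just like main1 and main2 did. A symbol exported by only one of them resolves the same way in either order.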

Another important assumption here is that lib.h is identical for both libraries. That is not always the case in more complex scenarios: ffmpeg and gtk certainly change their API and data structures across major releases (or even in different build configurations for the same library release).

diamond dependency trees

Is it a frequent problem to get a mix of libraries like that in a single process? Or is it a purely hypothetical problem not worth worrying about?

Let’s pick the gdb executable (command line debugger) as an example.

Quiz question: how many libraries does gdb use as dependencies? Should it be just libc? Maybe ncurses as well?

ELF files have DT_NEEDED entries in .dynamic section. Those list all immediate shared library dependencies. We can dump DT_NEEDED entries with tools like objdump, readelf, scanelf, patchelf and many others. I’ll use patchelf:

$ patchelf --print-needed `which gdb` | nl

18 immediate libraries! Some of them have their own dependencies. We can dump the whole tree with lddtree:

$ lddtree `which gdb` | unnix | nl
     1  gdb => /run/current-system/sw/bin/gdb (interpreter => /<<NIX>>/glibc-2.34-210/lib/
     2 => /<<NIX>>/readline-8.1p2/lib/
     3 => /<<NIX>>/zlib-1.2.12/lib/
     4 => /<<NIX>>/ncurses-6.3-p20220507/lib/
     5 => /<<NIX>>/python3-3.9.13/lib/
     6 => /<<NIX>>/glibc-2.34-210/lib/
     7 => /<<NIX>>/glibc-2.34-210/lib/
     8 => /<<NIX>>/glibc-2.34-210/lib/
     9 => /<<NIX>>/expat-2.4.8/lib/
    10 => /<<NIX>>/libipt-2.0.4/lib/
    11 => /<<NIX>>/mpfr-4.1.0/lib/
    12 => /<<NIX>>/gmp-with-cxx-6.2.1/lib/
    13 => /<<NIX>>/source-highlight-3.1.9/lib/
    14 => /<<NIX>>/boost-1.77.0/lib/
    15 => /<<NIX>>/glibc-2.34-210/lib/
    16 => /<<NIX>>/icu4c-71.1/lib/
    17 => /<<NIX>>/icu4c-71.1/lib/
    18 => /<<NIX>>/icu4c-71.1/lib/
    19 => /<<NIX>>/gcc-11.3.0-lib/lib/
    20 => /<<NIX>>/glibc-2.34-210/lib/
    21 => /<<NIX>>/glibc-2.34-210/lib/
    22 => /<<NIX>>/glibc-2.34-210/lib/
    23 => /<<NIX>>/glibc-2.34-210/lib/

Just 4 more libraries, added by boost internals.

From lddtree output it might look like it’s a rare occasion when shared libraries have their own dependencies. That is misleading: lddtree hides already printed entries by default.

Quiz question: guess how many dependencies gdb has if we consider all the duplicates.

We’ll use lddtree -a option to answer that question:

$ lddtree -a `which gdb` | unnix | nl
     1  gdb => /run/current-system/sw/bin/gdb (interpreter => /<<NIX>>/glibc-2.34-210/lib/
     2 => /<<NIX>>/readline-8.1p2/lib/
     3 => /<<NIX>>/ncurses-6.3-p20220507/lib/
     4     => /<<NIX>>/glibc-2.34-210/lib/
     5         => /<<NIX>>/glibc-2.34-210/lib/
     6 => /<<NIX>>/glibc-2.34-210/lib/
     7     => /<<NIX>>/glibc-2.34-210/lib/
     8 => /<<NIX>>/zlib-1.2.12/lib/
     9 => /<<NIX>>/glibc-2.34-210/lib/
    10     => /<<NIX>>/glibc-2.34-210/lib/
    11 => /<<NIX>>/ncurses-6.3-p20220507/lib/
    12 => /<<NIX>>/glibc-2.34-210/lib/
    13     => /<<NIX>>/glibc-2.34-210/lib/
    14 => /<<NIX>>/python3-3.9.13/lib/
    15 => /<<NIX>>/glibc-2.34-210/lib/
    16     => not found
    17 => /<<NIX>>/glibc-2.34-210/lib/
    18     => not found
   263 => /<<NIX>>/glibc-2.34-210/lib/
   264 => /<<NIX>>/glibc-2.34-210/lib/
   265 => /<<NIX>>/glibc-2.34-210/lib/

265! It’s more than 10x compared to the output without duplicates. A thing to note here is that glibc is a frequent guest. The 265 number is also inflated as many subtrees repeat multiple times.

If we use something more heavyweight like the i3 window manager we’ll get an even bigger dependency tree:

$ lddtree `which i3` | wc -l
$ lddtree -a `which i3` | wc -l

Let’s draw gdb dependencies as a graph. I find the result less scary:

OK, it’s still unreadable.

Let’s remove all the glibc and gcc dependencies. They are present almost everywhere and clutter our graph. Here is the result of graph with noise removed:

Now it should be more obvious what gdb usually uses.

Diamond dependencies are the ones that have more than one input arrow: they cause the dependency graph to be a graph instead of a tree.

Another way to look at it applied to library dependencies: diamond dependencies have more than one path in the graph from dependency root.
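Both definitions are easy to check mechanically. Below is a minimal sketch, assuming the dependency graph is given as a flat list of "A depends on B" arrows with nodes numbered from zero. The `dep_edge` type and both helper names are invented for this example and are not part of any real tool.

```c
#include <stddef.h>

/* One "from depends on to" arrow; nodes are small integers (hypothetical). */
struct dep_edge {
    int from;
    int to;
};

/* Count nodes that have more than one incoming arrow. */
int count_diamonds(const struct dep_edge *edges, size_t n_edges, int n_nodes) {
    int diamonds = 0;
    for (int node = 0; node < n_nodes; node++) {
        int incoming = 0;
        for (size_t i = 0; i < n_edges; i++)
            if (edges[i].to == node)
                incoming++;
        if (incoming > 1)
            diamonds++;
    }
    return diamonds;
}

/* Count distinct paths from 'from' to 'to'. Assumes there are no cycles. */
int count_paths(const struct dep_edge *edges, size_t n_edges, int from, int to) {
    if (from == to)
        return 1;
    int paths = 0;
    for (size_t i = 0; i < n_edges; i++)
        if (edges[i].from == from)
            paths += count_paths(edges, n_edges, edges[i].to, to);
    return paths;
}
```

If we number gdb as 0, readline as 1, ncurses as 2 and zlib as 3, the arrows 0->1, 0->2, 1->2 and 0->3 contain one diamond (ncurses), reachable from gdb via two paths. zlib has a single path and is not a diamond.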

For example ncurses can be reached via two distinct paths: directly from gdb and via readline.

From the toy example above we know that the same library does not get loaded multiple times if the absolute library path is the same.

Problems happen when such a diamond dependency is slightly different in two branches. There are many ways to break this diamond by accident. The most popular one is to have slightly different SONAMEs in two branches.

To make it work at all, the two library versions need to have no colliding symbols or make this mix and match work via other means. Most C libraries don’t handle such coexistence. They assume that everyone can update to the newer version and that the older one would never compete with it.

Example failures

Unfortunately nothing prevents such inconsistent diamonds with a library version mix from appearing. We just did it ourselves in our toy example. How come we do not get into that state all the time? Or maybe we do?

Normally distributions try hard to avoid such a version mix by not providing two versions of a library at any point in time: there are no two glibc versions installed, no two ffmpeg versions present and so on.

But to each rule there is an exception: not all applications have migrated from python2 to python3, some applications are still on gtk-2, most are on gtk-3 and some are already on gtk-4. In these cases you might find all these libraries in your system. Their presence might create false confidence that it’s a safe setup. It is not.

Here are a few examples I saw in the past:

gdb and tinfo/tinfow

This example is based on a case where gdb crashed at start. ncurses provides a few flavours of roughly the same library with slightly different APIs: ncurses (no unicode support) and ncursesw (has unicode support). Sometimes distributions also enable a split-library version of ncurses, which adds separate tinfo and tinfow libraries.

In 2022 you would normally use the ncursesw library everywhere (or ncursesw together with tinfow in distributions with a split setup).

Due to a minor glitch gdb managed to pull in the following library dependency graph:

  $ lddtree /usr/bin/gdb
  /usr/bin/gdb (interpreter => /lib64/ => /lib64/ => /lib64/ => /lib64/ => /lib64/ => /lib64/

See the problem already? Picture form might help a bit:

I see two problems. The w and non-w libraries export the same symbol names. That on its own might work. But ABI assumptions around private data structures in the w and non-w libraries are different. For example the WINDOWLIST structure has a different size and extra fields in the w flavour:

    // Somewhere in ncurses/curses.priv.h
    // ...
    struct _win_list {
        struct _win_list *next;
        SCREEN *screen;
        WINDOW win;
    #if NCURSES_WIDECHAR
        /* present only in the wide-character (w) build */
        char addch_work[(MB_LEN_MAX * 9) + 1];
        unsigned addch_used;
        int addch_x;
        int addch_y;
    #endif
    };

If such a structure is allocated by the non-w library (built without NCURSES_WIDECHAR) and then used by the w library, there will likely be data corruption on an attempt to write to the non-existent tail of the structure: the fields addch_work, addch_used, addch_x and addch_y are not allocated in the non-w variant.
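The size mismatch is easy to reproduce in isolation. The sketch below does not use the real ncurses types: `narrow_list` and `wide_list` are simplified stand-ins I made up, with a plain int in place of the WINDOW payload. Only the four trailing fields are taken from the snippet above.

```c
#include <limits.h> /* MB_LEN_MAX */
#include <stddef.h> /* offsetof, size_t */

/* Hypothetical stand-in for the layout built without NCURSES_WIDECHAR. */
struct narrow_list {
    struct narrow_list *next;
    int win; /* placeholder for the real WINDOW payload */
};

/* Hypothetical stand-in for the layout built with NCURSES_WIDECHAR. */
struct wide_list {
    struct wide_list *next;
    int win;
    char addch_work[(MB_LEN_MAX * 9) + 1];
    unsigned addch_used;
    int addch_x;
    int addch_y;
};

/* Bytes a "wide" user may touch past the end of a "narrow" allocation. */
size_t tail_overrun(void) {
    return sizeof(struct wide_list) - sizeof(struct narrow_list);
}

/* Offset of the first field that does not exist in the narrow layout. */
size_t first_missing_offset(void) {
    return offsetof(struct wide_list, addch_work);
}
```

Any code compiled against the wide layout that writes to addch_y of an object allocated with the narrow layout lands tail_overrun() bytes or so past the allocation. Nothing in the toolchain flags it, because both sides only ever see their own copy of the header.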

The fix was to update gdb to always link to tinfow if ncursesw is present. And to fix readline to link to ncursesw to match the default of the rest of the distribution.

The case of readline is especially worrying: if linking readline against the non-w ncurses was a conscious decision by readline packagers then gdb would have to inspect its dependency first to match its defaults when picking the ncurses flavour at gdb build time. Nobody analyzes transitive dependencies in C land. Everyone assumes that the build environment is consistent and unambiguous: there should be just one ncurses library discoverable via pkg-config (or similar) and that version should be used when building both readline and gdb.

I would say that providing both the w and non-w flavours in the same system is proven to be dangerous. Perhaps providing just one flavour’s SONAMEs would be slightly less prone to accidental linkage of an unintended library.

binutils and multitarget

Another example from Gentoo’s bugzilla:

It starts off very similar to ncurses: Gentoo provides a way to install multiple versions of a library.

Gentoo allows multiple parallel major versions of sys-devel/binutils to be present in the system at the same time. And it allows only one version of sys-libs/binutils-libs. The split is needed because of limitations in the package manager’s library handling. The idea is that sys-devel/binutils libraries will only ever be used by sys-devel/binutils itself: strip, ld and friends will use the private library. External users (like perf or ghc) will never use it and will always pull in the sys-libs/binutils-libs library.

Or the same in pictures:

The sets are seemingly disjoint. It should be fine, right? Wrong.

The problem happens when some build system decides to use an LD_LIBRARY_PATH=/usr/lib override (like the firefox one). It looks cosmetic as /usr/lib is already a default library search path. It should not hurt. But in practice it redirects sys-devel/binutils tools from their private library to the sys-libs/binutils-libs one.

After the redirect the two sets are no longer disjoint at runtime.

Is it a big deal? Shouldn’t these libraries already be identical? They share a SONAME after all.

Unfortunately, no: binutils can be built in a few different incompatible modes that affect library ABI compatibility:

  1. default mode: support only current target and use default file offsets (32-bit offsets on 32-bit systems, 64-bit offsets on 64-bit systems).
  2. 64-bit mode (--enable-64-bit-bfd): support only current target and use 64-bit file offsets
  3. multi-target mode (--enable-targets=all): support multiple target architectures and use 64-bit file offsets.

All these 3 modes produce a library with the same SONAME, but its ABIs differ quite a bit: 64-bit mode switches the public API from typedef unsigned long bfd_vma; to typedef uint64_t bfd_vma;. This changes the size of the global _bfd_std_section array and breaks the ABI, similar to the nettle ABI breakage.

As a result an attempt to force LD_LIBRARY_PATH=/usr/lib on 32-bit systems fails as:

$ LD_LIBRARY_PATH=/usr/lib ld --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ -o z /usr/lib/Scrt1.o
ld: internal error /dev/shm/portage/sys-devel/binutils-2.38/work/binutils-2.38/ld/ldlang.c 6635

The fix (or workaround) was straightforward: change the library name to something that depends on the build configuration.

Mike also suggested another fix: use DT_RPATH ELF tags in binutils binaries instead of DT_RUNPATH to get higher precedence over LD_LIBRARY_PATH.
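The precedence difference between the two tags can be written down as a tiny model. This is a sketch of the lookup order as I understand it from the glibc ld.so(8) manual page, ignoring secure-execution mode and other special cases. The `search_step` enum and the `search_order()` helper are invented for this example.

```c
#include <stddef.h>

/* Places the loader may look at, in the glibc ld.so(8) sense. */
enum search_step {
    STEP_RPATH,           /* DT_RPATH of the binary */
    STEP_LD_LIBRARY_PATH, /* environment override */
    STEP_RUNPATH,         /* DT_RUNPATH of the binary */
    STEP_CACHE,           /* ld.so cache */
    STEP_DEFAULT_DIRS     /* /lib, /usr/lib and friends */
};

/* Fill 'out' (room for 5 entries) with lookup steps in order. Returns count. */
size_t search_order(int has_rpath, int has_runpath, enum search_step out[5]) {
    size_t n = 0;
    /* DT_RPATH is only honoured when DT_RUNPATH is absent */
    if (has_rpath && !has_runpath)
        out[n++] = STEP_RPATH;
    out[n++] = STEP_LD_LIBRARY_PATH;
    if (has_runpath)
        out[n++] = STEP_RUNPATH;
    out[n++] = STEP_CACHE;
    out[n++] = STEP_DEFAULT_DIRS;
    return n;
}
```

A binary with only DT_RPATH gets its private directory searched before LD_LIBRARY_PATH. A binary with DT_RUNPATH gets it searched after, which is exactly how LD_LIBRARY_PATH=/usr/lib managed to win in this bug.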

I think this bug is a good example of why you should try hard to avoid multiple libraries in the system with the same SONAME: a seemingly uncontroversial LD_LIBRARY_PATH can cause so much trouble.

mpfr/mpc version mismatch

Another case is a version mismatch between mpfr and mpc.

In a steady state gcc and its mpc dependency both depend on mpfr. All three are distinct packages in Gentoo and can only be updated one at a time on a live system.

Once mpfr is updated it brings a new library version into the system. On its own that is fine, as gcc and mpc still refer to the old one (which does not get deleted as long as there are referrers to it).

The problem happens when we try to update mpc: we introduce a broken diamond dependency as two versions of mpfr get pulled into gcc.

By luck it did not render gcc broken as gcc was able to recompile itself. Otherwise the user would have to re-download the broken compiler. Or an ad-hoc upgrade tool would have to be written just for this case.

I wonder how other distributions solve this class of lockstep upgrade problems in their build systems.

does nix magically solve diamond dependency problem?

The short answer is: no, it does not fundamentally prevent such relations from happening. It is even more prone to accidentally inconsistent diamonds as it allows you to install multiple versions of the same library in parallel (say, glibc or ncurses) and have both pulled in as dependencies.

The typical example would be an incorrect attempt to enable debugging mode for some popular dependency. Say, ncurses for gdb:

$ nix build --impure --expr 'with import <nixpkgs> {}; gdb.override { ncurses = ncurses.overrideAttrs(oa: { NIX_CFLAGS_COMPILE = "-O0"; }); }'

Looks benign, doesn’t it? We pass a slightly modified unoptimised ncurses dependency to gdb.

Unfortunately gdb’s readline dependency also uses ncurses. And in this case it uses the unmodified version of ncurses. We can see it in the resulting binary:

$ lddtree -a ./result/bin/gdb |& fgrep -B1 ncurses
 => /nix/store/87g044p2zq221fvjzyrqyrkzxxayy1p9-readline-8.1p2/lib/
 => /nix/store/7ji068smnymqz2lg2fd42hjnjd5czbl6-ncurses-6.3-p20220507/lib/
 => /nix/store/fz33c1mfi2krpg1lwzizfw28kj705yg0-glibc-2.34-210/lib/
 => /nix/store/3hwz3archcn9z8y93b2qdnkrgdf7g5jb-ncurses-6.3-p20220507/lib/

To be fair this output is slightly misleading as both ncurses libraries should be loaded via DT_RUNPATH and would probably end up being pulled in from the same location. There would be no double-load. But it’s hard to predict which of the two would win.

To sidestep this kind of problem nixpkgs tries hard to use a single version of a library throughout the tree. As a result the whole system you build will use the same ncurses library. And it does not have to be the same ncurses you used for an older version of your system.

The less incorrect way to achieve the -O0 effect for ncurses would be to override the ncurses attribute itself and let all the packages (up to gdb) use it. One way to do it is via an overlay:

# slightly better
$ nix build --impure --expr 'with import <nixpkgs> { overlays = [(final: prev: { ncurses = prev.ncurses.overrideAttrs(oa: { NIX_CFLAGS_COMPILE = "-O0"; }); })]; }; gdb'
[2/0/33 built, 5 copied (0.8/0.8 MiB), 0.2 MiB DL] building readline-8.1p2 (buildPhase): mv search.o

Note: this command attempts to rebuild 33 packages:

$ nix build --impure --expr 'with import <nixpkgs> { overlays = [(final: prev: { ncurses = prev.ncurses.overrideAttrs(oa: { NIX_CFLAGS_COMPILE = " -O0"; }); })]; }; gdb' --dry-run |& unnix
these 34 derivations will be built:

Now we can verify that all instances are pulled in from a single path:

$ lddtree -a ./result/bin/gdb |& fgrep -B1 ncurses
 => /nix/store/k8p8sj27cgblad8f0zavpzwwyvv5gn0d-readline-8.1p2/lib/
 => /nix/store/3hwz3archcn9z8y93b2qdnkrgdf7g5jb-ncurses-6.3-p20220507/lib/
 => /nix/store/fz33c1mfi2krpg1lwzizfw28kj705yg0-glibc-2.34-210/lib/
 => /nix/store/3hwz3archcn9z8y93b2qdnkrgdf7g5jb-ncurses-6.3-p20220507/lib/

allowed symbol collisions

There are a few cases when it is natural to have symbol collisions:

If you use neither of the above you can still load libraries with clashing symbols. You would have to use dlopen("path/to/", RTLD_NOW | RTLD_LOCAL) / dlsym() to extract symbols under non-ambiguous names. Plugins frequently use this technique to avoid namespace pollution and to simplify plugin unloading.
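Here is a minimal sketch of that technique applied to the toy libraries from the beginning of the post. It assumes a POSIX system with dlopen() available (older glibc versions need -ldl at link time). The `lib_name_from()` helper and the `lib_name_fn` typedef are names I made up for the example.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Signature of lib_name() from the toy lib.h. */
typedef const char *(*lib_name_fn)(void);

/*
 * Load the library at 'path' privately and call its lib_name().
 * Returns NULL if the library or the symbol cannot be found.
 * On success the handle is deliberately left open: the returned
 * string lives inside the library's mapping.
 */
const char *lib_name_from(const char *path) {
    /* RTLD_LOCAL keeps the library's symbols out of the global scope */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return NULL;
    /* look the symbol up in this one library only */
    lib_name_fn fn = (lib_name_fn)dlsym(handle, "lib_name");
    if (fn == NULL) {
        dlclose(handle);
        return NULL;
    }
    return fn();
}
```

Calling lib_name_from() once per library path gives each caller its own lib_name() with no ambiguity, regardless of link order. The price is that every symbol has to be looked up by hand.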

Typical examples of LD_PRELOAD users that rely on runtime symbol overload are:

parting words

Symbol clashes are nasty. They appear most frequently when multiple versions of the same library are loaded into the program over different dependency paths.

Default toolchain support does not help much in catching duplicate symbols. You might have to resort to local hacks to detect such cases. Or you can add a feature to your favorite linker!

Luckily there is a simple rule to follow to avoid it most of the time: try hard not to expose more than one version of a library in your depgraph.

Have fun!

Posted on June 18, 2022 by trofi. Email, pull requests or comments are welcome!