fixing ghc on sparc ia64 and friends

September 13, 2014

The ghc-7.8.1 release was the first release where the ghc binary (and tools like ghc-pkg, hpc, hsc2hs, etc.) was dynamically linked against the haskell libraries shipped with ghc.

At that time:

ia64 (unreported, not easy to build :]),
sparc (#8857),
ppc (#8024)

all crashed when trying to use dynamic linking in ghc-7.8.

Having upstreamed the simplest build fixes for unreg arches I foolishly hoped all other architectures would Just Work, but the fun had only just started.

To know all the gory details read on! :]

amd64 bugs:

Having been bitten by #8748 before trying real exotics like sparc, I first decided to check how good (or bad) the UNREG amd64 build was.

To get a feeling for the state of a platform you only need to build ghc and run the regression tests:

./configure --enable-unregisterised
make
make fulltest # run regtest

About 200 tests out of 4000 were broken (which is quite nice, but not ideal). The first thing that caught my eye was broken arithmetic in rare cases like integerConversions.

Tests gave incorrect results for code like

Prelude> 4 - 2^64 == 0
True

Thinking of it as The Root Cause of all the SIGSEGVs I had ever seen, I started trimming it down to a minimal .cmm snippet.

And got this:

CInt a = -1;
return (a == -1)
# returns False

At that stage exploring the generated C code is trivial. It gave:

StgWord64 a = (StgWord32)0xFFFFffffFFFFffffu;
return (a == 0xFFFFffffFFFFffffu)

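The bug here is in the pretty-printing of a 32-bit constant: the 32-bit -1 is printed as a 64-bit literal, the cast truncates it on the left side while the right side stays 64 bits wide, so the comparison fails. It was easy to fix with Reid’s help.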
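To see why the generated comparison fails, here is a minimal sketch in plain C. It assumes uint32_t / uint64_t as stand-ins for StgWord32 / StgWord64, and both function names are made up for illustration. The second function is only one possible way to keep both sides at the same width, not necessarily what the actual ghc fix does.

```c
#include <stdint.h>

/* Same shape as the generated code: a 32-bit -1 printed as a 64-bit literal.
   uint32_t / uint64_t stand in for StgWord32 / StgWord64 (assumption). */
int buggy_is_minus_one(void)
{
    uint64_t a = (uint32_t)0xFFFFffffFFFFffffu; /* cast truncates to 0x00000000FFFFFFFF */
    return a == 0xFFFFffffFFFFffffu;            /* right side is still 64 bits wide */
}

/* Illustrative variant: print the literal at its real width (32 bits)
   on both sides of the comparison. */
int fixed_is_minus_one(void)
{
    uint32_t a = (uint32_t)0xFFFFffffu;
    return a == (uint32_t)0xFFFFffffu;
}
```

Calling buggy_is_minus_one() reproduces the "returns False" behaviour from the .cmm snippet, while fixed_is_minus_one() compares equal.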

sparc bug:

But the integer-literals fix didn’t help dynamic binaries on sparc, so I moved on to explore on that box.

The immediate symptom on sparc was a SIGSEGV somewhere before the program’s main() entry point. And that was very weird.

I expected the C part of the haskell runtime to run first and crash later. My plan had been to break on main() and step by step get a basic understanding of what was going on.

Thus I tried to update binutils, gcc and glibc first in the hope it was some toolchain bug.

No luck, the bug persisted in exactly the same form.

A backtrace of the core dump suggested the crash was happening somewhere in the foreignExportStablePtr function, which gets called at binary (or library) load time, before your main() entry point. The __attribute__((constructor)) annotation makes the magic happen.

For such haskell code

module M where
import Foreign.C.Types
foreign export ccall f :: CInt -> CInt
f :: CInt -> CInt
f n = n + 1

ghc basically generates the following stub:

extern StgClosure M_zdfstableZZC0ZZCmainZZCMZZCf_closure;
HsInt32 f(HsInt32 a1)
{
    Capability *cap;
    HaskellObj ret;
    HsInt32 cret;
    cap = rts_lock();
    rts_evalIO(&cap,rts_apply(cap,(HaskellObj)runNonIO_closure,rts_apply(cap,&M_zdfstableZZC0ZZCmainZZCMZZCf_closure,rts_mkInt32(cap,a1))) ,&ret);
    rts_checkSchedStatus("f",cap);
    cret=rts_getInt32(ret);
    rts_unlock(cap);
    return cret;
}
/* our static constructor */
static void stginit_export_M_zdfstableZZC0ZZCmainZZCMZZCf() __attribute__((constructor));
static void stginit_export_M_zdfstableZZC0ZZCmainZZCMZZCf()
{foreignExportStablePtr((StgPtr) &M_zdfstableZZC0ZZCmainZZCMZZCf_closure);}
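The static constructor mechanism can be tried in isolation. The sketch below is a simplified model, not ghc output: register_export() is a hypothetical stand-in for foreignExportStablePtr(), f_closure is a dummy object playing the role of the closure, and __attribute__((constructor)) is a GCC/Clang extension.

```c
/* Counts how many "exports" were registered at load time. */
static int registered_exports = 0;

/* Hypothetical stand-in for the RTS's foreignExportStablePtr(). */
static void register_export(void *closure)
{
    (void)closure;
    registered_exports++;
}

/* Dummy object playing the role of the exported closure. */
static int f_closure;

/* GCC/Clang extension: run this function before main() is entered. */
static void stginit_export_f(void) __attribute__((constructor));
static void stginit_export_f(void)
{
    register_export(&f_closure);
}

/* Number of exports registered so far. */
int exports_registered(void)
{
    return registered_exports;
}
```

By the time main() starts, exports_registered() already returns 1, which is why a crash in such a constructor shows up before any breakpoint on main() is hit.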

sparc solution:

With a smaller program it’s once again simpler to explore the breakage.

RISC CPUs are fun creatures.

To feel all the delight of looking at sparc assembly I propose looking at the generated code for the following C snippet:

unsigned int f(void)
{
    return 0x1234ABCD;
}

i386 easily puts a value into a register:

movl $0x1234ABCD, %eax

while sparc32 has a hard time:

sethi %hi(0x1234A800), %o0
or    %o0, 0x3CD, %o0

You can’t encode a 32-bit immediate in a 32-bit instruction containing a tuple of (opcode, dest-reg, imm). The opcode usually takes 5 bits, the register 5 bits (32 regs) and imm gets only 22 bits. On 2-operand instructions (opcode, src-reg, imm) we get even less.

Thus sparc32 had to add a special instruction (sethi) to its ISA that sets the high bits of a given reg (o0 in our case).
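The split into a 22-bit high part and a 10-bit low part can be checked with a few lines of C. This is only an arithmetic model of the sethi / or pair shown above; the helper names hi22, lo10 and load_const32 are made up for illustration.

```c
#include <stdint.h>

/* Upper 22 bits of x: the value sethi places into bits 31..10. */
uint32_t hi22(uint32_t x)
{
    return x >> 10;
}

/* Lower 10 bits of x: the immediate that 'or' fills in afterwards. */
uint32_t lo10(uint32_t x)
{
    return x & 0x3FF;
}

/* Models: sethi %hi(x), reg ; or reg, %lo(x), reg */
uint32_t load_const32(uint32_t x)
{
    uint32_t reg = hi22(x) << 10; /* sethi: set upper 22 bits, clear low 10 */
    reg |= lo10(x);               /* or: fill in the low 10 bits */
    return reg;
}
```

For 0x1234ABCD the high part shifted back into place is 0x1234A800 and the low part is 0x3CD, matching the operands in the sparc32 listing.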

Things are even worse for sparc64 and ppc64 where instructions are 32-bit, but registers are 64-bit wide.

x86_64 easily puts a 64-bit value into a register:

movabs $0x1234ABCD5678DCBA, %rax

while sparc64 does something completely awful:

sethi   %hi(0x1234a800), %o0
sethi   %hi(0x5678dc00), %g1
or      %o0, 0x3CD, %o0
or      %g1, 0xBA, %g1
sllx    %o0, 32, %o0
add     %o0, %g1, %o0

The above is important because sometimes those constants are not known at compile (and assembly) time, but known only at link time.
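To double-check the sparc64 sequence, it can be replayed step by step in C. This is a plain arithmetic model with one statement per instruction; the variable names mirror the registers and the function name load_const64 is made up.

```c
#include <stdint.h>

/* Replays the six sparc64 instructions that build 0x1234ABCD5678DCBA. */
uint64_t load_const64(void)
{
    uint64_t o0 = 0x1234a800u;  /* sethi %hi(0x1234a800), %o0 */
    uint64_t g1 = 0x5678dc00u;  /* sethi %hi(0x5678dc00), %g1 */
    o0 |= 0x3CD;                /* or    %o0, 0x3CD, %o0      */
    g1 |= 0xBA;                 /* or    %g1, 0xBA, %g1       */
    o0 <<= 32;                  /* sllx  %o0, 32, %o0         */
    return o0 + g1;             /* add   %o0, %g1, %o0        */
}
```

Every immediate in this sequence is a separate field that the assembler or linker has to fill in, which is where relocation types come into play.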

ghc’s driver pipeline

What ghc basically does when compiling a .hs file on a via-C (aka UNREG) arch is the following:

1. .hs -> … -> .hc file (haskell-to-C pass)
2. .hc -> .s (C-to-asm pass)
3. .s -> .s (asm-to-asm mangling pass, no-op in UNREG mode)
4. .s -> .o (asm-to-object pass)

The bug was introduced into the pipeline when the -dynamic way was added to ghc (to build dynamic haskell libraries or position-independent executables).

On many architectures libraries require a position independent code layout (so called PIC). It’s controlled by the -fpic / -fPIC set of gcc (and ghc) flags.

ghc passed the -fPIC option to steps 1 and 2 (.hs -> .hc and .hc -> .s), but not to step 4 (.s -> .o)(!), where the assembler needs to generate either absolute or relative relocation types for the following asm snippet:

; load GOT address into %l7
sethi   %hi(_GLOBAL_OFFSET_TABLE_-8), %l7
add     %l7, %lo(_GLOBAL_OFFSET_TABLE_-4), %l7

The bug was easily fixed once identified. Now shared haskell libs work on sparc!

a bit more on sparc’s relocations

Sometimes there are many ways to generate PIC code even for a single given arch. On sparc for example there are at least two:

-fpic option (13-bit relocations: the GOT offset must fit the immediate field of 'ld')
-fPIC (32-bit relocations)

If you are curious:

extern int g_i;
int * f(void)
{
    return &g_i;
}

Generates the following assembly for -fpic:

f:
    save    %sp, -96, %sp
    sethi   %hi(_GLOBAL_OFFSET_TABLE_-8), %l7
    add     %l7, %lo(_GLOBAL_OFFSET_TABLE_-4), %l7
    call    __sparc_get_pc_thunk.l7
     nop
    ld      [%l7+g_i], %g1
    mov     %g1, %i0
    restore
    jmp     %o7+8
     nop

and for -fPIC:

f:
    save    %sp, -96, %sp
    sethi   %hi(_GLOBAL_OFFSET_TABLE_-8), %l7
    add     %l7, %lo(_GLOBAL_OFFSET_TABLE_-4), %l7
    call    __sparc_get_pc_thunk.l7
     nop
    sethi   %gdop_hix22(g_i), %g1
    xor     %g1, %gdop_lox10(g_i), %g1
    ld      [%l7 + %g1], %g1, %gdop(g_i)
    mov     %g1, %i0
    restore
    jmp     %o7+8
     nop

The difference here is in how the GOT is accessed:

; -fpic: immediate offset in 'ld'
ld      [%l7+g_i], %g1
;
; -fPIC: loading full 32-bit offset into %g1 register
sethi   %gdop_hix22(g_i), %g1
xor     %g1, %gdop_lox10(g_i), %g1
ld      [%l7 + %g1], %g1, %gdop(g_i)

ia64 and integer-gmp

Having dealt with sparc I decided to have a look at ia64 once again (I hadn’t touched it much after ghc-7.4).

ghc on ia64 has many problems. One of them is known as gprel addressing overflow. You just can’t link a static ghc-7.6 binary on ia64.

Thus one of the ways to fix it is to get dynamic linking working (and a working ghci as a side effect).

Luckily, the sparc fix was enough to unbreak it!

Another long-standing problem was the non-working integer-gmp library (any call to that library resulted in SIGSEGV). The workaround was to use the pure-haskell integer-simple library.

It broke around the ghc-7.0 release and had not been touched since.

After some painful debugging (linking libghc takes 2 hours on ia64, building from scratch takes 10 hours on a 4-core box) I found the cause: the C codegen generates data-like prototypes for function-like objects.

Consider the following snippet:

// a.c
void f(void) {}
// b.c
extern void f(void);
void * g (void)
{
    return (void*)&f;
}
// c.c
extern int f;
void * g (void)
{
    return (void*)&f;
}

For most popular arches you would expect the same code to be generated for the b.c and c.c files. But that’s not the case for ia64. Function pointers there are not pointers to code, but pointers to a structure called a function descriptor (a structure of 2 “pointers”: a pointer to code and a new gp value).
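A function descriptor can be modelled in portable C to show why a function pointer on ia64 needs one more dereference than a data pointer. This is only a sketch: struct fdesc, current_gp and the helper functions are made-up names, and the real descriptor is laid out by the ia64 ABI and the linker, not by user code.

```c
#include <stdint.h>

typedef int (*code_ptr)(int);

/* Model of a function descriptor: two "pointers". */
struct fdesc {
    code_ptr  entry; /* address of the function's code */
    uintptr_t gp;    /* gp value the callee expects */
};

/* Models the gp register (r1). */
static uintptr_t current_gp;

/* A callee that depends on gp being set up by its caller. */
static int add_gp(int x)
{
    return x + (int)current_gp;
}

/* Calling through a descriptor: load the new gp, then jump to the code. */
int call_via_descriptor(const struct fdesc *fd, int arg)
{
    current_gp = fd->gp;
    return fd->entry(arg);
}
```

With a descriptor such as { add_gp, 40 }, call_via_descriptor(&fd, 2) yields 42. Treating the address of the code itself as if it were a descriptor (the mix-up between b.c and c.c) would instead read the entry address and gp from whatever bytes happen to be there.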

As usual, once spotted the problem was easy to fix.

You would not normally write code like c.c, but in this case it was a stupid bug.

I have an idea to clean up the C codegen to a state where gcc’s LTO will be able to build ghc on amd64 and will be able to find such kinds of bugs at compile time.

For the curious, here is how the assembly looks for b.c on ia64:

g:
    .mmi
    nop 0
    addl r8 = @ltoff(@fptr(f#)), gp
    nop 0
    ;;
    .mib
    ld8 r8 = [r8]
    nop 0
    br.ret.sptk.many rp

and for c.c:

g:
    .mmi
    nop 0
    addl r8 = @ltoffx(f#), r1
    nop 0
    ;;
    .mib
    ld8.mov r8 = [r8], f#
    nop 0
    br.ret.sptk.many rp

Here f# and @fptr(f#) are different objects with different access rules. @fptr for example needs one more dereference to be called/compared/whatever.

how I’ve spent an august

What works better now in ghc-HEAD

64-bit UNREG arches make fewer mistakes in integer operations
shared libraries (and ghci) now do work at least on sparc and ia64 (I hope on ppc as well)
integer-gmp now works on ia64

Have fun!