fixing ghc on sparc, ia64 and friends
The ghc-7.8.1 release was the first release in which the ghc binary (and tools like ghc-pkg, hpc, hsc2hs, etc.) were dynamically linked against the haskell libraries shipped with ghc.
At that time, a number of architectures (sparc and ia64 among them) crashed when trying to use dynamic linking in ghc-7.8.
Having upstreamed the simplest build fixes for UNREG arches, I foolishly hoped all other architectures would Just Work, but the fun stuff was only beginning.
For all the gory details, read on! :]
amd64 bugs:
Having been bitten by #8748, before trying real exotics like sparc I first decided to check how good (or bad) the UNREG amd64 build was.
To get a feel for the state of a platform, you only need to build ghc and run the regression tests:
./configure --enable-unregisterised
make
make fulltest # run regtest
About 200 of the 4000 tests were broken (which is quite nice, but not ideal). The first thing that caught my eye was broken arithmetic in rare cases like the integerConversions test.
Tests gave incorrect results for code like:
Prelude> 4 - 2^64 == 0
True
Suspecting it of being The Root Cause of all the SIGSEGVs I had ever seen, I started trimming things down to a minimal .cmm snippet.
And got this:
CInt a = -1;
return (a == -1)
# returns False
At that stage exploring the generated C code is trivial, and it showed:
StgWord64 a = (StgWord32)0xFFFFffffFFFFffffu;
return (a == 0xFFFFffffFFFFffffu)
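The cast truncates a to 0x00000000FFFFFFFF, while the literal it is compared against has all 64 bits set, so the comparison is always false. The bug here is in pretty-printing the 32-bit constant, which was easy to fix with Reid’s help.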
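The mismatch can be reproduced in plain C. Below is a minimal sketch that assumes 64-bit words. `StgWord32`/`StgWord64` are local stand-ins for ghc's typedefs, and `narrow_literal` is a hypothetical helper that shows what printing a constant at its declared width amounts to. It is not ghc's actual pretty-printer:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for ghc's StgWord32 / StgWord64 typedefs. */
typedef uint32_t StgWord32;
typedef uint64_t StgWord64;

/* The generated code from above: the cast truncates the value to 32 bits and
 * the assignment zero-extends it back, so a == 0x00000000FFFFFFFF. The literal
 * on the right was printed with all 64 bits set, so the test always fails. */
static int buggy_a_is_minus_one(void)
{
    StgWord64 a = (StgWord32)0xFFFFffffFFFFffffu;
    return a == 0xFFFFffffFFFFffffu;
}

/* Hypothetical helper: keep only the low `bits` bits of a literal, which is
 * what printing a constant at its declared width amounts to. */
static StgWord64 narrow_literal(StgWord64 value, unsigned bits)
{
    return bits >= 64 ? value : value & ((UINT64_C(1) << bits) - 1);
}

/* Same comparison with -1 printed as a 32-bit constant: both sides agree. */
static int fixed_a_is_minus_one(void)
{
    StgWord64 a = (StgWord32)0xFFFFffffFFFFffffu;
    return a == narrow_literal((StgWord64)-1, 32);
}
```

Once both sides are narrowed to the same width, the comparison holds again.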
sparc bug:
But the integer-literal fix didn’t help dynamic binaries on sparc, so I moved over to that box to explore.
The immediate symptom on sparc was a SIGSEGV somewhere before the program’s main() entry point, which was very weird.
I expected the C part of the haskell runtime to run first and crash later. My plan had been to break on main() and step through the program to get a basic understanding of what was going on.
So I first updated binutils, gcc and glibc, hoping it was a toolchain bug.
No luck, the bug persisted in exactly the same form.
A backtrace of the core dump suggested the crash was happening somewhere in the foreignExportStablePtr function, which gets called at binary (or library) load time, before your main() entry point. A function marked __attribute__((constructor)) in the generated stub makes the magic happen.
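Here is a minimal standalone illustration of that mechanism, which is a GCC/Clang extension. The function runs before main(), just as ghc's generated stginit_export_* functions do. The names are made up for the example:

```c
#include <assert.h>

/* Set by the constructor; anything running in main() can observe it. */
static int exports_registered = 0;

/* GCC/Clang extension: on ELF targets the compiler typically records this
 * function in the .init_array section, and the startup code (or the dynamic
 * loader, for a shared library) calls it before main() runs. */
__attribute__((constructor))
static void register_exports(void)
{
    exports_registered = 1;
}
```

As a result, a crash inside such a function happens before a breakpoint on main() is ever reached, which matches the symptom described above.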
For haskell code like this:
module M where
import Foreign.C.Types
f :: CInt -> CInt
foreign export ccall f :: CInt -> CInt
f n = n + 1
ghc basically generates the following stub:
extern StgClosure M_zdfstableZZC0ZZCmainZZCMZZCf_closure;
HsInt32 f(HsInt32 a1)
{
    Capability *cap;
    HaskellObj ret;
    HsInt32 cret;
    cap = rts_lock();
    rts_evalIO(&cap,rts_apply(cap,(HaskellObj)runNonIO_closure,rts_apply(cap,&M_zdfstableZZC0ZZCmainZZCMZZCf_closure,rts_mkInt32(cap,a1))) ,&ret);
    rts_checkSchedStatus("f",cap);
    cret=rts_getInt32(ret);
    rts_unlock(cap);
    return cret;
}
/* our static constructor */
static void stginit_export_M_zdfstableZZC0ZZCmainZZCMZZCf() __attribute__((constructor));
static void stginit_export_M_zdfstableZZC0ZZCmainZZCMZZCf()
{foreignExportStablePtr((StgPtr) &M_zdfstableZZC0ZZCmainZZCMZZCf_closure);}
sparc solution:
With a smaller program it’s once again simpler to explore the breakage.
RISC CPUs are fun creatures.
To appreciate the delight of reading sparc assembly, let’s look at the code generated for the following C snippet:
unsigned int f(void)
{
return 0x1234ABCD;
}
i386 easily puts a value into a register:
movl $0x1234ABCD, %eax
while sparc32 has a hard time:
sethi %hi(0x1234A800), %o0
or %o0, 0x3CD, %o0
You can’t encode a 32-bit immediate in a 32-bit instruction that also has to hold an opcode and a destination register (opcode, dest-reg, imm). The opcode takes 5 bits and the register 5 bits (32 regs), so the immediate gets only 22 bits. Instructions that also take a source register (opcode, dest-reg, src-reg, imm) get even less: a 13-bit immediate on sparc.
Thus the sparc ISA has a special instruction, sethi, which sets the high 22 bits of a given register (%o0 in our case) and clears the low 10 bits.
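The split can be checked with a bit of arithmetic. This sketch emulates the two-instruction sequence above in C, and the helper names are illustrative. `%hi(x)` keeps the upper 22 bits (what sethi writes, with the low 10 bits cleared), and `%lo(x)` keeps the low 10 bits, which easily fit into or's signed 13-bit immediate:

```c
#include <assert.h>
#include <stdint.h>

/* The 22-bit field encoded inside the sethi instruction. */
static uint32_t sparc_imm22(uint32_t x) { return x >> 10; }

/* Register value after "sethi %hi(x), %o0": imm22 shifted back up, so the
 * low 10 bits are zero. */
static uint32_t sparc_hi(uint32_t x) { return sparc_imm22(x) << 10; }

/* %lo(x): the low 10 bits, supplied as or's immediate operand. */
static uint32_t sparc_lo(uint32_t x) { return x & 0x3FFu; }

/* Emulates "sethi %hi(x), %o0; or %o0, %lo(x), %o0". */
static uint32_t sparc_load32(uint32_t x)
{
    uint32_t o0 = sparc_hi(x);  /* sethi */
    o0 |= sparc_lo(x);          /* or    */
    return o0;
}
```

For 0x1234ABCD this gives %hi = 0x1234A800 and %lo = 0x3CD, the same operands gcc emitted.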
Things are even worse on sparc64 and ppc64, where instructions are 32 bits wide but registers are 64 bits wide.
x86_64 easily puts a 64-bit value into a register:
movabs $0x1234ABCD5678DCBA, %rax
while sparc64 does something completely awful:
sethi %hi(0x1234a800), %o0
sethi %hi(0x5678dc00), %g1
or %o0, 0x3CD, %o0
or %g1, 0xBA, %g1
sllx %o0, 32, %o0
add %o0, %g1, %o0
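Emulating the six instructions in C shows how each 32-bit half is built with the same sethi/or pair and then the halves are joined by a shift and an add. This is a sketch, and the function name is made up:

```c
#include <assert.h>
#include <stdint.h>

/* Emulates the sparc64 sequence above for an arbitrary 64-bit constant. */
static uint64_t sparc64_load64(uint64_t x)
{
    uint32_t hi32 = (uint32_t)(x >> 32);
    uint32_t lo32 = (uint32_t)x;
    uint64_t o0, g1;

    o0 = hi32 & ~UINT32_C(0x3FF);  /* sethi %hi(hi32), %o0 (upper bits zeroed) */
    g1 = lo32 & ~UINT32_C(0x3FF);  /* sethi %hi(lo32), %g1                     */
    o0 |= hi32 & 0x3FFu;           /* or    %o0, %lo(hi32), %o0                */
    g1 |= lo32 & 0x3FFu;           /* or    %g1, %lo(lo32), %g1                */
    o0 <<= 32;                     /* sllx  %o0, 32, %o0                       */
    return o0 + g1;                /* add   %o0, %g1, %o0                      */
}
```

That is six instructions and a scratch register for one constant, compared to a single movabs on x86_64.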
All of this matters because sometimes those constants are not known at compile (and assembly) time, but only at link time.
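When the value is a symbol address, the assembler leaves the immediates as zero and emits relocations (R_SPARC_HI22 and R_SPARC_LO10 in the SPARC ELF ABI, where S + A is the symbol value plus addend). The linker then resolves them by patching the instruction words. The sketch below covers that patching step. The helpers are hypothetical, and the two instruction words are the encodings of `sethi %hi(0), %o0` (0x11000000) and `or %o0, 0, %o0` (0x90122000):

```c
#include <assert.h>
#include <stdint.h>

#define SETHI_O0_ZERO 0x11000000u  /* sethi %hi(0), %o0 */
#define OR_O0_ZERO_O0 0x90122000u  /* or %o0, 0, %o0    */

/* R_SPARC_HI22: store (S + A) >> 10 in the 22-bit imm22 field. */
static uint32_t patch_hi22(uint32_t insn, uint32_t value)
{
    return (insn & ~UINT32_C(0x3FFFFF)) | (value >> 10);
}

/* R_SPARC_LO10: store (S + A) & 0x3ff in the 13-bit simm13 field. */
static uint32_t patch_lo10(uint32_t insn, uint32_t value)
{
    return (insn & ~UINT32_C(0x1FFF)) | (value & 0x3FFu);
}
```

Patching both words with 0x1234ABCD yields 0x11048D2A and 0x901223CD, which together load the full address into %o0.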
ghc’s driver pipeline
What ghc basically does when compiling a .hs file on a via-C (aka UNREG) arch is the following:
1. .hs -> … -> .hc file (haskell-to-C pass)
2. .hc -> .s (C-to-asm pass)
3. .s -> .s (asm-to-asm mangling pass, no-op in UNREG mode)
4. .s -> .o (asm-to-object pass)
The bug crept into the pipeline when the -dynamic way was added to ghc (to build dynamic haskell libraries or position-independent executables).
On many architectures shared libraries require position-independent code (so-called PIC). It’s controlled by the -fpic / -fPIC family of gcc (and ghc) flags.
ghc passed the -fPIC option to passes 1 and 2 (haskell-to-C and C-to-asm), but not to pass 4, the assembler (!), which needs it to decide between absolute and relative relocation types for asm snippets like this one:
; load GOT address into %l7
sethi %hi(_GLOBAL_OFFSET_TABLE_-8), %l7
add %l7, %lo(_GLOBAL_OFFSET_TABLE_-4), %l7
The bug was easy to fix once identified. Now shared haskell libs work on sparc!
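The shape of the fix can be modelled as a lookup over the four via-C passes. This is a toy sketch, not ghc's driver code. The enum names are hypothetical, and the predicates contrast which passes saw -fPIC in the -dynamic way before and after the fix:

```c
#include <assert.h>
#include <stdbool.h>

/* The four via-C passes (toy model, hypothetical names). */
enum phase { HS_TO_HC, HC_TO_S, MANGLE_S, S_TO_O };

/* Before the fix: only the haskell-to-C and C-to-asm passes got -fPIC. */
static bool got_fpic_before(enum phase p)
{
    return p == HS_TO_HC || p == HC_TO_S;
}

/* After the fix: the asm-to-object pass gets it too, so the assembler can
 * pick PIC-friendly relocations. The mangler is a no-op in UNREG mode. */
static bool got_fpic_after(enum phase p)
{
    return got_fpic_before(p) || p == S_TO_O;
}
```

Keeping this as one predicate per build way makes it hard to forget a pass again when another way is added.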
a bit more on sparc’s relocations
Sometimes there are several ways to generate PIC code even for a single arch. On sparc, for example, there are at least:
- -fpic (13-bit GOT offsets, so a small GOT)
- -fPIC (32-bit GOT offsets)
If you are curious:
extern int g_i;
int * f(void)
{
return &g_i;
}
It generates the following assembly for -fpic:
f:
save %sp, -96, %sp
sethi %hi(_GLOBAL_OFFSET_TABLE_-8), %l7
add %l7, %lo(_GLOBAL_OFFSET_TABLE_-4), %l7
call __sparc_get_pc_thunk.l7
nop
ld [%l7+g_i], %g1
mov %g1, %i0
restore
jmp %o7+8
nop
and for -fPIC:
f:
save %sp, -96, %sp
sethi %hi(_GLOBAL_OFFSET_TABLE_-8), %l7
add %l7, %lo(_GLOBAL_OFFSET_TABLE_-4), %l7
call __sparc_get_pc_thunk.l7
nop
sethi %gdop_hix22(g_i), %g1
xor %g1, %gdop_lox10(g_i), %g1
ld [%l7 + %g1], %g1, %gdop(g_i)
mov %g1, %i0
restore
jmp %o7+8
nop
The difference here is what is used to access GOT:
; -fpic: immediate offset in 'ld'
ld [%l7+g_i], %g1
;
; -fPIC: loading full 32-bit offset into %g1 register
sethi %gdop_hix22(g_i), %g1
xor %g1, %gdop_lox10(g_i), %g1
ld [%l7 + %g1], %g1, %gdop(g_i)
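The practical consequence is reach. sparc's `ld [reg+imm]` form takes a signed 13-bit immediate, so the -fpic form can only address GOT slots whose byte offset falls within -4096..4095. Beyond that, the -fPIC sequence has to build a full 32-bit offset in a register. The sketch below captures that decision, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range of sparc's signed 13-bit immediate (simm13). */
#define SIMM13_MIN (-4096)
#define SIMM13_MAX 4095

static bool fits_simm13(int32_t offset)
{
    return offset >= SIMM13_MIN && offset <= SIMM13_MAX;
}

/* Can a GOT slot at this byte offset from %l7 be loaded with a single
 * "ld [%l7+off], reg", i.e. the -fpic form? */
static bool reachable_with_fpic(int32_t got_offset)
{
    return fits_simm13(got_offset);
}
```

This 8KB window matches the 8k GOT size limit that the GCC manual documents for -fpic on SPARC.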
ia64 and integer-gmp
Having dealt with sparc, I decided to have another look at ia64 (I hadn’t touched it much since ghc-7.4).
ghc on ia64 has many problems. One of them is known as gprel addressing overflow: you just can’t link a static ghc-7.6 binary on ia64.
So one way to fix it is to get dynamic linking working (with a working ghci as a side effect).
Luckily, the sparc fix was enough to unbreak it!
Another long-standing problem was the non-working integer-gmp library (any call into it resulted in a SIGSEGV). The workaround was to use the pure-haskell integer-simple library.
It broke around the ghc-7.0 release and had not been touched since.
After some painful debugging (linking libghc takes 2 hours on ia64, and building from scratch 10 hours on a 4-core box) I found the cause: the C codegen generated data-like prototypes for function-like objects.
Consider the following snippet:
// a.c
void f(void) {}
// b.c
extern void f(void);
void * g (void)
{
return (void*)&f;
}
// c.c
extern int f;
void * g (void)
{
return (void*)&f;
}
On most popular arches you would expect the same code to be generated for b.c and c.c. That is not the case on ia64: function pointers there are not pointers to code but pointers to a function descriptor, a structure of two “pointers”: the address of the code and the new gp value.
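Below is a rough C model of that extra indirection, and it is purely illustrative. `struct fdesc` and the helpers are hypothetical, since on real ia64 the compiler and linker create descriptors implicitly. The point is that a function "pointer" is the address of a two-word record. Handing out the plain symbol address, which is what c.c ends up doing, gives callers something that is not a descriptor at all:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an ia64 function descriptor. Real descriptors are
 * built by the toolchain; here "code" is just an ordinary C function pointer. */
struct fdesc {
    int (*code)(int);   /* entry point of the function body        */
    const void *gp;     /* global pointer value the callee expects */
};

static int add_one(int n) { return n + 1; }

/* Stand-in for what @fptr(add_one) would refer to. */
static const struct fdesc add_one_desc = { add_one, NULL };

/* Calling through a descriptor needs one extra load compared with jumping
 * to a code address directly: fetch the entry point first, then call it. */
static int call_via_fdesc(const struct fdesc *d, int arg)
{
    int (*entry)(int) = d->code;  /* the extra dereference */
    /* real ia64 code would also load d->gp into the gp register (r1) here */
    return entry(arg);
}
```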
As usual, once spotted, the problem was easy to fix.
You would not normally write code like c.c by hand, but here ghc’s C codegen produced exactly that, which was a stupid bug.
I have an idea to clean up the C codegen to the point where gcc’s LTO can build ghc on amd64 and catch this kind of bug at build time.
For the curious, here is what the assembly for b.c looks like on ia64:
g:
.mmi
nop 0
addl r8 = @ltoff(@fptr(f#)), gp
nop 0
;;
.mib
ld8 r8 = [r8]
nop 0
br.ret.sptk.many rp
and for c.c:
g:
.mmi
nop 0
addl r8 = @ltoffx(f#), r1
nop 0
;;
.mib
ld8.mov r8 = [r8], f#
nop 0
br.ret.sptk.many rp
Here f# and @fptr(f#) are different objects with different access rules: @fptr(f#), for example, needs one more dereference before it can be called, compared or whatever.
how I spent my August
What works better now in ghc-HEAD:
- 64-bit UNREG arches make fewer mistakes in integer operations
- shared libraries (and ghci) now work at least on sparc and ia64 (and, I hope, on ppc as well)
- integer-gmp now works on ia64
Have fun!