fixing ghc on sparc ia64 and friends
The ghc-7.8.1 release was the first release where the ghc binary (and tools like ghc-pkg, hpc, hsc2hs, etc.) was dynamically linked against the haskell libraries shipped with ghc. At that time all the below crashed when trying to use dynamic linking in ghc-7.8:
Having upstreamed the simplest build fixes for unregisterised arches, I foolishly hoped all other architectures would Just Work, but the fun stuff had only started to happen.
amd64 bugs
Having been bitten by #8748 before trying real exotics like sparc, I first decided to check how good (or bad) the UNREG amd64 build was.
To get a feeling for the state of some platform you only need to build ghc and run the regression tests:
./configure --enable-unregisterised
make
make fulltest # run regtest
About 200 of 4000 tests were broken (which is quite nice, but not ideal). The first thing that caught my eye was broken arithmetic in rare cases like integerConversions.
Tests gave incorrect results for code like:
Prelude> 4 - 2^64 == 0
True
Thinking of it as The Root Cause of all the SIGSEGVs I have ever seen, I started trimming bits down to a minimal .cmm snippet. And got this:
CInt a = -1;
return (a == -1)
# returns False
At that stage exploring the generated C code is trivial, and it gave:
StgWord64 a = (StgWord32)0xFFFFffffFFFFffffu;
return (a == 0xFFFFffffFFFFffffu)
The bug here is in the pretty-printing of a 32-bit constant, which was easy to fix with Reid’s help.
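The effect of such a misprinted literal can be reproduced in plain C. The sketch below is not the code ghc generated: the helper names `literal_as_printed`, `literal_as_intended` and `equals_minus_one` are made up for illustration, and `uint32_t` / `uint64_t` stand in for `StgWord32` / `StgWord64`.

```c
#include <stdint.h>

/* What the misprinted literal amounts to: the all-ones 64-bit value is
 * cast through a 32-bit type, which drops the upper 32 bits. */
uint64_t literal_as_printed(void)
{
    return (uint32_t)0xFFFFffffFFFFffffu;
}

/* What was intended: the literal keeps its full 64-bit width. */
uint64_t literal_as_intended(void)
{
    return (uint64_t)0xFFFFffffFFFFffffu;
}

/* Mirrors the trimmed test: does the stored value compare equal to -1? */
int equals_minus_one(uint64_t a)
{
    return a == 0xFFFFffffFFFFffffu;
}
```

Here `equals_minus_one(literal_as_printed())` is 0 while `equals_minus_one(literal_as_intended())` is 1, which matches the "returns False" symptom.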
sparc bug
But the integer-literals fix didn’t help dynamic binaries on sparc, so I moved on to explore things on that box.
The immediate symptom on sparc was a SIGSEGV somewhere before the program’s main() entry point. And that was very weird.
I expected the C part of the haskell runtime to run first and crash later. My plan was to break on main() and step by step get a basic understanding of what was going on.
Thus I first tried to update binutils, gcc and glibc in the hope that it was some toolchain bug. No luck, the bug persisted in exactly the same form.
The backtrace of the core dump suggested the crash was happening somewhere in the foreignExportStablePtr function, which gets called at binary (or library) load time, before your main() entry point. That __attribute__((constructor)); makes the magic happen.
For haskell code like this
module M where
import Foreign.C.Types
foreign export ccall f :: CInt -> CInt
f :: CInt -> CInt
f n = n + 1
ghc basically generates the following stub:
extern StgClosure M_zdfstableZZC0ZZCmainZZCMZZCf_closure;
HsInt32 f(HsInt32 a1)
{
Capability *cap;
HaskellObj ret;
HsInt32 cret;
cap = rts_lock();
rts_evalIO(&cap,rts_apply(cap,(HaskellObj)runNonIO_closure,rts_apply(cap,&M_zdfstableZZC0ZZCmainZZCMZZCf_closure,rts_mkInt32(cap,a1))) ,&ret);
rts_checkSchedStatus("f",cap);
cret=rts_getInt32(ret);
rts_unlock(cap);
return cret;
}
/* our static constructor */
static void stginit_export_M_zdfstableZZC0ZZCmainZZCMZZCf() __attribute__((constructor));
static void stginit_export_M_zdfstableZZC0ZZCmainZZCMZZCf()
{foreignExportStablePtr((StgPtr) &M_zdfstableZZC0ZZCmainZZCMZZCf_closure);}
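To see why a crash in such a function shows up before main(), here is a minimal sketch of the same mechanism. It assumes GCC or Clang, since the `constructor` attribute is a compiler extension. The names `register_exports`, `registered_before_main` and `exports_registered` are invented for this example and are not part of ghc's runtime.

```c
/* Hypothetical flag, set by the constructor below. */
int registered_before_main = 0;

/* Functions marked 'constructor' are run by the startup code (or by the
 * dynamic loader for shared libraries) before main() is entered. */
static void register_exports(void) __attribute__((constructor));
static void register_exports(void)
{
    registered_before_main = 1;
}

/* Reports whether the constructor has already run. */
int exports_registered(void)
{
    return registered_before_main;
}
```

By the time main() starts, `exports_registered()` already returns 1, so anything that goes wrong inside the constructor happens before a breakpoint on main() is ever reached.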
sparc solution
With a smaller program it’s once again simpler to explore the breakage. RISC-style CPUs are fun creatures. To feel all the delight of looking at sparc assembly I propose to look at the generated code for the following C snippet:
unsigned int f(void)
{
return 0x1234ABCD;
}
i386 easily puts a value into a register:
movl $0x1234ABCD, %eax
while sparc32 has a hard time:
sethi %hi(0x1234A800), %o0
or %o0, 0x3CD, %o0
You can’t encode a 32-bit immediate in a 32-bit instruction containing a tuple of (opcode, dest-reg, imm). The opcode usually takes 5 bits, the register 5 bits (32 regs) and imm gets only 22 bits. On 2-operand instructions (opcode, src-reg, imm) we get even less.
Thus sparc32 had to add a special instruction to its ISA that sets the high bits of a given reg (o0 in our case).
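The split that sethi and or perform can be written down in C. This is only an emulation of the arithmetic, not of the instruction encoding; the function names `sparc_hi22`, `sparc_lo10` and `sparc_load_imm32` are made up for this sketch.

```c
#include <stdint.h>

/* Upper 22 bits of x, as 'sethi %hi(x), reg' leaves them in a register
 * (the low 10 bits are cleared). */
uint32_t sparc_hi22(uint32_t x)
{
    return x & ~(uint32_t)0x3FF;
}

/* Lower 10 bits of x, small enough for the immediate field of 'or'. */
uint32_t sparc_lo10(uint32_t x)
{
    return x & 0x3FF;
}

/* Emulates the two-instruction sequence: sethi, then or. */
uint32_t sparc_load_imm32(uint32_t x)
{
    uint32_t reg = sparc_hi22(x);  /* sethi %hi(x), reg   */
    reg |= sparc_lo10(x);          /* or reg, %lo(x), reg */
    return reg;
}
```

For 0x1234ABCD this gives 0x1234A800 and 0x3CD, the same two constants that appear in the sparc32 assembly.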
Things are even worse for sparc64 and ppc64 where instructions are 32-bit, but registers are 64-bit wide.
x86_64 easily puts a 64-bit value into a register:
movabs $0x1234ABCD5678DCBA, %rax
while sparc64 does something very verbose:
sethi %hi(0x1234a800), %o0
sethi %hi(0x5678dc00), %g1
or %o0, 0x3CD, %o0
or %g1, 0xBA, %g1
sllx %o0, 32, %o0
add %o0, %g1, %o0
The above is important because sometimes those constants are not known at compile (and assembly!) time, but known only at link time.
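To make the six-instruction sparc64 sequence easier to follow, here is the same computation as a C sketch. The function name `sparc64_load_imm64` and the local variable names (chosen to mirror the registers) are assumptions of this example. Each of the four constants in it is a value the assembler or linker would have to patch into an instruction.

```c
#include <stdint.h>

/* Emulates the arithmetic of the sethi/sethi/or/or/sllx/add sequence. */
uint64_t sparc64_load_imm64(uint64_t x)
{
    uint32_t high = (uint32_t)(x >> 32);    /* upper 32 bits of x */
    uint32_t low  = (uint32_t)x;            /* lower 32 bits of x */

    uint64_t o0 = high & ~(uint32_t)0x3FF;  /* sethi %hi(high), %o0   */
    uint64_t g1 = low & ~(uint32_t)0x3FF;   /* sethi %hi(low), %g1    */
    o0 |= high & 0x3FF;                     /* or %o0, %lo(high), %o0 */
    g1 |= low & 0x3FF;                      /* or %g1, %lo(low), %g1  */
    o0 <<= 32;                              /* sllx %o0, 32, %o0      */
    return o0 + g1;                         /* add %o0, %g1, %o0      */
}
```

The more pieces a constant is split into, the more relocations have to be emitted and resolved correctly.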
ghc driver pipeline
What ghc basically does when compiling a .hs file on a -fvia-C (aka UNREG) arch is the following:
1. .hs -> ... -> .hc file (haskell-to-C pass)
2. .hc -> .s (C-to-asm pass)
3. .s -> .s (asm-to-asm mangling pass, no-op in UNREG mode)
4. .s -> .o (asm-to-object pass)
The bug was introduced into the pipeline when the -dynamic way was added to ghc (to build dynamic haskell libraries or position-independent executables).
On many architectures libraries require a position-independent code layout (so-called PIC). It’s controlled by the -fpic / -fPIC set of gcc (and ghc) flags.
ghc passed the -fPIC option to [1.] and [2.], but not to [4.] (!), where the assembler needs to generate either absolute or relative relocation types for the following asm snippet:
; load GOT address into %l7
sethi %hi(_GLOBAL_OFFSET_TABLE_-8), %l7
add %l7, %lo(_GLOBAL_OFFSET_TABLE_-4), %l7
The bug was easily fixed once identified. Now shared haskell libraries work on sparc!
a bit more on sparc relocations
Sometimes there are many ways to generate PIC code even for a single given arch. On sparc for example there are at least:
- -fpic option (13-bit relocations, the GOT offset has to fit into the immediate field of ld)
- -fPIC (32-bit relocations)
If you are curious:
extern int g_i;
int * f(void)
{
return &g_i;
}
Generates the following assembly for -fpic:
f:
save %sp, -96, %sp
sethi %hi(_GLOBAL_OFFSET_TABLE_-8), %l7
add %l7, %lo(_GLOBAL_OFFSET_TABLE_-4), %l7
call __sparc_get_pc_thunk.l7
nop
ld [%l7+g_i], %g1
mov %g1, %i0
restore
jmp %o7+8
nop
and for -fPIC:
f:
save %sp, -96, %sp
sethi %hi(_GLOBAL_OFFSET_TABLE_-8), %l7
add %l7, %lo(_GLOBAL_OFFSET_TABLE_-4), %l7
call __sparc_get_pc_thunk.l7
nop
sethi %gdop_hix22(g_i), %g1
xor %g1, %gdop_lox10(g_i), %g1
ld [%l7 + %g1], %g1, %gdop(g_i)
mov %g1, %i0
restore
jmp %o7+8
nop
The difference here is what is used to access the GOT:
; -fpic: immediate offset in 'ld'
ld [%l7+g_i], %g1
;
; -fPIC: loading full 32-bit offset into %g1 register
sethi %gdop_hix22(g_i), %g1
xor %g1, %gdop_lox10(g_i), %g1
ld [%l7 + %g1], %g1, %gdop(g_i)
ia64 and integer-gmp
Having dealt with sparc I decided to have a look at ia64 once again (I hadn’t touched it much after ghc-7.4).
ghc on ia64 has many problems. One of them is known as gprel addressing overflow.
You just can’t link a static ghc-7.6 binary on ia64.
Thus one of the ways to fix it is to get dynamic linking working (and a working ghci as a side effect).
Luckily, the sparc fix was enough to unbreak it!
Another long-standing problem was the non-working integer-gmp library (any call to that library resulted in SIGSEGV). The workaround was to use the pure-haskell integer-simple library.
It broke around the ghc-7.0 release and had not been touched since.
After some painful debugging (just linking libghc takes 2 hours on ia64, building from scratch takes 10 hours on a 4-core box) I found the cause: the C code generator generates data-like prototypes for function-like objects. Consider the following snippet:
// a.c
void f(void) {}
// b.c
extern void f(void);
void * g (void)
{
return (void*)&f;
}
// c.c
extern int f;
void * g (void)
{
return (void*)&f;
}
For most popular arches you would expect the same code to be generated for the b.c and c.c files. But it’s not the case for ia64.
Function pointers there are not pointers to code, but pointers to a structure called a function descriptor (a structure of 2 “pointers”: a pointer to code and a new gp value).
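A portable model of that layout may help. This is only a sketch: on real ia64 the descriptors are created by the compiler and linker, not by hand, and the names `struct fdesc`, `code_ptr`, `add_one` and `call_via_descriptor` are hypothetical.

```c
#include <stdint.h>

typedef int (*code_ptr)(int);

/* Model of a function descriptor: the "function pointer" points at this
 * pair, not at the code itself. */
struct fdesc {
    code_ptr  code;  /* entry point of the function             */
    uintptr_t gp;    /* gp value the callee expects (model only) */
};

/* An ordinary function to point the descriptor at. */
int add_one(int n)
{
    return n + 1;
}

/* A call through a descriptor needs one extra load to reach the code. */
int call_via_descriptor(const struct fdesc *fd, int arg)
{
    return fd->code(arg);
}
```

Taking the address of a symbol declared as data skips this indirection. That is why a data-like prototype for a function yields a different pointer than a function-like one.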
As usual once spotted the problem was easy to fix.
You would not normally write code like c.c, but in this case it was a stupid bug.
I have an idea to clean up the C code generator to a state where gcc's LTO will be able to build ghc on amd64 and find such kinds of bugs at compile/link time.
For the curious, here is how the assembly looks for b.c on ia64:
g:
.mmi
nop 0
addl r8 = @ltoff(@fptr(f#)), gp
nop 0
;;
.mib
ld8 r8 = [r8]
nop 0
br.ret.sptk.many rp
and for c.c:
g:
.mmi
nop 0
addl r8 = @ltoffx(f#), r1
nop 0
;;
.mib
ld8.mov r8 = [r8], f#
nop 0
br.ret.sptk.many rp
Here f# and @fptr(f#) are different objects with different access rules. @fptr for example needs one more dereference to be called/compared/whatever.
how I’ve spent an august
What works better now in ghc-HEAD:
- 64-bit UNREG arches make fewer mistakes in integer operations
- shared libraries (and ghci) now do work at least on sparc and ia64 (I hope on ppc as well)
- integer-gmp now works on ia64
Have fun!