signed char or unsigned char?
Yesterday I debugged an interesting bug: the sqlite test suite was
hanging on the csv parsing test on powerpc32 and powerpc64 platforms.
But other platforms were fine! ia64 was ok, hppa was ok, sparc was ok.
Thus it’s not (just) an endianness issue or stack growth direction.
What could it be?
It took me a while to debug the issue but it boiled down to an infinite
loop of trying to find EOF when reading a file. Let’s look at the
simplified version of the buggy code in sqlite:
#include <stdio.h> /* #define EOF (-1) */
int main() {
    int n = 0;
    FILE * f = fopen ("/dev/null", "rb"); // supposed to have 0 bytes
    for (;;) {
        char c = fgetc (f); // truncate 'int' to char
        if (c == EOF) break;
        ++n;
    }
    return n;
}
The code is supposed to reach the end of the file (or maybe a '\xFF'
byte) and finish. Normally that’s exactly what happens:
$ x86_64-pc-linux-gnu-gcc a.c -o a && ./a && echo $?
0
But not on powerpc64:
$ powerpc64-unknown-linux-gnu-gcc a.c -o a && ./a && echo $?
<hung>
The bug here is simple: c == EOF promotes both operands, char c and -1,
to int. Thus for the EOF case the condition looks like this:
((int)(char)(-1) == -1)
So why does it never fire on powerpc64? Because it’s an unsigned char
platform! As a result it looks like two different conditions:
((int)0xff == -1) // powerpc64
((int)(-1) == -1) // x86_64
Once we know the problem we can force x86_64 to hang as well by
using the -funsigned-char gcc option:
$ x86_64-pc-linux-gnu-gcc a.c -o a -funsigned-char && ./a && echo $?
<hung>
I had not encountered bugs related to char signedness for quite a while.
Which other platforms default to unsigned char? I tried a simple
hack (I’ve encountered it today due to changes in c++11 to forbid
narrowing conversions in initializers):
$ cat a.cc
char c[] = { -1 };
$ for cxx in /usr/bin/*-g++; do echo -n "$cxx "; $cxx -c a.cc 2>/dev/null && echo SIGNED || echo UNSIGNED; done | sort -k2
/usr/bin/afl-g++ SIGNED
/usr/bin/alpha-unknown-linux-gnu-g++ SIGNED
/usr/bin/hppa-unknown-linux-gnu-g++ SIGNED
/usr/bin/hppa2.0-unknown-linux-gnu-g++ SIGNED
/usr/bin/i686-w64-mingw32-g++ SIGNED
/usr/bin/ia64-unknown-linux-gnu-g++ SIGNED
/usr/bin/m68k-unknown-linux-gnu-g++ SIGNED
/usr/bin/mips64-unknown-linux-gnu-g++ SIGNED
/usr/bin/sh4-unknown-linux-gnu-g++ SIGNED
/usr/bin/sparc-unknown-linux-gnu-g++ SIGNED
/usr/bin/sparc64-unknown-linux-gnu-g++ SIGNED
/usr/bin/x86_64-HEAD-linux-gnu-g++ SIGNED
/usr/bin/x86_64-UNREG-linux-gnu-g++ SIGNED
/usr/bin/x86_64-pc-linux-gnu-g++ SIGNED
/usr/bin/x86_64-w64-mingw32-g++ SIGNED
/usr/bin/aarch64-unknown-linux-gnu-g++ UNSIGNED
/usr/bin/armv5tel-softfloat-linux-gnueabi-g++ UNSIGNED
/usr/bin/armv7a-hardfloat-linux-gnueabi-g++ UNSIGNED
/usr/bin/armv7a-unknown-linux-gnueabi-g++ UNSIGNED
/usr/bin/powerpc-unknown-linux-gnu-g++ UNSIGNED
/usr/bin/powerpc64-unknown-linux-gnu-g++ UNSIGNED
/usr/bin/powerpc64le-unknown-linux-gnu-g++ UNSIGNED
/usr/bin/s390x-unknown-linux-gnu-g++ UNSIGNED
Or in a shorter form:
- signed: alpha, hppa, x86, ia64, m68k, mips, sh, sparc
- unsigned: arm, powerpc, s390
Why would a compiler prefer one signedness over another? The answer is the underlying Instruction Set Architecture. Or … not :), read on!
Let’s look at the code generated for two simple functions that fetch a single char from memory into a register:
signed long sc2sl (signed char * p) { return *p; }
unsigned long uc2ul (unsigned char * p) { return *p; }
Alpha
Alpha is a 64-bit architecture. Its basic ISA has no byte-sized loads and
does not support unaligned reads. You have been warned.
; alpha-unknown-linux-gnu-gcc -O2 -c a.c && objdump -d a.o
sc2sl:
    ; example:     0x12345(BB address, p)
    ;              |  0x12346(CC address, p+1)
    ;              v  v
    ; mem: [ .. AA BB CC DD .. ]
    ; a0 = 0x12345
    lda     t0,1(a0)    ; load address: t0 = p+1
                        ; t0 = 0x12346 (a0 + 1)
    ldq_u   v0,0(a0)    ; load unaligned: v0 = *(long*)(align(p))
                        ; v0 = *(long*)0x12340
                        ; v0 = 0xDDCCBBAA????????
    extqh   v0,t0,v0    ; extract actual byte into MSB position
                        ; v0 = v0 << 16
                        ; v0 = 0xBBAA????????0000
    sra     v0,56,v0    ; get sign-extended byte using arithmetic shift-right
                        ; v0 = v0 >> 56
                        ; v0 = 0xFFFFFFFFFFFFFFBB
    ret                 ; return
uc2ul:
    ldq_u   v0,0(a0)    ; load unaligned: v0 = *(long*)(align(p))
    extbl   v0,a0,v0    ; extract byte in v0
    ret
In this case alpha handles the unsigned load slightly nicer (it does not
require an arithmetic shift and shift offset computation). It takes quite a
bit of time to understand the sc2sl implementation.
creemj noted on #gentoo-alpha that there is a BWX ISA extension (enabled
with -mbwx in gcc):
; alpha-unknown-linux-gnu-gcc -O2 -mbwx -c a.c && objdump -d a.o
sc2sl:
    ldbu    v0,0(a0)
    sextb   v0,v0       ; sign-extend-byte
    ret
uc2ul:
    ldbu    v0,0(a0)
    ret
Here signed load requires one instruction to amend default-unsigned load semantics.
HPPA (PA-RISC)
Currently HPPA userland supports only 32-bit mode on linux. Similar to
many RISC architectures its branching instructions take two clock cycles
to execute. By convention it means the next instruction right after
the branch (bv instruction) is also executed.
; hppa2.0-unknown-linux-gnu-gcc -O2 -c a.c && objdump -d a.o
sc2sl:
    ldb     0(r26),ret0         ; load byte
    bv      r0(rp)              ; return
    extrw,s ret0,31,8,ret0      ; sign-extend 8 bits into 31
uc2ul:
    bv      r0(rp)              ; return
    ldb     0(r26),ret0
Similar to Alpha, signed chars require one more arithmetic operation.
x86
64-bit mode:
; x86_64-pc-linux-gnu-gcc -O2 -c a.c && objdump -d a.o
sc2sl:
    movsbq  (%rdi),%rax     ; load/sign-extend byte to quad
    retq
uc2ul:
    movzbl  (%rdi),%eax     ; load/zero-extend byte to long
    retq
Note the difference between target operands (64 vs. 32 bits). x86_64
implicitly zeroes out the upper 32 bits of the register for us in 64-bit mode.
32-bit mode:
; x86_64-pc-linux-gnu-gcc -O2 -m32 -c a.c && objdump -d a.o
sc2sl:
    mov     0x4(%esp),%eax
    movsbl  (%eax),%eax
    ret
uc2ul:
    mov     0x4(%esp),%eax
    movzbl  (%eax),%eax
    ret
No surprises here. The argument is passed through the stack.
ia64
ia64 “instructions” are huge. They are 128 bits long and encode 3
real instructions. The result of a memory fetch is not used in the same
bundle, thus we need at least two bundles to fetch and sign-extend. (I don’t
know why yet: either to avoid a memory stall in the same bundle or it’s a
“Write ; Read-Write” conflict on r8 in a single bundle)
; ia64-unknown-linux-gnu-gcc -O2 -c a.c && objdump -d a.o
sc2sl:
    [MMI]   nop.m 0x0
            ld1 r8=[r32]            # load byte (implicit zero-extend)
            nop.i 0x0;;
    [MIB]   nop.m 0x0
            sxt1 r8=r8              # sign-extend
            br.ret.sptk.many b0;;
uc2ul:
    [MIB]   ld1 r8=[r32]            # load byte (implicit zero-extend)
            nop.i 0x0
            br.ret.sptk.many b0;;
Unsigned char load requires fewer instructions (no additional sign extension required).
m68k
For some reason the frame pointer is still preserved at -O2. I’ve
disabled it with -fomit-frame-pointer to make the assembly shorter:
; m68k-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -c a.c && objdump -d a.o
sc2sl:
    moveal  %sp@(4),%a0     ; arguments are passed through stack (as would be in i386)
    moveb   %a0@,%d0        ; load byte
    extbl   %d0             ; sign-extend result
    rts
uc2ul:
    moveal  %sp@(4),%a0
    clrl    %d0             ; zero destination register
    moveb   %a0@,%d0        ; load byte
    rts
Both functions are similar. Both require arithmetic fiddling.
mips
Similar to HPPA, it has the same rule of executing one instruction after
a branch instruction.
; mips64-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -c a.c && objdump -d a.o
sc2sl:
    jr      ra
    lb      v0,0(a0)    ; load byte (sign-extend)
uc2ul:
    jr      ra
    lbu     v0,0(a0)    ; load byte (zero-extend)
Both functions take exactly one load instruction.
SuperH
Similar to HPPA, it has the same rule of executing one instruction after
a branch instruction.
; sh4-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -c a.c && objdump -d a.o
sc2sl:
    rts
    mov.b   @r4,r0      ; load byte (sign-extend)
uc2ul:
    mov.b   @r4,r0      ; load byte (sign-extend)
    rts
    extu.b  r0,r0       ; zero-extend result
Here unsigned load requires one instruction to amend default-signed load semantics.
SPARC
Similar to HPPA, it has the same rule of executing one instruction after
a branch instruction.
; sparc-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -c a.c && objdump -d a.o
sc2sl:
    retl
    ldsb    [ %o0 ], %o0
uc2ul:
    retl
    ldub    [ %o0 ], %o0
Both functions take exactly one load instruction.
ARM
; armv5tel-softfloat-linux-gnueabi-gcc -O2 -fomit-frame-pointer -c a.c && armv5tel-softfloat-linux-gnueabi-objdump -d a.o
sc2sl:
    ldrsb   r0, [r0]    ; load/sign-extend
    bx      lr
uc2ul:
    ldrb    r0, [r0]    ; load/zero-extend
    bx      lr
Both functions take exactly one load instruction.
PowerPC
PowerPC generates quite inefficient code in -fPIC mode, so I’m
building with -fno-PIC here.
; powerpc-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -fno-PIC -c a.c && objdump -d a.o
sc2sl:
    lbz     r3,0(r3)    ; load-byte/zero-extend
    extsb   r3,r3       ; sign-extend
    blr
    nop
uc2ul:
    lbz     r3,0(r3)    ; load-byte/zero-extend
    blr
Here signed load requires one instruction to amend default-unsigned load semantics.
S390
64-bit mode:
; s390x-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -fno-PIC -c a.c && objdump -d a.o
sc2sl:
    icmh    %r2,8,0(%r2)    ; insert-characters-under-mask-64
    srag    %r2,%r2,56      ; shift-right-single-64
    br      %r14
uc2ul:
    llgc    %r2,0(%r2)      ; load-logical-character
    br      %r14
Most esoteric instruction set :) It looks like unsigned loads are slightly shorter here.
“31”-bit mode (note -m31):
; s390x-unknown-linux-gnu-gcc -m31 -O2 -fomit-frame-pointer -fno-PIC -c a.c && objdump -d a.o
sc2sl:
    icm     %r2,8,0(%r2)    ; insert-characters-under-mask-64
    sra     %r2,24          ; shift-right-single
    br      %r14
uc2ul:
    lhi     %r1,0           ; load-halfword-immediate
    ic      %r1,0(%r2)      ; insert-character
    lr      %r2,%r1         ; register-to-register(?) move
    br      %r14
Surprisingly, in 31-bit mode signed loads are slightly shorter. But it
looks like uc2ul could be shorter by eliminating lr.
Parting words
At least from the ISA standpoint some architectures treat signed char
and unsigned char equally and could pick any signedness. Others
differ quite a bit.
Here is my silly table:
architecture | signedness | preferred signedness | match |
---|---|---|---|
alpha | SIGNED | UNSIGNED | NO |
arm | UNSIGNED | AMBIVALENT | YES |
hppa | SIGNED | UNSIGNED | NO |
ia64 | SIGNED | UNSIGNED | NO |
m68k | SIGNED | AMBIVALENT | YES |
mips | SIGNED | AMBIVALENT | YES |
powerpc | UNSIGNED | UNSIGNED | YES |
s390(64) | UNSIGNED | UNSIGNED | YES |
sh | SIGNED | SIGNED | YES |
sparc | SIGNED | AMBIVALENT | YES |
x86 | SIGNED | AMBIVALENT | YES |
What do we see here:
- alpha follows the majority of architectures in char signedness but pays for it a lot.
- arm could have been signed just fine (for this tiny silly test)
- hppa and ia64 might be unsigned and balance the table a bit (6/5 versus 8/3) :)
Have fun!