signed char or unsigned char?
Yesterday I debugged an interesting bug: the sqlite test suite was hanging in a CSV parsing test on powerpc32 and powerpc64 platforms. But other platforms were fine! ia64 was ok, hppa was ok, sparc was ok. Thus it’s not (just) an endianness or stack growth direction issue. What could it be?
It took me a while to debug the issue but it boiled down to an infinite loop of trying to find EOF when reading a file. Let’s look at the simplified version of buggy code in sqlite:
#include <stdio.h> /* #define EOF (-1) */
int main() {
    int n = 0;
    FILE * f = fopen ("/dev/null", "rb"); // supposed to have 0 bytes
    for (;;) {
        char c = fgetc (f); // truncate 'int' to char
        if (c == EOF) break;
        ++n;
    }
    return n;
}
The code is supposed to reach the end of the file (or maybe a ‘\xFF’ byte) and finish. Normally that’s exactly what happens:
$ x86_64-pc-linux-gnu-gcc a.c -o a && ./a && echo $?
0
But not on powerpc64:
$ powerpc64-unknown-linux-gnu-gcc a.c -o a && ./a && echo $?
<hung>
The bug here is simple: c == EOF promotes char c to int before comparing it with -1 (EOF is already an int). Thus in the EOF case the condition looks like this:
((int)(char)(-1) == -1)
So why does it never fire on powerpc64? Because it’s an unsigned char platform! As a result the same expression turns into two different conditions:
((int)0xff == -1) // powerpc64
((int)(-1) == -1) // x86_64
Once we know the problem we can force x86_64 to hang as well by using -funsigned-char gcc option:
$ x86_64-pc-linux-gnu-gcc a.c -o a -funsigned-char && ./a && echo $?
<hung>
I had not encountered bugs related to char signedness for quite a while.
Which other platforms default to unsigned char? I tried a simple hack (I ran into it today because C++11 forbids narrowing conversions in initializers). On an unsigned char target -1 does not fit into char, so the compilation fails:
$ cat a.cc
char c[] = { -1 };
$ for cxx in /usr/bin/*-g++; do echo -n "$cxx "; $cxx -c a.cc 2>/dev/null && echo SIGNED || echo UNSIGNED; done | sort -k2
/usr/bin/afl-g++ SIGNED
/usr/bin/alpha-unknown-linux-gnu-g++ SIGNED
/usr/bin/hppa-unknown-linux-gnu-g++ SIGNED
/usr/bin/hppa2.0-unknown-linux-gnu-g++ SIGNED
/usr/bin/i686-w64-mingw32-g++ SIGNED
/usr/bin/ia64-unknown-linux-gnu-g++ SIGNED
/usr/bin/m68k-unknown-linux-gnu-g++ SIGNED
/usr/bin/mips64-unknown-linux-gnu-g++ SIGNED
/usr/bin/sh4-unknown-linux-gnu-g++ SIGNED
/usr/bin/sparc-unknown-linux-gnu-g++ SIGNED
/usr/bin/sparc64-unknown-linux-gnu-g++ SIGNED
/usr/bin/x86_64-HEAD-linux-gnu-g++ SIGNED
/usr/bin/x86_64-UNREG-linux-gnu-g++ SIGNED
/usr/bin/x86_64-pc-linux-gnu-g++ SIGNED
/usr/bin/x86_64-w64-mingw32-g++ SIGNED
/usr/bin/aarch64-unknown-linux-gnu-g++ UNSIGNED
/usr/bin/armv5tel-softfloat-linux-gnueabi-g++ UNSIGNED
/usr/bin/armv7a-hardfloat-linux-gnueabi-g++ UNSIGNED
/usr/bin/armv7a-unknown-linux-gnueabi-g++ UNSIGNED
/usr/bin/powerpc-unknown-linux-gnu-g++ UNSIGNED
/usr/bin/powerpc64-unknown-linux-gnu-g++ UNSIGNED
/usr/bin/powerpc64le-unknown-linux-gnu-g++ UNSIGNED
/usr/bin/s390x-unknown-linux-gnu-g++ UNSIGNED
Or in a shorter form:
- signed: alpha, hppa, x86, ia64, m68k, mips, sh, sparc
- unsigned: arm, powerpc, s390
Why would a compiler prefer one signedness over another? The answer is the underlying Instruction Set Architecture. Or … not :), read on!
Let’s compare the code generated for two simple functions that fetch a single char from memory into a register:
signed long sc2sl (signed char * p) { return *p; }
unsigned long uc2ul (unsigned char * p) { return *p; }
Alpha
Alpha is a 64-bit architecture. It does not support unaligned reads in its basic ISA. You have been warned.
; alpha-unknown-linux-gnu-gcc -O2 -c a.c && objdump -d a.o
sc2sl:
    ; example:     0x12345 (BB address, p)
    ;              |  0x12346 (CC address, p+1)
    ;              v  v
    ; mem: [ .. AA BB CC DD .. ]
    ; a0 = 0x12345
    lda     t0,1(a0)  ; load address: t0 = p+1
                      ; t0 = 0x12346 (a0 + 1)
    ldq_u   v0,0(a0)  ; load unaligned: v0 = *(long*)(align(p))
                      ; v0 = *(long*)0x12340
                      ; v0 = 0xDDCCBBAA????????
    extqh   v0,t0,v0  ; extract actual byte into MSB position
                      ; v0 = v0 << 16
                      ; v0 = 0xBBAA????????0000
    sra     v0,56,v0  ; get sign-extended byte using arithmetic shift-right
                      ; v0 = v0 >> 56
                      ; v0 = 0xFFFFFFFFFFFFFFBB
    ret               ; return
uc2ul:
    ldq_u   v0,0(a0)  ; load unaligned: v0 = *(long*)(align(p))
    extbl   v0,a0,v0  ; extract byte in v0
    ret
In this case Alpha handles the unsigned load slightly more nicely (it requires neither an arithmetic shift nor a shift offset computation). It takes quite a bit of time to understand the sc2sl implementation.
creemj pointed out on #gentoo-alpha that there is a BWX ISA extension (enabled with -mbwx in gcc):
; alpha-unknown-linux-gnu-gcc -O2 -mbwx -c a.c && objdump -d a.o
sc2sl:
    ldbu    v0,0(a0)
    sextb   v0,v0     ; sign-extend-byte
    ret
uc2ul:
    ldbu    v0,0(a0)
    ret
Here signed load requires one instruction to amend default-unsigned load semantics.
HPPA (PA-RISC)
Currently HPPA userland supports only 32-bit mode on Linux. Similar to many RISC architectures, its branch instructions take two clock cycles to execute. By convention this means the instruction right after the branch (bv), in the so-called delay slot, is executed as well.
; hppa2.0-unknown-linux-gnu-gcc -O2 -c a.c && objdump -d a.o
sc2sl:
    ldb     0(r26),ret0        ; load byte
    bv      r0(rp)             ; return
    extrw,s ret0,31,8,ret0     ; sign-extend 8 bits into 31
uc2ul:
    bv      r0(rp)             ; return
    ldb     0(r26),ret0
Similar to Alpha, signed chars require one more arithmetic operation.
x86
64-bit mode:
; x86_64-pc-linux-gnu-gcc -O2 -c a.c && objdump -d a.o
sc2sl:
    movsbq  (%rdi),%rax ; load/sign-extend byte to quad
    retq
uc2ul:
    movzbl  (%rdi),%eax ; load/zero-extend byte to long
    retq
Note the difference between the target operands (64 vs. 32 bits). In 64-bit mode, writing a 32-bit register implicitly zeroes out the upper 32 bits of the full 64-bit register for us.
32-bit mode:
; x86_64-pc-linux-gnu-gcc -O2 -m32 -c a.c && objdump -d a.o
sc2sl:
    mov     0x4(%esp),%eax
    movsbl  (%eax),%eax
    ret
uc2ul:
    mov     0x4(%esp),%eax
    movzbl  (%eax),%eax
    ret
No surprises here. The argument is passed on the stack.
ia64
ia64 “instructions” are huge. They are 128 bits long and encode 3 real instructions (a bundle). The result of a memory fetch is not used in the same bundle, so we need at least two bundles to fetch and sign-extend. (I don’t know why yet: either it avoids a memory stall within one bundle, or it’s a “Write ; Read-Write” conflict on r8 within a single bundle.)
; ia64-unknown-linux-gnu-gcc -O2 -c a.c && objdump -d a.o
sc2sl:
    [MMI]   nop.m 0x0
            ld1 r8=[r32]           # load byte (implicit zero-extend)
            nop.i 0x0;;
    [MIB]   nop.m 0x0
            sxt1 r8=r8             # sign-extend
            br.ret.sptk.many b0;;
uc2ul:
    [MIB]   ld1 r8=[r32]           # load byte (implicit zero-extend)
            nop.i 0x0
            br.ret.sptk.many b0;;
Unsigned char load requires fewer instructions (no additional shift required).
m68k
For some reason the frame pointer is still preserved at -O2. I’ve disabled it with -fomit-frame-pointer to make the assembly shorter:
; m68k-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -c a.c && objdump -d a.o
sc2sl:
    moveal  %sp@(4),%a0 ; arguments are passed through stack (as would be in i386)
    moveb   %a0@,%d0    ; load byte
    extbl   %d0         ; sign-extend result
    rts
uc2ul:
    moveal  %sp@(4),%a0
    clrl    %d0         ; zero destination register
    moveb   %a0@,%d0    ; load byte
    rts
Both functions are similar. Both require arithmetic fiddling.
mips
Similar to HPPA, MIPS executes one instruction after a branch instruction.
; mips64-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -c a.c && objdump -d a.o
sc2sl:
    jr      ra
    lb      v0,0(a0)  ; load byte (sign-extend)
uc2ul:
    jr      ra
    lbu     v0,0(a0)  ; load byte (zero-extend)
Both loads take exactly one instruction.
SuperH
Similar to HPPA, SuperH executes one instruction after a branch instruction.
; sh4-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -c a.c && objdump -d a.o
sc2sl:
    rts
    mov.b   @r4,r0    ; load byte (sign-extend)
uc2ul:
    mov.b   @r4,r0    ; load byte (sign-extend)
    rts
    extu.b  r0,r0     ; zero-extend result
Here unsigned load requires one instruction to amend default-signed load semantics.
SPARC
Similar to HPPA, SPARC executes one instruction after a branch instruction.
; sparc-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -c a.c && objdump -d a.o
sc2sl:
    retl
    ldsb    [ %o0 ], %o0
uc2ul:
    retl
    ldub    [ %o0 ], %o0
Both loads take exactly one instruction.
ARM
; armv5tel-softfloat-linux-gnueabi-gcc -O2 -fomit-frame-pointer -c a.c && armv5tel-softfloat-linux-gnueabi-objdump -d a.o
sc2sl:
    ldrsb   r0, [r0]  ; load/sign-extend
    bx      lr
uc2ul:
    ldrb    r0, [r0]  ; load/zero-extend
    bx      lr
Both loads take exactly one instruction.
PowerPC
PowerPC generates quite inefficient code in -fPIC mode, so I pass -fno-PIC explicitly here.
; powerpc-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -fno-PIC -c a.c && objdump -d a.o
sc2sl:
    lbz     r3,0(r3)  ; load-byte/zero-extend
    extsb   r3,r3     ; sign-extend
    blr
    nop
uc2ul:
    lbz     r3,0(r3)  ; load-byte/zero-extend
    blr
Here signed load requires one instruction to amend default-unsigned load semantics.
S390
64-bit mode:
; s390x-unknown-linux-gnu-gcc -O2 -fomit-frame-pointer -fno-PIC -c a.c && objdump -d a.o
sc2sl:
    icmh    %r2,8,0(%r2)  ; insert-characters-under-mask-64
    srag    %r2,%r2,56    ; shift-right-single-64
    br      %r14
uc2ul:
    llgc    %r2,0(%r2)    ; load-logical-character
    br      %r14
Most esoteric instruction set :) It looks like unsigned loads are slightly shorter here.
“31”-bit mode (note -m31):
; s390x-unknown-linux-gnu-gcc -m31 -O2 -fomit-frame-pointer -fno-PIC -c a.c && objdump -d a.o
sc2sl:
    icm     %r2,8,0(%r2)  ; insert-characters-under-mask-64
    sra     %r2,24        ; shift-right-single
    br      %r14
uc2ul:
    lhi     %r1,0         ; load-halfword-immediate
    ic      %r1,0(%r2)    ; insert-character
    lr      %r2,%r1       ; register-to-register(?) move
    br      %r14
Surprisingly, in 31-bit mode signed loads are slightly shorter. But it looks like uc2ul could be made shorter by eliminating lr.
Parting words
At least from the ISA standpoint, some architectures treat signed char and unsigned char equally and could pick either signedness. Others differ quite a bit.
Here is my silly table:
architecture | gcc default signedness | ISA-preferred signedness | match |
---|---|---|---|
alpha | SIGNED | UNSIGNED | NO |
arm | UNSIGNED | AMBIVALENT | YES |
hppa | SIGNED | UNSIGNED | NO |
ia64 | SIGNED | UNSIGNED | NO |
m68k | SIGNED | AMBIVALENT | YES |
mips | SIGNED | AMBIVALENT | YES |
powerpc | UNSIGNED | UNSIGNED | YES |
s390(64) | UNSIGNED | UNSIGNED | YES |
sh | SIGNED | SIGNED | YES |
sparc | SIGNED | AMBIVALENT | YES |
x86 | SIGNED | AMBIVALENT | YES |
What do we see here?
- alpha follows the majority of architectures in char signedness but pays a lot for it.
- arm could have been signed just fine (for this tiny silly test)
- hppa and ia64 could have been unsigned, which would balance the table a bit (6/5 versus 8/3) :)
Have fun!