Most programmers are used to targeting x86 systems, where `char` is `signed char` by default, and this assumption can end up baked into the code. Later, the code might break on a system where the default differs. The first solution you might reach for is the `-fsigned-char` compiler switch, but there is a catch there, which might or might not matter for a specific code base.
Sample problematic piece of code
```c
void function_a(wchar_t key)
{
    if (key < CHAR_MIN || key > CHAR_MAX) {
        // do something
    }
    // do something else
}
```
Typical assumptions about types in the code above:

- `CHAR_MIN` and `CHAR_MAX` are `int`s (in C++ they could be `char`s, but either way they are likely to be at most as large as `wchar_t`);
- `key` is at least `int` (a smaller type is conceivable and would affect the evaluation, but for practical purposes `int` will do).
The important things here are the integer promotions and conversions defined by the language. Given the assumptions above, the relevant rules are:

- the smaller type gets extended before comparison against a larger type, possibly with a change of sign;
- when a change of sign occurs, it occurs after the conversion to the larger type.
So, a `char` compared against an `unsigned int` first becomes `int` and only then `unsigned int`. In code it would look like `(unsigned int)(int)ch` and not like `(unsigned int)(unsigned char)ch`. This makes perfect sense, as it keeps `-1` intact when converting `char` to `int`, for example.
Now, as the code above suggests, the issue occurs when you compare a `wchar_t` against an integer (you can cast it to `char` explicitly, it won’t matter) or against a `char`. The problem is that `-f[un]signed-char` affects whether `char` means `signed char` or `unsigned char`, but it leaves `wchar_t` as is. This implies that on some platforms the use of `-f[un]signed-char` will lead to `wchar_t` and `char` not being simultaneously signed or unsigned. And because there is no “unsigned/signed `wchar_t`”, you’re in trouble.
Let’s take `key < CHAR_MIN` as an example and see what it turns into for an unsigned `wchar_t` and a signed `char` in terms of transformations (the type of `CHAR_MIN` isn’t affected, but its value is):

- `key < -127`
- `key < (int)-127`
- `key < (unsigned int)(int)-127`
- `key < (unsigned int)0xffffff81`, which holds for any `key` below `0xffffff81`, i.e. for practically every input

The check thus fires for virtually all values and becomes meaningless on platforms with unsigned `wchar_t` when you compile the condition with `-fsigned-char`.
Another way to trip over
An equality comparison of `wchar_t` and `char` can also be problematic when the value of the `char` is greater than 127 or less than 0, in code like this:
```c
void function_b(wchar_t key, char ch)
{
    if (key == ch) {
        // do something
    }
    // do something else
}
```
Of course, there are many more snippets that would be susceptible to this problem, but these two are probably the most representative.
Fix attempt
You can’t really fix it as long as `-fsigned-char` is kept. It is possible to do this, which will sort of work:

```c
if ((int)key < (int)CHAR_MIN || (int)key > (int)CHAR_MAX) {
```
But it’s easy to see it breaking for values of `key` greater than `INT_MAX` (the conversion to `int` is implementation-defined there). That arguably isn’t a practical issue, but it still isn’t a great idea.
What you really want is this:

```c
if ((signed wchar_t)key < CHAR_MIN || (signed wchar_t)key > CHAR_MAX) {
```

This might even work, but I wouldn’t bet on it. Standards don’t seem to guarantee that the `signed` or `unsigned` modifiers are applicable to `wchar_t` (they combine with the standard integer types, not with a typedef like `wchar_t` in C or the distinct `wchar_t` type in C++).
An actual solution
Instead of using `-fsigned-char`, which affects `char` everywhere, it’s better to just add the `signed` or `unsigned` modifier to `char` types where it’s needed. Most of the code that uses `char` should work just fine regardless of whether it’s signed or not. In those places where it does matter, it’s easy to be explicit. You don’t even need to change the operation you perform on the data, just specify whether `char` is signed there.
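As a sketch of what that looks like in practice (the `checksum()` helper below is made up for illustration, it’s not from any particular code base):

```c
#include <stddef.h>

/* Sums the bytes of a buffer.  The result must not depend on the
 * platform's default signedness of char, so each byte is read through an
 * explicitly unsigned type: values above 0x7f stay positive everywhere. */
unsigned checksum(const char *buf, size_t len)
{
    unsigned sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum += (unsigned char)buf[i];
    }
    return sum;
}
```

The operation itself is unchanged; only the signedness at the point of use is made explicit, so the function behaves identically with and without `-fsigned-char`.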