Most programmers are used to targeting x86 systems, where char is signed by default. This assumption can end up baked into the code, which then breaks on a system where the default differs. The first solution you might reach for is the -fsigned-char compiler switch, but there is a catch there which might or might not matter for a specific code base.
Sample problematic piece of code
```c
void function_a(wchar_t key) {
    if (key < CHAR_MIN || key > CHAR_MAX) {
        // do something
    }
    // do something else
}
```
Typical assumptions about types in the code above:
- CHAR_MIN and CHAR_MAX are ints (in C++ they could be chars, but either way they are likely to be at most as large as wchar_t);
- key is at least int in size (a smaller type is conceivable and would affect the evaluation, but for practical purposes int will do).
The important things here are the integer promotions and conversions defined by the language. Given the above assumptions, these are the relevant bits:
- a smaller type gets extended before comparison against a larger type, possibly with a change of sign;
- when a change of sign occurs, it occurs after the conversion to the larger type.
So a char compared against an unsigned int first becomes int and then unsigned int. In code the conversion looks like (unsigned int)(int)ch and not like (unsigned int)(unsigned char)ch, which makes perfect sense as it keeps, for example, -1 intact when converting char to int.
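A minimal sketch of that difference, assuming a signed char (the x86 default) and a 32-bit int:

```c
#include <stdio.h>

int main(void) {
    char ch = -1;

    /* Promotion happens first: -1 stays -1 as an int, and the conversion
       to unsigned int then turns it into 0xffffffff. */
    unsigned int via_int = (unsigned int)(int)ch;

    /* Zero-extending the raw byte instead would have produced 0xff. */
    unsigned int via_uchar = (unsigned int)(unsigned char)ch;

    printf("%x %x\n", via_int, via_uchar); /* prints: ffffffff ff */
    return 0;
}
```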
Now, as the code above suggests, the issue occurs when you compare a wchar_t against an integer (casting it to char explicitly won’t matter) or against a char. The problem is that -f[un]signed-char affects whether char means signed char or unsigned char, but it keeps wchar_t as is. This implies that use of -f[un]signed-char will lead to wchar_t and char not being simultaneously signed or unsigned on some platforms. And because there is no “unsigned/signed wchar_t”, you’re in trouble.
Let’s take key < CHAR_MIN as an example and see what it turns into for an unsigned wchar_t and a signed char in terms of transformations (the type of CHAR_MIN isn’t affected, but its value is):

- key < -127
- key < (int)-127
- key < (unsigned int)(int)-127
- key < (unsigned int)0xffffff81
- true for any key below 0xffffff81, i.e. for practically every value

So instead of catching keys below CHAR_MIN, the subexpression fires for nearly all of them: the check becomes meaningless on platforms with unsigned wchar_t when you compile the code with -fsigned-char.
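Here is a sketch of that evaluation, simulating an unsigned 32-bit wchar_t with unsigned int and a signed char’s CHAR_MIN of -127 (both are assumptions about the target platform):

```c
#include <stdio.h>

int main(void) {
    unsigned int key = 0x41; /* 'A'; stands in for an unsigned wchar_t */

    /* -127 becomes 0xffffff81 after the conversion to unsigned int, so
       the "key is below CHAR_MIN" test matches nearly every character. */
    printf("%d\n", key < (unsigned int)-127); /* prints 1 */
    return 0;
}
```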
Another way to trip over
Equality comparison of wchar_t and char can also be problematic when the value of the char is greater than 127 or less than 0, as in code like this:
```c
void function_b(wchar_t key, char ch) {
    if (key == ch) {
        // do something
    }
    // do something else
}
```
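To see why, here is a sketch assuming a signed char and an unsigned 32-bit wchar_t (the byte value 0xf1 is an arbitrary example):

```c
#include <stdio.h>

int main(void) {
    char ch = (char)0xf1;    /* -15 as a signed char (two's complement) */
    unsigned int key = 0xf1; /* stands in for an unsigned wchar_t */

    /* ch is promoted to int (-15) and then converted to unsigned int,
       so it compares as 0xfffffff1 rather than 0xf1. */
    printf("%d\n", key == ch); /* prints 0, not 1 */
    return 0;
}
```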
Of course, there are many more snippets that would be susceptible to this problem, but these two are probably the most representative.
Fix attempt
You can’t really fix it as long as -fsigned-char is kept. It is possible to
do this, which will sort of work:
```c
if ((int)key < (int)CHAR_MIN || (int)key > (int)CHAR_MAX) {
```
But it’s easy to see it breaking for values of key greater than INT_MAX, which arguably isn’t an issue in practice, but still isn’t a great idea.
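A sketch of how it breaks, again simulating wchar_t with unsigned int (the wraparound on the cast is implementation-defined, but this is the typical outcome):

```c
#include <limits.h>
#include <stdio.h>

int main(void) {
    unsigned int key = 0x80000000u; /* a key value above INT_MAX */

    /* The cast typically wraps to INT_MIN, so the "below CHAR_MIN"
       branch fires for a huge positive key. */
    printf("%d\n", (int)key < (int)CHAR_MIN); /* prints 1 */
    return 0;
}
```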
What you really want is this:
```c
if ((signed wchar_t)key < CHAR_MIN || (signed wchar_t)key > CHAR_MAX) {
```
This might even work somewhere, but I wouldn’t bet on it: the standards don’t allow the signed or unsigned modifiers to be combined with wchar_t (it’s a typedef in C and a distinct type in C++), so this generally won’t even compile.
An actual solution
Instead of using -fsigned-char, which affects char everywhere, it’s better to just add the signed or unsigned modifier to char types where needed. Most of the code that uses char should work just fine regardless of whether it’s signed or not. In those places where it does matter, it’s easy to be explicit. You don’t even need to change the operation performed on the data, just specify whether char is signed there.
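As an illustration, here is a made-up byte-classifying helper (assuming two’s complement, as on all mainstream platforms) that pins down the signedness it needs without flipping char everywhere:

```c
#include <stdio.h>

/* The logic relies on negative values, so it asks for signed char
   explicitly instead of depending on -fsigned-char. */
static int is_continuation_byte(signed char byte) {
    return byte < (signed char)0xc0; /* UTF-8 0x80..0xbf are negative here */
}

int main(void) {
    char raw = (char)0x81; /* sign of plain char varies by platform... */

    /* ...but casting at the point of use makes the intent explicit */
    printf("%d\n", is_continuation_byte((signed char)raw)); /* prints 1 */
    return 0;
}
```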