
How -f[un]signed-char can break your code

October 25, 2019
[programming] [issue] [c] [c++]

Most programmers are used to targeting x86 systems, where plain char is signed by default, and this assumption can easily end up baked into the code. Later the code might break on a system where the default differs. The first solution you might reach for is the -fsigned-char compiler switch, but there is a catch that might or might not matter for a specific code base.

Sample problematic piece of code

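#include <limits.h> // CHAR_MIN, CHAR_MAX
#include <wchar.h>  // wchar_t

void
function_a(wchar_t key)
{
    // true when key lies outside the range of plain char
    if (key < CHAR_MIN || key > CHAR_MAX) {
        // do something
    }
    // do something else
}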

Typical assumptions about types in the code above:

  • CHAR_MIN and CHAR_MAX are ints (in C++ they could be chars, but either way they are likely to be at most as large as wchar_t);

  • key is at least as wide as int (a smaller type is conceivable and would affect the evaluation, but for practical purposes int will do).

The important things here are the integer promotions and conversions defined by the language. Given the assumptions above, here are the relevant bits:

  • a smaller type gets extended before comparison against a larger type, possibly with a change of sign;

  • when a change of sign occurs, it occurs after the conversion to the larger type.

So, a char compared against an unsigned int first becomes int and then unsigned int. In code it would look like (unsigned int)(int)ch and not like (unsigned int)(unsigned char)ch. This makes perfect sense, as it keeps -1 intact when converting char to int, for example.
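To make the order of conversions concrete, here is a minimal sketch. The function names are made up for illustration, and signed char is used instead of plain char so that the result doesn't depend on compiler flags.

```c
#include <limits.h>

/* The order the language uses: widen to int first, then change sign. */
unsigned int widen_then_unsign(signed char ch)
{
    return (unsigned int)(int)ch;
}

/* The order that does NOT happen implicitly: change sign first, then widen. */
unsigned int unsign_then_widen(signed char ch)
{
    return (unsigned int)(unsigned char)ch;
}

/* Mixed comparison: ch is converted implicitly, as in widen_then_unsign(). */
int equals_implicitly(signed char ch, unsigned int u)
{
    return ch == u;
}
```

For ch equal to -1 the first function yields UINT_MAX and the second yields UCHAR_MAX, and equals_implicitly(-1, UINT_MAX) is true while equals_implicitly(-1, UCHAR_MAX) is false.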

Now, as the code above suggests, the issue occurs when you compare a wchar_t against an integer constant (casting it to char explicitly won’t matter) or against a char. The problem is that -f[un]signed-char affects whether char behaves like signed char or unsigned char, but it leaves wchar_t as is. This means that using -f[un]signed-char can leave wchar_t and char with different signedness on some platforms. And because there is no “unsigned/signed wchar_t”, you’re in trouble.

Let’s take key < CHAR_MIN as an example and see what it can mean for an unsigned wchar_t and signed char in terms of transformations (type of CHAR_MIN isn’t affected, but its value is):

  1. key < -128
  2. key < (int)-128
  3. key < (unsigned int)(int)-128
  4. key < (unsigned int)0xffffff80
  5. Evaluates to true for every key below 0xffffff80, that is, for practically every character

The check becomes meaningless on platforms with unsigned wchar_t when you compile the condition with -fsigned-char.
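The effect can be reproduced on any platform with a small sketch. It rests on two assumptions that are not part of the original code: unsigned_wchar is a hypothetical stand-in for a wchar_t that happens to be an unsigned int, and SCHAR_MIN plays the role of CHAR_MIN as seen under -fsigned-char (-128 with an 8-bit two's-complement char).

```c
#include <limits.h>

/* Hypothetical stand-in for wchar_t on a platform where it is unsigned. */
typedef unsigned int unsigned_wchar;

/* Same shape as `key < CHAR_MIN`; SCHAR_MIN is converted to unsigned int
 * before the comparison takes place. */
int below_char_min(unsigned_wchar key)
{
    return key < SCHAR_MIN;
}

/* The bound the comparison actually uses after the conversion. */
unsigned int converted_char_min(void)
{
    return (unsigned int)SCHAR_MIN;
}
```

With a 32-bit unsigned int, converted_char_min() is 0xffffff80, so below_char_min('A') is true even though 'A' fits into a char just fine, and only keys at or above that huge bound compare as not smaller.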

Another way to trip over it

An equality comparison between wchar_t and char can also be problematic when the value of the char is greater than 127 or less than 0, in code like this:

#include <wchar.h> // wchar_t

void
function_b(wchar_t key, char ch)
{
    // ch is converted to the type of key before the comparison
    if (key == ch) {
        // do something
    }
    // do something else
}

Of course, there are many more snippets that would be susceptible to this problem, but these two are probably the most representative.
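Here is a sketch of how the equality check misfires. As before, unsigned_wchar is a hypothetical stand-in for an unsigned wchar_t, and signed char stands in for plain char under -fsigned-char.

```c
#include <limits.h>

/* Hypothetical stand-in for wchar_t on a platform where it is unsigned. */
typedef unsigned int unsigned_wchar;

/* Same shape as `key == ch`; ch goes through int and then unsigned int. */
int same_char(unsigned_wchar key, signed char ch)
{
    return key == ch;
}
```

A signed char holding the bit pattern 0xE9 has the value -23. It converts to UINT_MAX - 22, so same_char(0xE9, -23) is false even though both sides were meant to be the same character. Values in the 0..127 range still compare as expected.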

Fix attempt

You can’t really fix it as long as -fsigned-char is kept. It is possible to do this, which will sort of work:

if ((int)key < (int)CHAR_MIN || (int)key > (int)CHAR_MAX) {

But it’s easy to see it breaking for values of key greater than INT_MAX. Arguably that isn’t an issue in practice, but it still isn’t a great idea.
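A sketch of where the cast-based check goes wrong, under the same assumptions as before (unsigned_wchar is a hypothetical unsigned stand-in for wchar_t, SCHAR_MIN and SCHAR_MAX stand in for the char limits under -fsigned-char). Converting an out-of-range unsigned value to int is implementation-defined; the sketch assumes the wrap-around behaviour that GCC documents.

```c
#include <limits.h>

/* Hypothetical stand-in for wchar_t on a platform where it is unsigned. */
typedef unsigned int unsigned_wchar;

/* Same shape as the cast-based check from the text. */
int outside_char_range(unsigned_wchar key)
{
    /* Keys above INT_MAX become negative here (implementation-defined). */
    return (int)key < SCHAR_MIN || (int)key > SCHAR_MAX;
}
```

Ordinary keys behave: 'A' is reported as inside the range and 1000 as outside. However, a key equal to UINT_MAX turns into -1 after the cast and is reported as fitting into a char.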

What you really want is this:

if ((signed wchar_t)key < CHAR_MIN || (signed wchar_t)key > CHAR_MAX) {

Don’t expect this to compile, though. In C wchar_t is a typedef, and signed or unsigned cannot be combined with a typedef name. In C++ wchar_t is a distinct built-in type to which these modifiers do not apply.

An actual solution

Instead of using -fsigned-char, which affects char everywhere, it’s better to just add a signed or unsigned modifier to char types where needed. Most code that uses char should work just fine regardless of whether it’s signed or not. In the places where it does matter, it’s easy to be explicit. You don’t even need to change the operation you perform on the data, just specify whether char is signed there.
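As an illustration, here is one way function_b could follow this advice. It is only a sketch: the name same_char_explicit is made up, and it assumes the characters being compared are meant to be treated as values in the 0..255 range.

```c
#include <wchar.h>

/* The comparison itself is unchanged; only the parameter type is explicit.
 * An unsigned char is never negative, so it converts to the type of key
 * without any surprises whether wchar_t is signed or unsigned. */
int same_char_explicit(wchar_t key, unsigned char ch)
{
    return key == ch;
}
```

With this signature same_char_explicit(0xE9, 0xE9) is true regardless of -f[un]signed-char and of the signedness of wchar_t on the target platform.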