File: signed-char-misuse.rst

package info (click to toggle)
swiftlang 6.0.3-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 2,519,992 kB
  • sloc: cpp: 9,107,863; ansic: 2,040,022; asm: 1,135,751; python: 296,500; objc: 82,456; f90: 60,502; lisp: 34,951; pascal: 19,946; sh: 18,133; perl: 7,482; ml: 4,937; javascript: 4,117; makefile: 3,840; awk: 3,535; xml: 914; fortran: 619; cs: 573; ruby: 573
file content (118 lines) | stat: -rw-r--r-- 4,654 bytes parent folder | download | duplicates (10)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
.. title:: clang-tidy - bugprone-signed-char-misuse

bugprone-signed-char-misuse
===========================

`cert-str34-c` redirects here as an alias for this check. For the CERT alias,
the `DiagnoseSignedUnsignedCharComparisons` option is set to `false`.

Finds those ``signed char`` -> integer conversions which might indicate a
programming error. The basic problem with the ``signed char``, that it might
store the non-ASCII characters as negative values. This behavior can cause a
misunderstanding of the written code both when an explicit and when an
implicit conversion happens.

When the code contains an explicit ``signed char`` -> integer conversion, the
human programmer probably expects that the converted value matches with the
character code (a value from [0..255]), however, the actual value is in
[-128..127] interval. To avoid this kind of misinterpretation, the desired way
of converting from a ``signed char`` to an integer value is converting to
``unsigned char`` first, which stores all the characters in the positive [0..255]
interval which matches the known character codes.

In case of implicit conversion, the programmer might not actually be aware
that a conversion happened and char value is used as an integer. There are
some use cases when this unawareness might lead to a functionally imperfect code.
For example, checking the equality of a ``signed char`` and an ``unsigned char``
variable is something we should avoid in C++ code. During this comparison,
the two variables are converted to integers which have different value ranges.
For ``signed char``, the non-ASCII characters are stored as a value in [-128..-1]
interval, while the same characters are stored in the [128..255] interval for
an ``unsigned char``.

It depends on the actual platform whether plain ``char`` is handled as ``signed char``
by default and so it is caught by this check or not. To change the default behavior
you can use ``-funsigned-char`` and ``-fsigned-char`` compilation options.

Currently, this check warns in the following cases:
- ``signed char`` is assigned to an integer variable
- ``signed char`` and ``unsigned char`` are compared with equality/inequality operator
- ``signed char`` is converted to an integer in the array subscript

See also:
`STR34-C. Cast characters to unsigned char before converting to larger integer sizes
<https://wiki.sei.cmu.edu/confluence/display/c/STR34-C.+Cast+characters+to+unsigned+char+before+converting+to+larger+integer+sizes>`_

A good example from the CERT description when a ``char`` variable is used to
read from a file that might contain non-ASCII characters. The problem comes
up when the code uses the ``-1`` integer value as EOF, while the 255 character
code is also stored as ``-1`` in two's complement form of char type.
See a simple example of this below. This code stops not only when it reaches
the end of the file, but also when it gets a character with the 255 code.

.. code-block:: c++

  #define EOF (-1)

  int read(void) {
    char CChar;
    int IChar = EOF;

    if (readChar(CChar)) {
      IChar = CChar;
    }
    return IChar;
  }

A proper way to fix the code above is converting the ``char`` variable to
an ``unsigned char`` value first.

.. code-block:: c++

  #define EOF (-1)

  int read(void) {
    char CChar;
    int IChar = EOF;

    if (readChar(CChar)) {
      IChar = static_cast<unsigned char>(CChar);
    }
    return IChar;
  }

Another use case is checking the equality of two ``char`` variables with
different signedness. Inside the non-ASCII value range this comparison between
a ``signed char`` and an ``unsigned char`` always returns ``false``.

.. code-block:: c++

  bool compare(signed char SChar, unsigned char USChar) {
    if (SChar == USChar)
      return true;
    return false;
  }

The easiest way to fix this kind of comparison is casting one of the arguments,
so both arguments will have the same type.

.. code-block:: c++

  bool compare(signed char SChar, unsigned char USChar) {
    if (static_cast<unsigned char>(SChar) == USChar)
      return true;
    return false;
  }

.. option:: CharTypdefsToIgnore

  A semicolon-separated list of typedef names. In this list, we can list
  typedefs for ``char`` or ``signed char``, which will be ignored by the
  check. This is useful when a typedef introduces an integer alias like
  ``sal_Int8`` or ``int8_t``. In this case, human misinterpretation is not
  an issue.

.. option:: DiagnoseSignedUnsignedCharComparisons

  When `true`, the check will warn on ``signed char``/``unsigned char`` comparisons,
  otherwise these comparisons are ignored. By default, this option is set to `true`.