1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265
|
@page
@node Complexity Computation
@chapter Complexity Computation
@cindex Complexity Computation
The principal goal
Fundamentally, this program counts lines of non-comment source lines,
multiplies by a ``nesting factor'' for each level of logic nesting and
divides by a scaling factor so that the typical results lie roughly in
the same range as @code{pmccabe} results. That happens to be approximately 20.
@menu
* parsing:: Parsing Method
* scoring algorithm:: Complexity Measurement Algorithm
* scores:: Complexity Scores
* stats:: Complexity Statistics
* tuning:: Scoring Adjustments
@end menu
@node parsing
@section Parsing Method
@cindex parsing
The method chosen for parsing the source has an effect on what gets
seen (scored) by the program.
@menu
* complexity parsing:: Complexity Measurement Parsing
* post-pp parsing:: Post-PreProcessing Parsing
* during pp parsing:: During PreProcessing Parsing
* pmccabe parsing:: @code{pmccabe} Parsing
@end menu
@node complexity parsing
@subsection Complexity Measurement Parsing
This program examines the actual source a human looks at when the
file is opened, provided it is not pre-processed by @code{unifdef},
@xref{complexity unifdef, @code{unifdef}}. This was chosen because
uncompiled code adds to the complexity of what a human must understand.
However, sometimes the source will contain unbalanced braces a la:
@example
#if FOO
for (int ix = foo;;) @{
#else
for (int ix = bar;;) @{
#endif
code...
@}
@end example
rendering code that cannot be parsed correctly. @code{unifdef}-ing
makes it parsable. Unfortunately, because the practice of @code{ifdef}-ing
unbalanced curly braces is so common, this program cannot rely on
finding the correct closing brace.
@strong{CAVEAT}: for the purposes of this program, procedures end
when either a matching closing brace is found @emph{or} a closing curly brace
is found in column 1, whichever comes first. If the closing brace
in column one does not match the procedure opening brace, the
procedure is considered unscorable.
Fortunately, unscorable procedures are relatively unusual.
@strong{CAVEAT2}: K&R procedure headers are not recognized.
If anything other than an opening curly brace appears after
the parameter list will cause the code recognizer to go back
into ``look for a procedure header'' mode. K&R procedures
are not just not scored, they are completely ignored.
This should probably get fixed, though.
@node post-pp parsing
@subsection Post-PreProcessing Parsing
Another approach would be to use the @code{C} compiler and analize the
tokens coming out of the preprocessor. The drawbacks are that macro
expansions will add to the complexity, even though they do not add to
human perceived complexity, and uncompiled code do not add to the
complexity measure. The benefit, of course, is that you know for
certain where a procedure body starts and ends.
@node during pp parsing
@subsection During PreProcessing Parsing
This would require going into the C preprocessor code and cause macros
to not be expanded. Again, the great benefit is that you know for
certain you can find the starting and ending braces for every
procedure body. The downsides are the extra work and, again, the
uncompiled code won't get counted in the complexity measure.
This might be a useful exercise to do some day, just to see
how helpful it might be. Being able to recognize all procedure
bodies without fail would be a good thing.
@node pmccabe parsing
@subsection @code{pmccabe} Parsing
The @code{pmccabe} parsing actually inspired the method for this program.
Thd difference is that @code{pmccabe} will always keep scanning until
a procedure body's closing curly brace is found, even if that means
counting the code from several following procedure definitions.
The consequence of this is that this program's code will see some
procedures that @code{pmccabe} will not, and vice versa.
@node scoring algorithm
@section Complexity Measurement Algorithm
Fundamentally, this program counts non-comment source lines and
examines elements of parenthesized expressions. This score
is multiplied by a nesting scoring factor for each layer of code nesting.
A parenthesized expression is scanned for operators. If they are all
arithmetic operators, or all arithmetic and one relational operator,
the score is zero. If all the operators are boolean @code{and}s or
they are all @code{or}s, then the score is one. An assignment
operator with arithmetic operators also scores one. If you mix
relational operators and all @code{and}s or all @code{or}s, the score
is the number of boolean elements. If you mix @code{and}s and
@code{or}s at the same parenthetical level, the two counts are
multiplied, unless the boolean element count is higher.
Fundamentally, do not use multiple relational or boolean operators at
the same parenthetical level, unless they are all boolean @code{and}s
or they are all boolean @code{or}s. If you use boolean operators and
relational operators in one expression, you are charged one statement
for each boolean element.
After scoring each statement and any parenthesized expressions, the
score is multiplied by any encompassing controlled block and added to
the score of that block. A ``controlled block'' is a curly-braced
collection of statements controlled by one of the statement
controlling statements @code{do}, @code{for}, @code{else}, @code{if},
@code{switch}, or @code{while}. Stand alone blocks for scoping local
variables do not trigger the multiplier.
You may trace the scores of parenthesized expressions and code
blocks (@pxref{complexity trace, trace output file}). You will see
the raw score of the code block or expression.
The final score is the outermost score divided by the ``scaling factor'',
@xref{complexity scale, complexity scaling factor}.
@node scores
@section Complexity Scores
@cindex scores
The ``Complexity Scores'' table shows the score of each procedure identified
that also exceeded the threshold score,
@xref{complexity threshold, ---threshold}. The entries on each line are:
@itemize @bullet
@item
The computed score
@item
The number of lines between the opening and closing curly braces
@item
The number of non-comment, non-blank lines found there
@item
The name of the source file
@item
The line number of the opening curly brace
@item
The name of the procedure
@end itemize
The output is sorted by the score and then the number of non-comment lines.
Procedures with scores below the threshold are not displayed.
@node stats
@section Complexity Statistics
@cindex statistics
The statistics are displayed both as a table and as a histogram,
@xref{Example Output}. It is under the control of the
@ref{complexity histogram, ---histogram} option.
The statistics are for each non-comment
source line and each source line is given the score of its
encompassing procedure. This way, larger procedures are given
proportionally more weight than one line procedures.
The histogram is broken up into three ranges. Scores of 0 through 99
are displayed in 10 point groupings, 100 through 999 in 100 point
groupings and 1000 and above (good grief!!, but they exist) are in
1000 point groupings. The number of asterisks represent the number
of lines of code that are in procedures that score in the specified
range.
The tabular statistics are also based on lines, not procedures.
@table @samp
@item Average line score
This is the procedure score times the non-comment
line count, all added up and divided by the total non-comment source
lines found.
@item 25%-ile score
@itemx 50%-ile score
@itemx 75%-ile score
@itemx Highest score
Since the distribution of scores is nothing like a bell curve, the mean
and standard deviation do not give a very clear picture of the distribution
of the scores. Typically, the standard deviation is larger than the average
score. So, instead the program prints the the four quartile scores.
The score for which 25, 50, and 75 percent of code is scored less than,
plus the highest scoring procedure (100 percent of code scores less than
or equal to that score).
@item Unscored procedures
If any procedures were found that could not be scored, the number of such
procedures is printed.
@end table
@node tuning
@section Scoring Adjustments
@cindex tuning
@cindex scores
Scores can be adjusted with three different options:
@table @samp
@item nesting-penalty
@xref{complexity nesting-penalty, ---nesting-penalty}.
@item demi-nesting-penalty
@xref{complexity demi-nesting-penalty, ---demi-nesting-penalty}.
@item scale
@xref{complexity scale, ---scale}.
@end table
The raw score is the number of lines or statements, whichever is
greater, adjusted by a factor for the depth of the logic. Statements
are nested when they are inside of a block of statements for a
``block'' statement (viz., ``do'', ``for'', ``if'', ``switch'' or
``while''). Statements within blocks used to constrain the scope of
variables (not controlled by a block statement) are not multiplied by
this factor.
Expressions are nested when contained within parentheses.
The @i{cost} of these is different. Block level nesting multiplies the
score for the block by the @code{--nesting-penalty} factor (2.0 by default).
Nested expressions are multiplied by the @code{--demi-nesting-penalty},
the square root of @code{--nesting-penalty} by default.
Some attempt is made to judge the complexity of an expression.
A complicated expression is one that contains an assignment operator,
more than one relation operator, or a mixture of ``and'' and ``or''
operators with any other different kind of non-arithmetic operator.
Expression scores are minimized by:
@itemize @bullet
@item
Doing assignments outside of
boolean expressions, or at least parenthesizing them.
@item
Parenthesizing each relationship operation in an expression
of multiple ``and'' and/or ``or'' operations. Yes, precedence
parses them correctly, but it is less clear.
@item
Parenthesizing groups of ``and'' and ``or'' operations so that
operators of only one type appear at one level. For example,
the first expression below instead of the second. Yes, precedence
means the effect is the same, but we're after code clarity so that
correctness is more obvious.
@example
1: ((a && b) || (c && d))
2: (a && b || c && d)
@end example
The first adds 2 to the raw score (before dividing by the scaling factor).
The latter will add 5, assuming a @code{demi-nesting-penalty} of @code{1.41}.
@end itemize
|