File: intro.texi

package info (click to toggle)
complexity 1.10%2Bdfsg-1
links: PTS, VCS
area: main
in suites: stretch
size: 1,704 kB
sloc: sh: 5,111; ansic: 1,936; makefile: 96
file content (207 lines) | stat: -rw-r--r-- 8,067 bytes
parent folder | download | duplicates (4)

@page
@node    Introduction
@chapter Introduction
@cindex  Introduction

Complexity measurement tools provide several pieces of information.
They help to:

@enumerate
@item
locate suspicious areas in unfamiliar code
@item
get an idea of how much effort may be required to understand that code
@item
get an idea of the effort required to test a code base
@item
provide a reminder to yourself.  You may see what you've written as obvious,
but others may not.  It is useful to have a hint about what code may
seem harder to understand by others, and then decide if some rework
may be in order.
@end enumerate

But why another complexity analyzer?  Even though the McCabe analysis
tool already exists (@code{pmccabe}), I think the job it does is too
rough for gauging complexity, though it is ideal for gauging the
testing effort.  Each code path should be tested and the @code{pmccabe}
program provides a count of code paths.  That, however, is not the
only issue affecting human comprehension.  This program attempts to
take into account other factors that affect a human's ability to
understand.

@menu
* code length::             Code Length
* switch statement::        Switch Statement
* logic conditions::        Logic Conditions
* personal experience::     Personal Experience
* rationale summary::       Rationale Summary
* language constraints::    Constraints on Source Code
@end menu

@node    code length
@section Code Length
Since @code{pmccabe} does not factor code length into its score, some folks
have taken to saying either long functions or a high McCabe score find
functions requiring attention.  But it means looking at two factors
without any visibility into how the length is obfuscating the code.

The technique used by this program is to count 1 for each line that a
statement spans, plus the complexity score of control expressions
(@code{for}, @code{while}, and @code{if} expressions).  The value for
a block of code is the sum of these multiplied by a nesting factor
(@pxref{complexity nesting-penalty}).  This score is then added to the
score of the encompassing block.  With all other things equal, a
procedure that is twice as long as another will have double the score.
@code{pmccabe} scores them identically.

@node    switch statement
@section Switch Statement
@code{pmccabe} has changed the scoring of @code{switch}
statements because they seemed too high.  @code{switch} statements
are now ``free'' in this new analysis.  That's wrong, too.
The code length needs to be counted and the code within a @code{switch}
statement adds more to the difficulty of comprehension than code at
a shallower logic level.

This program will multiply the score of the @code{switch} statement
content by the @xref{complexity nesting-penalty, nesting score factor}.

@node    logic conditions
@section Logic Conditions

@file{pmccabe} does not score logic conditions very well.
It overcharges for simple logical operations, it doesn't charge for
comma operators, and it undercharges for mixing assignment operators
and relational operators and the @code{and} and @code{or} logical
operators.

For example:
@example
xx = (A && B) || (C && D) || (E && F);
@end example
scores as @code{6}.  Strictly speaking, there are, indeed, six code
paths there.  That is a fairly straight forward expression that is not
nearly as complicated as this:
@example
  if (A) @{
    if (B) @{
      if (C) @{
        if (D)
          a-b-c-and-d;
      @} else if (E) @{
          a-b-no_c-and-e;
      @}
    @}
  @}
@end example
@noindent
and yet this scores exactly the same.  This program reduces the cost
to very little for a sequence of conditions at the same level.  (That
is, all @code{and} operators or all @code{or} operators.)  so the raw score
for these examples are 4 and 35, respectively (1 and 2 after scaling,
@pxref{complexity scale, @code{--scale}}).

If you nest boolean expressions, there is a little cost, assuming you
parenthesize grouped expressions so that @code{and} and @code{or}
operators do not appear at the same parenthesized level.  Also
assuming that you do not mix assignment and relational and boolean
operators all together.  If you do not parenthesize these into
subexpressions, their small scores get multiplied in ways that
sometimes wind up as a much higher score.

The intent here is to encourage easy to understand boolean expressions.
This is done by,
@itemize 
@item
not combining them with assignment statements
@item
canonicalizing them (two level expressions with all @code{&&}
operators at the bottom level and all @code{||} operators in the
nested level -\- or vice versa)
@item
parenthesizing for visual clarity (relational operations parenthesized
before being joined into larger @code{&&} or @code{||} expressions)
@item
breaking them up into multiple @code{if} statements, if convenient.
@end itemize

@node    personal experience
@section Personal Experience

I have used @code{pmccabe} on a number of occasions.  For a first order
approximation, it does okay.  However, I was interested in zeroing in on the
modules that needed the most help and there were a lot of modules needing
help.  I was finding I was looking at some functions where I ought to have
been looking at others.  So, I put this together to see if there was a
better correlation between what seemed like hard code to me and the score
derived by an analysis tool.

This has worked much better.  I ran @code{complexity} and @code{pmccabe}
against several million lines of code.  I correlated the scores.  Where the
two tools disagreed noticeably in relative ranking, I took a closer look.  I
found that @file{complexity} did, indeed, seem to be more appropriate in its
scoring.

Also, I have included with this project a shell script (@code{cx-vs-mc}) for
summarizing the differences between @code{pmccabe} and @code{complexity}.
This script will run both tools, order the results of @code{pmccabe} and
list the procedures that are ranked significantly differently.  By default,
``significant'' means by more than 25% of the number of procedures scored.
But it is adjustable.  @code{stderr} will also receive a list of procedures
seen by one tool but not the other.

@node    rationale summary
@section Rationale Summary

Ultimately, complexity is in the eye of the beholder and, even,
the particular mood of the beholder, too.  It is difficult to
tune a tool to properly accommodate these variables.

@code{complexity} will readily score as zero functions that are
extremely simple, and code that is long with many levels of logic
nesting will wind up scoring much higher than with @code{pmccabe}, barring
extreme changes to the default values for the tunables.

I have included several adjustments so that scores can be
tweaked to suit personal taste or gathered experience.
(@xref{complexity nesting-penalty, nesting score factor}, and
@ref{complexity demi-nesting-penalty, nested expression scoring factor},
but also @xref{complexity scale, normalization scaling factor},
to adjust scores to approximate scores rendered by @code{pmccabe}).


@node language constraints
@section Constraints on Source Code

Be forewarned: this program has its own parser.  It is expected that
users would have the code compiled first, so no attempt at all is made
to diagnose syntax errors.  But there are further constraints that
the compiler does not impose.  Patches to relax these constraints
are welcome.

@itemize
@item
Procedures must end with a closing brace in column 1.
It is also acceptable to have the closing brace on the same line as
the opening brace.

@item
The compiler will accept source with arbitrarily deep levels of logic nesting.
This program will complain in each procedure that has logic that exceeds
four levels of nesting.
@example
foo()
@{
  if (e) @{
    if (f) @{
      if (g) @{
        if (h) @{
          you_will_get_warned();
  @} @} @} @}
@}
@end example
These warnings will not directly affect the procedure score,
but the ``you_will_get_warned()'' line adds ``1.9 ** 4'' (about 13) raw
points to the procedure score.
@end itemize