From greyham Thu Oct 28 18:42:35 1993
Newsgroups: comp.lang.c++,comp.programming.literate
Subject: An Automatic C++ documentation compilation project.
Summary: Anyone willing to add C++ support to c2man?
Keywords: c2man, C, C++, Literate Programming, Documentation
Copyright 1993, 1994 by Graham Stoney.
This may be freely redistributed or quoted, so long as it's attributed to me.
Writing and maintaining documentation has often been a thorn in the side of the
Software Engineer and Programmer. After spending a great deal of time and
effort writing documentation about a program or software system, the code
invariably changes, quickly rendering the documentation out of date. The
documentation becomes misleading, gets neglected, and soon becomes useless.
"Literate Programming" is one approach to solving this problem. It effectively
introduces a whole new (typesetting) language, requires a quite radical shift
on the part of the "non-literate" programmer and still requires a good deal of
effort on the part of the programmer[1].
I'd like to suggest a different approach which lies considerably closer to
more traditional programming practices, and can offer quite immediate benefits
when functional interface documentation is the main documentation required.
The primary philosophy here is to use the programming language as far as
possible to express the programmer's intentions, and to use comments only when
the programming language is not sufficiently expressive. A comment can then
become part of the language grammar which is recognised by a "documentation
compiler". This tool parses a superset of the programming language and can
automatically generate documentation in human-readable form by associating the
programmer's comments with the objects in the code by their context.
Whilst the idea of extracting documentation from comments in source code is by
no means new, the difference here is that the comments actually form part of
the grammar of the language recognised by the documentation compiler[2].
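As a rough illustration of what "associating a comment with an object by its context" means, the sketch below splits one parameter declaration from the comment that trails it. It is only a toy written for this article in C++: c2man itself does this inside a yacc grammar[2], and the names ParamDoc, trim and splitParam are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record for one documented parameter; not a c2man structure.
struct ParamDoc {
    std::string declaration;  // e.g. "int page"
    std::string comment;      // e.g. "page it appears on"
};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split text such as "int page /* page it appears on */" into the
// declaration and the comment attached to it. Assumption: at most one
// C style comment, placed after the declaration.
ParamDoc splitParam(const std::string& text) {
    ParamDoc doc;
    std::size_t open = text.find("/*");
    if (open == std::string::npos) {
        doc.declaration = trim(text);  // no comment present
        return doc;
    }
    std::size_t close = text.find("*/", open + 2);
    // An unterminated comment runs to the end of the text.
    std::size_t len = (close == std::string::npos)
                          ? std::string::npos
                          : close - (open + 2);
    doc.declaration = trim(text.substr(0, open));
    doc.comment = trim(text.substr(open + 2, len));
    return doc;
}
```

Because the name and type come from the declaration itself, the comment only has to carry what the language cannot express.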
Comments should not repeat information that is already represented in the
program code; for instance, a comment describing a function argument should not
repeat the name and type of that argument (since that information has already
been included, for the compiler), but should appear near the argument.
For example, in C, the programmer should write this:
    /* include an example in the article */
    enum Result example(int page /* page it appears on */);
Rather than this:
    /* include an example in the article
     *
     * PARAMETERS:
     *     int page    page it appears on
     *
     * RETURNS:
     *     RESULT_YES              The readers agreed
     *     RESULT_NO               The readers disagreed
     *     RESULT_YOURE_JOKING     The readers disagreed strongly
     *     RESULT_BLANK_LOOKS      The readers didn't understand
     */
    enum Result example(int page);
Also in this example, the documentation compiler knows the possible enumerated
values that the function can return (as does the "real" compiler), so it is
unnecessary for the programmer to restate them. The comments need simply be
included in the definition for "enum Result" for the "RETURNS" information to
be generated automatically:
    enum Result {
        RESULT_YES,             /* The readers agreed */
        RESULT_NO,              /* The readers disagreed */
        RESULT_YOURE_JOKING,    /* The readers disagreed strongly */
        RESULT_BLANK_LOOKS      /* The readers didn't understand */
    };
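Once each enumerator carries its own comment, producing the "RETURNS" text is a simple walk over the enumerator list. The C++ sketch below shows the idea under stated assumptions: the Enumerator record and renderReturns function are hypothetical, and the indented layout is an illustrative guess, not c2man's actual output format.

```cpp
#include <string>
#include <vector>

// Hypothetical record: one enumerator and the comment found beside it.
struct Enumerator {
    std::string name;
    std::string comment;
};

// Build a RETURNS section listing each enumerator with its comment.
// The layout (4 and 8 space indents) is an assumption for illustration.
std::string renderReturns(const std::vector<Enumerator>& values) {
    std::string out = "RETURNS\n";
    for (const Enumerator& v : values) {
        out += "    " + v.name + "\n";
        out += "        " + v.comment + "\n";
    }
    return out;
}
```

Feeding it the four enumerators of "enum Result" would reproduce the list that otherwise has to be typed by hand in every function header comment.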
Critics have suggested that the latter style in the example is easier to read
for someone wishing to call the function in question. Of course, this is a
style question which depends on each person's tastes; but the criticism is tied
to the notion that the source code needs to look "beautiful" because it is the
primary reference for someone wishing to use that function. This becomes much
less significant once documentation is available which is known to _always_ be
up to date. The latter style also takes longer to write and maintain, and it
falls out of date whenever the name or type of the parameter changes but the
comment is not updated.
I have implemented one such documentation compiler for the C language called
"c2man", which is freely available[3]. The response from users has been
extremely encouraging; I suspect this is partly because of the wide variety of
styles of comment placement that are recognised: it often correctly recognises
comments that weren't written with c2man in mind at all. While its use is
focused solely on functional interface documentation and it doesn't have
anywhere near the power of a full Literate Programming system, the focus is on
reducing the effort required by the programmer to the absolute minimum, and
seeing how much documentation we can get essentially "for free".
Many people have requested C++ support be added to c2man, and I suspect that
this philosophy would be even more suitable and powerful for documenting
interfaces to C++ classes automatically.
Here is an example of how I envisage this philosophy would work when applied to
C++. It's interesting to note that this code was written a couple of years ago
exactly as you see it here, without the idea of generating documentation from
it in mind at all:
    // generic Timer class
    class Timer
    {
    private:
        static int numactive;           // number of constructed timers.
        static Timer *first;            // first one in list.
        Timer *next;                    // next one in linked list.
        Time ticksdiff;                 // ticks we take to expire once at front.
        enum
        {
            INACTIVE,                   // timer is not in chain.
            STARTED,                    // one-shot
            RUNNING                     // continuous.
        } state;
        // original interrupt vector value.
        static void interrupt (far *old_vector)(...);
        void (*timeout_function)(int);  // function called when we time out
        int timeout_parameter;          // gets passed to timeout_function
        Time duration;                  // timer length (ticks)
        static void interrupt far tick(...);    // clock tick routine.
        void insert();                  // add into active chain.
        void remove();                  // remove from active chain.
        void set(Time milliseconds);    // set duration from ms.
    public:
        // constructor
        Timer(Time time=0,              // milliseconds
              void (*function)(int)=0,  // called at timeout
              int param=-1);            // param for function
        // destructor
        ~Timer();
        // start (or restart) a timer running.
        void Start();
        void Start(Time duration);      // how long to run for
        // start a timer running continuous.
        void Run();
        // stop a timer.
        void Stop();
        // is a timer active?
        boolean Active() const { return state != INACTIVE; };
    };
Processing this class declaration could generate the following automatically:
    NAME
        Timer - generic timer class
    SYNOPSIS
        class Timer
        {
        public:
            Timer(Time time=0,
                  void (*function)(int)=0,
                  int param=-1);
            ~Timer();
            void Start();
            void Start(Time duration);
            void Run();
            void Stop();
            boolean Active() const;
        };
    PARAMETERS
        Time time
            Milliseconds.
        void (*function)(int)
            Called at timeout.
        int param
            Param for function.
        Time duration
            How long to run for.
    DESCRIPTION
        Timer
            Constructor.
        ~Timer
            Destructor.
        Start
            Start (or restart) a timer running.
        Run
            Start a timer running continuous.
        Stop
            Stop a timer.
        Active
            Is a timer active?
It should also be possible to extract this information from the implementation
of the class (rather than the declaration), if that's where the user prefers to
put the comments describing each member function and their parameters.
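One way to support comments in either place is to gather them from both the declaration and the implementation and then merge the two sets. The C++ sketch below assumes a simple policy (a non-empty comment on the declaration wins, otherwise the one on the implementation is used); the CommentMap type and mergeComments function are hypothetical. Keying by member name alone is a simplification that ignores overloads such as the two Start() functions.

```cpp
#include <map>
#include <string>

// Member name -> comment text, as gathered from one source file.
// Simplification: overloaded members share a single entry.
typedef std::map<std::string, std::string> CommentMap;

// Merge comments found on the class declaration with those found on the
// out-of-line definitions. Assumed policy: a non-empty declaration
// comment takes precedence; otherwise the definition's comment is kept.
CommentMap mergeComments(const CommentMap& fromDeclaration,
                         const CommentMap& fromDefinition) {
    CommentMap merged = fromDefinition;
    for (CommentMap::const_iterator it = fromDeclaration.begin();
         it != fromDeclaration.end(); ++it) {
        if (!it->second.empty()) {
            merged[it->first] = it->second;  // declaration comment wins
        } else if (merged.find(it->first) == merged.end()) {
            merged[it->first] = "";  // keep undocumented members listed
        }
    }
    return merged;
}
```

A different precedence rule would work equally well; the point is that the programmer is not forced to choose one location.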
The ideal tool should:
1. Avoid imposing a style on the programmer.
2. Work out section names (NAME, SYNOPSIS etc) without the programmer having
to specify them explicitly.
3. Handle C++ and C style code equally well.
4. Not require the programmer to restate information which is already expressed
in the syntax of the programming language.
5. Work reasonably well with existing code.
6. Flatten the class hierarchy so that the documentation for each class
includes virtually everything the user needs to know about it.
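Criterion 6 can be pictured as a walk up the inheritance graph that collects members as it goes. The following C++ sketch is an assumption-laden illustration, not part of c2man: ClassInfo and flatten are hypothetical names, members are identified by name only, and access control and overload resolution are ignored.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical summary of one parsed class.
struct ClassInfo {
    std::vector<std::string> bases;    // direct base class names
    std::vector<std::string> members;  // documented public member names
};

// Append the members of `name` and of all its base classes to `out`,
// most derived class first. A member already seen (for example one
// overridden in a derived class) is not repeated.
void flatten(const std::map<std::string, ClassInfo>& classes,
             const std::string& name,
             std::vector<std::string>& out) {
    std::map<std::string, ClassInfo>::const_iterator it = classes.find(name);
    if (it == classes.end()) return;  // unknown base: nothing to add
    const ClassInfo& info = it->second;
    for (std::size_t i = 0; i < info.members.size(); ++i) {
        if (std::find(out.begin(), out.end(), info.members[i]) == out.end())
            out.push_back(info.members[i]);
    }
    for (std::size_t i = 0; i < info.bases.size(); ++i)
        flatten(classes, info.bases[i], out);
}
```

For a hypothetical WatchdogTimer derived from Timer, the flattened list would contain WatchdogTimer's own members followed by those inherited from Timer, so the reader of the WatchdogTimer page need not consult the Timer page as well.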
A number of tools already exist which attempt to tackle this problem, such as
class2man, genman, classdoc and docclass. They vary in sophistication,
utility, and the demands they place on the programmer; however, none as yet
meet all the criteria set out above, and no one tool will suit the tastes of
all programmers.
Pouring lots of effort into a really ``smart'' documentation generator makes
sense because once it's done, you get a payback for every document you
generate. Every little feature added to the documentation generator to make
things easier for the programmer pays off multiple times, and minimising the
effort required by the programmer is the key.
The logical starting point would be to graft Jim Roskind's C++ grammar[4] into
c2man, modifying it to recognise comments in the relevant places, and adding
all the necessary structures to hold the information from the parser that will
get included in the output. Very little functional change should be needed in
the lexer, which already recognises C++ comments.
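For readers unfamiliar with what the lexer has to do, here is a minimal C++ sketch of recognising both comment styles. It is not c2man's lexer (which is generated from a lex specification); extractComments is a hypothetical function, and it deliberately ignores string and character literals, so comment markers inside a literal would be misread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the text of every comment in `src`, in order of appearance.
// Handles "// ..." up to end of line and "/* ... */" blocks.
// Simplification: string and character literals are not recognised.
std::vector<std::string> extractComments(const std::string& src) {
    std::vector<std::string> comments;
    std::size_t i = 0;
    while (i + 1 < src.size()) {
        if (src[i] == '/' && src[i + 1] == '/') {
            // C++ style: runs to the end of the line (or of the input).
            std::size_t end = src.find('\n', i);
            if (end == std::string::npos) end = src.size();
            comments.push_back(src.substr(i + 2, end - (i + 2)));
            i = end;
        } else if (src[i] == '/' && src[i + 1] == '*') {
            // C style: runs to the closing "*/".
            std::size_t end = src.find("*/", i + 2);
            if (end == std::string::npos) {
                // Unterminated comment: take the rest of the input.
                comments.push_back(src.substr(i + 2));
                break;
            }
            comments.push_back(src.substr(i + 2, end - (i + 2)));
            i = end + 2;
        } else {
            ++i;
        }
    }
    return comments;
}
```

In a real documentation compiler the lexer would hand each comment to the parser as a token, so that the grammar can decide which declaration it belongs to.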
Unfortunately, at present I do not have sufficient spare time to make the
additions to c2man required to support C++. It would be a great contribution to
the C++ community, not to mention a saving in their own documentation time,
for someone involved in C++ work to add this support and release the result[5].
If you work with a team developing C++ code, please consider sending one of your
developers on a ``Usenet Sabbatical'' to extend this philosophy to C++, and
start reaping the benefits in documentation time savings.
It could also make an ideal Computer Science student compiler project.
Please contact me via E-mail if you are interested in undertaking such a
project.
Graham Stoney
greyham@research.canon.com.au
Footnotes:
1. Advocates of Literate Programming would argue that Literate Programming is
much more than snazzy documents and that it encourages this extra effort to
focus early on in the design of the software, which pays off later.
2. To get a better idea, see the file grammar.y in the c2man distribution.
3. c2man has been posted to comp.sources.misc. It should be available from:
location: ftp from any comp.sources.misc archive, in volume42
(the version in the comp.sources.reviewed archive is obsolete)
ftp /pub/Unix/Util/c2man-2.0.*.tar.gz from dnpap.et.tudelft.nl
Australia: ftp /usenet/comp.sources.misc/volume42/c2man-2.0/*
from archie.au
N.America: ftp /usenet/comp.sources.misc/volume42/c2man-2.0/*
from ftp.wustl.edu
Europe: ftp /News/comp.sources.misc/volume42/c2man-2.0/*
from ftp.irisa.fr
Japan: ftp /pub/NetNews/comp.sources.misc/volume42/c2man-2.0/*
from ftp.iij.ad.jp
Patches: ftp pub/netnews/sources.bugs/volume93/sep/c2man* from lth.se
4. Jim Roskind's yaccable C++ grammar is available via ftp from
ics.uci.edu in the ftp/pub directory as:
c++grammar2.0.tar.Z
byacc1.8.tar.Z
5. c2man's copyright requires that all derivative works remain freely
available.