M C P P - M A N U A L . T X T
== How to Use MCPP ==
Kiyoshi Matsui
kmatsui@t3.rim.or.jp

V.2.0               1998/08  First released.  kmatsui
V.2.1               1998/09  Updated according to C99 1998/08 draft.  kmatsui
V.2.2               1998/11  Updated according to C++98 Standard.  kmatsui
V.2.3 prerelease 1  2002/08  Updated according to C99 Standard. Added porting to Linux / GNU C, CygWIN and LCC-Win32. Augmented GNU C-compatible features.  kmatsui
V.2.3 prerelease 2  2002/12  Added porting to GNU C V.3.2. Revised some wording.  kmatsui
V.2.3 release       2003/02  Finally released.  kmatsui
V.2.3 patch 1       2003/03  Slightly modified.  kmatsui
V.2.4 prerelease    2003/11  Added porting to Visual C++. Added #pragma MCPP preprocess and #pragma MCPP preprocessed.  kmatsui
V.2.4 release       2004/02  Extended multi-byte character handling. Added porting to Plan 9/pcc.  kmatsui
V.2.4.1             2004/03  Revised recursive macro expansion, and added the -c option.  kmatsui
V.2.5               2005/03  Absorbed POST_STANDARD into STANDARD as an execution-time option, and absorbed the OLD_PREPROCESSOR setting into PRE_STANDARD as an execution-time option. Renamed most of the #pragma __* directives to #pragma MCPP *. Added porting to GNU C V.3.3 and 3.4, and changed some options accordingly. Removed the documentation on older compiler systems (DJGPP, and compiler systems on MS-DOS except Borland C 4.0).  kmatsui

Contents

1. Overview
  1.1 High Portability
  1.2 Standard C Mode with Highest Conformance and Other Modes
2. Invocation Options and Environment Settings
  2.1 How to Pass Options to MCPP
  2.2 How to Specify Invocation Options
  2.3 General Options
  2.4 Options by MCPP Mode
  2.5 General Options Except for Some Compiler Systems
  2.6 Options by Compiler System
  2.7 Environment Variables
  2.8 Multi-Byte Character Encodings
  2.9 How to Use MCPP in One-Path Compilers
  2.10 How to Make MCPP Available in IDE
3. Enhancement and Compatibility
  3.1 #pragma MCPP put_defines, #pragma MCPP preprocess, #pragma MCPP preprocessed, #put_defines, #preprocess, #preprocessed
    3.1.1 Pre-preprocessing of Header File
  3.2 #pragma once
    3.2.1 Tool to Write #pragma once to Header Files
  3.3 #pragma MCPP include_next, #pragma MCPP warning, #include_next, #warning
  3.4 #pragma MCPP push_macro, #pragma MCPP pop_macro, #pragma push_macro, #pragma pop_macro, #pragma __setlocale, #pragma setlocale
  3.5 #pragma MCPP debug, #pragma MCPP end_debug, #debug, #end_debug
  3.6 #assert, #asm, #endasm
  3.7 New C99 Features (_Pragma() Operator, Variable Argument Macro and others)
  3.8 Asm Statement in Borland C and Other Special Syntaxes
  3.9 Compatibility with GNU C/Cpp
    3.9.1 Preprocessing FreeBSD 2/Kernel
    3.9.2 Preprocessing FreeBSD 2/Libc
    3.9.3 Problems Concerning GNU C 2/cpp
    3.9.4 Preprocessing Linux/glibc 2.1
    3.9.5 To Use MCPP with GNU C 2
    3.9.6 Preprocessing GNU C 3.2
    3.9.7 To Use MCPP with GNU C 3
  3.10 Visual C++ .net System Header Problems
    3.10.1 Comment Generating Macro?
4. Implementation-defined Behaviors
  4.1 Status Value on Exit
  4.2 Include Directory Search Path
  4.3 How to Construct Header Name
  4.4 Evaluation of #if Expression
  4.5 Character Constant Evaluation in #if Expression
  4.6 #if sizeof (type)
  4.7 How to Handle White-Space Sequence
  4.8 Default Specifications for MCPP Executables
5. Diagnostic Messages
  5.1 Diagnostic Messages Format
  5.2 Translation Limits
  5.3 Fatal Errors
    5.3.1 MCPP's Own Bugs
    5.3.2 Physical Errors
    5.3.3 Translation Limits and Internal Buffer Errors
    5.3.4 #pragma MCPP preprocessed Related Errors
  5.4 Errors
    5.4.1 Character and Token Related Errors
    5.4.2 Unterminated Source File Related Errors
    5.4.3 Ill-Balanced Preprocessing Group Related Errors
    5.4.4 Simple Syntax Errors on Directive Lines
    5.4.5 Syntax Errors in #if Expressions
    5.4.6 #if Expression Evaluation Errors
    5.4.7 #define Related Errors
    5.4.8 #undef Related Errors
    5.4.9 Macro Expansion Errors
    5.4.10 #error and #assert
    5.4.11 Failure of #include
    5.4.12 Other Errors
  5.5 Warnings (Class 1)
    5.5.1 Character, Token and Comment Related Warnings
    5.5.2 Unterminated Source File Related Warnings
    5.5.3 Directive Line Related Warnings
    5.5.4 #if Expression Related Warnings
    5.5.5 Macro Expansion Related Warnings
    5.5.6 Line Number Related Warnings
    5.5.7 #pragma MCPP warning, #warning
  5.6 Warnings (Class 2)
  5.7 Warnings (Class 4)
  5.8 Warnings (Class 8)
  5.9 Warnings (Class 16)
  5.10 Diagnostic Messages Index
6. Reporting on Bugs and Others
  6.1 MCPP's Bug?
  6.2 malloc() Related Bugs
  6.3 How to Report Bugs
  6.4 Give Us Your Feedback

1. Overview

MCPP is a C preprocessor written by kmatsui (Kiyoshi Matsui). It started from the DECUS cpp developed by Martin Minow and was then rewritten entirely. MCPP means Matsui cpp. This software is supplied as source code; to use MCPP with a compiler system, a small amount of compiler-system-specific modification is required before it can be compiled into an executable. [1]

This document describes the specification of an MCPP executable that has already been ported to a particular compiler system. It also briefly explains how to port MCPP to a compiler system. If you want to know more about MCPP, or want to port it to other compiler systems, refer to the MCPP source and its document "mcpp-porting.txt".
All these sources and related documents are provided as free software.

Before going into detail, some of MCPP's features are introduced here. (Sections 1.1 and 1.2 are identical to those of mcpp-porting.txt.)

Note:
[1] To use an MCPP executable, you must replace the preprocessor provided by the compiler system with it. Therefore, the MCPP executable is given the same name as that preprocessor. In many cases, the name is cpp.

1.1 High Portability

MCPP is portable. It supports various operating systems, including GNU/Linux and DOS/Windows. Its source code is also portable: it can be compiled by compilers that support Standard C or C++ (ANSI/ISO C or C++), as well as by ancient ones that support only K&R 1st. It uses only classical library functions, and C source code is appended for some of them.

Because MCPP was developed with an emphasis on portability, it will never happen that this Standard C conformant preprocessor cannot be compiled because, for example, you do not have a Standard C compiler system or a part of your compiler system is not Standard C conformant.

To port MCPP to a compiler system, in most cases all you have to do is modify some macros in the header files and compile it. In the worst case, adding several dozen lines to the source file system.c is enough.

As the MCPP executable runs in a small amount of memory, it can be executed even on a 16-bit system with a small memory space, though, of course, under considerable limitations.

MCPP's multi-byte character (Kanji) facility can handle Japanese EUC-JP and shift-JIS, Chinese GB-2312, Taiwanese Big-5 and Korean KSC-5601 (KSX 1001). MCPP ported to a 32-bit or larger system can also handle ISO-2022-JP and UTF-8. In addition, if the compiler-proper fails to recognize shift-JIS or Big-5, MCPP can compensate for it.
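Why a compiler-proper needs such compensation can be sketched in C. In shift-JIS, the second byte of a two-byte character may be 0x5C, the code of '\' (for example, the character 表 is 0x95 0x5C), so a compiler-proper that does not know shift-JIS misreads that byte as the start of an escape sequence inside a string literal. The sketch below assumes that the compensation consists of doubling such a trail byte; the function names are hypothetical and are not taken from the MCPP source, and the lead-byte ranges are the commonly used shift-JIS ranges.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if c is a shift-JIS lead byte
   (assumed ranges 0x81-0x9F and 0xE0-0xFC). */
static int sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Hypothetical: copy the body of a string literal from src to dst and
   write an extra 0x5C after every shift-JIS trail byte that equals 0x5C,
   so that a compiler-proper unaware of shift-JIS reads the pair as one
   escaped backslash.  dst needs room for 2 * strlen(src) + 1 bytes.
   Returns the number of bytes written, excluding the terminating NUL. */
static size_t sjis_escape_trail_backslash(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        unsigned char c = (unsigned char)*src++;

        dst[n++] = (char)c;
        if (sjis_is_lead_byte(c) && *src != '\0') {
            unsigned char trail = (unsigned char)*src++;

            dst[n++] = (char)trail;
            if (trail == 0x5C)
                dst[n++] = (char)0x5C;  /* the compensating backslash */
        }
    }
    dst[n] = '\0';
    return n;
}
```

For the input bytes 0x95 0x5C 'A', the output is 0x95 0x5C 0x5C 'A', which an unaware compiler-proper turns back into 0x95 0x5C 'A'. An ordinary ASCII backslash is left alone, because it is never preceded by a lead byte.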
1.2 Standard C Mode with Highest Conformance and Other Modes

By modifying some macros in the header file system.H when compiling MCPP itself, preprocessors with various behavioral specifications can be generated: Standard C, K&R 1st and others. The MCPP executable in Standard C mode provides an execution option that allows MCPP to run as a C++ preprocessor. Furthermore, it has an option for what I call "post-Standard" mode. The K&R mode executable provides an option for "Reiser cpp" mode.

Unlike many existing preprocessors, MCPP in Standard C mode has the highest conformance to the Standards, such as ISO/IEC 9899:1990 with its Corrigendum 1:1994 and Amendment 1:1995, as well as C99 (ISO/IEC 9899:1999). The purpose of MCPP is to become a reference model for Standard C preprocessors. The version of the Standard to follow can be specified by an execution option.

Even if the compiler-proper fails to conform to Standard C, MCPP tries to compensate for it whenever possible. For example, it can concatenate adjacent string literals on behalf of a compiler-proper that does not do so.

In addition, MCPP provides several useful enhancements: #pragma MCPP debug, which traces the process of macro expansion or #if expression evaluation, and the header file "pre-preprocessing" facility. MCPP also provides several useful execution options, such as the warning level and include directory options.

MCPP never goes into an endless error loop or outputs an inappropriate message, even if it encounters a serious error in the source code. It always provides an accurate and descriptive diagnostic message and handles the error condition suitably. It also issues warnings for portability problems. Detailed documentation is also provided.

A disadvantage of MCPP, if any, is its slower processing speed.
It takes two to three times as long as GNU C/cpp. However, since its processing speed is almost the same as that of Borland C 5.5/cpp, and it runs a little faster when the header file pre-preprocessing facility is used, it cannot be called particularly slow. MCPP puts emphasis on Standard conformance, source portability and operability in a small memory space, which makes this level of processing speed inevitable.

The Validation Suite for Standard C Preprocessing, which tests the extent to which a preprocessor conforms to Standard C, is also released with MCPP, together with its documentation cpp-test.txt, which contains the results of applying the Validation Suite to various preprocessors. Looking through this file, you will notice that so-called Standard C conformant preprocessors have a great many conformance-related problems.

During the development of MCPP V.2.3, MCPP and its Validation Suite were selected as one of the "Exploratory Software Projects for 2002" by the Information-Technology Promotion Agency (IPA), Japan. From July 2002 to February 2003, the project, financed by IPA, proceeded under the advice of project manager Yutaka Niibe. I asked HighWell, Inc., Tokyo, to translate all the documents. For technical details, I revised and corrected the translated documents.

MCPP was again adopted as one of the "Exploratory Software Projects" in 2003, under project manager Hiroshi Ichiji, and the update of MCPP proceeded to the next version, V.2.4. [1] MCPP and the Validation Suite have continued to be updated since the project.

ISO/IEC 9899:1990 (JIS X 3010-1993) had been the C Standard, but in 1999 ISO/IEC 9899:1999 was adopted as the new Standard. This manual calls the former C90 and the latter C99. The former is generally called ANSI C or C89 because it was derived from ANSI X3.159-1989. ISO/IEC 9899:1990 + Amendment 1:1995 is sometimes called C95.
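These names correspond to values of the predefined macro __STDC_VERSION__: 199409L for C95 and 199901L for C99, as listed under the -V option in 2.5, while C90 does not define the macro at all. A minimal sketch of how source code might tell them apart; the enum and function names are hypothetical.

```c
/* Hypothetical names: classify a __STDC_VERSION__ value by the Standard
   names used in this manual. */
enum c_standard { STD_C90, STD_C95, STD_C99_OR_LATER };

static enum c_standard classify_stdc_version(long version)
{
    if (version >= 199901L)
        return STD_C99_OR_LATER;    /* ISO/IEC 9899:1999 or later */
    if (version >= 199409L)
        return STD_C95;             /* C90 + Amendment 1:1995 */
    return STD_C90;                 /* C90 does not define the macro */
}

/* Inside #if an undefined identifier simply evaluates to 0, but in
   ordinary code an undefined __STDC_VERSION__ must be guarded. */
#ifdef __STDC_VERSION__
#define CURRENT_STDC_VERSION __STDC_VERSION__
#else
#define CURRENT_STDC_VERSION 0L
#endif
```

For example, classify_stdc_version(CURRENT_STDC_VERSION) yields STD_C95 when the translation unit is preprocessed with __STDC_VERSION__ set to 199409L.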
Note:
[1] For details on "Exploratory Software Projects", visit the following web site:
        http://www.ipa.go.jp/jinzai/esp/

    MCPP source, its documentation and the Validation Suite, including the most recent versions, are available at the following CVS repository:
        http://cvs.m17n.org/cgi-bin/viewcvs/?cvsroot=matsui-cpp
    You can download a tarball from the above site. They are also available by anonymous ftp from the following site:
        ftp://ftp.m17n.org/pub/mcpp/
    The following web page provides guidance:
        http://www.m17n.org/mcpp/

    MCPP V.2.2 and Validation Suite V.1.2 are also available at the following Vector Co. sites. They are also contained in the CD-ROM called "PACK for WIN GOLD". Although this software is registered in the directory "dos/prog/c", it is not MS-DOS specific: the MCPP source code can be compiled on various OSs, such as UNIX and WIN32/MS-DOS.
        http://download.vector.co.jp/pack/dos/prog/c/cpp22src.lzh
        http://download.vector.co.jp/pack/dos/prog/c/cpp22bin.lzh
        http://download.vector.co.jp/pack/dos/prog/c/cpp12tst.lzh
    It seems that http://download.vector.co.jp can be substituted with ftp://ftp.vector.co.jp/.

    The text files in the archives available at Vector use [CR]+[LF] as the newline and encode Kanji in shift-JIS for DOS/Windows. On the other hand, those available at m17n.org use [LF] as the newline and encode Kanji in EUC-JP for UNIX. So, when they are used in other OS environments, conversion is necessary. My conversion tool "convf" makes this easy by processing all the files in one operation. This tool automatically recognizes binary files and copies them without conversion. Time stamps and file modes are retained as they are.

    However, an MCPP package contains files used to test particular multi-byte character encodings. The encodings in these files must not be converted.
    To achieve this, first apply "convf" to all the files in the MCPP package to convert only the newlines, and then apply it again to the 'doc' directory only, to convert the Kanji encoding. Convf itself can be compiled by any compiler system MCPP has been ported to.

    When you move these files from DOS/Windows to another OS environment, be sure to move them as an archive file, and then unarchive and convert them under the new OS. If they are unarchived under MS-DOS or Windows 95, the case distinction of file names will be lost.

    Convf is found at the following location (unfortunately, its document is in Japanese only):
        http://download.vector.co.jp/pack/dos/util/text/conv/code/convf-1.8.lzh

2. Invocation Options and Environment Settings

The <arg> and [arg] shown below indicate required and optional arguments, respectively. Note that the <, >, [ and ] characters themselves must not be entered.

2.1 How to Pass Options to MCPP

Before describing how to specify invocation options, this section explains how to pass them to MCPP.

MCPP invocation options are processed by do_options() in the MCPP source "system.c". However, there is no generic method, because compiler-system-specific settings are required. For a compiler system I have already ported MCPP to, MCPP does not necessarily implement all the options that the compiler system's own preprocessor provides. I did not implement the options I thought unnecessary, because they can easily be implemented in do_options() if needed. In addition, some options with the same names as those of the compiler system's preprocessor may behave differently.

A compiler driver cannot pass some options to cpp in the normal manner. However, GCC provides the almighty -Wp option, which lets you pass any option to cpp. For example, if you specify:

    cc -Wp,-W31,-Q23

the -W31 and -Q23 options are passed to cpp. The options you want to pass to cpp must follow -Wp, with each option delimited by ",". [1], [2]

For other compiler systems, if their compiler driver source is available, it is recommended that this type of almighty option be added to it. For example, if you modify the compiler driver source so that, when -P<opt> is specified, only -<opt> is passed to cpp, any option can be passed, which is very convenient.

To use MCPP, install it under an appropriate name in the directory where the compiler system's preprocessor is located. Before copying MCPP, be sure to rename the compiler system's preprocessor so that it is not overwritten.

On FreeBSD, cpp is installed in the /usr/libexec directory, and /usr/bin/cc invokes /usr/libexec/cpp. There is also /usr/bin/cpp, a shell script that calls /usr/libexec/cpp with the -traditional option added. This script seems to be provided for tools that expect the old Reiser-model cpp, or for non-conformant sources. When you simply specify "cpp", /usr/bin/cpp is invoked. To avoid the -traditional option, specify the full path name (/usr/libexec/cpp). I recommend that you keep the original /usr/libexec/cpp (GNU C/cpp) under another name, say, cpp_gnuc, and that you change the program invoked by /usr/bin/cpp to /usr/libexec/cpp_gnuc. [3], [4]

For settings on Linux, FreeBSD or CygWIN, see also 3.9.5. For settings in GNU C 3.x, see 3.9.7.

Note:
[1] -Wa and -Wl are the almighty options for the assembler and the linker, respectively. The documentation of UNIX/System V/cc describes these options. GNU C/cc probably provides the -W options for compatibility.
[2] In GNU C V.3, cpp was absorbed into cc1 (cc1plus). Therefore, the options specified with -Wp are normally passed to cc1 (cc1plus). To have cpp (cpp0), rather than cc1, do the preprocessing, the -no-integrated-cpp option must be specified when invoking gcc.
[3] GNU C V.2.95.3 provides cpp0 as well as cpp under FreeBSD and Linux. It is cpp0 that gcc invokes. Under VineLinux 2.6, cpp is a link to cpp0.
[4] Under VineLinux 2.6, cpp is installed in the directory /usr/lib/gcc-lib/i386-redhat-linux/2.95.3.

2.2 How to Specify Invocation Options

MCPP invocation takes the form:

    mcpp [-<opts> [-<opts>]] [in_file] [out_file] [-<opts> [-<opts>]]

Note that you must replace "mcpp" above with another name, depending on the compiler system or on how MCPP is installed.

When out_file (the output path) is omitted, stdout is used unless the -o option is specified. When in_file (the input path) is omitted, stdin is used. Diagnostic messages are output to stderr unless the -Q option is specified. If any of these files cannot be opened, preprocessing is terminated with an error message.

MCPP uses getopt() to get options. Except for the -M option, a missing required option argument causes an error. For an option with an argument, white-space characters may or may not be inserted between the option character and the argument. In other words, both "-I<directory>" and "-I <directory>" are acceptable. For options without arguments, both "-Qi" and "-Q -i" are valid.

If the -D, -U, -I or -W option is specified several times, each of them is valid. For the -S, -V or -+ option, only the first one is valid. For the -2 or -3 option, the setting is toggled each time the option is specified. For other options, the last one is valid.

The option letters are case sensitive. The switch character is '-', not '/', even under DOS/Windows.

When invalid options are specified, a usage message is displayed. To check the valid options, enter a command such as "mcpp -?". In addition to the usage message, several error messages may be displayed, but they are self-explanatory, so I omit their explanations.

2.3 General Options

This section covers options common to all MCPP modes and compiler systems.

-C
    Output comments in the source code as well. I hear this option is required when the UNIX lint utility is used. It is also useful for debugging, even when lint is not used. Note that a comment is moved ahead of its logical source line when output.
    This is because comments are processed before macro expansion and directive processing, and a comment may appear in the middle of a macro invocation.

-D <macro>[=[<value>]]
-D <macro(args)>[=[<replacement>]]
    Define a macro named "macro". This option can be used to change the definitions of predefined macros other than __STDC__, __STDC_VERSION__, __FILE__, __LINE__, __DATE__, __TIME__ and __cplusplus. (__STDC_HOSTED__, C99's predefined macro, can exceptionally be redefined by this option, because some compiler systems, such as GNU C V.3, use the -D option to define __STDC_HOSTED__.) To specify a value, use "=". If "=" is omitted, 1 is assumed. (Note that in bcc or bcc32, the macro is defined as zero tokens by default.) Do not enter white-space characters immediately before "=". If "=" is immediately followed by a white-space character, the macro is defined as zero tokens. A macro with arguments can also be defined by this option. This option can be specified repeatedly.

-I <directory>
    Specify <directory> as the first directory in the include directory search order. For the search path, refer to 4.2. If a directory name contains spaces, it must be enclosed in double quotes (").

-I1, -I2, -I3
    Specify the directory from which MCPP begins searching when it encounters a #include "header" directive (i.e. not one in the #include <header> form). -I1, -I2 and -I3 indicate the current directory, the directory of the source file (i.e. the includer), and both, respectively. For details, see 4.2.

-j
    When outputting a diagnostic message, MCPP displays only the one-line diagnostic, without additional information such as source lines. (By default, the one-line diagnostic message is followed by the source line that has the problem. If that line is in a #included file, all the #including file names and including line numbers are also displayed in sequence. For a diagnostic on a macro, MCPP also displays its definition information.) When the Validation Suite is used in the GNU C testsuite, this option must be specified so that diagnostic messages are output in the same format as those of GNU C/cpp.

-o <file>
    Output the preprocessed source to <file>. If this option is omitted, the second argument ([out_file]) is regarded as the output path, so this option is not strictly necessary; however, some compiler drivers use it. I wonder if the purpose is to prevent erroneous expansion of a wildcard.

-P
    Do not output line number information to the compiler-proper. Specify this option when you want to use MCPP for purposes other than C preprocessing.

-Q
    Output diagnostic messages to the file "mcpp.err" in the current directory. As messages are appended to this file, it may grow large. Delete it from time to time.

-U <macro>
    Undefine the predefined macro named "macro". This option cannot undefine __FILE__, __LINE__, __DATE__, __TIME__, __STDC__, __STDC_VERSION__ (and __STDC_HOSTED__ in C99 only), or __cplusplus when the -+ option is specified.

-v
    Output the MCPP version and the search order of include directories to stderr.

-W <level>
    Specify the warning level with <level>: either 0, or any one or more of the values 1, 2, 4, 8 and 16 ORed together. Each of 1, 2, 4, 8 and 16 indicates a warning class. For example, if -W 5 is specified, warnings of classes 1 and 4 are output. If 0 is specified, no warnings are output.
    If this option is specified several times, all the specified values are ORed together. For example, -W 1 -W 4 is equivalent to -W 5. If this option is omitted, -W 1 is assumed. For warning messages, refer to 5.5 through 5.9.

-z
    The preprocessing result of #included files is not output, but the macros in them are defined.

2.4 Options by MCPP Mode

When compiling MCPP itself, you can generate various types of preprocessors that behave according to various sets of specifications by setting various macros in system.H. (Refer to 4.1.3 of mcpp-porting.txt.)

All the uppercase names below (including Chapters 3 through 5) that do not begin with "__", such as MODE, STDC and TFLAG_INIT, are macros defined in system.H. These macros are used only for compiling MCPP itself, and the generated MCPP executable does not contain them. Please keep this point clearly in mind.

Of these macros, MODE is the most important, in that it determines the basic behavior of MCPP. MODE takes a value of STANDARD or PRE_STANDARD. In this manual, MCPP compiled with MODE == STANDARD is called mcpp_std, and MCPP compiled with MODE == PRE_STANDARD is called mcpp_prestd.

Mcpp_std conforms to the Standards by default, and it has an execution-time option to select the so-called "post-Standard" specification, which is called 'poststd' mode in this manual. Mcpp_prestd behaves according to the K&R 1st specification by default, and it has an execution-time option to select the so-called "Reiser cpp" model, which is called 'oldprep' mode in this manual. ("oldprep" means "old preprocessor".) Moreover, mcpp_std has a special execution-time mode called 'compat' mode.

This manual lists various MCPP behaviors by mode, which may not be easy to read. Please be patient.

The following options are available for mcpp_std:

-+
    Behave as a C++ preprocessor.
    MCPP predefines the __cplusplus macro (its value is defined in system.H and defaults to 1), interprets the text from // to the end of a logical line as a comment, and recognizes "::", ".*" and "->*" as single tokens. It evaluates the tokens "true" and "false" in a #if expression to 1 and 0, respectively. If __STDC__ and __STDC_VERSION__ are defined, they are deleted; however, in MCPP ported to GNU C, __STDC__ is not deleted, for compatibility with GNU C/cpp. The predefined macros that do not begin with "_" are also deleted. However, extended characters are not converted to UCN. [1], [2]

-2
    Reverse the initial setting of digraph processing. With DIGRAPHS_INIT == FALSE, this option makes MCPP recognize digraphs; otherwise, it makes MCPP stop recognizing them.

-3
    Reverse the initial setting of trigraph processing. With TFLAG_INIT == FALSE, this option makes MCPP recognize trigraphs; otherwise, it makes MCPP stop recognizing them.

-h <n>
    Define the value of the __STDC_HOSTED__ macro as <n>.

-S <n>
    Change the value of __STDC__ to <n> in C. In C++, this option is ignored. <n> must be in the range 0-9. With <n> set to 1 or higher, the predefined macros that do not begin with "_", such as unix and MSDOS, are disabled. The "S" stands for __STDC__. If this option is omitted, __STDC__ is set to its default value (1). In the GNU C version, -pedantic, -pedantic-errors or -lang-c89 is equivalent to -S1, so a subsequent -S option is ignored.

-@post, -@poststd
    Select MCPP's so-called "post-Standard" mode of preprocessing, which is a simplified version of the Standards. The differences from the Standards are as follows:
    1. Trigraphs are not recognized. Digraphs are converted at translation phase 1, that is, at the beginning of preprocessing, so they are not handled as tokens.
    2. Tokenization is simplified according to a completely token-based rule. Where the source code has no white space as a separator between preprocessing tokens, a space is inserted automatically. (However, no space is inserted between the macro name and the following "(" in a macro definition.)
       Therefore, even when stringizing with the # operator, the result is stringized after a space has been inserted between every pair of preprocessing tokens. Also, when a macro is redefined, the presence or absence of token separators does not matter.
    3. When a function-like macro is redefined, differences in parameter names do not matter.
    4. Character constants cannot be used in #if expressions (they cause an error).
    5. The irregular "function-unlike" rules of function-like macro expansion are removed. Hence, rescanning targets only the replacement list of the macro, not the token sequence that follows it.
    6. A header name in the #include <stdio.h> format is accepted, but it gets a warning (with the class 2 warning option). If a header name in the <stdio.h> format is used in a macro, it may cause an error in that particular instance. The #include "stdio.h" format is recommended.
    7. The rule added in C99 that a space is required between the macro name and the replacement list in a macro definition is not followed. (A space is inserted automatically at tokenization.)
    8. UCNs (universal-character-names) are not recognized. Multi-byte characters in identifiers are not recognized.
    9. In C++, the eleven identifier-like operators are not treated as operators.
    The -a (-lang-asm, -x assembler-with-cpp) option cannot be used together with this option.

-@compat
    Expand recursive macros further than the Standard specifies. When a recursive macro is expanded, the range in which a macro of the same name is not re-replaced is set narrower than in the Standard. Refer to section 2.4.26 of cpp-test.txt for the specifications of recursive macro expansion, and see test-t/recurs.t for a sample of a recursive macro. [3]

The following option is available for mcpp_prestd:

-@old, -@oldprep
    1. A comment is converted to zero spaces instead of one space. Usually this conversion is done at final output; in a macro definition, however, it is done as soon as the definition is processed.
    2. When the replacement list of a macro definition contains string literals or character constants, and a parameter name matches any part of them, that part is substituted with the corresponding argument when the macro is called. That is to say, the content of the string literal or character constant, with the enclosing quotes stripped, is searched as a token sequence, and any parameter name found there is substituted.
    3. You can write anything you like on #else and #endif lines. (One usually writes the MACRO of the corresponding #if MACRO or #ifdef MACRO.)
    4. "Unterminated string literal" and "unterminated character constant" errors are not issued. If a literal is not closed by " or ', it is assumed to be closed at the end of the line.
    5. A '# 123' line is treated as '#line 123'.

Note:
[1] __STDC__ in C++ is undesirable and causes many problems. The GNU C document says that __STDC__ needs to be predefined in C++ because many header files expect __STDC__ to be defined. The header files are to blame for this. For parts common to C90, C99 and C++, "#if __STDC__ || __cplusplus" should be used.
[2] Unlike C99, the C++ Standard makes much of UCN. So did the C 1997/11 draft. A half-hearted implementation is not permitted, but implementing Unicode in earnest is too much of a burden for a preprocessor.
[3] This option is for compatibility with GNU C, Visual C++ and other major implementations. 'compat' means "compatible mode". In this manual, it is called 'compat' mode.

2.5 General Options Except for Some Compiler Systems

-a
    Accept the following notations used in some assembler sources without causing an error.
    1. #APP
       Even if a line that begins with # does not match any C directive, MCPP outputs the line without causing an error.
    2. "A very very
       long long
       string literal"
       This old-fashioned string literal spanning several lines is concatenated into "A very very\nlong long\nstring literal".
    3. Even if token concatenation with the ## operator generates an invalid C pp-token, it is not regarded as an error.
    These notations sometimes appear in GNU source code; for GNU C, however, the corresponding option is -x assembler-with-cpp or -lang-asm. This option cannot be used together with the -@post option. (See 2.4.)

-I-
    Cancel the default include directories, and enable only the ones specified with environment variables and the -I option. Instead of -I-, MCPP ported to GNU C uses -nostdinc, because in GNU C the -I- option provides quite different functionality. See 2.6.

-N
    Disable all the predefined macros, including those that begin with "_", except for the ones required by the C Standards and __MCPP. The predefined macros stipulated by the C Standards are __FILE__, __LINE__, __DATE__, __TIME__, __STDC__ and __STDC_VERSION__, as well as __STDC_HOSTED__ for C99 and __cplusplus for C++. __MCPP is excluded to prevent the -undef option from undefining it, since GNU C/gcc formerly used the -undef option by default. If you want to disable __MCPP, use the -U option. For MCPP ported to Plan 9/pcc, use -n instead of -N.

The -V option is available for mcpp_std.

-V <value>
    Change the value of the predefined macro __STDC_VERSION__ for C, or __cplusplus for C++, to <value>. <value> is of type long. (In ISO/IEC 9899:1990/Amendment 1:1995, C99 and the C++ Standard, this value is 199409L, 199901L and 199711L, respectively.) With __STDC__ set to 0, __STDC_VERSION__ is always set to 0L, overriding the -V option. If this option is omitted for C, __STDC_VERSION__ is set to the value of STDC_VERSION in system.H (199409L for GNU C V.2.7 - V.2.9x, and 0L for others).
    If specifying -V199901L results in __STDC_VERSION__ >= 199901L, MCPP conforms to the following C99 specifications (see 3.7):
    1. The text from // to the end of a line is treated as a comment. [1]
    2. The sequences p+, P+, p- and P-, as well as e+, E+, e- and E-, are allowed in a preprocessing-number.
       This is to represent the bit pattern of a floating-point number in hexadecimal, like 0x1.FFFFFEp+128.
    3. The _Pragma operator is enabled. (_Pragma( "foo bar") has the same effect as #pragma foo bar.)
    4. MCPP compiled with the EXPAND_PRAGMA macro set to TRUE macro-expands the arguments of a #pragma line that does not begin with STDC or MCPP. (By default, EXPAND_PRAGMA is set to FALSE in MCPP ported to compiler systems other than Visual C, so macro expansion does not occur.)
    5. For compiler systems with long long, #if expressions are evaluated in long long or unsigned long long.
    6. Escape sequences of Universal-Character-Name (UCN) are allowed in identifiers, character constants, string literals and pp-numbers.
    Note that although variable argument macros are a C99 feature, MCPP allows them in the C90 and C++ modes as well. [2]
    In C++ also, when specifying -V199901L results in __cplusplus >= 199901L, MCPP enters C99 compatibility mode and provides enhancements 2-5 above. (1 is enabled unconditionally, and 6 is almost the same.) These are MCPP's own enhancements, which do not conform to the C++ Standard.
    The -D option cannot be used with __STDC__, __STDC_VERSION__ and __cplusplus. This is to distinguish system-defined macros from user-defined ones. For MCPP ported to Plan 9/pcc, use -s instead of -V.

The -e option is available for MCPP ported to 32-bit or larger systems.

-e <encoding>
    Change the multi-byte character encoding to <encoding>. For <encoding>, refer to 2.8.

The following options are available for MCPP compiled with OK_MAKE == TRUE.

The -M* options output source file dependency lines for a makefile. When there are several source files, and the -M* option is specified for each of them and the outputs are merged into one file, the dependency description lines are aligned. These options are similar to those of GNU C/cpp, but there are several differences. [3]

-M
    Output lines that describe the dependencies among source files.
    The output destination is the file specified on the command line,
    or stdout if it is omitted. A dependency description too long to
    fit in a line is folded over the next lines. The preprocessing
    result is not output.

-MM
    Almost the same as -M, except that the following header files are
    not output:
    1. Files included in the angle-bracket form of #include.
    2. Files specified with an absolute path name, such as
       #include "/include/stdio.h".
    3. Files specified in the form #include "stdio.h" that are found
       not in the current or source directory (which one is searched
       depends on the compiler system and the -I option) but in a
       system include directory, including the directories specified
       with the -I option or with environment variables.

-MD [FILE]
    Almost the same as -M, except that the preprocessing result is
    also output, to the file specified on the command line or to
    stdout. If FILE is specified, MCPP outputs the dependency
    description lines to that file. Otherwise, they are output to a
    file with the same base name as the source file and the suffix
    ".d" instead of ".c".

-MMD [FILE]
    Almost the same as -MD, except that, as with -MM, the files
    regarded as system headers are not output. The file to which the
    dependency lines are output is determined as for -MD [FILE].

-MF FILE
    Output the dependency lines to FILE. -MF FILE takes precedence over
    -MD FILE and -MMD FILE.

-MP
    Also output "phony targets": each included file is written as a
    phony target without dependencies, as follows:

        test.o: test.c test.h
        test.h:

-MT TARGET
    Use TARGET as the target name instead of foo.o. For example,
    -MT '$(objpfx)foo.o' outputs the following line:

        $(objpfx)foo.o: foo.c

-MQ TARGET
    Same as -MT, except that characters that have a special meaning to
    'make' are quoted, as follows:

        $$(objpfx)foo.o: foo.c

Note:
[1] In C90, mcpp_std treats // as a comment but issues a warning.
[2] This is for compatibility with GNU C/cpp.
[3] MCPP differs from GNU C/cpp in that:
    1. MCPP does not provide the -MG option, because its specification
       is too complicated (so I omit its explanation). The -M option
       can substitute for -MG: with -M, when an include file cannot be
       found, MCPP reports an error but still outputs the dependency
       description lines.
    2. MCPP excludes a wider range of header files with the -MM and
       -MMD options. GNU C 2/cpp does not exclude the header files
       shown in items 2 and 3 of the -MM option. GNU C 3/cpp0 now
       excludes the header files of item 3 that are found in the
       system header directories.

    4.4BSD-Lite has a /usr/bin/mkdep command. The equivalent command in
    FreeBSD and Linux is a shell script that generates several *.d
    files and merges them into a file named ".depend". This shell
    script invokes cpp -M.

2.6 Options by Compiler System

The following options are available for MCPP ported to GNU C and some
other compiler systems.

-b
    Output line number information in the C source form. A
    preprocessor usually passes line number information to the
    compiler-proper in the following format:

        #line 123 "filename"

    Most compiler systems accept this C source format, but some do not.
    By default, MCPP ported to a compiler system that cannot accept it
    outputs the line number information in a format that the
    compiler-proper can accept. For GNU C, MCPP outputs the information
    in the GNU C specific format: although the GNU C compiler-proper
    accepts the C source format, some cpp versions cannot accept it
    when it is given to them again, and some tools, like rpcgen, accept
    only the GNU C specific format.
    With this option, however, MCPP ported to these compiler systems
    outputs the line number information in the C source format. This
    option is used with #pragma MCPP put_defines (#put_defines) to
    pre-preprocess a header file.
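The two line-number formats discussed under -b can be illustrated with a small C sketch. The function line_marker() and its gnu_style flag are hypothetical names used only for this example; this is not MCPP's own output routine, and any flag digits that GNU cpp may append after the file name are not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>   /* strcmp, for checking results */

/*
 * Hypothetical helper: format a line-number marker for the
 * compiler-proper.
 *   gnu_style == 0:  C source form    #line 123 "filename"
 *   gnu_style != 0:  GNU C form       # 123 "filename"
 * Returns a pointer to a static buffer (not reentrant).
 */
const char *line_marker(long line, const char *file, int gnu_style)
{
    static char buf[256];

    assert(line > 0 && file != NULL);
    if (gnu_style)
        snprintf(buf, sizeof buf, "# %ld \"%s\"", line, file);
    else
        snprintf(buf, sizeof buf, "#line %ld \"%s\"", line, file);
    return buf;
}
```

In these terms, -b corresponds to choosing the C source form (gnu_style == 0) on a compiler system such as GNU C whose default is the GNU C specific form.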
The following option is available for compiler systems under MS-DOS.

-m
    Specify a memory model with one of the letters t, s, c, m, l and h,
    given with the option. The letters are not case sensitive. With
    this option in Microsoft C, the macro M_I86mM, with the "m"
    replaced by the upper-case model letter (M_I86SM for s, for
    example), is predefined to 1. In other compiler systems, the
    appropriate one of __TINY__, __SMALL__, __COMPACT__, __MEDIUM__,
    __LARGE__ and __HUGE__ is predefined to 1. MCPP compiled with
    OK_SIZE == TRUE evaluates #if sizeof (type) based on this memory
    model. By default, M_I86SM or __SMALL__ is set to 1.

The following options are available for the LCC-Win32 ported MCPP.

-g
    Define the __LCCDEBUGLEVEL macro as the value given with the
    option.

-O
    Define the __LCCOPTIMLEVEL macro as 1.

The following options are available for the Visual C ported MCPP.

-Fl
    Same as -include for GNU C.
-Tc
    Specify that the source is written in C. The result is the same
    with or without this option.
-Tp
    Same as -+.
-u
    Same as -N.
-Wall
    Same as -W17 (-W1 -W16).
-WL
    Same as -j.
-w
    Same as -W0.
-X
    Same as -I-.

The following options are available for the Plan 9/pcc ported MCPP.

-i
    Output the search order of include directories to stderr.
-n
    Same as -N for MCPP ported to other compiler systems.
-s
    Same as -V for MCPP ported to other compiler systems.

The following options, up to the end of 2.6, are available for the GNU
C ported MCPP. Note that since __STDC__ is set to 1 for GNU C, the
result is the same with or without the -S1 option.

The following are available in all modes.

-dD, -dM
    Output the valid macro definitions as #define lines at the end of
    preprocessing. With -dD, the preprocessing result is also output,
    and predefined macros are not output. With -dM, the preprocessing
    result is not output, and predefined macros other than the
    Standard predefined ones are output. [1] [2]

-I-
    Split the -I options into those before and after this option. The
    directories specified with -I before -I- are used to search only
    for headers in the form #include "header.h"; the directories
    specified with -I after -I-, if any, are used for all #include
    directives. In addition, the includer's directory is not searched
    in the former case.

-include
    #include the file given with the option before processing the main
    source file. This is equivalent to writing that #include line at
    the beginning of the main source file.

-isystem
    Add the directory given with the option to the include path,
    immediately before the system-specific directories and immediately
    after the site-specific directories.

-lang-c, -x c
    Perform C preprocessing. This is the same as not specifying the
    option at all.

-nostdinc
    Same as -I- for other compiler systems.

-undef
    Same as -N.

-Wcomment, -Wcomments
    Same as -W1. The result is the same with or without this option.

-Wtrigraphs
    Same as -W16.

-Wall
    Same as -W17. (With -Wall, MCPP does not issue class 2 and class 4
    warnings, because they are issued frequently and are annoying for
    GNU C standard header files. Class 8 warnings are generally
    superfluous and bothersome, but they help to check portability and
    the like. To enable all the warning classes, specify
    gcc -Wp,-W31.)

-w
    Same as -W0.

-finput-charset=
    Same as -e, with the encoding name following the '='.

The following options are available for mcpp_std of the GNU C ported
MCPP.

-digraphs
    Recognize digraphs. The digraph setting is also toggled by -2.

-lang-c89, -std=c89, -std=gnu89
    Same as -S1. Not only C90 but also C95 specifications are used.
    The result is the same with or without this option.

-lang-c99, -lang-c9x, -std=c99, -std=c9x, -std=gnu99, -std=gnu9x
    Same as -V199901L.

-lang-c++, -x c++
    Perform C++ preprocessing. Same as -+.

-lang-asm, -x assembler-with-cpp
    Same as -a for other compiler systems. This option cannot be used
    with the -@post option.

-pedantic, -pedantic-errors
    Same as -W7 (that is, -W1 -W2 -W4).

-std=iso
    Specify a version of the C or C++ Standard. The option is written
    as -std=iso followed by the number of the Standard, a colon and a
    version. The number of the Standard is 9899 for C and 14882 for
    C++.
    For 9899, the version is one of 1990, 199409, 1999 and 199901; for
    14882, it is 199711. If any other version is given,
    __STDC_VERSION__ or __cplusplus is set to that value, which in this
    case must be written in six digits, like 200503.

-trigraphs
    Recognize trigraphs. The trigraph setting is also toggled by -3.

-ansi
    Define the macro __STRICT_ANSI__ as 1.

The following option is available for mcpp_prestd of the GNU C ported
MCPP.

-traditional
    Same as -@old.

MCPP neither treats the following options as errors nor does anything
about them (it sometimes issues a warning).

-A
    MCPP ignores this option. In GNU C, this option is equivalent to
    writing #assert in the source code. Standard C does not accept
    extended directives other than #pragma. Fortunately, gcc by default
    passes equivalent macros with the -D option, so there is no actual
    problem unless a source program uses #assert, which is rare.

-$
-g
-idirafter
-iprefix, -iwithprefix, -iwithprefixbefore
-noprecomp
-remap

In GNU C V.3.3 or later, the preprocessor has been absorbed into the
compiler, and no independent preprocessor exists. Moreover, gcc often
passes options that are not meant for the preprocessor to the
preprocessor, even when it is invoked with the -no-integrated-cpp
option. MCPP ported to GNU C V.3.3 or later ignores the following
options, when it cannot recognize them, as such pseudo-options:

    -E -c -quiet -W* -m* -f*

In addition to the above, MCPP provides further
compiler-system-specific options. Refer to the usage message.

Note:
[1] GNU C V.3.3 or later predefines several dozen macros. The -dD
    option does not regard these macros as predefined, and so outputs
    them.
[2] The output of the -dM option is similar to that of
    '#pragma MCPP put_defines' ('#put_defines'), with the following
    differences:
    1. 'put_defines' also outputs the Standard predefined macros, as
       comments.
    2. 'put_defines' also outputs the file name and line number of
       each macro definition as a comment, and arranges the output in a
       readable format.
    On the other hand, the -d* options output in the same simple format
    as GNU C, because some makefiles expect that format.

2.7 Environment Variables

The system include directories that MCPP does not set up by default
must be specified with environment variables. For the default system
include directories, refer to 4.8. For the search order and the names
of the environment variables, refer to 4.2.

The auxiliary system include directories differ among Linux/GNU C
versions. Those that are not specified in noconfig.H or config.h must
be specified with environment variables.

Earlier versions of MCPP required the environment variable
GCC_VERSION; MCPP V.2.5 and later do not.

For the environment variables LC_ALL, LC_CTYPE and LANG, refer to 2.8.

2.8 Multi-Byte Character Encodings

MCPP can process various multi-byte character encodings. The available
encodings differ between MCPP ported to 16-bit systems (hereinafter
called 16-bit MCPP) and MCPP ported to 32-bit or larger systems
(hereinafter called 32-bit or more MCPP). 16-bit MCPP supports fewer
encodings because less memory is available.

The encodings common to 16-bit and 32-bit or more MCPP are:

    EUC-JP:         Japanese extended UNIX code (UJIS)
    shift-JIS:      Japanese MS-Kanji
    GB-2312:        EUC-like Chinese encoding (Simplified Chinese)
    Big-Five:       Taiwanese encoding (Traditional Chinese)
    KSC-5601:       EUC-like Korean encoding (KSX 1001)

16-bit MCPP can implement any one of the above encodings.

The following encodings are also available for 32-bit or more MCPP:

    ISO-2022-JP1:   International standard Japanese
    UTF-8:          A kind of Unicode encoding

32-bit or more MCPP implements all these encodings at the same time.

The encoding used during execution is determined as follows, with
priority given in this order:

1. The encoding named in a '#pragma __setlocale' directive in the
   source code ('#pragma setlocale' for the Visual C ported MCPP).
   This directive allows you to use several encodings in one source
   file.
2. The encoding specified with the -e or -finput-charset= run-time
   option.
3. The encoding specified with the environment variables LC_ALL,
   LC_CTYPE and LANG, with priority given in this order.
4. The default encoding specified when MCPP was compiled.

An encoding name is specified in basically the same way for
#pragma __setlocale, the -e option and the environment variables. In
the table below, the encoding on the left-hand side is specified by any
of the names on the right-hand side. Names are not case sensitive, and
'-' and '_' are ignored. Moreover, if a name contains '.', the
characters up to and including the '.' are ignored. Therefore, EUC_JP,
EUC-JP, EUCJP, euc-jp, eucjp and ja_JP.eucJP are all regarded as the
same. '*' represents any sequence of zero or more characters
(iso8859-1 and iso8859-2 both match iso8859*).

    EUC-JP:         eucjp, euc, ujis
    shift-JIS:      sjis, shiftjis, mskanji
    GB-2312:        gb2312, cngb, euccn
    BIG-FIVE:       bigfive, big5, cnbig5, euctw
    KSC-5601:       ksc5601, ksx1001, wansung, euckr
    ISO-2022-JP1:   iso2022jp, iso2022jp1, jis
    UTF-8:          utf8, utf
    Not specified:  c, en*, latin*, iso8859*

If C, en* (English), latin* or iso8859* is specified, MCPP does not
recognize multi-byte characters. One of these must be specified when a
non-ASCII ISO-8859 Latin single-byte character set is used.

When an empty name is used (#pragma __setlocale( "")), the encoding is
restored to the default.

Only in the Visual C++ ported MCPP, the following encoding names can
also be specified with '#pragma setlocale', for compatibility with
Visual C++. You are recommended to use these names, because the Visual
C++ compiler cannot recognize other encoding names. ('-' can be omitted
for MCPP, but not for the Visual C++ compiler-proper.)
    shift-JIS:      japanese, jpn
    GB-2312:        chinese-simplified, chs
    BIG-FIVE:       chinese-traditional, cht
    KSC-5601:       korean, kor
    Not specified:  C, english

In Visual C++, the default multi-byte character encoding depends on the
language parameter and the "Region and Language Option" setting of
Windows. However, a #pragma setlocale specification takes precedence
over these Windows settings.

Only in the GNU C ported MCPP, the following encoding names can also be
specified with the environment variable LANG, for compatibility with
GNU C. You are recommended to use these names, because the GNU C
compiler cannot recognize other encoding names. ('-' can be omitted and
lowercase letters can be used for MCPP, but for the GNU C
compiler-proper the names must be written exactly as shown below.)

    EUC-JP:         C-EUC
    shift-JIS:      C-SJIS
    ISO-2022-JP1:   C-JIS
    Not specified:  C

Whether the GNU C compiler recognizes the C-* specification in LANG
depends on the configuration used to compile it. [1] When the compiler
does not recognize it, MCPP compensates for it.

Note:
[1] According to gcc's info documentation, if the --enable-c-mbchar
    option is given to the configure script when GNU C is compiled, an
    encoding can be specified with an environment variable such as
    LANG. This way of compiling seems to have been available since July
    1998, but the implementation does not yet work properly, at least
    in V.3.2. The documentation says that, besides LANG, environment
    variables such as LC_ALL and LC_CTYPE can be used to specify an
    encoding; in practice, however, setting LC_ALL or LC_CTYPE makes a
    difference only in the diagnostic messages. The same is true of
    V.2.95, which is shipped with Vine Linux 2.6 and FreeBSD 4.7.

2.9 How to Use MCPP in One-Path Compilers

Compilers whose preprocessor is integrated into themselves are called
one-path compilers. These include Visual C, Borland C and LCC-Win32.
Such compilers are becoming more common because they can achieve higher
processing speed. However, as hardware performance improves, the time
spent on preprocessing matters less and less. To begin with, there is
much value in preprocessing being a common phase, largely independent
of the run-time environment and the compiler system. I do not consider
it desirable for one-path compilers to become more common, since that
brings more compiler-system-specific specifications.

In any case, the preprocessor of a one-path compiler cannot be replaced
with MCPP. To use MCPP, the source program is preprocessed with MCPP
and the output is then passed to the one-path compiler, so
preprocessing takes place twice. This is wasteful but inevitable. Using
MCPP still has merits: it checks the source, and it provides functions
not available in the built-in preprocessor.

To use MCPP with a one-path compiler, the procedure must be written in a
makefile. For sample procedures, refer to the recompilation settings in
the makefiles used to compile MCPP itself, such as visualc.mak,
borlandc.mak and lcc_w32.mak.

Although the GNU C 3 compiler now integrates its preprocessing facility
into itself, gcc provides an option to use an external preprocessor.
Use this option to use MCPP. (See 3.9.7.)

2.10 How to Make MCPP Available in IDE

It is difficult to use MCPP in an Integrated Development Environment
(IDE), because an IDE's GUI follows compiler-system-specific
specifications and its internal interfaces are usually not made
available to third parties. Furthermore, with one-path compilers it is
even more difficult to insert an MCPP phase into the IDE.

This subsection describes how to make MCPP available in the Visual C++
.net IDE. I have only version 4 of the Borland C++ IDE, which is too
old for this. I think the same could be done with the LCC-Win32 IDE,
since LCC-Win32 is shareware, although it might take time; I have not
tried it. For the Borland C and LCC-Win32 ported MCPP, use the command
line.
2.10.1 How to Make MCPP Available in Visual C++ .net's IDE

MCPP cannot be used in a normal "project", since the internal
specifications of the Visual C++ IDE are not made available to third
parties and the compiler is a one-path compiler. However, once you have
created a makefile that uses MCPP, the Visual C++ IDE can recognize it,
and you can create a "makefile project" with it. This allows you to use
most of the IDE functions, including source editing, search and
debugging.

"Creating a Makefile Project" in the Visual C++ .net 2003 documentation
describes how to make a makefile project. Perform the following
procedure to create one. [1]

1. Log in as a user with debugging privileges. [2]
2. Create a makefile that uses MCPP. (Refer to visualc.mak.)
3. Start Visual Studio .net. [3]
4. Click "New Project" to display the "New Project" window. Select
   "Makefile Project", specify "Name" and "Location", and then click
   "OK".
5. The "Makefile Application Wizard" window appears. Click
   "Application settings", and enter appropriate values in the "Build
   command line", "Output", "Clean commands" and "Rebuild command
   line" fields. Taking the compilation of MCPP itself as an example,
   and assuming that an MCPP executable named mcpp32_std.exe is
   generated, appropriate values are:

       "Build command line":   nmake
       "Output":               mcpp32_std.exe
       "Clean commands":       nmake clean
       "Rebuild command line": nmake PREPROCESSED=1 CPP=mcpp32_std

   Since a makefile project does not provide a command equivalent to
   'make install', the makefile must be written so that the commands
   specified in "Build command line" and "Rebuild command line" also
   perform the installation. If you are not compiling MCPP itself,
   "Build command line" and "Rebuild command line" can be the same.
   When you have finished, click "Finish".
6. The makefile project then appears in "Solution Explorer".
   Click the "Source Files" folder, choose "Add Existing Solution Item"
   from the "Project" menu, select all the source files, and then click
   "OK". The source file names then appear in Solution Explorer.

You can now use all the functions, including Edit, Build, Rebuild and
Debug.

Note:
[1] This procedure worked properly under Windows 2000 but, for unknown
    reasons, not under Windows XP HE. When I copied the project file
    generated under Windows 2000 to Windows XP, it worked properly
    under Windows XP too.
[2] To use the debugging function under Windows 2000, a user must
    belong to a group called "Debugger users". Windows XP HE does not
    provide such a group, so you must log in as administrator. In
    addition, for source-level debugging, the makefile must call cl.exe
    with the /Zi option to generate debugging information.
[3] If you start Visual Studio .net from "Start" -> "Programs",
    environment variables, such as those for include directories, are
    not set when a built executable is run for debugging. To have these
    variables set, open the 'Visual Studio .NET command prompt' and
    start Visual Studio .net by typing:

        devenv /useenv

3. Enhancements and Compatibility

MCPP has its own enhancements. Each compiler-system-specific
preprocessor also has its own enhancements, some of which are not
available in MCPP. This section covers these enhancements and their
compatibility problems.

In principle, mcpp_std compiled with HAVE_PRAGMA == TRUE outputs
#pragma lines as they are. This principle also applies to the #pragma
lines that mcpp_std processes itself, because the compiler-proper may
interpret the same #pragma for itself. However, mcpp_std outputs
neither lines beginning with '#pragma MCPP' nor '#pragma GCC' lines
followed by 'poison', 'dependency' or 'system_header', since those
lines are meant for the preprocessor only.
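The pass-through rule just described can be summarized in a small C sketch. pragma_passed_through() and next_word() are hypothetical helpers, not MCPP functions; the sketch assumes mcpp_std built with HAVE_PRAGMA == TRUE, looks only at the first one or two words after #pragma, and also folds in the '#pragma once', '#pragma push_macro' and '#pragma pop_macro' exceptions described in the next paragraph.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Read the next identifier-like word of a #pragma line into word
   (truncated to n-1 characters), skipping leading white space and
   advancing *p past the word. */
static void next_word(const char **p, char *word, size_t n)
{
    size_t len = 0;

    while (isspace((unsigned char)**p))
        (*p)++;
    while (isalnum((unsigned char)**p) || **p == '_') {
        if (len + 1 < n)
            word[len++] = **p;
        (*p)++;
    }
    word[len] = '\0';
}

/*
 * Hypothetical sketch: `rest` is the text following "#pragma".
 * Returns 1 if the line would be output as it is for the
 * compiler-proper, 0 if it is consumed by the preprocessor.
 */
int pragma_passed_through(const char *rest)
{
    char first[32], second[32];

    assert(rest != NULL);
    next_word(&rest, first, sizeof first);
    if (strcmp(first, "MCPP") == 0)          /* MCPP's own sub-directives */
        return 0;
    if (strcmp(first, "once") == 0 || strcmp(first, "push_macro") == 0
            || strcmp(first, "pop_macro") == 0)
        return 0;                            /* useless in later phases */
    if (strcmp(first, "GCC") == 0) {
        next_word(&rest, second, sizeof second);
        if (strcmp(second, "poison") == 0
                || strcmp(second, "dependency") == 0
                || strcmp(second, "system_header") == 0)
            return 0;                        /* preprocessor-only GCC pragmas */
    }
    return 1;                                /* everything else is passed on */
}
```

Under these assumptions, '#pragma __setlocale' is passed on, which matches the remark later in this section that it also has meaning for the compiler-proper.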
MCPP does not output '#pragma once' either, to avoid duplicated
processing when the header "pre-preprocessing" facility is used, in
which MCPP reprocesses its own output. For pre-preprocessing, refer to
3.1. Nor does mcpp_std output '#pragma push_macro' or
'#pragma pop_macro', because they are useless in the later phases.

Mcpp_std compiled with HAVE_PRAGMA == FALSE does not output #pragma
lines, and issues a warning for those it does not process itself.

Mcpp_std compiled with EXPAND_PRAGMA == TRUE expands macros in #pragma
lines (in practice, EXPAND_PRAGMA is TRUE only for the Visual C ported
MCPP). However, #pragma lines followed by STDC, MCPP or GCC are never
macro-expanded.

#pragma sub-directives are implementation-defined, so the same
sub-directive name may have different meanings in different compiler
systems. Some device is needed to avoid such name collisions. Moreover,
when EXPAND_PRAGMA == TRUE, a device is needed to prevent the name of a
#pragma sub-directive itself from being macro-expanded. This is why
MCPP-specific sub-directives begin with '#pragma MCPP' and are not
subject to macro expansion. This device is adopted from '#pragma STDC'
of C99 and '#pragma GCC' of GNU C 3.

'#pragma once', however, is implemented under that name, since this
pragma has been implemented in many preprocessors and now carries no
risk of name collision. '#pragma __setlocale' is prefixed with "__"
instead of MCPP, because it also has meaning for the compiler-proper,
and because the prefix keeps it out of the user name space.

3.1 #pragma MCPP put_defines, #pragma MCPP preprocess, #pragma MCPP preprocessed, #put_defines, #preprocess, #preprocessed

Mcpp_std uses '#pragma MCPP put_defines', '#pragma MCPP preprocess' and
'#pragma MCPP preprocessed'; mcpp_prestd uses #put_defines, #preprocess
and #preprocessed. The #pragma forms are used in the following
explanation.
When MCPP encounters a '#pragma MCPP put_defines' directive, it outputs
all the macros defined at that point in the form of #define lines.
Macros that have been #undef-ed are, of course, not output. Macros that
cannot be #defined or #undef-ed, such as __STDC__, are output as
#define lines enclosed in comment marks. (Since __FILE__ and __LINE__
are special macros defined dynamically at each invocation, the
replacement lists output for them mean nothing.)

Mcpp_prestd and the 'poststd' mode of mcpp_std do not memorize the
parameter names of function-like macro definitions. So these
directives mechanically name the first, second, third parameters a, b,
c, ... and so on. From the 27th parameter on, the names continue as a1,
b1, c1, ..., then a2, b2, c2, ..., and so on.

If you invoke MCPP without specifying input and output files and enter
the following directive from the keyboard, all the predefined macros
are listed:

    #pragma MCPP put_defines

If you invoke MCPP with options such as -S1 or -N, you will see a
different set of predefined macros.

When MCPP compiled with DEBUG == TRUE encounters this directive, it
also outputs a comment indicating the source file name and line number
where each macro definition was found.

When MCPP encounters a '#pragma MCPP preprocess' directive, it outputs
the following line, which indicates that the source file has been
preprocessed:

    #pragma MCPP preprocessed

When MCPP encounters a '#pragma MCPP preprocessed' directive, it
concludes that the source file has been preprocessed by MCPP and
outputs the code it reads as it is, until it encounters a #define line.
From that #define line on, MCPP assumes that the rest of the source
file consists entirely of #define lines, and defines the macros. [1] At
this time, MCPP compiled with DEBUG == TRUE memorizes the source file
name and line number given in the comments. [2]

A '#pragma MCPP preprocessed' applies only to the lines that follow it
in the source file in which it appears. If that source file is an
#included one, '#pragma MCPP preprocessed' no longer applies once
control returns to the #including file.

Note:
[1] The actual processing is a little more complex. After a
    '#pragma MCPP preprocessed', MCPP outputs the lines it reads just
    as they are, except for #line lines, which it converts into a
    format that the compiler-proper can accept. MCPP disregards the
    definitions of the Standard predefined macros, because their
    #define lines are enclosed in comment marks.
[2] Therefore, the information on where each macro definition was found
    is not lost by pre-preprocessing. MCPP compiled with DEBUG == FALSE,
    however, cannot use this information.

3.1.1 Pre-preprocessing of Header File

With the above directives, you can "pre-preprocess" header files.
Pre-preprocessing considerably reduces the total preprocessing time.

The explanation so far has probably given you an idea of how to
pre-preprocess header files; to deepen your understanding, let me
explain it with the example of MCPP's own source code.

MCPP source consists of seven *.c files, six of which include
"system.H" and "internal.H". No other headers are included. The source
looks like this:

    #if PREPROCESSED
    #include "mcpp.H"
    #else
    #include "system.H"
    #include "internal.H"
    #endif

system.H includes noconfig.H or configed.H, as well as several standard
header files. mcpp.H is not among the source files I provide; it is the
"pre-preprocessed" header file to be generated.

To generate mcpp.H (after setting up system.H and the other headers, of
course), invoke MCPP as follows:

    mcpp_std > mcpp.H

For compiler systems such as GNU C, also specify the -b option.
Enter the following directives from the keyboard:

    #pragma MCPP preprocess
    #include "system.H"
    #include "internal.H"
    #pragma MCPP put_defines

Enter end-of-file to terminate MCPP.

This produces mcpp.H, which consists of the preprocessed system.H and
internal.H followed by a set of #define lines. Including mcpp.H has the
same effect as including system.H and internal.H, but mcpp.H is only a
fraction of the size of those headers together with the standard
headers they include, because #if directives and comments have been
eliminated. It takes far less time to include mcpp.H in seven or eight
*.c files than to include system.H and internal.H seven or eight times.
Using #pragma MCPP preprocess saves even more time.

On compilation, use the -DPREPROCESSED=1 option.

It is recommended that the above procedure be written in a file that
the makefile refers to. The makefile and preproc.c included in the MCPP
sources contain the procedure; please refer to them.

Although it is difficult to use an independent preprocessor with
one-path compilers like Visual C, Borland C or LCC-Win32, the
pre-preprocessing facility is useful with them too.

The pre-preprocessing facility for header files is similar to the -dD
option of GNU C/cpp, but differs from it in the following points:

1. GNU C/cpp outputs line number information not in the form
   #line 123 "filename" but in the form # 123 "filename", which GNU
   C/cpp can reprocess but a Standard C preprocessor cannot.
2. Older versions of GNU C/cpp output a #define line whenever they
   encounter one, but do not output #undef lines. Therefore,
   reprocessing the preprocessed result may produce a different result
   from what the original source intends.
3. By using #pragma MCPP preprocess, which GNU C does not provide, MCPP
   can achieve higher processing speed.

As far as the pre-preprocessing facility is concerned, MCPP is more
accurate and practical than GNU C/cpp.
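The mechanical parameter naming that put_defines uses in mcpp_prestd and in the 'poststd' mode of mcpp_std (see 3.1) follows a simple pattern, sketched below in C. param_name() is a hypothetical function written for this example, assuming that the 26 letters a to z are used before the numeric suffixes begin.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>   /* strcmp, for checking results */

/*
 * Sketch of the mechanical parameter names: parameters 1-26 are
 * a..z, parameters 27-52 are a1..z1, then a2..z2, and so on.
 * `i` is the 0-based parameter index.
 * Returns a pointer to a static buffer (not reentrant).
 */
const char *param_name(int i)
{
    static char buf[16];

    assert(i >= 0);
    if (i < 26)
        snprintf(buf, sizeof buf, "%c", 'a' + i);
    else
        snprintf(buf, sizeof buf, "%c%d", 'a' + i % 26, i / 26);
    return buf;
}
```

Because the original names are not memorized, a function-like macro's parameters come out under these names whatever they were called in the source.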
3.2 #pragma once

#pragma once is implemented in mcpp_std, that is, when __STDC__ is
predefined to 0 or higher. [1] #pragma once is also available in GNU C,
Visual C, LCC-Win32 and the stand-alone preprocessor called Wave.

This directive is used when you want a header file to be included only
once. With the following directive in a header file, MCPP includes the
header file only once, even if a #include line for that file appears
many times:

    #pragma once

Usually, compiler-system-specific standard header files prevent
duplicate definitions with code such as the following:

    #ifndef __STDIO_H
    #define __STDIO_H
    /* Contents of stdio.h */
    #endif

#pragma once provides similar functionality. With macros, however, the
header file must always be read again. (A preprocessor cannot skim the
code as a person does; it must read through the entire header file to
find the #if's and #endif's. It must process comments before it can
determine whether a line is a control line, that is, a line beginning
with # followed by a preprocessing directive, and to do so it must
identify string literals. After all, it must read through the entire
header file and perform most of the tokenization.) #pragma once
eliminates even the need to access the header file again, improving the
processing speed of multiple inclusion.

To determine whether two header files are identical, MCPP compares
their file names character by character, including the directory part
taken from the search path. Thus "/DIR1/header.h" and "/DIR2/header.h"
are regarded as distinct. Since DOS/Windows is not case sensitive,
"header.h" and "HEADER.H" are regarded as the same on DOS/Windows, but
as distinct on UNIX-like systems.

I borrowed the idea of #pragma once from GNU C V.1.*/cpp. GNU C V.2.*
and V.3.*/cpp still have this functionality, but it is regarded as
obsolete.
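The name comparison that #pragma once relies on, as described above, can be sketched in C. same_header() is a hypothetical function, not MCPP's code; it assumes both names already include their directory part, and it does no further path normalization (for example, of "./" components or of backslash versus slash).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Sketch of the #pragma once identity test: two header path names
 * are compared character by character.  fold_case != 0 models
 * DOS/Windows, where file names are not case sensitive;
 * fold_case == 0 models UNIX-like systems.
 * Returns 1 if the names are regarded as the same header, else 0.
 */
int same_header(const char *a, const char *b, int fold_case)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int ca = (unsigned char)*a, cb = (unsigned char)*b;

        if (fold_case) {
            ca = tolower(ca);
            cb = tolower(cb);
        }
        if (ca != cb)
            return 0;
        if (ca == '\0')      /* both names ended at the same point */
            return 1;
    }
}
```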
The specification of GNU C V.2.*/cpp has been changed as follows: if an
entire header file is enclosed in #ifndef _MACRO, #define _MACRO and
#endif, cpp remembers this and includes the file only once, even
without #pragma once.

However, this GNU C V.2 and V.3/cpp specification sometimes does not
work with commercially available compiler systems that are not based on
GNU C, because their standard header files are written in a different
style. In addition, it is more complex to implement. For these reasons,
I decided to implement only #pragma once.

It is not advisable to rely on #pragma once alone when the same header
files are also used with other preprocessors. It is recommended that
#pragma once be combined with macros as follows:

    #ifndef __STDIO_H
    #define __STDIO_H
    #if __MCPP >= 2
    #ifdef __STDC__
    #pragma once
    #else
    #ifdef __cplusplus
    #pragma once
    #endif
    #endif
    #endif
    /* Contents of stdio.h */
    #endif

Because a pre-Standard preprocessor may also be used, the above code
uses only #if, #else and #ifdef, and no 'defined' operator. If no
pre-Standard preprocessor is used at all, the following single line is
enough in place of the #if __MCPP >= 2 block:

    #pragma once

Note that #pragma once must not be written in . For the reason, see
4.1.2 of cpp-test.txt. The same applies to and of C++.

Another problem is that recent GNU C/glibc systems have header files,
like , which are #included repeatedly by other system headers. Those
headers define macros such as __need_NULL, __need_size_t and
__need_ptrdiff_t and then #include ; each time, definitions such as
NULL, size_t and ptrdiff_t are made in the . The same applies to and ,
and even to . Other system headers define macros such as __need___FILE
and then #include ; each time, definitions such as FILE may be made in
. #pragma once cannot be used in such header files. [2]

Note:
[1] MCPP also had #pragma __once until V.2.4.1.
It was removed because it seemed unnecessary.
[2] This applies at least to Linux/GNU C 2.9x and 3.2/glibc 2.1 and 2.2. FreeBSD 4.* has much simpler system headers because it does not use glibc.

3.2.1 Tool to Write #pragma once to Header Files

Writing #pragma once into a small number of header files takes little effort, but doing it by hand for many header files is tremendous work. So I wrote a simple tool that writes it into header files automatically.

tool/ins_once.c is written for relatively old versions of GNU C. Since the standard header files of Borland C 4.0 follow the same notation as those of GNU C, the tool can be used there as well. However, it should not be used on systems like glibc 2, which have the many exceptions shown above. Even on compiler systems where the tool can be used, some header files do not strictly follow the GNU C notation, and GNU C V.2.*/cpp's read-once functionality does not work properly for those header files either.

Compile ins_once.c and, under UNIX, run the following command in a directory such as /usr/include or /usr/local/include:

    chmod u+w *.h */*.h */*/*.h

and then execute ins_once as follows:

    ins_once -t *.h */*.h */*/*.h

ins_once reports the header files that do not begin with #ifndef or #if !defined. Modify these files manually. Then execute ins_once as follows:

    ins_once *.h */*.h */*/*.h

If the first directive in a header file is #ifndef or #if !defined, a #pragma once line is inserted immediately below that line. Only root or a user with the appropriate permission can make this modification. If you changed the access permissions, restore them with:

    chmod u-w *.h */*.h */*/*.h

ins_once provides the following options. Select the one most appropriate for your system.

-t: Check whether each header file begins with #ifndef or #if !defined, ignoring comments. This option does not modify any file.
-p: Insert the #pragma once line at the beginning of the file.
    By default, the line is inserted immediately below the #ifndef or #if !defined line.
-o: Insert the nine lines shown above (from #if __MCPP >= 2 through its matching #endif) for the sake of preprocessors that cannot accept #pragma. By default, only a single #pragma once line is inserted.
-g: For a new GNU C system, , , , are also excluded. By default, only , and are excluded.

ins_once makes a rough check so that the #pragma once line is written only once into the same header file even if the tool is run several times, but the check is not very strict. Because ins_once is a temporary, makeshift tool, it performs hardly any tokenization. It worked as I expected with FreeBSD 2.0, 2.2.2,2.and 2.7, and Borland C 4.0, but it may not work properly for unusual header files. So be sure to back up the original files before running the tool. Let the shell expand the wildcards. (If a buffer overflows, run ins_once several times, each time on a subset of your system header files.)

3.3 #pragma MCPP include_next, #pragma MCPP warning, #include_next, #warning

These directives are provided for compatibility with GNU C/cpp, which provides the #include_next and #warning directives. Although these directives are non-conformant, some source programs use them, and so do some glibc 2 system header files. Taking this situation into consideration, I implemented #include_next and #warning in the GNU C version of MCPP so that such source programs can be compiled; however, mcpp_std issues a warning when it finds these directives. Regardless of the compiler system MCPP is ported to, mcpp_std also implements #pragma MCPP include_next and #pragma MCPP warning.

With either of the following directives, MCPP skips the directory where the first header.h was found and includes the next header.h found in the remaining include directories:

    #pragma MCPP include_next <header.h>
    #include_next <header.h>

On DOS/Windows, the case of letters in header names is ignored.
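The search rule for #include_next can be modeled with a short C sketch. It is only an illustration under stated assumptions: the directory list, the exists callback standing in for a real file-system lookup, and the names search_from(), search_next() and demo_exists() are hypothetical and not part of MCPP.

```c
#include <string.h>

/* Hypothetical model of include-directory search.  exists(dir, name)
   stands in for a real file-system lookup. */
typedef int (*exists_fn)(const char *dir, const char *name);

/* Ordinary #include <name>: index of the first directory, in priority
   order starting at 'start', that contains name; -1 if none does. */
static int search_from(const char *const dirs[], int ndirs, int start,
                       const char *name, exists_fn exists)
{
    for (int i = start; i < ndirs; i++)
        if (exists(dirs[i], name))
            return i;
    return -1;
}

/* #include_next <name> written in a header found in dirs[current]:
   the search resumes after that directory, so the next header of the
   same name is found instead of the same file again. */
static int search_next(const char *const dirs[], int ndirs, int current,
                       const char *name, exists_fn exists)
{
    return search_from(dirs, ndirs, current + 1, name, exists);
}

/* Fake file system for the demonstration: header.h exists in
   /usr/local/include and in /usr/include, but not in /opt/include. */
static int demo_exists(const char *dir, const char *name)
{
    return strcmp(name, "header.h") == 0
        && (strcmp(dir, "/usr/local/include") == 0
            || strcmp(dir, "/usr/include") == 0);
}

static const char *const demo_dirs[] = {
    "/usr/local/include", "/opt/include", "/usr/include"
};
#define N_DEMO_DIRS ((int)(sizeof demo_dirs / sizeof demo_dirs[0]))
```

In this model, a header.h found in /usr/local/include that says #include_next <header.h> gets the copy in /usr/include, whereas a plain #include would find the file in /usr/local/include again.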
Either of the following directives outputs 'any message' to stderr as a warning message:

    #pragma MCPP warning any message
    #warning any message

Unlike #error, these directives do not cause an error.

3.4 #pragma MCPP push_macro, #pragma MCPP pop_macro, #pragma push_macro, #pragma pop_macro, #pragma __setlocale, #pragma setlocale

I implemented these directives when I ported MCPP to Visual C, and then made them available in MCPP for other compiler systems as well.

#pragma MCPP push_macro( "MACRO") and #pragma MCPP pop_macro( "MACRO") push the definition of the macro MACRO onto, or pop it from, a macro definition stack. #pragma push_macro( "MACRO") and #pragma pop_macro( "MACRO") are also available for Visual C. push_macro saves the current definition of the macro on the stack, and pop_macro restores the saved definition. The pushed definition remains in effect after push_macro; to invalidate it, use #undef or redefine the macro. push_macro can be used many times.

#pragma __setlocale( "") changes the current multi-byte character encoding to . The argument of __setlocale must be a string literal. For , refer to 2.8. This directive allows you to use several encodings in one translation unit. It is available in MCPP ported to 32-bit or larger systems.

In Visual C++, #pragma __setlocale cannot be used; use #pragma setlocale instead. The encoding must be conveyed not only to MCPP but also to the compiler-proper, and the latter recognizes only #pragma setlocale. For other compiler systems, where the compiler-proper cannot recognize an encoding, MCPP compensates for it. No compiler-proper recognizes #pragma __setlocale yet.

3.5 #pragma MCPP debug, #pragma MCPP end_debug, #debug, #end_debug

Mcpp_std implements #pragma MCPP debug and #pragma MCPP end_debug; mcpp_prestd implements #debug and #end_debug. If MCPP is compiled with DEBUG == TRUE and/or DEBUG_EVAL == TRUE, the corresponding debug information is output.
The #pragma MCPP debug directive can be written anywhere in a source program. specifies a type of debug information. One #pragma MCPP debug directive can take several , and at least one must be specified for each directive. MCPP begins to output debug information when it finds this directive, and stops when it encounters #pragma MCPP end_debug . The can be omitted, in which case all types of debug information are reset. If contains an argument that MCPP does not support, MCPP issues a warning, but all the preceding arguments remain valid.

All debug information is output to the same destination as the preprocessed output, so that the two stay synchronized. Therefore, the output cannot be compiled while this directive is in effect. The directive is normally used without specifying an output destination, in which case the debug information appears on the screen and you can trace it by eye. When you notice something wrong in the preprocessing result, enclose the code you want to debug with these directives, for example:

    #pragma MCPP debug token expand
    /* Code you want to debug */
    #pragma MCPP end_debug

Since this directive was originally used for debugging MCPP itself, it was not designed with end users in mind. You may not understand its behavior unless you read the source code, and it may sometimes seem to output too much information, but it is useful for tracing the preprocessing process. Be patient.

#pragma MCPP debug and #debug are not implemented unless DEBUG, DEBUG_EVAL or both are set to TRUE in system.H. With DEBUG == TRUE, tokenization and macro expansion can be traced; with DEBUG_EVAL == TRUE, the evaluation of #if lines can be traced.

The following debug information types can be specified with :

For DEBUG == TRUE
    path        Displays the include file search path.
    token       Parses tokens one by one and displays their types.
    expand      Traces the macro expansion process.
    if          Displays the result (true or false) of #if, #elif, #ifdef and #ifndef.
    getc        Traces preprocessing byte by byte.
    memory      Displays the status of the heap memory used by MCPP.
For DEBUG_EVAL == TRUE
    expression  Traces #if expression evaluation.

3.5.1 #pragma MCPP debug path, #debug path

With these directives, MCPP displays the include directories in the search path (excluding the current and source directories, with which the search begins) in order of priority, highest first. In addition, for each #include directive, MCPP displays all the directories, including the current one, that it actually searched for the #include file. When a header file containing #pragma once is #included again, a message to that effect is displayed.

3.5.2 #pragma MCPP debug token, #debug token

With these directives, MCPP displays each source line it has read, and then displays each token on that line, with its type, as it reads it. More precisely, these tokens are preprocessing-tokens (pp-tokens). Not only the pp-tokens on a source line but also those that MCPP re-reads internally during macro expansion are displayed, so some appear repeatedly. However, for the internal convenience of the MCPP program, the following 1-byte tokens are not displayed:

    1. '#' at the beginning of a preprocessing directive line.
    2. '(' at the beginning of the parameter list of a function-like macro definition.
    3. ',' separating the parameters of a function-like macro definition.
    4. '(' at the beginning of the argument list of a function-like macro invocation.

A pp-token has one of the following types:

    NAM     Identifier
    NUM     Preprocessing-number
    STR     String literal
    WSTR    Wide string literal
    CHR     Character constant
    WCHR    Wide character constant
    OPE     Operator or punctuator
    SPE     Special pp-tokens, such as $ and @
    SEP     Token separator (white space)

Of SEP, other than are not normally displayed. Control codes such as are displayed as <^J> or <^M>.

3.5.3 #pragma MCPP debug expand, #debug expand

With these directives, mcpp_std traces the expansion process of macro invocations.
While #pragma MCPP debug expand is in effect, mcpp_std behaves as follows. When there is a macro invocation, mcpp_std displays the macro definition. Then each argument is read and substituted for the corresponding parameter in the replacement list, and the replacement list is rescanned; mcpp_std displays this whole process. If the replacement list contains further macro invocations, they are rescanned and expanded one by one. If an argument contains a macro, mcpp_std traces the above process recursively before parameter substitution.

Each time control enters or returns from one of a certain set of internal functions, mcpp_std displays trace information along with the function name. The following table shows the role of these functions. Reading the mcpp_std source code will give you a concrete idea of what each function does.

    expand          Entry routine for macro expansion.
    replace         Expands a macro one level down.
    collect_args    Collects arguments.
    prescan         Scans a replacement list and processes the # and ## operators.
    substitute      Substitutes an argument for a parameter.
    rescan          Rescans a replacement list.

Except for expand, these functions call one another indirectly recursively. For replace and collect_args, mcpp_std also displays the data it stacks internally during macro expansion. This data is displayed using the following internal codes:

    Nth parameter
    Token delimiter inserted by mcpp_std
    Code that inhibits re-replacement of a macro of the same name
    Code that indicates the end of a replacement list
    Code that indicates an identifier taken from the source file while rescanning

The last of these codes is used only in mcpp_std when it is in neither 'poststd' mode nor 'compat' mode. It is recommended that '#pragma MCPP debug token' be used as well.

For #debug expand, mcpp_prestd uses internal routines considerably different from those of mcpp_std; their explanation is omitted.
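The steps listed above (argument pre-expansion, prescan handling of #, substitution and rescanning) can be followed on a small example. The macro and function names below are made up for illustration, and the comments describe how any Standard C preprocessor expands them, not the exact trace MCPP prints. Enclosing the functions with #pragma MCPP debug expand and #pragma MCPP end_debug should let you compare the trace with these comments.

```c
#define N          3
#define ADD(a, b)  ((a) + (b))
#define TWICE(x)   ADD(x, x)   /* replacement list invokes another macro */
#define STR(x)     #x          /* operand of #: argument is not expanded */
#define XSTR(x)    STR(x)      /* argument is expanded first, then passed on */

/* TWICE(N): N is not an operand of # or ##, so it is macro-expanded to 3
   before substitution, giving ADD(3, 3); rescanning then expands ADD to
   ((3) + (3)). */
int twice_n(void) { return TWICE(N); }

/* STR(N): N is an operand of #, so it is stringized as written: "N". */
const char *str_n(void) { return STR(N); }

/* XSTR(N): N is expanded to 3 while being substituted into XSTR's
   replacement list; rescanning then invokes STR(3), giving "3". */
const char *xstr_n(void) { return XSTR(N); }
```

The pair STR/XSTR is the usual idiom for stringizing the value of a macro rather than its name; it works precisely because prescan leaves operands of # unexpanded.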
3.5.4 #pragma MCPP debug if, #debug if

With these directives, MCPP displays #if, #elif, #ifdef and #ifndef lines and reports their evaluation results (true or false). No report is made for such lines inside a group that is being skipped.

3.5.5 #pragma MCPP debug expression, #debug expression

With these directives, MCPP traces the evaluation of #if and #elif expressions. DECUS cpp, from which MCPP was developed, provided these directives for debugging cpp itself, and I have scarcely modified them. They output a very long list of internal function names, as well as variable names and their values. You may not understand these variables unless you read the MCPP source code. Even without the source code, however, you can follow how MCPP pushes values onto and pops them off an evaluation stack while evaluating a complex expression.

3.5.6 #pragma MCPP debug getc, #debug getc

With these directives, MCPP outputs detailed data each time it calls get(), the function that reads one byte. When mcpp_std scans a pp-token, it calls get() only to read the first byte of the pp-token. With #debug getc, on the other hand, MCPP calls get() throughout token scanning, which results in a tremendous amount of output. In any case, these directives output a huge amount of data, and you will rarely need them.

3.5.7 #pragma MCPP debug memory, #debug memory

With these directives, MCPP reports once the status of the heap memory it has internally allocated or released with malloc(), realloc() and free(). Only the malloc() I developed and some other malloc() implementations provide this functionality; refer to "4.extra" of mcpp-porting.txt. With other malloc() implementations, MCPP neither causes an error nor reports the status. If these directives are in effect when MCPP terminates, MCPP reports the heap memory status again. The same thing happens when MCPP terminates because it has run out of memory.

3.6 #assert, #asm, #endasm

#assert is implemented in mcpp_prestd, if the target compiler is not GNU C.
#assert provides functionality equivalent to the #error directive of Standard C. The following Standard C code:

    #if ULONG_MAX/2 < LONG_MAX
    #error Bad unsigned long handling.
    #endif

can be expressed as:

    #assert LONG_MAX <= ULONG_MAX/2

The argument of #assert is evaluated as a #if expression. If it evaluates to true (non-zero), MCPP does nothing; if it evaluates to false (0), MCPP displays the following message followed by the argument line (after line splicing and comment processing):

    Preprocessing assertion failed

MCPP counts this as an error but continues processing. This #assert is quite different from that of System V or GNU C/cpp.

MCPP regards a block enclosed by the #asm and #endasm directives as assembler code. Mcpp_prestd implements this functionality for Microware C/09 only. To implement it for other compiler systems, do_old() and put_asm() in system.c must be modified. For a #asm block, MCPP performs trigraph conversion and deletes sequences, but it neither processes comments, checks tokens or characters, nor deletes white-space characters at the beginning of a line. Nor does it expand a token that happens to have the same name as a macro; such a token is output as it is. Other directive lines have no meaning within a #asm block.

The #asm and #endasm directives do not conform to Standard C. To begin with, extension directives in any form other than "#pragma sub-directive" are not Standard C conformant. Renaming them to #pragma asm and #pragma endasm does not solve the problem either: in Standard C, source code must consist of a C token sequence (more precisely, a preprocessing token sequence), but an assembler program is not a C token sequence. To use assembly code in Standard C, there is no other way but to embed it in a string literal token.
Then you have to implement, in the compiler-proper, a built-in function that processes that string literal, and call it as follows:

    asm(
        "   leax    _iob+13,y\n"
        "   pshs    x\n"
    );

However, this is not suitable for longer assembly code. In that case, it is better to write the assembly code in a separate file, like a library function, and to assemble and link it with the program. This may seem inconvenient, but the assembler portion must be completely separated to write a portable C program. I recommend writing assembly code in a separate file rather than using #asm.

3.7 New C99 Features (_Pragma() Operator, Variable Argument Macros and Others)

These features are implemented in mcpp_std. The -V199901L option, which sets __STDC_VERSION__ to 199901L, enables the following C99 features. The same is true for C++ with the -V199901L option, which sets __cplusplus to 199901L or more. Although the C++ Standard does not provide for any of these features other than 1 and 7, mcpp_std provides them for better compatibility with C99. Mcpp_std also allows variable argument macros in the C90 and C++ modes. [1]

    1. Treats the text from // to the end of a line as a comment.
    2. Enables variable argument macros.
    3. Allows the sequences p+, P+, p- and P-, as well as e+, E+, e- and E-, in a preprocessing-number. This is for writing the bit pattern of a floating-point number in hexadecimal, like 0x1.FFFFFEp+128.
    4. Enables the _Pragma() operator.
    5. Mcpp_std compiled with the EXPAND_PRAGMA macro set to TRUE macro-expands the arguments of a #pragma line that begins with none of STDC, MCPP and GCC. (By default, mcpp_std is compiled with EXPAND_PRAGMA == FALSE, so these arguments are not macro-expanded.)
    6. For compiler systems with long long, a #if expression is evaluated in long long or unsigned long long.
    7. Allows an escape sequence called UCN (universal character name) for Unicode, in the forms \unnnn and \Unnnnnnnn, in identifiers, character constants, string literals and pp-numbers.
       A UCN in a #if expression evaluates to the value of its hexadecimal digits. (UCN cannot be used in 'poststd' mode.)

A variable argument macro takes the following form:

    #define debug(...)  fprintf(stderr, __VA_ARGS__)

Here is a macro invocation:

    debug( "X = %d\n", x);

This invocation is expanded as follows:

    fprintf(stderr, "X = %d\n", x);

"..." in the parameter list corresponds to one or more arguments, and __VA_ARGS__ in the replacement list is replaced by those arguments. In a macro invocation, all the arguments that correspond to "...", together with the commas separating them, are treated as one argument.

_Pragma( "foo bar") has the same effect as #pragma foo bar. The argument of the _Pragma() operator must be a single string literal or wide string literal. The L prefix of a wide string literal is deleted, the enclosing double quotes are deleted, and each \" and \\ in the literal is replaced with " and \, respectively; the result is then treated as the argument of #pragma.

A #pragma must occupy a logical line of its own, and at least in C90 its argument is not macro-expanded. The _Pragma() operator, on the other hand, can be written anywhere in source code (even in a replacement list), with the same effect as a #pragma written on a logical line of its own. A _Pragma() operator generated during macro expansion is also valid. This flexibility widens the portability of pragmas and allows a header file to absorb the differences in #pragma among compiler systems. (For an example, see pragmas.h and pragmas.t of the "Validation Suite".) [2]

C99 specifies that #if expressions are evaluated in the greatest-width integer types (intmax_t and uintmax_t). Since "long long" and "unsigned long long" are required types, a #if expression has type "long long" or "unsigned long long", or a larger type.

Note:
[1] This is for compatibility with GNU C. It would be difficult for other compiler systems, too, to implement all the C99 specifications at once.
They will probably implement them little by little, with __STDC_VERSION__ set to 199409L or so.
[2] C99 specifies that a #pragma argument beginning with STDC is not macro-expanded. Whether other #pragma arguments are macro-expanded is implementation-defined. MCPP compiled with the EXPAND_PRAGMA macro in system.H set to TRUE macro-expands such #pragma arguments.

3.8 Asm Statement in Borland C and Other Special Syntaxes

Borland C has the asm keyword, which is used to write assembly code as follows:

    asm {
        mov x,4;
        ...;
    }

This is quite irregular and deviates from C grammar even more than #asm does. If a token in it happens to have the same name as a macro, it will be macro-expanded. This is true of Borland C's own preprocessor as well as of MCPP. It is recommended that assembler code be written in a separate .ASM file. MCPP compiled with the TOP_SPACE macro set to FALSE deletes white-space characters at the beginning of a line, which breaks the column alignment of the line. Visual C++ also has the __asm keyword, which provides similar functionality. GNU C provides a Standard-conformant built-in function form, asm( " mov x,4\n").

3.9 Compatibility with GNU C/Cpp

I tried to develop MCPP so that the GNU C version of mcpp_std is compatible with GNU C/cpp to the extent that practical use is not hindered, but it is still incompatible in many respects. First of all, as shown in Chapter 2, there are many differences in invocation options. MCPP implements neither the -A option nor non-conformant directives such as #assert and #ident. [1] Fortunately, there seem to be very few sources that cannot be compiled because of this lack of compatibility.

A bigger problem is that some sources assume special specifications of old preprocessors. Most such code receives a warning when -pedantic is specified in GNU C.
By default, mcpp_std provides almost the same functionality as GNU C/cpp's -pedantic or -pedantic-errors, since it implements Standard-conformant error checking. GNU C/cpp, however, allows such Standard violations by default without issuing a diagnostic, and some sources take advantage of this. It is very easy to rewrite such non-conformant code into Standard-conformant code. There is no point in deliberately writing non-conformant code that only impairs portability and, what is worse, breeds bugs. When you find such code, do not hesitate to correct it.

Note:
[1] If necessary, the functionality of #assert and #ident should be implemented with #pragma. The same can be said of #include_next and #warning, but these directives seem to be used sometimes in GNU C sources, so I reluctantly implemented them in MCPP; a warning is issued, however, when they are used.

3.9.1 Preprocessing FreeBSD 2/Kernel Source

Taking the kernel source of FreeBSD 2.2.2-R as an example, this section explains some preprocessing problems. All the directories mentioned in this section are under /sys (/usr/src/sys). Of the items I point out below, 3.9.1.7 and 3.9.1.8 are not necessarily Standard violations and work as expected with MCPP, but MCPP issues warnings because the code is confusing. 3.9.1.6 is a GNU C extension; C99 provides the same functionality, but with a different notation.

3.9.1.1 Multi-Line String Literal

Assembly code is embedded in the following manner in i386/apm/apm.c, i386/isa/npx.c, i386/isa/seagate.c, i386/scsi/aic7xxx.h, dev/aic7xxx/aic7xxx_asm.c, dev/aic7xxx/symbol.c, gnu/ext2fs/i386-bitops.h and pc98/pc98/npx.c:

    asm("
        asm code0
    #ifdef PC98
        asm code1
    #else
        asm code2
    #endif
        ...
    ");

When no " closing a string literal appears before the end of a line, GNU C/cpp by default assumes that the string literal ends at the end of the line. The above code relies on this specification.
In addition, the compiler-proper seems to interpret the whole content of asm() as a string literal spanning several lines. I think assembler source code should be written in a separate file, but if you insist on embedding it in a ".c" file, write it in the following manner instead of using the confusing code shown above:

    asm(
        "   asm code0\n"
    #ifdef PC98
        "   asm code1\n"
    #else
        "   asm code2\n"
    #endif
        "   ...\n"
    );

Any Standard C conformant preprocessor will accept this.

3.9.1.2 #else junk, #endif junk

The following line appears in ddb/db_run.c, netatalk/at.h, netatalk/aarp.c, net/if-ethersubr.c, i386/isa/isa.h, i386/isa/wdreg.h, i386/isa/tw.c, i386/isa/b004.c, i386/isa/matcd/matcd.c, i386/isa/sound/sound_calls.h, i386/isa/pcvt/pcvt_drv.c, pci/meteor.c and pc98/pc98/pc98.h:

    #endif MACRO

This line should be changed to:

    #endif /* MACRO */

3.9.1.3 #ifdef 0

To my surprise, i386/apm/apm.c contains the following strange line:

    #ifdef 0

Of course, this should be written as:

    #if 0

This code must never have been tested or used.

3.9.1.4 Duplicate Definition of Macro

gnu/i386/isa/dgb.c contains the following macro definition:

    #define DEBUG

Some header files have a definition of the same macro that conflicts with this one. Standard C treats such a conflicting redefinition as a constraint violation, which requires a diagnostic, and what happens afterwards depends on the compiler system: some keep the first definition after issuing an error message, while others, like GNU C 2/cpp, make the last definition valid without issuing any message. To make the last definition valid, add the following line immediately before it:

    #undef DEBUG

3.9.1.5 #warning

i386/isa/if_ze.c and i386/isa/if_zp.c contain the #warning directive. This is the only non-Standard directive I found in the kernel source. To conform to Standard C, there is no way but to comment out these lines. MCPP V.2.3 or higher accepts #warning.
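The corrected forms recommended in 3.9.1.1 through 3.9.1.4 can be checked together in one small file with any Standard C conformant preprocessor. The file is only an illustration: PC98 is assumed to be undefined, and the identifiers banner, feature_on and debug_level, as well as the first DEBUG definition (standing in for a conflicting definition in some header), are made up for this example.

```c
/* 3.9.1.1: adjacent string literals are concatenated after
   preprocessing, so each piece can sit in its own conditional group,
   unlike a single literal spanning several lines. */
static const char banner[] =
    "code0\n"
#ifdef PC98
    "code1\n"
#else
    "code2\n"
#endif
    "end\n";

/* 3.9.1.2: anything after #else or #endif belongs in a comment. */
#define FEATURE 1
#ifdef FEATURE
static const int feature_on = 1;
#else
static const int feature_on = 0;
#endif  /* FEATURE */

/* 3.9.1.3: "#if 0", not "#ifdef 0".  Directives inside a skipped group
   are not executed, so this #error is never reported. */
#if 0
#error "this group is skipped"
#endif

/* 3.9.1.4: #undef first, so that the redefinition below does not
   conflict with the earlier, different definition. */
#define DEBUG 0
#undef DEBUG
#define DEBUG 1
static const int debug_level = DEBUG;
```

With PC98 undefined, banner holds "code0\ncode2\nend\n"; defining PC98 on the command line selects the other piece without touching the rest of the literal.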
3.9.1.6 Variable Argument Macros

gnu/ext2fs/ext2_fs.h and i386/isa/mcd.c have the following macros, which take a variable number of arguments:

    #define MCD_TRACE(fmt, a...)                        \
    {                                                   \
        if (mcd_data[unit].debug) {                     \
            printf("mcd%d: status=0x%02x: ",            \
                unit, mcd_data[unit].status);           \
            printf(fmt, ## a);                          \
        }                                               \
    }

    #  define ext2_debug(fmt, a...)   {                 \
        printf("EXT2-fs DEBUG (%s, %d): %s:",           \
            __FILE__, __LINE__, __FUNCTION__);          \
        printf(fmt, ## a);                              \
    }

This is a GNU C/cpp-specific extension and cannot be used with other compiler systems. The "## a" above means the same as a plain "a" except for one special effect: when a macro invocation has no argument corresponding to "a...", the ## deletes the preceding comma.

C99 also provides for variable argument macros, but with a notation different from that of GNU C/cpp. The above example is written as follows in C99:

    #define MCD_TRACE( ...)                             \
    {                                                   \
        if (mcd_data[unit].debug) {                     \
            printf("mcd%d: status=0x%02x: ",            \
                unit, mcd_data[unit].status);           \
            printf( __VA_ARGS__);                       \
        }                                               \
    }

    #  define ext2_debug( ...)   {                      \
        printf("EXT2-fs DEBUG (%s, %d): %s:",           \
            __FILE__, __LINE__, __FUNCTION__);          \
        printf( __VA_ARGS__);                           \
    }

The most annoying difference is that C99 requires one or more arguments corresponding to "..." in a macro invocation, while GNU C/cpp allows zero or more arguments corresponding to "a...". To cope with this, MCPP V.2.3 or higher issues a warning, instead of an error, when there is no argument corresponding to "...". Therefore, you can change the above code as follows:

    #define MCD_TRACE(fmt, ...)                         \
    {                                                   \
        if (mcd_data[unit].debug) {                     \
            printf("mcd%d: status=0x%02x: ",            \
                unit, mcd_data[unit].status);           \
            printf(fmt, __VA_ARGS__);                   \
        }                                               \
    }

    #  define ext2_debug(fmt, ...)   {                  \
        printf("EXT2-fs DEBUG (%s, %d): %s:",           \
            __FILE__, __LINE__, __FUNCTION__);          \
        printf(fmt, __VA_ARGS__);                       \
    }

This is simpler, with a one-to-one correspondence to the original.
However, this way of writing has the disadvantage that the comma immediately before an empty variable argument remains, resulting in, for example, printf( fmt, ). In this case, there is no other way but to write the macro definition according to the C99 specification, or to avoid an empty variable argument in macro invocations by passing a harmless token such as NULL or 0, for example MCD_TRACE(fmt, NULL). [1]

Note:
[1] To use MCPP, source code must be rewritten in this way. If this produces a huge number of warnings, the -Q option sends them to the file mcpp.err instead of the screen.

GNU C 2.95.3 and later also implement variable argument macros in the C99 syntax, and this syntax is recommended from now on. The GNU-specific notation has the flexibility of allowing zero variable arguments, but the notation itself is bad: (1) in the "args..." parameter, no white space may be inserted between "args" and "...", as if "args..." were a single pp-token, yet no such pp-token exists; and (2) it is undesirable that the token concatenation operator is used to mark a variable argument in a replacement list. It would be desirable to allow zero variable arguments within the C99 notation. GNU C 3 introduced a notation for variable argument macros that mixes GNU C 2's traditional notation with the C99 one. For details, refer to 3.9.6.3.

3.9.1.7 Empty Arguments in Macro Invocations

The following macro invocations appear in nfs/nfs.h, nfs/nfsmount.h, nfs/nfsmode.h, netinet/if_ether.c, netinet/in.c, sys/proc.h, sys/socketvars.h, i386/scsi/aic7xxx.h, i386/include/pmap.h, dev/aic7xxx/scan.l, dev/aic7xxx/aic7xxx_asm.c, kern/vfs_cache.c, pci/wd82371.c, vm/vm_object.h and vm/device/pager.c, as well as in /usr/include/nfs/nfs.h:

    LIST_HEAD(, arg2)
    TAILQ_HEAD(, arg2)
    CIRCLEQ_HEAD(, arg2)
    SLIST_HEAD(, arg2)
    STAILQ_HEAD(, arg2)

The first argument is empty. C99 approved empty arguments, but C90 regarded them as undefined.
Since an argument may happen to become empty in a nested macro invocation, empty arguments should be permitted; however, it is neither necessary nor desirable to write an empty argument directly in source code. Note also that for a one-parameter macro there is a syntactic ambiguity between an empty argument and no argument at all. Taking everything into consideration, the following notation is recommended:

    #define EMPTY

    LIST_HEAD(EMPTY, arg2)
    TAILQ_HEAD(EMPTY, arg2)
    CIRCLEQ_HEAD(EMPTY, arg2)
    SLIST_HEAD(EMPTY, arg2)
    STAILQ_HEAD(EMPTY, arg2)

Any Standard C conformant preprocessor will accept this notation.

By the way, some of the header files listed above (in the nfs directory) neither contain the macro definitions used here nor #include any other header file. This is because they assume that these macros are defined in sys/queue.h and that *.c programs #include sys/queue.h first. Such header files are a source of ambiguity.

kern/kern_mib.c has the following macro invocation:

    SYSCTL_NODE(, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9)

In this case, the first argument cannot be changed to EMPTY, because the corresponding macro definition in sys/sysctl.h is as follows:

    #define SYSCTL_NODE(parent, nbr, name, access, handler, descr)      \
        extern struct linker_set sysctl_##parent##_##name;              \
        SYSCTL_OID(parent, nbr, name, CTLTYPE_NODE|access,              \
            (void*)&sysctl_##parent##_##name, 0, handler, "N", descr);  \
        TEXT_SET(sysctl_##parent##_##name, sysctl__##parent##_##name);

In other words, these arguments are operands of ## and are therefore not macro-expanded. The arguments of the SYSCTL_OID macro shown above, including the first one, are not macro-expanded either. In this case, there is no way but to leave the empty argument as it is. [1]

Note:
[1] C99 approves empty arguments as legitimate. Considering macros such as SYSCTL_NODE() and SYSCTL_OID(), the EMPTY macro is not a cure-all, and empty arguments have their uses.
Moreover, even if EMPTY is used, a nested macro invocation may produce empty arguments. For the readability of the source, however, using EMPTY is recommended whenever possible.

3.9.1.8 Object-Like Macros Replaced with a Function-Like Macro Name

i386/include/endian.h, as well as /usr/include/machine/endian.h, has the following macro definitions. (There are four definitions of this kind.)

    #define __byte_swap_long(x)     (replacement text)
    #define NTOHL(x)        (x) = ntohl ((u_long)x)
    #define ntohl           __byte_swap_long

The problem is the definition of ntohl. Although ntohl is an object-like macro, it is replaced by the name of a function-like macro, which is then rescanned together with the subsequent text and expanded as if ntohl itself were a function-like macro. This way of macro expansion has been an implicit specification since K&R 1st edition, and Standard C somehow approved it as legitimate. However, as I discuss in other documents, it is this specification that makes macro expansion unnecessarily complicated and brings confusion into the Standard documents. I regard it as a flaw in the specification. [1]

This ntohl is really a function-like macro, written as an object-like macro with the argument list omitted. It is better to define it as the function-like macro it is:

    #define ntohl(x)        __byte_swap_long(x)

This causes no problem. i386/isa/sound/os.h has the same kind of macro definitions:

    #define INB     inb
    #define INW     inb

These should be written as follows:

    #define INB(x)  inb(x)
    #define INW(x)  inb(x)

Note:
[1] ISO 9899:1990 Corrigendum 1:1994 regarded this notation as undefined. C99 replaced that provision with a different one. However, the Standard documents are still confusing on this point. For details, see 1.7.6 of cpp_test.txt.

3.9.1.9 Preprocessing .S Files

Some kernel sources are contained in ".S" files, that is, they are written in assembler. These sources contain #include and #ifdef lines, which require preprocessing.
To preprocess them in FreeBSD 2.2.2-R, 'cc' is called with the '-x assembler-with-cpp' option; 'cc' calls '/usr/libexec/cpp' with the '-lang-asm' option and then calls 'as'.

Of course, this use of .S files is non-conformant. The assembler source must not contain any token that happens to have the same name as a macro. White space between tokens and at the beginning of a line must be retained during preprocessing. In addition, if the first token on a line is a # that introduces an assembler comment, the preprocessor must handle it specially. All this not only considerably limits the choice of preprocessors but also increases the risk of unknowingly introducing bugs. So, using .S files in this way is not recommended. [1]

Assembler code that must be preprocessed to support several types of machines should instead be written in the following manner and saved in a ".c" file, not a ".S" file. 4.4BSD-Lite actually uses this style of coding.

    asm(
        " asm code0\n"
    #ifdef Machine_A
        " asm code1\n"
    #else
        " asm code2\n"
    #endif
        " ...\n"
    );

Note:
[1] In FreeBSD 2.0-R, these kernel sources are stored in *.s files, not in *.S files. The Makefile calls 'cpp', instead of 'cc', to preprocess them. Then 'cc' calls 'as'. When 'cpp' is called, '/usr/bin/cpp' is invoked, which is a shell-script that calls '/usr/libexec/cpp -traditional'. This method was more convenient, in that the preprocessor to be used can be changed simply by modifying the script.

3.9.2 Preprocessing FreeBSD 2/libc Source

I recompiled all the source files in /usr/src/lib/libc of FreeBSD 2.2.2R. There was no problem, probably because most of them come from 4.4BSD-Lite without modification. It is rare and remarkable to find such a large body of source code of such consistently high quality.

I found a problem at only one place: the following line in gen/getgrent.c. Of course, the ";" at the end of the line is superfluous.
    #endif;

3.9.3 Problems Concerning GNU C 2/cpp

As seen so far, writing Standard-conforming source code that is more portable and more robust neither requires much effort nor has any drawbacks. Why, then, does less conforming source code still exist at all? Comparing the FreeBSD 2.0-R kernel sources with those of 2.2.2-R, the number of non-conforming sources has not decreased. The problem is that newer sources are not necessarily more conforming to the Standards.

There are few non-conforming sources in 4.4BSD-Lite, probably because the 4.4BSD sources were rewritten to conform to Standard C and POSIX. However, when these sources were ported to FreeBSD, the old writing style was revived in some of them. For example, the ntohl shown above is written as 'ntohl(x)' in 4.4BSD-Lite, but as 'ntohl' in FreeBSD. Why was a notation that had once been abandoned revived?

I blame GNU C/cpp for this revival, because it passes such non-conforming sources without issuing any diagnostic. If -pedantic had been the default, the old style would never have been revived. Had -pedantic-errors been the default, on the other hand, GNU C/cpp would never have been put into practical use, because too many sources would have failed to compile. The gcc man page describes the -pedantic option as: "There is no reason to use this option except for satisfying pedants." [1] Now that eight years have passed since Standard C was established, it is high time for GNU C/cpp to make -pedantic the default, even if it does not go so far as -pedantic-errors.

In FreeBSD 2.0-R, nested comments were sometimes found, but in 2.2.2-R they have disappeared, because GNU C/cpp no longer allows them. This has nothing to do with -pedantic, but it shows how influential a preprocessor's source checking is.

Without #pragma once in header files, MCPP is slow in processing headers that are included many times.
Execute the following command in each directory under /sys:

    ins_once *.h */*.h */*/*.h

Note:
[1] I wrote 3.9.3 in 1998. This expression has since been removed from gcc's man page and info, but the specification remains almost the same.

3.9.4 Preprocessing Linux/glibc 2.1 Source

I recompiled the glibc 2.1.3 sources on Vine Linux 2.1 (i386). Unlike the FreeBSD libc sources, they had many problems. Some sources depend on undocumented specifications of GNU C/cpp, which took me a lot of time to identify.

3.9.4.1 Multi-Line String Literal

sysdeps/i386/dl-machine.h and stdlib/longlong.h have many multi-line string literals such as the following:

    #define MACRO asm("
        instr 0
        instr 1
        instr 2
    ")

Some of these string literals are very long. compile/csu/version-info.h, created by make, also has a multi-line string literal. Of course, this is not Standard-conforming, but GNU C treats it as a string literal with embedded newlines.

With the -lang-asm (-x assembler-with-cpp, -a) option, MCPP V.2.3 or later converts a multi-line string literal into the following code:

    #define MACRO asm("\n instr 0\n instr 1\n instr 2\n")

However, this option does not work properly for a string literal with a directive inserted in the middle, as shown in 3.9.1.1; in that case, the only way is to rewrite the source.

3.9.4.2 #include_next, #warning

#include_next appears in the following files: catgets/config.h, db2/config.h, include/fpu_control.h, include/limits.h, include/bits/ipc.h, include/sys/sysinfo.h, locale/programs/config.h, and sysdeps/unix/sysv/linux/a.out.h. sysvipc/sys/ipc.h has #warning.

Although these directives are not defined by Standard C, #include_next, in particular, has become indispensable for glibc 2. So, MCPP V.2.3 or later for GNU C implements #include_next and #warning.
The problem with #include_next is not only that it violates the Standard, but also that which headers are actually included depends on the include directory settings and their search order, which users can change via environment variables. When glibc is installed, some files in glibc's include directory are copied to /usr/include and used as system headers. Because these headers contain #include_next, the system headers have become a patchwork. It seems to be time to reorganize them.

3.9.4.3 Variable Argument Macros

The following files contain definitions of macros with a variable number of arguments in the GNU C/cpp notation, as well as invocations of them: elf/dl-lookup.c, elf/dl-version.c, elf/ldsodefs.h, glibc-compat/nss_db/db-XXX.c, glibc-compat/nss_files/files-XXX.c, linuxthreads/internals.h, locale/loadlocale.c, locale/programs/linereader.h, locale/programs/locale.c, nss/nss_db/db-XXX.c, nss/nss_files/files-XXX.c, sysdeps/unix/sysdep.h, sysdeps/unix/sysv/linux/i386/sysdep.h, and sysdeps/i386/fpu/bits/mathinline.h.

This notation deviates from the C99 Standard. You must rewrite these sources before you can use MCPP.

3.9.4.4 Empty Arguments in Macro Invocation

The following files have macro invocations with empty arguments: catgets/catgetsinfo.h, elf/dl-open.c, grp/fgetgrent_r.c, libio/clearerr_u.c, libio/rewind.c, libio/clearerr.c, libio/iosetbuffer.c, locale/programs/ld-ctype.c, locale/setlocale.c, login/getutent_r.c, malloc/thread-m.h, math/bits/mathcalls.h, misc/efgcvt_r.c, nss/nss_files/files-rpc.c, nss/nss_files/files-network.c, nss/nss_files/files-hosts.c, nss/nss_files/files-proto.c, pwd/fgetpwent_r.c, shadow/sgetspent_r.c, sysdeps/unix/sysv/linux/bits/sigset.h, and sysdeps/unix/dirstream.h.

math/bits/mathcalls.h, in particular, contains as many as 79 empty arguments. This header file is installed as /usr/include/bits/mathcalls.h and is #included by /usr/include/math.h.
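Nesting can create a genuinely empty argument even when the source writes EMPTY, because an argument is fully macro-expanded before it is substituted into the outer macro's replacement list. The following minimal sketch shows this; DECLARE_INT() and WRAP_DECLARE() are hypothetical macros in the style of mathcalls.h, not glibc code, and a C99 preprocessor is assumed, since empty arguments are undefined in C90.

```c
#include <assert.h>

#define EMPTY

/* Hypothetical macros: the first parameter is an optional
   storage-class slot. */
#define DECLARE_INT(qual, name)   qual int name
#define WRAP_DECLARE(qual, name)  DECLARE_INT(qual, name)

/* EMPTY is expanded to nothing before substitution, so the rescan sees
   DECLARE_INT( , plain_counter): the inner macro receives an empty first
   argument.  Result: int plain_counter; */
WRAP_DECLARE(EMPTY, plain_counter);

/* A non-empty first argument passes through unchanged.
   Result: static int hidden_counter; */
WRAP_DECLARE(static, hidden_counter);

/* Touch both objects so that the declarations can be checked. */
static int bump_counters(void)
{
    plain_counter += 1;
    hidden_counter += 2;
    return plain_counter + hidden_counter;
}
```

Writing EMPTY therefore keeps the visible source free of empty arguments, but it cannot keep them out of the inner invocations.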
Even with an EMPTY macro, nested macro invocations would still generate many empty arguments. Is there no clearer way to write these macros?

3.9.4.5 Object-Like Macros Replaced with Function-like Macro Name

The following files contain object-like macro definitions that are replaced with function-like macro names: argp/argp-fmtstream.h, ctype/ctype.h, elf/sprof.c, elf/dl-runtime.c, elf/do-rel.h, elf/do-lookup.h, elf/dl-addr.c, io/ftw.c, io/ftw64.c, io/sys/stat.h, locale/programs/ld-ctype.c, malloc/mcheck.c, math/test-*.c, nss/nss_files/files-*.c, posix/regex.c, posix/getopt.c, stdlib/gmp-impl.h, string/bits/string2.h, string/strcoll.c, sysdeps/i386/i486/bits/string.h, sysdeps/generic/_G_config.h, and sysdeps/unix/sysv/linux/_G_config.h.

In some of them, such as math/test-*.c, a macro is first replaced with an object-like macro name, which is then replaced with a function-like macro name. Why did these macros have to be written this way?

3.9.4.6 Macros Expanded to 'defined'

sysdeps/generic/_G_config.h, sysdeps/unix/sysv/linux/_G_config.h, and malloc/malloc.c contain the following macro definition, whose expansion contains the "defined" pp-token:

    #define HAVE_MREMAP defined(__linux__) && !defined(__arm__)

The intent of this definition is that the directive

    #if HAVE_MREMAP

is expanded to:

    #if defined(__linux__) && !defined(__arm__)

However, in Standard C, the behavior is undefined if macro expansion in a #if line produces the "defined" pp-token. Apart from that, this macro definition is odd in the first place. The HAVE_MREMAP macro is first replaced with the following:

    defined(__linux__) && !defined(__arm__)         (1)

Then the identifiers "defined", "__linux__" and "__arm__" are rescanned for further macro replacement, and any of them that is a macro is expanded.
In this case, "defined" cannot be defined as a macro (that would cause another kind of undefined behavior), so if __linux__ is defined as 1 and __arm__ is not defined, this macro is finally expanded as follows:

    defined(1) && !defined(__arm__)

defined(1) is, of course, a syntax error in a #if expression. However, GNU C/cpp stops macro expansion at (1) and regards that as the final result of macro expansion of the #if line. Since the behavior is undefined anyway, this GNU specification cannot be called wrong, but it lacks consistency, in that macros are expanded differently in #if lines and in other lines. At the very least, it is not portable. [1]

The above code should be written as follows:

    #if defined(__linux__) && !defined(__arm__)
    #define HAVE_MREMAP 1
    #endif

I hope this kind of confusing code will be eliminated as soon as possible.

Note:
[1] GNU C 2/cpp internally treats "defined" in a #if line as a special macro. For this reason, when GNU C/cpp rescans the following sequence of tokens for macro expansion, it evaluates it as a #if expression instead of macro-expanding it:

    defined(__linux__) && !defined(__arm__)

In other words, the distinction between macro expansion and #if expression evaluation is blurred. This problem relates to GNU C/cpp's own program structure. GNU C 2/cpp has a de facto main routine, rescan(), which is a macro rescanning routine. This routine reads and processes the source from beginning to end, calling the directive-processing routine along the way. Although implementing everything as macro processing is the traditional structure of a macro processor, this structure seems to mix up macro expansion with other processing.

3.9.4.7 Preprocessing .S File

The files named *.S contain assembler source code requiring preprocessing. Some of these files have preprocessing directives, such as #include, #define, and #if.
In addition, compile/csu/crti.S, generated by make, contains lines such as:

    #APP
    #NO_APP

Syntactically, a preprocessor cannot tell whether such lines are invalid preprocessing directives or valid assembler comments. GNU C seems to leave these lines as they are during preprocessing and to treat them as assembler comments.

Concatenation of pp-tokens with the ## operator sometimes generates an invalid pp-token. GNU C/cpp outputs such pp-tokens without issuing a diagnostic.

For compatibility with GNU C/cpp, I reluctantly decided that, with the -lang-asm (-x assembler-with-cpp, -a) option, MCPP V.2.3 does not treat these non-conforming directives or the invalid pp-tokens generated by ## as errors; it outputs them as they are and issues a warning.

Properly, these sources should be processed with an assembler macro processor. GNU provides a macro processor called gasp, but for some reason it seems to be rarely used.

3.9.4.8 Problems of rpcgen and -dM Option

When invoked with the -dM option, GNU C/cpp outputs only macro definitions. This output is used by stdlib/isomac.c in the 'make check' routine. The problem with isomac.c is that it accepts only GNU C/cpp's macro definition format and regards a comment or a blank line as an error. [1]

Glibc's make sometimes uses a program called rpcgen. The problem with rpcgen is that it accepts only GNU C/cpp's output format for line number information, which is as follows:

    # 123 "filename"

rpcgen accepts neither

    #line 123

nor

    #line 123 "filename"

and regards them as errors. I reluctantly decided that MCPP V.2.3 and later for GNU C use the GNU C/cpp format by default. rpcgen's specification is poor, in that it relies on one particular compiler system's format and cannot accept the standard one.

Note:
[1] MCPP V.2.5 changed the output of the -d* options to the same format as that of GNU C.
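The restriction in rpcgen is easy to avoid on the consumer side. As a minimal sketch, a tool that reads preprocessor output can accept the GNU C/cpp form and both Standard #line forms. parse_line_marker() and skip_blanks() are hypothetical helpers, not code from rpcgen or MCPP; the sketch ignores escape sequences in file names and any flags that may follow the file name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Skip blanks and tabs. */
static const char *skip_blanks(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/* Hypothetical helper: recognize a line-number line in the GNU C/cpp
   form  # 123 "file"  or in the Standard forms  #line 123  and
   #line 123 "file".  On success, stores the number in *lineno and the
   file name (or "" if absent, truncated to fit) in fname, then returns 1.
   Returns 0 for any other line.  fname_size must be at least 1. */
static int parse_line_marker(const char *s, long *lineno,
                             char *fname, size_t fname_size)
{
    char *end;
    const char *quote;
    size_t len;

    s = skip_blanks(s);
    if (*s != '#')
        return 0;
    s = skip_blanks(s + 1);
    if (strncmp(s, "line", 4) == 0
            && !isalnum((unsigned char)s[4]) && s[4] != '_')
        s = skip_blanks(s + 4);         /* Standard "#line" form */
    if (!isdigit((unsigned char)*s))
        return 0;                       /* e.g. #define, #APP */
    *lineno = strtol(s, &end, 10);
    s = skip_blanks(end);
    fname[0] = '\0';
    if (*s == '"') {
        quote = strchr(s + 1, '"');
        if (quote == NULL)
            return 0;                   /* unterminated file name */
        len = (size_t)(quote - (s + 1));
        if (len >= fname_size)
            len = fname_size - 1;
        memcpy(fname, s + 1, len);
        fname[len] = '\0';
    }
    return 1;
}
```

With a parser of this kind, MCPP's default output format would not need to follow one particular compiler system.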
3.9.4.9 -include, -isystem and -I- Options

The glibc 2 makefiles often use the -include option and sometimes use the -isystem and -I- options. The former can be replaced with an #include at the beginning of the source file. The latter two are rarely needed; they are necessary only for updating system headers. Only MCPP V.2.3 or later for GNU C implements these two options, but I would like such rarely needed options to be made obsolete. [1]

Note:
[1] GNU C/cpp provides several more options that specify include directories and their search order, such as -iprefix, -iwithprefix, and -idirafter. It also provides the -remap option, which specifies mappings between long file names and MS-DOS 8+3 file names. On CygWIN systems, the specs files contain these options, but it is not necessary to use them, because include directories can be specified with environment variables and such mapping is no longer necessary on CygWIN.

3.9.4.10 Undocumented Predefined Macros

The following are GNU C/cpp predefined macros, although their names do not appear in the documentation:

    __VERSION__, __SIZE_TYPE__, __PTRDIFF_TYPE__, __WCHAR_TYPE__

On Vine Linux 2.1 (egcs-1.1.2) systems, __VERSION__ is defined as "egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)". On many systems, including Linux/i386, the other three macros are defined as the types unsigned int, int, and long int, respectively. On FreeBSD and CygWIN systems, however, these types are slightly different (I do not know why). Why do these predefined macros remain undocumented?

GNU C/cpp has the __VERSION__ value in its own source code, whereas MCPP uses the environment variable GCC_VERSION to set the value. [1]

Note:
[1] For GNU C V.3, MCPP V.2.5 or later sets this macro at 'configure' time, so the environment variable is not necessary.

3.9.4.11 Undocumented Environment Variables

The strangest thing is the undocumented environment variable SUNPRO_DEPENDENCIES.
sysdeps/unix/sysv/linux/Makefile contains the following script:

    SUNPRO_DEPENDENCIES='$(@:.h=.d)-t $@' $(CC) -E -x c $(sysinclude) $< -D_LIBC -dM | ... etc.

The intent of this script is to specify a file name with the environment variable SUNPRO_DEPENDENCIES and to have cpp output the macro definitions in the source, together with lines describing the dependencies among source files, to that file. I had no choice but to read the GNU C/cpp source code (egcs-1.1.2/gcc/cccp.c) to find out how this environment variable works.

In addition, there is another environment variable, DEPENDENCIES_OUTPUT, which has a similar function. The difference between the two is that SUNPRO_DEPENDENCIES also outputs dependency lines for system headers, whereas DEPENDENCIES_OUTPUT does not.

Only MCPP V.2.3 and later for GNU C supports these two environment variables, but I would like these undocumented specifications to be made obsolete as soon as possible.

3.9.4.12 Other Problems

Linux (i386)/GNU C appends the -Asystem(unix), -Acpu(i386) and -Amachine(i386) options to the cpp invocation via the specs file. As far as glibc 2.1.3 for Linux/x86 is concerned, no source code seems to use this feature.

It is a big problem that glibc's system headers have become a patchwork and very complicated. A small difference in settings may result in a big difference in preprocessing results.

On the other hand, glibc 2.1.3 does not contain the #else junk, #endif junk, or duplicate macro definitions that were found in the FreeBSD 2.2.2 kernel sources. In some respects, the glibc 2 sources are better organized than the FreeBSD 2 kernel sources. As a whole, however, glibc 2 has quite a few sources that depend on GNU C-specific specifications, which impairs portability to other compiler systems, even though such sources form only a small portion of several thousand source files. Dependence on GNU C local specifications is also undesirable for program readability and maintainability.
I hope that GNU C V.3 will make these local specifications obsolete and that all the source code depending on them will be rewritten.

3.9.5 To Use MCPP with GNU C 2

Before you can use MCPP V.2.3 or later to compile glibc 2 programs, you must modify some source code as follows:

1. Macro definitions with a variable number of arguments: Modify the 14 files listed in 3.9.4.3 as shown in 3.9.1.6. Of course, you should keep the original files under other names, such as *.orig.

2. Macros whose replacement lists contain "defined", in the three files shown in 3.9.4.6: /usr/include/_G_config.h is generated when sysdeps/unix/sysv/linux/_G_config.h is installed and has the same contents, so you should modify /usr/include/_G_config.h as well.

In addition to the options specified in the Makefile or specs file, you must invoke MCPP with the -lang-asm (-x assembler-with-cpp) option to process *.S files containing multi-line string literals or assembler comments. Normally, you can leave this option specified when preprocessing other files as well.

To switch between GNU C/cpp and MCPP easily, or to change the default options, you should perform the following steps:

1. Log in as the super-user and move to the directory where cpp resides (on Vine Linux 2.1, /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66). Let me assume that GNU C/cpp is installed in this directory under the name cpp, and MCPP under the name mcpp_std.

2. Create a file called mcpp.sh with the following contents. [1], [2], [3]

    #!/bin/sh
    /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/mcpp_std -v -Q \
        -lang-asm "$@"

The -v and -Q options are optional; however, I recommend -Q to record the large amount of diagnostic messages in a file, and -v to confirm that MCPP has been invoked successfully. (Similarly, I recommend creating a shell-script that invokes gcc with -v.)

3.
Enter the following commands:

    chmod a+x mcpp.sh
    mv cpp cpp_gnuc
    ln -sf mcpp.sh cpp

With these settings, when gcc calls cpp, mcpp.sh (to which cpp is now linked) is executed, and mcpp.sh calls mcpp_std with the above options placed before the ones specified by gcc.

4. To change the default options, modify mcpp.sh or call mcpp_std directly. To use GNU C/cpp, enter:

    ln -sf cpp_gnuc cpp

With the above settings, leaving cpp linked to MCPP may cause a problem: some programs call cpp with the -traditional option, which MCPP in STANDARD mode does not implement. So, you should modify mcpp.sh to select GNU C/cpp automatically when the -traditional option is specified, and MCPP otherwise: [4]

    #!/bin/sh
    for i in "$@"
    do
        case $i in
        -traditional*)
            /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/cpp_gnuc "$@"
            exit
            ;;
        esac
    done
    /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/mcpp_std -Q -lang-asm "$@"

Another problem with MCPP is that it issues a huge number of warnings. You can redirect them to a file with the -Q option, but when you preprocess a large amount of source code, such as glibc, with the -W3 option, the mcpp.err files total several hundred MB, far too much to look through.

A close look at mcpp.err shows the same warnings issued repeatedly, because the same *.h files are #included by many source files. To make the output easier to read, proceed as follows:

1. To find errors, enter the following commands:

    grep fatal `find . -name mcpp.err`
    grep error `find . -name mcpp.err`

2. To sort the warning messages and remove duplicates, enter the following command:

    grep warning `find . -name mcpp.err` | sort -k4 -u | less

3. To find all the source lines causing a warning, enter the following command:

    grep warning `find . -name mcpp.err` | sort -k4 | uniq | less

4. To find a particular type of warning, enter a command such as the following:

    grep 'warning: Replacement' `find . -name mcpp.err` | sort -k4 | uniq | less

Once you have an overall idea of which source lines cause which kinds of errors or warnings, you can view a particular mcpp.err with "less" and then, if necessary, examine the source file in question. In addition, you can enclose the source code in question between '#pragma MCPP debug expand' and '#pragma MCPP end_debug' and preprocess it again to see the output. In that case, I recommend invoking MCPP as follows so that the preprocessing results and the diagnostic messages are written to the same file:

    mcpp_std <-opts> in-file.c > in-file.i 2>&1

When you use "make", you must temporarily change the above shell-script.

Note:
[1] On VineLinux 2.5/GNU C 2.95.3, the /usr/lib/gcc-lib/i386-redhat-linux/2.95.3 directory contains both cpp0 and cpp, and cpp is linked to cpp0. In other words, cpp0 is the real preprocessor, and gcc calls cpp0 directly. Does this mean that other preprocessors, such as cpp1 and cpp2, will be created in the future? Anyway, in this case you must change cpp to cpp0 in the above "mv" and "ln" commands. FreeBSD 4.4/GNU C 2.95.3 also has cpp and cpp0 in /usr/libexec.

[2] VineLinux 2.5/GNU C 2.95.3 has the following cpps. Each of them is a link to another one, except the last one (/usr/bin/cpp-2.95.3), which is an executable. To make matters more complicated, the last one calls /usr/lib/gcc-lib/i386-redhat-linux/2.95.3/cpp0. What brought about this complex situation?

    /lib/cpp -> /etc/alternatives/libcpp
    /etc/alternatives/libcpp -> /usr/bin/cpp-2.95.3
    /etc/alternatives/cpp -> /usr/bin/cpp-2.95.3
    /usr/bin/cpp -> /etc/alternatives/cpp
    /usr/bin/cpp-2.95.3

[3] When applying the Validation Suite to MCPP in the GNU C testsuite, use the -23j option to enable digraphs and trigraphs and to output diagnostics in a format similar to GNU's, without additional information such as source lines. Do not specify any other options in this shell-script.
For more information on using the Validation Suite in the GNU C testsuite, refer to 2.2.3 of cpp-test.txt.

[4] This shell script is for use with VineLinux 2.1/GNU C 2.91.66. For other versions of FreeBSD, Linux, and CygWIN, change the name of the directory where cpp resides and the name of cpp (cpp or cpp0). GNU C 3.2 has tradcpp0, a separate preprocessor for the traditional mode, which is invoked when gcc is invoked with the -traditional option. Therefore, this shell script is not necessary for GNU C 3.2.

3.9.6 Preprocessing GNU C 3.2 Source

I first compiled the GNU C 3.2R sources on Linux and FreeBSD, then compiled MCPP with the generated gcc, and finally recompiled the GNU C 3.2 sources using MCPP as the preprocessor.

A new GNU C compiler is bootstrapped through several phases of make: gcc, cc1, etc. generated in an earlier phase are used to recompile themselves, the resulting compiler drivers and compiler-propers are used again to recompile themselves, and so on. During the bootstrap, gcc exists under the name xgcc.

In addition to cc1 and cc1plus, GNU C 2 has a separate preprocessor called cpp. In GNU C 3, cpp was absorbed into cc1 and cc1plus, although a separate cpp (or cpp0) still exists. To have cpp0 do the preprocessing, the -no-integrated-cpp option must be specified when invoking gcc or g++. Therefore, to have MCPP do the preprocessing, you must use a shell-script that makes gcc (xgcc) or g++ invoke MCPP first and then cc1 or cc1plus. [1]

In the GNU C compiler system, the settings of system headers and their search order are becoming very complex, so a small difference in settings may result in a difference in preprocessing results. Even a successful compilation was often difficult to attain; compilation sometimes failed even because of hardware problems. In addition, compilation and testing require a lot of other software, and older versions of such software may cause compilation or tests to fail. In fact, I failed to compile the GNU C 3.2 sources under FreeBSD 4.4R.
I had to upgrade FreeBSD to 4.7R and replace the software packages with those for FreeBSD 4.7R before the compilation succeeded. [2]

I use VineLinux 2.5 on two PCs. On one PC (K6/200MHz), compiling the GNU C 3.2 sources with GNU C 2.95.3 was successful, but recompiling them with the generated GNU C 3.2/cc1 failed with many segmentation faults. I then replaced the K6 CPU with an AthlonXP, and this time the recompilation succeeded without any segmentation faults. Hardware may have caused the problem.

When I compiled the GNU C 3.2 sources with GNU C 2.95.4 under FreeBSD on the K6, "make -k check" of the generated gcc was almost entirely successful. When I recompiled GNU C 3.2 itself with the generated GNU C 3.2, however, about 20 percent of the testsuite failed in "make -k check" of g++ and libstdc++-v3. With the AthlonXP instead of the K6, everything went well. Again, hardware may have caused the problem.

On both VineLinux PCs, when I recompiled the GNU C 3.2 sources with GNU C 3.2 itself and MCPP, "make -k check" of the generated gcc was successful. In "make -k check" of g++ and libstdc++-v3, however, 20 percent of the testsuite failed. [3], [4], [5] In any case, the cause of this testsuite failure seems to lie not in the generated compilers themselves, such as gcc, g++, cc1 and cc1plus, but in the header files or some other settings.

MCPP is not completely compatible with GNU C/cpp, but it is highly compatible, so MCPP and GNU C/cpp can be used interchangeably.

The GNU C 3.2 sources were compiled in the following environments:

    OS              make        library       CPU
    VineLinux 2.5   GNU make    glibc 2.2.4   Celeron/1060MHz
    VineLinux 2.5   GNU make    glibc 2.2.4   K6/200MHz, AthlonXP/2.0GHz
    FreeBSD 4.7R    UCB make    libc.so.4     K6/200MHz

Only C and C++ were compiled.

My Validation Suite V.1.3 includes a new edition for use with the GNU C testsuite. It allows you to perform detailed and systematic preprocessor tests with "make -k check" or "runtest".
Validation Suite V.1.3 tests the behavior of preprocessors, not sources. For details, see 2.2.3 of cpp-test.txt.

Note:
[1] I had to do this for each bootstrap stage. Since the makefile is too large and too complex to modify, I employed an inelegant method: I sat in front of the PC screen during the entire bootstrap process, and at the end of each stage I entered ^C and replaced xgcc and the others with shell-scripts.

[2] Because of dependencies between packages, the system falls into confusion unless appropriate versions are installed. In fact, for this reason, kterm temporarily failed to start on my FreeBSD.

[3] "make -k check" cannot be used with MCPP, because the diagnostics of MCPP differ from those of GNU C/cpp.

[4] "make -k check" seems to require an English environment, so the LANG environment variable must be set to C.

[5] All the testsuite failures were caused by the pthread_* functions, such as pthread_getspecific and pthread_setspecific, not being linked into the library i686-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.5.0.0. When a correctly generated library was installed, "make -k check" succeeded. This problem never happened on FreeBSD, probably because of small differences in settings.

3.9.6.1 Multi-Line String Literal

This very old way of coding is no longer found in the GNU C 3.2 sources. Multi-line string literals were made obsolete only as late as GNU C 3.2, which processes a source with a multi-line string literal as expected but issues a warning.

3.9.6.2 #include_next and #warning

limits.h and syslimits.h in build/gcc/include, generated during make, contain #include_next. When GNU C 3.2 is installed, these header files are copied to limits.h and syslimits.h in lib/gcc-lib/i686-pc-linux-gnu/3.2/include. The GNU C 3.2 sources have no #warning.

3.9.6.3 Variable Argument Macros

The GNU C 3.2 sources have some variable argument macros, but most of them are in the testsuite, where they are nothing but test samples.
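As a point of comparison for the notations discussed in this section, a variadic macro can be written purely in C99 notation without any comma-deleting extension: the format string simply becomes the first of the variable arguments. This is a minimal sketch under stated assumptions; eformat() and log_buf are hypothetical names, and snprintf() into a buffer stands in for fprintf(stderr, ...) so that the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink used instead of stderr so the output can be checked;
   a real eprintf() would call fprintf(stderr, ...). */
static char log_buf[128];

/* For comparison only (GNU C 2 notation, not Standard C):
       #define eprintf(fmt, args...)  fprintf(stderr, fmt, ## args)
   In C99 notation the format string is part of __VA_ARGS__, so an
   invocation with only a format string leaves no comma to delete. */
#define eformat(...)  snprintf(log_buf, sizeof log_buf, __VA_ARGS__)
```

For example, eformat("success!\n") expands to snprintf(log_buf, sizeof log_buf, "success!\n"), with no dangling comma and no special meaning given to ##.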
Although GNU C 3.2 still supports variable argument macros in the GNU C 2 notation, macros using __VA_ARGS__ (the C99 notation) are more frequently found in the GNU C 3.2 sources.

GNU C 3 also accepts variable argument macros in a mixture of the GNU C 2 and C99 notations:

    #define eprintf( fmt, ...)  fprintf( stderr, fmt, ##__VA_ARGS__)

According to the GNU C 3 specification, if there is no argument corresponding to "...", the comma immediately before ##__VA_ARGS__ in the replacement list is deleted. So, this is expanded as follows:

    eprintf( "success!\n")  ==>  fprintf( stderr, "success!\n")

As far as this example is concerned, this specification seems convenient, but it is not desirable, because (1) a comma in the replacement list of a macro definition is not always used to delimit arguments, (2) it gives the token concatenation operator (##) a different function, and (3) it makes the rules more complex by introducing an exception.

MCPP does not implement this feature. MCPP does not regard this macro definition as an error, but it does not delete the comma immediately before an empty variable argument in a macro invocation.

3.9.6.4 Empty Arguments in Macro Invocation

Apart from #included system headers, such as /usr/include/bits/mathcalls.h and /usr/include/bits/sigset.h, empty arguments in macro invocations are found only in gcc/libgcc2.h in the GNU C 3.2 sources themselves. [1]

Note:
[1] These two header files are copied into the system header directory when glibc is installed. They do not exist on FreeBSD, because FreeBSD does not use glibc.

3.9.6.5 Object-Like Macros Replaced with Function-Like Macros

gcc/fixinc/gnu-regex.c and libiberty/regex.c have object-like macros that are replaced with function-like macro names. /usr/lib/bison.simple, an #included file, also has such macros. All of these macros relate to alloca. For example, libiberty/regex.c has the following macro definitions.
    #define REGEX_ALLOCATE  alloca
    #define alloca( size)   __builtin_alloca( size)

This should be written as follows:

    #define REGEX_ALLOCATE( size)   alloca( size)

Why did they omit (size)? In addition, regex.c has another alloca, defined as follows:

    #define alloca  __builtin_alloca

The two alloca definitions are written in inconsistent styles. Furthermore, regex.c has a #include "regex.c" line; that is, it includes itself. regex.c is a strange and unnecessarily complicated source.

3.9.6.6 Macros Expanded to 'defined'

The GNU C 3.2 sources have no macros that expand to 'defined'. According to the GNU C 3.2 documentation, this type of macro is preprocessed in the same way as by GNU C 2/cpp, but with a warning that it may not be portable. In my tests, however, GNU C 3.2 did not seem to issue a warning for the example shown in 3.9.4.6.

3.9.6.7 Preprocessing of .S Files

The gcc/config directory has several *.S files.

3.9.6.8 rpcgen and -dM Option

The make of GNU C 3.2 uses neither rpcgen nor the -dM option. The specifications of rpcgen and the -dM option do not seem to have changed from previous versions.

3.9.6.9 -include, -isystem and -I- Options

These options are frequently used in the make of GNU C 3.2. Sometimes the -isystem option is used to specify several system include directories at once. Is this option unavoidable when compiling software that updates the system headers themselves? I think it would be better to specify all the system include directories with an environment variable.

On the other hand, the GNU C 3/cpp documentation discourages the use of the -iwithprefix and -iwithprefixbefore options. GNU C provides many options for specifying include directories. Is GNU C 3.2 moving toward reorganizing them or reducing their number? [1]

Note:
[1] The GNU C 3.2 Makefile uses the -iprefix option on its own (without -iwithprefix or -iwithprefixbefore), although -iprefix makes sense only when followed by one of these two options.
3.9.6.10 Undocumented Predefined Macros

GNU C 2 did not document predefined macros such as __VERSION__, __SIZE_TYPE__, __PTRDIFF_TYPE__ and __WCHAR_TYPE__, and not even the -dM option revealed their existence. GNU C 3 not only documents them but also enhances -dM to show their values.

3.9.6.11 Undocumented Environment Variables

GNU C 3 documents the SUNPRO_DEPENDENCIES environment variable, which GNU C 2 did not. (I do not know why this environment variable is needed.)

3.9.6.12 Other Problems

GNU C 3/cpp implements the following #pragmas:

    #pragma GCC poison
    #pragma GCC dependency
    #pragma GCC system_header

Of these, GNU C 3.2 sources use poison and system_header. MCPP does not support these #pragmas because I do not think they are necessary. (I omit an explanation of their specifications.)

GNU C 3 deprecates assertion directives such as #assert, although gcc specifies the -A option by default.

In GNU C 2, the -traditional option is implemented in one and the same cpp, resulting in a strange mixture of very old specifications and C99 ones. In GNU C 3, the preprocessor was divided into two: the non-traditional cpp0 and tradcpp0. The -traditional option is valid only for gcc; cpp0 does not provide it. gcc -traditional invokes tradcpp0 for preprocessing. tradcpp0 has become closer to a true traditional (pre-C90) preprocessor. The GNU C developers say that they no longer maintain tradcpp0 except for serious bugs. The strange specifications of GNU C 2/cpp seem to have been significantly revised.

3.9.7 To Use MCPP with GNU C 3

As seen above, as far as preprocessing is concerned, the GNU C 3.2 sources are much improved over the glibc 2.1.3 sources: the traditional style of writing has been almost eliminated and meaningless options are no longer used. GNU C 3.2/cpp itself is also much superior to GNU C 2/cpp in that it regards traditional specifications as obsolete and articulates the token-based principle. Undocumented specifications have been significantly reduced.
Although these improvements are still not sufficient, GNU C/cpp is certainly moving in the right direction. However, the GNU system headers have become so complex that it is difficult to grasp their entire structure, which may be one of the biggest causes of problems in the GNU system.

Another regrettable fact is that the preprocessor has been absorbed into the compiler-proper. Therefore, to use MCPP, the -no-integrated-cpp option must be specified when invoking gcc or g++. If you compile a large number of source files with many or complicated makefiles, or if some program invokes gcc automatically, you must create shell scripts that invoke gcc or g++ with the -no-integrated-cpp option specified automatically.

Here is an example. Place the following shell scripts in the directory where gcc and g++ reside (on my Linux, /usr/local/gcc-3.2/bin) under the names gcc.sh and g++.sh, respectively.

gcc.sh:

    #!/bin/sh
    /usr/local/gcc-3.2/bin/gcc_proper -no-integrated-cpp "$@"

g++.sh:

    #!/bin/sh
    /usr/local/gcc-3.2/bin/g++_proper -no-integrated-cpp "$@"

Move to this directory and enter the following commands:

    chmod a+x gcc.sh g++.sh
    mv gcc gcc_proper
    mv g++ g++_proper
    ln -sf gcc.sh gcc
    ln -sf g++.sh g++

In the directory where cpp is located (on my Linux, /usr/local/gcc-3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2), create a script that executes MCPP when cpp or cpp0 is invoked, as you did for GNU C 2 (see 3.9.5). Then gcc or g++ first invokes MCPP and then invokes cc1 or cc1plus with the -fpreprocessed option appended. -fpreprocessed indicates that the source has already been preprocessed.

In addition, as with other GNU C versions, you must set up environment variables (see 3.9.5). To use MCPP with GNU C 3.2, additional include directory settings for C++ as well as PATH settings are required. Note that when a GNU C version other than the system's standard one is installed, additional include directory settings may be required.
MCPP V.2.4 now contains these settings when MCPP itself is compiled, eliminating the need to set them with environment variables. On my Linux, the /usr/local/gcc-3.2/lib line has been added to /etc/ld.so.conf, and the following setting has been added to ~/.bash_profile: [1]

    export PATH=/usr/local/gcc-3.2/bin:$PATH

If possible, I would like to replace cpplib, the preprocessing part of cc1 and cc1plus, with MCPP. However, the source files that define the internal interface between cpplib and cc1 or cc1plus, as well as the external interface between cpplib and the user programs that use it, amount to as much as 46KB, so it is impossible to replace cpplib. Why are the interfaces so complex? It is a pity.

Note:
[1] MCPP V.2.5 gets all the necessary information on GNU C V.3.* by configure, hence environment variables such as GCC_VERSION are no longer needed.

3.9.7.1 To Use MCPP with GNU C 3.3 and 3.4

Although GNU C 3.2 seemed to be heading toward better portability, GNU C turned toward a different goal in 3.3 and 3.4. V.3.3 and 3.4 differ from 3.2 in the following points.

1. The independent preprocessor cpp0 was abolished. The execution option '-no-integrated-cpp' changed its meaning: even if this option is specified, gcc invokes cc1 (cc1plus) instead of cpp0 as the preprocessor, and gcc passes the preprocessor some options that are irrelevant to preprocessing.

2. Several dozen macros are predefined. The relationship between the system headers and GNU C has become more complicated.

3. tradcpp was also abolished and absorbed into an execution option of cc1. Some old specifications, which had been obsoleted or deprecated in V.3.2, were restored.

GNU C/cc1 is becoming one huge and complex compiler that absorbs the preprocessor and some of the system headers' contents. I doubt whether this is a better way to construct a compiler, especially an open-source one.

As regards MCPP, it is a nuisance that gcc arbitrarily passes some irrelevant options to the preprocessor.
Ignoring every option that MCPP does not recognize would be risky, so I did not adopt that approach. Although MCPP ignores pseudo-options such as -c or -m*, which gcc frequently passes, it reports an error if other unexpected options are passed. To avoid conflicts with those irrelevant options, MCPP renamed some of its own options: -c to -@compat, -m to -e, and some others.

To use MCPP with GNU C 3.2 or earlier, it is only necessary to replace the invocation of cpp0 with MCPP. To use MCPP with GNU C 3.3 or 3.4, the invocation of cc1 must be divided into an invocation of MCPP followed by one of cc1. The shell script for this purpose is src/set_mcpp.sh, which 'configure' and 'make' install in the GNU C libexec directory. The configure script also gets the GNU C predefined macros using the -dM option and sets them for MCPP.

In addition, GNU C 3.4 changed the processing of multi-byte characters. It converts every encoding to UTF-8 in the first phase of preprocessing using libiconv functions [1]. It has the '-finput-charset=' option to specify the source file's encoding and the '-fexec-charset=' option to specify the output encoding, though their behavior is still buggy. There is a tendency to identify "internationalization" with "unicodization", especially among Westerners who do not use multi-byte characters. It seems that this trend has reached GNU C.

MCPP ported to GNU C 3.3 or earlier inserts a backslash before any byte of a multi-byte character whose value coincides with a special character such as '"' or '\\', when the encoding is BIG-5, shift-JIS or ISO2022-JP, in order to make up for GNU C's inability to handle such bytes. MCPP ported to GNU C 3.4, however, neither inserts backslashes into multi-byte characters nor converts them to UTF-8; it outputs them as they are, because:

1. Since the '-fexec-charset' option does not work for cc1, an encoding converted to UTF-8 cannot be converted back. If the encoding is not converted, cc1 outputs multi-byte characters as they are, except for shift-JIS and ISO2022-JP (i.e. for EUC-JP, GB2312, BIG-5, KSC-5601 and UTF-8).
In any case, cc1 cannot handle shift-JIS or ISO2022-JP.

2. I hope that GNU C will change its multi-byte character handling in the near future.

Note:
[1] Multi-byte character conversion to UTF-8 is not enabled in GNU C 3.4.2 bundled with FreeBSD 5.3.

3.10 Visual C++ .net System Header Problems

I used MCPP to preprocess some sample programs provided with Visual C++ .net 2003. The system headers seem to have only the few compatibility problems shown below. These problems are often seen in other compiler systems too and do not have a serious impact on preprocessing.

1. Although Visual C++ .net scarcely implements the C99 specifications, // comments are often used in its system headers.

2. Object-like macro definitions that expand to function-like macro names are sometimes found.

3. There is one erroneous macro definition in limits.h. (See Note 2 in 4.1.3.1 of cpp-test.txt.)

Although the Linux and glibc system headers often contain coding based on GNU C-specific features, the Visual C++ system headers contain scarcely any Visual C++-specific coding.

3.10.1 Comment Generating Macro?

I found only one outrageous macro in Visual C++. Vc7/PlatformSDK/Include/WTypes.h has the following macro definition:

    #define _VARIANT_BOOL   /##/

This macro is used in oaidl.h and propidl.h in Vc7/PlatformSDK/Include/ as follows:

    _VARIANT_BOOL bool;

What does this macro aim at? It seems to expect _VARIANT_BOOL to be expanded into // so that the line is commented out. Actually, this expectation is met by Visual C's cl.exe!

In the first place, // is not a token (preprocessing-token). Macro definitions are processed and expanded after the source has been decomposed into tokens and each comment has been converted into one space. Therefore, it is irrational for a macro to generate a comment. When this macro is expanded into //, the result is undefined because // is not a valid preprocessing-token.
To use these header files with MCPP, comment out this macro definition and change the many _VARIANT_BOOL lines as follows:

    #if !__STDC__ && (_MSC_VER <= 1000)
    _VARIANT_BOOL bool;
    #endif

If you use only Visual C 5.0 or higher, this line can simply be commented out as follows:

    // _VARIANT_BOOL bool;

This macro is, indeed, out of the question; however, it is Visual C/cl.exe, which allows such an outrageous macro to be processed as a comment, that should be blamed. This example reveals the following serious problems of this preprocessor:

1. Preprocessing is not token-based but character-based.

2. The result of macro expansion is treated as a comment, which indicates that the translation phases are confused.

Probably, the cl.exe preprocessor was developed from a very old, somewhat character-based preprocessor, and it is easy to presume that it has been upgraded by repeated partial revisions. Many preprocessors presumably have a very old program structure; GNU C 2/cpp, discussed in 3.9, is one of them. Repeated partial revision of such a preprocessor only makes its program structure more complicated, and however much revision is made, there are limits to the quality such a preprocessor can achieve. Unless the old source is abandoned and completely rewritten, a clear and well-structured preprocessor cannot be obtained. For GNU C 3/cpp, GNU C 2/cpp was totally revised and the entire source code was rewritten, so GNU C 3/cpp has become quite different from GNU C 2/cpp. Although MCPP was initially developed from the source of an old preprocessor, DECUS cpp, its source code was soon totally rewritten.

4. Implementation-defined Behaviors

I have neither time nor space to write all the C preprocessor specifications here. For details on Standard C preprocessing, refer to cpp-test.txt. For MCPP behaviors in each pre-Standard mode, refer to 4.1.3 of mcpp-porting.txt.
This chapter covers several preprocessor-related specifications, including those that Standard C calls implementation-defined. For more details on MCPP specifications for each compiler system, see Chapter 5, "Diagnostic Messages".

4.1 Status Value on Exit

The header file internal.H defines the values MCPP returns to its parent process. MCPP returns 0 on success. On error, it returns the value of errno if errno is nonzero, and 1 if errno is 0. Success means that no error has occurred.

4.2 Include Directory Search Path

This section explains the order in which MCPP searches directories for an include file when it encounters a #include directive.

1. If the argument of a #include directive takes neither the form "file-name" nor <file-name> and is a macro, the macro is expanded. The result must take the form of either "file-name" or <file-name>; otherwise, it causes an error.

2. If the resulting filename, in either form, is a full path name, MCPP tries to open it. If this fails, it causes an error.

3. If the resulting filename is not a full path but takes the form "file-name", MCPP regards it as a filename relative to the current directory or to the source file directory and begins searching from that directory. The former is the directory from which MCPP was invoked; the latter is the directory where the source file containing the #include "file-name" line resides. Depending on the specified options and the compiler system, MCPP begins searching as follows:

    -I1: from the current directory.
    -I2: from the source file directory.
    -I3: from the current directory first and then the source file directory.

By default, MCPP ported to UNIX compiler systems, GNU C or Visual C begins searching from the source file directory. Other MCPP begins at the current directory; this includes MCPP ported to Borland C 4 or lower. MCPP ported to Borland C 5, however, searches the current directory first and then the source file directory.
If MCPP fails to find the desired file, it continues searching as described in step 4. In a nested #include, if the search begins at the current directory, the base directory is always the same. If the search begins at the source file directory, the base directory changes whenever the including file resides in a different directory.

4. If the resulting filename is not a full path name but takes the form <file-name>, MCPP searches directories in the following order. If any of these directories is specified as a relative path using "..", MCPP regards it as relative to the current directory. If MCPP fails to find or open the desired file after searching all the directories in this order, it causes an error.

4.1. Directories specified with the -I option on MCPP invocation. If several directories are specified, they are searched in the order specified (leftmost first).

4.2. For GNU C ported MCPP, directories specified with the -isystem option. If several directories are specified, they are searched in the order specified (leftmost first).

4.3. Directories specified with an environment variable. ENV_C_INCLUDE_DIR in system.H defines the environment variable name. In C++, ENV_CPLUS_INCLUDE_DIR, if defined, takes precedence over ENV_C_INCLUDE_DIR. GNU C ported MCPP uses C_INCLUDE_PATH (and also CPLUS_INCLUDE_PATH for C++) as the default environment variable. Plan 9 ported MCPP uses 'include' as the default. Other MCPP uses INCLUDE (and also CPLUS_INCLUDE for C++) as the default. If an environment variable specifies several directories separated by a delimiter, they are searched in the order specified. DOS/Windows, Plan 9 and other OSs use ";", space and ":" as the delimiter, respectively.

4.4. Implementation-specific directories defined by the CPLUS_INCLUDE_DIR? macros in system.H.

4.5. Site-specific directories defined by setsysdirs() in system.c (/usr/local/include for UNIX systems).

4.6. Implementation-specific directories defined by the C_INCLUDE_DIR? macros in system.H.

4.7. System-specific directories defined by setsysdirs() in system.c (/usr/include for UNIX systems).

The total number of include directories above must not exceed NINCLUDE in system.H. With the -I- option (-nostdinc for GNU C ported MCPP and -X for Visual C ported MCPP), the directories of 4.4 and later in step 4 are not searched.

The ANSI C Rationale says the ANSI committee intends the current directory to be the base directory. I think this is acceptable in that the base directory is always constant and the specification is clearer. However, some implementations, such as UNIX ones, seem to use the source file directory as the base directory, at least for #include "header".

4.3 How to Construct Header Name

This section explains how MCPP constructs a header-name pp-token and extracts a file name from it.

1. If the source code contains a header file name in string literal format, MCPP regards it as a header-name and removes the double quotes at both ends to construct a filename. This also applies to a string literal resulting from macro expansion.

2. If the source code contains a header file name in the <file-name> format, MCPP regards it as a header-name and removes the < and > at both ends to construct a filename.

3. If the source code contains a macro that expands to <file-name>, MCPP removes the < and > at both ends, as well as all the spaces, to construct a filename.

4. In any case, MCPP converts \ to /, although both "\" and "/" can be used as path delimiters under DOS/Windows.

5. Under DOS/Windows, all uppercase letters in file names are converted into lowercase letters.

4.4 Evaluation of #if Expression

If both the host and target compilers have the type "unsigned long" (HAVE_UNSIGNED_LONG == TRUE), mcpp_std in C90 mode evaluates a #if expression in "long" or "unsigned long". If HAVE_UNSIGNED_LONG == FALSE, mcpp_std evaluates it only in "long". Mcpp_prestd evaluates it in "long" in all cases.
In compiler systems having the type "long long", if __STDC_VERSION__ is set to 199901L or higher using the -V199901L option, mcpp_std evaluates a #if expression in "long long" or "unsigned long long", according to the C99 specification. This also applies to C++ when mcpp_std is invoked with the -V199901L option. Visual C and Borland C 5.5 do not have a "long long" type but have an __int64 type of the same length, so a #if expression in C99 mode is evaluated in __int64/unsigned __int64. (However, since the LL and ULL suffixes cannot be used in Visual C++ .net 2002 or earlier and Borland C 5.5, these suffixes must not be used anywhere other than in #if lines.)

In addition, when you invoke mcpp_std with the -+ option for C++ preprocessing, mcpp_std evaluates the pp-tokens 'true' and 'false' in a #if expression to 1L and 0L, respectively.

If HAVE_UNSIGNED_LONG == TRUE, mcpp_std in C90 mode evaluates a #if expression as follows. In C99 mode, read "long" and "unsigned long" hereinafter, until the end of 4.5, as "long long" and "unsigned long long", respectively.

1. An integer constant token with a U suffix, including character constants, is evaluated in unsigned long. (Note that mcpp_prestd does not recognize the U suffix.)

2. Otherwise, a token within the range of non-negative long is evaluated in long.

3. Otherwise, a token within the range of unsigned long is evaluated in unsigned long.

4. Otherwise, it is diagnosed as an out of range error.

5. In a binary operation, if either operand is unsigned long, both are converted to unsigned long. Otherwise, the operation is performed in signed long.

Note that an integer constant token always has a non-negative value: in -1, the minus sign is a unary operator applied to the token 1.

If either the host or the target compiler does not have the type unsigned long, an integer constant token is evaluated within the range of non-negative long. A token beyond that range is diagnosed as an out of range error. All operations are performed within the range of long.
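Rule 5 can be illustrated with a short sketch; the macro and function names are hypothetical and not part of MCPP, and any Standard C compiler-system is assumed. Because 0u is unsigned, -1 is converted to the maximum unsigned value, so the first comparison is false both in the #if line and in the compiler-proper, while the all-signed comparison is true. mcpp_std also warns about such a conversion of a negative value to unsigned.

```c
/* Rule 5: if either operand is unsigned, both are converted to unsigned.
   -1 is the unary minus operator applied to the non-negative token 1;
   converted to unsigned it becomes the maximum unsigned value, so the
   comparison below is false. */
#if -1 < 0u
#define PP_MINUS_ONE_LT_ZERO_U 1
#else
#define PP_MINUS_ONE_LT_ZERO_U 0
#endif

/* Both operands are signed here, so the comparison is true. */
#if -1 < 0
#define PP_MINUS_ONE_LT_ZERO 1
#else
#define PP_MINUS_ONE_LT_ZERO 0
#endif

/* The compiler-proper applies the same kind of conversion at run time
   (int to unsigned int here), so both phases agree in this case. */
int runtime_minus_one_lt_zero_u(void)
{
    return -1 < 0u;
}
```

Cases involving sign extension or values near the limits of long are where the preprocessor and the compiler-proper are most likely to disagree, since the preprocessor works in long/unsigned long (or the maximum integer type in C99) rather than in the operands' own types.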
If both the host and target compilers have the type unsigned long and the host's unsigned long has a narrower range than the target's, a token beyond the host's range is diagnosed as an out of range error.

If an operation in long on constant tokens produces a result out of the range of long, an out of range error occurs. If an operation in unsigned long produces a result out of the range of unsigned long, a warning is issued. This also applies to intermediate results.

Since a bitwise right shift of a negative value, or a division involving a negative value, is not portable, mcpp_std issues a warning for it. A warning is also issued if an operation on a mixture of unsigned and signed operands converts a signed negative value to an unsigned positive value. How these values are processed depends on the specification of the compiler-proper of the host system.

C90 makes it a rule that a preprocessor evaluates a #if expression in long/unsigned long (in C99, in the maximum integer type). These specifications are rougher than those of compiler-propers, so a (#)if expression is often evaluated differently by the preprocessor and the compiler-proper, especially when sign extension is involved. In addition, since keywords are not recognized during Standard C preprocessing, neither sizeof nor a cast can be used in a #if expression. Of course, neither variables, enumeration constants, nor floating point numbers can be used there. Mcpp_std allows the "defined" operator in #if expressions as well as in #elif directives. Except for these differences, MCPP evaluates a #if expression according to the precedence and associativity of the operators, just as compiler-propers do. In a binary operation, the usual arithmetic conversion often takes place to bring both operands to the same type: if one operand is unsigned long and the other is long, both are converted to unsigned long.

Standard C's <limits.h> shows the ranges of the types long and unsigned long; however, the MCPP source code does not use <limits.h>.
This is because some so-called Standard C conforming compiler systems have an incorrect <limits.h>, and because pre-Standard compiler systems should also be able to compile MCPP.

4.5 Character Constant Evaluation in #if Expression

Constant tokens in a #if expression include identifiers (macros and non-macros), integer tokens and character constants. How character constants are evaluated is implementation-defined and not portable. Even (#)if 'const' is sometimes evaluated differently by the preprocessor and the compiler-proper. Note that Standard C does not even guarantee that (#)if 'const' evaluates to the same value in both. Mcpp_std in 'poststd' mode does not evaluate a character constant in a #if expression, since doing so is almost meaningless, and treats it as an error.

Like other integer constant tokens, MCPP evaluates a character constant in a #if expression within the range of long or unsigned long (in C99 mode, long long or unsigned long long).

A multi-byte character or a wide character is generally evaluated as a 2-byte type, except in the UTF-8 encoding: since UTF-8 has a variable length, MCPP evaluates it as a 4-byte type. MCPP does not support EUC's 3-byte encoding scheme. (A 3-byte character is recognized as 1 byte + 2 bytes; as a consequence, its value is still evaluated correctly.) Although some implementations using a 2-byte encoding scheme define wchar_t as 4 bytes, MCPP does not take wchar_t into account. The following paragraphs describe two-byte multi-byte character encodings.

Multi-byte character constants such as '字' are evaluated to ((first byte value << CHARBIT) + second byte value), where CHARBIT has the value of CHAR_BIT in <limits.h>.

Take as examples the multi-character character constants 'ab', '\x12\x3' and '\x123\x45'. Each of 'a', 'b', '\x12', '\x3', '\x123' and '\x45' is regarded as one byte.
When a multi-character character constant is evaluated, each byte, starting from the most significant (leftmost) one, is evaluated within the range [0, UCHARMAX], and the bytes are combined by shifting left by CHARBIT. If the value of an escape sequence exceeds UCHARMAX, an out of range error occurs. Therefore, in an implementation with CHARBIT == 8 and the ASCII character set, the above three tokens are evaluated to 0x6162, 0x1203 and an error, respectively.

L'字' is evaluated to the same value as '字'.

Next, take as examples the multi-character wide character constants L'ab', L'\x12\x3' and L'\x123\x45'. Each of L'a', L'b', L'\x12', L'\x3', L'\x123' and L'\x45' is regarded as one wide character. When a multi-character wide character constant is evaluated, each wide character, starting from the most significant one, is evaluated within the range [0, (UCHARMAX << CHARBIT) | UCHARMAX], and the wide characters are combined by shifting left by CHARBIT * 2. If the value of an escape sequence exceeds the maximum value of an unsigned 2-byte integer, an out of range error occurs. Therefore, in an implementation with CHARBIT * 2 == 16 and the ASCII character set, the above three tokens are evaluated to 0x00610062, 0x00120003 and 0x01230045, respectively.

If the value of a multi-character character constant or a multi-character wide character constant exceeds the range of unsigned long, an out of range error occurs.

With __STDC_VERSION__ or __cplusplus set to 199901L or higher, MCPP evaluates a Universal Character Name (UCN) of the form \uxxxx or \Uxxxxxxxx as a hex escape sequence. (I know this evaluation is nonsensical, but there is no other way.)

If the compiler-proper of the target compiler system uses signed char or signed wchar_t, a character constant in a (#)if expression may be evaluated differently by MCPP and the compiler-proper. The range that causes an out of range error may also differ between them.
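The combination rule for multi-character constants can be sketched as follows. This is a hypothetical helper, not the code in MCPP's eval.c: it assumes the value of each character or escape sequence has already been decoded, most significant first, and it takes the shift width as a parameter (CHARBIT for 'ab', CHARBIT * 2 for L'ab'), which must be smaller than the number of bits in unsigned long.

```c
#include <limits.h>
#include <stddef.h>

/* Combines the decoded element values vals[0..n-1] (most significant
   first) by shifting left by width bits for each element.
   Returns 0 and stores the value in *result on success.
   Returns -1 (out of range) if an element does not fit in width bits,
   e.g. '\x123' with 8-bit elements, or if the total does not fit in
   unsigned long.  Assumes 0 < width < bits in unsigned long. */
int multichar_value(const unsigned long *vals, size_t n, unsigned width,
                    unsigned long *result)
{
    unsigned long max_elem = (1UL << width) - 1;  /* UCHARMAX for 8 bits */
    unsigned long acc = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (vals[i] > max_elem)
            return -1;              /* escape sequence value too large */
        if (acc > (ULONG_MAX >> width))
            return -1;              /* shifting would overflow */
        acc = (acc << width) | vals[i];
    }
    *result = acc;
    return 0;
}
```

With ASCII values, the elements {'a', 'b'} give 0x6162 with a width of 8 and 0x00610062 with a width of 16, while {0x123, 0x45} is an out of range error with a width of 8 and gives 0x01230045 with a width of 16, matching the results described above.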
In addition, the evaluation of multi-character character constants and multi-byte character constants varies even among preprocessors and among compilers. Standard C does not stipulate whether, with CHAR_BIT set to 8, 'ab' is evaluated to 'a' * 256 + 'b' or to 'a' + 'b' * 256.

In general, character constants should not be used in #if expressions when an alternative method is available, and I think an alternative always exists.

4.6 #if sizeof (type)

Standard C stipulates that preprocessing is independent of run-time environments and compiler-proper specifications, thus prohibiting 'sizeof' and casts in a #if expression. However, mcpp_prestd allows sizeof (type) in a #if expression when OK_SIZE is set to TRUE in system.H. This was done as part of my effort to add necessary modifications to the original version, such as long long and long double processing, while retaining its original functionality. As for casts, I neither implemented them nor intend to, because that would require troublesome work.

How sizeof is evaluated depends on the run-time environment, so you have to be careful when using it. For example, under MS-DOS the value of sizeof (type) varies from one memory model to another. In addition, MCPP does not recognize compiler-system-specific modifiers such as 'near' and 'far'. For other OSs with something like memory models, mem_model() in system.c must be modified to enable sizeof.

A series of macros beginning with S_, such as S_CHAR, in eval.c define the size of each type. In a cross implementation, these macros must be modified to specify, as integer values, the sizes of the types in the target system. The HAVE_LLONG and HAVE_LDBL macros only indicate whether the compiler system allows sizeof (long long) or sizeof (long double).
If the host and target implementations handle these types differently, macros such as S_LLINT and S_PLLINT must be modified in accordance with the target, regardless of macros such as HAVE_LLONG.

I have to admit that MCPP does not provide the full functionality of #if sizeof. MCPP simply ignores the words "signed" and "unsigned" preceding char, short, int, long and long long when they appear in #if sizeof. So, in an implementation without unsigned types such as unsigned char and unsigned long, MCPP does not treat "unsigned char" or "unsigned long" as an error but returns the size of char or long. I know this is a half-hearted implementation, but I do not want to increase the number of flags in system.H needlessly for this Standard-nonconforming feature. I initially thought of removing the sizeof code from the original version because I did not intend to support casts at all, but on second thought I decided to make a few modifications to make use of the existing code.

4.7 How to Handle White-Space Sequence

MCPP compresses each white-space sequence other than <newline> that serves as a token separator into one space character during tokenization in translation phase 3. It also deletes a white-space sequence at the end of a line. How a white-space sequence at the beginning of a line is handled varies among implementations: some compress it and others delete it. These compressions and deletions take place in phase 3. The next phase, phase 4, involves macro expansion and the processing of preprocessor directive lines. Macro expansion may sometimes produce several space characters before and after the macro. Of course, the number of space characters does not affect compilation results.

Standard C makes it implementation-defined whether a white-space sequence is compressed into one space character in translation phase 3, but you usually do not have to worry about this.
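The phase 3 behavior described above can be sketched for a single logical line as follows. This is an illustration under stated assumptions, not MCPP's tokenizer: the function name compress_line is hypothetical, the line is assumed to have already had its comments replaced and its line splices removed, the keep_top_space flag stands in for the TOP_SPACE setting described in 4.8, and string literals and character constants are copied unchanged.

```c
#include <stddef.h>

/* White space other than newline that this sketch compresses. */
static int is_hspace(char c)
{
    return c == ' ' || c == '\t' || c == '\v' || c == '\f';
}

/* Copies one logical line (without its newline) from src to dst:
   each white-space run between tokens becomes one space, trailing white
   space is deleted, and leading white space becomes one space if
   keep_top_space is nonzero or is deleted otherwise.  The output is never
   longer than the input.  Returns the length of dst. */
size_t compress_line(const char *src, char *dst, int keep_top_space)
{
    size_t n = 0;
    int pending = 0;            /* nonzero after a run of white space */

    while (*src) {
        if (is_hspace(*src)) {
            pending = 1;
            src++;
            continue;
        }
        /* Emit one space for the preceding run, unless the run is at
           the start of the line and leading space is to be deleted. */
        if (pending && (n > 0 || keep_top_space))
            dst[n++] = ' ';
        pending = 0;
        if (*src == '"' || *src == '\'') {
            /* Copy a string literal or character constant verbatim. */
            char quote = *src;
            dst[n++] = *src++;
            while (*src && *src != quote) {
                if (*src == '\\' && src[1] != '\0')
                    dst[n++] = *src++;  /* backslash of an escape */
                dst[n++] = *src++;
            }
            if (*src)
                dst[n++] = *src++;      /* closing quote */
        } else {
            dst[n++] = *src++;
        }
    }
    /* A run still pending here is trailing white space; it is dropped. */
    dst[n] = '\0';
    return n;
}
```

For example, "  int\t\tx  =  1 ;  " becomes " int x = 1 ;" with keep_top_space set and "int x = 1 ;" without it, while the spaces inside a string literal such as "a   b" are kept.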
A vertical-tab or form-feed character in a preprocessor directive line may adversely affect portability, since this is undefined in Standard C. MCPP converts it into one space character.

4.8 Default Specifications for MCPP Executables

This subsection describes the specifications of the MCPP executables generated when the DIFfile and makefile for each compiler system in the noconfig directory are used to compile MCPP with default settings. When a configure script is used to compile MCPP, the generated MCPP may differ depending on configure's results. However, as long as the OS and compiler system versions are the same, the generated MCPPs should be the same except for include directories.

DIFfiles and makefiles are provided for the following combinations of compiler systems and OSs:

    FreeBSD 5.3 (GNU C V.3.4)
    VineLinux 3.1-i386 (GNU C V.2.95, V.3.2, V.3.3)
    CygWIN 1.13 (GNU C V.2.95)
    LCC-Win32 V.3.2
    Visual C++ .net 2003
    Borland C++ V.5.5J / Win32
    Borland C++ V.4.02J / MS-DOS
    Plan 9 ed.4 / pcc

Borland C ported MCPP has two versions: a 32-bit version for BCC 5.5 and a 16-bit version for BCC 4.0. The 16-bit version is compiled with the large model.

MCPP V.2 allows you to generate various MCPP executables that behave according to various sets of specifications by changing macro definitions in the system.H source. Of these macros, MODE is the most important in that it determines the basic behavior of MCPP. This subsection describes one type of MCPP executable: mcpp_std, which is compiled with MODE == STANDARD. Mcpp_std behaves by default according to the Standard C specifications and has an execution option for 'poststd' mode. The 'poststd' mode is a set of preprocessing specifications I created to reorganize the Standard C preprocessing specifications by eliminating or correcting irregular ones. For details, see section 2.4. You can use mcpp_std in 'poststd' mode as a Standard C preprocessor for ordinary programs.
Of all the macros defined in noconfig.H and system.H, the settings of those mentioned below are identical in every MCPP executable, regardless of compiler system and mode.

HAVE_PRAGMA is set to TRUE since their compiler-propers accept the #pragma line. Since these executables are Standard C conforming preprocessors, OK_UNTERM_STRING is set to FALSE. DOLLAR_IN_NAME is also set to FALSE, so '$' cannot be used in names. CONCAT_STRINGS is set to FALSE, so string literals are not concatenated.

Mcpp_std is compiled with OK_DIGRAPHS == TRUE and DIGRAPHS_INIT == FALSE, so digraphs are enabled when the -2 (-digraphs) option is specified. With OK_PRAGMA_OP set to TRUE, the -V199901L option enables the _Pragma() operator. With OK_MBIDENT set to FALSE, multi-byte characters cannot be used in identifiers. With STDC set to 1, the initial value of __STDC__ is 1. With OK_TRIGRAPHS == TRUE and TFLAG_INIT == FALSE, trigraphs are enabled with the -3 (-trigraphs) option. With OK_UCN set to TRUE, Universal Character Names (UCNs) can be used in C99 and C++. With DEBUG and DEBUG_EVAL set to TRUE, the #pragma MCPP debug directive outputs various debug information. With OK_MAKE == TRUE, MCPP implements an option to output dependency lines for makefiles. With TOP_SPACE set to TRUE, a white-space sequence at the beginning of a line is basically compressed into one space character. In 'poststd' mode, however, neither trigraphs nor UCNs can be used, and white spaces at the beginning of a line are usually removed.

The translation limits are set as follows:

    NMACPARS     (Maximum number of macro arguments)                  : 255
    NEXP         (Maximum number of nested levels of #if expressions)  : 256
    BLK_NEST     (Maximum number of nested levels of #if sections)     : 256
    RESCAN_LIMIT (Maximum number of nested levels of macro rescans)    : 64

The settings of the macros below differ among compiler systems. (BC4/16 refers to the 16-bit version of Borland C 4.0.)
    STDC_VERSION (Initial value of __STDC_VERSION__)
        GNU C 2            : 199409L
        Others             : 0L
    HAVE_DIGRAPHS (Are digraphs output as they are?)
        GNU C and Visual C : TRUE
        Others             : FALSE
    EXPAND_PRAGMA (Is a #pragma line macro-expanded in C99?)
        Visual C           : TRUE
        Others             : FALSE
    IDMAX (Valid length of an identifier)
        BC4/16             : 255
        Others             : 1024
    NINCLUDE (Maximum number of include directories)
        BC4/16             : 16
        Others             : 64

NBUFF, which specifies the maximum length of a source line after comments are converted into space characters and lines are spliced, and NWORK, which specifies the maximum length of an output line, are both set to the following value:

        BC4/16             : 4096
        Others             : 65536

NMACWORK, which specifies the size of the internal buffers used for macro expansion, is set to four times the value of NBUFF (and NWORK).

No restriction is imposed on the nesting level of #include, which means that it can exceed the OS limit on the number of simultaneously opened files.

GNU C 2.7-2.95 predefines __STDC_VERSION__ as 199409L. In GNU C V.3.x, however, __STDC_VERSION__ is no longer predefined by default; it is defined according to an execution option. The MCPP setting for GNU C follows these variations.

If STDC_VERSION is set to 0L, MCPP predefines __STDC_VERSION__ as 0L. Specifying the -V199409L option sets __STDC__ and __STDC_VERSION__ to 1 and 199409L, respectively, and leaves only the predefined macros that begin with '_', which puts MCPP in strictly C95-conforming mode. The -V199901L option specifies C99 mode. In C99 mode, MCPP predefines __STDC_HOSTED__ as 1.

MCPP itself predefines none of __STDC_ISO_10646__, __STDC_IEC_559__ and __STDC_IEC_559_COMPLEX__. These values are compiler-system-specific. In glibc 2 / x86, the system header defines __STDC_IEC_559__ and __STDC_IEC_559_COMPLEX__ as 1. Other compiler systems do not define them.

If HAVE_DIGRAPHS is set to FALSE, digraphs are converted into the equivalent ordinary tokens before being output.
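Because MCPP built with STDC_VERSION == 0L predefines __STDC_VERSION__ as 0L instead of leaving it undefined, "#ifdef __STDC_VERSION__" alone cannot tell C90 from C95 or C99 in that configuration. The following is a minimal sketch of a value-based test; the function names are hypothetical, chosen only for illustration.

```c
/* Hypothetical helper: the Standard version announced by the
   preprocessor.  Under MCPP built with STDC_VERSION == 0L the macro is
   defined but equals 0L, so only a comparison of its value is reliable. */
long stdc_version(void)
{
#if defined(__STDC_VERSION__)
    return __STDC_VERSION__;
#else
    return 0L;              /* C90 or pre-Standard: macro not defined */
#endif
}

/* Nonzero when C95 (199409L) or later may be assumed,
   e.g. MCPP invoked with -V199409L. */
int at_least_c95(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199409L
    return 1;
#else
    return 0;               /* also taken when __STDC_VERSION__ is 0L */
#endif
}

/* Nonzero when C99 (199901L) or later may be assumed,
   e.g. MCPP invoked with -V199901L. */
int at_least_c99(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return 1;
#else
    return 0;
#endif
}
```

Testing the value rather than mere definedness works the same way whether a preprocessor leaves the macro undefined or defines it as 0L.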
Include directories are set as follows.

System-specific or site-specific directories on UNIX and UNIX-like OSs:

    FreeBSD, Linux and CygWIN : /usr/include, /usr/local/include
    Plan 9                    : /sys/include/ape, /$objtype/include/ape

Implementation-specific directories, which vary among compiler systems and their versions:

    C_INCLUDE_DIR1 macro is set to:
        FreeBSD 5.3 / GNU C 3.4.2    : (none)
        VineLinux 3.1 / GNU C 2.95.3 : /usr/lib/gcc-lib/i386-vine-linux/2.95.3/include
        VineLinux 3.1 / GNU C 3.2    : /usr/local/gcc-3.2/include
        VineLinux 3.1 / GNU C 3.3.2  : /usr/lib/gcc-lib/i386-vine-linux/3.3.2/include
        CygWIN 1.13 / GNU C 2.95.3-5 : /usr/lib/gcc-lib/i686-pc-cygwin/2.95.3-5/include
        BCC5.5                       : /BCC55/INCLUDE
        BCC4                         : /BC4/INCLUDE
    C_INCLUDE_DIR2 macro is set to:
        VineLinux 3.1 / GNU C 3.2    : /usr/local/gcc-3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2/include
    CPLUS_INCLUDE_DIR1 macro is set to:
        FreeBSD 5.3 / GNU C 3.4.2    : /usr/include/c++
        VineLinux 3.1 / GNU C 2.95.3 : (none)
        VineLinux 3.1 / GNU C 3.2    : /usr/local/gcc-3.2/include/c++/3.2
        VineLinux 3.1 / GNU C 3.3.2  : /usr/include/c++/3.3.2
        CygWIN 1.13                  : /usr/include/g++-3
    CPLUS_INCLUDE_DIR2 macro is set to:
        FreeBSD 5.3 / GNU C 3.4.2    : /usr/include/c++/backward
        VineLinux 3.1 / GNU C 3.2    : /usr/local/gcc-3.2/include/c++/3.2/i686-pc-linux-gnu
        VineLinux 3.1 / GNU C 3.3.2  : /usr/include/c++/3.3.2/i386-vine-linux
        Linux / GNU C 3.2            : /usr/local/gcc-3.2/include/c++/3.2
    CPLUS_INCLUDE_DIR3 macro is set to:
        VineLinux 3.1 / GNU C 3.2    : /usr/local/gcc-3.2/include/c++/3.2/backward
        VineLinux 3.1 / GNU C 3.3.2  : /usr/include/c++/3.3.2/backward

The macros that define these directories are found in noconfig.H or config.h. MCPP for Visual C or LCC-Win32 does not use these macros but uses the environment variables INCLUDE and CPLUS_INCLUDE instead. If these default settings do not suit you, change the settings and recompile MCPP, or use the environment variables or the -I option.
When the length of a preprocessed line exceeds NWORK-1, MCPP divides it into several parts so that each part is NWORK-1 or less in length. The length of a string literal must be NWORK-2 or less.

Note again that the macros mentioned above are used only to compile MCPP; they are not built-in macros of an MCPP executable.

MCPP has the following built-in macros. A (=value) after a macro indicates that the macro is defined as that value; macros without (=value) are defined as 1. With __STDC__ set to 1 or higher, the macros that do not begin with '_' are deleted. The -N (-undef) option deletes all the macros other than __MCPP. After -N, you can use -D to define macros again. When you use a compiler system version different from those specified here, -N and -D allow you to redefine your version macros without recompiling MCPP. The -D option alone, without -N or -U, also allows you to redefine a particular macro; for other Borland C versions, this is probably enough.

When you invoke MCPP without an input file and enter #pragma MCPP put_defines, the following built-in macros are displayed:

    FreeBSD 5 / GNU C 3.4:
        __i386__, unix, __unix, __unix__, __FreeBSD__ (=5), __GNUC__ (=3), __GNUC_MINOR__ (=4), __MCPP (=2), __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int), __WCHAR_TYPE__ (=int)
    Linux / GNU C 2.95:
        __i386__, unix, __unix, __unix__, __linux__, __GNUC__ (=2), __GNUC_MINOR__ (=95), __MCPP (=2), __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int), __WCHAR_TYPE__ (=long int)
    Linux / GNU C 3.2:
        __i386__, unix, __unix, __unix__, __linux__, __GNUC__ (=3), __GNUC_MINOR__ (=2), __MCPP (=2), __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int), __WCHAR_TYPE__ (=long int)
    Linux / GNU C 3.3.2:
        __i386__, unix, __unix, __unix__, __linux__, __GNUC__ (=3), __GNUC_MINOR__ (=3), __MCPP (=2), __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int), __WCHAR_TYPE__ (=long int)
    CygWIN 1.3.10:
        __i386__, __CYGWIN__, __CYGWIN32__, __GNUC__ (=2), __GNUC_MINOR__ (=95),
        __MCPP (=2), __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int), __WCHAR_TYPE__ (=short unsigned int)
    LCC-Win32:
        __i386__, __WIN32__, WIN32, _WIN32, __FLAT__, __LCC__, __LCCDEBUGLEVEL (=0), __LCCOPTIMLEVEL (=0), __MCPP (=2)
    VC .net 2003:
        __i386__, __WIN32__, _WIN32, WIN32, __FLAT__, _MSC_VER (=1310), _MSC_EXTENSIONS, _M_IX86 (=600), __MCPP (=2)
    BC 5.5 / 32 bits:
        __i386__, __WIN32__, WIN32, __FLAT__, __BORLANDC__ (=0x0550), __TURBOC__ (=0x0550), __MCPP (=2)
    BC 4.0 / 16 bits:
        __8086__, MSDOS, __MSDOS__, __SMALL__, __BORLANDC__ (=0x0452), __TURBOC__ (=0x0452), __MCPP (=2)
    Plan 9 / pcc:
        unix, __unix, __unix__, _PLAN9, __MCPP (=2)

When you use the -+ (-lang-c++) option to specify C++ preprocessing, __cplusplus is predefined with the initial value 1L. In addition, the following macros are predefined:

    GNU C V.2.* : __GNUG__ (=2)
    GNU C V.3.* : __GNUG__ (=3)
    BC 4.0      : __BCPLUSPLUS__ (=0x0320)
    BC 5.5      : __BCPLUSPLUS__ (=0x0550)

Although GNU C has a number of predefined macros, few of them were predefined by GNU C/cpp itself until GNU C V.3.2; most were passed from gcc to cpp with the -D option. So, for compatibility alone, MCPP would not need to define them. However, MCPP predefines these macros so that it can also be used in a stand-alone manner, as in pre-preprocessing. GNU C V.3.3 and later suddenly predefine 60 or 70 macros. MCPP V.2.5 ported to GNU C V.3.3 or later also predefines these macros in addition to the ones above. These GNU C-specific predefined macros are written in the mcpp_g*.h header files, which are generated when MCPP is compiled.

Since FreeBSD, Linux and CygWIN / GNU C, LCC-Win32 and Plan 9 / pcc have the type long long, #if expressions are evaluated in long long or unsigned long long when the -V199901L option is specified. Visual C and Borland C 5.5 do not have a "long long" type but have __int64 and unsigned __int64 instead, and these types are used. In other compiler systems, #if expressions are evaluated in long or unsigned long, just as in C90.
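The built-in macros listed above let a source file find out which preprocessor and compiler system it is being processed by. The sketch below is illustrative only: the function names are hypothetical, and it relies solely on macro names that appear in the lists above (__MCPP, __GNUC__, __GNUC_MINOR__, _MSC_VER, __BORLANDC__, __LCC__, _PLAN9). Under a preprocessor other than MCPP, __MCPP is simply undefined.

```c
/* Hypothetical helper: nonzero when the file was preprocessed by MCPP,
   which predefines __MCPP (=2 for MCPP V.2). */
int preprocessed_by_mcpp(void)
{
#if defined(__MCPP)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: name of the compiler system, judged only from
   the built-in macros shown in the lists above.  Other compilers that
   also define __GNUC__ are reported as "GNU C". */
const char *compiler_system(void)
{
#if defined(__GNUC__)
    return "GNU C";
#elif defined(_MSC_VER)
    return "Visual C";
#elif defined(__BORLANDC__)
    return "Borland C";
#elif defined(__LCC__)
    return "LCC-Win32";
#elif defined(_PLAN9)
    return "Plan 9 / pcc";
#else
    return "unknown";
#endif
}

/* GNU C version as major * 100 + minor (e.g. 304 for V.3.4), or 0. */
int gnuc_version(void)
{
#if defined(__GNUC__) && defined(__GNUC_MINOR__)
    return __GNUC__ * 100 + __GNUC_MINOR__;
#else
    return 0;
#endif
}
```

If a different compiler system version is emulated with -N and -D as described above, these tests see the redefined values.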
In the above compiler systems, the type long ranges over [-2147483647-1, 2147483647] ([-0x7fffffff-1, 0x7fffffff]), and unsigned long ranges over [0, 4294967295] ([0, 0xffffffff]). In the compiler systems with long long, the type long long ranges over [-9223372036854775807-1, 9223372036854775807] ([-0x7fffffffffffffff-1, 0x7fffffffffffffff]), and unsigned long long ranges over [0, 18446744073709551615] ([0, 0xffffffffffffffff]).

The compiler-propers of all the above compiler systems internally represent signed integers in two's complement, and bit operations are performed accordingly. The same applies to MCPP's #if expressions.

Right shift of a negative integer is an arithmetic shift. The same applies to MCPP's #if expressions. (Right-shifting an integer by one bit halves the value, retaining the sign.)

In an integer division or modulus operation, if either or both operands are negative, the operation is performed algebraically, like Standard C's ldiv() function. The same applies to MCPP's #if expressions.

These OSs use the ASCII basic character set, and so does MCPP. The default multi-byte character encoding is EUC-JP for the FreeBSD and Linux ports of MCPP, UTF-8 for the Plan 9 port, and shift-JIS for the other ports.

There is a memory management routine, kmmalloc, which I developed. It provides malloc(), free(), realloc() and other memory handling functions. On systems other than CygWIN, if kmmalloc is installed, it is statically linked when the MALLOC=KMMALLOC (or -DKMMALLOC=1) option is specified to make; kmmalloc is then compiled in _MEM_DEBUG mode and linked with a heap memory debugging routine. (The XMALLOC macro is also defined as 1.) MCPP for Linux and LCC-Win32 uses EFREEP, EFREEBLK, EALLOCBLK, EFREEWRT and ETRAILWRT with the errno values 2120, 2121, 2122, 2123 and 2124 assigned, and the other ports of MCPP use 120, 121, 122, 123 and 124. (Refer to 4.extra of mcpp-porting.txt.) [1]

On systems other than GNU C and Visual C, you must set the environment variable TZ to JST-9 in advance.
Otherwise, the __DATE__ and __TIME__ macros are not set correctly.

Note:
[1] CygWIN 1.13 provides a malloc() that has an internal routine named _malloc_r(), which is called by other library functions. So this malloc() cannot be replaced with my malloc().

5. Diagnostic Messages

5.1 Diagnostic Messages Format

This section covers the diagnostic messages issued by MCPP and their meanings. By default, these messages are output to stderr. With the -Q option, they are redirected to the file mcpp.err in the current directory. A diagnostic message is output in the following manner:

1. "filename: line: " is followed by "fatal error: ", "error: " or "warning: ", and then by one of the diagnostic messages shown in sections 5.3 to 5.9. Although the convention that a diagnostic message must fit on one line beginning with "filename: line:" seems to lack flexibility, I followed it because it is the traditional way of implementing messages in C on UNIX and because various tools already assume it. Some MCPP messages do not fit on one line of an ordinary terminal.

2. If an error occurs during macro expansion, the macro invocation is displayed. For nested macro invocations, MCPP traces back through the nested macros in reverse order and displays the name of each macro. MCPP compiled with DEBUG == TRUE also displays the macro definition, as well as the source filename and line number where the macro is defined.

3. The source file name, the line number and the line at which an error has occurred are displayed. If an error has occurred in an included file, the names, line numbers and #include lines of all the including files are displayed. Usually, a logical line with comments replaced with a space character is displayed. A logical line is constructed from one or more physical lines ending with '\\'. If a comment spreads over several lines, several logical lines are concatenated into one, which is displayed as the line.
In this case, the line number of the last concatenated physical line is displayed. Note that if an error occurs in a translation phase before comments are processed, the line as it stands in that phase is displayed.

If the -j option is specified, MCPP outputs neither 2 nor 3 above.

Diagnostic messages are divided into three levels:

    fatal error: Indicates an error so serious that it is no longer meaningful to continue preprocessing.
    error:       Indicates a syntax or usage error.
    warning:     Indicates code that lacks portability or may contain a bug.

Warnings are further divided into five classes:

    Class 1:  The source code may contain a bug, or at least lacks portability.
    Class 2:  The source code will probably present no problem in practical use, but is problematic in terms of Standard conformance.
    Class 4:  The source code will probably present no problem in practical use, but is problematic in terms of portability.
    Class 8:  Rather superfluous warnings about skipped #if groups, sub-expressions of #if expressions whose evaluation is skipped, concatenation of string literals, etc.
    Class 16: Warnings about trigraphs and digraphs.

Warnings other than those of Class 1 or 2 are rather specific to MCPP.

MCPP has many types of diagnostic messages. For example, mcpp_std provides the following numbers of diagnostic types for each level and class. (These numbers would increase if the types were further subdivided.)

    fatal error      : 19 types
    error            : 75 types
    warning class 1  : 36 types
    warning class 2  :  8 types
    warning class 4  : 15 types
    warning class 8  : 23 types
    warning class 16 :  2 types

In principle, these messages point out the code in question. The diagnostic messages below have sample values embedded in place of tokens or numeric values from the source code. For messages with a macro name embedded, the real messages show the value the macro expands to. Depending on the case, the same message may be issued as a warning or as an error; in that case, this manual gives a detailed description at its first occurrence.
For subsequent occurrences, the message is only listed.

Note that under DOS/Windows, MCPP converts all path-lists and file names in diagnostic messages into lowercase letters for normalization.

5.2 Translation Limits

Of all the errors shown below, some, such as buffer overflows, are due to restrictions in MCPP's specifications. Some macros in system.H define translation limits, such as buffer sizes. If necessary, enlarge the buffer size and recompile MCPP; however, be careful not to enlarge it too much. A large buffer on a system with a limited amount of memory may frequently cause "out of memory" errors.

5.3 Fatal Errors

A fatal error occurs, and preprocessing is terminated, when it is no longer possible to continue preprocessing due to an I/O error or a shortage of memory, or when it is no longer meaningful to continue due to a buffer overflow. A status value of failure is returned to the parent process.

5.3.1 MCPP's Own Bugs

Bug:
    This message has several types. If it is ever issued, it indicates a bug in MCPP itself. Only MCPP compiled with DEBUG and/or DEBUG_EVAL set to TRUE issues this message. I think this message is rarely issued, but should it be issued, please do not hesitate to let me know the situation.

5.3.2 Physical Errors

File read error
    An error has occurred while reading a source file. The disk may have been damaged.
File write error
    An error has occurred while writing to a file. The disk may have been damaged or may be full.
Out of memory (required size is 0x123 bytes)
    Memory has run short. MCPP tried to obtain 0x123 bytes of memory from the heap but failed. This error occurs when there are too many long macro definitions on a system with a small amount of memory. Divide your source file to decrease the number of macro definitions in one translation unit.
5.3.3 Translation Limits and Internal Buffer Errors

Too long header name "long-file-name"
    The length of the full path name of a file to include (the file name concatenated with the specified directory path) has exceeded FILENAMEMAX or NWORK.
Too long source line
    The length of a physical line in the source file has exceeded NBUFF-2. The source code may not be written in C.
Too long logical line
    The length of a logical line, which is constructed from several physical lines ending with \, has exceeded NBUFF-2. This error may occur when a macro definition is too long. Such code should be written as a function rather than as a macro.
Too long line spliced by comments
    The length of a preprocessed line with comments replaced with space characters has exceeded NBUFF-2. This error occurs when a comment spreads over several lines and those lines are concatenated into one. Divide the comment into several parts and write each on a separate line.
Too long output line
    The length of a preprocessed line has exceeded NWORK-2. The line may contain several long macro invocations. Divide the line.
Too long token
    A preprocessed line has a token longer than NWORK-2. MCPP compiled with NWORK < NMACWORK tries to divide the preprocessed line into lengths that the compiler-proper can accept. However, if a line contains a token that is too long, it sometimes fails to do so.

The following four errors may also be caused by a buffer overflow during macro expansion at a token that is not particularly long, in which case you must divide the macro invocation. A buffer overflow may also occur during concatenation of string literals, in which case you must divide the line.

Too long quotation "long-string"
    A string literal, character constant or header-name is too long. In the case of a string literal, divide it. Standard C conforming compiler systems concatenate adjacent string literals for you. (MCPP compiled with CONCAT_STRINGS == TRUE, however, concatenates them itself.)
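Dividing a long string literal, as suggested for "Too long quotation", relies on the Standard C rule that adjacent string literals are joined into one (translation phase 6). A minimal sketch; usage_text and both functions are hypothetical names for illustration.

```c
#include <string.h>

/* One long message written as several adjacent string literals.  Each
   piece is a short token for the preprocessor; the compiler joins the
   pieces into a single null-terminated array. */
static const char usage_text[] =
    "usage: prog [options] file...\n"
    "  -h    show this help\n"
    "  -v    show the version\n";

/* Length of the joined text (30 + 23 + 25 characters). */
size_t usage_length(void)
{
    return strlen(usage_text);
}

/* sizeof counts only one terminating '\0' for the joined array,
   not one per piece. */
size_t usage_size(void)
{
    return sizeof usage_text;
}
```

Note that with CONCAT_STRINGS == TRUE the joining is done by MCPP itself, so the joined literal must still fit in its buffers.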
Too long pp-number token "1234567890toolong"
    A preprocessing-number token is too long. Mcpp_std issues this error.
Too long number token "12345678901234......"
    A number token is too long. Mcpp_prestd issues this error.
Buffer overflow scanning token "token"
    A buffer overflow has occurred during a token scan. This message is issued for tokens other than string literals, character constants, header-names and pp-numbers.
More than BLK_NEST nesting of #if (#ifdef) sections
    The nesting depth of #if, #ifdef and #ifndef has exceeded BLK_NEST. (In the real message, the macro name BLK_NEST is replaced with its actual numerical value. This applies to all the messages below that have a macro name embedded.) Divide the #if section.
Too many include directories "dir"
    The number of include directories specified has exceeded NINCLUDE.
Too many include files
    The number of #included header files preprocessed in one source file has exceeded NINCLUDE*4. A header file included more than once is counted once.

5.3.4 #pragma MCPP preprocessed Related Errors

This is not the preprocessed source
    Although the "#pragma MCPP preprocessed" directive was found, this is not a source file preprocessed by MCPP.
This preprocessed file is corrupted
    This seems to be a source file preprocessed by MCPP, but it cannot be used because it has been corrupted.

5.4 Errors

MCPP issues an error message when it finds a grammatical error. Standard C stipulates that a compiler system should issue a diagnostic message when it encounters a violation of a syntax rule or constraint. In principle, mcpp_std issues an error message for this type of violation, but it sometimes issues a warning instead. MCPP also issues an error message or a warning for most of the undefined behaviors in Standard C. However, MCPP issues neither an error nor a warning for the following undefined items:

1. ' or /* in a header name in the form of a string literal - MCPP regards them as ordinary characters, resulting in a file open error.
   (' or /* in a header name enclosed with < and > is regarded as the beginning of a character constant or a comment, resulting in some errors.) Although how to treat \ in a header name is undefined in Standard C, MCPP does not check it, because it may eventually cause an error when MCPP actually tries to open the file. The DOS/Windows port of MCPP issues a class 2 warning for \ and converts it to /.
2. #undef defined - Although #undef-ing the name "defined" yields an undefined result, MCPP does not issue a message, because MCPP does not allow a macro named "defined" to be defined in the first place, so there is no definition to revoke.
3. Illegal multi-byte character sequence in a comment - Although how to deal with such a character sequence is undefined in Standard C, MCPP does not issue a message, because it does no harm. (MCPP issues a warning for an illegal multi-byte character sequence in string literals, character constants and header names.)
4. Identifiers that begin with _ (reserved for compiler systems) - Although using these identifiers in a user program causes an undefined result, MCPP does not check them, because MCPP does not always have a means to determine whether they are used in a user program or in the compiler system.
5. __STDC_ISO_10646__, __STDC_IEC_559__ and __STDC_IEC_559_COMPLEX__ - Although #defining or #undef-ing these optional C99 predefined macros yields an undefined result, MCPP does not check it, because MCPP does not always have a means to determine whether these macros appear in a user program or in the compiler system. (These macros are most likely to be defined in a header file of the compiler system.)
6. UCN-equivalent sequence - Although C99 leaves undefined how to deal with a UCN-equivalent sequence generated by line splicing (deletion of backslash-newline) in translation phase 2 or by concatenating string literals, MCPP does not issue a message and regards it as a UCN.
For details on what constitutes a violation of syntax rules or constraints, and on what is undefined, unspecified or implementation-defined in Standard C preprocessing, refer to cpp_test.txt.

Even if an error occurs, MCPP continues preprocessing as long as the error is not fatal. When it exits, MCPP shows the number of errors and returns the status of failure to the parent process.

5.4.1 Character and Token Related Errors

Illegal control character 0x1b, skipped the character
    A control code other than a white-space character is found in a string literal, character constant, header name or comment. MCPP skips it and continues preprocessing.

The following five messages are all token-related errors. For the first four, MCPP skips the line in question and continues preprocessing. The first three are errors in string literals and other tokens, indicating that a closing quotation mark is not found before the end of the logical line. This type of error occurs when you write text that does not form a preprocessing-token sequence outside a string literal or comment, as shown below:

    #error I can't understand.

Since preprocessing-tokens are not as strictly defined as the C tokens of the compiler-proper, most character sequences are regarded as token sequences as long as they consist of characters of the source character set. Therefore, it is only this type of coding that causes a preprocessing-token error. Pp-token errors can occur even in a skipped #if group.

Unterminated string literal "string
    A string literal is unterminated. A string literal cannot spread over several logical lines. If necessary, write a string literal on each of several lines and have the compiler concatenate them. This error may also occur when the # operator converts an argument into a string, in which case the line in question is not skipped. Mcpp_prestd in 'oldprep' mode does not treat an unterminated string literal as an error. (Instead, it regards the line end as the end of the literal.)
    Nor does MCPP when invoked with the -a (-lang-asm, -x assembler-with-cpp) option (it issues a warning instead); it regards an unterminated string literal as a literal spreading over several lines, and concatenates each line with the next by inserting \n.
Unterminated character constant 't understand.
    A character constant is unterminated. Mcpp_prestd in 'oldprep' mode does not treat it as an error. (Instead, it regards the line end as the end of the literal.) An unterminated header name causes one of the above two errors, not a separate one. If /* is found in a header-name enclosed with < and >, MCPP regards it and the following text as a comment.
Empty character constant ''
    A character constant is empty.
Illegal UCN sequence
    Mcpp_std recognizes UCNs when invoked with __STDC_VERSION__ set to 199901L or in C++ mode. This message is issued when a hex sequence that begins with \u or \U in an identifier has fewer than four or eight digits, respectively. (If this occurs in a character constant in a #if expression, an undefined escape sequence warning results. MCPP does not check other tokens.)
UCN cannot specify the value "0000007f"
    A UCN cannot specify a hex value in the range 0 to 9f, except for 0x24 ($), 0x40 (@) and 0x60 (`), nor a value in the range d800 to dfff. The former range covers the basic source character set. The latter range is the area reserved for special purposes (UTF-16 surrogates). Note that C++ does not have the latter restriction. (The specifications differ slightly among the Standards for unknown reasons.) However, when MCPP is invoked as C++ with -V199901L to preset the __cplusplus macro to 199901L or higher, MCPP follows the C99 specifications in this respect.
Illegal multi-byte character sequence "XY"
    Mcpp_std compiled with OK_MBIDENT == TRUE allows multi-byte characters in identifiers in C99; however, it reports an error when it finds a character sequence that cannot be regarded as a multi-byte character.
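The value restriction behind "UCN cannot specify the value" can be written as a small predicate. This is a hypothetical sketch derived only from the rules stated above, not MCPP's own code; the cplusplus98 flag is an assumed parameter that models the C++ rule without the d800-dfff restriction.

```c
/* Hypothetical predicate: may a UCN (\uXXXX or \UXXXXXXXX) specify the
   value v?  Follows the rules described above:
     - values below 0xa0 are rejected, except 0x24 ($), 0x40 (@), 0x60 (`);
     - d800-dfff is rejected in C99 (and in C++ with __cplusplus set to
       199901L or higher), but not under the plain C++ rule, which is
       selected by a nonzero cplusplus98. */
int ucn_value_allowed(unsigned long v, int cplusplus98)
{
    if (v < 0xa0UL)
        return v == 0x24UL || v == 0x40UL || v == 0x60UL;
    if (!cplusplus98 && v >= 0xd800UL && v <= 0xdfffUL)
        return 0;
    return 1;
}
```

For example, \u007f is rejected in every mode, \u0024 is accepted, and \ud800 is accepted only when cplusplus98 is nonzero.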
5.4.2 Unterminated Source File Related Errors

This section covers the messages issued when a source file ends with an unterminated line, comment, #if section or macro invocation. If the file is not an included file but marks the end of input, the message says "End of input" rather than "End of file".

These diagnostic messages are issued as errors or warnings, depending on the MCPP mode. Mcpp_std issues them as errors, in which case MCPP skips the line or macro invocation in question and restores the nesting of #if sections to the state it was in when the file was included. On the other hand, mcpp_prestd does not check for a source file that ends with \. In other cases, mcpp_prestd issues warnings. In 'oldprep' mode, mcpp_prestd does not even issue a warning, except for an unterminated macro call.

End of file with no newline, skipped the line
    A file must end with a newline.
End of file with \, skipped the line
    A file must not end with \.
End of file with unterminated comment, skipped the line
    A comment is not terminated.
End of file within #if (#ifdef) section started at line 123
    The #if (#ifdef or #ifndef) on line 123 does not have a corresponding #endif.
End of file within macro invocation started at line 123
    A macro invocation that begins on line 123 is not terminated by the end of the file. This error may occur when an argument has an unbalanced parenthesis, or when a token error occurs between the opening and closing parentheses, in which case MCPP keeps reading tokens in search of the corresponding parenthesis until it reaches the end of the file. (Probably, a buffer overflow will occur before it gets there.) In addition, since macro expansion specifications vary among modes, a macro that is successfully expanded in one mode may not be expanded in another.

5.4.3 Ill-Balanced Preprocessing Group Related Errors

This section covers errors caused by unbalanced #if, #else and similar directives.
Even if MCPP finds these directives unbalanced, it continues processing, assuming that the current group continues. MCPP checks whether directives are balanced even in a skipped #if group.

The #if (#ifdef) section is the block between #if (#ifdef or #ifndef) and #endif. The #if (#elif, #else) group is a smaller block within the #if (#ifdef) section: the block between #if (#ifdef or #ifndef) and #elif, between #elif and #else, or between #else and #endif.

Already seen #else at line 123
    Another #else (#elif) is found after the #else on line 123. An #endif may be missing.
Not in a #if (#ifdef) section
    #else (#elif, #endif) is found without #if (#ifdef or #ifndef).
Not in a #if (#ifdef) section in a source file
    An included file has #else (#elif or #endif) without #if (#ifdef or #ifndef). If the contents of the included file had been in the including source file, this error would not have occurred; in other words, these directives are not balanced within each separate file. Only mcpp_std issues this error. (Mcpp_prestd issues a warning.)

The following two errors occur when #asm and #endasm are not balanced. These messages are issued only by mcpp_prestd ported to particular compiler systems.

In #asm block started at line 123
    A #asm block that begins on line 123 contains another #asm. #asm cannot be nested. The programmer may have forgotten to write #endasm.
Without #asm
    #endasm is found outside a #asm block.

5.4.4 Simple Syntax Errors on Directive Lines

This section covers simple syntax errors on directive lines beginning with #. The errors discussed from here through 5.4.12 do not occur within a skipped #if group. (MCPP invoked with the -W8 option issues a warning for an unknown directive.)

When MCPP finds a directive line with a syntax error, it ignores the line and continues processing; in that case, it neither regards a #if as the beginning of a section nor changes the line number even for a #line.
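The balance rules behind "Already seen #else", "Not in a #if (#ifdef) section" and "End of file within #if (#ifdef) section" can be illustrated with a small stack-based checker. This is a hypothetical sketch, not MCPP's implementation: it stops at the first problem (MCPP reports and continues), looks only at lines whose first non-blank character is #, ignores comments, string literals and line splicing, and uses MAX_NEST as an assumed stand-in for BLK_NEST.

```c
#include <ctype.h>
#include <string.h>

#define MAX_NEST 256            /* hypothetical stand-in for BLK_NEST */

enum balance { BAL_OK, BAL_ELSE_AFTER_ELSE, BAL_NOT_IN_SECTION,
               BAL_UNTERMINATED, BAL_TOO_DEEP };

/* Nonzero if the directive name at p (just after '#') is exactly name. */
static int is_directive(const char *p, const char *name)
{
    size_t n = strlen(name);
    while (*p == ' ' || *p == '\t')
        p++;
    return strncmp(p, name, n) == 0
        && !isalnum((unsigned char)p[n]) && p[n] != '_';
}

/* Checks #if/#ifdef/#ifndef/#elif/#else/#endif balance in lines[0..count). */
enum balance check_balance(const char *const *lines, size_t count)
{
    int seen_else[MAX_NEST];    /* one flag per open #if section */
    size_t depth = 0, i;

    for (i = 0; i < count; i++) {
        const char *p = lines[i];
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p != '#')
            continue;           /* not a directive line */
        p++;
        if (is_directive(p, "if") || is_directive(p, "ifdef")
                || is_directive(p, "ifndef")) {
            if (depth == MAX_NEST)
                return BAL_TOO_DEEP;
            seen_else[depth++] = 0;         /* open a new section */
        } else if (is_directive(p, "elif") || is_directive(p, "else")) {
            if (depth == 0)
                return BAL_NOT_IN_SECTION;
            if (seen_else[depth - 1])       /* #else or #elif after #else */
                return BAL_ELSE_AFTER_ELSE;
            if (is_directive(p, "else"))
                seen_else[depth - 1] = 1;
        } else if (is_directive(p, "endif")) {
            if (depth == 0)
                return BAL_NOT_IN_SECTION;
            depth--;                        /* close the section */
        }
    }
    return depth == 0 ? BAL_OK : BAL_UNTERMINATED;
}
```

A real preprocessor applies the same stack discipline per included file, which is why a directive balanced only across files yields "Not in a #if (#ifdef) section in a source file".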
If the argument of a #include or #line line is a macro, mcpp_std expands the macro and then checks the syntax. Mcpp_prestd does not expand the macro.

Although the messages below do not show the name of the directive in question, the source line that follows the message shows it. (A control line with comments converted into space characters always becomes one line, which is called a "preprocessed line" here.)

Illegal #directive "123"
    The token that immediately follows # is not a name. The token must be a directive name. ('oldprep' mode regards #123 as #line 123.)
Unknown #directive "pseudo-directive"
    The directive "pseudo-directive" is not implemented. MCPP invoked with the -a (-lang-asm or -x assembler-with-cpp) option issues a warning, not an error.
No argument
    #if, #elif, #ifdef, #ifndef, #assert or #line has no argument.
No header name
    A #include line has no argument, or the macro argument of a #include line expands to no token.
Not a header name "UNDEFINED_MACRO"
    The specified argument is not a header name. This message is issued when a macro that should define a header name is not defined. A header name must be enclosed with < and >, or with a pair of double quotation marks.
Not an identifier "123"
    #ifdef, #ifndef, #define or #undef requires an identifier as an argument, but 123 is not an identifier.
No identifier
    #define or #undef has no argument.
No line number
    #line has a macro argument, but it expands to no token.
Not a line number "name"
    The first argument of a #line is not a numeric token (preprocessing number).
Line number "0x123" isn't a decimal digits sequence
    The first argument of a #line must be a decimal integer. Mcpp_std issues this message. In mcpp_prestd, hexadecimal and octal integer tokens are allowed, although a warning is issued.
Line number "2147483648" is out of range of [1,2147483647]
    The first argument of a #line must be within the range 1 to 2147483647; 0 is regarded as an error. This applies to mcpp_std.
    With __STDC_VERSION__ < 199901L or __cplusplus < 199901L, the valid range is 1 to 32767, but a value in the range 32768 to 2147483647 is not regarded as an error; a warning is issued instead.
Not a file name "name"
    The second argument of a #line, if any, must be a string literal. An identifier or a wide string literal is not allowed here.

The following error occurs only in mcpp_std, and the directive is ignored. Mcpp_prestd in 'oldprep' mode issues neither an error nor a warning. Mcpp_prestd in default mode issues a warning and continues preprocessing as if there were no "junk" text.

Excessive token sequence "junk"
    A #else, #endif, #asm or #endasm line has junk text, or such text follows a valid argument of a #ifdef, #ifndef, #include, #line or #undef line.

5.4.5 Syntax Related Errors in #if Expressions

This section covers syntax-related errors in #if, #elif and #assert directives. If a #if (#elif) line has one of these errors, MCPP evaluates it as false, skips the #if (#elif) group, and continues processing. In a skipped #if (#ifdef, #ifndef, #elif or #else) group, MCPP checks the validity of C preprocessing tokens and the balance of these directives, but not other grammatical errors.

A #if line may have a sub-expression whose evaluation is skipped. For example, in #if a || b, if "a" evaluates to true, "b" is not evaluated at all. However, the following 13 types of syntax errors and translation limit errors are checked even if they are located in a sub-expression whose evaluation is skipped.

More than NEXP*2-1 constants stacked at "12"
    The number of constants on the stack exceeded NEXP*2-1 when MCPP tried to evaluate "12" in a #if expression. The #if expression is nested too deeply.
More than NEXP*3-1 operators and parens stacked at "+"
    The total number of operators and parentheses on the stack exceeded NEXP*3-1 when MCPP tried to evaluate '+' in a #if expression. (A pair of parentheses is counted as two.) The #if expression is nested too deeply.
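The difference between a skipped sub-expression and one that is actually evaluated can be seen in a small example. In the sketch below (the macro and function names are hypothetical), the right operand of || or && is never evaluated, so the division by zero in it is not performed and causes no evaluation error. A syntax error in the same position would still be diagnosed, as described above, so it appears only in a comment.

```c
/* The left operand of || is nonzero, so (1 / 0) is skipped and never
   evaluated. */
#if 1 || (1 / 0)
#define SKIPPED_OPERAND_OK 1
#else
#define SKIPPED_OPERAND_OK 0
#endif

/* The left operand of && is zero, so (1 / 0) is skipped again and the
   whole expression is false. */
#if 0 && (1 / 0)
#define AND_TAKEN 1
#else
#define AND_TAKEN 0
#endif

/* By contrast, an unbalanced parenthesis such as "#if 1 || (1 / 0"
   is a syntax error and is reported even in the skipped operand. */

int skipped_operand_ok(void) { return SKIPPED_OPERAND_OK; }
int and_taken(void)          { return AND_TAKEN; }
```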
Misplaced constant "12"
    A #if expression has the constant "12" where no constant should appear. This error occurs when a cast, such as (int)0x8000, is used in a #if expression, where casts are not allowed. Since int is an identifier that is not defined as a macro, it evaluates to 0, so (int)0x8000 becomes (0)0x8000, causing this error.

Operator ">" in incorrect context
    A #if expression has a > operator where no > should appear. If a macro MACRO is defined as zero tokens, #if MACRO > 0 expands to #if > 0, causing this error. The preceding warning, Macro "MACRO" is expanded to 0 token, indicates the cause.

Unterminated expression
    A #if expression is not terminated. This error is caused, for example, by #if a || MACRO with MACRO defined as zero tokens.

Excessive ")"
    A #if expression has a ")" that does not correspond to any "(".

Missing ")"
    A #if expression lacks a ")" that corresponds to a "(".

Misplaced ":", previous operator is "+"
    A ":" appears without a corresponding "?".

Bad defined syntax
    A defined operator in a #if expression has a syntax error, such as an unbalanced parenthesis or a missing identifier in its argument. When a macro expansion causes this error, MCPP compiled with DEBUG == TRUE displays the expansion result after this message.

Can't use a string literal "string"
    A string literal is not allowed as a constant in a #if expression.

Can't use a character constant 'a'
    In 'poststd' mode, neither a character constant nor a wide character constant is allowed as a constant in a #if expression.

Can't use the operator "++"
    A #if expression has an operator that is not allowed there, such as = or ++.

Not an integer "1.23"
    Only integers, including character constants, are allowed as constants in a #if expression.

Can't use the character 0x24
    A #if expression contains a character (code 0x24) that does not belong to any preprocessing token: identifiers, operators, punctuators, string literals, character constants and preprocessing numbers.
    (Control codes are excluded, since they have already been checked.) To avoid this error, MCPP ported to a compiler system that allows $ in identifiers must be compiled with OK_DOLLAR == TRUE. This is not checked in a skipped group.

The following errors are relevant to #if sizeof. Only MCPP compiled with OK_SIZE == TRUE issues them.

sizeof: Syntax error
    A #if sizeof has a syntax error, such as an unbalanced parenthesis or a missing argument.

sizeof: No type specified
    The "type" of #if sizeof (type) is not specified, as in sizeof (*). Note that sizeof ((*)()) is valid syntax for the size of a pointer to a function.

5.4.6 #if Expression Evaluation Errors

The following errors do not occur in a sub-expression whose evaluation is skipped. (In such a sub-expression, MCPP invoked with the -W8 option issues a warning instead.)

Constant "123456789012" is out of range
    An integer constant has a value that exceeds the range of unsigned long. (For compiler systems without unsigned long, the range is that of long, and so forth.) With __STDC_VERSION__ or __cplusplus set to 199901L or higher, the range is that of unsigned long long in compiler systems that have long long, and so forth.

Integer character constant 'abcde' is out of range
    The character constant 'abcde' has a value that exceeds the range of unsigned long (or unsigned long long).

Wide character constant L'abc' is out of range
    The wide character constant L'abc' has a value that exceeds the range of unsigned long. Two UTF-8 encoded characters exceed the range of long, because each character is evaluated as four bytes. When the -V199901L option is used to indicate C99 conformance, up to two characters fall within the range of long long. This error occurs only in mcpp_std.

CHAR_BIT bits can't represent escape sequence '\x123'
    An escape sequence in a character constant exceeds the range of CHAR_BIT bits ([0, UCHAR_MAX]).
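As the introduction to this section notes, these evaluation errors are not raised in a sub-expression whose evaluation is skipped. The recommendation given under "Division by zero" later in this section relies on exactly that. The following is a minimal sketch of the guard, assuming DIVIDEND and DIVISOR are hypothetical macro names, with DIVISOR deliberately left undefined; ratio_is_large() is likewise hypothetical.

```c
/* DIVIDEND is defined; DIVISOR is deliberately left undefined. */
#define DIVIDEND 100

/* "defined DIVISOR" is 0, so && skips its right operand and
 * DIVIDEND / DIVISOR (which would be 100 / 0, since an undefined
 * identifier evaluates to 0) is never evaluated: no "Division by
 * zero" error is raised. */
#if defined DIVISOR && (DIVIDEND / DIVISOR > 10)
#define RATIO_IS_LARGE 1
#else
#define RATIO_IS_LARGE 0
#endif

/* Hypothetical helper that exposes the preprocessor's decision. */
static int ratio_is_large(void)
{
    return RATIO_IS_LARGE;
}
```

Adding #define DIVISOR 5 before the #if would make the right operand evaluate to 20 > 10, and ratio_is_large() would return 1.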
CHAR_BIT*2 bits can't represent escape sequence L'\x12345'
    An escape sequence in a wide character constant exceeds the range of CHAR_BIT*2 bits (CHAR_BIT*4 bits for UTF-8). This error occurs only in mcpp_std.

Division by zero
    A #if expression contains a division by zero, with either / or %. This error may be caused by #if dividend/divisor where divisor is not defined as a macro and therefore evaluates to 0. To avoid this error, write "#if defined divisor && (dividend/divisor ...)".

Result of "op" is out of range
    The result of an operation using the operator op is out of the range of long. Op is one of the binary operators *, /, %, + and -. With two's complement representation, the unary operator - also overflows on -LONG_MIN (for C99, on -LLONG_MIN). An unsigned long (or unsigned long long) operation never overflows, so it does not cause this error; if the result of the corresponding mathematical calculation is out of range, a warning is issued instead.

The following errors are relevant to sizeof. They are not issued in a sub-expression whose evaluation is skipped (there, the -W8 option issues a warning). Only MCPP compiled with OK_SIZE == TRUE issues these messages.

sizeof: Unknown type "type"
    The "type" of #if sizeof (type) is unknown.

sizeof: Illegal type combination with "type"
    A type combination, such as #if sizeof (long float), is invalid. Which type combinations, such as long long, are valid depends on the MCPP settings.

5.4.7 #define Related Errors

This section covers #define related errors. If an error occurs in a #define, the macro is not defined. The errors related to the # and ## operators occur only in mcpp_std, as do the __VA_ARGS__ related errors. Although variable argument macros are a C99 feature, MCPP allows them in C90 and C++ modes for compatibility with GNU C. (A warning is issued.)

"defined" shouldn't be defined
    The name "defined" cannot be defined as a macro. Mcpp_std checks this.

"__STDC__" shouldn't be redefined
    The __STDC__ macro cannot be #defined.
    The same applies to __STDC_VERSION__, __FILE__, __LINE__, __DATE__ and __TIME__ (and to __STDC_HOSTED__ in C99 mode, and __cplusplus when MCPP is invoked with the -+ option). Mcpp_std checks these macros.

"__VA_ARGS__" shouldn't be defined
    C99 allows a variable argument macro to use the __VA_ARGS__ identifier in its replacement list, but this identifier cannot itself be defined as a macro.

More than NMACPARS parameters
    The number of parameters in a macro definition exceeds NMACPARS.

Empty parameter
    A macro definition has an empty parameter.

Illegal parameter "123"
    A token other than an identifier is used as a parameter in a macro definition. In mcpp_std, even the identifier __VA_ARGS__ cannot be used as a parameter.

Duplicate parameter name "a"
    A macro definition has the parameter name "a" more than once.

Missing "," or ")" in parameter list "(a,b"
    A macro definition lacks the ")" that closes the parameter list, or a parameter is followed by neither "," nor ")".

No token before ##
    No token precedes the ## operator in the replacement list of a macro definition.

No token after ##
    No token follows the ## operator in the replacement list of a macro definition.

## after ##
    The replacement list of a macro definition contains the token sequence "## ##". Some implementations may not regard this as an error, but concatenating ## with another token always generates an invalid token, so it always causes an error during macro expansion. MCPP therefore reports it as an error as soon as it finds it in a macro definition.

Not a formal parameter "id"
    In a function-like macro definition, the operand id of the # operator is not a parameter name.

"..." isn't the last parameter
    "..." must be the last parameter of a macro definition. In mcpp_prestd, "..." causes an illegal parameter error.

"__VA_ARGS__" without corresponding "..."
    The identifier __VA_ARGS__ can be used in a replacement list only when the macro has a corresponding "..." parameter.

5.4.8 #undef Related Errors

This section covers #undef related errors.

"__STDC__" shouldn't be undefined
    The __STDC__ macro cannot be #undefined. The same applies to __STDC_VERSION__, __FILE__, __LINE__, __DATE__ and __TIME__ (and to __STDC_HOSTED__ in C99 mode, and __cplusplus when MCPP is invoked with the -+ option). Mcpp_std checks these macros.

5.4.9 Macro Expansion Errors

This section covers macro expansion errors. MCPP compiled with DEBUG == TRUE also displays the macro definition, together with the source filename and line number where it is found. The errors related to the # or ## operator can occur only in mcpp_std.

Less than necessary N argument(s) in macro call "macro( a)"
    A macro invocation has too few arguments; this macro requires N arguments. MCPP assigns zero tokens to each missing argument and continues processing. MCPP does not regard a call with no argument to a macro that takes exactly one parameter as an error, because it cannot distinguish an empty argument from a missing one. Mcpp_prestd in 'oldprep' mode issues a warning instead of an error in this case.

More than necessary N argument(s) in macro call "macro( a, b, c)"
    A macro invocation has too many arguments; the macro takes N arguments. MCPP ignores the surplus arguments and continues processing. In 'oldprep' mode, a warning is issued instead of an error.

Not a valid preprocessing token "+12"
    The ## operator has concatenated two pp-tokens, resulting in the invalid token "+12". It may later be split into separate tokens. Mcpp_std continues processing. Mcpp_std invoked with the -lang-asm (-x assembler-with-cpp, -a) option issues a warning instead.

Not a valid string literal "\\"str\""
    When the # operator converted an argument of a macro invocation into a string, the result was the token sequence "\\"str"" rather than a single valid string literal. A \ that precedes or follows the literal in the argument causes this error. (When mcpp_std converts such an argument into a string, it may or may not also cause an unterminated string literal error.)
    Mcpp_std tries to continue processing, but an error may occur again in the compilation phase. This error cannot occur in 'poststd' mode (although an unterminated string literal error may).

When one of the following errors occurs, the macro invocation is skipped.

Buffer overflow expanding macro "macro" at "something"
    A buffer overflow occurred at "something" during macro expansion. Split the macro into smaller ones.

Unterminated macro call "macro( a, (b, c)"
    A macro invocation is not terminated. This error usually occurs when a macro invocation in a control line is not terminated on that line. In mcpp_std, a macro in an argument is expanded before argument substitution, so a macro invocation in an argument must be terminated within that argument. In 'poststd' mode, a macro invocation that is not terminated within a replacement list also causes this error.

Rescanning macro "macro" more than RESCAN_LIMIT times at "something"
    The macros are nested so deeply that the number of rescans exceeded RESCAN_LIMIT at "something" during expansion. This error occurs only in mcpp_std, and it is quite rare.

Recursive macro definition of "macro" to "macro"
    A macro definition is recursive. This error occurs only in mcpp_prestd: when the number of rescans exceeds RESCAN_LIMIT, mcpp_prestd regards the macro definition as recursive.

5.4.10 #error and #assert

#error
    A #error directive has been executed. The #error line is displayed after this message. If the argument itself contains a token error, such as an unterminated string, #error is not executed. Only mcpp_std supports #error.

Preprocessing assertion failed:
    A #assert directive has been executed and the assertion has failed. The arguments of the #assert line are displayed after this message. If any of the arguments contains an error, mcpp_prestd regards the assertion as failed. Only mcpp_prestd with COMPILER != GNUC supports #assert.

5.4.11 Failure of #include

Can't open include file "file-name"
    The file to be included does not exist or cannot be found.
    The file name is probably misspelled, or an include directory should have been specified.

5.4.12 Other Errors

Operand of _Pragma() is not a string literal
    The _Pragma() operator must take exactly one string literal or wide string literal as its argument. This is checked when mcpp_std compiled with OK_PRAGMA_OP == TRUE is invoked with the -V199901L option. The same applies when mcpp_std is invoked with the -V199901L option in C++ mode.

5.5 Warnings (Class 1)

A warning is issued when code is syntactically correct but possibly contains a coding mistake or has a portability problem. Warnings are divided into five classes: 1, 2, 4, 8 and 16. These classes are enabled with the -W<n> option on MCPP invocation, where <n> is the bitwise OR of any of 1, 2, 4, 8 and 16. Class 4, for example, can be specified explicitly with -W4, and implicitly with -W<n>, where <n> is 5, 6, 7, 12, 13, 14, 15, 20, 21, 22, 23, 28, 29, 30 or 31, because the bitwise AND of <n> and 4 is 4 (true).

Mcpp_std issues an error for most source code that causes undefined behavior in Standard C, but only a warning for some. Likewise, mcpp_std always issues a warning for source code that depends on behavior left unspecified by Standard C, except for the following:

    1. Evaluation order of sub-expressions in a #if expression
       - Although Standard C leaves the evaluation order of the operands unspecified for operators other than ||, &&, ? and :, MCPP does not issue a warning, because a #if expression has no side effects and therefore the evaluation order does not affect the result. MCPP always evaluates integer constant tokens from left to right in the order they appear, and performs each operation according to the operator grouping rules when the values are needed.

Mcpp_std issues a warning for many implementation-defined behaviors, except for the following:

    1. The directories the #include directive searches for the file to include, and how a header-name pp-token is constructed from the argument of #include
       - MCPP does not issue a warning, because there would be too many warnings if it did. Unless the header name is a macro, the source token sequence, including spaces, is used as it is. If it is a macro, the expanded result, including spaces, is used. (In 'poststd' mode, a space character is inserted between pp-tokens during macro expansion, and a header-name is constructed by concatenating the resulting pp-tokens from < to > and then removing the space characters. In 'poststd' mode, a header-name enclosed with < and > is obsolescent.) When MCPP encounters '#pragma MCPP debug path' or '#debug path', it displays the search path instead of issuing a warning.

    2. Evaluation of a single-byte character constant, such as 'a', and of a wide character constant that consists of only one multi-byte character, such as L'字', in a #if expression
       - MCPP does not issue a warning, because even with the same basic character set, countless factors limit portability, such as single-byte Katakana, the signedness of char, the encoding scheme of Kanji, and so on. The same applies to UCNs.

    3. Bit operations on negative numbers in a #if expression
       - Although the results of bit operations depend on the machine's internal representation of integers, most machines use two's complement representation, so this causes no portability problem. However, MCPP issues a warning for a right shift of a negative value and for a division with either or both operands negative, because they are not portable.

    4. A sequence of several white-space characters as a token separator
       - Standard C clearly states that it is implementation-defined whether a sequence of white-space characters is compressed into one space character in translation phase 3, but you do not have to worry about this.
         Portability becomes an issue only when a preprocessing directive line contains <FF> or <VT> (form feed or vertical tab). MCPP converts such a character into one space character and issues a warning. A sequence of several space characters and tabs is compressed into one space character without a warning.

    5. Compiler systems' own built-in macros do not cause a warning.

    6. #pragma sub-directives
       - In principle, mcpp_std does not issue a warning for #pragma sub-directives. However, mcpp_std ported to a compiler system that does not recognize #pragma issues a warning for them, except for #pragma once, #pragma __setlocale and #pragma MCPP *, which mcpp_std itself processes. Mcpp_std issues a warning for these #pragma sub-directives if they have an invalid argument. In addition, mcpp_std issues a warning for GNU C V.3 #pragmas, such as #pragma GCC poison, dependency and system_header, which the compiler-specific preprocessor processes but mcpp_std does not.

    7. Doubled \
       - Although it is implementation-defined in C99 whether a single \ is changed into a double \ (\\) when the # operator converts a UCN sequence into a string, mcpp_std does not issue a warning for this. Mcpp_std does not double the \.

As you see, MCPP can perform almost all the portability checks that are possible at the preprocessing level.

'poststd' mode is identical to the default mode of mcpp_std, except for some specification differences described in section 2.4.

Regardless of the number of warnings, MCPP always returns a success status. MCPP invoked with the -W0 option issues no warnings.

5.5.1 Character, Token and Comment Related Warnings

Illegal control character 0x1b in quotation
    A string literal, character constant or header name contains a control code other than a white-space character, which may cause an error in the compiler-proper. This way of coding is not desirable: a control code in a string literal or character constant should be written as an escape sequence.
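The escape-sequence advice above can be sketched as follows. The byte sequence ESC [ 2 J (an ANSI terminal clear-screen command) is only an assumed example of a string that needs a control code; clear_screen and clear_screen_length() are hypothetical names.

```c
#include <string.h>

/* Typing the ESC byte (0x1b) literally between the quotes would trigger
 * the "Illegal control character" warning; the \x1b escape produces the
 * same byte and keeps the source free of raw control codes.  '[' is not
 * a hex digit, so the escape ends after "1b". */
static const char clear_screen[] = "\x1b[2J";

/* Hypothetical helper: ESC, '[', '2', 'J' gives a length of 4. */
static size_t clear_screen_length(void)
{
    return strlen(clear_screen);
}
```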
Illegal multi-byte character sequence "XY" in quotation
    The first byte (X) of "XY" in a string literal, character constant or header name is the first byte of a multi-byte character (Kanji), but the second byte (Y) is not a valid second byte of such a character. "XY" may be displayed garbled. MCPP does not regard "XY" as a single multi-byte character: it treats the first byte as a single-byte character and the second byte as the next character.

    MCPP does not issue a warning for a character in an external character set, as long as its bytes are within the proper ranges. Even within these ranges there are some holes (codes with no assigned character); MCPP does not check whether such a character is defined. The following table shows the byte ranges of each multi-byte character encoding:

        Encoding      First byte              Second byte
        ------------  ----------------------  ----------------------
        shift-JIS     0x81-0x9f, 0xe0-0xfc    0x40-0x7e, 0x80-0xfc
        EUC-JP        0x8e, 0xa1-0xfe         0xa1-0xfe
        KS C 5601     0xa1-0xfe               0xa1-0xfe
        GB 2312-80    0xa1-0xfe               0xa1-0xfe
        Big Five      0xa1-0xfe               0x40-0x7e, 0xa1-0xfe
        ISO-2022-JP   0x21-0x7e               0x21-0x7e

    Besides these character codes, ISO-2022-JP has shift sequences. Apart from the shift sequences, all multi-byte characters in encodings other than UTF-8 are two bytes long. In UTF-8, multi-byte characters are two or three bytes long, and Kanji is encoded in three bytes: the first byte is in the range 0xc2 to 0xef, and the second and third bytes are in the range 0x80 to 0xbf. Details are omitted here; in any case, all these bytes must fall within the appropriate ranges.

    Note that MCPP cannot recognize the three-byte encoding of EUC-JP (JIS X 0213), so it regards 0x8f + 0xa1-0xfe + 0xa1-0xfe not as one character but as two characters: 0x8f, and 0xa1-0xfe + 0xa1-0xfe. As a result, MCPP does not issue a warning for the three-byte encoding, and evaluates it correctly except in a wide character constant in a #if expression. In EUC-JP, a character whose first byte is 0x8e (a half-width Katakana) is encoded in two bytes and treated as a multi-byte character.
    This warning is not issued in a skipped #if group.

"/*" in comment
    A comment contains the sequence /*. Unless this is intended, the programmer may have forgotten to close the preceding comment. Comments cannot be nested.

Too long identifier, truncated to "very_long_identifier"
    The length of an identifier exceeds IDMAX, so it is truncated to IDMAX.

Illegal digit in octal number "089"
    An octal numeric token contains 8 or 9. Only mcpp_prestd issues this warning. Mcpp_std does not check whether a numeric token on a line other than a #if directive is valid. If an octal numeric token in a #if expression contains 8 or 9, it causes a "Not an integer" error.

Unterminated string literal, catenated to the next line
    An unterminated string literal in a logical line is normally an error, but MCPP invoked with the -lang-asm (-x assembler-with-cpp, -a) option regards it as a multi-line string literal and concatenates the line with the next one, inserting '\n'. This way of writing has no advantage; concatenation of adjacent string literals is preferable.

5.5.2 Unterminated Source File Related Warnings

The following warnings are issued by mcpp_prestd, which then continues processing until it reaches the end of input; this can produce many unexpected results. Mcpp_std issues errors instead. Mcpp_prestd in 'oldprep' mode does not issue even a warning, except for an unterminated macro invocation.

End of file with no newline, skipped the line

End of file with \, skipped the line

End of file with unterminated comment, skipped the line

End of file within #if (#ifdef) section starting from line 123

End of file within macro invocation starting from line 123

End of file with unterminated #asm block starting from line 123
    The #asm on line 123 does not have a corresponding #endasm.
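The adjacent string literal concatenation recommended in 5.5.1 under "Unterminated string literal, catenated to the next line" can be sketched as follows: one complete literal per source line, with each newline written explicitly as \n. The usage text and the names usage and usage_length() are hypothetical.

```c
#include <string.h>

/* Each source line holds one complete string literal; the compiler joins
 * adjacent literals into a single array, so no literal has to run across
 * a line break (which only -lang-asm mode would accept). */
static const char usage[] =
    "usage: tool [options] file\n"
    "  -v  verbose\n";

/* Hypothetical helper: 27 + 14 characters, excluding the final NUL. */
static size_t usage_length(void)
{
    return strlen(usage);
}
```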
5.5.3 Directive Line Related Warnings

The macro is redefined
    MCPP compiled with DEBUG == TRUE displays this message followed by the source filename and line number where this macro definition is found. The macro has been redefined with different contents; the source is probably not well organized. Two definitions of the same macro name are accepted without a warning only if all of the following conditions are met:

    1. They have the same number of parameters.
    2. They have the same replacement list. (One or more white-space characters between tokens are regarded as one. In 'poststd' mode, differences in token separators do not matter, because separators are normalized to one space character whether or not they are present in the source.)
    3. In mcpp_std, the parameter names are the same. In 'poststd' mode and in mcpp_prestd, parameter names are not checked.

The following message is issued only by mcpp_std.

No space between macro name "MACRO" and repl-text
    There is no space between the macro name and the replacement list in a #define line. Normally this does not happen, but it does happen when an illegal character is used in a macro name, as in:

        #define THIS$AND$THAT(a, b)     ((a) + (b))

    MCPP interprets this as:

        #define THIS    $AND$THAT(a, b)     ((a) + (b))

    and issues a warning. Of course, this is a quite rare case.

The following four warnings are issued only by mcpp_std.

No sub-directive
    A #pragma line has no argument. The line is ignored.

Unknown argument "name"
    There is no such directive as #pragma name. Only MCPP compiled with PRAGMA == FALSE issues this warning.

Unknown encoding "encoding"
    The encoding name "encoding" specified with #pragma __setlocale( "encoding") is not implemented.

Too long encoding name "encoding"
    The encoding name "long-long-encoding" specified with #pragma __setlocale( "long-long-encoding") exceeds 19 bytes. MCPP ignores it.
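The push_macro and pop_macro diagnostics listed next expect the argument form ("MACRO"): the macro name in double quotes, inside parentheses. The following is a minimal sketch of correct usage, assuming a compiler that accepts the non-standard #pragma push_macro and #pragma pop_macro spellings (GCC, Clang and Visual C++ do, and MCPP recognizes them as well); BUF_SIZE, inner_buf_size() and outer_buf_size() are hypothetical names.

```c
#define BUF_SIZE 64

#pragma push_macro("BUF_SIZE")  /* save the current definition (64) */
#undef BUF_SIZE
#define BUF_SIZE 128            /* temporary definition */

static int inner_buf_size(void) { return BUF_SIZE; }   /* sees 128 */

#pragma pop_macro("BUF_SIZE")   /* restore the saved definition */

static int outer_buf_size(void) { return BUF_SIZE; }   /* sees 64 */
```

Writing the argument without the quotes or without the parentheses is the kind of mistake reported as "Bad push_macro syntax" or "Bad pop_macro syntax".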
Bad push_macro syntax
Bad pop_macro syntax
    There is a syntax error in #pragma MCPP push_macro, #pragma MCPP pop_macro, #pragma push_macro or #pragma pop_macro. The argument of these #pragma directives must be the macro name enclosed in double quotes and then in parentheses, for example ("MACRO"). (The syntax is redundant, but required.)

"MACRO" has not been defined
    MACRO in the argument ("MACRO") of #pragma MCPP push_macro, #pragma MCPP pop_macro, #pragma push_macro or #pragma pop_macro is not defined as a macro.

"MACRO" is already pushed
    MACRO in #pragma MCPP push_macro( "MACRO") has already been pushed and then #undef-ed. It cannot be pushed again unless MACRO is redefined.

"MACRO" has not been pushed
    MACRO in #pragma MCPP pop_macro( "MACRO") has not been pushed. It may have already been popped.

MCPP ported to GNU C issues the following warnings:

Ignored #ident
Ignored #sccs
    #ident and #sccs lines are ignored.

For the following three #pragma related warnings, MCPP compiled with PRAGMA == TRUE outputs the line in question as it is.

Unknown argument "name"
    "name" is not a valid argument of #pragma MCPP debug or #debug.

No argument
    A #pragma MCPP debug or #debug line has no argument.

Not an identifier "123"
    The argument of #pragma MCPP debug or #debug is not an identifier.

The above three warnings are issued only by MCPP compiled with DEBUG == TRUE and/or DEBUG_EVAL == TRUE.

MCPP ported to GNU C issues a Class 2 warning for a line with #pragma GCC followed by poison, dependency or system_header, and does not output the line. GNU C V.3's own preprocessor processes such a line, but MCPP does not.

The following warnings are issued only by mcpp_prestd; mcpp_std regards them as errors.

Not in a #if (#ifdef) section in a source file

Line number "0x123" isn't a decimal digit sequence

Mcpp_prestd issues the following warning.
Mcpp_std issues the same warning only for #pragma once, #pragma MCPP put_defines, #pragma MCPP push_macro, #pragma MCPP pop_macro, #pragma push_macro, #pragma pop_macro, #pragma MCPP debug and #pragma MCPP end_debug; for other directives, mcpp_std issues an error. Mcpp_prestd in 'oldprep' mode issues neither an error nor a warning.

Excessive token sequence "junk"

5.5.4 #if Expression Related Warnings

The following three warnings are relevant to the arguments of #if, #elif and #assert:

Macro "MACRO" is expanded to "defined"
    The macro MACRO in a #if expression has expanded to "defined". MCPP treats this strange macro not as an identifier but as an operator. How to treat it is undefined in Standard C.

Macro "MACRO" is expanded to "sizeof"
    The macro MACRO in a #if expression has expanded to sizeof. MCPP treats this strange macro not as an identifier but as an operator. Only MCPP compiled with OK_SIZE == TRUE issues this warning.

Macro "MACRO" is expanded to 0 token
    The macro MACRO has expanded to zero tokens. If this happens in a #if expression, it almost always causes an error; the purpose of this warning is to indicate the cause of that error.

The following warning is relevant to the arguments of #if, #elif and #assert. It is not issued in a sub-expression whose evaluation is skipped. (MCPP invoked with the -W8 option issues it there as well.)

Undefined escape sequence '\x'
    There is no such escape sequence as \x; \x is evaluated as a two-byte sequence. (Of course, "\x" followed by a sequence of hexadecimal digits is a valid escape sequence.) This warning is also issued for a UCN with too few hexadecimal digits.

The following warnings are relevant to operations in the constant expression of a #if, #elif or #assert line. They are not issued in a sub-expression whose evaluation is skipped either. (MCPP invoked with the -W8 option issues them there as well.) How MCPP performs an operation in a constant expression depends on the specifications of the compiler-proper with which MCPP was compiled.
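The conversion and the negative-operand behavior that the next warnings report also occur in ordinary C code. The following sketch illustrates them; the macro MINUS_ONE_IS_LESS and the three helper functions are hypothetical names, and the right-shift result assumes a compiler that sign-extends (GCC and Clang document this; the C Standard leaves it implementation-defined, which is exactly why MCPP warns).

```c
/* In #if, -1 < 0u is evaluated in unsigned arithmetic: -1 is converted to
 * the largest unsigned value, so the condition is false.  This is the
 * situation that "Negative number "-1" is converted to positive" reports. */
#if -1 < 0u
#define MINUS_ONE_IS_LESS 1
#else
#define MINUS_ONE_IS_LESS 0
#endif

/* The same usual arithmetic conversion happens at run time. */
static int runtime_minus_one_is_less(void)
{
    return -1 < 0u;   /* -1 becomes UINT_MAX, so this yields 0 */
}

/* Division with a negative operand: C99 truncates toward zero, whereas
 * C90 left the rounding direction implementation-defined. */
static int halve_by_division(int n) { return n / 2; }

/* Right shift of a negative value is implementation-defined; with an
 * arithmetic (sign-extending) shift it rounds toward negative infinity. */
static int halve_by_shift(int n) { return n >> 1; }
```

With n = -7, halve_by_division() gives -3 while halve_by_shift() gives -4 under the sign-extension assumption, so the two "halvings" disagree for negative operands.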
Negative number "-1" is converted to positive "4294967295"
    A mixture of signed and unsigned operands has caused a signed negative value to be converted into an unsigned positive value. This is not an error, but it indicates that the source code may contain a bug. For both operands of a binary operator, such as *, /, %, +, -, <, >, <=, >=, ==, !=, &, ^ and |, and for the second and third operands of the conditional operator ?:, if one operand is unsigned and the other is signed, the signed one is always converted into unsigned.

Illegal shift count "-1"
    The right operand of a bit shift operator, << or >>, is negative or exceeds the bit width of long. This is probably also a bug in the source code.

"op" of negative number isn't portable
    Either or both operands of the binary operator op are negative, so the result is not portable. Op is one of /, % and >>. The result of >> with a negative left operand is portable only among compiler systems on machines that have an arithmetic shift instruction, where a one-bit right shift means a division by 2; otherwise it is not portable.

Result of "op" is out of range
    The result of an operation using op is out of the range of unsigned long. Op is one of the binary operators *, /, %, + and -, or the unary operator -. An unsigned long operation never overflows, so it never causes an error, but if the result of the corresponding mathematical operation is out of range, MCPP issues a warning.

5.5.5 Macro Expansion Related Warnings

MCPP compiled with DEBUG == TRUE displays the macro definition followed by the source filename and line number where the macro definition is found.

Macro started at line 123 swallowed directive-like line
    MCPP has read a line that begins with # as part of an argument of the macro invocation that begins on line 123. The macro invocation probably has a bug. Without the macro invocation, the line beginning with # would have been interpreted as a directive line.
    Likewise, if the macro invocation were in a #if group whose evaluation is skipped, the line would be treated as a directive, because a macro in a skipped group is never expanded.

Replacement text "sub(" of macro "head" involved subsequent text
    Rescanning the replacement list "sub(" of the macro "head" has involved the text following the macro invocation. From K&R 1st edition through Standard C, this has not been regarded as an error, but if you receive this warning for a macro written without this behavior in mind, your macro definition or macro invocation is not correct. Even if you intend to use such a macro, it is not desirable at all. Only mcpp_std issues this warning, and not when it is invoked with the -@compat option. In mcpp_prestd, the same situation can arise, but no warning is issued. Mcpp_std in 'poststd' mode never issues this warning, because rescanning never involves the text following the replacement list. (The macro may then be expanded quite differently, or an "unterminated macro call" error may occur.)

Less than necessary N argument(s) in macro call "macro( a)"
    A macro invocation has too few arguments. Normally this causes an error, but when only one argument is missing in a call to a macro that takes a variable number of arguments, MCPP issues a warning instead. This eases migration of variable argument macros between GNU C and C99.

5.5.6 Line Number Related Warnings

This section covers line number related warnings.

Line number "32768" is out of range of [1,32767]
    In C90 and C++, the first argument of #line must be within the range of 1 to 32767; 0 is also out of range. With __STDC_VERSION__ >= 199901L or __cplusplus >= 199901L, the valid range is 1 to 2147483647. Therefore, in C90 or C++ mode, MCPP issues a warning, not an error, for a value in the range 32768 to 2147483647. Mcpp_std issues this warning.
    In C90, if you use #line to specify a value slightly below 32767, you do not receive an error at that point, but sooner or later the line number exceeds 32767; MCPP then continues to increment the line number while issuing a warning. Some compiler-propers may not accept such a large line number. It is not desirable to specify a large number with #line.

Line number 32768 got beyond range
    The source line number has reached 32768. This warning is issued only once.

Line number 32769 is out of range
    When the __LINE__ macro was expanded, the line number exceeded 32767.

5.5.7 #pragma MCPP warning, #warning

#warning
#pragma MCPP warning
    A #pragma MCPP warning (#warning) directive has been executed. The line is displayed after this message. (If the argument of #pragma MCPP warning has a token error, such as an unterminated string, #pragma MCPP warning is not executed.) Although this directive is described in the Class 1 section, this warning is issued at every warning level. Mcpp_std has #pragma MCPP warning, while mcpp_prestd has #warning.

5.6 Warnings (Class 2)

This section covers warnings for code that does not contain a bug but has a portability problem.

Only mcpp_std issues the following five warnings:

Parsed "//" as comment
    The text from // to the end of the line is interpreted as a comment. This notation is legal in C99 and C++. In C90 mode, MCPP treats it as a comment after issuing a warning.

Variable argument macro is defined
    A variable argument macro has been defined in C90, C95 or C++ mode, although variable argument macros are specified only by the C99 Standard.

Empty argument in macro call "MACRO( a, ,"
    A macro invocation has an empty argument, which MCPP treats as a sequence of zero pp-tokens. An empty argument is legal in C99 but undefined in C90, so it is not portable.
(If an empty argument is not accompanied by a ',', MCPP regards it not as an empty argument but as a missing argument, and issues an error. Since zero arguments and one empty argument are syntactically indistinguishable, MCPP does not treat both as errors.)

Writing empty arguments in source code is generally not advisable. If possible, I recommend defining a macro with an empty replacement list:

    #define EMPTY

and then writing EMPTY wherever an empty argument would otherwise be written.

Skipped the #pragma line

GNU C V.3 provides several #pragma directives of the form #pragma GCC followed by a sub-directive. Its preprocessor processes some of them, but MCPP does not support them. This warning is issued for a #pragma directive that the compiler system's own preprocessor processes but MCPP does not.

Not a valid preprocessing token "+12"

Concatenating two pp-tokens with the ## operator has produced "+12", which is not a valid pp-token; this normally causes an error. However, MCPP invoked with the -lang-asm (-x assembler-with-cpp, -a) option does not regard it as an error and issues this warning instead.

The following warning is issued only by mcpp_std in 'poststd' mode.

Header-name enclosed by <, > is an obsolescent feature

A header name of the form <stdio.h> is one of the specifications I would like to abolish. I recommend writing "stdio.h" instead.

The following two warnings are issued only in some compiler systems. The code in question is valid in those systems, but it is not portable, so a warning is issued as a reminder.

#include_next is not allowed by Standard
#warning is not allowed by Standard

These directives are valid in GNU C, but they do not conform to Standard C and are not portable.

Converted \ to /

A #include directive has \ in the header name. MCPP converts \ into /. \ is a valid path delimiter in operating systems such as DOS and Windows, but its meaning in a header name is undefined in Standard C. Using / is safe. Only MCPP ported to DOS/Windows issues this warning, and only once. (MCPP does not regard a " preceded by \ as the closing delimiter of a string literal, so such a sequence raises an "unterminated string literal" error.)
'$' in identifier "THIS$AND$THAT"

An identifier contains '$'. Only MCPP compiled with DOLLAR_IN_NAME set to TRUE issues this warning, and only once: '$' is valid in identifiers for this MCPP, but it is not portable. An MCPP compiled otherwise regards '$' as a separate pp-token and parses THIS$AND$THAT into five pp-tokens, THIS, $, AND, $ and THAT, resulting in a compiler error.

5.7 Warnings (Class 4)

Standard C guarantees certain minimum translation limits. It is desirable for a preprocessor to impose translation limits that exceed these values, but a source program that depends on a particular preprocessor's larger limits is less portable. MCPP provides some macros in "system.H" that allow you to set the translation limits to any values you like. Mcpp_std issues a warning for source code that exceeds a limit guaranteed by Standard C. However, these warnings are excluded from Class 1 and 2 because they may be issued frequently, depending on the standard headers of the compiler system or on the source program.

Logical source line longer than 509 bytes

The length of a logical source line has exceeded 509 bytes.

Quotation longer than 509 bytes "very_very_long_string"

The length of a string literal, character constant or header name has exceeded 509 bytes.

More than 8 nesting of #include

The depth of nested #includes has exceeded 8. This warning is issued only when the depth reaches 9.

More than 8 nesting of #if (#ifdef) sections

The depth of nested #ifs, #ifdefs or #ifndefs has exceeded 8. This warning is issued only when the depth reaches 9.

More than 1024 macros defined

The number of defined macros has reached 1024. This number includes both predefined macros and macros defined in header files.

String literal longer than 509 bytes "very_very_long_string"

Expansion of a macro using the # operator has generated a string literal longer than 509 bytes.

The following warnings are not issued in a skipped #if group.

More than 32 nesting of parens in #if expression

The depth of nested parentheses in a #if expression has exceeded 32.
This warning is issued only when the depth reaches 33.

More than 31 parameters

The number of parameters in a macro definition has exceeded 31.

Identifier longer than 31 bytes "very_very_long_name"

The length of an identifier has exceeded 31 bytes.

With __STDC_VERSION__ >= 199901L, the translation limits specified by the Standard are as follows:

    Length of logical source line:                                4095 bytes
    Length of string literal, character constant or header name:  4095 bytes
    Identifier length:                                            63 characters
    Depth of nested #includes:                                    15
    Depth of nested #ifs, #ifdefs or #ifndefs:                    63
    Depth of nested parentheses in #if expression:                63
    Number of macro parameters:                                   127
    Number of definable macros:                                   4095

Note that identifier length is counted in characters, not bytes: a UCN or a multi-byte character counts as one character.

When MCPP is invoked with the -+ option to specify C++ preprocessing, the translation limits recommended as guidelines by the C++ Standard are as follows:

    Length of logical source line:                                65536 bytes
    Length of string literal, character constant or header name:  65536 bytes
    Identifier length:                                            1024 characters
    Depth of nested #includes:                                    256
    Depth of nested #ifs, #ifdefs or #ifndefs:                    256
    Depth of nested parentheses in #if expression:                256
    Number of macro parameters:                                   256
    Number of definable macros:                                   65536

Note that MCPP allows at most 255 macro parameters, so it issues an error when the number reaches 256.

The following warnings are excluded from Class 1 and 2 because they would be issued too frequently.

Converted 0x0c to a space

[FF], [VT], [CR] (other than '\n') and [LF] (other than '\n') used as token separators in source code are converted into a space character. How such token separators are handled on a directive line is undefined in Standard C. If they appear in comments, string literals or character constants, MCPP does not convert them.
(MCPP could convert them there as well, but I do not want MCPP to impose a stricter restriction on the character set, since the character set essentially depends on the compiler-proper.) On the other hand, [TAB] used as a token separator is converted into a space character without a warning, because it does not affect compilation at all. ([TAB] means nothing more than a space to both the preprocessor and the compiler-proper.)

Undefined symbol "name", evaluated to 0

The identifier "name" in a #if expression is not defined as a macro, so it is evaluated to zero. This is not an error at all, but it may indicate a program bug. No warning is issued for the operand of the defined operator. This warning can be avoided by writing #if defined name && (name ..) instead of #if name .., or by invoking MCPP with the -D name=0 option. In C++, the tokens true and false are treated specially and evaluate to 1 and 0 respectively, without a warning.

Multi-character wide character constant L'ab' isn't portable

The value of a wide character constant varies even among compiler systems using the same character set, because the encoding of wide character constants and the evaluation of multi-character constants depend on the compiler system. Therefore, #if expressions using them are not portable. Only mcpp_std issues this warning. In 'poststd' mode, character constants are not permitted in #if expressions, so this is an error. (The same applies to the next item.)

Multi-character or multi-byte character constant '字' isn't portable

Since the evaluation of a multi-character or multi-byte character constant depends on the compiler system, #if expressions using them are not portable. Only mcpp_std issues this warning.

The following two warnings are issued only by mcpp_std:

Macro with mixing of ## and # operators isn't portable

A function-like macro has the token sequence "## #" in its replacement list. This combination is not portable, because the order of evaluation of the # and ## operators is unspecified in Standard C. MCPP gives # precedence over ##.
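A typical use of "## #" is building a wide string literal from a macro argument. The sketch below, with hypothetical macro names, keeps the non-portable form in a comment and shows a portable alternative that puts # and ## in different macros, so that the evaluation order no longer matters. The extra WIDEN level makes sure the argument is fully expanded into a string literal before it is pasted onto L.

```c
#include <wchar.h>

/* Not portable: "## #" in one replacement list.
   #define WIDE_STR_NONPORTABLE(x)   L ## #x
   MCPP applies # first, giving L"x"; other preprocessors may differ. */

/* Portable: stringize in one macro, paste in another. */
#define STRINGIZE(x)   #x
#define WIDEN_(s)      L ## s
#define WIDEN(s)       WIDEN_(s)
#define WIDE_STR(x)    WIDEN(STRINGIZE(x))

/* Expands to L"config". */
static const wchar_t *const wide_name = WIDE_STR(config);
```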
Note that if a function-like macro has the token sequence in the reverse order, "# ##", MCPP regards it as an error, because the operand of the # operator must be a parameter.

Macro with multiple ## operators isn't portable

The replacement list of a macro definition has only one token or parameter between two ## operators. Such a macro may not be portable, because the evaluation order of ## operators is unspecified in Standard C. MCPP applies the ## operators from left to right.

5.8 Warnings (Class 8)

These messages are issued to call attention to source code, although it is unlikely to contain a bug. MCPP invoked with the -W8 option issues these warnings.

In a skipped #if group, MCPP checks whether preprocessing directives such as #ifdef, #ifndef, #elif, #else and #endif are balanced. MCPP invoked with the -W8 option additionally checks non-conformant or unknown directives. Mcpp_std also issues a warning when the depth of nested #ifs exceeds 8.

Illegal #directive "123" (in skipped block)
Unknown #directive "pseudo-directive" (in skipped block)
More than 8 nesting of #if (#ifdef) sections (in skipped block)
#include_next is not allowed by Standard
#warning is not allowed by Standard

The following warnings relate to #if expressions. In an expression such as #if a || b, "b" is not evaluated if "a" is true. However, MCPP invoked with -W8 issues warnings for such non-evaluated sub-expressions as well, appending the note "in non-evaluated sub-expression". If an overflow occurs, the wrapped-around value is used in subsequent operations. In case of a division by 0, the maximum integer value is assumed as the result and used in subsequent operations.
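A sketch of the short-circuit rule, with hypothetical macro names. Both conditions preprocess without error, because the problematic sub-expression is never evaluated: in the first, the division by zero sits to the right of a true ||; in the second, defined guards a name that is never defined. With -W8, MCPP may still report these sub-expressions with the "in non-evaluated sub-expression" note.

```c
/* Hypothetical configuration: DIVISOR is 0, FEATURE_LEVEL is left undefined. */
#define DIVISOR 0

/* "DIVISOR == 0" is true, so "100 / DIVISOR" is never evaluated. */
#if DIVISOR == 0 || 100 / DIVISOR > 10
#define USE_DEFAULT_SCALE 1
#else
#define USE_DEFAULT_SCALE 0
#endif

/* "defined FEATURE_LEVEL" is false, so the comparison is never evaluated. */
#if defined FEATURE_LEVEL && (FEATURE_LEVEL >= 2)
#define USE_FAST_PATH 1
#else
#define USE_FAST_PATH 0
#endif
```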
Constant "123456789012" is out of range
Integer character constant 'abcde' is out of range
Wide character constant L'abc' is out of range
CHARBIT bits can't represent escape sequence '\x123'
CHARBIT*2 bits can't represent escape sequence L'\x12345'
Division by zero
Undefined symbol "name", evaluated to 0
sizeof: Unknown type "type"
sizeof: Illegal type combination with "type"
Multi-character wide character constant L'ab' isn't portable
Multi-character or multi-byte character constant '字' isn't portable
Undefined escape sequence '\x'
UCN cannot specify the value "0000007f"
Negative number "-1" is converted to positive "4294967295"
Result of "op" is out of range
Illegal shift count "-1"
"op" of negative number isn't portable

sizeof is disallowed in C Standard

This warning reminds users that Standard C does not allow sizeof in #if, although mcpp_prestd compiled with OK_SIZE == TRUE implements it.

"MACRO" wasn't defined

A name that is not defined as a macro has been specified in #undef. Standard C does not regard this as an error.

Macro "macro" needs arguments

An identifier with the same name as a function-like macro appears on its own, without arguments. MCPP does not expand it and leaves it as it is. Only mcpp_prestd issues this warning. (mcpp_std does not, since such a token causes no problem.)

String literals "str1" "str2" are concatenated

The adjacent string literals "str1" and "str2" have been concatenated into "str1str2". This warning is meant to make sure that the programmer really intended this, rather than "str1", "str2". A wide character string literal adjacent to a character string literal is undefined in the C90 and C++ Standards, but in C99 they are concatenated into a wide character string literal. MCPP concatenates them into a wide character string literal. Mcpp_std compiled with CONCAT_STRINGS == TRUE issues this warning.
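The typical bug this warning is meant to catch is a missing comma in an initializer list. In this sketch (names hypothetical), the compiler concatenates the adjacent literals in any case; whether MCPP itself performs the concatenation, and warns, depends on CONCAT_STRINGS as described above.

```c
#include <string.h>

/* Intended as three names, but the comma after "beta" is missing,
   so "beta" "gamma" becomes the single literal "betagamma". */
static const char *const names[] = { "alpha", "beta" "gamma" };

/* Number of elements actually present in the array. */
static const size_t name_count = sizeof names / sizeof names[0];
```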
5.9 Warnings (Class 16)

In environments where trigraphs and digraphs are not needed, they are not used at all; if they do appear in such an environment, they deserve attention. The purpose of the -W16 option is to find such trigraphs and digraphs. On the other hand, these warnings are very bothersome in an environment where trigraphs or digraphs are used regularly, because they are issued very frequently. For this reason, I set up a separate class for these warnings. In any case, MCPP issues these messages only when trigraphs or digraphs are enabled. These warnings are for mcpp_std only.

2 trigraph(s) converted

Two trigraph sequences in this physical line have been converted. Did the programmer really intend to write trigraphs?

2 digraph(s) converted

Two digraph sequences in this line have been converted. Did the programmer really intend to write digraphs?

Mcpp_std compiled with HAVE_DIGRAPHS == FALSE converts digraphs into the regular tokens after preprocessing, as follows:

    <%    ->  {
    <:    ->  [
    %:    ->  #
    %>    ->  }
    :>    ->  ]
    %:%:  ->  ##

Therefore, the compiler-proper does not need to be able to handle digraphs. However, mcpp_std in 'poststd' mode converts a digraph into a regular pp-token during translation phase 1. The difference in behavior between the modes appears when the # operator converts a digraph into a string: mcpp_std in default mode stringizes the digraph sequence as it is, while 'poststd' mode first converts it into a regular pp-token and then into a string. In addition, if a string literal contains a character sequence equivalent to a digraph sequence, default mode does not convert it, while 'poststd' mode converts it into the character sequence of the corresponding pp-token. Mcpp_std in default mode does not issue this warning for a digraph that appears on a preprocessing directive line and disappears in due course, because this warning is issued only for converted digraphs.
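A small sketch, assuming a C95 or later compiler with digraphs enabled; the macro and variable names are hypothetical. It spells #, [ ], and { } with digraphs, and stringizes the digraph <: to show the spelling difference described above. Standard C keeps a digraph's own spelling when it is stringized, as mcpp_std does in default mode, whereas 'poststd' mode would produce "[" instead.

```c
#include <string.h>

/* %: is the digraph for #, <: :> for [ ], and <% %> for { }. */
%:define DIGRAPH_LEN 3
static const int digraph_values<:DIGRAPH_LEN:> = <% 10, 20, 30 %>;

/* %:x stringizes x; the digraph keeps its spelling, giving "<:". */
%:define STRINGIZE(x) %:x
static const char *const bracket_text = STRINGIZE(<:);
```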
5.10 Diagnostic Messages Index

Each diagnostic message is followed by the section(s) describing it. Sections 5.3 and 5.4 cover fatal errors and errors; sections 5.5, 5.6, 5.7, 5.8 and 5.9 cover warnings of class 1, 2, 4, 8 and 16, respectively.

"..." isn't the last parameter  [5.4.7]
"/*" in comment  [5.5.1]
"MACRO" has not been defined  [5.5.3]
"MACRO" has not been pushed  [5.5.3]
"MACRO" is already pushed  [5.5.3]
"MACRO" wasn't defined  [5.8]
"__STDC__" shouldn't be redefined  [5.4.7]
"__STDC__" shouldn't be undefined  [5.4.8]
"__VA_ARGS__" without corresponding "..."  [5.4.7]
"and" is defined as macro  [5.5.3]
"defined" shouldn't be defined  [5.4.7]
"op" of negative number isn't portable  [5.5.4] [5.8]
## after ##  [5.4.7]
#error  [5.4.10]
#warning  [5.5.7]
#include_next is not allowed by Standard  [5.6] [5.8]
'$' in identifier "THIS$AND$THAT"  [5.6]
2 digraph(s) converted  [5.9]
2 trigraph(s) converted  [5.9]
CHARBIT bits can't represent escape sequence '\x123'  [5.4.6] [5.8]
CHARBIT*2 bits can't represent escape sequence L'\x12345'  [5.4.6] [5.8]
Already seen #else at line 123  [5.4.3]
Bad defined syntax  [5.4.5]
Bad push_macro syntax  [5.5.3]
Bad pop_macro syntax  [5.5.3]
Buffer overflow expanding macro "macro" at "something"  [5.4.9]
Buffer overflow scanning token "token"  [5.3.3]
Bug:  [5.3.1]
Can't open include file "file-name"  [5.4.11]
Can't use the character 0x24  [5.4.5]
Can't use a character constant 'a'  [5.4.5]
Can't use a string literal "string"  [5.4.5]
Can't use the operator "++"  [5.4.5]
Constant "123456789012" is out of range  [5.4.6] [5.8]
Converted 0x0c to a space  [5.7]
Converted \ to /  [5.6]
Division by zero  [5.4.6] [5.8]
Duplicate parameter names "a"  [5.4.7]
Empty argument in macro call "MACRO( a, ,"  [5.6]
Empty character constant ''  [5.4.1]
Empty parameter  [5.4.7]
End of file with \, skipped the line  [5.4.2] [5.5.2]
End of file with unterminated comment, skipped the line  [5.4.2] [5.5.2]
End of file with no newline, skipped the line  [5.4.2] [5.5.2]
End of file with unterminated #asm block started at line 123  [5.4.2] [5.5.2]
End of file within #if (#ifdef) section started at line 123  [5.4.2] [5.5.2]
End of file within macro call started at line 123  [5.4.2] [5.5.2]
Excessive ")"  [5.4.5]
Excessive token sequence "junk"  [5.4.4] [5.5.3]
File read error  [5.3.2]
File write error  [5.3.2]
Header-name enclosed by <, > is an obsolescent feature  [5.6]
Identifier longer than 31 bytes "very_very_long_name"  [5.7]
Ignored #ident  [5.5.3] [5.8]
Ignored #sccs  [5.5.3] [5.8]
Illegal #directive "123"  [5.4.4] [5.8]
Illegal control character 0x1b in quotation  [5.5.1]
Illegal control character 0x1b, skipped the character  [5.4.1]
Illegal digit in octal number "089"  [5.5.1]
Illegal multi-byte character sequence "XY"  [5.4.1]
Illegal multi-byte character sequence "XY" in quotation  [5.5.1]
Illegal parameter "123"  [5.4.7]
Illegal shift count "-1"  [5.5.4] [5.8]
Illegal UCN sequence  [5.4.1]
In #asm block started at line 123  [5.4.3]
Integer character constant 'abcde' is out of range  [5.4.6] [5.8]
The macro is redefined  [5.5.4]
Less than necessary N argument(s) in macro call "macro( a)"  [5.4.9] [5.5.5]
Line number "32768" got beyond range  [5.5.6]
Line number "0x123" isn't a decimal digits sequence  [5.4.4] [5.5.6]
Line number "32769" is out of range  [5.5.6]
Line number "2147483648" is out of range of [1,2147483647]  [5.4.4]
Line number "32768" is out of range of [1,32767]  [5.5.6]
Logical source line longer than 509 bytes  [5.7]
Macro "MACRO" is expanded to "defined"  [5.5.4]
Macro "MACRO" is expanded to "sizeof"  [5.5.4]
Macro "MACRO" is expanded to 0 token  [5.5.4]
Macro "macro" needs arguments  [5.8]
Macro started at line 123 swallowed directive-like line  [5.5.5]
Macro with mixing of ## and # operators isn't portable  [5.7]
Macro with multiple ## operators isn't portable  [5.7]
Misplaced ":", previous operator is "+"  [5.4.5]
Misplaced constant "12"  [5.4.5]
Missing ")"  [5.4.5]
Missing "," or ")" in parameter list "(a,b"  [5.4.7]
More than BLK_NEST nesting of #if (#ifdef) sections  [5.3.3]
More than 8 nesting of #if (#ifdef) sections  [5.7] [5.8]
More than 8 nesting of #include  [5.7]
More than 32 nesting of parens in #if expression  [5.7]
More than NEXP*2-1 constants stacked at "12"  [5.4.5]
More than NEXP*3-1 operators and parens stacked at "+"  [5.4.5]
More than 1024 macros defined  [5.7]
More than NMACPARS parameters  [5.4.7]
More than 31 parameters  [5.7]
More than necessary N argument(s) in macro call "macro( a, b, c)"  [5.4.9]
Multi-character or multi-byte character constant '字' isn't portable  [5.7] [5.8]
Multi-character wide character constant L'ab' isn't portable  [5.7] [5.8]
Negative number "-1" is converted to positive "4294967295"  [5.5.4] [5.8]
No argument  [5.4.4] [5.5.3]
No header name  [5.4.4]
No identifier  [5.4.4]
No line number  [5.4.4]
No space between macro name "MACRO" and repl-text  [5.5.3]
No sub-directive  [5.5.3]
No token after ##  [5.4.7]
No token before ##  [5.4.7]
Not a file name "name"  [5.4.4]
Not a formal parameter "id"  [5.4.7]
Not a header name "UNDEFINED_MACRO"  [5.4.4]
Not a line number "name"  [5.4.4]
Not a valid preprocessing token "+12"  [5.4.9] [5.6]
Not a valid string literal  [5.4.9]
Not an identifier "123"  [5.4.4] [5.5.3]
Not an integer "1.23"  [5.4.5]
Not in a #if (#ifdef) section  [5.4.3]
Not in a #if (#ifdef) section in a source file  [5.4.3] [5.5.3]
Operand of _Pragma() is not a string literal  [5.4.12]
Operator ">" in incorrect context  [5.4.5]
Out of memory (required size is 0x123 bytes)  [5.3.2]
Parsed "//" as comment  [5.6]
Preprocessing assertion failed  [5.4.10]
Quotation longer than 509 bytes "very_very_long_string"  [5.7]
Recursive macro definition of "macro" to "macro"  [5.4.9]
Replacement text "sub(" of macro "head" involved subsequent text  [5.5.5]
Rescanning macro "macro" more than RESCAN_LIMIT times at "something"  [5.4.9]
Result of "op" is out of range  [5.4.6] [5.5.4] [5.8]
sizeof is disallowed in C Standard  [5.8]
sizeof: Illegal type combination with "type"  [5.4.6] [5.8]
sizeof: No type specified  [5.4.5]
sizeof: Syntax error  [5.4.5]
sizeof: Unknown type "type"  [5.4.6] [5.8]
Skipped the #pragma line  [5.6]
String literal longer than 509 bytes "very_very_long_string"  [5.7]
String literals "str1" "str2" are concatenated  [5.8]
This is not a preprocessed source  [5.3.4]
This preprocessed file is corrupted  [5.3.4]
Too long header name "long-file-name"  [5.3.3]
Too long identifier, truncated to "very_long_identifier"  [5.5.1]
Too long line spliced by comments  [5.3.3]
Too long logical line  [5.3.3]
Too long number token "12345678901234"  [5.3.3]
Too long output line  [5.3.3]
Too long pp-number token "1234toolong"  [5.3.3]
Too long quotation "long-string"  [5.3.3]
Too long source line  [5.3.3]
Too long token  [5.3.3]
Too many include directories "dir"  [5.3.3]
Too many include files  [5.3.3]
UCN cannot specify the value "0000007f"  [5.4.1] [5.8]
Undefined escape sequence '\x'  [5.5.4] [5.8]
Undefined symbol "name", evaluated to 0  [5.7] [5.8]
Unknown #directive "pseudo-directive"  [5.4.4] [5.5.4] [5.8]
Unknown argument "name"  [5.5.3]
Unterminated character constant 't understand.  [5.4.1]
Unterminated expression  [5.4.5]
Unterminated header name