
The Preprocessor
The preprocessor is either a separate program invoked by the compiler or part of the compiler itself. It performs intermediate operations that modify the original source code and internal compiler options before the compiler tries to compile the resulting source code.

The instructions that the preprocessor parses are called directives, and they come in two forms: preprocessor directives and compiler directives. Preprocessor directives tell the preprocessor how to process the source code, and compiler directives tell the compiler how to modify its internal options. Directives are used to make source code easier to write (by making it more portable, for instance) and easier to understand. Until modules were introduced in C++20, #include was also the only way to make use of the facilities (classes, functions, templates, etc.) provided by the C++ Standard Library.

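Every directive starts with '#', which must be the first character on its line other than whitespace. The standard directives (as of C++20) are:
 * #include
 * #define and #undef
 * #if, #ifdef, #ifndef, #elif, #else and #endif
 * #line
 * #error
 * #pragma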

Inclusion of Header Files (#include)
The #include directive allows a programmer to include contents of one file inside another file. This is commonly used to separate information needed by more than one part of a program into its own file so that it can be included again and again without having to re-type all the source code into each file.

C++ generally requires you to declare something before you use it. Header files therefore usually contain the declarations the compiler needs in order to compile the code that uses them. This is explained further in the File Organization section of the book. The standard library (the collection of code that is available with every standards-compliant C++ compiler) and third-party libraries use headers to let you bring the needed declarations into your source code, allowing you to make use of features or resources that are not part of the language itself.

The first lines in any source file should usually look something like this:
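A minimal sketch of such a file follows. It assumes a project header named other.h, which is hypothetical here, so that line is commented out to keep the sketch compilable on its own; the function name greet is likewise only for illustration.

```cpp
#include <iostream>      // angle brackets: found in the implementation's standard locations
// #include "other.h"    // quotes: searched for first in user locations, typically the
//                       // directory of the including file (other.h is hypothetical)

// Because <iostream> was included, std::cout is declared and can be used here.
void greet() {
    std::cout << "Hello, world!" << std::endl;
}
```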

The above lines cause the contents of the files iostream and other.h to be included for use in your program. Usually this is implemented by just inserting into your program the contents of iostream and other.h. When angle brackets ( <> ) are used in the directive, the preprocessor is instructed to search for the specified file in a compiler-dependent location. When double quotation marks (" ") are used, the preprocessor is expected to search in some additional, usually user-defined, locations for the header file and to fall back to the standard include paths only if it is not found in those additional locations. Commonly when this form is used, the preprocessor will also search in the same directory as the file containing the #include directive.

The iostream header contains various declarations for input/output (I/O) using an abstraction of I/O mechanisms called streams. For example, it declares an output stream object called std::cout (pronounced "see-out", where the "c" stands for "character") which is used to write text to the standard output, which usually displays the text on the computer screen.


#pragma
The #pragma (pragmatic information) directive is part of the standard, but the meaning of any particular pragma depends on the implementation of the standard that is used. A pragma that the implementation does not recognize is ignored.

You should check the software implementation of the C++ standard you intend to use for a list of the supported tokens.

For example, one of the most widely used pragma directives, #pragma once, when placed at the beginning of a header file, indicates that the file will be skipped if it is included several times.

Macros
The C++ preprocessor includes facilities for defining "macros", which roughly means the ability to replace a use of a named macro with one or more tokens. This has various uses from defining simple constants (though const is more often used for this in C++), conditional compilation, code generation and more -- macros are a powerful facility, but if used carelessly can also lead to code that is hard to read and harder to debug!

#define and #undef
The #define directive defines a macro: a name that the preprocessor replaces with the given replacement text wherever the name later appears in the program source code, before the code is compiled.

The #undef directive deletes a current macro definition.

It is an error to use #define to give an already-defined macro a different definition (repeating an identical definition is allowed), but it is not an error to use #undef on a name that is not currently defined as a macro. Therefore, if you need to override a previous macro definition, first #undef it, and then use #define to set the new definition.
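A small sketch of the rules above. The macro names MAX_ITEMS, PI_APPROX and NEVER_DEFINED are made up for illustration. Writing the second #define of PI_APPROX without the #undef before it would be ill-formed, and compilers report it as a redefinition.

```cpp
#define MAX_ITEMS 100            // object-like macro: later uses of MAX_ITEMS become 100
#define PI_APPROX 3.14

#undef PI_APPROX                 // PI_APPROX is no longer defined from here on
#define PI_APPROX 3.14159        // so giving it a new definition is fine

#undef NEVER_DEFINED             // not an error: NEVER_DEFINED was never defined

constexpr int max_items = MAX_ITEMS;   // becomes: constexpr int max_items = 100;
constexpr double pi = PI_APPROX;       // becomes: constexpr double pi = 3.14159;
```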

\ (line continuation)
If you need to break a directive or statement across more than one line, end each line except the last with a \ (backslash) to "escape" the line end. For example,

#define MULTIPLELINEMACRO \
        will use what you write here \
        and here etc...

is equivalent to

#define MULTIPLELINEMACRO will use what you write here and here etc...
because the preprocessor joins lines ending in a backslash ("\") to the line after them. That happens even before directives (such as #define) are processed, so it works for just about all purposes, not just for macro definitions. The backslash is sometimes said to act as an "escape" character for the newline, changing its interpretation.
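As a compilable sketch of line joining (the macro name SECONDS_PER_DAY is only an illustration), the three physical lines after the #define are spliced into one before the directive is processed:

```cpp
// After splicing, this is one directive whose replacement list is "(24 * 60 * 60)".
// Each backslash must be the very last character on its line.
#define SECONDS_PER_DAY \
    (24 *               \
     60 *               \
     60)

constexpr int seconds_per_day = SECONDS_PER_DAY;   // 86400
```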

In some (fairly rare) cases macros can be more readable when split across multiple lines. Good modern C++ code will use macros only sparingly, so the need for multi-line macro definitions will not arise often.

It is certainly possible to overuse this feature. It is quite legal, but entirely indefensible, to put an escaped newline in the middle of a token, for example splitting a keyword across two lines.

That is an abuse of the feature: although an escaped newline can appear in the middle of a token, there is never a good reason to put one there. Do not try to write code that looks like it belongs in the International Obfuscated C Code Competition.
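For completeness, here is what such an abuse looks like; it is shown only so you can recognize it, and the variable name answer is arbitrary. Because line splicing happens before anything else, the compiler sees an ordinary declaration.

```cpp
// Legal but indefensible: the escaped newline splits the keyword "constexpr".
// After splicing, these two lines read "constexpr int answer = 42;".
const\
expr int answer = 42;
```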

Warning: there is one occasional "gotcha" with escaped newlines: if there are any invisible characters (such as trailing spaces) after the backslash, the lines may not be joined, and an error message will very likely be produced later on, though it might not be at all obvious what caused it. (Some compilers, such as GCC, instead ignore the trailing whitespace and issue a warning; C++23 makes that the standard behavior.)

Function-like Macros
Another feature of the #define command is that it can take arguments, making it rather useful as a pseudo-function creator. Consider the following code:
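A sketch of such a macro, assuming the usual fully parenthesized definition of ABSOLUTE_VALUE; the constant names are illustrative. The comments show the text the preprocessor substitutes.

```cpp
// Every use of the parameter is parenthesized, and so is the whole replacement list.
#define ABSOLUTE_VALUE( x ) ( ((x) < 0) ? -(x) : (x) )

constexpr int from_literal    = ABSOLUTE_VALUE( -5 );     // ( ((-5) < 0) ? -(-5) : (-5) )
constexpr int from_expression = ABSOLUTE_VALUE( 2 - 7 );  // ( ((2 - 7) < 0) ? -(2 - 7) : (2 - 7) )

// The outer parentheses keep this as 10 - 3. Without them the expansion would
// parse as (10 - ((-3) < 0)) ? -(-3) : (-3), which yields 3 instead of 7.
constexpr int in_larger_expr  = 10 - ABSOLUTE_VALUE( -3 );
```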

Notice that in the ABSOLUTE_VALUE macro, the parameter x is always enclosed in its own set of parentheses. This way, the argument is evaluated as a whole before it is compared to 0 or negated. The entire macro is also surrounded by parentheses, so that operators in the surrounding code cannot bind to parts of the expansion. If you are not careful, the compiler may interpret your code differently from what you intended.

A macro replaces each occurrence of a parameter in its replacement text with the literal text of the corresponding argument, without any validation. Badly written macros can therefore produce code that does not compile, or that contains hard-to-find bugs. Because an argument with side effects may be evaluated more than once, it is considered a very bad idea to use function-like macros as described above. However, as with any rule, there may be cases where macros are the most efficient means to accomplish a particular goal.

For example, suppose z starts at -10 and y = ABSOLUTE_VALUE( z++ ); is executed. If ABSOLUTE_VALUE were a real function, z would now have the value -9; but because z++ is a macro argument, it is pasted into the expansion three times and (in this case) executed twice, setting z to -8 and y to 9. In similar cases it is very easy to write code with "undefined behavior", meaning that what it does is completely unpredictable as far as the C++ Standard is concerned.
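The double evaluation can be checked directly. This sketch assumes the fully parenthesized ABSOLUTE_VALUE definition; the function name side_effect_demo is only illustrative. The conditional operator sequences its condition before the chosen branch, so this particular case is well defined, just surprising.

```cpp
#include <cassert>

#define ABSOLUTE_VALUE( x ) ( ((x) < 0) ? -(x) : (x) )

void side_effect_demo() {
    int z = -10;
    int y = ABSOLUTE_VALUE( z++ );   // ( ((z++) < 0) ? -(z++) : (z++) )
    // z++ ran twice: once in the condition (-10 < 0) and once in the chosen branch.
    assert(y == 9);
    assert(z == -8);   // a real function would have left z at -9 and returned 10
}
```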

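Missing parentheses around a parameter cause a different problem. With a macro PART(x) defined as ( (x) / SLICES ), the statement a = PART(b + c); expands to a = ( (b + c) / SLICES );, so the whole sum b + c is divided by SLICES, as intended. Had x not been parenthesized, the expansion would be ( b + c / SLICES ), and only c would be divided.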
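A runnable comparison, using hypothetical values (SLICES as 8, b as 10, c as 6) and a deliberately broken PART_UNSAFE variant for contrast:

```cpp
#include <cassert>

#define SLICES 8                          // hypothetical value
#define PART(x) ( (x) / SLICES )          // parameter parenthesized
#define PART_UNSAFE(x) ( x / SLICES )     // parameter NOT parenthesized

void part_demo() {
    int b = 10, c = 6;                    // hypothetical values
    int a = PART(b + c);                  // ( (b + c) / 8 ) == 16 / 8 == 2
    int wrong = PART_UNSAFE(b + c);       // ( b + c / 8 ) == 10 + 0 == 10
    assert(a == 2);
    assert(wrong == 10);
}
```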

# and ##
The # and ## operators are used in the replacement text of a #define. Applied to a macro parameter, # turns the corresponding argument into a string literal, in quotes. For example:


#define as_string( s ) # s

will make the compiler turn

std::cout << as_string( Hello World! ) << std::endl; into

std::cout << "Hello World!" << std::endl;

Using ## concatenates what's before the ## with what's after it; the result must be a well-formed preprocessing token. For example:

#define concatenate( x, y ) x ## y
...
int xy = 10;
...

will make the compiler turn

std::cout << concatenate( x, y ) << std::endl;

into

std::cout << xy << std::endl;

which will, of course, display 10 to standard output.
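Both operators can be checked in one compilable sketch; the function name stringize_and_paste_demo is only for illustration.

```cpp
#include <cassert>
#include <cstring>

#define as_string( s ) # s              // # turns the macro argument into a string literal
#define concatenate( x, y ) x ## y      // ## pastes two tokens into a single token

void stringize_and_paste_demo() {
    int xy = 10;
    const char* text = as_string( Hello World! );   // becomes "Hello World!"
    int value = concatenate( x, y );                 // becomes xy
    assert(std::strcmp(text, "Hello World!") == 0);
    assert(value == 10);
}
```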

String literals cannot be concatenated using ##, but the good news is that this is not a problem: just writing two adjacent string literals is enough to make the preprocessor concatenate them.

The dangers of macros
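To illustrate the dangers of macros, consider this naive macro, which parenthesizes nothing:

#define MAX(a,b) a>b?a:b

Because the preprocessor pastes the replacement text into the surrounding code literally, the operators around a use of MAX can bind to pieces of the expansion instead of to its result, so MAX does not behave like a function call inside larger expressions.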
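The statements in this sketch are assumed, not necessarily the original example, but they reproduce the values i=8 and j=3 discussed here. Both are meant to compute "the larger of 2 and 3, plus 5", which is 8. The function name naive_max_demo is illustrative.

```cpp
#include <cassert>

#define MAX(a,b) a>b?a:b   // naive: no parentheses anywhere

void naive_max_demo() {
    int i = MAX(2,3)+5;   // expands to 2>3?2:3+5, and + binds tighter: the false branch is 3+5
    int j = MAX(3,2)+5;   // expands to 3>2?3:2+5, so the true branch is just 3
    assert(i == 8);
    assert(j == 3);       // not 8: the +5 was swallowed by the false branch
}
```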

Thus, after execution i=8 and j=3 instead of the expected result of i=j=8! This is why you were cautioned to use an extra set of parentheses above, but even with these, the road is fraught with dangers. The alert reader might quickly realize that if a or b contains an expression, the definition must parenthesize every use of a and b, as well as the whole replacement, like this:

#define MAX(a,b) ((a)>(b)?(a):(b))

This works, provided a and b have no side effects. Indeed,

int i = 2, j = 3;
int k = MAX(i++, j++);

would result in k=4, i=3 and j=5, because the larger argument is incremented twice. This would be highly surprising to anyone expecting MAX to behave like a function.

So what is the correct solution? Do not use a macro at all. A global inline function, such as inline int max(int a, int b) { return a > b ? a : b; }, has none of the pitfalls above, but works only for the types it is declared with. A function template (see below) removes this restriction. Indeed, (a variation of) such a template is how the standard library defines std::max, declared in the <algorithm> header. This library is included with all conforming C++ compilers, so the ideal solution is to use it, for example std::max(3, 4);
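The sketch below contrasts the fully parenthesized macro with a function, a template and std::max. The names max_int, max_of and max_demo are hypothetical; max_of returns by value for simplicity, whereas std::max returns a const reference.

```cpp
#include <algorithm>   // std::max
#include <cassert>

// Fully parenthesized macro: safe with expressions, but an argument may be evaluated twice.
#define MAX(a,b) ((a)>(b)?(a):(b))

// An inline function evaluates each argument exactly once, but only accepts int.
inline int max_int(int a, int b) { return a > b ? a : b; }

// A function template works for any type that supports operator>.
template <typename T>
T max_of(T a, T b) { return a > b ? a : b; }

void max_demo() {
    int i = 2, j = 3;
    int k = MAX(i++, j++);            // ((i++)>(j++)?(i++):(j++)): j is incremented twice
    assert(k == 4 && i == 3 && j == 5);

    i = 2; j = 3;
    k = max_int(i++, j++);            // arguments evaluated once: max_int(2, 3)
    assert(k == 3 && i == 3 && j == 4);

    assert(max_of(2.5, 1.5) == 2.5);  // the template also handles double
    assert(std::max(3, 4) == 4);      // the standard library version
}
```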

Another danger of working with macros is that they are excluded from type checking. The preprocessor knows nothing about types, so the MAX macro accepts arguments of any type without complaint; used with C-style string arguments, for example, it compiles without error, but compares the addresses of the strings rather than their contents.

It is therefore preferable to use an inline function, which is type checked, so that the compiler can produce a meaningful error message when the function is used with unsuitable arguments.

String literal concatenation
One minor function of the preprocessor is joining strings together ("string literal concatenation"), turning code like

"Hello, " "world!"

into

"Hello, world!"

Apart from obscure uses, this is most often useful when writing long messages, as a normal C++ string literal is not allowed to span multiple lines in your source code (i.e., to contain a newline character inside it). The exception to this is the C++11 raw string literal, which can contain newlines, but does not interpret any escape characters. Using string literal concatenation also helps to keep program lines down to a reasonable length; we can write

function_name("This is a very long string literal, which would not fit "
              "onto a single line very nicely -- but with string literal "
              "concatenation, we can split it across multiple lines and "
              "the preprocessor will glue the pieces together");

Note that this joining happens before compilation; the compiler sees only one string literal here, and there's no work done at runtime, i.e., your program will not run any slower at all because of this joining together of strings.

Concatenation also applies to wide string literals (which are prefixed by an L):

L"this " L"and " L"that"

is converted by the preprocessor into

L"this and that".
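Because the joining happens before compilation proper, the result is a single array, which can be confirmed at compile time. The names greeting and wide are illustrative.

```cpp
// Adjacent literals become one literal: "Hello, world!" has 13 characters
// plus the terminating '\0', so the array holds 14 chars.
constexpr char greeting[] = "Hello, "
                            "world!";
static_assert(sizeof(greeting) == 14, "one literal, one terminator");

// The same applies to wide literals.
constexpr wchar_t wide[] = L"this " L"and " L"that";   // same as L"this and that"
static_assert(sizeof(wide) / sizeof(wide[0]) == 14, "13 characters plus L'\\0'");
```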

Conditional compilation
Conditional compilation is useful for two main purposes:
 * To allow certain functionality to be enabled/disabled when compiling a program
 * To allow functionality to be implemented in different ways, such as when compiling on different platforms

It is also used sometimes to temporarily "comment-out" code, though using a version control system is often a more effective way to do so.

Syntax:

#if condition
    statement(s)
#elif condition2
    statement(s)
...
#elif condition
    statement(s)
#else
    statement(s)
#endif

#ifdef defined-value
    statement(s)
#else
    statement(s)
#endif

#ifndef defined-value
    statement(s)
#else
    statement(s)
#endif

#if
The #if directive allows compile-time conditional checking of preprocessor values, such as those created with #define. If the condition is non-zero, the preprocessor includes all statement(s) up to the next #elif, #else or #endif directive in the output for processing. Otherwise, any #elif directives are checked in order, and the statement(s) of the first one whose condition is true are included. Finally, if the conditions of the #if directive and of all #elif directives are false, the statement(s) of the #else directive are included if it is present; otherwise, nothing is included.

The expression used after #if can include boolean and integral constants and arithmetic operations as well as macro names. The allowable expressions are a subset of the full range of C++ expressions (with one exception), but are sufficient for many purposes. The one extra operator available to #if is the defined operator, which can be used to test whether a macro of a given name is currently defined.

#ifdef and #ifndef
The #ifdef and #ifndef directives are short forms of '#if defined(defined-value)' and '#if !defined(defined-value)' respectively. defined(identifier) is valid in any expression evaluated by the preprocessor; it yields true (in this context, equivalent to 1) if a macro named identifier is currently defined with #define, and false (equivalent to 0) otherwise. The parentheses are optional: defined identifier, without them, is equally valid.

(Possibly the most common use of #ifndef is in creating "include guards" for header files, to ensure that the header files can safely be included multiple times. This is explained in the section on header files.)

#endif
The #endif directive ends #if, #ifdef, #ifndef, #elif and #else directives.


Example:
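A sketch of both common patterns. VERBOSE_LEVEL and the constant names are hypothetical; _WIN32 and __linux__ are macros that compilers predefine when targeting Windows and Linux respectively.

```cpp
// Hypothetical configuration macro. A build normally sets it on the command line
// (for example -DVERBOSE_LEVEL=2); this default applies only if it was not set.
#ifndef VERBOSE_LEVEL
#define VERBOSE_LEVEL 1
#endif

#if VERBOSE_LEVEL >= 2
constexpr int log_lines_per_event = 10;   // chatty debugging build
#elif VERBOSE_LEVEL == 1
constexpr int log_lines_per_event = 1;    // normal build
#else
constexpr int log_lines_per_event = 0;    // quiet build
#endif

// Platform selection using macros the compiler predefines for its target.
#if defined(_WIN32)
constexpr const char* platform_name = "Windows";
#elif defined(__linux__)
constexpr const char* platform_name = "Linux";
#else
constexpr const char* platform_name = "some other platform";
#endif
```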

This can be used, for example, to support multiple platforms or to build different versions of a program from one common set of source files. Another common use is the include guard, a portable alternative to the non-standard #pragma once:

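foo.hpp:

#ifndef FOO_HPP
#define FOO_HPP
// declarations and definitions for foo go here
#endif

bar.hpp:

#include "foo.hpp"
// declarations for bar go here

foo.cpp:

#include "foo.hpp"
#include "bar.hpp"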

When we compile foo.cpp, only one copy of foo.hpp will be included, thanks to the include guard. When the preprocessor reads the line #include "foo.hpp" in foo.cpp, the contents of foo.hpp are expanded. Since this is the first time foo.hpp is read (and assuming that no macro named FOO_HPP already exists), FOO_HPP is not yet defined, so the code is included normally and FOO_HPP becomes defined. When the preprocessor then reads the line #include "bar.hpp" in foo.cpp, the contents of bar.hpp are expanded as usual, and foo.hpp is expanded again. Because FOO_HPP is already defined, none of the code in foo.hpp is inserted this time. This achieves our goal: the contents of the file are never included more than once.
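The effect can be simulated in a single file by writing the guarded contents twice, which is what the preprocessor effectively sees when foo.cpp pulls in foo.hpp a second time through bar.hpp. The struct Foo stands in for the hypothetical contents of foo.hpp.

```cpp
// First inclusion: FOO_HPP is not defined yet, so the contents are kept.
#ifndef FOO_HPP
#define FOO_HPP
struct Foo {
    static constexpr int value = 42;
};
#endif

// Second inclusion: FOO_HPP is now defined, so everything up to #endif is skipped.
// Without the guard, this would be an error: struct Foo defined twice.
#ifndef FOO_HPP
#define FOO_HPP
struct Foo {
    static constexpr int value = 42;
};
#endif
```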

Compile-time warnings and errors

Syntax:

#error message
#warning message

#error and #warning
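The #error directive causes the compiler to stop when it is encountered, reporting the line number and the given message. The #warning directive makes the compiler emit a warning with the line number and the given message, but compilation continues; #warning was long a widely supported extension and became standard only in C++23. These directives are mostly used for debugging and, placed inside conditional blocks, to reject unsupported configurations.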


Example:
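A sketch of #error guarding a build configuration. CHAR_BIT comes from <climits>; BUFFER_SIZE is a hypothetical setting with a default value. #warning is left in a comment because it is an extension before C++23.

```cpp
#include <climits>   // CHAR_BIT

// Refuse to compile on platforms this code was not written for.
#if CHAR_BIT != 8
#error "This code assumes 8-bit bytes"
#endif

// Hypothetical setting with a default; a nonsensical value stops the build.
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 256
#endif
#if BUFFER_SIZE <= 0
#error "BUFFER_SIZE must be positive"
#endif

// #warning "message" would print a diagnostic but let compilation continue.

constexpr int buffer_size = BUFFER_SIZE;
```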

Source file names and line numbering macros
The current filename and line number where preprocessing is being performed can be retrieved using the predefined macros __FILE__ and __LINE__. Line numbers are measured before any escaped newlines are removed. The current values of __FILE__ and __LINE__ can be overridden using the #line directive; it is very rarely appropriate to do this in hand-written code, but it can be useful for code generators that create C++ code based on other input files, so that (for example) error messages refer back to the original input files rather than to the generated C++ code.
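A sketch of both uses. LOG_HERE and log_demo are hypothetical names, and the file name in the #line directive is made up to stand for a code generator's input file. After #line 1000, the next source line is numbered 1000.

```cpp
#include <cstdio>

// __FILE__ and __LINE__ expand where LOG_HERE is used,
// so each message names the file and line of the call site.
#define LOG_HERE(msg) std::printf("%s:%d: %s\n", __FILE__, __LINE__, msg)

void log_demo() {
    LOG_HERE("reached log_demo");
}

// A code generator might emit a directive like this so that
// diagnostics point back at its own input file.
#line 1000 "generated_input.txt"
constexpr int line_after = __LINE__;   // this line is now numbered 1000
```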