Optimizing C++/Writing efficient code/Performance improving features

Some features of the C++ language, if properly used, can increase the speed of the resulting software.

This section presents guidelines for exploiting such features.

The most efficient types
When defining an object to store an integer number, use the int or the unsigned int type, except when a longer type is needed; when defining an object to store a character, use the char type, except when the wchar_t type is needed; and when defining an object to store a floating point number, use the double type, except when the long double type is needed. If the resulting aggregate object is of medium or large size, replace each integer type with the smallest integer type that is long enough to contain it (but without using bit-fields) and replace the floating point types with the float type, except when greater precision is needed.

The int and unsigned int types are, by definition, the most efficient ones available on the platform that can hold at least a 16-bit range. If you only need 8-bit width and are compiling for an 8-bit processor, then char might be more efficient, but otherwise one of the int types is likely to be the most efficient type you can use.

The double type can be two to three times less efficient than the float type, but it has greater precision.

Some processor types handle signed char objects more efficiently, while others handle unsigned char objects more efficiently. Therefore, both in C and in C++, the char type, which is a distinct type from signed char, was defined as the most efficient character type for the target processor.

The char type can contain only small character sets; typically no more than 256 distinct values. To handle bigger character sets, you should use the wchar_t type, although that is less efficient.

In the case of numbers contained in a medium or large aggregate object, or in a collection that will probably be of medium or large size, it is better to minimize the size in bytes of the aggregate object or collection. This can be done by replacing primitives larger than word size with those that are word size for the processor. Note that, because of alignment padding, a short member may still occupy a full word inside an aggregate, even though the type itself is narrower.

Bit-fields can also be used to minimize the size of aggregate objects, but as their handling is slower this can be counterproductive. Therefore, postpone their introduction until the optimization stage.
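As a rough illustration of the size argument, the following sketch compares two layouts of the same record. The struct names, the fields, and the helper bytes_for are hypothetical, and the exact sizes depend on the platform and its padding rules, so the sketch only makes it easy to compare them.

```cpp
#include <cstddef>

// Hypothetical record using the default "fast" types.
struct SampleWide {
    int    channel;   // assumed to hold values 0..15
    int    flags;     // assumed to hold values 0..255
    double reading;   // assumed not to need full double precision
};

// Same record with the smallest types that still hold those values
// (no bit-fields are used).
struct SampleCompact {
    unsigned char channel;
    unsigned char flags;
    float         reading;
};

// Bytes occupied by the elements of a collection of n records of type T.
template <typename T>
std::size_t bytes_for(std::size_t n) {
    return n * sizeof(T);
}
```

For a large collection, comparing bytes_for<SampleWide>(n) with bytes_for<SampleCompact>(n) shows how much memory, and therefore cache traffic, the compact layout saves on a given platform.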

Function-objects
Instead of passing a function pointer as an argument to a function, pass a function-object (or, if using the C++11 standard, a lambda expression).

For example, suppose you have an array of structures and want to sort it by one of its fields. You could define a comparison function and pass it to the standard sort algorithm.

However, it is probably more efficient to define a function-object class (also known as a functor) and pass a temporary instance of it to the standard sort algorithm.

Function-objects are usually expanded inline and are therefore as efficient as in-place code, while functions passed by pointers are rarely inlined. Lambda expressions are implemented as function-objects, so they have the same performance.
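The three ways of passing a comparison can be sketched as follows. The Record type, its key field, and all function names are assumptions made for illustration, and a std::vector stands in for the array of structures.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record type; the field names are assumptions.
struct Record {
    int    key;
    double value;
};

// Variant 1: plain function, passed to std::sort as a function pointer.
bool less_by_key(const Record& a, const Record& b) {
    return a.key < b.key;
}

// Variant 2: function-object class; its call operator is easy to inline.
struct LessByKey {
    bool operator()(const Record& a, const Record& b) const {
        return a.key < b.key;
    }
};

void sort_with_pointer(std::vector<Record>& v) {
    std::sort(v.begin(), v.end(), less_by_key);
}

void sort_with_functor(std::vector<Record>& v) {
    std::sort(v.begin(), v.end(), LessByKey());   // temporary instance
}

// Variant 3 (C++11): lambda expression, compiled as an unnamed functor.
void sort_with_lambda(std::vector<Record>& v) {
    std::sort(v.begin(), v.end(),
              [](const Record& a, const Record& b) { return a.key < b.key; });
}
```

All three variants produce the same ordering; whether the functor or lambda version is actually faster depends on the compiler and should be measured.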

qsort and bsearch functions
Instead of the qsort and bsearch C standard library functions, use the std::sort and std::lower_bound C++ standard library functions.

The former two functions require a function pointer as an argument, whereas the latter two may take a function-object argument (or, using the C++11 standard, a lambda expression). Pointers to functions are often not expanded inline and are therefore less efficient than function-objects, which are almost always inlined.
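A minimal sketch of the two styles on a std::vector<int> is shown below. The wrapper function names are hypothetical; the library calls are the standard ones.

```cpp
#include <algorithm>
#include <cstdlib>
#include <functional>
#include <vector>

// C style: the comparison is reached through a function pointer.
int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);   // negative, zero, or positive
}

void c_style_sort(std::vector<int>& v) {
    if (!v.empty())
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
}

bool c_style_contains(const std::vector<int>& sorted, int x) {
    if (sorted.empty())
        return false;
    return std::bsearch(&x, sorted.data(), sorted.size(), sizeof(int),
                        compare_ints) != nullptr;
}

// C++ style: the comparison is a function-object that can be inlined.
void cpp_style_sort(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), std::less<int>());
}

bool cpp_style_contains(const std::vector<int>& sorted, int x) {
    std::vector<int>::const_iterator it =
        std::lower_bound(sorted.begin(), sorted.end(), x, std::less<int>());
    return it != sorted.end() && *it == x;
}
```

Note that std::lower_bound returns the position of the first element not less than the value, so the caller still has to check that the element found is equal to it.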

Encapsulated collections
Encapsulate (using a class) a collection that is accessible from several compilation units.

At design time, it is difficult to decide which data structure will have optimal performance when the software is used. At optimization time, performance can be measured and it can be seen whether changes to the container type result in improvements, for example changing from std::vector to std::list. Such implementation changes can, however, propagate to users of the code.

If a collection is private to one compilation unit, implementation changes will only impact the source code of that unit and encapsulation of the collection is unnecessary. If, however, the collection is not private (in other words, it is directly accessible from other compilation units) an implementation change could result in extensive change being necessary. To make such optimization feasible, therefore, encapsulate the collection in a class whose interface does not change when the container implementation is changed.

STL containers already use this principle, but certain operations are still available only for some containers (for example, operator[] exists for std::vector but not for std::list).
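One possible shape for such an encapsulating class is sketched below. The class name Measurements and its operations are hypothetical; the point is only that the container type appears in a single private typedef.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical collection shared by several compilation units.
class Measurements {
private:
    // The only place that names the container. Switching to, say,
    // std::list<double> requires no source change in client code.
    typedef std::vector<double> Container;
    Container items_;

public:
    void add(double x) { items_.push_back(x); }

    std::size_t count() const { return items_.size(); }

    bool empty() const { return items_.empty(); }

    double total() const {
        double sum = 0.0;
        for (Container::const_iterator it = items_.begin();
             it != items_.end(); ++it)
            sum += *it;
        return sum;
    }
};
```

The interface deliberately exposes no iterators or indexing, so it stays the same whichever container is measured to be fastest.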

STL container usage
When using an STL container, if several equivalent expressions have the same performance, choose the more general expression.

For instance, call a.empty() instead of a.size() == 0, call iter != a.end() instead of iter < a.end(), and call std::distance(iter1, iter2) instead of iter2 - iter1. The former expressions are valid for every container type, while the latter are valid only for some. The former are also no less efficient than the latter and may even be more efficient. For example, before C++11 getting the size of a linked list could require traversing the list, whereas checking that it is empty is a constant-time operation.

Unfortunately, it is not always possible to write code that is equally correct and efficient for every type of container. Nevertheless, decreasing the number of statements that are dependent on the container type will decrease the number of statements that must be changed if the type of the container is later changed.
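As a sketch of this guideline, the following function template uses only the general expressions, so it compiles unchanged for std::vector, std::list, and other sequence containers. The function name count_from is hypothetical.

```cpp
#include <iterator>
#include <list>
#include <vector>

// Returns how many elements lie between the first occurrence of x
// (inclusive) and the end of the container; 0 if x is absent.
template <typename Container>
typename Container::difference_type
count_from(const Container& c, const typename Container::value_type& x) {
    if (c.empty())                        // rather than: c.size() == 0
        return 0;
    typename Container::const_iterator it = c.begin();
    while (it != c.end() && *it != x)     // rather than: it < c.end()
        ++it;
    return std::distance(it, c.end());    // rather than: c.end() - it
}
```

With the container-specific expressions in the comments, the same function would compile for std::vector but not for std::list.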

Choice of the default container
When choosing a variable-length container, if in doubt, choose a vector.

For a data-set with a small number of elements, vector is the most efficient variable-length container for any operation.

For larger collections, other containers may become more efficient for certain operations, but vector still has the lowest space overhead (as long as there is no excess capacity) and the greatest locality of reference.

Inlined functions
If your compiler allows whole program optimization and automatic inline-expansion of functions, use such options and do not declare any functions inline. If such compiler features are not available, declare suitable functions as inline in a header; suitable functions contain no more than three lines of code and have no loops.

Inline function-expansion avoids the function call overhead. The overhead grows as the number of function arguments increases. In addition, since inline code is near to the caller code, it has better locality of reference. And because the intermediate code generated by the compiler for inlined functions is merged with the caller code, it can be optimized more easily by the compiler.

Expanding inline a tiny function, such as a function containing only a simple assignment or a simple return statement, can result in a decrease in the size of the generated machine code.

Conversely, every time a function containing substantial code is inlined, the machine code is duplicated and the total size of the program increases. A larger program is also likely to make poorer use of the instruction cache, which increases latency.

Inlined code is more difficult to profile. If a non-inlined function is a bottleneck, it can be found by the profiler. But if the same function is inlined wherever it is called, its run-time is scattered among many functions and the bottleneck cannot be detected by the profiler.

For functions containing substantial amounts of code, only performance critical ones should be declared inline during optimization.
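For compilers without automatic inlining across compilation units, a header along the following lines matches the guideline. The header name, the Point type, and both functions are hypothetical examples of short, loop-free functions.

```cpp
// point.h (hypothetical header)
#ifndef POINT_H
#define POINT_H

struct Point {
    double x;
    double y;
};

// One-line functions without loops: suitable candidates for inline.
inline double dot(const Point& a, const Point& b) {
    return a.x * b.x + a.y * b.y;
}

inline bool is_origin(const Point& p) {
    return p.x == 0.0 && p.y == 0.0;
}

#endif // POINT_H
```

Because the definitions are in the header and marked inline, every compilation unit that includes it can expand the calls in place without violating the one-definition rule.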

Symbols representation
To represent internal symbols, use enumerations instead of strings.

For example, instead of naming each value of an internal state with a string literal, define an enumeration with one enumerator per value.

An enumeration is implemented as an integer. Compared to an integer, a string occupies more space and is slower to copy and compare. (In addition, using strings instead of integers to represent internal state may introduce string comparison errors in code that deals with multiple locales.)
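The difference can be sketched with a small example. The day names and both functions are hypothetical; they only contrast a string-based symbol with an enumeration.

```cpp
#include <string>

// String-based symbols: each comparison has to examine characters,
// and a misspelled literal is not caught by the compiler.
bool is_weekend_str(const std::string& day) {
    return day == "Saturday" || day == "Sunday";
}

// Enumeration-based symbols: each comparison is an integer comparison,
// and a misspelled enumerator is a compile-time error.
enum Day { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday };

bool is_weekend(Day day) {
    return day == Saturday || day == Sunday;
}
```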

if and switch statements
If you have to compare an integer value with a set of constant values, instead of a sequence of if statements, use a switch statement.

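For example, instead of a chain of if statements that each compare the same integer with a different constant, write a single switch statement with one case per constant.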

Compilers may exploit the regularity of switch statements to apply some optimizations, in particular if the guideline "Case values of switch statements" in this section is applied.
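The two forms can be sketched as follows. The function names and the returned values are hypothetical; both functions compute the same result.

```cpp
// Sequence of if statements comparing the same value with constants.
int score_if(int level) {
    if (level == 0) return 10;
    if (level == 1) return 20;
    if (level == 2) return 40;
    return 0;
}

// Equivalent switch statement: the regular structure tells the compiler
// that a single value is being matched against constants.
int score_switch(int level) {
    switch (level) {
    case 0:  return 10;
    case 1:  return 20;
    case 2:  return 40;
    default: return 0;
    }
}
```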

Case values of switch statements
As constants for switch statement cases, use compact sequences of values, that is, sequences with no gaps or with few small gaps.

When compiling a switch statement whose case values comprise most of the values in an integer interval, instead of generating a sequence of if statements, an optimizing compiler will generate a jump-table. The table is an array containing the start address of the code for each case. When executing the switch statement, the table is used to jump to the code associated with the case number.

For example, a switch statement whose case values are consecutive integers probably generates machine code that subtracts the lowest case value, checks the range once, and jumps through a table of addresses. A switch statement whose case values are widely scattered instead probably generates machine code that compares the value with each constant in turn and branches on the first match.

With only a few cases there is probably little difference between the two situations, but as the case count increases the jump-table code becomes more efficient, because it performs only one computed goto instead of a sequence of branches.
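The following sketch contrasts the two situations. All names and case values are hypothetical, and dispatch_table only imitates a jump-table in portable C++: a real jump-table holds code addresses, whereas this one holds the results directly. What a given compiler actually emits should be checked in its assembly output.

```cpp
// Compact case values (10..13): a good candidate for a jump-table.
int dispatch_compact(int i) {
    switch (i) {
    case 10: return 1;
    case 11: return 2;
    case 12: return 3;
    case 13: return 4;
    default: return 0;
    }
}

// Hand-written imitation of the jump-table lowering:
// one subtraction, one range check, one indexed access.
int dispatch_table(int i) {
    static const int table[] = {1, 2, 3, 4};
    const unsigned index = static_cast<unsigned>(i) - 10u;
    if (index > 3u)          // also rejects i < 10 after wrap-around
        return 0;
    return table[index];
}

// Scattered case values: likely compiled to a series of comparisons.
int dispatch_sparse(int i) {
    switch (i) {
    case 100:   return 1;
    case 2000:  return 2;
    case 30000: return 3;
    default:    return 0;
    }
}
```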

Case order in switch statements
In switch statements, put typical cases first.

If the compiler does not use a jump-table, cases are evaluated in order of appearance; therefore, fewer comparisons are performed for the more frequent cases.

Grouping function arguments
In a loop that calls a function with more arguments than there are registers, consider passing a struct or object instead.

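For example, instead of calling in every iteration a function that takes many separate arguments, consider collecting the arguments in a structure, initializing it once before the loop, and passing that structure to the function.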

If all function arguments can be placed directly into processor registers, the arguments can be passed and manipulated quickly. If there are more arguments than available registers, those arguments that could not be placed into registers will be pushed onto the stack at the start of every function call and removed from the stack at the end of the call. If a structure or object is passed, a register may be used and after initialization of the structure or object, only those parts of the structure or object that change between successive calls must be updated.

Compilers vary in the number of registers used for function arguments. Relying on the number used by any particular compiler version is unwise. Assuming that 4 registers are used is reasonable.
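A minimal sketch of the two call styles follows. The polynomial function, the PolyArgs structure, and the coefficient values are hypothetical; seven integer arguments are used because that exceeds the register count of many, though not all, calling conventions.

```cpp
// Seven separate arguments: some of them may be passed on the stack
// at every call.
int poly_separate(int x, int c0, int c1, int c2, int c3, int c4, int c5) {
    return c0 + x * (c1 + x * (c2 + x * (c3 + x * (c4 + x * c5))));
}

// The same arguments grouped in a structure passed by reference:
// only one address has to be passed per call.
struct PolyArgs {
    int x;
    int c0, c1, c2, c3, c4, c5;
};

int poly_grouped(const PolyArgs& p) {
    return p.c0 + p.x * (p.c1 + p.x * (p.c2 + p.x * (p.c3 + p.x * (p.c4 + p.x * p.c5))));
}

int sum_separate(int n) {
    int total = 0;
    for (int x = 0; x < n; ++x)
        total += poly_separate(x, 1, 2, 3, 4, 5, 6);  // all seven passed each time
    return total;
}

int sum_grouped(int n) {
    PolyArgs args = {0, 1, 2, 3, 4, 5, 6};  // initialized once, before the loop
    int total = 0;
    for (int x = 0; x < n; ++x) {
        args.x = x;                          // only the changing member is updated
        total += poly_grouped(args);
    }
    return total;
}
```

Whether the grouped version is faster depends on the calling convention and on whether the compiler inlines the call, so it should be measured before being adopted.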

Use of container member functions
To search for an element in a container, use a container member function instead of an STL algorithm.

If a container provides a member function that duplicates a generic STL algorithm it is because the member function is more efficient.

For example, to search a std::set object, you can use the std::find generic algorithm, or the std::set::find member function. The former has linear complexity (O(n)), while the latter has logarithmic complexity (O(log(n))).
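The two searches can be sketched side by side; the wrapper function names are hypothetical.

```cpp
#include <algorithm>
#include <set>

// Generic algorithm: visits the elements one by one, O(n).
bool contains_generic(const std::set<int>& s, int x) {
    return std::find(s.begin(), s.end(), x) != s.end();
}

// Member function: descends the set's internal search tree, O(log(n)).
bool contains_member(const std::set<int>& s, int x) {
    return s.find(x) != s.end();
}
```

Both functions return the same answer; only the number of elements examined differs.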

Search in sorted sequences
To search a sorted sequence, use the std::lower_bound, std::upper_bound, std::equal_range, or std::binary_search generic algorithms.

Given that all the cited algorithms use a logarithmic complexity (O(log(n))) binary search, they are faster than the std::find algorithm, which uses a linear complexity (O(n)) sequential scan.
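A short sketch of the four algorithms applied to a sorted std::vector<int> is given below. The wrapper function names are hypothetical, and the vector is assumed to be sorted in ascending order already.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<int>::const_iterator Iter;

// True if x occurs in the sorted vector.
bool sorted_contains(const std::vector<int>& v, int x) {
    return std::binary_search(v.begin(), v.end(), x);
}

// Index of the first element that is not less than x.
std::size_t first_not_less(const std::vector<int>& v, int x) {
    return static_cast<std::size_t>(
        std::lower_bound(v.begin(), v.end(), x) - v.begin());
}

// Index of the first element that is greater than x.
std::size_t first_greater(const std::vector<int>& v, int x) {
    return static_cast<std::size_t>(
        std::upper_bound(v.begin(), v.end(), x) - v.begin());
}

// Number of elements equal to x.
std::size_t sorted_count(const std::vector<int>& v, int x) {
    const std::pair<Iter, Iter> range =
        std::equal_range(v.begin(), v.end(), x);
    return static_cast<std::size_t>(range.second - range.first);
}
```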

static member functions
In every class, declare every member function that does not access the non-static members of the class as static.

In other words, declare all the member functions that you can as static.

In this way, the implicit this argument is not passed.
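As a sketch, the hypothetical class below has one member function that needs the object's data and one that does not; only the latter is declared static.

```cpp
// Hypothetical class used only to illustrate the guideline.
class Temperature {
public:
    explicit Temperature(double celsius) : celsius_(celsius) {}

    // Reads a non-static member, so it needs the implicit this argument.
    double fahrenheit() const { return to_fahrenheit(celsius_); }

    // Touches no non-static member, so it is declared static
    // and no this argument is passed when it is called.
    static double to_fahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

private:
    double celsius_;
};
```

A static member function can also be called without an object, as in Temperature::to_fahrenheit(100.0).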