Ada Style Guide/Improving Performance

Introduction
In many ways, performance is at odds with maintainability and portability. To achieve improved speed or memory usage, the clearest algorithm sometimes gives way to confusing code. To exploit special-purpose hardware or operating system services, non-portable implementation dependencies are introduced. When concerned about performance, you must decide how well each algorithm meets its performance and maintainability goals. Use the guidelines in this chapter with care; they may be hazardous to your software.

The best way to build a system that satisfies its performance requirements is through good design. You should not assume that speeding up your code will result in a visible improvement in system execution speed. In most applications, the overall throughput of the system is not defined by the execution speed of the code but by the interaction between concurrent processes and the response time of the system peripherals.

Most of the guidelines in this chapter read "... when measured performance indicates." "Indicates" means that you have determined that the benefit in increased performance to your application in your environment outweighs the negative side effects on understandability, maintainability, and portability of the resulting code. Many of the guideline examples show the alternatives that you will need to measure in order to determine if the guideline is indicated.

Performance Issues
Performance has at least four aspects: execution speed, code size, compilation speed, and linking speed. Although all four are important, most people think of execution speed when performance is mentioned, and most of the guidelines in this chapter focus on execution speed.

Performance is influenced by many factors, including the compilation software, hardware, system load, and coding style. While only coding style is typically under the control of the programmer, the other factors have so much influence that it is impossible to make flat statements such as "case statements are more efficient than if-then-else structures." When performance is critical, you cannot assume that a coding style that proves more efficient on one system will also be more efficient on another. Decisions made for the sake of performance must be made on the basis of testing the alternatives on the actual system on which the application will be fielded.

Performance Measurement
While most well-known tools for measuring performance are stand-alone programs that concentrate on execution speed, there is a comprehensive tool that covers all four aspects of performance. The Ada Compiler Evaluation System (ACES) is the result of merging two earlier products: the United States Department of Defense's Ada Compiler Evaluation Capability and the United Kingdom Ministry of Defence's Ada Evaluation System. It offers a comprehensive set of nearly 2,000 performance tests along with automated setup, test management, and analysis software. This system reports (and statistically analyzes) compilation time, linking time, execution time, and code size. The analysis tools make comparisons among multiple compilation-execution systems and also provide comparisons of the run-time performance of tests using different coding styles to achieve similar purposes.

The ACES also provides a Quick-Look facility as an alternative to the Performance Issues Working Group (PIWG) suite. The Quick-Look facility is advertised as being easy to download, install, and execute in less than a day, while providing information that is as useful as that generated by the PIWG suite. In addition, the ACES is distributed from the host sw-eng.falls-church.va.us, directory /public/AdaIC/testing/aces. For World Wide Web access, use the following uniform resource locator (URL): http://sw-eng.falls-church.va.us/AdaIC/testing/aces/.

While measuring performance may seem to be a relatively straightforward matter, there are significant issues that must be addressed by any person or toolset planning to do such measurement. For detailed information, see the following sources: ACES (1995a, 1995b, 1995c); Clapp, Mudge, and Roy (1990); Goforth, Collard, and Marquardt (1990); Knight (1990); Newport (1995); and Weidermann (1990).

guideline

 * Use blocks (see Guideline 5.6.9) to introduce late initialization when measured performance indicates.

rationale
Late initialization allows a compiler more choices in register usage optimization. Depending on the circumstance, this may introduce a significant performance improvement.

Some compilers incur a performance penalty when declarative blocks are introduced. Careful analysis and timing tests by the programmer may identify those declarative blocks that should be removed.
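As a minimal sketch of late initialization (the names Late_Initialization, Input, Scale, and Total are illustrative and not from the guide), the block below declares and initializes its working variables only after the data they depend on has been computed. Whether this helps or hurts must be measured on your compiler.

```ada
with Ada.Text_IO;

procedure Late_Initialization is
   type Vector is array (1 .. 4) of Float;

   Input : Vector;
begin
   -- Input has to be filled in before anything can be derived from it.
   for I in Input'Range loop
      Input (I) := Float (I) * 2.0;
   end loop;

   Normalize :
   declare
      -- Declared and initialized only here, at the point of first use,
      -- so the initial value can depend on the data computed above.
      Scale : constant Float := Input (Input'Last);
      Total : Float          := 0.0;
   begin
      for I in Input'Range loop
         Total := Total + Input (I) / Scale;
      end loop;
      Ada.Text_IO.Put_Line ("Normalized total:" & Float'Image (Total));
   end Normalize;
end Late_Initialization;
```

Timing the same computation with Scale and Total declared in the outer declarative part gives the comparison that the guideline asks for.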

guideline

 * Use constrained arrays when measured performance indicates.

rationale
If array bounds are not known until run-time, then calculations of these bounds may affect run-time performance. Using named constants or static expressions as array bounds may provide better performance than using variables or nonstatic expressions. Thus, if the values of Lower and Upper are not determined until run-time, then an array declared with the index range Lower .. Upper may cause address and offset calculations to be delayed until run-time, introducing a performance penalty. See NASA (1992) for a detailed discussion of the tradeoffs and alternatives.
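The two alternatives to measure can be sketched as follows. This is an illustration only; apart from Lower and Upper, every name (Array_Bounds, Max_Samples, Static_Buffer, Dynamic_Buffer, Process) is an assumption made for the example.

```ada
with Ada.Text_IO;

procedure Array_Bounds is
   -- Static bounds: the size of the type is known at compile time.
   Max_Samples : constant := 100;
   type Static_Buffer is array (1 .. Max_Samples) of Integer;

   procedure Process (Lower, Upper : in Integer) is
      -- Nonstatic bounds: Lower and Upper are known only at run-time.
      type Dynamic_Buffer is array (Lower .. Upper) of Integer;

      Fixed   : Static_Buffer  := (others => 1);
      Varying : Dynamic_Buffer := (others => 1);
      Sum     : Integer        := 0;
   begin
      for I in Fixed'Range loop
         Sum := Sum + Fixed (I);
      end loop;
      for I in Varying'Range loop
         Sum := Sum + Varying (I);
      end loop;
      Ada.Text_IO.Put_Line ("Sum:" & Integer'Image (Sum));
   end Process;
begin
   Process (Lower => 1, Upper => 100);
end Array_Bounds;
```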

guideline

 * Use zero-based indexing for arrays when measured performance indicates.

rationale
For some compilers, offset calculations for an array whose lower bound is 0 (either the integer zero or the first value of an enumeration type) are simplified. For other compilers, optimization is more likely if the lower bound is 1.

guideline

 * Use fixed-size components for records when measured performance indicates.

rationale
Determine the size and speed impact of unconstrained records having components depending on discriminants. Some compilers will allocate the maximum possible size to each object of the type; others will use pointers to the dependent components, incurring a possible heap performance penalty. Consider the possibility of using fixed-size components.
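A hedged sketch of the two record styles follows. The type names and the 80-character limit are assumptions chosen for illustration; how each type is actually laid out depends on the compiler.

```ada
procedure Record_Layout is
   Max_Length : constant := 80;
   subtype Length_Range is Natural range 0 .. Max_Length;

   -- The size of Data depends on the discriminant. A compiler may reserve
   -- the maximum size for every object or allocate Data indirectly.
   type Varying_Text (Length : Length_Range := 0) is
      record
         Data : String (1 .. Length);
      end record;

   -- Fixed-size alternative: a full-size buffer plus an explicit length.
   type Fixed_Text is
      record
         Length : Length_Range            := 0;
         Data   : String (1 .. Max_Length) := (others => ' ');
      end record;

   A : Varying_Text;
   B : Fixed_Text;
begin
   A := (Length => 5, Data => "Hello");

   B.Length        := 5;
   B.Data (1 .. 5) := "Hello";

   if A.Data /= B.Data (1 .. B.Length) then
      raise Program_Error;
   end if;
end Record_Layout;
```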

guideline

 * Define arrays of records as parallel arrays when measured performance indicates.

rationale
Determine the impact of structuring data as arrays of records, records containing arrays, or parallel arrays. Some implementations of Ada will show significant performance differences among these alternatives.
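The three layouts named above might be declared as in the following sketch. All type and object names are illustrative assumptions; the loop performs the same update on each layout so that the three forms can be timed separately.

```ada
procedure Data_Layouts is
   Size : constant := 1_000;
   subtype Index is Positive range 1 .. Size;

   -- Layout 1: array of records.
   type Point is
      record
         X, Y : Float := 0.0;
      end record;
   type Point_Array is array (Index) of Point;

   -- Layout 2: record containing arrays.
   type Coordinates is array (Index) of Float;
   type Point_Set is
      record
         X, Y : Coordinates := (others => 0.0);
      end record;

   As_Array_Of_Records : Point_Array;
   As_Record_Of_Arrays : Point_Set;

   -- Layout 3: parallel arrays.
   X_Values : Coordinates := (others => 0.0);
   Y_Values : Coordinates := (others => 0.0);
begin
   for I in Index loop
      As_Array_Of_Records (I).X := As_Array_Of_Records (I).X + 1.0;
      As_Array_Of_Records (I).Y := As_Array_Of_Records (I).Y + 2.0;

      As_Record_Of_Arrays.X (I) := As_Record_Of_Arrays.X (I) + 1.0;
      As_Record_Of_Arrays.Y (I) := As_Record_Of_Arrays.Y (I) + 2.0;

      X_Values (I) := X_Values (I) + 1.0;
      Y_Values (I) := Y_Values (I) + 2.0;
   end loop;
end Data_Layouts;
```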

guideline

 * Use a sequence of assignments for an aggregation when measured performance indicates.

rationale
Determine the impact of using an aggregate versus a sequence of assignments. Using an aggregate generally requires the use of a temporary variable. If the aggregate is "static" (i.e., its size and components are known at compile- or link-time, allowing link-time allocation and initialization), then it will generally be more efficient than a sequence of assignments. If the aggregate is "dynamic," then a series of assignments may be more efficient because no temporary variable is needed.

See Guideline 5.6.10 for a discussion of aggregates from the point of view of readability and maintainability.

See Guideline 10.6.1 for a discussion of extension aggregates.
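The sketch below shows the two forms side by side for a nonstatic ("dynamic") aggregate. The record type Reading and the variable Raw are hypothetical names introduced for the example.

```ada
procedure Aggregate_Styles is
   type Reading is
      record
         Channel : Natural;
         Value   : Float;
         Valid   : Boolean;
      end record;

   Raw  : Float := 3.5;   -- a variable, so the aggregate below is not static
   A, B : Reading;
begin
   -- Aggregate form: may require a temporary for the whole record.
   A := (Channel => 1, Value => Raw * 2.0, Valid => True);

   -- Sequence of assignments: each component is written directly.
   B.Channel := 1;
   B.Value   := Raw * 2.0;
   B.Valid   := True;

   if A.Channel /= B.Channel or else A.Valid /= B.Valid then
      raise Program_Error;
   end if;
end Aggregate_Styles;
```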

guideline

 * Use incremental schemes instead of mod and rem when measured performance indicates.

rationale
Determine the impact of using the mod and rem operators. Either style (the operators or an incremental scheme) may be significantly more efficient than the other.
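A minimal sketch of the two styles, using a circular index as the example. The names Modulus, N, With_Mod, and Incremental are assumptions made for illustration. Both loops visit the same slots; only the way the index wraps differs.

```ada
procedure Circular_Index is
   Modulus : constant := 8;
   N       : constant := 100;

   type Slot_Counts is array (0 .. Modulus - 1) of Natural;

   With_Mod    : Slot_Counts := (others => 0);
   Incremental : Slot_Counts := (others => 0);
   J           : Natural     := 0;
begin
   -- Using mod: the index is recomputed on every iteration.
   for I in 0 .. N loop
      With_Mod (I mod Modulus) := With_Mod (I mod Modulus) + 1;
   end loop;

   -- Avoiding mod: the index is incremented and wrapped by hand.
   for I in 0 .. N loop
      Incremental (J) := Incremental (J) + 1;
      J := J + 1;
      if J = Modulus then
         J := 0;
      end if;
   end loop;

   -- Both schemes must touch the same slots the same number of times.
   if With_Mod /= Incremental then
      raise Program_Error;
   end if;
end Circular_Index;
```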

guideline

 * Use the short-circuit control form when measured performance indicates.

rationale
Determine the impact of using nested if statements versus using the and then or or else operator. One of these forms may be significantly more efficient than the other.
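The following sketch contrasts nested if statements with the short-circuit form. The names Table, Data, and Index are illustrative. In both forms the array is indexed only when the index is in range.

```ada
procedure Short_Circuit is
   type Table is array (1 .. 10) of Integer;

   Data  : constant Table := (others => 7);
   Index : Integer        := 11;   -- deliberately outside Data'Range

   Found_Nested : Boolean := False;
   Found_Short  : Boolean := False;
begin
   -- Nested if statements.
   if Index in Data'Range then
      if Data (Index) = 7 then
         Found_Nested := True;
      end if;
   end if;

   -- Short-circuit control form: the right operand is evaluated only
   -- when the left operand is True.
   if Index in Data'Range and then Data (Index) = 7 then
      Found_Short := True;
   end if;

   if Found_Nested /= Found_Short then
      raise Program_Error;
   end if;
end Short_Circuit;
```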

guideline

 * Use the case statement when measured performance indicates.

rationale
Determine the impact of using case statements versus the elsif construct. If the case statement is implemented using a small jump table, then it may be significantly more efficient than the if .. then .. elsif construct.

See also Guideline 8.4.6 for a discussion of the table-driven programming alternative.
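As an illustration, the same selection is written below with a case statement and with an elsif chain. The enumeration Command and the function names are hypothetical. Whether a jump table is generated for the case form is up to the compiler.

```ada
procedure Selection_Styles is
   type Command is (Start, Stop, Pause, Resume);

   function By_Case (C : Command) return Natural is
   begin
      case C is
         when Start  => return 1;
         when Stop   => return 2;
         when Pause  => return 3;
         when Resume => return 4;
      end case;
   end By_Case;

   function By_Elsif (C : Command) return Natural is
   begin
      if C = Start then
         return 1;
      elsif C = Stop then
         return 2;
      elsif C = Pause then
         return 3;
      else
         return 4;
      end if;
   end By_Elsif;
begin
   for C in Command loop
      if By_Case (C) /= By_Elsif (C) then
         raise Program_Error;
      end if;
   end loop;
end Selection_Styles;
```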

guideline

 * Use hard-coded constraint checking when measured performance indicates.

rationale
Determine the impact of using exception handlers to detect constraint errors. If the exception handling mechanism is slow, then hard-coded checking may be more efficient.
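A sketch of the two approaches follows, assuming a hypothetical subtype Percent and two clamping functions invented for this example. The first relies on the language-defined range check and a handler; the second tests the value explicitly. The first form works only while the check is not suppressed.

```ada
procedure Checking_Styles is
   subtype Percent is Integer range 0 .. 100;

   -- Style 1: rely on the range check and handle Constraint_Error.
   function Clamp_By_Handler (Raw : Integer) return Percent is
      Result : Percent;
   begin
      Result := Raw;   -- raises Constraint_Error when Raw is out of range
      return Result;
   exception
      when Constraint_Error =>
         if Raw < Percent'First then
            return Percent'First;
         else
            return Percent'Last;
         end if;
   end Clamp_By_Handler;

   -- Style 2: hard-coded checking, no exception is ever raised.
   function Clamp_By_Test (Raw : Integer) return Percent is
   begin
      if Raw < Percent'First then
         return Percent'First;
      elsif Raw > Percent'Last then
         return Percent'Last;
      else
         return Raw;
      end if;
   end Clamp_By_Test;
begin
   for Raw in -50 .. 150 loop
      if Clamp_By_Handler (Raw) /= Clamp_By_Test (Raw) then
         raise Program_Error;
      end if;
   end loop;
end Checking_Styles;
```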

guideline

 * Use column-first processing of two-dimensional arrays when measured performance indicates.

rationale
Determine the impact of processing two-dimensional arrays in row-major order versus column-major order. While most Ada compilers are likely to use row-major order, it is not a requirement. In the presence of good optimization, there may be no significant difference between the two processing orders. Using static array bounds is also likely to be significant here. See Guidelines 10.4.1 and 10.4.2.
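The two traversal orders can be sketched as below. The type Grid and its 100 by 100 size are assumptions chosen for illustration. Both loops compute the same sum; which one is faster depends on the storage order the compiler uses.

```ada
procedure Traversal_Order is
   type Grid is array (1 .. 100, 1 .. 100) of Integer;

   G          : Grid    := (others => (others => 1));
   Row_First  : Integer := 0;
   Col_First  : Integer := 0;
begin
   -- Row-first: the second index varies fastest.
   for Row in G'Range (1) loop
      for Col in G'Range (2) loop
         Row_First := Row_First + G (Row, Col);
      end loop;
   end loop;

   -- Column-first: the first index varies fastest.
   for Col in G'Range (2) loop
      for Row in G'Range (1) loop
         Col_First := Col_First + G (Row, Col);
      end loop;
   end loop;

   if Row_First /= Col_First then
      raise Program_Error;
   end if;
end Traversal_Order;
```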

guideline

 * Use overwriting for conditional assignment when measured performance indicates.

rationale
Determine the impact of styles of assigning alternative values. Two common methods are an if statement that assigns in both branches and an unconditional assignment that is conditionally overwritten; for many systems, the performance difference is significant.
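A minimal sketch of both assignment styles, using a hypothetical limit of 10 and invented function names:

```ada
procedure Conditional_Assignment is
   Limit : constant := 10;

   -- Style 1: if .. else, exactly one assignment is executed.
   function With_Else (Reading : Integer) return Integer is
      Result : Integer;
   begin
      if Reading > Limit then
         Result := Limit;
      else
         Result := Reading;
      end if;
      return Result;
   end With_Else;

   -- Style 2: overwriting, assign a default and replace it if needed.
   function With_Overwrite (Reading : Integer) return Integer is
      Result : Integer := Reading;
   begin
      if Reading > Limit then
         Result := Limit;
      end if;
      return Result;
   end With_Overwrite;
begin
   for R in 0 .. 20 loop
      if With_Else (R) /= With_Overwrite (R) then
         raise Program_Error;
      end if;
   end loop;
end Conditional_Assignment;
```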

guideline

 * When measured performance indicates, perform packed Boolean array shift operations by using slice assignments rather than repeated bit-wise assignment.

rationale
Determine the impact of slice manipulation when shifting packed Boolean arrays. For Ada 83 implementations using packed Boolean arrays, shift operations may be much faster when slice assignments are used as opposed to a for loop that moves one component at a time. For Ada 95 implementations, consider using modular types instead (see Guideline 10.6.3).
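The sketch below shifts a packed Boolean array by one position in both ways. The type names and the 32-bit width are assumptions made for the example. Ada evaluates the right-hand side of an array assignment before storing, so the overlapping slice assignment is well defined.

```ada
procedure Shift_Styles is
   type Bit_Array is array (Natural range <>) of Boolean;
   pragma Pack (Bit_Array);

   subtype Word is Bit_Array (0 .. 31);

   A, B : Word := (0 | 5 => True, others => False);
begin
   -- Component-at-a-time shift toward higher indexes.
   for I in reverse Word'First + 1 .. Word'Last loop
      A (I) := A (I - 1);
   end loop;
   A (Word'First) := False;

   -- The same shift expressed as a single slice assignment.
   B (Word'First + 1 .. Word'Last) := B (Word'First .. Word'Last - 1);
   B (Word'First) := False;

   if A /= B then
      raise Program_Error;
   end if;
end Shift_Styles;
```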

guideline

 * Use static subprogram dispatching when measured performance indicates.

example
The term "static dispatching" in this example refers to the use of if/elsif sequences to explicitly determine which subprograms to call based on certain conditions.
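The original listing is not reproduced here; the following is a reconstruction sketch under stated assumptions. The package Shapes, its types, and the function Sides are hypothetical. Dynamic_Sides makes a dispatching call, while Static_Sides tests the tag explicitly and calls the specific subprogram.

```ada
procedure Dispatch_Styles is

   package Shapes is
      type Shape is abstract tagged null record;
      function Sides (S : Shape) return Natural is abstract;

      type Triangle is new Shape with null record;
      function Sides (S : Triangle) return Natural;

      type Square is new Shape with null record;
      function Sides (S : Square) return Natural;
   end Shapes;

   package body Shapes is
      function Sides (S : Triangle) return Natural is
      begin
         return 3;
      end Sides;

      function Sides (S : Square) return Natural is
      begin
         return 4;
      end Sides;
   end Shapes;

   use Shapes;

   -- Dynamic dispatching: the tag of S selects the body at run-time.
   function Dynamic_Sides (S : Shape'Class) return Natural is
   begin
      return Sides (S);
   end Dynamic_Sides;

   -- "Static dispatching": explicit tests choose a statically bound call.
   function Static_Sides (S : Shape'Class) return Natural is
   begin
      if S in Triangle then
         return Sides (Triangle (S));
      elsif S in Square then
         return Sides (Square (S));
      else
         raise Program_Error;
      end if;
   end Static_Sides;

   T : Triangle;
   Q : Square;
begin
   if Dynamic_Sides (T) /= Static_Sides (T)
     or else Dynamic_Sides (Q) /= Static_Sides (Q)
   then
      raise Program_Error;
   end if;
end Dispatch_Styles;
```

The explicit form must be extended by hand whenever a new derived type is added, which is the maintainability cost to weigh against any measured gain.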

rationale
Determine the impact of dynamic and static subprogram dispatching. The compiler may generate much more efficient code for one form of dispatching than the other.

guideline

 * Use only simple aggregates when measured performance indicates.

rationale
Determine the impact of using extension aggregates. There may be a significant performance difference between evaluation of simple aggregates and evaluation of extension aggregates.

guideline

 * For mutual exclusion, when measured performance indicates, use protected types as an alternative to tasking rendezvous.
 * To implement an interrupt handler, when measured performance indicates, use a protected procedure.

rationale
Protected objects are meant to be much faster than tasks used for the same purpose (see Guideline 6.1.1). Determine the impact of using protected objects to provide access safely to encapsulated data in a concurrent program.
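As a hedged illustration of mutual exclusion with a protected object (the names Counter and Worker and the iteration counts are assumptions), four tasks below update a shared counter without any rendezvous:

```ada
procedure Protected_Counter_Demo is

   protected Counter is
      procedure Increment;
      function Value return Natural;
   private
      Count : Natural := 0;
   end Counter;

   protected body Counter is
      procedure Increment is
      begin
         Count := Count + 1;
      end Increment;

      function Value return Natural is
      begin
         return Count;
      end Value;
   end Counter;

   task type Worker;

   task body Worker is
   begin
      for I in 1 .. 1_000 loop
         Counter.Increment;   -- executed with exclusive access
      end loop;
   end Worker;

begin
   declare
      Workers : array (1 .. 4) of Worker;
   begin
      null;   -- the block is not left until all four workers terminate
   end;

   if Counter.Value /= 4_000 then
      raise Program_Error;
   end if;
end Protected_Counter_Demo;
```

The alternative to measure is a server task that guards Count and accepts Increment as an entry call.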

guideline

 * Use modular types rather than packed Boolean arrays when measured performance indicates.

rationale
Determine the impact of performing bit-wise operations on modular types. The performance of these operations may be significantly different from similar operations on packed Boolean arrays. See also Guideline 10.5.7.
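The sketch below applies the same mask with a modular type and with a packed Boolean array. The type names and the 8-bit width are illustrative assumptions.

```ada
procedure Bit_Operations is
   type Flag_Word is mod 2 ** 8;

   type Flag_Array is array (0 .. 7) of Boolean;
   pragma Pack (Flag_Array);

   Word_Flags : Flag_Word          := 2#0000_0101#;   -- bits 0 and 2 set
   Word_Mask  : constant Flag_Word := 2#0000_0110#;   -- bits 1 and 2 set

   Array_Flags : Flag_Array          := (0 | 2 => True, others => False);
   Array_Mask  : constant Flag_Array := (1 | 2 => True, others => False);
begin
   -- Bit-wise "and" on a modular value.
   Word_Flags := Word_Flags and Word_Mask;

   -- Component-wise "and" on a packed Boolean array.
   Array_Flags := Array_Flags and Array_Mask;

   -- In both representations only bit 2 remains set.
   if Word_Flags /= 2#0000_0100# then
      raise Program_Error;
   end if;
   if Array_Flags /= Flag_Array'(2 => True, others => False) then
      raise Program_Error;
   end if;
end Bit_Operations;
```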

guideline

 * Use the predefined bounded strings when predictable performance is an issue and measured performance indicates.

rationale
The unbounded strings may be allocated on the heap. If bounded strings are not allocated on the heap, then they may provide better performance. Determine the impact of using the string type declared in instantiations of Ada.Strings.Bounded.Generic_Bounded_Length versus the type declared in Ada.Strings.Unbounded.

The predefined Ada 95 language environment defines packages that support both bounded and unbounded strings. Using bounded strings may avoid the unpredictable duration of delays associated with using heap storage.
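A minimal sketch of the two string kinds performing the same operation. The instance name Names, the maximum length of 32, and the string contents are assumptions for the example. Where each representation is stored is implementation dependent.

```ada
with Ada.Strings.Bounded;
with Ada.Strings.Unbounded;

procedure String_Kinds is
   -- Bounded strings: the maximum length is fixed by the instantiation.
   package Names is
     new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 32);

   package Unbounded renames Ada.Strings.Unbounded;

   B : Names.Bounded_String :=
     Names.To_Bounded_String ("sensor");
   U : Unbounded.Unbounded_String :=
     Unbounded.To_Unbounded_String ("sensor");
begin
   Names.Append (B, "-01");
   Unbounded.Append (U, "-01");

   if Names.To_String (B) /= Unbounded.To_String (U) then
      raise Program_Error;
   end if;
end String_Kinds;
```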

guideline

 * Use the procedural form of the string handling subprograms when measured performance indicates.

rationale
Determine the relative performance cost of functions and procedures having the same name and functionality in Ada.Strings.Fixed, Ada.Strings.Bounded, Ada.Strings.Unbounded and the corresponding child packages whose names include Wide.

While functional notation typically leads to clearer code, it may cause the compiler to generate additional copying operations.
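For illustration, the sketch below calls both forms of Trim from Ada.Strings.Unbounded on the same value. The variable names and the string literal are invented for the example.

```ada
with Ada.Strings;
with Ada.Strings.Unbounded;

procedure String_Forms is
   use Ada.Strings.Unbounded;

   By_Function  : Unbounded_String := To_Unbounded_String ("  padded  ");
   By_Procedure : Unbounded_String := To_Unbounded_String ("  padded  ");
begin
   -- Functional form: a new value is returned and then assigned.
   By_Function := Trim (By_Function, Ada.Strings.Both);

   -- Procedural form: the object is updated in place.
   Trim (By_Procedure, Ada.Strings.Both);

   if By_Function /= By_Procedure then
      raise Program_Error;
   end if;
end String_Forms;
```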

guideline

 * Use strong typing with carefully selected constraints to reduce run-time constraint checking when measured performance indicates.

example
In this example, two potential constraint checks are eliminated. If the function Get_Response returns String, then the initialization of the variable Input would require constraint checking. If the variable Last is type Positive, then the assignment inside the loop would require constraint checking.
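The original listing is not reproduced here. The following reconstruction sketch uses the names Get_Response, Input, and Last from the text; the subtypes Name_Index and Name, the bound of 32, and the stub body of Get_Response are assumptions.

```ada
procedure Constraint_Demo is
   subtype Name_Index is Positive range 1 .. 32;
   subtype Name       is String (Name_Index);

   -- Hypothetical stub. A real version would read user input, check its
   -- length once, and raise an exception if it does not fit.
   function Get_Response return Name is
      Result : Name := (others => ' ');
   begin
      Result (1 .. 3) := "Ada";
      return Result;
   end Get_Response;

   -- Get_Response returns Name, so the bounds already match.
   Input : constant Name := Get_Response;

   -- I below always lies in Name_Index, so Last := I cannot go out of range.
   Last : Name_Index := Name_Index'Last;
begin
   for I in Input'Range loop
      if Input (I) /= ' ' then
         Last := I;
      end if;
   end loop;

   if Last /= 3 then
      raise Program_Error;
   end if;
end Constraint_Demo;
```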

rationale
Because run-time constraint checking is associated with slow performance, it is not intuitive that the addition of constrained subtypes could actually improve performance. However, the need for constraint checking appears in many places regardless of the use of constrained subtypes. Even assignments to variables that use the predefined subtypes may need constraint checks. By consistently using constrained subtypes, much of the unnecessary run-time checking can be eliminated. Instead, the checking is usually moved to less frequently executed code involved in system input. In the example, the function Get_Response may need to check the length of a user-supplied string and raise an exception.

Some compilers can do additional optimizations based on the information provided by constrained subtypes. For example, although an unconstrained array does not have a fixed size, it has a maximum size that can be determined from the range of its index. Performance can be improved by limiting this maximum size to a "reasonable" number. Refer to the discussion on unconstrained arrays found in NASA (1992).

guideline

 * For cases where both rendezvous and protected types are inefficient, consider the use of the Real-Time Systems Annex.

rationale
The packages Ada.Synchronous_Task_Control and Ada.Asynchronous_Task_Control have been defined to provide an alternative to tasking and protected types for use in applications where a minimal run-time is desired.
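As a minimal sketch, assuming a compiler that supports the Real-Time Systems Annex, a suspension object from Ada.Synchronous_Task_Control can release a waiting task without a rendezvous or a protected object. The names Go and Waiter are illustrative.

```ada
with Ada.Text_IO;
with Ada.Synchronous_Task_Control;

procedure Suspension_Demo is
   use Ada.Synchronous_Task_Control;

   Go : Suspension_Object;   -- a suspension object is initially False

   task Waiter;

   task body Waiter is
   begin
      Suspend_Until_True (Go);   -- blocks until Go is set to True
      Ada.Text_IO.Put_Line ("Waiter released");
   end Waiter;
begin
   Ada.Text_IO.Put_Line ("Releasing waiter");
   Set_True (Go);
end Suspension_Demo;
```

A suspension object carries only a Boolean state, so any data shared between the tasks still needs its own protection.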

guideline

 * When measured performance indicates, use pragma Inline when calling overhead is a significant portion of the routine's execution time.

rationale
If calling overhead is a significant portion of a subprogram's execution time, then using pragma Inline may reduce execution time.

Procedure and function invocations include overhead that is unnecessary when the code involved is very small. These small routines are usually written to maintain the implementation hiding characteristics of a package. They may also simply pass their parameters unchanged to another routine. When one of these routines appears in some code that needs to run faster, either the implementation-hiding principle needs to be violated or a pragma Inline can be introduced.

The use of pragma Inline does have its disadvantages. It can create compilation dependencies on the body; that is, when the specification uses a pragma Inline, both the specification and corresponding body may need to be compiled before the specification can be used. As updates are made to the code, a routine may become more complex (larger) and the continued use of a pragma Inline may no longer be justified.

exceptions
Although it is rare, inlining code may increase code size, which can lead to slower performance caused by additional paging. A pragma Inline may actually thwart a compiler's attempt to use some other optimization technique, such as register optimization.

When a compiler is already doing a good job of selecting routines to be inlined, the pragma may accomplish little, if any, improvement in execution speed.
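The kind of small, implementation-hiding routine discussed above might look like the following sketch. The package Counters and its operations are hypothetical. The pragma is a request; the compiler is free to ignore it.

```ada
procedure Inline_Demo is

   package Counters is
      type Counter is private;

      procedure Increment (C : in out Counter);
      function Value (C : Counter) return Natural;

      -- Ask the compiler to expand calls to these small routines in line.
      pragma Inline (Increment, Value);
   private
      type Counter is
         record
            Count : Natural := 0;
         end record;
   end Counters;

   package body Counters is
      procedure Increment (C : in out Counter) is
      begin
         C.Count := C.Count + 1;
      end Increment;

      function Value (C : Counter) return Natural is
      begin
         return C.Count;
      end Value;
   end Counters;

   Total : Counters.Counter;
begin
   for I in 1 .. 1_000 loop
      Counters.Increment (Total);
   end loop;

   if Counters.Value (Total) /= 1_000 then
      raise Program_Error;
   end if;
end Inline_Demo;
```

Timing the loop with and without the pragma shows whether call overhead is significant for your compiler.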

guideline

 * Use pragma Restrictions to express the user's intent to abide by certain restrictions.

rationale
This may facilitate the construction of simpler run-time environments (Ada Reference Manual 1995).

guideline

 * Use pragma Preelaborate where allowed.

rationale
This may reduce memory write operations after load time (Ada Reference Manual 1995).

guideline

 * Use pragma Pure where allowed.

rationale
This may permit the compiler to omit calls on library-level subprograms of the library unit if the results are not needed after the call.

guideline

 * Use pragma Discard_Names when the names are not needed by the application and data space is at a premium.

rationale
This may reduce the memory needed to store names of Ada entities, where no operation uses those names.

guideline

 * Use pragma Suppress where necessary to achieve performance requirements.

rationale
See Guideline 5.9.5.

guideline

 * Use pragma Reviewable to aid in the analysis of the generated code.

rationale
See the Ada Reference Manual 1995.

Summary

 * Use the guidelines in this chapter with care; they may be hazardous to your software.

program structure

 * Use blocks to introduce late initialization when measured performance indicates.

data structures

 * Use constrained arrays when measured performance indicates.
 * Use zero-based indexing for arrays when measured performance indicates.
 * Use fixed-size components for records when measured performance indicates.
 * Define arrays of records as parallel arrays when measured performance indicates.
 * Use a sequence of assignments for an aggregation when measured performance indicates.

algorithms

 * Use incremental schemes instead of mod and rem when measured performance indicates.
 * Use the short-circuit control form when measured performance indicates.
 * Use the case statement when measured performance indicates.
 * Use hard-coded constraint checking when measured performance indicates.
 * Use column-first processing of two-dimensional arrays when measured performance indicates.
 * Use overwriting for conditional assignment when measured performance indicates.
 * When measured performance indicates, perform packed Boolean array shift operations by using slice assignments rather than repeated bit-wise assignment.
 * Use static subprogram dispatching when measured performance indicates.

types

 * Use only simple aggregates when measured performance indicates.
 * For mutual exclusion, when measured performance indicates, use protected types as an alternative to tasking rendezvous.
 * To implement an interrupt handler, when measured performance indicates, use a protected procedure.
 * Use modular types rather than packed Boolean arrays when measured performance indicates.
 * Use the predefined bounded strings when predictable performance is an issue and measured performance indicates.
 * Use the procedural form of the string handling subprograms when measured performance indicates.
 * Use strong typing with carefully selected constraints to reduce run-time constraint checking when measured performance indicates.
 * For cases where both rendezvous and protected types are inefficient, consider the use of the Real-Time Systems Annex.

pragmas

 * When measured performance indicates, use pragma Inline when calling overhead is a significant portion of the routine's execution time.
 * Use pragma Restrictions to express the user's intent to abide by certain restrictions.
 * Use pragma Preelaborate where allowed.
 * Use pragma Pure where allowed.
 * Use pragma Discard_Names when the names are not needed by the application and data space is at a premium.
 * Use pragma Suppress where necessary to achieve performance requirements.
 * Use pragma Reviewable to aid in the analysis of the generated code.