ROSE Compiler Framework/LoopProcessor

Where is the tool
Source file
 * https://github.com/rose-compiler/rose-develop/blob/master/tutorial/LoopProcessor.C

Binary: not built or installed by default. Build it in the tutorial directory of the build tree:
 * cd rose_buildtree/tutorial
 * make loopProcessor

Documentation
See more at
 * Chapter 38 of http://rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf

Command line options
[..buildtree/tutorial]./loopProcessor --help

loopProcessor

-gobj: generate object file
-orig: copy non-modified statements from original file

-splitloop: applying loop splitting to remove conditionals inside loops
 * split loop

-annot -pre: apply partial redundancy elimination
-fd: apply finite differencing to array index expressions

-debugloop: print debugging information for loop transformations;
-debugdep: print debugging information for dependence analysis;
 * Debugging options

-tmloop: print timing information for loop transformations;

-arracc : use function to denote multi-dimensional array access;
 * Use a special function to denote array access (the special function can be replaced with macros after transformation). This option is for circumventing complex subscript expressions for linearized multi-dimensional arrays.

opt : the level of loop optimizations to apply; by default, only the outermost level is optimized;

-unroll [-locond] [-nvar] [poet] <-unrollsize> : unrolling innermost loops at
 * unroll loop

-bs : break up statements in loops at
 * break up statements in loops

-bk_poet : parameterize the blocking transformation

-par_poet : parallelization transformation using POET

-bk1 :block outer loops
-bk2 :block inner loops
-bk3 :block all loops
 * loop blocking

-cp :copy array regions with dimensions <=
-cp_poet :parameterize array copy array regions; to be applied together with blocking.
 * copy array

-ic1 :loop interchange for more reuses // ***
 * loop interchange

-fs0 : maximum distribution at all loops
-fs01 : maximum distribution at inner-most loops
 * loop fission

-fs1 :single-level loop fusion for more reuses
-fs2 :multi-level loop fusion for more reuses
 * loop fusion

-ta :split limit for transitive dep. analysis
 * Max number of nodes to split for transitive dependence analysis (to limit the overhead of transitive dep. analysis)

-clsize :set cache line size
 * set cache line size in evaluating spatial locality (affects decisions in applying loop optimizations)

-reuse_dist :set reuse distance
 * set maximum distance of reuse that can exploit cache (used to evaluate temporal locality of loops)

-dt :perform dynamic tuning
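Illustration of interchange and blocking

The interchange (-ic1) and blocking (-bk1, -bk2, -bk3) options listed above name standard loop transformations. The sketch below is hand-written C, not output from loopProcessor, and shows the general shape of such transformations on a matrix multiply. SIZE, TILE and all function names are hypothetical and chosen only for this illustration; what the tool actually generates depends on its own analysis.

```c
#include <assert.h>

#define SIZE 64 /* hypothetical matrix dimension */
#define TILE 16 /* hypothetical block size; chosen to divide SIZE evenly */

static double A[SIZE][SIZE], B[SIZE][SIZE];

/* Fill the inputs with small integers so every sum is exact in double. */
static void init_inputs(void)
{
    int i, j;
    for (i = 0; i < SIZE; i++)
        for (j = 0; j < SIZE; j++) {
            A[i][j] = (double)((i + j) % 7);
            B[i][j] = (double)((i * j) % 5);
        }
}

static void zero_matrix(double C[SIZE][SIZE])
{
    int i, j;
    for (i = 0; i < SIZE; i++)
        for (j = 0; j < SIZE; j++)
            C[i][j] = 0.0;
}

/* Baseline: i-j-k order; the innermost loop walks B down a column. */
static void matmul_ijk(double C[SIZE][SIZE])
{
    int i, j, k;
    zero_matrix(C);
    for (i = 0; i < SIZE; i++)
        for (j = 0; j < SIZE; j++)
            for (k = 0; k < SIZE; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Interchange of the j and k loops: the innermost loop now walks
   B and C along rows, which are contiguous in memory in C. */
static void matmul_ikj(double C[SIZE][SIZE])
{
    int i, j, k;
    zero_matrix(C);
    for (i = 0; i < SIZE; i++)
        for (k = 0; k < SIZE; k++)
            for (j = 0; j < SIZE; j++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Blocking of the k and j loops on top of the interchanged order:
   each TILE x TILE piece of B is reused for every row i. */
static void matmul_blocked(double C[SIZE][SIZE])
{
    int i, j, k, jj, kk;
    zero_matrix(C);
    for (kk = 0; kk < SIZE; kk += TILE)
        for (jj = 0; jj < SIZE; jj += TILE)
            for (i = 0; i < SIZE; i++)
                for (k = kk; k < kk + TILE; k++)
                    for (j = jj; j < jj + TILE; j++)
                        C[i][j] += A[i][k] * B[k][j];
}

/* Number of elements where a transformed version disagrees with the baseline. */
int count_differences(void)
{
    static double C0[SIZE][SIZE], C1[SIZE][SIZE], C2[SIZE][SIZE];
    int i, j, bad = 0;
    init_inputs();
    matmul_ijk(C0);
    matmul_ikj(C1);
    matmul_blocked(C2);
    for (i = 0; i < SIZE; i++)
        for (j = 0; j < SIZE; j++)
            if (C0[i][j] != C1[i][j] || C0[i][j] != C2[i][j])
                bad++;
    return bad;
}

/* Aborts if any transformed version changes the result. */
void check_transformations(void)
{
    assert(count_differences() == 0);
}
```

Calling check_transformations() from a main function is enough to confirm that all three loop orders compute the same matrix; only the memory access pattern differs between them.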

Example use
Loop fusion

// ---test loop fusion input.c ---
#define N 1024

void foo(double a[N], double b[N], double c[N])
{
  int i, j;
  // note: at i = 0 this writes a[-1]; the output below keeps the same index
  for (i = 0; i < N; i++)
    a[i - 1] = b[i];

  for (j = 0; j < N; j++)
    c[j] = a[j];
}

// command line

[..buildtree/tutorial]./loopProcessor -fs2 input.c

// output---
// test loop fusion
#define N 1024

void foo(double a[1024],double b[1024],double c[1024])
{
  int i;
  int j;
  for (i = 0; i <= 1024; i += 1) {
    if (i <= 1023) {
      a[i - 1] = b[i];
    }
    else {
    }
    if (i >= 1) {
      c[-1 + i] = a[-1 + i];
    }
    else {
    }
  }
}
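
Checking the fused loop

One way to see that the fused loop above does the same work as the two original loops is to run both forms on identical data and compare the results. The harness below is a sketch written for this page, not part of ROSE: foo_original, foo_fused, count_mismatches and check_equivalence are hypothetical names, and the fused body is transcribed from the output with N in place of 1024. Because the test input writes a[-1] when i is 0, the harness allocates one extra leading element and passes a pointer to the second element, so that index stays inside the buffer.

```c
#include <assert.h>

#define N 1024

/* Original form: two separate loops, as in input.c. */
static void foo_original(double a[N], double b[N], double c[N])
{
    int i, j;
    for (i = 0; i < N; i++)
        a[i - 1] = b[i];
    for (j = 0; j < N; j++)
        c[j] = a[j];
}

/* Fused form, transcribed from the loopProcessor output shown above. */
static void foo_fused(double a[N], double b[N], double c[N])
{
    int i;
    for (i = 0; i <= N; i += 1) {
        if (i <= N - 1)
            a[i - 1] = b[i];       /* body of the first loop */
        if (i >= 1)
            c[-1 + i] = a[-1 + i]; /* body of the second loop, shifted by one */
    }
}

/* Runs both forms on the same inputs and returns the number of
   array elements that differ afterwards. */
int count_mismatches(void)
{
    /* a1/a2 hold N + 1 elements so that a[-1] is a valid location. */
    static double a1[N + 1], a2[N + 1], b[N], c1[N], c2[N];
    int k, bad = 0;

    for (k = 0; k < N; k++) {
        b[k] = 0.5 * k + 1.0;
        c1[k] = c2[k] = 0.0;
    }
    for (k = 0; k <= N; k++)
        a1[k] = a2[k] = -1.0 * k;

    foo_original(a1 + 1, b, c1);
    foo_fused(a2 + 1, b, c2);

    for (k = 0; k <= N; k++)
        if (a1[k] != a2[k])
            bad++;
    for (k = 0; k < N; k++)
        if (c1[k] != c2[k])
            bad++;
    return bad;
}

/* Aborts if the fused loop changes any result. */
void check_equivalence(void)
{
    assert(count_mismatches() == 0);
}
```

The shift by one in the second statement is what makes the fusion legal here: c[i - 1] reads a[i - 1] in the same iteration that writes it, so each element of a is written before it is copied, just as in the original two loops.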