ROSE Compiler Framework/Program Analysis

Overview
ROSE have implemented the following compiler analysis
 * call graph analysis
 * control flow graph
 * data flow analysis: including liveness analysis, def-use analysis, etc.
 * dependence analysis
 * side effect analysis

control flow graph
ROSE provides several variants of control flow graphs

Virtual Control Flow Graph
The virtual control flow graph (vcfg) is dynamically generated on the fly when needed. So there is no mismatch between the ROSE AST and its corresponding control flow graph. The downside is that the same vcfg will be re-generated each time it is needed. This can be a potentially a performance bottleneck.

Facts
 * Documentation: virtual CFG is documented in Chapter 19 Virtual CFG of ROSE tutorial pdf
 * Source Files:
 * src/frontend/SageIII/virtualCFG/virtualCFG.h
 * src/frontend/SageIII/virtualCFG/virtualCFG.C //not only give definitions of virtualCFG.h, but also extend AST node support in VirtualCFG
 * src/ROSETTA/Grammar/Statement.code // prototypes of member functions for SgStatement nodes, etc.
 * src/ROSETTA/Grammar/Expression.code // prototypes of member functions for SgExpression nodes, etc.
 * src/ROSETTA/Grammar/Support.code // prototypes of member functions for SgInitialized(LocatedNode) nodes, etc.
 * src/ROSETTA/Grammar/Common.code // prototypes of member functions for other nodes, etc.
 * src/frontend/SageIII/virtualCFG/memberFunctions.C // implementation of virtual CFG related member functions for each AST node
 * This file will help the generation of buildTree/src/frontend/SageIII/Cxx_Grammar.h
 * Test directory: tests/CompileTests/virtualCFG_tests
 * A dot graph generator: generate a dot graph for either the raw or interesting virtual CFG.
 * Source: tests/CompileTests/virtualCFG_tests/generateVirtualCFG.C
 * Installed under rose_ins/bin

How to extend VirtualCFG to support OpenMP

For example, if SgOmpPrivateClause or SgOmpSharedClause are not support in VirtualCFG, it is necessary to check whether buildTree/src/frontend/SageIII/Cxx_Grammar.h has function prototypes for adding CFGEdge of SgOmpClause, like SgOmpClause::cfgInEdge SgOmpClause::cfgOutEdge If there is no prototypes, then that means you CFGNode does not belong to SgExpression, SgStatement and SgExpression. SgOmpClause can be added in src/ROSETTA/Grammar/Support.code, step1: construct SgOmpClause::cfgndexForEnd this index is based on the AST graph of your source code, the index is explicit in AST node real example: SgOmpClauseBodyStatement::cfgIndexForEnd const { int size = this->get_clauses.size; // the number of clauses in #pragma omp parallel return (size + 1); // clauses + body }  step2: construct cfgInEdge for this CFGNode please refer to AST, since AST can show all node information, real example: std::vector SgOmpClauseBodyStatement::cfgInEdges(unsigned int idx) { std::vector result; addIncomingFortranGotos(this, idx, result); if( idx == 0 ) {   makeEdge( getNodeJustBeforeInContainer( this ), CFGNode( this, idx ), result ); }  else {     if( idx == ( this->get_clauses.size + 1 ) ) {       makeEdge( this->get_body->cfgForEnd, CFGNode( this, idx ), result ); //connect variables clauses first, then parallel body }     else {          if( idx < ( this->get_clauses.size + 1 ) ) {              makeEdge( this->get_clauses[idx -1]->cfgForEnd, CFGNode( this, idx ), result );//connect variables clauses first, then parallel body }          else {            ROSE_ASSERT( !" Bad index for SgOmpClauseBodyStatement" ); }      }   } return result; } step3: construct cfgOutEdge for CFGNode For example: std::vector SgOmpClauseBodyStatement::cfgOutEdges(unsigned int idx) {//! edited by Hongyi for edges between SgOmpClauseBodyStatement and SgOmpClause std::vector result; addIncomingFortranGotos( this, idx, result ); if( idx == (this->get_clauses.size + 1 ) ) {    makeEdge( CFGNode( this ,idx), getNodeJustAfterInContainer( this ), result ); } else {   if( idx == this->get_clauses.size  ) {        makeEdge( CFGNode( this, idx ), this->get_body->cfgForBeginning, result ); // connect variable clauses first, parallel body last }     else {       if( idx < this->get_clauses.size ) // connect variables clauses first, parallel body last {           makeEdge( CFGNode( this, idx ), this->get_clauses[idx]->cfgForBeginning, result ); }        else {           ROSE_ASSERT( !"Bad index for SgOmpClauseBodyStatement" ); }      }  } return result; }
 * how to add CFGNode for SgOmpClause in
 * 1. Identify the class name in ROSETTA in frontend
 * 2. add the function definitions in src/frontend/SageIII/virtualCFG/memberFunctions.C to give the definitions of adding CFGNode and CFGEdge

First check AST graph /Users/ma23/Desktop/Screen shot 2012-08-24 at 11.51.33 AM.png In this example, you will find that there are three subtree from SgOmpParallelStatement One is get_body, the other two are SgOmpPrivateClasue and SgOmpSharedClauserespectively. So the index is 3. // the order to visit CFGNode is to visit clauses first, then parallel body
 * 3.How to check the result



Static Control Flow Graph
Due to the performance concern of virtual control flow graph, we developed another static version which persistently exists in memory like a regular graph.

Facts:
 * Documentation: 19.7 Static CFG of ROSE tutorial pdf
 * Test Directory: rose/tests/CompileTests/staticCFG_tests

Static and Interprocedural CFGs
Facts:
 * Documentation: 19.8 Static, Interprocedural CFGs of ROSE tutorial pdf
 * Test Directory: rose/tests/CompileTests/staticCFG_tests

Virtual Function Analysis
Facts
 * Original contributor: Faizur from UTSA, done in Summer 2011
 * Code: at src/midend/programAnalysis/VirtualFunctionAnalysis.
 * Implemented with the techniques used in the following paper: "Interprocedural Pointer Alias Analysis - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.2382". The paper boils down the virtual function resolution to pointer aliasing problem. The paper employs flow sensitive inter procedural data flow analysis to solve aliasing problem, using compact representation graphs to represent the alias relations.
 * Some test files in the roseTests folder of the ROSE repository and he told me that the implementation supports function pointers as well as code which is written across different files (header files etc).
 * Documentation: Chapter 24 Dataflow Analysis based Virtual Function Analysis, of ROSE tutorial pdf

Def-use analysis
If you want a def-use analysis, try this http://www.rosecompiler.org/ROSE_HTML_Reference/classVariableRenaming.html

VariableRenaming v(project); v.run; v.getReachingDefsAtNode(...);

testing
 * cd buildtree/tests/nonsmoke/functional/roseTests/programAnalysisTests/defUseAnalysisTests
 * type ```make check```

liveness analysis
see Liveness analysis

Pointer Analysis
https://mailman.nersc.gov/pipermail/rose-public/2010-September/000390.html

On 9/1/10 11:49 AM, Fredrik Kjolstad wrote: > Hi all, > > I am trying to use Rose as the analysis backend for a refactoring > engine and for one of the refactorings I am implementing I need > whole-program pointer analysis. Rose has an implementation of > steensgard's algorithm and I have some questions regarding how to use > this. > > I looked at the file steensgaardTest2.C to figure out how to invoke > this analysis and I am a bit perplexed: > > 1. The file SteensgaardPtrAnal.h that is included by the test is not > present in the include directory of my installed version of Rose. > Does this mean that the Steensgaard implementation is not a part of > the shipped compiler, or does it mean that I have to retrieve an > instance of it through some factory method whose static return type is > PtrAnal? I believe it is in the shipped compiler. And you're using the correct file to figure out how to use it. It should be in the installed include directory --- if it is not, it's probably something that needs to be fixed. But you can copy the include file from ROSE/src/midend/programAnalysis/pointerAnal/ as a temporary fix

> 2. How do I initialize the alias analysis for a given SgProject? Is > this done through the overloaded ? The steensgaardTest2.C file shows how to set up everything to invoke the analysis. Right now you need to go over each function definition and invoke the analysis explicitly, as illustrated by the main function in the file. > 3. Say I want to query whether two pointer variables alias and I have > SGNodes to their declarations. How do I get the AstNodePtr needed to > invoke the may_alias(AstInterface&, const AstNodePtr&, const > AstNodePtr&) function? Or maybe I should rather invoke the version of > may_alias that takes two strings (varnames)? To convert a SgNode* x to AstNodePtr, wrap it inside an AstNodePtrImpl object, i.e., do AstNodePtrImpl(x), as illustrated inside the operator of TestPtrAnal in steensgaardTest2.C. > 4. How do I query whether two parameters alias? The PtrAnal class has the following interface method may_alias(AstInterface& fa, const AstNodePtr& r1, const AstNodePtr& r2); It is implemented in SteensgaardPtrAnal class, which inherit PtrAnal class. To build AstInterface and AstNodePtr, you simply need to wrap SgNode* with some wrapper classes, illustrated by steensgaardTest2.C -Qing Yi

void func(void) { int* pointer; int* aliasPointer;

pointer = malloc(sizeof(int)); aliasPointer = pointer;
 * aliasPointer = 42;

printf("%d\n", *pointer); }

The SteensgaardPtrAnal::output function returns: c:(sizeof(int )) LOC1=>LOC2 c:42 LOC3=>LOC4 v:func LOC5=>LOC6 (inparams: ) ->(outparams: LOC7) v:func-0 LOC8=>LOC7 v:func-2-1 LOC9=>LOC10 v:func-2-3 LOC11=>LOC12 (pending LOC10 LOC13=>LOC14 =>LOC4 ) v:func-2-4 LOC15=>LOC16 =>LOC17 v:func-2-5 LOC18=>LOC14 =>LOC4 v:func-2-aliasPointer LOC19=>LOC14 =>LOC4 v:func-2-pointer LOC20=>LOC13 =>LOC14 =>LOC4 v:malloc LOC21=>LOC22 (inparams: LOC2) ->(outparams:  LOC12) v:printf LOC23=>LOC24 (inparams: LOC16=>LOC17  LOC14=>LOC4 ) ->(outparams: LOC25)

SSA
ROSE has implemented an SSA form. Some discussions on the mailing list: link.

Rice branch has an implementation of array SSA. We are waiting for their commits to be pushed into Jenkins. --Liao (discuss • contribs) 18:17, 19 June 2012 (UTC)

Side Effect Analysis
There are at least two implementations in ROSE

First one: recommended to use.
 * Inside of SageInterface interface functions: http://rosecompiler.org/ROSE_HTML_Reference/namespaceSageInterface.html
 * such as bool SageInterface::collectReadWriteVariables (SgStatement *stmt, std::set< SgInitializedName * > &readVars, std::set< SgInitializedName * > &writeVars) // you can pass a for loop, its body (a basic block stmt) to this function and get the read/write variables. It returns false if the analysis is not successful. So be sure to check the return value before using the results.
 * This function is a wrapper function for
 * bool AnalyzeStmtRefs(LoopTransformInterface &la, const AstNodePtr& n,CollectObject &wRefs, CollectObject &rRefs) // from DepInfoAnal.C
 * StmtSideEffectCollect(LoopTransformInterface::getSideEffectInterface)(fa,n,&colw,&colr); //src/midend/astUtil/astSupport/StmtInfoCollect.h

Second one: This one is not robust enough for use. It also depends on sqlite library for installation.
 * Source Code: src/midend/programAnalysis/sideEffectAnalysis
 * Tests: tests/roseTests/programAnalysisTests/sideEffectAnalysisTests
 * The algorithm is based on the paper: K. D. Cooper and K. Kennedy. 1988. Interprocedural side-effect analysis in linear time. In Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation (PLDI '88), R. L. Wexelblat (Ed.). ACM, New York, NY, USA, 57-66.

Generic Dataflow Framework
As the ROSE project goes on, we have collected quite some versions of dataflow analysis. It is painful to maintain and use them as they An ongoing effort is to consolidate all dataflow analysis work within a single framework.
 * Duplicate the iterative fixed-point algorithm
 * Scatter in different directories and
 * Use different representations for results.

Quick facts
 * Original author: Greg Bronevetsky
 * Code reviewer: Chunhua Liao
 * Documentation:
 * Source codes: files under ./src/midend/programAnalysis/genericDataflow
 * Tests: tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests
 * Currently implemented analysis
 * Dominator analysis: dominatorAnalysis.h dominatorAnalysis.C
 * Livedead variable analysis, or Liveness analysis: liveDeadVarAnalysis.h liveDeadVarAnalysis.C
 * Constant propagation: constantPropagation.h constantPropagation.C: TODO need to move the files into src/ from /tests

See more at Generic Dataflow Framework

Dependence analysis
TODO: it turns out the interface work is not merged into our master branch. So the following instructions do not apply!

The interface for dependence graph could be found in DependencyGraph.h. The underlying representation is n DepGraph.h. BGL is required to access the graph.

Here are 6 examples attached with this email. In deptest.C, there are also some macros to enable more accurate analysis.

If USE_IVS is defined, the induction variable substitution will be performed. if USE_FUNCTION is defined, the dependency could take a user-specified function side-effect interface. Otherwise, if non of them are defined, it will perform a normal dependence analysis and build the graph.