ROSE Compiler Framework/Generic Dataflow Framework

Introduction
As the ROSE project goes on, we have collected quite some versions of dataflow analysis. It is painful to maintain and use them as they An ongoing effort is to consolidate all dataflow analysis work within a single framework.
 * duplicate the iterative fixed-point algorithm,
 * scatter in different directories,
 * use different representations for results, and
 * has different level of maturity and robustness.

Quick facts
 * original author: Greg Bronevetsky
 * code gatekeeper: Chunhua Liao
 * Documentation:
 * Chapter 18 Generic Dataflow Analysis Framework, of the ROSE tutorial pdf, git commit
 * This wikibook page
 * source codes: files under ./src/midend/programAnalysis/genericDataflow
 * tests: tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests

Implemented analysis
List
 * Constant Propagation
 * dominator analysis: dominatorAnalysis.h dominatorAnalysis.C
 * livedead variable analysis, or liveness analysis: liveDeadVarAnalysis.h liveDeadVarAnalysis.C
 * Pointer Analysis

Function, nodeState and FunctionState
Function and nodeState are two required parameters to run data flow analysis:

They are stored together inside FunctionState //functionState.h

functionState.h

genericDataflow/cfgUtils/CallGraphTraverse.h

function
An abstraction of functions, internally connected to SgFunctionDeclaration *decl

declared in ./src/midend/programAnalysis/genericDataflow/cfgUtils/CallGraphTraverse.h

constructors:
 * Function::Function(string name) based on function name
 * Function::Function(SgFunctionDeclaration* sample) // core constructor
 * Function::Function(SgFunctionDefinition* sample)

CGFunction* cgFunc; // call graph function

Function func(cgFunc);

NodeFact
any information related to a CFG node.
 * It has no dataflow 's IN/OUT concept
 * not meant to evolve during the dataflow analysis

class NodeFact: public printable {       public:

// returns a copy of this node fact virtual NodeFact* copy const=0; };

NodeState
Store information about multiple analyses and their corresponding lattices, for a given node (CFG node ??)

./src/midend/programAnalysis/genericDataflow/state/nodeState.h

It also provide static functions to
 * initialize NodeState for all DataflowNode
 * to retrieve NodeState for a given DataflowNode

class NodeState {    // internal types: map between analysis and set of lattices

typedef std::map > LatticeMap; typedef std::map > NodeFactMap; typedef std::map BoolMap;

// the dataflow information Above the node, for each analysis that // may be interested in the current node LatticeMap dfInfoAbove; // IN set in a dataflow // the Analysis information Below the node, for each analysis that // may be interested in the current node LatticeMap dfInfoBelow; // OUT set in a dataflow,

// the facts that are true at this node, for each analysis that // may be interested in the current node NodeFactMap facts; // Contains all the Analyses that have initialized their state at this node. It is a map because // TBB doesn't provide a concurrent set. BoolMap initializedAnalyses;

// static interfaces

// returns the NodeState object associated with the given dataflow node. // index is used when multiple NodeState objects are associated with a given node // (ex: SgFunctionCallExp has 3 NodeStates: entry, function body, exit) static NodeState* getNodeState(const DataflowNode& n, int index=0);

// most useful interface: retrieve the lattices (could be only one) associated with a given analysis

// returns the map containing all the lattices from above the node that are owned by the given analysis // (read-only access) const std::vector& getLatticeAbove(const Analysis* analysis) const;

// returns the map containing all the lattices from below the node that are owned by the given analysis // (read-only access) const std::vector& getLatticeBelow(const Analysis* analysis) const;

}

FunctionState
./src/midend/programAnalysis/genericDataflow/state/functionState.h

A pair of Function and NodeState.
 * it provides static functions to initialize all FunctionState And retrieve FunctionState

class FunctionState {       friend class CollectFunctions; public: Function func; NodeState state; // The lattices that describe the value of the function's return variables NodeState retState;

private: static std::set allDefinedFuncs; static std::set allFuncs; static bool allFuncsComputed; public: FunctionState(Function &func): func(func), state(/*func.get_declaration->cfgForBeginning*/) {} // We should use this interface --

// 1. returns a set of all the functions whose bodies are in the project static std::set& getAllDefinedFuncs;

// 2. returns the FunctionState associated with the given function // func may be any declared function static FunctionState* getFuncState(const Function& func); ... }

FunctionState* fs = new FunctionState(func); // empty From FuntionState to NodeState

/************************************* *** UnstructuredPassInterAnalysis *** *************************************/ void UnstructuredPassInterAnalysis::runAnalysis {       set allFuncs = FunctionState::getAllDefinedFuncs; // call a static function to get all function state s

// Go through functions one by one, call an intra-procedural analysis on each of them // iterate over all functions with bodies for(set::iterator it=allFuncs.begin; it!=allFuncs.end; it++) {               FunctionState* fState = *it; intraAnalysis->runAnalysis(fState->func, &(fState->state)); } }

// runs the intra-procedural analysis on the given function, returns true if // the function's NodeState gets modified as a result and false otherwise // state - the function's NodeState bool UnstructuredPassIntraAnalysis::runAnalysis(const Function& func, NodeState* state) {       DataflowNode funcCFGStart = cfgUtils::getFuncStartCFG(func.get_definition,filter); DataflowNode funcCFGEnd = cfgUtils::getFuncEndCFG(func.get_definition, filter); if(analysisDebugLevel>=2) Dbg::dbg << "UnstructuredPassIntraAnalysis::runAnalysis function "< nodeStates = NodeState::getNodeStates(n); // Visit each CFG node for(vector::const_iterator itS = nodeStates.begin; itS!=nodeStates.end; itS++) visit(func, n, *(*itS)); }       return false; }

example: retrieve the liveness analysis's IN lattice

void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const NodeState& state, set& vars, string indent)
 * LiveVarsLattice* liveLAbove = dynamic_cast(*(state.getLatticeAbove(ldva).begin));

Lattices
Caveat: lattice vs. lattice value
 * A lattice by definition is a set of values. However, an instance of lattice type in Generic dataflow framework is used to represent an individual value within a lattice also. Sorry for this confusing. We welcome suggestions to fix this.

Basics
See more at ROSE Compiler Framework/Lattice

Store the data flow analysis information attached to CFG nodes.

Fundamental operations:
 * what to store: lattice value set, bottom, up, and anything in between
 * initialization: LiveDeadVarsAnalysis::genInitState
 * creation: transfer function
 * meet operation: a member function of the lattice

Example
 * liveness analysis: the live variable set at the entry point of a CFG node:
 * constant propagation: lattice values from no information (bottom) -> unknown --> constant --> too much information (conflicting constant values, top),

// blindly add all of that_arg's values into current lattice's value set void LiveVarsLattice::incorporateVars(Lattice* that_arg)

// retrieve a subset lattice information for a given expr. This lattice may contain more information than those about a given expr. Lattice* LiveVarsLattice::project(SgExpression* expr)

// add lattice (exprState)information about expr into current lattice's value set: default implementation just calls meetUpdate(exprState) bool LiveVarsLattice::unProject(SgExpression* expr, Lattice* exprState)

below/above vs IN/OUT
The concept is based on the original CFG flow direction
 * above: the incoming edge direction
 * below: the outcoming edge direction

IN and OUT depends on the direction of the problem, forward vs. backward
 * forward direction: IN == above lattice, OUT = below lattice
 * backward direction: IN == below lattice, OUT = above lattice

Common Utility Lattices
the framework provides some pre-defined lattices ready for use.

lattice.h/latticeFull.h
 * BoolAndLattice

LiveVarsLattice
class LiveVarsLattice : public FiniteLattice {       public: std::set liveVars; // bottom is all live variables,  top is the empty set, meet brings down the lattice -> union of variables. ... };

// Meet operation: simplest set union of two lattices:

// computes the meet of this and that and saves the result in this // returns true if this causes this to change and false otherwise bool LiveVarsLattice::meetUpdate(Lattice* that_arg) {       bool modified = false; LiveVarsLattice* that = dynamic_cast(that_arg); // Add all variables from that to this for(set<varID>::iterator var=that->liveVars.begin; var!=that->liveVars.end; var++) { // If this lattice doesn't yet record *var as being live if(liveVars.find(*var) == liveVars.end) { // this if statement gives a chance to set the modified flag. // otherwise, liveVars.insert can be directly called. modified = true; liveVars.insert(*var); }       }        return modified; }

Transfer Function
basics: Data_flow_analysis
 * IN = sum of OUT (predecessors)
 * OUT = GEN + (IN - KILL)

The impact of program constructs on the current lattices (how to change the current lattices).
 * lattices: stores IN and OUT information
 * additional data members are necessary to store GEN and KILL set inside the transfer function.

class hierarchy: class IntraDFTransferVisitor : public ROSE_VisitorPatternDefaultBase { protected: // Common arguments to the underlying transfer function const Function &func; // which function are we talking about const DataflowNode &dfNode; // wrapper of CFGNode NodeState &nodeState;  // lattice element state, context information? const std::vector<Lattice*> &dfInfo; // data flow information

public:

IntraDFTransferVisitor(const Function &f, const DataflowNode &n, NodeState &s, const std::vector<Lattice*> &d) : func(f), dfNode(n), nodeState(s), dfInfo(d) { }

virtual bool finish = 0; virtual ~IntraDFTransferVisitor { }

};

class LiveDeadVarsTransfer : public IntraDFTransferVisitor {

};

class ConstantPropagationAnalysisTransfer : public VariableStateTransfer<ConstantPropagationLattice> {}

Constant Propagation
template <class LatticeType> class VariableStateTransfer : public IntraDFTransferVisitor { ... };

class ConstantPropagationAnalysisTransfer : public VariableStateTransfer<ConstantPropagationLattice> {};

void ConstantPropagationAnalysisTransfer::visit(SgIntVal *sgn) {    ROSE_ASSERT(sgn != NULL); ConstantPropagationLattice* resLat = getLattice(sgn); ROSE_ASSERT(resLat != NULL); resLat->setValue(sgn->get_value); resLat->setLevel(ConstantPropagationLattice::constantValue); }

LiveDead Variable
Functions to convert program point to Generator and KILL set. For liveness analysis
 * Kill (s) = {variables being defined in s}: //
 * Gen (s) = {variables being used in s}

OUT = IN -KILL + GEN
 * OUT is initialized to be IN set,
 * transfer function will apply -KILL + GEN

class LiveDeadVarsTransfer : public IntraDFTransferVisitor

{ LiveVarsLattice* liveLat;  // the result of this analysis

bool modified; // Expressions that are assigned by the current operation std::set<SgExpression*> assignedExprs; // KILL  set // Variables that are assigned by the current operation std::set<varID> assignedVars; // Variables that are used/read by the current operation std::set<varID> usedVars;  // GEN  set

public: LiveDeadVarsTransfer(const Function &f, const DataflowNode &n, NodeState &s, const std::vector<Lattice*> &d, funcSideEffectUses *fseu_) : IntraDFTransferVisitor(f, n, s, d), indent("   "), liveLat(dynamic_cast<LiveVarsLattice*>(*(dfInfo.begin))), modified(false), fseu(fseu_) {       if(liveDeadAnalysisDebugLevel>=1) Dbg::dbg << indent << "liveLat="<<liveLat->str(indent + "    ")<<std::endl; // Make sure that all the lattice is initialized liveLat->initialize; }

bool finish; // operationg on different AST nodes void visit(SgExpression *); void visit(SgInitializedName *); void visit(SgReturnStmt *); void visit(SgExprStatement *); void visit(SgCaseOptionStmt *); void visit(SgIfStmt *); void visit(SgForStatement *); void visit(SgWhileStmt *); void visit(SgDoWhileStmt *); }

// Helper transfer function, focusing on handling expressions. // live dead variable analysis: LDVA, // expression transfer: transfer functions for expressions /// Visits live expressions - helper to LiveDeadVarsTransfer class LDVAExpressionTransfer : public ROSE_VisitorPatternDefaultBase { LiveDeadVarsTransfer &ldva;

public:

// Plain assignment: lhs = rhs, set GEN (read/used) and KILL (written/assigned) sets void visit(SgAssignOp *sgn) { ldva.assignedExprs.insert(sgn->get_lhs_operand); // If the lhs of the assignment is a complex expression (i.e. it refers to a variable that may be live) OR   // if is a known expression that is known to may-be-live // THIS CODE ONLY APPLIES TO RHSs THAT ARE SIDE-EFFECT-FREE AND WE DON'T HAVE AN ANALYSIS FOR THAT YET /*if(!isVarExpr(sgn->get_lhs_operand) ||      (isVarExpr(sgn->get_lhs_operand) && liveLat->isLiveVar(SgExpr2Var(sgn->get_lhs_operand)))) { */   ldva.used(sgn->get_rhs_operand); } ... }

Call Stack
(gdb) bt   at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/simpleAnalyses/liveDeadVarAnalysis.C:228 at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/simpleAnalyses/liveDeadVarAnalysis.C:384 at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/dataflow.C:282 at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/dataflow.h:74 at ../../../../sourcetree/src/midend/programAnalysis/genericDataflow/analysis/analysis.C:467 at ../../../../../sourcetree/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests/liveDeadVarAnalysisTest.C:101
 * 1) 0 LDVAExpressionTransfer::visit (this=0x7fffffffcea0, sgn=0xa20320)
 * 1) 1 0x00002aaaac3d9968 in SgAssignOp::accept (this=0xa20320, visitor=...) at Cxx_Grammar.C:143069
 * 2) 2 0x00002aaaadc61c04 in LiveDeadVarsTransfer::visit (this=0xaf9e00, sgn=0xa20320)
 * 1) 3 0x00002aaaadbbaef0 in ROSE_VisitorPatternDefaultBase::visit (this=0xaf9e00, variable_SgBinaryOp=0xa20320) at ../../../src/frontend/SageIII/Cxx_Grammar.h:316006
 * 2) 4 0x00002aaaadbba04a in ROSE_VisitorPatternDefaultBase::visit (this=0xaf9e00, variable_SgAssignOp=0xa20320) at ../../../src/frontend/SageIII/Cxx_Grammar.h:315931
 * 3) 5 0x00002aaaac3d9968 in SgAssignOp::accept (this=0xa20320, visitor=...) at Cxx_Grammar.C:143069
 * 4) 6 0x00002aaaadbcca0a in IntraUniDirectionalDataflow::runAnalysis (this=0x7fffffffd9f0, func=..., fState=0xafbd18, analyzeDueToCallers=true, calleesUpdated=...)
 * 1) 7 0x00002aaaadbbf444 in IntraProceduralDataflow::runAnalysis (this=0x7fffffffda00, func=..., state=0xafbd18)
 * 1) 8 0x00002aaaadbb0966 in UnstructuredPassInterDataflow::runAnalysis (this=0x7fffffffda50)
 * 1) 9 0x000000000040381a in main (argc=2, argv=0x7fffffffdba8)

Control flow graph and call graph
The generic dataflow framework works on virtual control flow graph in ROSE

Filtered Virtual CFG
The raw virtual CFG may not be desirable for all kinds of analyses since it can have too many administrative nodes which are not relevant to a problem.

So the framework provides a filter parameter to the Analysis class. A default filter will be used unless you specify your own filter.

// Example filter funtion deciding if a CFGnNode should show up or not bool gfilter (CFGNode cfgn) { SgNode *node = cfgn.getNode;

switch (node->variantT) {   //Keep the last index for initialized names. This way the def of the variable doesn't propagate to its assign initializer. case V_SgInitializedName: return (cfgn == node->cfgForEnd);

// For function calls, we only keep the last node. The function is actually called after all its parameters are evaluated. case V_SgFunctionCallExp: return (cfgn == node->cfgForEnd);

//For basic blocks and other "container" nodes, keep the node that appears before the contents are executed case V_SgBasicBlock: case V_SgExprStatement: case V_SgCommaOpExp: return (cfgn == node->cfgForBeginning);

// Must have a default case: return interesting CFGNode by default in this example default: return cfgn.isInteresting; } }

// Code using the filter function int main( int argc, char * argv[] ) { SgProject* project = frontend(argc,argv); initAnalysis(project); LiveDeadVarsAnalysis ldva(project); ldva.filter = gfilter; // set the filter to be your own one

UnstructuredPassInterDataflow ciipd_ldva(&ldva); ciipd_ldva.runAnalysis; .... }

Analysis Driver
Key function: bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* fState, bool analyzeDueToCallers, set<Function> calleesUpdated) // analysis/dataflow.C Basic tasks: run the analysis by
 * initialize data flow state: lattices and other information
 * walk the CFG : find descendants from a current node
 * call transfer function

Class Hierarchy
class Analysis {}; // an empty abstract class for any analysis
 * Analysis -> IntraProceduralAnalysis -> IntraProceduralDataflow -> IntraUnitDataflow --> IntraUniDirectionalDataflow (INTERESTING level)-> IntraBWDataflow -> LiveDeadVarsAnalysis

class IntraProceduralAnalysis : virtual public Analysis //analysis/analysis.h,  any intra procedural analysis, data flow or not { protected: InterProceduralAnalysis* interAnalysis; public: void setInterAnalysis(InterProceduralAnalysis* interAnalysis) // connection to inter procedural analysis virtual bool runAnalysis(const Function& func, NodeState* state)=0; // run this per function, NodeState stores lattices for each CFG node, etc.    virtual ~IntraProceduralAnalysis; }

//No re-entry. analysis will be executed once??,   data flow, intra-procedural analysis // now lattices are interested class IntraProceduralDataflow : virtual public IntraProceduralAnalysis //analysis/dataflow.h { // initialize lattice etc for a given dataflow node within a function virtual void genInitState (const Function& func, const DataflowNode& n, const NodeState& state,        std::vector<Lattice*>& initLattices, std::vector<NodeFact*>& initFacts);

virtual bool runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated)=0; // the analysis on a function could be triggered by the state changes of function's callers, or callees.

std::set<Function> visited; // make sure a function is initialized once when visited multiple times

}

class IntraUnitDataflow : virtual public IntraProceduralDataflow { // transfer function: operate on lattices associated with a dataflow node, considering its current state virtual bool transfer(const Function& func, const DataflowNode& n, NodeState& state, const std::vector<Lattice*>& dfInfo)=0;

};

// Uni directional dataflow: either forward or backward, but not both directions! class IntraUniDirectionalDataflow : public IntraUnitDataflow { public: bool runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated);

protected: bool propagateStateToNextNode (            const std::vector<Lattice*>& curNodeState, DataflowNode curDFNode, int nodeIndex,             const std::vector<Lattice*>& nextNodeState, DataflowNode nextDFNode);

std::vector<DataflowNode> gatherDescendants(std::vector<DataflowEdge> edges,                                                   DataflowNode (DataflowEdge::*edgeFn) const);

virtual NodeState*initializeFunctionNodeState(const Function &func, NodeState *fState) = 0; virtual VirtualCFG::dataflow* getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState) = 0;

virtual vector<Lattice*> getLatticeAnte(NodeState *state) = 0; virtual vector<Lattice*> getLatticePost(NodeState *state) = 0;

// If we're currently at a function call, use the associated inter-procedural // analysis to determine the effect of this function call on the dataflow state. virtual void transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state) = 0;

virtual vector<DataflowNode> getDescendants(const DataflowNode &n) = 0; virtual DataflowNode getUltimate(const Function &func) = 0; // ultimate what? final CFG node? };

class IntraBWDataflow : public IntraUniDirectionalDataflow {//BW: Backward public: IntraBWDataflow {}

NodeState* initializeFunctionNodeState(const Function &func, NodeState *fState);

VirtualCFG::dataflow* getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState);

virtual vector<Lattice*> getLatticeAnte(NodeState *state); virtual vector<Lattice*> getLatticePost(NodeState *state);

void transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state);

vector<DataflowNode> getDescendants(const DataflowNode &n); // next CFG nodes, depending on the direction { return gatherDescendants(n.inEdges, &DataflowEdge::source);  }

DataflowNode getUltimate(const Function &func); // the last CFG should be the start CFG of the function for a backward dataflow problem {  return cfgUtils::getFuncStartCFG(func.get_definition);   }

};

foward intra-procedural data flow analysis: e.g. reaching definition
 * class IntraFWDataflow : public IntraUniDirectionalDataflow

Initialization: InitDataflowState
Used to initialized the lattices/facts for CFG nodes. It is an analysis by itself. unstructured pass

// super class: provides the driver of initialization: visit each CFG node

class UnstructuredPassIntraAnalysis : virtual public IntraProceduralAnalysis { public: // call the initialization function on each CFG node bool runAnalysis(const Function& func, NodeState* state); // to be implemented by InitDataflowState virtual void visit(const Function& func, const DataflowNode& n, NodeState& state)=0;

}

bool UnstructuredPassIntraAnalysis::runAnalysis(const Function& func, NodeState* state) {       DataflowNode funcCFGStart = cfgUtils::getFuncStartCFG(func.get_definition); DataflowNode funcCFGEnd = cfgUtils::getFuncEndCFG(func.get_definition); if(analysisDebugLevel>=2) Dbg::dbg << "UnstructuredPassIntraAnalysis::runAnalysis function "<<func.get_name.getString<<"\n"; // iterate over all the nodes in this function for(VirtualCFG::iterator it(funcCFGStart); it!=VirtualCFG::dataflow::end; it++) {               DataflowNode n = *it; // The number of NodeStates associated with the given dataflow node //int numStates=NodeState::numNodeStates(n); // The actual NodeStates associated with the given dataflow node const vector<NodeState*> nodeStates = NodeState::getNodeStates(n); // Visit each CFG node for(vector<NodeState*>::const_iterator itS = nodeStates.begin; itS!=nodeStates.end; itS++) visit(func, n, *(*itS)); }       return false; } // derived class provide link to a concrete analysis, and visit implementation class InitDataflowState : public UnstructuredPassIntraAnalysis {       IntraProceduralDataflow* dfAnalysis;  // link to the dataflow analysis to be initialized public: InitDataflowState(IntraProceduralDataflow* dfAnalysis/*, std::vector<Lattice*> &initState*/) {               this->dfAnalysis = dfAnalysis; }       void visit(const Function& func, const DataflowNode& n, NodeState& state); };

void InitDataflowState::visit (const Function& func, const DataflowNode& n, NodeState& state) {  ...   dfAnalysis->genInitState(func, n, state, initLats, initFacts); state.setLattices((Analysis*)dfAnalysis, initLats); state.setFacts((Analysis*)dfAnalysis, initFacts); .... }

worklist
list of CFG nodes, accessed through an iterator interface

auto_ptr<VirtualCFG::dataflow> workList(getInitialWorklist(func, firstVisit, analyzeDueToCallers, calleesUpdated, fState));

class iterator //Declared in cfgUtils/VirtualCFGIterator.h { public: std::list<DataflowNode> remainingNodes; std::set<DataflowNode> visited; bool initialized; protected: // returns true if the given DataflowNode is in the remainingNodes list and false otherwise bool isRemaining(DataflowNode n); // advances this iterator in the given direction. Forwards if fwDir=true and backwards if fwDir=false. // if pushAllChildren=true, all of the current node's unvisited children (predecessors or successors,        //    depending on fwDir) are pushed onto remainingNodes void advance(bool fwDir, bool pushAllChildren); public: virtual void operator ++ (int); bool eq(const iterator& other_it) const; bool operator==(const iterator& other_it) const; bool operator!=(const iterator& it) const; ... };

void iterator::advance(bool fwDir, bool pushAllChildren) {       ROSE_ASSERT(initialized); /*printf("  iterator::advance(%d) remainingNodes.size=%d\n", fwDir, remainingNodes.size); cout<<"       visited=\n"; for(set<DataflowNode>::iterator it=visited.begin; it!=visited.end; it++) cout << "           <"<<it->getNode->class_name<<" | "<<it->getNode<<" | "<<it->getNode->unparseToString<<">\n";*/ if(remainingNodes.size>0) {               // pop the next CFG node from the front of the list DataflowNode cur = remainingNodes.front; remainingNodes.pop_front; if(pushAllChildren) {                       // find its followers (either successors or predecessors, depending on value of fwDir), push back // those that have not yet been visited vector<DataflowEdge> nextE; if(fwDir) nextE = cur.outEdges; else nextE = cur.inEdges; for(vector<DataflowEdge>::iterator it=nextE.begin; it!=nextE.end; it++) {                               DataflowNode nextN((*it).target/* need to put something here because DataflowNodes don't have a default constructor*/); if(fwDir) nextN = (*it).target; else nextN = (*it).source; /*cout << "     iterator::advance "<<(fwDir?"descendant":"predecessor")<<": "<< "<"<<nextN.getNode->class_name<<" | "<<nextN.getNode<<" | "<<nextN.getNode->unparseToString<<">, "<< "visited="<<(visited.find(nextN) != visited.end)<< " remaining="<<isRemaining(nextN)<<"\n";*/ // if we haven't yet visited this node and don't yet have it on the remainingNodes list if(visited.find(nextN) == visited.end &&                                       !isRemaining(nextN)) {                                       //printf("   pushing back node <%s: 0x%x: %s> visited=%d\n", nextN.getNode->class_name.c_str, nextN.getNode, nextN.getNode->unparseToString.c_str, visited.find(nextN)!=visited.end); remainingNodes.push_back(nextN); }                       }                }                // if we still have any nodes left remaining if(remainingNodes.size>0) {                       // take the next node from the front of the list and mark it as visited //visited[remainingNodes.front] = true; visited.insert(remainingNodes.front); }       } }

class dataflow : public virtual iterator {};

class back_dataflow: public virtual dataflow {};

void back_dataflow::operator ++ (int) {       advance(false, true);  // backward,  add all children }

class IntraUniDirectionalDataflow : public IntraUnitDataflow { ...   virtual VirtualCFG::dataflow* getInitialWorklist(const Function &func, bool firstVisit, bool analyzeDueToCallers, const set<Function> &calleesUpdated, NodeState *fState) = 0; }

Implemented in derived classes:
 * VirtualCFG::dataflow* IntraFWDataflow::getInitialWorklist
 * VirtualCFG::dataflow* IntraBWDataflow::getInitialWorklist

apply transfer function
b is a basic block in CFG
 * $${\rm IN}[b] = \bigcup_{p \in pred[b]} {\rm OUT}[p]$$   // information goes into b is the union/join of information comes out of all predecessor nodes of b
 * $${OUT}[b] = {\rm GEN}[b] \cup ({\rm IN}[b] - {\rm KILL}[b])$$ // information goes out out S is the information generated by b minus information killed by b. This is the transfer function operating on b!!

bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* fState, bool analyzeDueToCallers, set<Function> calleesUpdated) {

// Iterate over the nodes in this function that are downstream from the nodes added above for(it != itEnd; it++) {               DataflowNode n = *it; SgNode* sgn = n.getNode;

...              for(vector<NodeState*>::const_iterator itS = nodeStates.begin; itS!=nodeStates.end; ) {                       state = *itS;

const vector<Lattice*> dfInfoAnte = getLatticeAnte(state); // IN set const vector<Lattice*> dfInfoPost = getLatticePost(state);  // OUT set

// OUT = IN first    // transfer within  the node: from IN to OUT, // Overwrite the Lattices below this node with the lattices above this node. // The transfer function will then operate on these Lattices to produce the // correct state below this node. vector<Lattice*>::const_iterator itA, itP; int j=0; for(itA = dfInfoAnte.begin, itP  = dfInfoPost.begin;                            itA != dfInfoAnte.end && itP != dfInfoPost.end;                             itA++, itP++, j++) {                               if(analysisDebugLevel>=1){  // Dbg::dbg << "   Meet Before: Lattice "<<j<<": \n        "<<(*itA)->str("            ")<<endl; Dbg::dbg << "   Meet After: Lattice "<<j<<": \n        "<<(*itP)->str("            ")<<endl; }                               (*itP)->copy(*itA); /*if(analysisDebugLevel>=1){ Dbg::dbg << "   Copied Meet Below: Lattice "<<j<<": \n        "<<(*itB)->str("            ")<<endl; }*/                       }                        // =================== TRANSFER FUNCTION =================== // (IN - KILL ) + GEN if (isSgFunctionCallExp(sgn)) transferFunctionCall(func, n, state);

boost::shared_ptr<IntraDFTransferVisitor> transferVisitor = getTransferVisitor(func, n, *state, dfInfoPost); sgn->accept(*transferVisitor); modified = transferVisitor->finish || modified;

// =================== TRANSFER FUNCTION =================== ...//   }

}

propagate state to next (meetUpdate)
This is prove to be essential to propagate information along the path. Cannot commenting it out!!

??? not sure about the difference between this step and the step before (Meet Before / Meet After)

meetUpdate is called here also

// Propagates the dataflow info from the current node's NodeState (curNodeState) to the next node's //    NodeState (nextNodeState). // Returns true if the next node's meet state is modified and false otherwise. bool IntraUniDirectionalDataflow::propagateStateToNextNode(                     const vector<Lattice*>& curNodeState, DataflowNode curNode, int curNodeIndex,                      const vector<Lattice*>& nextNodeState, DataflowNode nextNode) {       bool modified = false; vector<Lattice*>::const_iterator itC, itN; if(analysisDebugLevel>=1){ Dbg::dbg << "\n       Propagating to Next Node: "<<nextNode.getNode<<"["<<nextNode.getNode->class_name<<" | "<<Dbg::escape(nextNode.getNode->unparseToString)<<"]"<<endl; int j;               for(j=0, itC = curNodeState.begin; itC != curNodeState.end; itC++, j++) Dbg::dbg << "       Cur node: Lattice "<<j<<": \n            "<<(*itC)->str("            ")<<endl; for(j=0, itN = nextNodeState.begin; itN != nextNodeState.end; itN++, j++) Dbg::dbg << "       Next node: Lattice "<<j<<": \n            "<<(*itN)->str("            ")<<endl; }

// Update forward info above nextNode from the forward info below curNode. // Compute the meet of the dataflow information along the curNode->nextNode edge with the // next node's current state one Lattice at a time and save the result above the next node. for(itC = curNodeState.begin, itN = nextNodeState.begin;           itC != curNodeState.end && itN != nextNodeState.end;             itC++, itN++) {               // Finite Lattices can use the regular meet operator, while infinite Lattices // must also perform widening to ensure convergence. if((*itN)->finiteLattice) modified = (*itN)->meetUpdate(*itC) || modified; else {                       //InfiniteLattice* meetResult = (InfiniteLattice*)itN->second->meet(itC->second); InfiniteLattice* meetResult = dynamic_cast<InfiniteLattice*>((*itN)->copy); Dbg::dbg << "       *itN: " << dynamic_cast<InfiniteLattice*>(*itN)->str("            ") << endl; Dbg::dbg << "       *itC: " << dynamic_cast<InfiniteLattice*>(*itC)->str("            ") << endl; meetResult->meetUpdate(*itC); Dbg::dbg << "       meetResult: " << meetResult->str("            ") << endl; // Widen the resulting meet modified = dynamic_cast<InfiniteLattice*>(*itN)->widenUpdate(meetResult); delete meetResult; }       }        if(analysisDebugLevel>=1) { if(modified) {                       Dbg::dbg << "        Next node's in-data modified. Adding..."<<endl; int j=0; for(itN = nextNodeState.begin; itN != nextNodeState.end; itN++, j++) {                               Dbg::dbg << "        Propagated: Lattice "<<j<<": \n            "<<(*itN)->str("            ")<<endl; }               }                else Dbg::dbg << "       No modification on this node"<<endl; }

return modified; }

stop condition
class IntraUniDirectionalDataflow : public IntraUnitDataflow { public: protected: // propagates the dataflow info from the current node's NodeState (curNodeState) to the next node's NodeState (nextNodeState) // return true if any state is modified. bool propagateStateToNextNode(            const std::vector<Lattice*>& curNodeState, DataflowNode curDFNode, int nodeIndex,             const std::vector<Lattice*>& nextNodeState, DataflowNode nextDFNode);

}

live dead variable
Backward Intra-Procedural Dataflow Analysis: e.g. liveness analysis ( use --> backward --> defined)
 * class IntraBWDataflow : public IntraUniDirectionalDataflow

class LiveDeadVarsAnalysis : public IntraBWDataflow {

protected: funcSideEffectUses* fseu; public: LiveDeadVarsAnalysis(SgProject *project, funcSideEffectUses* fseu=NULL); // Generates the initial lattice state for the given dataflow node, in the given function, with the given NodeState void genInitState(const Function& func, const DataflowNode& n, const NodeState& state,                         std::vector<Lattice*>& initLattices, std::vector<NodeFact*>& initFacts);

boost::shared_ptr<IntraDFTransferVisitor> getTransferVisitor(const Function& func, const DataflowNode& n,                                                              NodeState& state, const std::vector<Lattice*>& dfInfo) { return boost::shared_ptr<IntraDFTransferVisitor>(new LiveDeadVarsTransfer(func, n, state, dfInfo, fseu)); }

bool transfer(const Function& func, const DataflowNode& n, NodeState& state, const std::vector<Lattice*>& dfInfo) { assert(0); return false; }

};

Inter-procedural analysis
Key: transfer function that is applied to call sites to perform the appropriate state transfers across function boundaries.

transfer function
void IntraFWDataflow::transferFunctionCall(const Function &func, const DataflowNode &n, NodeState *state) { vector<Lattice*> dfInfoBelow = state->getLatticeBelow(this);

vector<Lattice*>* retState = NULL; dynamic_cast<InterProceduralDataflow*>(interAnalysis)-> transfer(func, n, *state, dfInfoBelow, &retState, true);

if(retState && !(retState->size==0 || (retState->size == dfInfoBelow.size))) { Dbg::dbg << "#retState="<<retState->size<<endl; for(vector<Lattice*>::iterator ml=retState->begin; ml!=retState->end; ml++) Dbg::dbg << "       "<<(*ml)->str("            ")<<endl; Dbg::dbg << "#dfInfoBelow="<<dfInfoBelow.size<<endl; for(vector<Lattice*>::const_iterator l=dfInfoBelow.begin; l!=dfInfoBelow.end; l++) Dbg::dbg << "       "<<(*l)->str("            ")<<endl; }

// Incorporate information about the function's return value into the caller's dataflow state // as the information of the SgFunctionCallExp ROSE_ASSERT(retState==NULL || retState->size==0 || (retState->size == dfInfoBelow.size)); if(retState) { vector<Lattice*>::iterator lRet; vector<Lattice*>::const_iterator lDF; for(lRet=retState->begin, lDF=dfInfoBelow.begin;        lRet!=retState->end; lRet++, lDF++) { Dbg::dbg << "   lDF Before="<<(*lDF)->str("        ")<<endl; Dbg::dbg << "   lRet Before="<<(*lRet)->str("        ")<<endl; (*lDF)->unProject(isSgFunctionCallExp(n.getNode), *lRet); Dbg::dbg << "   lDF After="<<(*lDF)->str("        ")<<endl; } } }

InterProceduralDataflow
InterProceduralDataflow::InterProceduralDataflow(IntraProceduralDataflow* intraDataflowAnalysis) : InterProceduralAnalysis((IntraProceduralAnalysis*)intraDataflowAnalysis)

// !!! NOTE: cfgForEnd AND cfgForBeginning PRODUCE THE SAME SgFunctionDefinition SgNode BUT THE DIFFERENT INDEXES // !!!      (0 FOR BEGINNING AND 3 FOR END). AS SUCH, IT DOESN'T MATTER WHICH ONE WE CHOOSE. HOWEVER, IT DOES MATTER // !!!      WHETHER WE CALL genInitState TO GENERATE THE STATE BELOW THE NODE (START OF THE FUNCTION) OR ABOVE IT                         // !!! (END OF THE FUNCTION). THE CAPABILITY TO DIFFERENTIATE THE TWO CASES NEEDS TO BE ADDED TO genInitState // !!!      AND WHEN IT IS, WE'LL NEED TO CALL IT INDEPENDENTLY FOR cfgForEnd AND cfgForBeginning AND ALSO TO MAKE // !!!      TO SET THE LATTICES ABOVE THE ANALYSIS

TODO: begin and end func definition issue is mentioned inside of this

simplest form:unstructured
Simplest form: No transfer action at call sites at all class UnstructuredPassInterDataflow : virtual public InterProceduralDataflow {       public: UnstructuredPassInterDataflow(IntraProceduralDataflow* intraDataflowAnalysis) : InterProceduralAnalysis((IntraProceduralAnalysis*)intraDataflowAnalysis), InterProceduralDataflow(intraDataflowAnalysis) {}       // the transfer function that is applied to SgFunctionCallExp nodes to perform the appropriate state transfers // fw - =true if this is a forward analysis and =false if this is a backward analysis // n - the dataflow node that is being processed // state - the NodeState object that describes the dataflow state immediately before (if fw=true) or immediately after //        (if fw=false) the SgFunctionCallExp node, as established by earlier analysis passes // dfInfo - the Lattices that this transfer function operates on. The function propagates them //         to the calling function and overwrites them with the dataflow result of calling this function. // retState - Pointer reference to a Lattice* vector that will be assigned to point to the lattices of       //          the function call's return value. The callee may not modify these lattices. // Returns true if any of the input lattices changed as a result of the transfer function and //   false otherwise. bool transfer(const Function& func, const DataflowNode& n, NodeState& state,                      const std::vector<Lattice*>& dfInfo, std::vector<Lattice*>** retState, bool fw) {                return false; }       void runAnalysis; };

// simply call intra-procedural analysis on each function one by one. void UnstructuredPassInterDataflow::runAnalysis {       set<FunctionState*> allFuncs = FunctionState::getAllDefinedFuncs; // iterate over all functions with bodies for(set<FunctionState*>::iterator it=allFuncs.begin; it!=allFuncs.end; it++) {               const Function& func = (*it)->func; FunctionState* fState = FunctionState::getDefinedFuncState(func); // Call the current intra-procedural dataflow as if it were a generic analysi intraAnalysis->runAnalysis(func, &(fState->state)); } }

ContextInsensitiveInterProceduralDataflow
TODO

Call directly
Direct call: Runs the intra-procedural analysis on the given function and returns true if the function's NodeState gets modified as a result and false otherwise state - the function's NodeState bool IntraProceduralDataflow::runAnalysis(const Function& func, NodeState* state) {  // Each function is analyzed as if it were called directly by the language's runtime, ignoring // the application's actual call graph bool analyzeDueToCallers = true; // We ignore the application's call graph, so it doesn't matter whether this function calls other functions std::set<Function> calleesUpdated; return runAnalysis(func, state, analyzeDueToCallers, calleesUpdated); }
 * bool IntraUniDirectionalDataflow::runAnalysis(const Function& func, NodeState* state, bool analyzeDueToCallers, std::set<Function> calleesUpdated);
 * direct call with a simpler parameter list : not feasible, all intra procedural analysis has to have an inter procedural analysis set interally!

Through inter-procedural analysis
Invoke a simple intra-procedural analysis through the unstructured pass inter-procedural data flow class int main { SgProject* project = frontend(argc,argv); initAnalysis(project);

// prepare debugging support Dbg::init("Live dead variable analysis Test", ".", "index.html"); liveDeadAnalysisDebugLevel = 1; analysisDebugLevel = 1;

// basis analysis LiveDeadVarsAnalysis ldva(project); // wrap it inside the unstructured inter-procedural data flow UnstructuredPassInterDataflow ciipd_ldva(&ldva); ciipd_ldva.runAnalysis; .....

}

Retrieve lattices
Sample code: // Initialize vars to hold all the variables and expressions that are live at DataflowNode n //void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const DataflowNode& n, const NodeState& state, set<varID>& vars, string indent) void getAllLiveVarsAt(LiveDeadVarsAnalysis* ldva, const NodeState& state, set<varID>& vars, string indent) {       LiveVarsLattice* liveLAbove = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeAbove(ldva).begin)); LiveVarsLattice* liveLBelow = dynamic_cast<LiveVarsLattice*>(*(state.getLatticeBelow(ldva).begin));

// The set of live vars AT this node is the union of vars that are live above it and below it       for(set<varID>::iterator var=liveLAbove->liveVars.begin; var!=liveLAbove->liveVars.end; var++) vars.insert(*var); for(set<varID>::iterator var=liveLBelow->liveVars.begin; var!=liveLBelow->liveVars.end; var++) vars.insert(*var); }

Testing
It is essential to have a way to test the analysis results are correct.

We currently use a primitive way to test the correctness of analysis: comparing pragma and lattice string output

Two examples translators testing analysis correctness(comparing pragma and lattice string output):
 * https://github.com/rose-compiler/rose/blob/master/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests/liveDeadVarAnalysisTest.C
 * https://github.com/rose-compiler/rose/blob/master/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests/constantPropagationTest.C

An example test input file for liveness analysis's correctness int bar(int flag) {
 * https://github.com/rose-compiler/rose/blob/master/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests/test5.C

int a =1,b,c; if (flag == 0) // flag is only read here, not written! c = a;  else c = b;  return c; }
 * 1) pragma rose [LiveVarsLattice: liveVars=[flag, a, b]]

Trace the analysis
Turn it on    liveDeadAnalysisDebugLevel = 1; analysisDebugLevel = 1;

// find code with if(analysisDebugLevel>=1) ...

check the web page dump using a browser firefox index.html

How to read the trace file: start from the beginning: information is ordered based on the CFG nodes visited. The order could be forward or backward order. Check if the order is correct first, then for each node visited ==================================   Copying incoming Lattice 0: [LiveVarsLattice: liveVars=[b]] To outgoing Lattice 0: [LiveVarsLattice: liveVars=[]] ==================================   Transferring the outgoing  Lattice ...     liveLat=[LiveVarsLattice: liveVars=[b]] Dead Expression usedVars=<> assignedVars=<> assignedExprs=<> #usedVars=0 #assignedExprs=0 Transferred: outgoing Lattice 0: [LiveVarsLattice: liveVars=[b]] transferred, modified=0 ==================================  Propagating/Merging the outgoing  Lattice to all descendant nodes ...     Descendants (1): Descendant: 0x2b9e8c47f010[SgIfStmt | if(flag == 0) c = a;else c = b;]

Propagating to Next Node: 0x2b9e8c47f010[SgIfStmt | if(flag == 0) c = a;else c = b;] Cur node: Lattice 0: [LiveVarsLattice: liveVars=[b]] Next node: Lattice 0: [LiveVarsLattice: liveVars=[a]] Next node's in-data modified. Adding... Propagated: Lattice 0: [LiveVarsLattice: liveVars=[a, b]] propagated/merged, modified=1 ^^^^^^^^^^^^^^^^^^

A real example: if (flag) c = a; else c = b;  // liveness analysis, a, b are live in two branches, they are propagated backward to if-stmt

--   Descendants (1):  // from c =a back to if-stmt (next node) Descendant: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;]

Propagating to Next Node: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;] Cur node: Lattice 0: [LiveVarsLattice: liveVars=[a]]  // current node's lattice Next node: Lattice 0: [LiveVarsLattice: liveVars=[]]  // next node's lattice before propagation Next node's in-data modified. Adding... Propagated: Lattice 0: [LiveVarsLattice: liveVars=[a]] // propagate a into if-stmt's lattice propagated, modified=1 ^^^^^^^^^^^^^^^^^^

--   Descendants (1):  // from c = b --> if-stmt Descendant: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;]

Propagating to Next Node: 0x2ac8bb95c010[SgIfStmt | if(flag == 0) c = a;else c = b;] Cur node: Lattice 0: [LiveVarsLattice: liveVars=[b]] Next node: Lattice 0: [LiveVarsLattice: liveVars=[a]] Next node's in-data modified. Adding... Propagated: Lattice 0: [LiveVarsLattice: liveVars=[a, b]] // now both a and b are propagated/ merged propagated, modified=1 ^^^^^^^^^^^^^^^^^^

Dump cfg dot graph with lattices
A class analysisStatesToDot is provided generate a CFG dot graph with lattices information. //AnalysisDebuggingUtils.C

class analysisStatesToDOT : public UnstructuredPassIntraAnalysis {   private: //   LiveDeadVarsAnalysis* lda; // reference to the source analysis Analysis* lda; // reference to the source analysis void printEdge(const DataflowEdge& e); // print data flow edge void printNode(const DataflowNode& n, std::string state_string); // print date flow node void visit(const Function& func, const DataflowNode& n, NodeState& state); // visitor function public: std::ostream* ostr; analysisStatesToDOT (Analysis* l): lda(l){ }; };

namespace Dbg { //.... void dotGraphGenerator (::Analysis *a) {   ::analysisStatesToDOT eas(a); IntraAnalysisResultsToDotFiles upia_eas(eas); upia_eas.runAnalysis; }

} // namespace Dbg

Example use
// Liao, 12/6/2011
 * 1) include "rose.h"


 * 1) include
 * 2) include
 * 3) include
 * 4) include
 * 5) include
 * 6) include

using namespace std;

// TODO group them into one header
 * 1) include "genericDataflowCommon.h"
 * 2) include "VirtualCFGIterator.h"
 * 3) include "cfgUtils.h"
 * 4) include "CallGraphTraverse.h"
 * 5) include "analysisCommon.h"
 * 6) include "analysis.h"
 * 7) include "dataflow.h"
 * 8) include "latticeFull.h"
 * 9) include "printAnalysisStates.h"
 * 10) include "liveDeadVarAnalysis.h"

int numFails = 0, numPass = 0;

//--- int main( int argc, char * argv[] ) {

SgProject* project = frontend(argc,argv);

initAnalysis(project);

// generating index.html for tracing the analysis Dbg::init("Live dead variable analysis Test", ".", "index.html"); liveDeadAnalysisDebugLevel = 1; analysisDebugLevel = 1;

LiveDeadVarsAnalysis ldva(project); UnstructuredPassInterDataflow ciipd_ldva(&ldva); ciipd_ldva.runAnalysis; // Output the dot graph ********************* Dbg::dotGraphGenerator (&ldva); return 0; }

TODO

 * Hard to use the generated lattices since many temporary expression objects are generated in lattices. But often users do not care about them (constant propagation, pointer analysis)
 * to see the problem: go to [build64/tests/roseTests/programAnalysisTests/generalDataFlowAnalysisTests]
 * run make check
 * see the dot graph dump of an analysis : run.sh test_ptr4.C_main_0x2b41e651c038_cfg.dot