ROSE Compiler Framework/Abstract Syntax Tree

The main intermediate representation of ROSE is its abstract syntax tree (AST). To use a programming language, you have to get familiar with the language syntax, semantics, etc. To use ROSE, you have to get familiar with its internal representation of an input code.

The best way to get to know the AST is to visualize it using the simplest possible code samples.

Sanity Check
We provide a set of sanity checks for the AST and use them to make sure the AST is consistent. ROSE developers are strongly encouraged to run a sanity check after each AST transformation. This is a higher standard than merely unparsing the AST to compilable code: it is common for an AST to unparse correctly and still fail the sanity check.

The recommended sanity check is AstTests::runAllTests(project), from src/midend/astDiagnostics. Internally, it calls the following checks:
 * TestAstForProperlyMangledNames
 * TestAstCompilerGeneratedNodes
 * AstTextAttributesHandling
 * AstCycleTest
 * TestAstTemplateProperties
 * TestAstForProperlySetDefiningAndNondefiningDeclarations
 * TestAstSymbolTables
 * TestAstAccessToDeclarations
 * TestExpressionTypes
 * TestMangledNames::test
 * TestParentPointersInMemoryPool::test
 * TestChildPointersInMemoryPool::test
 * TestMappingOfDeclarationsInMemoryPoolToSymbols::test
 * TestLValueExpressions
 * TestMultiFileConsistancy::test //2009
 * TestAstAccessToDeclarations::test(*i); // named type test

A few other checking functions exist elsewhere in the code base. They should eventually be merged into AstTests::runAllTests(project):
 * FixSgProject(*project); //in Qing's AST interface
 * Utility::sanityCheck(SgProject* )
 * Utility::consistencyCheck(SgProject*) // SgFile*
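To illustrate why an AST can unparse correctly and still fail a sanity check, here is a minimal sketch of a parent-pointer consistency check on a toy tree. ToyNode, appendChild, and countBadParentPointers are hypothetical names invented for this illustration; they are not ROSE classes or functions, and the real checks in AstTests are considerably more thorough.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an AST node (an assumption for illustration, not SgNode).
struct ToyNode {
    ToyNode* parent = nullptr;
    std::vector<ToyNode*> children;  // entries may be null, as in a real AST
};

// Attach a child and keep both directions of the link in sync.
void appendChild(ToyNode& parent, ToyNode& child) {
    assert(child.parent == nullptr && "child is already attached");
    parent.children.push_back(&child);
    child.parent = &parent;
}

// Count child links whose target does not point back to its parent.
// Assumes the child links form a tree (no cycles).
std::size_t countBadParentPointers(const ToyNode& root) {
    std::size_t bad = 0;
    std::vector<const ToyNode*> work{&root};
    while (!work.empty()) {
        const ToyNode* node = work.back();
        work.pop_back();
        for (const ToyNode* child : node->children) {
            if (child == nullptr) continue;      // null children are legal
            if (child->parent != node) ++bad;    // stale or missing parent
            work.push_back(child);
        }
    }
    return bad;
}
```

A transformation that moves a subtree but forgets to update the parent pointer leaves the child links intact, so a top-down unparser would still produce code, while a check like this one reports the inconsistency.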

Text Output of an AST
Call SgNode::unparseToString(). You can call it on any SgLocatedNode within the AST to dump the text form of that part of the AST.

Print AST as a horizontal tree
SageInterface functions:

//! Pretty print AST horizontally, output to std output
void SageInterface::printAST (SgNode* node);

//! Pretty print AST horizontally, output to a specified text file.
void SageInterface::printAST2TextFile (SgNode* node, const char* filename);

A translator (textASTGenerator) is also available, with its source code under exampleTranslators/defaultTranslator.

Example use inside of gdb:
 * to print a portion of AST to the screen
 * to print a portion of AST into a text file

(gdb) up
#7  0x00007ffff418ab5d in Unparse_ExprStmt::unparseExprStmt (this=0x1a1bf950, stmt=0x7fffda63ce30, info=...) at ../../../sourcetree/src/backend/unparser/CxxCodeGeneration/unparseCxx_statements.C:9889

(gdb) p SageInterface::printAST(stmt)
└──@0x7fffda63ce30 SgExprStatement transformation 0:0
    └──@0x7fffd8488790 SgFunctionCallExp transformation 0:0
        ├──@0x7fffe6211910 SgMemberFunctionRefExp transformation 0:0
        └──@0x7fffd7f2c370 SgExprListExp transformation 0:0
            └──@0x7fffd8488720 SgFunctionCallExp transformation 0:0
                ├──@0x7fffe6211988 SgMemberFunctionRefExp transformation 0:0
                └──@0x7fffd7f2c3d8 SgExprListExp transformation 0:0
$2 = void

(gdb) up 10
#48 0x00007ffff40dce69 in Unparser::unparseFile (this=0x7fffffff8c60, file=0x7fffeb786010, info=..., unparseScope=0x0) at ../../../sourcetree/src/backend/unparser/unparser.C:945

(gdb) p SageInterface::printAST2TextFile(file,"test.txt")

Example command line use:

textASTGenerator -c test_qualifiedName.cpp

cat test_qualifiedName.cpp.AST.txt

└──@0x7fe9f1916010 SgProject
    └──@0xb45730 SgFileList
        └──@0x7fe9f17be010 SgSourceFile
            ├──@0x7fe9fdf19120 SgGlobal test_qualifiedName.cpp 0:0
            │  ├──@0x7fe9f159a010 SgTypedefDeclaration rose_edg_required_macros_and_functions.h 0:0
            │  │   └── NULL
            │  ├──@0x7fe9f159a390 SgTypedefDeclaration rose_edg_required_macros_and_functions.h 0:0
            │  │   └── NULL
            │  ├──@0x7fe9f0f59010 SgFunctionDeclaration rose_edg_required_macros_and_functions.h 0:0 "::feclearexcept"
            │  │   ├──@0x7fe9f1391010 SgFunctionParameterList rose_edg_required_macros_and_functions.h 0:0
            │  │   │   └──@0x7fe9f1258010 SgInitializedName rose_edg_required_macros_and_functions.h 0:0 "::__excepts"
            │  │   │       └── NULL
            │  │   ├── NULL
            │  │   └── NULL
            │  ├──@0x7fe9f0f59540 SgFunctionDeclaration rose_edg_required_macros_and_functions.h 0:0 "::fegetexceptflag"
            │  │   ├──@0x7fe9f1391630 SgFunctionParameterList rose_edg_required_macros_and_functions.h 0:0
            │  │   │   ├──@0x7fe9f1258420 SgInitializedName rose_edg_required_macros_and_functions.h 0:0 "::__flagp"
            │  │   │   │   └── NULL
            │  │   │   └──@0x7fe9f1258628 SgInitializedName rose_edg_required_macros_and_functions.h 0:0 "::__excepts"
            │  │   │       └── NULL
            │  │   ├── NULL
            │  │   └── NULL

...

            │  └──@0x7fe9eff218c0 SgFunctionDeclaration test_qualifiedName.cpp 14:1 "::foo"
            │      ├──@0x7fe9ef5e0320 SgFunctionParameterList test_qualifiedName.cpp 14:1
            │      │   ├──@0x7fe9ef495278 SgInitializedName test_qualifiedName.cpp 14:13 "x"
            │      │   │   └── NULL
            │      │   └──@0x7fe9ef495480 SgInitializedName test_qualifiedName.cpp 14:20 "y"
            │      │       └── NULL
            │      ├── NULL
            │      └──@0x7fe9ee8f3010 SgFunctionDefinition test_qualifiedName.cpp 15:1
            │          └──@0x7fe9ee988010 SgBasicBlock test_qualifiedName.cpp 15:1
            │              ├──@0x7fe9eee1ba90 SgVariableDeclaration test_qualifiedName.cpp 16:3
            │              │   ├── NULL
            │              │   └──@0x7fe9ef495688 SgInitializedName test_qualifiedName.cpp 16:3 "z"
            │              │       └── NULL
            │              ├──@0x7fe9ee7ad010 SgExprStatement test_qualifiedName.cpp 17:3
            │              │   └──@0x7fe9ee7dc010 SgAssignOp test_qualifiedName.cpp 17:5
            │              │       ├──@0x7fe9ee8c0010 SgVarRefExp test_qualifiedName.cpp 17:3
            │              │       └──@0x7fe9ee813010 SgAddOp test_qualifiedName.cpp 17:9
            │              │           ├──@0x7fe9ee8c0078 SgVarRefExp test_qualifiedName.cpp 17:7
            │              │           └──@0x7fe9ee84a010 SgMultiplyOp test_qualifiedName.cpp 17:12
            │              │               ├──@0x7fe9ee8c00e0 SgVarRefExp test_qualifiedName.cpp 17:11
            │              │               └──@0x7fe9ee881010 SgIntVal test_qualifiedName.cpp 17:13
            │              └──@0x7fe9ee77e010 SgReturnStmt test_qualifiedName.cpp 18:3
            │                  └──@0x7fe9ee8c0148 SgVarRefExp test_qualifiedName.cpp 18:10
            ├── NULL
            ├── NULL
            └── NULL
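The layout of these listings can be reproduced with a short recursive printer. The following is a sketch on a toy tree, assuming a hypothetical ToyNode type and a fixed four-column indent per level; it is not the code behind SageInterface::printAST, and it omits the addresses and file positions that the real output includes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an AST node (hypothetical, not a ROSE type).
struct ToyNode {
    std::string label;
    std::vector<const ToyNode*> children;  // a null entry is rendered as NULL
};

// Append one line for `node`, then recurse into its children.
// `prefix` carries the vertical bars of all ancestors that still have
// siblings to print; `isLast` selects the corner or the tee connector.
void renderInto(const ToyNode* node, const std::string& prefix,
                bool isLast, std::string& out) {
    out += prefix + (isLast ? "└── " : "├── ");
    out += node ? node->label : "NULL";
    out += "\n";
    if (node == nullptr) return;
    const std::string childPrefix = prefix + (isLast ? "    " : "│   ");
    for (std::size_t i = 0; i < node->children.size(); ++i) {
        const bool lastChild = (i + 1 == node->children.size());
        renderInto(node->children[i], childPrefix, lastChild, out);
    }
}

// Render a whole tree as text, one node per line.
std::string renderTree(const ToyNode* root) {
    assert(root != nullptr && "a tree needs a root");
    std::string out;
    renderInto(root, "", true, out);
    return out;
}
```

Reading the real output follows the same rule: a node's children are the lines directly below it that are indented one level deeper, and a vertical bar means that the ancestor in that column still has siblings further down.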

AST Iterator
The iterator class: the iterator follows the STL iterator pattern, is implemented as a pre-order traversal, and maintains its own stack. It performs exactly the same traversal as the traversal classes in ROSE (it uses the same underlying information):

#include "RoseAst.h"

SgNode* node = .... // any subtree

RoseAst ast(node);

for (RoseAst::iterator i = ast.begin(); i != ast.end(); ++i) {
  cout << "We are here:" << (*i)->class_name() << endl;
}

Some more features:
 * By default the iterator does not traverse null pointers, so you will not see them. To visit the null pointers as well, obtain the iterator with ast.begin().withNullValues().
 * You can also exclude a subtree during the traversal by calling skipChildrenOnForward() on the iterator:
 * i.skipChildrenOnForward(); ++i; // skips the children of the current node and goes to the next node that follows in the traversal after all those children
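The behaviour described above (pre-order traversal with an explicit stack, optional null values, and skipping a subtree) can be sketched on a toy tree. ToyNode, PreorderWalker, and visitOrder are hypothetical names for this illustration. The walker is a simplified cursor rather than a full STL-style iterator, and it does not reflect how RoseAst is implemented internally.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an AST node (hypothetical, not SgNode).
struct ToyNode {
    std::string name;
    std::vector<ToyNode*> children;  // may contain null entries
};

// Pre-order cursor that keeps its own stack of pending nodes.
class PreorderWalker {
public:
    explicit PreorderWalker(ToyNode* root, bool withNullValues = false)
        : withNulls_(withNullValues) {
        assert(root != nullptr && "traversal needs a root");
        stack_.push_back(root);
    }

    bool done() const { return stack_.empty(); }

    ToyNode* current() const {
        assert(!done());
        return stack_.back();
    }

    // The next advance() will not descend into the current node.
    void skipChildrenOnForward() { skipChildren_ = true; }

    void advance() {
        assert(!done());
        ToyNode* node = stack_.back();
        stack_.pop_back();
        if (node != nullptr && !skipChildren_) {
            // Push right to left so the leftmost child is visited first.
            for (auto it = node->children.rbegin(); it != node->children.rend(); ++it) {
                if (*it != nullptr || withNulls_) stack_.push_back(*it);
            }
        }
        skipChildren_ = false;
    }

private:
    std::vector<ToyNode*> stack_;
    bool withNulls_;
    bool skipChildren_ = false;
};

// Collect node names in visiting order; the subtree below any node
// named `skipBelow` is pruned. Null entries are reported as "null".
std::vector<std::string> visitOrder(ToyNode* root, bool withNullValues,
                                    const std::string& skipBelow) {
    std::vector<std::string> order;
    PreorderWalker walker(root, withNullValues);
    while (!walker.done()) {
        ToyNode* node = walker.current();
        order.push_back(node ? node->name : "null");
        if (node && node->name == skipBelow) walker.skipChildrenOnForward();
        walker.advance();
    }
    return order;
}
```

Skipping is requested before the step forward, which mirrors the call order in the bullet above: first mark the current node, then advance.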

Relevant source files
 * https://github.com/rose-compiler/rose-develop/blob/master/src/midend/astMatching/RoseAst.h
 * https://github.com/rose-compiler/rose-develop/blob/master/src/midend/astMatching/RoseAst.C

SgType
Several IR node types wrap another type:
 * typedefs: SgTypedefType
 * references: SgReferenceType
 * pointers: SgPointerType
 * arrays: SgArrayType
 * modifiers: SgModifierType

Some useful member functions return the hidden type beneath layers of typedefs, pointers, references, modifiers, array representation, etc.:
 * get_base_type(): a member function of some IR nodes derived from SgType. It returns the immediate (non-recursively stripped) type under the typedef, reference, pointer, array, modifier, etc.
 * findBaseType(): recursively strips away all of these layers
 * SgType * stripType (unsigned char bit_array=STRIP_MODIFIER_TYPE|STRIP_REFERENCE_TYPE|STRIP_POINTER_TYPE|STRIP_ARRAY_TYPE|STRIP_TYPEDEF_TYPE) const
 * SgType * stripTypedefsAndModifiers () const
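The difference between peeling one layer and stripping layers selected by a bit mask can be sketched with a toy type chain. Everything below is hypothetical: ToyType, immediateBase, stripLayers, and the TOY_STRIP_* constants only imitate the shape of the signatures listed above, their numeric values are arbitrary, and stopping at the first layer that is not in the mask is an assumption of this sketch rather than a documented guarantee of ROSE.

```cpp
#include <cassert>

enum class ToyKind { Base, Typedef, Reference, Pointer, Array, Modifier };

// Toy flags; the values are arbitrary and not taken from ROSE.
constexpr unsigned char TOY_STRIP_MODIFIER  = 1 << 0;
constexpr unsigned char TOY_STRIP_REFERENCE = 1 << 1;
constexpr unsigned char TOY_STRIP_POINTER   = 1 << 2;
constexpr unsigned char TOY_STRIP_ARRAY     = 1 << 3;
constexpr unsigned char TOY_STRIP_TYPEDEF   = 1 << 4;
constexpr unsigned char TOY_STRIP_ALL =
    TOY_STRIP_MODIFIER | TOY_STRIP_REFERENCE | TOY_STRIP_POINTER |
    TOY_STRIP_ARRAY | TOY_STRIP_TYPEDEF;

// A type is either a base type or one wrapper layer around another type.
struct ToyType {
    ToyKind kind;
    const ToyType* base;  // wrapped type; null only for ToyKind::Base
};

// Map a wrapper kind to the flag that allows stripping it.
unsigned char stripFlagFor(ToyKind kind) {
    switch (kind) {
        case ToyKind::Modifier:  return TOY_STRIP_MODIFIER;
        case ToyKind::Reference: return TOY_STRIP_REFERENCE;
        case ToyKind::Pointer:   return TOY_STRIP_POINTER;
        case ToyKind::Array:     return TOY_STRIP_ARRAY;
        case ToyKind::Typedef:   return TOY_STRIP_TYPEDEF;
        case ToyKind::Base:      return 0;
    }
    return 0;
}

// Non-recursive: peel exactly one layer (a base type is returned as is).
const ToyType* immediateBase(const ToyType* type) {
    assert(type != nullptr);
    return type->kind == ToyKind::Base ? type : type->base;
}

// Repeatedly peel layers while the outermost layer is selected by `mask`.
const ToyType* stripLayers(const ToyType* type,
                           unsigned char mask = TOY_STRIP_ALL) {
    assert(type != nullptr);
    while ((stripFlagFor(type->kind) & mask) != 0) {
        assert(type->base != nullptr && "wrapper layers must wrap a type");
        type = type->base;
    }
    return type;
}
```

With a chain modelling a reference to a pointer to a typedef of const int, immediateBase yields the pointer layer, stripLayers with the default mask yields int, and a mask of only typedefs and modifiers leaves the reference untouched because its outermost layer is not selected.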

File location information
All AST nodes with file location information derive from SgLocatedNode, which holds a start and an end Sg_File_Info object to indicate where the construct begins and ends.

You can obtain and print out the pair of locations by calling:

locatedNode->get_startOfConstruct()->display();

locatedNode->get_endOfConstruct()->display();

// get beginning info only
locatedNode->get_file_info()->display();

The output of display() may look like:

Inside of Sg_File_Info::display(debug.......)
isTransformation                     = false
isCompilerGenerated                  = true (no position information)
isOutputInCodeGeneration             = false
isShared                             = false
isFrontendSpecific                   = true (part of ROSE support for gnu compatability)
isSourcePositionUnavailableInFrontend = false
isCommentOrDirective                 = false
isToken                              = false
file_id = 2
filename = /home/liao6/daily-test-rose/upcwork/install/include/gcc_HEADERS/rose_edg_required_macros_and_functions.h
line     = 167  column   = 1

....
// transformation generated, will be outputted by the unparser
upcr_pshared_ptr_t gsj;
Inside of Sg_File_Info::display(debug.......)
isTransformation                     = true (part of a transformation)
isCompilerGenerated                  = false
isOutputInCodeGeneration             = true (output in code generator)
isShared                             = false
isFrontendSpecific                   = false
isSourcePositionUnavailableInFrontend = false
isCommentOrDirective                 = false
isToken                              = false
file_id = -3
filename = transformation
line    = 0  column   = 0

As the output shows, AST nodes can be generated either by ROSE's frontends or by a translator. A transformation-generated located node may not have line or column numbers.

You can get the file name, line, and column numbers:

SgLocatedNode* node = .... ;

Sg_File_Info* info_start = node->get_startOfConstruct();
size_t a_start = (size_t)info_start->get_line();

string filename = node->get_file_info()->get_filename();

Sg_File_Info* info_end = node->get_endOfConstruct();
size_t a_end = (info_end == NULL) ? a_start : info_end->get_line();
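The fallback in the last line (use the start line when no end information exists) is easy to get wrong, so here is a self-contained sketch of the same logic on toy types. ToyFileInfo, ToyLocatedNode, LineSpan, lineSpan, and hasSourcePosition are hypothetical names, not ROSE classes. Treating line 0 as "no source position" is an assumption based only on the transformation sample output shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stand-ins for Sg_File_Info and SgLocatedNode (hypothetical types).
struct ToyFileInfo {
    std::string filename;
    int line;
    int column;
};

struct ToyLocatedNode {
    const ToyFileInfo* start;  // expected to be set
    const ToyFileInfo* end;    // may be null
};

struct LineSpan {
    std::size_t first;
    std::size_t last;
};

// Compute the line range of a node; a missing end falls back to the start.
LineSpan lineSpan(const ToyLocatedNode& node) {
    assert(node.start != nullptr && "a located node needs start information");
    const std::size_t first = static_cast<std::size_t>(node.start->line);
    const std::size_t last =
        node.end ? static_cast<std::size_t>(node.end->line) : first;
    return LineSpan{first, last};
}

// Assumption for this sketch: a line number of 0 means the node carries
// no real source position, as in the transformation sample above.
bool hasSourcePosition(const ToyFileInfo& info) {
    return info.line > 0;
}
```

Checking for a usable position before reporting line numbers avoids printing misleading locations for nodes that a translator inserted.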

Preprocessing Information
See more at ROSE Compiler Framework/PreprocessingInfo

In addition to nodes and edges, a ROSE AST may carry attributes that hold preprocessing information such as #include or #if .. #else. They are attached before, after, or within a nearby AST node (only a node with source location information).

An example translator will traverse the input code's AST and dump information which may include preprocessing information.
 * https://github.com/rose-compiler/rose/blob/master/exampleTranslators/defaultTranslator/preprocessingInfoDumper.C

For example

exampleTranslators/defaultTranslator/preprocessingInfoDumper -c main.cxx
---
Found an IR node with preprocessing Info attached:
(memory address: 0x2b7e1852c7d0 Sage type: SgFunctionDeclaration) in file
/export/tmp.liao6/workspace/userSupport/main.cxx (line 3 column 1)
-PreprocessingInfo #0 --- :
classification = CpreprocessorIncludeDeclaration:
String format = #include "all_headers.h"

relative position is = before

Source: http://www.rosecompiler.org/ROSE_Tutorial/ROSE-Tutorial.pdf (Chapter 29 - Handling Comments, Preprocessor Directives, And Adding Arbitrary Text to Generated Code)

AST matching
See more at ROSE Compiler Framework/AST Matching

AST Construction
The SageBuilder and SageInterface namespaces provide functions to create and manipulate ASTs. Doxygen documentation:
 * http://rosecompiler.org/ROSE_HTML_Reference/namespaceSageBuilder.html
 * http://rosecompiler.org/ROSE_HTML_Reference/namespaceSageInterface.html