Rule Mining Best Practices

Below is a general approach to the semantic capture of technical rules and their abstraction into business rules.

Preparatory Steps
A series of preparatory steps helps avoid protracted rule mining exercises, ensures that rules are valuable to the business, and speeds the process of creating ‘business’ rather than merely ‘technical’ rules. These steps are recommended to build the right organizational framework and to align rule mining with business priorities.

1. Project Goals and Desired Output

It is important to define the goals and outputs of the rule mining activity. These will vary with the business priorities of each organization. For some, the goal may be to address inflexibility or high downtime in a highly valuable, customized application, which may lead to a decision to move to an SOA-enabled packaged application.

Depending on the goal, a rule mining activity can yield one of two primary outcomes:

Documentation: In some cases, the goal is to develop business-centric documentation of program code segments. This documentation, valuable as a reference for other analysts and subject matter experts, may also be used to annotate high-level design models for the modernized, to-be applications. It would not normally result in executable business rules or in inputs to development environments.

Business Rules: In other cases, the exact semantic behavior of edits and calculations in applications must be captured as true business rules. The captured semantics will serve as a basis for, and may even be fed directly into, target development environments. The outcome of this step drives decisions on technology adoption, resource planning, and the best methods to apply for either type of rule mining.

2. Rule Mining Technology Selection

Largely Manual Approach: A low-cost, low-value use of technology for rule mining is to document rules in word processing or spreadsheet applications. This approach does not help with the most resource-intensive parts of rule mining, such as analyzing application behavior and locating rules within source code segments.

Runtime Simulation: Runtime simulation technologies that capture user behavior as processes and rules can help from a behavioral perspective. Their main drawback is that they are limited to the test scenarios actually performed within a given time frame, so they may miss critical exception paths that are rarely exercised.

Scanning Tools: Text scanning tools that locate patterns within sources can accelerate the capture of rules based on fixed patterns in source code (e.g., showing all moves to a variable). These tools typically generate a large number of false positives and tend to yield technical rather than business rules.
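To make the false-positive problem concrete, here is a minimal sketch of a purely textual scan, assuming Cobol source and a simple `MOVE ... TO <field>` pattern. The function name `scan_moves_to` and the sample lines are hypothetical. Because the scan sees only text, it also reports the commented-out statement on line 2, which a parser-based tool would skip.

```python
import re

# Hypothetical pattern for a Cobol "MOVE <source> TO <target>" statement.
# A textual scan cannot tell code from comments, so comment lines match too.
MOVE_PATTERN = re.compile(r"\bMOVE\s+(\S+)\s+TO\s+([\w-]+)", re.IGNORECASE)


def scan_moves_to(source: str, target: str) -> list[tuple[int, str]]:
    """Return (line_number, line_text) for every textual MOVE ... TO target."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in MOVE_PATTERN.finditer(line):
            if match.group(2).upper() == target.upper():
                hits.append((number, line.strip()))
    return hits


SAMPLE = """\
       MOVE WS-RATE TO ORD-DISCOUNT.
      * OLD LOGIC: MOVE ZERO TO ORD-DISCOUNT
       MOVE ORD-QTY TO WS-TEMP.
"""

for hit in scan_moves_to(SAMPLE, "ORD-DISCOUNT"):
    print(hit)
```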

Repository-Based Rule Mining Tools: Repository-based technologies that recompile the sources and build a syntactic parse tree can offer the highest value in rule mining. The most advanced ones detect rules automatically using semantic analysis; for example, each statement upstream of a variable of interest that can potentially affect its value is mined as a candidate rule.

Rule mining tools may also offer rule management and auditing capabilities that support project workflow and rule maintenance by reviewers. They can also retain traceability from rules to their originating sources even when those sources change (as they often do in a live environment).

Considering the above, the price of a low-cost approach to rule mining is higher manual effort and lower-quality results. This may be acceptable for one-off, small-scale documentation, but it is unlikely to be so in enterprise-level modernization efforts.

3. Time and Resources Allocation

Project goals and choice of technology will affect resource allocation. All too often, the business side expects to receive a precisely modeled set of business rules derived from the application semantics, without realizing the complexity of such an effort. On the IT side, practitioners with no experience in rule mining or no familiarity with the application's subject matter are ill-equipped to offer reliable estimates.

An iterative approach to time and resource estimation may be adopted. Mining rules for a representative subset of the application over the first few weeks provides insight into the methods that work best for the selected applications, as well as a yardstick against which the overall effort can be estimated.
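The yardstick can be as simple as a linear extrapolation from the pilot. The sketch below assumes that effort scales with a size measure such as program count; the function name and the figures are hypothetical and serve only to show the arithmetic.

```python
def extrapolate_effort(pilot_days: float, pilot_units: int, total_units: int) -> float:
    """Scale pilot effort linearly to the full in-scope portfolio.

    "Units" is a hypothetical size measure (e.g., programs or lines of code);
    linear scaling assumes the pilot subset is representative of the rest.
    """
    if pilot_units <= 0:
        raise ValueError("pilot_units must be positive")
    return pilot_days / pilot_units * total_units


# Hypothetical figures: 15 person-days to mine 12 programs; 90 programs in scope.
print(extrapolate_effort(15, 12, 90))  # 112.5
```

Rerunning the calculation after each iteration lets the estimate converge as the pilot grows.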

4. Business Process Mapping and Alignment

The mined logic matters because of its business context. After all, application logic implements part of an overarching business process, and mined rules make sense only when placed in the context of the associated process. For example, assigning a prospective customer to an income bracket based on her zip code may have a different significance in credit approval than in marketing campaign processes (witness insurance companies advertising that they will look only at past behavior, not credit ratings, to set premiums).

Further, viewing applications in a business process context is important for setting priorities for rule mining activities. An enterprise commonly views itself in terms of its processes: registering a new customer, underwriting a policy, receiving payment, registering a claim, depositing funds into an account, and so on. Some of these processes are of critical importance to the organization, while others are simply commodity functionality. Some processes may be meeting the service level agreements set by line-of-business executives, and others not. Where modernization activities focus will depend on this calculus.

Once the business processes have been modeled, the application elements that support the chosen processes are identified. Historically, these applications were organized in silos, which makes the high-level match straightforward. At a more granular level, however, a single monolithic order management application may handle customer enrollment, credit approval, order entry, and order fulfillment. If an organization is interested only in mining rules for the customer enrollment and order entry components, a mapping exercise between processes and the application portfolio elements that support them will be beneficial.

With repository-based software, application objects are registered and the syntactic and semantic relationships among them are captured automatically.

Cases of overlap, where a program or data store serves multiple business processes, are noted.

Example – Customer Orders

In Figure 1 below, the Customer Master, Order Master, and Inventory data stores are within the scope of rule mining even though they also support processes outside the scope. Similarly, the Customer Handling program is monolithic and includes logic for Credit Approval, which is outside the scope of this effort.

Figure 1: Business Process Mapping to Application Objects
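A process-to-object mapping and its overlaps can be sketched with plain sets. The mapping below is a hypothetical reading of the Customer Orders example: the in-scope processes follow the customer enrollment and order entry scenario, while "Order Entry Program" and "Fulfillment Program" are invented object names.

```python
# Hypothetical mapping of business processes to the application objects
# (programs and data stores) that support them.
PROCESS_TO_OBJECTS = {
    "Customer Enrollment": {"Customer Handling", "Customer Master"},
    "Order Entry": {"Order Entry Program", "Order Master", "Inventory", "Customer Master"},
    "Credit Approval": {"Customer Handling", "Customer Master"},
    "Order Fulfillment": {"Fulfillment Program", "Order Master", "Inventory"},
}
IN_SCOPE = {"Customer Enrollment", "Order Entry"}


def scope_objects(mapping, in_scope):
    """Return (objects in scope, in-scope objects also serving out-of-scope processes)."""
    in_scope_objects = set().union(*(mapping[p] for p in in_scope))
    out_of_scope_objects = set().union(
        *(objs for process, objs in mapping.items() if process not in in_scope)
    )
    overlaps = in_scope_objects & out_of_scope_objects
    return in_scope_objects, overlaps


objects, overlaps = scope_objects(PROCESS_TO_OBJECTS, IN_SCOPE)
print(sorted(overlaps))  # ['Customer Handling', 'Customer Master', 'Inventory', 'Order Master']
```

The overlap set is exactly what needs a note: these objects are mined, but only for the parts that serve the in-scope processes.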



5. Application Analysis

In the previous steps, the application has been inventoried, and it is understood at a high level where the business processes of interest reside within application artifacts. The next task is to decompose the application into its constituent logical elements. Contextualization remains a key factor: elements of the application that are not relevant to organizational priorities are excluded. This scoping by context also reveals the ‘boundaries’ of rules and their associated impacts.

Application Decomposition

In this step, the application is further decomposed into its detailed components. The goal is to have a sufficiently detailed collection of artifacts to serve as input to the rule mining step.

Here too, technology can help. Repository-based software can create a parse tree of detailed application objects and their relationships. This information is then presented in multiple graphical and textual views, synchronized with each other to facilitate context-sensitive analysis.

For example, a context view will display all data field declarations, procedures, and procedure calls within a program in a compressed and outlined mode. A detailed source code view will display code segments corresponding to the context view, allowing for quick navigation through the program via the context view. Traversal through the context view will enable a user to gain quick insights into a program's structure and complexity.

Other views available at the detailed program level include diagrammatic control flow between paragraphs, logic flowcharts within paragraphs, execution paths, and runtime simulators for chosen conditional outcomes. Such tools provide detailed insight into an application before the rule mining phase begins.

Without the benefit of automated parsing tools, value can still be gained by conducting a manual inspection and walkthrough of application artifacts.

Identification of Exclusions

In the process of reviewing and analyzing an application, elements to be excluded from the scope of rule mining for functional or technical reasons are marked. These may include standard utilities, reports, system routines, and out-of-scope business processes. In the example from Figure 1, these would include the artifacts related to Credit Approval and Order Fulfillment, which are out of scope because they are commodity, standardized business processes.

The identification of exclusions illustrates a benefit of the up-front business contextualization steps described above. Without them, a "broad sweep" approach would have required a higher investment in the rule mining and SME review stages, where rules from the commodity business processes would first have been mined and later discarded as irrelevant.

6. Glossary Creation

A major challenge in mining rules from applications is that it can be difficult to navigate their variables and naming conventions. These conventions often have only a tenuous link to business terminology, which can make the logic hard to understand from a business perspective.

A best practice is to map these technical terms to business terms, creating a more business-centric view. This can be achieved through a glossary of application objects and related business terms. Objects could be data fields, paragraphs, programs, data sources, and other application objects of interest.

Sources of information for a glossary of terms can be business documentation, data dictionaries, database schemas, user notifications, and even source code comments.

Automated rule mining tools can propagate glossary values for repeating patterns (commonly called ‘tokens’) throughout your application. For instance, the token 'ACCT-' may be replaced everywhere by 'Account-'. The tool then uses the glossary's business names in place of technical terminology when it constructs candidate rules automatically.
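A minimal sketch of glossary-driven token replacement, assuming the glossary is a simple token-to-business-term map. The tokens other than 'ACCT-', and the function name, are hypothetical; plain substring replacement is a simplification, and a real tool would work on parsed identifiers rather than raw strings.

```python
# Hypothetical glossary: technical name fragments ("tokens") mapped to business terms.
GLOSSARY_TOKENS = {"ACCT-": "Account-", "CUST-": "Customer-", "DISC": "Discount"}


def to_business_name(technical_name: str, tokens: dict[str, str]) -> str:
    """Replace every glossary token in a technical name, trying longer tokens first."""
    result = technical_name
    for token in sorted(tokens, key=len, reverse=True):
        result = result.replace(token, tokens[token])
    return result


print(to_business_name("ACCT-BAL", GLOSSARY_TOKENS))       # Account-BAL
print(to_business_name("CUST-DISC-PCT", GLOSSARY_TOKENS))  # Customer-Discount-PCT
```

Trying longer tokens first keeps a short token from breaking up a longer one that contains it.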

7. Rules Composition and Hierarchy Definition

The desired rule format is established in advance. The same rule for setting an order discount may take alternative forms, such as:

(i) Declarative form: "Each applicant who is a senior AAA member from California receives a 5% discount."

(ii) If-Then-Else form: If an applicant is a senior, then if she is an AAA member, then if she resides in California, assign a 5% discount.

(iii) As an entry in a decision table:

Figure 2: Entry in Decision Table
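The decision table form can be illustrated with a small first-match table, sketched under the assumption that the first row whose conditions all hold determines the discount. Only the first row comes from the example rule; the catch-all row, the `Applicant` type, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    is_senior: bool
    is_aaa_member: bool
    state: str


# Each row: (senior?, AAA member?, state, discount). None means "any value".
DISCOUNT_TABLE = [
    (True, True, "CA", 0.05),   # senior AAA member from California
    (None, None, None, 0.0),    # hypothetical catch-all row
]


def matches(expected, actual) -> bool:
    return expected is None or expected == actual


def discount_for(applicant: Applicant) -> float:
    """Return the discount of the first row whose conditions all match."""
    for senior, aaa, state, discount in DISCOUNT_TABLE:
        if (matches(senior, applicant.is_senior)
                and matches(aaa, applicant.is_aaa_member)
                and matches(state, applicant.state)):
            return discount
    raise LookupError("no matching decision table row")


print(discount_for(Applicant(True, True, "CA")))   # 0.05
print(discount_for(Applicant(True, False, "CA")))  # 0.0
```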



It can be useful to attach additional informational and workflow attributes to a mined rule, such as:


 * Reviewer text annotations
 * Rule type (I/O, calculation, validation, security)
 * Audit status (approved, not approved)
 * Workflow status (extracted, working, accepted, rejected)
 * Transition (valid, requires modification, duplicate, complete)
 * Reviewer Identity
 * Program the rule was derived from
 * Code segment location (start, end)
 * Code segment text
 * Input and output data elements

A rule will also be placed within a hierarchy. All rules representing a decision, or executing under a given set of conditions, may be grouped into a Rule Set. Rule Sets are in turn grouped into higher-level activity nodes reflecting the business processes they currently participate in.
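One way to hold these attributes and the rule, Rule Set, and activity node hierarchy is sketched below as Python dataclasses. The schema is hypothetical: it covers most of the attributes listed above, and the program name, line numbers, and field names in the usage lines are invented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class WorkflowStatus(Enum):
    EXTRACTED = "extracted"
    WORKING = "working"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class MinedRule:
    name: str
    rule_type: str        # e.g. "I/O", "calculation", "validation", "security"
    program: str          # program the rule was derived from
    segment_start: int    # code segment location: first line
    segment_end: int      # code segment location: last line
    segment_text: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    audit_approved: bool = False
    workflow_status: WorkflowStatus = WorkflowStatus.EXTRACTED
    reviewer: Optional[str] = None
    annotations: list[str] = field(default_factory=list)


@dataclass
class RuleSet:
    name: str
    rules: list[MinedRule] = field(default_factory=list)


@dataclass
class ActivityNode:
    name: str             # business process activity the rule sets take part in
    rule_sets: list[RuleSet] = field(default_factory=list)

    def all_rules(self) -> list[MinedRule]:
        return [rule for rule_set in self.rule_sets for rule in rule_set.rules]


# Hypothetical usage.
discount = MinedRule(
    name="Set Order Discount", rule_type="calculation", program="ORDCALC",
    segment_start=120, segment_end=124,
    segment_text="IF CUST-STATE = 'CA' ...",
    inputs=["Customer-State"], outputs=["Order-Discount"],
)
order_entry = ActivityNode("Order Entry", [RuleSet("Discount Decision", [discount])])
print([rule.name for rule in order_entry.all_rules()])  # ['Set Order Discount']
```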

If you are planning to populate an executable Business Rules Management System (BRMS), you will want to define a schema that is easily transferable into the specific target environment chosen. If the target environment is not yet known, refer to available business rule standards.

8. Rule Mining Workflow

Enterprise rule mining is usually a multi-step process involving practitioners with disparate skill sets, including consultants, developers, architects, analysts, and subject matter experts. Key personnel are often distracted by other projects, so it is crucial that a common workflow be defined and documented.

Following the guidelines provided later in this document, a high-level workflow may look like this:

Figure 3: Sample Rule Mining Workflow



Each step should be defined in detail according to the adopted methodology, the project scope and constraints, and the rule mining technology in use. The first few steps may be iterated several times until the rules are in an approved, final format.

Rule Mining Steps
9. Mining of Candidate Rules

At this point, rules are mined from the application artifacts mapped to the business processes identified in previous steps. Rule mining tools help ensure that excluded artifacts stay out of scope by allowing an application to be organized into sub-groupings. Rules are then mined for a sub-grouping rather than for the entire application.

The specific rule mining approach taken is primarily driven by application patterns and the desired output.

Top-Down Approach

A top-down, or process-oriented approach starts from an examination of the user interface in an online application, or from the job flow in a batch application.

In an online application, a transaction may be invoked by a user selecting a menu option or entering a value on the screen. The fields that define the message or event sent from the screen to the interfacing application are identified. Each field in the triggering message may be considered a seed field for rule mining.

Using a seed field as a starting point, all downstream data impacts of the field, including all conditional permutations, are documented. Each data transformation (a move into another field or a calculation) represents a candidate business rule to be captured.

Rule mining software tools assist in this task by visualizing the forward data impact path of each seed field to every point where it is either populated with new values or used as input to other fields through comparisons, value propagation, and calculations. At each such point, the tool can be used to document the underlying business rules. Automated rule detection methods can also be applied to capture each screen field edit as a candidate rule.
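The forward data impact walk can be approximated as follows, assuming each statement has been reduced to a target field and the fields it reads. This is a flow-insensitive sketch (it ignores control flow and comparisons), and the `Statement` type, field names, and line numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    line: int
    target: str               # field receiving a value (MOVE target or COMPUTE result)
    sources: tuple[str, ...]  # fields read by the statement


def forward_impacts(statements: list[Statement], seed: str) -> list[Statement]:
    """Return every statement reached by data flowing forward from the seed field.

    Flow-insensitive approximation: control flow and overwrites are ignored,
    so the result may over-approximate what a repository-based tool reports.
    """
    impacted_fields = {seed}
    hits: set[Statement] = set()
    changed = True
    while changed:  # repeat until no new statement is reached
        changed = False
        for stmt in statements:
            if stmt not in hits and impacted_fields.intersection(stmt.sources):
                hits.add(stmt)
                impacted_fields.add(stmt.target)
                changed = True
    return sorted(hits, key=lambda s: s.line)


PROGRAM = [
    Statement(10, "WS-QTY", ("SCR-QTY",)),
    Statement(20, "WS-AMOUNT", ("WS-QTY", "WS-PRICE")),
    Statement(30, "ORD-TOTAL", ("WS-AMOUNT", "WS-DISCOUNT")),
    Statement(40, "WS-DISCOUNT", ("CUST-STATE",)),
]

for stmt in forward_impacts(PROGRAM, "SCR-QTY"):
    print(stmt.line, stmt.target)  # 10 WS-QTY, then 20 WS-AMOUNT, then 30 ORD-TOTAL
```

Each returned statement is a candidate rule. The WS-DISCOUNT statement is never reached from the quantity seed field, which illustrates how a strictly top-down pass can leave coverage gaps.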

In a batch application, the concept is similar. The part of a job flow (e.g., a JCL job or a group of jobs) that realizes a business process is identified, and all rules within the individual programs relevant to that process are mined.
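For batch job flows, the programs behind a business process can be listed from the `EXEC PGM=` statements of the relevant JCL. The sketch below is a simplification that assumes one step per card and ignores cataloged procedures, continuations, and symbolic parameters; the job, step, and program names are hypothetical.

```python
import re

# Matches job steps such as "//STEP010  EXEC PGM=ORDLOAD".
# "[^*\s]" skips comment cards ("//*..."); EXEC of a procedure is not expanded.
EXEC_PGM = re.compile(r"^//([^*\s]\S*)\s+EXEC\s+PGM=([A-Z0-9@#$]+)", re.IGNORECASE)


def programs_in_job(jcl: str) -> list[tuple[str, str]]:
    """Return (step name, program name) pairs in job order."""
    steps = []
    for line in jcl.splitlines():
        match = EXEC_PGM.match(line)
        if match:
            steps.append((match.group(1), match.group(2).upper()))
    return steps


# Hypothetical nightly order job.
ORDER_JOB = """\
//ORDNIGHT JOB (ACCT),'ORDERS'
//STEP010  EXEC PGM=ORDLOAD
//INPUT    DD DSN=ORDERS.DAILY,DISP=SHR
//*STEP015 EXEC PGM=OLDSORT
//STEP020  EXEC PGM=ORDPRICE,PARM='FULL'
"""

print(programs_in_job(ORDER_JOB))  # [('STEP010', 'ORDLOAD'), ('STEP020', 'ORDPRICE')]
```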

Figure 4: Top-Down Rule Mining



In Figure 4 above, note the format of the resulting "Derived Candidate Rules". Detected automatically from a Cobol program, they resemble its constructs, with variable names replaced by their Glossary definitions. They will later be reviewed and transformed into a more business-like form. Although this example uses a Cobol application, advanced mining tools may apply to a broad array of languages, from PL/I and Natural to Basic, Visual Basic, and Java.

Bottom-Up Approach

A bottom-up, or data-oriented, approach starts from an examination of system outputs: data sent to files (both batch and online), and to screens and output messages (online only).

Following this approach, rules are captured by starting from a data point of interest and identifying all logic that affects it. For example, an Order Discount field is affected by discounts calculated upstream of it, depending on the customer's location of residence.

Figure 5: Bottom-Up Rule Mining



Rule mining technologies are particularly well suited to this approach. Through visual inspection or a repository query, data outputs of interest can be identified quickly. Automated rule detection routines can then capture a candidate rule for each statement that affects the point of interest. Because of the pre-organization into contextualized sub-groupings mentioned above, the search results are constrained to the subset of business processes deemed relevant for rule mining.
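The bottom-up detection can be sketched as a backward walk from the output field, restricted to the programs of the chosen sub-grouping. The sketch assumes statements have been reduced to a target and its source fields, ignores control flow, and uses hypothetical program and field names (CRDAPPR stands in for out-of-scope Credit Approval code).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    program: str
    line: int
    target: str
    sources: tuple[str, ...]


def backward_candidates(statements, point_of_interest, in_scope_programs):
    """Collect one candidate rule per statement that can affect the point of interest.

    Only statements in programs of the chosen sub-grouping are considered.
    """
    in_scope = [s for s in statements if s.program in in_scope_programs]
    wanted = {point_of_interest}  # fields whose producers we still need
    found = set()
    changed = True
    while changed:
        changed = False
        for stmt in in_scope:
            if stmt not in found and stmt.target in wanted:
                found.add(stmt)
                wanted.update(stmt.sources)
                changed = True
    return sorted(found, key=lambda s: (s.program, s.line))


STATEMENTS = [
    Statement("ORDCALC", 35, "WS-REGION-RATE", ("CUST-STATE",)),
    Statement("ORDCALC", 40, "WS-DISCOUNT", ("WS-REGION-RATE",)),
    Statement("ORDCALC", 50, "ORD-DISCOUNT", ("WS-DISCOUNT", "ORD-AMOUNT")),
    Statement("CRDAPPR", 70, "CUST-STATE", ("CRD-STATE",)),
]

for s in backward_candidates(STATEMENTS, "ORD-DISCOUNT", {"ORDCALC"}):
    print(s.program, s.line, s.target)
```

Widening the sub-grouping to include CRDAPPR would pull line 70 into the results, which is the kind of out-of-scope logic the sub-groupings are meant to keep out.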

Inspection of the relevant DBMS tables may also reveal rules embedded in keys, as well as data rules for referential integrity and value constraints. Once all data points of interest have been covered, essentially all application logic of interest that feeds existing outputs has been mined.
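For a relational store, declared keys and constraints can be read from the catalog and phrased as candidate rules. The sketch below uses SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list` as a stand-in for whatever catalog the production DBMS provides; the tables and the rule phrasing are hypothetical, and CHECK constraints are not covered because these pragmas do not report them.

```python
import sqlite3


def constraint_rules(conn: sqlite3.Connection, table: str) -> list[str]:
    """Phrase primary key, NOT NULL and foreign key constraints as candidate rules."""
    rules = []
    # The table name comes from the schema under inspection, so it is trusted here.
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for _cid, column, _type, notnull, _default, pk in conn.execute(f"PRAGMA table_info({table})"):
        if pk:
            rules.append(f"Each {table} is identified by {column}.")
        elif notnull:
            rules.append(f"Each {table} must have a {column}.")
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
        parent, child_col = row[2], row[3]
        parent_col = row[4] or "primary key"  # "to" is NULL when the parent key is implicit
        rules.append(f"Each {table}.{child_col} must refer to an existing {parent}.{parent_col}.")
    return rules


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, state TEXT NOT NULL);
    CREATE TABLE sales_order (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL REFERENCES customer(cust_id),
        discount REAL
    );
""")
for rule in constraint_rules(conn, "sales_order"):
    print(rule)
```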

Hybrid Approach

A hybrid approach combines the two approaches described above:
 * The first step is a top-down oriented capture of the relevant transactions;
 * For each transaction, bottom-up rule mining is performed, including only data outputs that have not already been mined for another transaction.

The benefit of this approach is to extend the coverage of rule mining while avoiding repetition.
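The hybrid ordering reduces to keeping a set of outputs already mined. This sketch assumes the transactions are visited in the order given and that each transaction's outputs are known from the top-down pass; the transaction and output names are hypothetical.

```python
def hybrid_plan(transactions: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each transaction, in order, list only the outputs not yet mined.

    Bottom-up rule mining is then run once per output in the returned plan.
    """
    mined: set[str] = set()
    plan = {}
    for transaction, outputs in transactions.items():
        new_outputs = [o for o in dict.fromkeys(outputs) if o not in mined]
        mined.update(new_outputs)
        plan[transaction] = new_outputs
    return plan


plan = hybrid_plan({
    "Order Entry": ["Order Total", "Customer Discount", "Order Status"],
    "Proposal Issuance": ["Customer Discount", "Proposal Expiry Date"],
})
print(plan)
# {'Order Entry': ['Order Total', 'Customer Discount', 'Order Status'], 'Proposal Issuance': ['Proposal Expiry Date']}
```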

In the examples shown in Figures 4 and 5 above, a strictly top-down approach led to repeated effort for the Quantity and Price fields, since both traversed identical downstream data impacts. Coverage was also partial, since not all of the rules for Customer Discount were discovered.

Now consider an extended case involving both the Order Entry and Proposal Issuance processes. An exclusively bottom-up approach would also have resulted in repetition, mining rules for upstream data impacts that "hit" multiple outputs (e.g., customer discount rules). Using the hybrid approach, we would first mine rules from all outputs of the order transaction, then only from those outputs of the proposal transaction that are particular to it.

Figure 6: Hybrid Rule Mining



10. Candidate Rules Verification

At this point, after candidate rules have been mined from the application, verification and correction are needed to ensure that the rules are correct and complete.

The candidate rules are examined for:

Accuracy: Does each rule correctly reflect the underlying application behavior? If automated rule detection technology has been used, the rule at the point of interest (seed field) will be preceded by the rules upstream of it, possibly with triggers, control conditions, and automatic rule set groupings. Each of these rules (or a chosen subset) is reviewed for accuracy and corrected where needed until the results are deemed satisfactory.

Redundancy: Does a rule or rule set appear twice for the same application process? This can occur when rules are mined separately for two outputs that share upstream functionality, or through simple oversight, such as multiple team members inadvertently mining rules from the same code. A rule attribute is used to mark duplicates.

Another form of redundancy occurs when semantically identical rules were mined separately, under different names, from different processes (e.g., Order Detail and Proposal). This is dealt with in the next step, when candidate rules are transformed into business rule format.

Completeness: Beyond the predefined exclusions, has all of the application functionality been covered? A rule coverage report, comparing the source code covered by mined rules with the overall in-scope source, can provide the answer.

Relevance: Can each mined rule be considered a candidate business rule? Although this is not yet the SME review step, some constructs are clearly irrelevant upon inspection and should be left out of the rules under review. Security verification rules, housekeeping routines, and out-of-scope operations may all fall into this category. Record relevance in one of the rule attributes.
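Two of these checks lend themselves to automation. The sketch below, assuming rules are plain dictionaries with a program name and start and end lines, marks rules mined twice from the same code segment (using the 'duplicate' transition value) and computes a line-based coverage ratio. The field names and figures are hypothetical.

```python
from collections import defaultdict


def mark_duplicates(rules: list[dict]) -> list[dict]:
    """Flag every rule after the first that was mined from the same code segment."""
    by_segment = defaultdict(list)
    for rule in rules:
        by_segment[(rule["program"], rule["start"], rule["end"])].append(rule)
    for group in by_segment.values():
        for duplicate in group[1:]:
            duplicate["transition"] = "duplicate"
    return rules


def coverage(rules: list[dict], program_lengths: dict[str, int]) -> float:
    """Fraction of in-scope source lines covered by at least one mined rule."""
    covered = {
        (r["program"], line)
        for r in rules if r["program"] in program_lengths
        for line in range(r["start"], r["end"] + 1)
    }
    total = sum(program_lengths.values())
    return len(covered) / total if total else 0.0


rules = [
    {"name": "R1", "program": "ORDCALC", "start": 10, "end": 19},
    {"name": "R2", "program": "ORDCALC", "start": 10, "end": 19},  # mined twice by mistake
    {"name": "R3", "program": "ORDCALC", "start": 30, "end": 39},
]
mark_duplicates(rules)
print([r.get("transition") for r in rules])  # [None, 'duplicate', None]
print(coverage(rules, {"ORDCALC": 100}))     # 0.2
```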

11. Transformation of Candidate Rules to Business Rules Format

In the previous steps, candidate rules have been mined and reviewed, reflecting legacy application behavior. These rules closely follow the application's procedural flow and operations.

A transformation step is now required to convert candidate rules into actual business rules ready for review. This step is conducted by application experts, rule architects, or subject matter experts. After review and conversion, the captured business rules reflect the current, as-is state and serve as a baseline for comparison with the target environment.

Reformatting to Business Rule Notation

If the candidate rules were constructed manually, they may already be in the chosen business rule format. In other cases, they may have been captured in a technical format (e.g., cut and pasted from source code) and will require some modification and regrouping.

If an automated rule detection tool was used, the resulting candidate rules may already somewhat resemble business rules, since the glossary definitions place business names in rule names, data elements, and controlling conditions. However, even after the rule verification step, most approved candidate rules will need to be adapted to the chosen business rule notation.

Figure 7: Business Rule Transformation



Fact Modeling and Rule Normalization

Due to their procedural nature, legacy applications tend to lock business logic into process-specific silos. However, true business rules are independent of process and should be maintained as such.

In our example, rules for the Order Detail Entry and Proposal Entry events have been mined separately and placed in rule sets. Are they all unique? On closer examination, most of the logic in them is identical by design: from a business perspective, portions of any customer document have much in common, whether it is an Order or a Proposal.

From a tooling perspective, at this point it may make sense to switch over to a Business Rule Management System (BRMS), importing the mined rules from the rule mining tool as described in the Integration section below.

Using a BRMS or a visual modeling tool, a Fact Model reflecting the significant business entities and their interrelationships discovered in your existing applications is constructed. These will link to the mined business rules and serve as a baseline for the to-be rules model.

In our example, part of the Fact Model would be:



Once this is done, the business rules are normalized to represent the desired business level semantics:



…whereby the Customer Document Handling rules apply to both Orders and Proposals.
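In code terms, normalization amounts to writing the shared rule once against the common parent entity. The dataclasses below are a hypothetical fragment of such a fact model, and the California discount rate is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    state: str


@dataclass
class CustomerDocument:
    """Common parent of Order and Proposal in this hypothetical fact model."""
    customer: Customer
    amount: float


@dataclass
class Order(CustomerDocument):
    pass


@dataclass
class Proposal(CustomerDocument):
    valid_days: int = 30


def document_discount(doc: CustomerDocument) -> float:
    """A Customer Document Handling rule, written once for both Orders and Proposals."""
    return round(doc.amount * 0.05, 2) if doc.customer.state == "CA" else 0.0


alice = Customer("C001", "CA")
print(document_discount(Order(alice, 200.0)))    # 10.0
print(document_discount(Proposal(alice, 80.0)))  # 4.0
```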

Grouping and Sequencing

At this point, the generated rule grouping and sequencing are reviewed. One point of attention is the triggering relationships between rules and other rules and rule sets. Since candidate rules are often derived from a third-generation language application (such as Cobol or PL/I), they are automatically sequenced in a procedural manner. Transformation to a declarative mode will eliminate procedural elements that are non-business in nature.

As shown in Figure 8 below, declarative relationships that reflect true business requirements are modeled as triggers between rules and other rules or rule sets. In most BRMS environments, a single rule may trigger multiple rules and rule sets, and the sequencing of the triggered rules or rule sets is either fixed at compile time or resolved at runtime.
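When sequencing is resolved from trigger relationships, one valid firing order is a topological ordering of the trigger graph. The sketch below uses Python's standard `graphlib` module to derive such an order; the rule names are hypothetical, and a real BRMS may apply its own conflict resolution strategy instead. A cyclic trigger chain would raise `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter

# Hypothetical "rule triggers rule/rule set" relationships.
TRIGGERS = {
    "Validate Order Detail": ["Customer Document Handling", "Inventory Check"],
    "Customer Document Handling": ["Compute Discount"],
    "Inventory Check": ["Compute Discount"],
}


def execution_order(triggers: dict[str, list[str]]) -> list[str]:
    """Return one valid sequence in which every rule runs after the rules that trigger it."""
    predecessors: dict[str, set[str]] = {}
    for source, targets in triggers.items():
        predecessors.setdefault(source, set())
        for target in targets:
            predecessors.setdefault(target, set()).add(source)
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(TRIGGERS)
print(order)
```

Where several rules become ready at once (here, the two rules triggered by the validation), any relative order between them is valid; this is exactly the freedom a BRMS resolves at compile time or at runtime.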

Figure 8: As-is Business Rule Model (Event-driven)



Subject Matter Expert Review and Approval
Once mined rules have been transformed into business rules, they are handed over to subject matter experts (SMEs) and / or business analysts for review and approval.

Normally, SMEs will not make major changes to the rules at this point. Rule mining tools may include rule attribution capabilities that let SMEs mark up the business rules as:


 * Approved or rejected;
 * Reclassified to another category;
 * Annotated with additional information in textual description attributes.

Rule mining tools also often offer web portals with a functional focus on predefined SME activities. This can greatly accelerate the review and approval process.

12. Reports Production

Business rule reports are created in hardcopy or digital formats. Rule mining tools produce reports and diagrams depicting detailed or summary rule information within a chosen context: hierarchy level, grouping, or search result. These reports serve both as a reference during the review steps and as documentation of record.

13. Integration with a Target Environment

Depending upon the adopted modernization strategy, integration requirements with other environments will vary.

Business Rule Development

Redevelopment with a business rule approach will typically leverage a BRMS authoring environment. These tools typically include XML import capabilities, which can be used to define an "as-is" business rule space, allowing rule developers to selectively reuse candidate rules deemed relevant for the target environment. This provides valuable (and sometimes crucial) traceability from newly deployed rules back to their legacy origins.
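A minimal export sketch using `xml.etree.ElementTree` is shown below. The element and attribute names are invented for illustration (a real BRMS defines its own import schema), but the `source` element shows how the program name and line range can travel with each rule to preserve traceability.

```python
import xml.etree.ElementTree as ET


def rules_to_xml(rule_sets: dict[str, list[dict]]) -> str:
    """Serialize mined rules into a hypothetical "as-is" XML layout for BRMS import."""
    root = ET.Element("ruleModel", state="as-is")
    for set_name, rules in rule_sets.items():
        set_el = ET.SubElement(root, "ruleSet", name=set_name)
        for rule in rules:
            rule_el = ET.SubElement(set_el, "rule", name=rule["name"])
            # Traceability back to the legacy source segment.
            ET.SubElement(rule_el, "source", program=rule["program"],
                          start=str(rule["start"]), end=str(rule["end"]))
            ET.SubElement(rule_el, "text").text = rule["text"]
    return ET.tostring(root, encoding="unicode")


xml_text = rules_to_xml({
    "Order Discount": [
        {"name": "Set Order Discount", "program": "ORDCALC", "start": 120, "end": 124,
         "text": "If the customer resides in California, assign a 5% discount."},
    ],
})
print(xml_text)
```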

Conventional Development

This approach typically involves building Java or .NET applications with comprehensive developer toolkits. UML models are often used to define logical application views prior to actual code generation.

In these environments, mined business rules can be attached as behaviors of the UML classes that use them. For example, an Order_Invoice class with Order_Discount as an attribute may also include Calculate_Order_Discount as a class behavior. This behavior can be derived (and potentially imported) from the mined business rule that performs the same function.
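A plain class can play the role of the UML class described above. In this sketch the names Order_Invoice, Order_Discount, and Calculate_Order_Discount are kept from the text for traceability; the other attributes, and applying the 5% rate from the earlier senior AAA example to the order amount, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Order_Invoice:
    """Illustrative class whose behavior is derived from a mined discount rule."""
    order_amount: float
    customer_is_senior: bool
    customer_is_aaa_member: bool
    customer_state: str
    Order_Discount: float = 0.0

    def Calculate_Order_Discount(self) -> float:
        # Mined rule: each senior AAA member from California receives a 5% discount.
        # Applying the rate to order_amount is an assumption of this sketch.
        eligible = (self.customer_is_senior and self.customer_is_aaa_member
                    and self.customer_state == "CA")
        self.Order_Discount = round(self.order_amount * 0.05, 2) if eligible else 0.0
        return self.Order_Discount


invoice = Order_Invoice(200.0, True, True, "CA")
print(invoice.Calculate_Order_Discount())  # 10.0
```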

BPM

Business Process Management (BPM) tools facilitate the creation of process models and their linkage to underlying rules and executable services. They also include the ability to define workflow rules (using BPEL) that govern the manual and automated transitions between activity nodes.

In this context, each activity node may be realized by business rules. Many-to-many relationships may exist between rule sets and supported activities. Populating BPM processes with their relevant mined "as-is" business rules can "close the loop" for business analysts and significantly advance IT / business alignment goals.

Requirements Management

Vendor offerings also include requirements tools that enable the definition of high-level use cases, detailed flowcharts, and activities for effective application development and management. In these environments, mined business rules can be imported and attached either as core requirements or as textual annotations to activity nodes.

SOA Enablement

Service enablement of existing applications involves refactoring code and deploying it as service capsules. The rule mining step can be invaluable in locating fine-grained services within source code and supplying the required service components. For example, the results of automatic rule detection for an order discount calculation will include all code segments leading up to the final calculation. By creating a component slice containing only that code (and its dependencies), the order discount calculation can be redeployed as a service.

14. Proactive Rules Management

We expect most applications to continue to be maintained for many years. Having mined rules from them, it is crucial to keep those rules updated and synchronized with future application changes. Rule mining tools offer maintenance and management capabilities, including:


 * Automatic alignment of rules with their original code segments even when they have moved as a result of overall source code changes;
 * Audit trails for manual rule changes;
 * "Changer" routines allowing individual or mass changes after rule mining.
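Automatic realignment can be approximated by locating a rule's stored segment text in the changed source. The sketch below is an exact-match simplification with a hypothetical function name; production tools presumably use more tolerant matching. A segment that can no longer be found is returned as `None` so the rule can be flagged for review.

```python
def realign_segment(segment_text: str, new_source: str, old_start: int):
    """Return the new (start, end) line numbers of a rule's segment, or None if not found.

    Exact-match sketch: compares whitespace-stripped lines and, if the segment occurs
    more than once, picks the occurrence closest to the rule's old start line.
    """
    segment = [line.strip() for line in segment_text.strip().splitlines()]
    source = [line.strip() for line in new_source.splitlines()]
    if not segment:
        return None
    size = len(segment)
    starts = [i + 1 for i in range(len(source) - size + 1) if source[i:i + size] == segment]
    if not starts:
        return None  # segment was edited or deleted: flag the rule for review
    start = min(starts, key=lambda s: abs(s - old_start))
    return start, start + size - 1


RULE_SEGMENT = """
IF CUST-STATE = 'CA'
   MOVE 5 TO WS-DISC-PCT
END-IF
"""

# Hypothetical changed source: two lines were inserted above the rule.
NEW_SOURCE = """\
* two lines added above the rule
MOVE ZERO TO WS-DISC-PCT
IF CUST-STATE = 'CA'
   MOVE 5 TO WS-DISC-PCT
END-IF
"""

print(realign_segment(RULE_SEGMENT, NEW_SOURCE, old_start=1))  # (3, 5)
```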

Conclusion
A well-defined approach to business rule mining allows for business contextualization early in the process. Not only will the contextualization step help frame mined rules correctly, it will also reduce the rule mining investment by focusing it only on the critical and dynamic business processes of interest. Regardless of the application modernization strategy adopted, the best practices and tool-assisted approaches described here will help you achieve your goals at a lower cost, with less repetition and higher-quality results.

Notes and references

 * 1) NEUER, Mannes. Context is King: A Practical Approach to Rule Mining