Practical DevOps for Big Data/Quality Simulation

Introduction

Quality assurance of data-intensive applications (DIA) that use Big Data technologies is still an open issue. We have defined a quality-driven framework for developing DIA based on model-driven engineering (MDE) techniques. Here, we propose the architecture of a tool for predicting the quality of DIA. In particular, the quality dimensions we are interested in are efficiency and reliability. This tool architecture addresses the simulation of the behaviour of a DIA using Petri net models.

In our view, software non-functional properties follow the definitions of the ISO/IEC standards and may be summarised as follows:
 * Reliability: The capability of a software product to maintain a specified level of performance, including Availability and Fault tolerance.
 * Performance: The capability of a software product to provide appropriate performance, relative to the amount of resources used, as described by time behaviour and resource utilisation. The terms performance and efficiency are considered synonyms throughout.

This chapter introduces definitions of the reliability and performance properties and metrics that are relevant to the analysis. Service-Level Agreements (SLAs) are readily determined by predicating on these quantities. Since SLAs can be annotated directly in the UML models, we do not look at other forms of specification (e.g. Web Services (WS) standards).

Motivation

This section reviews some basic definitions concerning performance prediction metrics. These are standard definitions that are usually used in the context of queueing models, but are also applicable to performance predictions obtained with other formalisms, such as Stochastic Petri Nets. We give definitions for the basic case where requests are considered of a single type (i.e. a single class model). The generalisation for multiple types is simple and in most cases requires only adding an index to each metric to indicate the class of requests it refers to.

Performance Metrics

Consider an abstract system where T is the length of time we observe the system, A is the number of arrivals into the system, and C is the number of completions (users leaving the system). We can then define the commonly used measures:
 * Arrival rate: A / T
 * Throughput: X = C / T (simply the rate of request completions)

In a system with a single resource we can measure Bk as the time the resource k was observed to be busy, and denote by Ck the number of completions at the resource. If Tk is the length of the observation time, we can then define two additional measures:
 * Utilisation: Uk = Bk / Tk
 * Service time per request: Sk = Bk / Ck

From these measures we can derive the Utilisation Law as U = XS. The above quantities can be made specific to a given resource, for example Uk stands for the utilisation of resource k.

One of the most useful fundamental laws is Little's Law, which states that N, the average number of customers in a system, is equal to the product of X, the throughput of the system, and R, the average response time a customer stays in the system. Formally this gives: N = XR. The formula is unchanged if one considers a closed model, which has a fixed population N of customers.

Another fundamental law of queueing systems is the Forced Flow Law which, informally, states that the throughputs in all parts of the system must be proportional to one another. Formally the Forced Flow Law is given by: Xk = Vk X, where Xk is the throughput at resource k and Vk is the visit count of resource k, i.e., the mean number of times a user request visits this resource.

Combining Little's Law and the Forced Flow Law allows a wide range of scenarios to be analysed and solved for a particular desired quantity. For example, it is possible to compute the utilisation at a server k in a distributed network directly as Uk = X Dk, where Dk = Vk Sk is called the service demand at server k.
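As an illustration of how these measures and laws fit together, the following minimal sketch computes them from a set of observations. The function name, the sample figures and the response time of 1.5 s are assumptions chosen for the example; the visit count is taken as Vk = Ck / C, the usual operational definition, and Tk is assumed equal to T.

```python
def operational_metrics(T, A, C, B_k, C_k):
    """Derive basic operational metrics from raw observations.

    T   : length of the observation period (seconds)
    A   : arrivals into the system
    C   : completions of the system
    B_k : time resource k was busy (seconds)
    C_k : completions at resource k
    """
    X = C / T            # system throughput
    S_k = B_k / C_k      # service time per request at resource k
    V_k = C_k / C        # visit count (assumed definition: C_k / C)
    return {
        "arrival_rate": A / T,
        "X": X,
        "U_k": B_k / T,  # utilisation, with T_k assumed equal to T
        "S_k": S_k,
        "X_k": C_k / T,  # throughput at resource k
        "V_k": V_k,
        "D_k": V_k * S_k,  # service demand at resource k
    }


# Hypothetical observations: 60 s window, 120 requests, resource busy 45 s.
m = operational_metrics(T=60.0, A=120, C=120, B_k=45.0, C_k=360)

# Utilisation Law: U_k = X_k * S_k, and equivalently U_k = X * D_k.
utilisation_from_law = m["X"] * m["D_k"]

# Forced Flow Law: X_k = V_k * X.
forced_flow_throughput = m["V_k"] * m["X"]

# Little's Law: N = X * R, with an assumed mean response time R = 1.5 s.
R = 1.5
N = m["X"] * R

print(m)
print(utilisation_from_law, forced_flow_throughput, N)
```

With these figures the utilisation obtained from the measurements (Bk / T) and the one obtained from the Utilisation Law coincide, which is a quick consistency check on measured data.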

Reliability Metrics

The area of reliability prediction is established and focuses on determining the value of a number of standard metrics, which we review below. Using either execution time or calendar time, reliability is defined as R(t) = Prob{𝞃 > t}; that is, reliability at time t is the probability that the time to failure 𝞃 is greater than t or, equivalently, the probability that the system is functioning correctly during the time interval (0,t]. Considering that F(t) = 1 - R(t) (i.e., the unreliability) is the probability distribution function of 𝞃, we can calculate the expectation of the random variable 𝞃 as $$\int_0^\infty t~dF(t)=\int_0^\infty R(t)~dt$$. This is called Mean Time to Failure (MTTF) and represents the expected time until the next failure will be observed.

The failure rate (also called the rate of occurrence of failures) represents the probability, per unit of time, that a component fails in the interval (t, t+dt), assuming that it has survived until the instant t, and is defined as a function of R(t): $$h(t)=-\frac{1}{R(t)}\frac{dR(t)}{dt}$$. The cumulative failure function denotes the average cumulative failures associated with each point in time, E[N(t)].
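As a worked example of these definitions, assume (purely for illustration) that the time to failure is exponentially distributed with a constant rate λ > 0. The reliability and unreliability are then

$$R(t)=e^{-\lambda t},\qquad F(t)=1-R(t)=1-e^{-\lambda t}$$

Substituting R(t) into the definition of the failure rate gives a constant:

$$h(t)=-\frac{1}{R(t)}\frac{dR(t)}{dt}=-\frac{1}{e^{-\lambda t}}\left(-\lambda e^{-\lambda t}\right)=\lambda$$

and the MTTF follows from the integral of the reliability:

$$MTTF=\int_0^\infty R(t)~dt=\int_0^\infty e^{-\lambda t}~dt=\frac{1}{\lambda}$$

For instance, under this assumption a component with λ = 0.001 failures per hour has an MTTF of 1000 hours.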

Maintainability is measured by the probability that the time to repair (𝜽) falls into the interval (0,t]: M(t) = Prob{𝜽 ≤ t}. Similarly, we can calculate the expectation of the random variable 𝜽 as $$\int_0^\infty t~dM(t)$$, which is called MTTR (Mean Time To Repair), and the repair rate as $$\frac{dM(t)}{dt}\frac{1}{1-M(t)}$$.

A key reliability measure for systems that can be repaired or restored is the MTBF (Mean Time Between Failures), that is, the expected time between two successive failures of a system. The system/service reliability on-demand is the probability of success of the service when requested. When the average time to complete a service is known, it might be possible to convert between MTBF and reliability on-demand.

Availability is defined as the probability that the system is functioning correctly at a given instant: A(t) = Prob{state=UP, time=t}. In particular, the steady-state availability can be expressed as a function of MTTF and MTTR (or MTBF): $$Availability_\infty=\frac{MTTF}{MTTF+MTTR}=\frac{MTTF}{MTBF}$$.
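The steady-state formula can be evaluated directly. The sketch below is only an illustration: the function name and the MTTF and MTTR figures (990 and 10 hours) are assumed values, not measurements from any system discussed in this chapter.

```python
def steady_state_availability(mttf: float, mttr: float) -> float:
    """Return MTTF / (MTTF + MTTR); both arguments use the same time unit."""
    if mttf <= 0 or mttr < 0:
        raise ValueError("mttf must be positive and mttr non-negative")
    return mttf / (mttf + mttr)


HOURS_PER_YEAR = 8760.0

# Assumed example values, in hours.
mttf_hours = 990.0
mttr_hours = 10.0

# MTBF is the sum of the time to failure and the time to repair.
mtbf_hours = mttf_hours + mttr_hours

availability = steady_state_availability(mttf_hours, mttr_hours)

# Expected time spent in the DOWN state over one year of operation.
downtime_hours_per_year = (1.0 - availability) * HOURS_PER_YEAR

print(f"MTBF = {mtbf_hours:.0f} h, availability = {availability:.4f}")
print(f"expected downtime = {downtime_hours_per_year:.1f} h/year")
```

Expressing the result as expected downtime per year is often easier to compare against an SLA than the raw probability.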

Existing Solutions

While there are multiple tools that can simulate software models and obtain their quality properties, no existing tool offers the capability to simulate, from software design models, the quality of applications that use the Big Data technologies considered in DICE.

How the tool works

The next image shows a possible architecture of a Simulation tool and the internal data flows.



Next, we provide a description of the different modules, the data they share, and their nature:
 * 1) The DICE-IDE is an Eclipse-based environment in which the different components are integrated.
 * 2) A simulation process starts by defining a set of DICE-profiled UML models. For this stage, a pre-existing modelling tool is used. Papyrus UML is one of the open source UML modelling tools that support MARTE, on which the DICE profile is based. As proposed in the Technical Report, this component/tool is used to perform the initial modelling stage.
 * 3) When the user (the QA Engineer) wants to simulate a model, he/she uses the Simulator GUI to start a simulation. The Simulator GUI is an ad hoc Eclipse component that contributes a set of graphical interfaces to the DICE-IDE. These interfaces are tightly integrated within the DICE-IDE providing a transparent way for interacting with the underlying analysis tools. The Simulation Configuration Component is a sub-component of the Simulator GUI. It is in charge of: (i) asking for the model to be simulated (using the DICE-IDE infrastructure, dialogs, etc.); and (ii) asking for any additional data required by the Simulator.
 * 4) When the user has finished the configuration of a simulation, the Configuration Tool passes two different files to the Simulator: the DICE-profiled UML model (i.e., the model to be analysed) and the Configuration model. The Simulator is an ad hoc OSGi component that runs in the background. It has been specifically designed to orchestrate the interaction among the different tools that perform the actual analysis.
 * 5) The Simulator executes the following steps: (i) transforms the UML model into a PNML file using an M2M transformation tool; (ii) converts the previous PNML file to a GreatSPN-readable file using an M2T transformation tool; (iii) evaluates the GreatSPN-readable file using the GreatSPN tool; and (iv) builds a tool-independent solution from the tool-specific file produced by GreatSPN. To execute the M2M transformations we have selected the Eclipse QVTo transformation engine. QVT is the standard language proposed by the OMG (the same organisation behind the UML and MARTE standards) to define M2M transformations. QVT proposes three possible languages to define model transformations: operational mappings (QVTo, imperative, low-level), core (QVTc, declarative, low-level) and relations (QVTr, declarative, high-level). However, although there are important efforts to provide implementations for all of them, only the one for QVTo is production-ready, and as such it is the chosen one. To execute the M2T transformations we have selected Acceleo. Starting from Acceleo 3, the language used to define an Acceleo transformation is an implementation of the MOFM2T standard, also proposed by the OMG. In this sense, we have selected Acceleo to make our whole toolchain compliant with the OMG standards, from the definition of the initial (profiled) UML models to the third-party analysis tools (which use a proprietary format). The analysis is performed using the GreatSPN tool. GreatSPN is a complete framework for the modelling, analysis and simulation of Petri nets. This tool can handle the classes of Petri nets needed by our simulation framework, i.e., Generalized Stochastic Petri Nets (GSPN) and their coloured version, namely Stochastic Well-formed Nets (SWN). GreatSPN includes a wide range of GSPN/SWN solvers for the computation of performance and reliability metrics (the reader can refer to the "State of the art analysis" deliverable D1.1 for details about the GreatSPN functionalities).
 * 6) Finally, the tool-independent report produced by the Simulator is presented in the DICE-IDE using a graphical component of the Simulator GUI. This component provides a comprehensive Assessment of Performance and Reliability Metrics report in terms of the concepts defined in the initial UML model.
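To give an intuition of what evaluating a stochastic Petri net by simulation involves in step (iii), the following minimal sketch simulates a net whose transitions fire after exponentially distributed delays and estimates a throughput and a utilisation from the run. It is not GreatSPN's algorithm or file format: the function `simulate_spn`, the dictionary-based net encoding and the two-place example net are all assumptions made for illustration, and immediate transitions and coloured tokens are left out.

```python
import random


def simulate_spn(marking, transitions, horizon, seed=0):
    """Simulate a stochastic Petri net with exponential, single-server transitions.

    marking     : dict place -> initial number of tokens (must list every place)
    transitions : dict name -> (rate, inputs, outputs); inputs and outputs
                  map place -> arc weight
    Returns (time-averaged tokens per place, firing count per transition).
    """
    rng = random.Random(seed)
    marking = dict(marking)
    area = {p: 0.0 for p in marking}      # integral of tokens over time
    firings = {t: 0 for t in transitions}
    now = 0.0
    while True:
        enabled = [t for t, (_, ins, _) in transitions.items()
                   if all(marking[p] >= w for p, w in ins.items())]
        rates = [transitions[t][0] for t in enabled]
        # Race policy: the time to the next firing is exponential with
        # the sum of the rates of the enabled transitions.
        dt = rng.expovariate(sum(rates)) if enabled else float("inf")
        remaining = horizon - now
        step = min(dt, remaining)
        for p in area:
            area[p] += marking[p] * step
        if dt >= remaining:
            break
        now += dt
        # The firing transition is chosen with probability proportional to its rate.
        chosen = rng.choices(enabled, weights=rates)[0]
        _, ins, outs = transitions[chosen]
        for p, w in ins.items():
            marking[p] -= w
        for p, w in outs.items():
            marking[p] += w
        firings[chosen] += 1
    return {p: a / horizon for p, a in area.items()}, firings


# Hypothetical net: one token alternates between an idle and a busy place.
net = {
    "arrive": (1.0, {"idle": 1}, {"busy": 1}),
    "serve": (2.0, {"busy": 1}, {"idle": 1}),
}
horizon = 20000.0
avg_tokens, firings = simulate_spn({"idle": 1, "busy": 0}, net, horizon, seed=1)

throughput = firings["serve"] / horizon  # completions per time unit
utilisation = avg_tokens["busy"]         # fraction of time the token is in "busy"
print(throughput, utilisation)
```

For this small net the exact values are known (the cycle time is 1/1.0 + 1/2.0, so the throughput is 2/3 and the utilisation 1/3), which makes it a convenient check that the estimates converge as the horizon grows.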

Open Challenges

The open challenges in the simulation of software applications that use Big Data technologies are shared with the general simulation of software systems. The following list describes three of the main challenges:
 * Obtain accurate model parameters: Some information that users must provide in the UML models is not easy to obtain, such as the execution time of activities or the execution probability of each branch of a conditional operator. For instance, users would need powerful monitors to measure the exact values of model parameters, or process mining techniques acting upon logs of the running application to discover them, or significant expertise to estimate them. Although the Simulation Tool implements what-if analysis to relieve users of knowing the exact value of some parameters, the creation and evaluation of a single exact/accurate model is still an open challenge.
 * Usability in model generation: the profiled UML models that are the input of the simulation tool are created with the Papyrus tool. These profiled models use the DICE profiles, which are in turn based on the standard MARTE profile. For users who are not experts in MARTE, the meaning, purpose and definition of some attributes that DICE inherits from MARTE stereotypes may not be clear from the very beginning. This reliance on standard profiles might therefore slow down the learning process for creating correct inputs for the tool.
 * Simulation in presence of rare events: the presence of rare events hinders the evaluation of systems based on discrete event simulation. For instance, this may happen in conditional branches whose execution probability is very low. In this case, the current implementation of the tool would require much more simulation time to generate results with high confidence. Research advances have been achieved on this challenge, and techniques for simulation in the presence of rare events have been proposed. Equipping the Simulation tool with some of these techniques is at present an open challenge and part of future work.

Application domain: known uses

The Quality Simulation through the implemented Simulation Tool has been applied to the BigBlu application. BigBlu is an e-government software system developed for Tax Fraud Detection which manages large quantities of data from taxpayers. A more detailed description of BigBlu is provided in Fraud Detection.

The next image depicts the UML activity and deployment diagrams, created with the Papyrus tool, that are the inputs for the Simulation Tool. These models are annotated with the DICE and MARTE profiles. In more detail, part (a) shows the workflow of the application. It begins with an initial node, followed by a decision node which divides the execution workflow into multiple paths, depending on the decision condition (e.g., "Fraud indicator creation" in the first branch). Each path has several activities that represent the particular execution steps. These activity nodes are stereotyped as <>, a stereotype from MARTE. <> allows capturing performance properties of the application operation, such as the expected execution time of the task or the probability of it being carried out. Finally, every path converges to a merge node and then the workflow finishes.

Every partition of the activity diagram in part (a) is mapped to an artefact, which is hosted by a device in part (b). Therefore diagram (b) represents the deployment of the application in physical devices, which can also be refined by MARTE stereotypes (e.g., PaLogicalResource) for capturing the hardware details.



The Simulation Tool has been applied to evaluate the expected performance of BigBlu (in terms of response time of execution requests and utilization of resources) with respect to both the intensity with which the application is used (i.e., the arrival rate of requests) and the number of Big Data Processing Nodes deployed (note that Big Data Processing Nodes are the computing nodes that execute the Launch fraud detection activity). The type of results obtained is depicted in the next figure, whose part (a) shows the expected response time of BigBlu when varying the arrival rate of requests and the number of processing nodes, and whose part (b) shows the expected utilization of resources. A more detailed study, input values in models and experimentation can be found in.
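The shape of such a what-if sweep can be sketched with a closed-form M/M/c queue standing in for the Petri net model. This is a deliberate simplification and not what the Simulation Tool computes: the function names, the arrival rates, the service rate and the node counts below are assumed values and are unrelated to the BigBlu experiments.

```python
from math import factorial


def mmc_response_time(arrival_rate, service_rate, servers):
    """Mean response time of an M/M/c queue (Erlang C); inf if unstable."""
    offered = arrival_rate / service_rate  # offered load
    rho = offered / servers                # per-server utilisation
    if rho >= 1.0:
        return float("inf")
    tail = offered ** servers / factorial(servers) / (1.0 - rho)
    head = sum(offered ** k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)          # probability that a request queues
    return p_wait / (servers * service_rate - arrival_rate) + 1.0 / service_rate


def mmc_utilisation(arrival_rate, service_rate, servers):
    """Per-server utilisation, capped at 1 for overloaded configurations."""
    return min(1.0, arrival_rate / (service_rate * servers))


# Assumed sweep: requests per second against number of processing nodes.
service_rate = 1.0
results = {}
for nodes in (1, 2, 4):
    for arrival_rate in (0.5, 1.0, 1.5):
        results[(nodes, arrival_rate)] = (
            mmc_response_time(arrival_rate, service_rate, nodes),
            mmc_utilisation(arrival_rate, service_rate, nodes),
        )

for (nodes, arrival_rate), (resp, util) in sorted(results.items()):
    print(f"nodes={nodes} rate={arrival_rate}: R={resp:.3f} s, U={util:.2f}")
```

Each (nodes, arrival rate) pair yields one point of a response-time surface and one point of a utilisation surface, which is the kind of output the two parts of the figure summarise.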



Conclusion

This chapter has motivated the simulation of software applications that use Big Data technologies for evaluating their quality in terms of performance and reliability. It has also presented the DICE Simulation Tool, a software tool that implements this quality simulation. The DICE Simulation Tool covers all the steps of a simulation workflow: the design of the model to simulate, its transformation into analysable models, the simulation of the analysable model to compute its properties, and the presentation of the quality results to the user, in the domain of the design model, through a user-friendly GUI.