Persone Apulia Research Gate

A Framework for Evaluating Design Methodologies for Big Data Warehouses: Measurement of the Design Process

A Parallel Algorithm to Compute Data Synopsis

Abstract: - Business Intelligence systems are based on traditional OLAP, data mining, and approximate query processing. Generally, these activities allow to extract information and knowledge from large volumes of data and to support decisional makers as concerns strategic choices to be taken in order to improve the business processes of the Information System. Among these, only approximate query processing deals with the issue of reducing response time, as it aims to provide fast query answers affected with a tolerable quantity of error. However, this kind of processing needs to pre-compute a synopsis of the data stored in the Data Warehouse. In this paper, a parallel algorithm for the computation of data synopses is presented.

Academic data warehouse design using a hybrid methodology

Accuracy Estimation in Approximate Query Processing

Abstract: - The methodologies used in approximate query processing are able to provide fast responses to queries that require high computational time in the decision making process. However, the approximate answers are affected with a small quantity of error. For this reason, it is important to provide also an accuracy of the approximate value, that is, a confidence degree of the approximation. In this paper, we present a probabilistic model that can be used in order to provide the accuracy measure in a methodology based on polynomial approximation. This probabilistic model is a Bayesian network able to estimate the relative error of the approximate answers.

Approximate Query Answering System Architecture

Business Intelligence is an activity that aims to extract information and knowledge from a central repository, the so-called data warehouse, in order to improve the business processes of an information system. Typical applications are based on reporting, on-line analytical processing, data mining, and approximate query processing. Business Intelligence platforms are software tools that allow to develop such applications. In general, these applications are composed of complex dashboards that provide a synthetic frame to be used in decision making. According to the criteria proposed by the Gartner Group to evaluate Business Intelligence platforms, one of the most important feature is the information delivery strategy, whereby final users can share the environment and access the same resources in real-time. For this reason, more and more traditional vendors include web services along their software packages. At the present time, Web-based platforms focus mainly on the issues related to the applications' deployment on a server. However, platforms that support approximate query processing require a more complex architecture since they generally need to perform a preliminary data reduction process and, then, they require ad hoc metadata. In this paper, the architecture of such a system is presented along with a proposal of standardization of metadata to be used in approximate query processing.

Benchmark for approximate query answering systems

Benchmark for evaluating approximate query processing on data streams

Big Data Warehouse Automatic Design Methodology

Traditional data warehouse design methodologies are based on two opposite approaches. The one is data oriented and aims to realize the data warehouse mainly through a eengineering process of the well-structured data sources solely, while minimizing the involvement of end users. The other is requirement oriented and aims to realize the data warehouse only on the basis of business goals expressed by end users, with no regard to the information obtainable from data sources. Since these approaches are not able to address the problems that arise when dealing with big data, the necessity to adopt hybrid methodologies, which allow the definition of multidimensional schemas by considering user requirements and reconciling them against non-structured data sources, has emerged. As a counterpart, hybrid methodologies may require a more complex design process. For this reason, the current research is devoted to introducing automatisms in order to reduce the design efforts and to support the designer in the big data warehouse creation. In this chapter, the authors present a methodology based on a hybrid approach that adopts a graph-based multidimensional model. In order to automate the whole design process, the methodology has been implemented using logical programming.

Cost-benefit analysis of data warehouse design methodologies

Evaluation of Business Intelligence Systems

Business Intelligence is an activity based on a set of processes and software tools. Its aim is to support the decisional making phase, by extracting information from synthetical data. As the success of such an activity depends on the effectiveness of several business processes and the correct integration of independent software tools, nowadays standardization is strongly needed, in order to define a methodology to obtain high-quality information, really useful for the improvement of the business processes of an Information System. In this context, our study focuses on a framework that encapsulates current emerging criteria to evaluate every facet of Business Intelligence systems. In our case study, we tested the criteria to evaluate Business Intelligence platforms, by developing a real OLAP application in an Academic Information System.

Evaluation of data warehouse design methodologies in the context of big data

GrHyMM: A Graph-Oriented Hybrid Multidimensional Model

The main methodologies for the data warehouse design are based on two approaches which are opposite and alternative each other. The one, based on the data-driven approach, aims to produce a conceptual schema mainly through a reengineering process of the data sources, while minimizing the involvement of end users. The other is based on the requirement-driven approach and aims to produce a conceptual schema only on the basis of requirements expressed by end users. As each of these approaches has valuable advantages, it is emerged the necessity to adopt a hybrid methodology which combines the best features of the two approaches. We introduce a conceptual model that is based on a graph-oriented representation of the data sources. The core of the proposed hybrid methodology is constituted by an automatic process of reengineering of data sources that produces the conceptual schema using a set of requirement-derived constraints.

Hybrid methodology for data warehouse conceptual design by UML schemas

Context: Data warehouse conceptual design is based on the metaphor of the cube, which can be derived from either requirement-driven or data-driven methodologies. Each methodology has its own advantages. The first allows designers to obtain a conceptual schema very close to the user needs but it may be not supported by the effective data availability. On the contrary, the second ensures a perfect traceability and consistence with the data sources—in fact, it guarantees the presence of data to be used in analytical processing—but does not preserve from missing business user needs. To face this issue, the necessity emerged in the last years to define hybrid methodologies for conceptual design. Objective: The objective of the paper is to use a hybrid methodology based on different multidimensional models in order to gather all advantages of each of them. Method: The proposed methodology integrates the requirement-driven strategy with the data-driven one, in that order, possibly performing alterations of functional dependencies on UML multidimensional schemas reconciled with data sources. Results: As case study, we illustrate how our methodology can be applied to the university environment. Furthermore, we evaluate quantitatively the benefits of this methodology by comparing it with some popular and conventional methodologies. Conclusion: In conclusion, we highlight how the hybrid methodology improves the conceptual schema quality. Finally, we outline our present work devoted to introduce automatic design techniques in the methodology on the basis of the logical programming.

Improving database security in web-based environments

Logic Programming for Data Warehouse Conceptual Schema Validation

Abstract. The current lack of a standard methodology for data warehouse design has led to have many possible lifecycles. In some of them, the validation of the data warehouse conceptual schema is a specific process that precedes the translation of such a schema into a logical one. This activity must ensure that the data warehouse to be implemented effectively allows all the analytical queries to be executed correctly. To accomplish this, the validation process takes the preliminary workload into account, that is, a set of queries defined from user requirements to obtain the typical information the users are interested in. The methodologies that perform such a validation process define some guidelines that must be manually executed by an expert. In this paper, we introduce a logic program to automate this activity, by checking a set of predefined issues with an inferential engine.

Metadata for Approximate Query Answering Systems

In business intelligence systems, data warehouse metadata management and representation are getting more and more attention by vendors and designers. The standard language for the data warehouse metadata representation is the Common Warehouse Metamodel. However, business intelligence systems include also approximate query answering systems, since these software tools provide fast responses for decisionmaking on the basis of approximate query processing. Currently, the standard meta-model does not allow to represent the metadata needed by approximate query answering systems. In this paper, we propose an extension of the standard metamodel, in order to define the metadata to be used in online approximate analytical processing. These metadata have been successfully adopted in ADAP, a web-based approximate query answering system that creates and uses statistical data profiles.

Metrics for Approximate Query Engine Evaluation

The performance evaluation of the transaction processing in Database Management Systems used for Decision Support Systems is the aim of the current TPC-H standard. Decision Support Systems also include Approximate Query Answering Systems. However, the TPC-H does not define a methodology to evaluate these systems, nor does it provide useful metrics. In this paper, we address the problem related to the extension of TPC-H, in order to adjust it for the performance evaluation of the engines used in approximate query processing to provide fast and approximate responses to analytical queries.

Parallel Computing for Data Reduction

Abstract: - Data Warehouses are databases used in Business Intelligence systems as a data source to develop analytical applications. These applications consist of multidimensional analyses of data and allow decisional makers to improve the business processes of the Information System. Since multidimensional analyses require to aggregate data on several attributes, techniques based on approximate query answering have been introduced in order to reduce the response time. These techniques use, as a data source, a synopsis of the data stored in the Data Warehouse. In this paper, a parallel algorithm for the computation of data synopsis is presented.

Probabilistic Model for Accuracy Estimation in Approximate Monodimensional Analyses

Abstract: - Approximate query processing is often based on analytical methodologies able to provide fast responses to queries. As a counterpart, the approximate answers are affected with a small quantity of error. Nowadays, these techniques are being exploited in data warehousing environments, because the queries devoted to extract information involve high-cardinality relations and, therefore, require a high computational time. Approximate answers are profitably used in the decision making process, where the total precision is not needed. Thus, it is important to provide decision makers with accuracy estimates of the approximate answers; that is, a measure of how much reliable the approximate answer is. Here, a probabilistic model is presented for providing such an accuracy measure when the analytical methodology used for decisional analyses is based on polynomial approximation. This probabilistic model is a Bayesian network able to estimate the relative error of the approximate answers.

Research Data Mart in an Academic System

Data warehousing is an activity that is getting more and more attention in several contexts. Also Universities are adopting data warehousing solutions for business intelligence purpose. In these contexts, there are specific aspects to be considered, such as the Didactics and the Research evaluation. Indeed, these are the main factors affecting the importance and the quality level of every University. In this paper, we present the architecture of a Business Intelligence system in an academic organization and we illustrate the design of a data mart devoted to the evaluation of the Research activities.

System Architecture for Approximate Query Processing

Decision making is an activity that addresses the problem of extracting knowledge and information from data stored in data warehouses, in order to improve the business processes of information systems. Usually, decision making is based on On-Line Analytical Processing, data mining, or approximate query processing. In the last case, answers to analytical queries are provided in a fast manner, although affected with a small percentage of error. In the paper, we present the architecture of an approximate query answering system. Then, we illustrate our ADAP (Analytical Data Profile) system, which is based on an engine able to provide fast responses to the main statistical functions by using orthogonal polynomials series to approximate the data distribution of multidimensional relations. Moreover, several experimental results to measure the approximation error are shown and the response-time to analytical queries is reported.

Ruolo

Organizzazione

Dipartimento

Area Scientifica

Settore Scientifico Disciplinare

Settore ERC 1° livello

Settore ERC 2° livello

Settore ERC 3° livello