Corrado Mencar
Role
Researcher
Organization
Università degli Studi di Bari Aldo Moro
Department
Dipartimento di Informatica (Department of Computer Science)
Scientific Area
AREA 01 - Mathematical and Computer Sciences
Scientific Disciplinary Sector
INF/01 - Computer Science
ERC Sector, 1st level
Not available
ERC Sector, 2nd level
Not available
ERC Sector, 3rd level
Not available
DC* (Double Clustering with A*) is an algorithm capable of generating highly interpretable fuzzy information granules from preclassified data. These information granules can be used as building blocks for fuzzy rule-based classifiers that exhibit a good trade-off between interpretability and accuracy. DC* relies on A* for the granulation process, whose efficiency is tightly related to the heuristic function used for estimating the costs of candidate solutions. In this paper we propose a new heuristic function that exploits class information to outperform, in terms of efficiency, the heuristic function originally used in DC*. The experimental results show that the proposed heuristic function allows huge savings in computational effort, thus making DC* a competitive choice for designing interpretable fuzzy rule-based classifiers.
When approaching real-world problems with intelligent systems, interaction with users is often expected. However, data-driven models are usually evaluated only in terms of accuracy, without involving users. Several works in the literature have proposed measures for interpretability assessment; however, such measures are mostly based on a structural evaluation. For this reason, we investigated a new methodology for assessing interpretability based on semantic cointension. The objective of this work is to provide empirical evidence about the usefulness of semantic cointension in facing a medical problem, namely the prediction of prognosis in Immunoglobulin A Nephropathy. An experimental session was conducted in which fuzzy rule-based classifiers, highly interpretable from the structural viewpoint, were modeled. Results show that through the notion of semantic cointension it is possible to perform a semantic-driven assessment of interpretability, which also takes into account the overall fuzzy inference schema.
In this paper we compare two algorithms that are capable of generating fuzzy partitions from data while satisfying a number of interpretability constraints: Hierarchical Fuzzy Partitioning (HFP) and Double Clustering with A* (DC*). Both algorithms exhibit the distinguishing feature of self-determining the number of fuzzy sets in each fuzzy partition, thus relieving the user from selecting the best granularity level for each input feature. However, the two algorithms adopt very different approaches to generating fuzzy partitions, thus motivating an extensive experimentation to highlight the points of strength and weakness of both. The experimental results show that, while HFP is on average more efficient, DC* is capable of generating fuzzy partitions with a better trade-off between interpretability and accuracy, and generally offers greater stability with respect to its hyper-parameters.
In the era of the Internet of Things and Big Data, data scientists are required to extract valuable knowledge from the given data. This challenging task is not straightforward. Data scientists first analyze, cure and pre-process data. Then, they apply Artificial Intelligence (AI) techniques to automatically extract knowledge from data. Nowadays, however, the focus is on knowledge representation and on how to enhance the human-machine interaction. Non-expert users, i.e., users without a strong background in AI, require a new generation of explainable AI systems. Such systems are expected to interact naturally with humans, providing comprehensible explanations of decisions made automatically. In this paper, we sketch how certain computational intelligence techniques, namely interpretable fuzzy systems, are ready to play a key role in the development of explainable AI systems. Interpretable fuzzy systems have already contributed successfully to building explainable AI systems for cognitive cities.
In computing with words (CWW), knowledge is linguistically represented and has an explicit semantics defined through fuzzy information granules. The linguistic representation, in turn, naturally bears an implicit semantics that belongs to the users reading the knowledge base; hence a necessary condition for achieving interpretability is that implicit and explicit semantics are cointensive. Interpretability is particularly stringent when knowledge must be acquired from data through inductive learning. Therefore, in this paper we propose a methodology for designing interpretable fuzzy models through semantic cointension. We focus our analysis on fuzzy rule-based classifiers (FRBCs), where we observe that rules resemble logical propositions, so that semantic cointension can be partially regarded as the fulfillment of the "logical view", i.e. the set of basic logical laws that are required in any logical system. The proposed approach is grounded on two tools: DCf, which extracts interpretable classification rules from data, and Espresso, which performs fast minimization of Boolean propositions. Our research demonstrates that it is possible to design models that exhibit good classification accuracy combined with high interpretability in the sense of semantic cointension. Moreover, structural parameters that quantify model complexity show that the derived models are also simple enough to be read and understood.
The adoption of triangular fuzzy sets to define Strong Fuzzy Partitions (SFPs) is a common practice in the research community: due to their inherent simplicity, triangular fuzzy sets can be easily derived from data by applying suitable clustering algorithms. However, the choice of triangular fuzzy sets may be limiting for the modeling process. In this paper we focus on SFPs built from cuts (points of separation between cluster projections on the data dimensions), showing that an SFP based on cuts can always be defined by trapezoidal fuzzy sets. Different mechanisms for deriving SFPs from cuts are presented and compared by employing DC*, an algorithm for extracting fuzzy information granules from classified data.
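To make the construction concrete, the following is a minimal sketch (not taken from the paper) of how a Strong Fuzzy Partition of an interval can be built from a sorted list of cuts using trapezoidal fuzzy sets; the fuzziness parameter controlling the width of the transition band around each cut is an illustrative assumption.

import numpy as np

def trapezoidal_sfp(cuts, lo, hi, fuzziness=0.2):
    """Build a Strong Fuzzy Partition of [lo, hi] from a sorted list of cuts.

    Around each interior cut, a transition band of half-width
    fuzziness * (smallest adjacent gap) is placed, so that adjacent
    trapezoids overlap and memberships sum to 1 everywhere.
    Returns trapezoids (a, b, c, d): support [a, d], core [b, c].
    """
    pts = [lo] + sorted(cuts) + [hi]
    half = [fuzziness * min(pts[i] - pts[i - 1], pts[i + 1] - pts[i])
            for i in range(1, len(pts) - 1)]
    traps = []
    for i in range(len(pts) - 1):
        a = pts[i] if i == 0 else pts[i] - half[i - 1]
        b = pts[i] if i == 0 else pts[i] + half[i - 1]
        c = pts[i + 1] if i == len(pts) - 2 else pts[i + 1] - half[i]
        d = pts[i + 1] if i == len(pts) - 2 else pts[i + 1] + half[i]
        traps.append((a, b, c, d))
    return traps

def membership(x, trap):
    a, b, c, d = trap
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# two cuts on [0, 10] yield three trapezoidal fuzzy sets forming an SFP
fs = trapezoidal_sfp([3.0, 7.0], 0.0, 10.0)
for x in (0.0, 2.9, 5.0, 7.3, 9.0):
    mu = [membership(x, t) for t in fs]
    print(x, [round(m, 3) for m in mu], "sum =", round(sum(mu), 3))

At any point of the domain the memberships sum to one, which is the defining property of a strong partition; with zero fuzziness the trapezoids degenerate into crisp intervals delimited by the cuts.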
DC* is a method for generating interpretable fuzzy information granules from pre-classified data. It is based on the subsequent application of LVQ1 for data compression and an ad-hoc procedure based on A* to represent data with the minimum number of fuzzy information granules satisfying some interpretability constraints. While efficient in tackling several problems, the A* procedure included in DC* may require a long computation time, since A* has exponential time complexity in the worst case. In this paper, we approach the problem of driving the search process of A* by suggesting a close-to-optimal solution produced through a Genetic Algorithm (GA). Experimental evaluations show that, by driving the A* algorithm embodied in DC* with a GA solution, the time required to perform data granulation can be reduced by 45% to 99%.
Fuzzy relations are simple mathematical structures that enable a very general representation of fuzzy knowledge, and fuzzy relational calculus offers powerful machinery for approximate reasoning. However, one of the most relevant limitations of approximate reasoning is its efficiency bottleneck. In this paper, we present two implementations for fast fuzzy inference through relational composition, with the twofold objective of being general and efficient. The two implementations work on full and sparse representations, respectively. Furthermore, a wrapper procedure automatically selects the best implementation on the basis of the input features. We implemented the code in GNU Octave, a high-level language targeted at numerical computation. Experimental results show an impressive performance gain when the proposed implementation is used.
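The paper's implementations are written in GNU Octave; as an independent illustration of the underlying operation, the following Python/NumPy sketch computes the sup-min (max-min) relational composition on both dense arrays and sparse matrices. The function names and toy relations are assumptions for illustration only, not the authors' code.

import numpy as np
from scipy import sparse

def maxmin_dense(A, R):
    """Sup-min composition B = A o R with B[i, k] = max_j min(A[i, j], R[j, k])."""
    return np.minimum(A[:, :, None], R[None, :, :]).max(axis=1)

def maxmin_sparse(A, R):
    """Same composition for sparse relations: only non-zero entries can contribute,
    since min(x, 0) = 0, so it suffices to visit the intersection of the supports."""
    A, Rc = sparse.csr_matrix(A), sparse.csc_matrix(R)
    B = np.zeros((A.shape[0], Rc.shape[1]))
    for i in range(A.shape[0]):
        a_idx = A.indices[A.indptr[i]:A.indptr[i + 1]]
        a_val = A.data[A.indptr[i]:A.indptr[i + 1]]
        for k in range(Rc.shape[1]):
            r_idx = Rc.indices[Rc.indptr[k]:Rc.indptr[k + 1]]
            r_val = Rc.data[Rc.indptr[k]:Rc.indptr[k + 1]]
            common, ia, ir = np.intersect1d(a_idx, r_idx, return_indices=True)
            if common.size:
                B[i, k] = np.minimum(a_val[ia], r_val[ir]).max()
    return B

A = np.array([[0.2, 0.8, 0.0], [1.0, 0.0, 0.5]])    # two fuzzy inputs over 3 elements
R = np.array([[0.3, 0.9], [0.7, 0.4], [0.0, 1.0]])  # fuzzy relation, 3 x 2
print(maxmin_dense(A, R))
print(maxmin_sparse(A, R))

The dense version trades memory for speed by broadcasting, while the sparse version skips all zero entries, which is where the advantage of a sparse representation comes from when relations are mostly empty.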
In this article we propose the use of non-negative matrix factorizations for data analysis in Educational Data Mining. The method is based on decomposing a dataset in order to extract latent information that can be readily interpreted. In particular, applying non-negative factorizations to a score matrix makes it possible to automatically generate the so-called question matrices (Q-matrices), which describe the skills a student needs in order to adequately answer assessment questionnaires. An example on real-world data illustrates the effectiveness of the method.
Granular computing is a problem solving paradigm based on information granules, which are conceptual entities derived through a granulation process. Solving a complex problem, via a granular computing approach, means splitting the problem into information granules and handling each granule as a whole. This leads to a multi-level view of information granulation, which permeates human reasoning and has a significant impact in any field involving both human-oriented and machine-oriented problem solving. In this chapter we examine a view of granular computing as a paradigm of human-inspired problem solving and information processing with multiple levels of granularity, with special focus on fuzzy information granulation. To support the importance of granulation with multiple levels, we present a multi-level approach for extracting well-defined and semantically sound fuzzy information granules from numerical data.
In this paper we face the problem of intelligently analyzing Twitter data. We propose a novel workflow based on Nonnegative Matrix Factorization (NMF) to collect, organize and analyze Twitter data. The proposed workflow first fetches tweets from Twitter (according to some search criteria) and processes them using text mining techniques; it then extracts latent features from the tweets by using NMF, and finally it clusters the tweets and extracts human-interpretable topics. We report some preliminary experiments demonstrating the effectiveness of the proposed workflow as a tool for Intelligent Data Analysis (IDA): it is able to extract and visualize interpretable topics from newly collected Twitter datasets, which are automatically grouped together according to these topics. Furthermore, we numerically investigate the influence of different initialization mechanisms for NMF algorithms on the factorization results when very sparse Twitter data are considered. The numerical comparisons confirm that NMF algorithms can be used as a clustering method in place of the well-known k-means.
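As a rough illustration of the NMF stage of such a workflow, the following sketch applies TF-IDF and scikit-learn's NMF to a handful of toy documents standing in for fetched tweets; the toy texts and parameters are illustrative assumptions, not the paper's setup.

# Illustrative topic-extraction stage: TF-IDF + NMF + per-topic terms.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
import numpy as np

tweets = [
    "heavy traffic on the ring road this morning",
    "road closed downtown, expect traffic delays",
    "great concert in the city park tonight",
    "free open air concert this weekend in the park",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(tweets)              # documents x terms, non-negative

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)                   # documents x topics (encoding)
H = nmf.components_                        # topics x terms (basis)

terms = np.array(vec.get_feature_names_out())
for k, row in enumerate(H):
    print(f"topic {k}:", ", ".join(terms[row.argsort()[::-1][:4]]))

# each tweet is assigned to the topic with the largest encoding weight
print("cluster labels:", W.argmax(axis=1))

Assigning each document to the topic with the largest encoding weight is what allows NMF to act as a clustering method, as compared with k-means in the paper.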
Computing with words (CWW) relies on a linguistic representation of knowledge that is processed by operating at the semantic level defined through fuzzy sets. Linguistic representation of knowledge is a major issue when fuzzy rule-based models are acquired from data by some form of empirical learning. Indeed, these models are often required to exhibit interpretability, which is normally evaluated in terms of structural features, such as rule complexity and properties of fuzzy sets and partitions. In this paper we propose a different approach for evaluating interpretability that is based on the notion of cointension. The interpretability of a fuzzy rule-based model is measured in terms of the degree of cointension between the explicit semantics, defined by the formal parameter settings of the model, and the implicit semantics conveyed to the reader by the linguistic representation of knowledge. Implicit semantics calls for a representation of the user's knowledge, which is difficult to externalise. Nevertheless, we identify a set of properties - which we call the "logical view" - that is expected to hold in the implicit semantics and is used in our approach to evaluate the cointension between explicit and implicit semantics. In practice, a new fuzzy rule base is obtained by minimising the original fuzzy rule base through logical properties. Semantic comparison is made by evaluating the performances of the two rule bases, which are supposed to be similar when the two semantics are almost equivalent. If this is the case, we deduce that the logical view is applicable to the model, which can be tagged as interpretable from the cointension viewpoint. These ideas are then used to define a strategy for assessing the interpretability of fuzzy rule-based classifiers (FRBCs). The strategy has been evaluated on a set of pre-existing FRBCs, acquired by different learning processes from a well-known benchmark dataset. Our analysis highlighted that some of them are not cointensive with the user's knowledge, hence their linguistic representation is not appropriate, even though they can be tagged as interpretable from a structural point of view.
We propose an approach to integrate the KEEL software tool for knowledge discovery within the KNIME Analytics platform. The integration approach is non-invasive, as it does not require modifying the source code of either tool. As a result of the integration, it is possible to use the algorithms provided with KEEL, including many fuzzy methods, directly in KNIME workflows, thus taking advantage of both tools. We report two simple integration examples, which show the effectiveness of the proposed approach in building data analysis workflows involving KEEL methods, possibly along with methods provided by other knowledge discovery tools such as WEKA.
Recommender systems assist users by quickly providing them with relevant resources according to their interests or preferences. The efficacy of a recommender system is closely tied to the possibility of creating meaningful user profiles, including information about user preferences, interests, goals, usage data and interactive behavior. In particular, the analysis of user preferences is important to predict user behaviors and make appropriate recommendations. In this paper, we present a fuzzy framework to represent, learn and update user profiles. The representation of a user profile is based on a structured model of user cognitive states, including a competence profile, a preference profile and an acquaintance profile. The strategy for deriving and updating profiles is to record the sequence of resources accessed by each user and to update preference profiles accordingly, so as to suggest similar resources at subsequent user accesses. The adaptation of the preference profile is performed continuously, but in earlier stages it is more sensitive to updates (plastic phase), while in later stages it is less sensitive (stable phase) to allow resource recommendation. Simulation results are reported to show the effectiveness of the proposed approach.
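A hedged sketch of the kind of plastic-to-stable adaptation described above follows; the decay schedule, category names and normalization are illustrative assumptions rather than the mechanism actually used in the paper.

import numpy as np

class PreferenceProfile:
    """Toy preference profile over resource categories.

    Each access to a resource nudges the profile towards its category; the
    learning rate decays with the number of accesses, so early updates are
    large (plastic phase) and later ones small (stable phase).
    """
    def __init__(self, categories, eta0=0.5, decay=0.05):
        self.categories = list(categories)
        self.pref = np.full(len(self.categories), 1.0 / len(self.categories))
        self.eta0, self.decay, self.t = eta0, decay, 0

    def record_access(self, category):
        self.t += 1
        eta = self.eta0 / (1.0 + self.decay * self.t)   # plastic -> stable
        target = np.zeros_like(self.pref)
        target[self.categories.index(category)] = 1.0
        self.pref = (1 - eta) * self.pref + eta * target
        self.pref /= self.pref.sum()                     # keep it a distribution

    def recommend(self):
        return self.categories[int(self.pref.argmax())]

profile = PreferenceProfile(["sports", "music", "news"])
for cat in ["music", "music", "news", "music"]:
    profile.record_access(cat)
print(profile.pref.round(3), "->", profile.recommend())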
In modern cities, everything is connected to the Internet and the amount of data available on-line grows dramatically. Humans face two main challenges: i) extracting valuable knowledge from Big Data; ii) becoming part of the equation as active actors in the Internet of Things. Fuzzy intelligent systems are currently used in many applications in the context of Smart Cities. Now it is time to address the effective interaction between intelligent systems and citizens, with the aim of passing from Smart to Cognitive Cities. We claim that the use of interpretable fuzzy systems and natural language generation can facilitate such interaction and pave the way towards Cognitive Cities.
High-performance Next-Generation Sequencing (NGS) technologies are widely used in case-control comparison studies for RNA transcripts, such as mRNAs and small non-coding RNAs. The first step in the analysis strategies is mapping NGS reads against a reference database, and a critical issue is choosing how to deal with the multiread problem. In this paper we present a novel approach to represent and quantify read-mapping ambiguities through the use of fuzzy sets and possibility theory. The aim of this work is to obtain a list of candidate differential expression events, ordered by significance, providing a description of the uncertainty of the results due to the multiread issue. A preliminary experiment on a case-control study of human endobronchial biopsies resulted in the identification of 9 genes with possible differential expression, four of them with an uncertain fold change. This result was confirmed by an FDR-adjusted Fisher's test, while the same data processed with DESeq2 did not show significant differences between case and control.
Decision support systems in medicine must be easily comprehensible, both for physicians and patients. In this chapter, the authors describe how the fuzzy modeling methodology called HILK (Highly Interpretable Linguistic Knowledge) can be applied to build highly interpretable fuzzy rule-based classifiers (FRBCs) able to provide medical decision support. As a proof of concept, they describe a real-world case study concerning the development of an interpretable FRBC that can be used to predict the evolution of end-stage renal disease (ESRD) in subjects affected by Immunoglobulin A Nephropathy (IgAN). The designed classifier provides users with a number of rules which are easy to read and understand. The rules classify the prognosis of ESRD evolution in IgAN-affected subjects by distinguishing three classes (short, medium, long). Experimental results show that the fuzzy classifier achieves satisfactory accuracy in comparison with Multi-Layer Perceptron (MLP) neural networks, together with high interpretability of the knowledge base.
We discuss Non-negative Matrix Factorization (NMF) techniques from the point of view of Intelligent Data Analysis (IDA), i.e. the intelligent application of human expertise and computational models for advanced data analysis. As IDA requires human involvement in the analysis process, the understandability of the results coming from computational models is of prominent importance. We therefore review the latest developments of NMF that try to fulfill the understandability requirement in several ways. We also describe a novel method to decompose data into user-defined, hence understandable, parts by means of a mask on the feature matrix, and show the method's effectiveness through some numerical examples.
We face the problem of interpreting parts of a dataset as small selections of features. In particular, we propose a novel masked non-negative matrix factorization algorithm, which is used to explain data as a composition of interpretable parts that are actually hidden in them, and to introduce knowledge into the factorization process. Numerical examples demonstrate the effectiveness of the proposed MNMF algorithm as a useful tool for Intelligent Data Analysis.
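As a rough illustration of the masking idea, the following sketch constrains a classical Lee-Seung multiplicative-update NMF with a user-defined binary mask on the basis (feature) matrix, re-imposing the mask after every update. It is not claimed to reproduce the authors' MNMF algorithm; the toy data and mask are assumptions.

import numpy as np

def masked_nmf(X, M, n_iter=500, eps=1e-9, seed=0):
    """Factor X ~ W H with W constrained to the support given by the mask M.

    X: (m, n) non-negative data matrix (features x samples)
    M: (m, k) binary mask; M[i, j] = 1 where basis vector j may use feature i
    Uses Lee-Seung multiplicative updates and re-applies the mask to W after
    every update, so masked-out entries stay at zero.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], M.shape[1])) * M   # start inside the feasible support
    H = rng.random((M.shape[1], X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        W *= M                                     # enforce the user-defined parts
    return W, H

# toy data: two non-overlapping "parts" over four features
M = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
X = np.array([[2.0, 1.0, 0.1], [2.1, 0.9, 0.2], [0.1, 3.0, 1.0], [0.0, 2.9, 1.1]])
W, H = masked_nmf(X, M)
print(np.round(W, 2))
print(np.round(X - W @ H, 2))   # residual stays small if the chosen parts fit the data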
In this paper we illustrate the use of Nonnegative Matrix Factorization (NMF) to analyze real data derived from an e-learning context. NMF is a matrix decomposition method which extracts latent information from data in such a way that it can be easily interpreted by humans. In particular, the NMF of a score matrix can automatically generate the so-called Q-matrix. In an e-learning scenario, the Q-matrix describes the abilities to be acquired by students to correctly answer evaluation exams. An example on real response data illustrates the effectiveness of this factorization method as a tool for Educational Data Mining (EDM).
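A minimal sketch of this use of NMF follows: a toy students-by-items score matrix is factorized with scikit-learn, and a candidate Q-matrix is obtained by thresholding the item-skill factor. The toy data and the binarization threshold are illustrative assumptions.

import numpy as np
from sklearn.decomposition import NMF

# 6 students x 4 items; items 0-1 exercise one skill, items 2-3 another
scores = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
S = nmf.fit_transform(scores)      # students x skills (mastery levels)
A = nmf.components_                # skills x items

# candidate Q-matrix: which skills each item requires (items x skills)
Q = (A.T > 0.5 * A.max()).astype(int)
print(Q)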
Many fuzzy algorithms and models are aimed at extracting knowledge from data, and the acquired knowledge must usually be communicated to users. However, if such knowledge is difficult for users to understand, the acceptance of these methods may be seriously compromised. Interpretability must therefore be a central concern in fuzzy system modeling. Nowadays, interpretability is recognized as one of the most valuable properties of fuzzy systems, and interpretability issues constitute a fruitful and up-to-date research line in the fuzzy community.
Non-negative matrix factorization is a multivariate analysis method which has proven useful in many areas, such as bioinformatics, molecular pattern discovery, pattern recognition and document clustering. It seeks a reduced representation of a multivariate data matrix as the product of basis and encoding matrices possessing only non-negative elements, in order to learn the so-called part-based representations of data. All algorithms for computing non-negative matrix factorization are iterative; because of their local convergence, particular emphasis must be placed on a proper initialization of NMF. The problem of selecting appropriate starting matrices becomes more complex when data possess special meaning, as in document clustering. In this paper, we propose the adoption of the subtractive clustering algorithm as a scheme to generate initial matrices for non-negative matrix factorization algorithms. Comparisons with other commonly adopted initializations of non-negative matrix factorization algorithms show that the proposed scheme offers a good trade-off between effectiveness and speed. Moreover, we illustrate how the proposed initialization, based on estimated data distances, can suggest a suitable number of basis vectors for NMF when NMF is used to solve clustering problems in which the number of groups is not known a priori. The influence of a proper rank factor on the interpretability and effectiveness of the results is also discussed.
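A possible reading of this initialization scheme is sketched below: subtractive clustering (in Chiu's classical formulation) selects representative data points, which then seed the basis matrix of a custom-initialized scikit-learn NMF run. The radius parameter, the toy data and the way the centres are mapped onto the factors are illustrative assumptions, not the paper's exact procedure.

import numpy as np
from sklearn.decomposition import NMF

def subtractive_clustering(X, n_centers, ra=1.0):
    """Select n_centers rows of X as cluster centres via subtractive clustering.

    Uses Chiu's potentials: alpha = 4 / ra^2 for the initial density estimate
    and beta = 4 / (1.5 * ra)^2 for suppressing potentials around chosen centres.
    """
    alpha, beta = 4.0 / ra ** 2, 4.0 / (1.5 * ra) ** 2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    P = np.exp(-alpha * d2).sum(axis=1)
    centers = []
    for _ in range(n_centers):
        c = int(P.argmax())
        centers.append(c)
        P = P - P[c] * np.exp(-beta * d2[:, c])               # lower potentials near c
    return centers

# toy non-negative data: noisy copies of three prototype rows
rng = np.random.default_rng(0)
protos = rng.uniform(0.5, 2.0, size=(3, 10))
X = np.vstack([p + np.abs(rng.normal(scale=0.05, size=(40, 10))) for p in protos])

k = 3
idx = subtractive_clustering(X, k)
H0 = X[idx] + 1e-6                        # selected centres seed the basis matrix
W0 = np.full((X.shape[0], k), 1.0 / k)    # uniform initial encodings

nmf = NMF(n_components=k, init="custom", max_iter=500)
W = nmf.fit_transform(X, W=W0, H=H0)
print("selected rows:", idx, " reconstruction error:", round(nmf.reconstruction_err_, 3))

Because subtractive clustering stops producing high-potential points once the data are covered, the number of centres it returns before the potentials collapse can also be taken as a hint for the factorization rank.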
The integration of fuzzy sets in ontologies for the Semantic Web can be achieved in different ways. In most cases, fuzzy sets are defined by hand or with some heuristic procedure that does not take into account the distribution of available data. In this paper, we describe a method for introducing a granular view of data within an OWL ontology.