Ciro Castiello
Role
Researcher
Organization
Università degli Studi di Bari Aldo Moro
Department
DIPARTIMENTO DI INFORMATICA (Department of Computer Science)
Scientific Area
AREA 01 - Mathematical and Computer Sciences
Scientific-Disciplinary Sector
INF/01 - Computer Science
ERC Sector, Level 1
Not available
ERC Sector, Level 2
Not available
ERC Sector, Level 3
Not available
DC* (Double Clustering with A*) is an algorithm capable of generating highly interpretable fuzzy information granules from preclassified data. These information granules can be used as building blocks for fuzzy rule-based classifiers that exhibit a good trade-off between interpretability and accuracy. DC* relies on A* for the granulation process, whose efficiency is tightly related to the heuristic function used to estimate the costs of candidate solutions. In this paper we propose a new heuristic function that exploits class information to outperform, in terms of efficiency, the heuristic function originally used in DC*. The experimental results show that the proposed heuristic function allows huge savings in computational effort, thus making DC* a competitive choice for designing interpretable fuzzy rule-based classifiers.
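The abstract ties the efficiency of A* to the heuristic that estimates the remaining cost of a candidate solution. What follows is a minimal, generic A* sketch and not the DC* granulation search. The graph, edge costs and heuristic values are hypothetical. They were chosen only to show that an informed heuristic, one that never overestimates the remaining cost, expands fewer nodes than the uninformed heuristic h = 0 while returning the same optimal path.

```python
import heapq
from collections.abc import Hashable
from typing import Callable, Dict, List, Optional, Tuple

# Adjacency list: node -> list of (neighbour, edge cost). Hypothetical toy structure.
Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def a_star(graph: Graph, start: Hashable, goal: Hashable,
           heuristic: Callable[[Hashable], float]
           ) -> Tuple[Optional[List[Hashable]], float, int]:
    """Return (path, cost, number of expanded nodes).

    Assumes non-negative edge costs and an admissible heuristic."""
    g = {start: 0.0}                      # best known cost from start
    parent = {start: None}                # back-pointers for path reconstruction
    counter = 0                           # tie-breaker so nodes are never compared
    frontier = [(heuristic(start), counter, 0.0, start)]  # (f = g + h, tie, g, node)
    expanded = 0
    while frontier:
        _, _, g_node, node = heapq.heappop(frontier)
        if g_node > g[node]:
            continue                      # stale entry: a cheaper path was found later
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g_node, expanded
        expanded += 1
        for nxt, cost in graph.get(node, []):
            candidate = g_node + cost
            if candidate < g.get(nxt, float("inf")):
                g[nxt] = candidate
                parent[nxt] = node
                counter += 1
                heapq.heappush(frontier, (candidate + heuristic(nxt), counter, candidate, nxt))
    return None, float("inf"), expanded


if __name__ == "__main__":
    graph: Graph = {
        "S": [("A", 1), ("B", 4), ("C", 1)],
        "A": [("B", 2), ("G", 5)],
        "B": [("G", 1)],
        "C": [("D", 1)],  # dead-end branch the informed heuristic avoids
        "D": [],
        "G": [],
    }
    h = {"S": 3, "A": 3, "B": 1, "C": 10, "D": 10, "G": 0}
    print(a_star(graph, "S", "G", lambda n: 0))  # uninformed search
    print(a_star(graph, "S", "G", h.get))        # informed search
```

On this toy graph both calls return the path S, A, B, G with cost 4.0. The uninformed search expands 5 nodes and the informed one expands 3, which is the kind of saving a better heuristic buys at a much larger scale.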
When approaching real-world problems with intelligent systems, interaction with users is often expected. However, data-driven models are usually evaluated only in terms of accuracy, thus without involving users. Several works in the literature define measures for interpretability assessment; however, such measures are mostly based on a structural evaluation. For this reason, we investigated a new methodology for assessing interpretability based on semantic cointension. The objective of this work is to provide empirical evidence of the usefulness of semantic cointension in a medical problem, namely the prediction of prognosis in Immunoglobulin A Nephropathy. In an experimental session we modeled fuzzy rule-based classifiers that are highly interpretable from the structural viewpoint. Results show that the notion of semantic cointension makes it possible to perform a semantics-driven assessment of interpretability that also takes into account the overall fuzzy inference schema.
In this paper we compare two algorithms capable of generating fuzzy partitions from data that satisfy a number of interpretability constraints: Hierarchical Fuzzy Partitioning (HFP) and Double Clustering with A* (DC*). Both algorithms share the distinguishing feature of automatically determining the number of fuzzy sets in each fuzzy partition, thus relieving the user from selecting the best granularity level for each input feature. However, the two algorithms adopt very different approaches to generating fuzzy partitions, which motivates an extensive experimentation to highlight the strengths and weaknesses of both. The experimental results show that, while HFP is on average more efficient, DC* generates fuzzy partitions with a better trade-off between interpretability and accuracy, and generally offers greater stability with respect to its hyper-parameters.
The common practices of machine learning appear to be frustrated by a number of theoretical results denying the possibility of any meaningful implementation of a “superior” learning algorithm. However, there exist some general assumptions that, even when overlooked, preside over the activity of researchers and practitioners. A thorough reflection on such essential premises brings forward the meta-learning approach as the most suitable for escaping the long-standing riddle of induction, while also claiming epistemological soundness. Several examples of meta-learning models can be found in the literature, yet the combination of computational intelligence techniques with meta-learning models remains scarcely explored. Our contribution to this research line is the realisation of Mindful, a meta-learning system based on neuro-fuzzy hybridisation. We first situate the Mindful system within the general context of the meta-learning frameworks proposed in the literature. Then, a complete session of experiments is illustrated, comprising both base-level and meta-level learning activity. The favourable experimental results underline the suitability of the Mindful system for managing past accumulated learning experience when facing novel tasks.
In computing with words (CWW), knowledge is linguistically represented and has an explicit semantics defined through fuzzy information granules. The linguistic representation, in turn, naturally carries an implicit semantics that belongs to the users reading the knowledge base; hence a necessary condition for achieving interpretability is that implicit and explicit semantics be cointensive. Interpretability is a particularly stringent requirement when knowledge must be acquired from data through inductive learning. Therefore, in this paper we propose a methodology for designing interpretable fuzzy models through semantic cointension. We focus our analysis on fuzzy rule-based classifiers (FRBCs), where we observe that rules resemble logical propositions; thus semantic cointension can be partially regarded as the fulfillment of the "logical view", i.e. the set of basic logical laws that are required in any logical system. The proposed approach is grounded on two tools: DCf, which extracts interpretable classification rules from data, and Espresso, which performs fast minimization of Boolean propositions. Our research demonstrates that it is possible to design models that combine good classification accuracy with high interpretability in the sense of semantic cointension. Also, structural parameters that quantify model complexity show that the derived models are simple enough to be read and understood. © 2011 Elsevier Inc. All rights reserved.
The adoption of triangular fuzzy sets to define Strong Fuzzy Partitions (SFPs) is a common practice in the research community: owing to their inherent simplicity, triangular fuzzy sets can be easily derived from data by applying suitable clustering algorithms. However, the choice of triangular fuzzy sets may be limiting for the modeling process. In this paper we focus on SFPs built from cuts (points of separation between cluster projections on data dimensions), showing that an SFP based on cuts can always be defined by trapezoidal fuzzy sets. Different mechanisms to derive SFPs from cuts are presented and compared by employing DC*, an algorithm for extracting fuzzy information granules from classified data.
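One possible way to turn cuts into a strong fuzzy partition made of trapezoidal sets is sketched below. It is a hypothetical mechanism and not necessarily one of those compared in the paper:

- Each cut receives a linear transition zone.
- The zone's half-width is a fraction `ratio` (an assumed parameter) of half the distance to the nearest neighbouring cut or domain bound.
- Between transition zones each set has a flat core, so memberships always sum to 1.

With `ratio=1` and equally spaced cuts, the cores of the interior sets shrink to single points and those sets become triangular, the special case the abstract contrasts with.

```python
from typing import List, Sequence, Tuple

# A trapezoid (a, b, c, d): support [a, d], core [b, c], linear sides.
Trapezoid = Tuple[float, float, float, float]


def sfp_from_cuts(cuts: Sequence[float], lo: float, hi: float,
                  ratio: float = 0.5) -> List[Trapezoid]:
    """Build len(cuts) + 1 trapezoidal sets forming a strong fuzzy partition of [lo, hi].

    Hypothetical mechanism: a linear transition of half-width
    w_i = ratio * (distance to the nearest neighbouring cut or bound) / 2
    is placed around each cut t_i. Requires lo < cuts < hi and 0 < ratio <= 1."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    t = sorted(cuts)
    points = [lo] + t + [hi]
    if any(q <= p for p, q in zip(points, points[1:])):
        raise ValueError("cuts must be distinct and strictly inside (lo, hi)")
    # Adjacent transition zones cannot overlap: w_i + w_{i+1} <= ratio * (t_{i+1} - t_i).
    w = [ratio * min(points[i] - points[i - 1], points[i + 1] - points[i]) / 2
         for i in range(1, len(points) - 1)]
    sets: List[Trapezoid] = []
    for k in range(len(t) + 1):
        # Left side rises across the zone of cut k-1; the first set starts at lo.
        a, b = (lo, lo) if k == 0 else (t[k - 1] - w[k - 1], t[k - 1] + w[k - 1])
        # Right side falls across the zone of cut k; the last set ends at hi.
        c, d = (hi, hi) if k == len(t) else (t[k] - w[k], t[k] + w[k])
        sets.append((a, b, c, d))
    return sets


def membership(x: float, trap: Trapezoid) -> float:
    """Membership degree of x in a trapezoidal fuzzy set."""
    a, b, c, d = trap
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


if __name__ == "__main__":
    partition = sfp_from_cuts([3.0, 6.0], lo=0.0, hi=10.0)
    for x in (1.0, 3.0, 4.5, 6.5):
        print(x, [round(membership(x, s), 3) for s in partition])
```

Each cut is the point where the two neighbouring sets cross at membership 0.5, so the cuts themselves are preserved exactly whatever `ratio` is chosen.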
DC* is a method for generating interpretable fuzzy information granules from pre-classified data. It is based on the successive application of LVQ1 for data compression and of an ad-hoc procedure based on A* that represents data with the minimum number of fuzzy information granules satisfying some interpretability constraints. While efficient in tackling several problems, the A* procedure included in DC* may require a long computation time, because the A* algorithm has exponential time complexity in the worst case. In this paper, we approach the problem of driving the search process of A* by suggesting a close-to-optimal solution that is produced through a Genetic Algorithm (GA). Experimental evaluations show that, by driving the A* algorithm embodied in DC* with a GA solution, the time required to perform data granulation can be reduced by 45% to 99%.
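The first DC* stage uses LVQ1 to compress the data into a small set of labelled prototypes. The sketch below implements the textbook LVQ1 rule: the prototype nearest to a sample moves towards it if their classes match and away from it otherwise. The prototype initialisation, the linearly decreasing learning rate and the epoch count are assumptions for illustration, not the settings used in DC*.

```python
import numpy as np


def lvq1(X, y, prototypes, proto_labels, lr=0.1, epochs=20, seed=0):
    """Textbook LVQ1. The learning-rate schedule is an assumption."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    P = np.array(prototypes, dtype=float)       # copy, updated in place
    labels = np.asarray(proto_labels)
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)     # linearly decreasing step size
        for i in rng.permutation(len(X)):
            j = np.argmin(((P - X[i]) ** 2).sum(axis=1))   # nearest (winner) prototype
            sign = 1.0 if labels[j] == y[i] else -1.0      # attract or repel
            P[j] += sign * alpha * (X[i] - P[j])
    return P


def predict(X, P, proto_labels):
    """Label each sample with the class of its nearest prototype."""
    X = np.asarray(X, dtype=float)
    d = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(proto_labels)[d.argmin(axis=1)]


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y = [0, 0, 0, 1, 1, 1]
    P = lvq1(X, y, [[2, 2], [3, 3]], [0, 1])
    print(P, predict(X, P, [0, 1]))
```

The resulting prototypes, far fewer than the samples, are what a later stage such as the A* granulation would then work on.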
In this article we propose the use of nonnegative matrix factorizations for data analysis in Educational Data Mining. The method relies on decomposing a dataset to extract latent information that is immediately interpretable. In particular, applying nonnegative factorizations to score matrices makes it possible to automatically generate the so-called question matrices (Q-matrices), which describe the skills a student needs in order to answer assessment questionnaires correctly. An example on real-world data illustrates the effectiveness of the method.
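As an illustration of the idea, the sketch below factorizes a nonnegative score matrix V ≈ WH with the standard Lee-Seung multiplicative updates for the Frobenius norm. It then reads W, one row per question and one column per latent skill, as a Q-matrix after thresholding. The matrix orientation (questions by students, scores in [0, 1]), the column-wise normalisation and the 0.5 threshold are assumptions, not the procedure of the article.

```python
import numpy as np


def nmf(V, k, iters=500, seed=0, eps=1e-10):
    """V (n x m, nonnegative) ~ W (n x k) @ H (k x m) via multiplicative updates."""
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF requires a nonnegative matrix")
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps          # random positive initialisation
    H = rng.random((k, m)) + eps
    errors = [np.linalg.norm(V - W @ H)]
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


def q_matrix(W, threshold=0.5):
    """Binary question-by-skill matrix: scale each skill column by its max, then threshold."""
    Wn = W / (W.max(axis=0, keepdims=True) + 1e-12)
    return (Wn >= threshold).astype(int)


if __name__ == "__main__":
    # Synthetic scores: 5 questions x 8 students generated from 2 hidden skills.
    Q_true = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]], dtype=float)
    V = Q_true @ np.random.default_rng(1).random((2, 8))
    W, H, errors = nmf(V, k=2)
    print(q_matrix(W))
```

NMF is unique only up to scaling and permutation of the latent factors, so the skill columns of the recovered Q-matrix may appear in a different order from any reference Q-matrix.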
In this paper we address the problem of intelligently analyzing Twitter data. We propose a novel workflow based on Nonnegative Matrix Factorization (NMF) to collect, organize and analyze Twitter data. The proposed workflow first fetches tweets from Twitter (according to some search criteria) and processes them using text mining techniques; then it extracts latent features from the tweets by using NMF; finally, it clusters the tweets and extracts human-interpretable topics. We report some preliminary experiments demonstrating the effectiveness of the proposed workflow as a tool for Intelligent Data Analysis (IDA): it extracts and visualizes interpretable topics from some newly collected Twitter datasets, and the tweets are automatically grouped according to these topics. Furthermore, we numerically investigate the influence of different initialization mechanisms for NMF algorithms on the factorization results when very sparse Twitter data are considered. The numerical comparisons confirm that NMF algorithms can be used as a clustering method in place of the well-known k-means.
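The cluster-and-topic step of such a workflow can be sketched as follows. The sketch makes several assumptions:

- The document-term matrix holds raw term counts; the abstract does not say which text-mining weighting is used.
- The factors use random initialisation.
- The factorization uses Lee-Seung multiplicative updates.
- `doc_term_matrix` and `nmf_topics` are hypothetical helpers.

Each tweet is assigned to the latent feature with the largest coefficient in its row of W, which is how NMF can stand in for k-means as a clustering step. Each topic is described by its highest-weighted terms in H. Changing `seed` changes the initialisation, whose influence the paper studies.

```python
import re
from collections import Counter

import numpy as np


def doc_term_matrix(texts):
    """Raw term counts (documents x terms) and the sorted vocabulary."""
    tokens = [re.findall(r"[a-z#@]+", t.lower()) for t in texts]
    vocab = sorted({w for doc in tokens for w in doc})
    index = {w: j for j, w in enumerate(vocab)}
    V = np.zeros((len(texts), len(vocab)))
    for i, doc in enumerate(tokens):
        for w, c in Counter(doc).items():
            V[i, index[w]] = c
    return V, vocab


def nmf_topics(texts, k, iters=300, top_n=3, seed=0, eps=1e-10):
    """Cluster texts with NMF (V ~ W H) and describe each latent topic by its top terms."""
    V, vocab = doc_term_matrix(texts)
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps     # document-topic coefficients
    H = rng.random((k, V.shape[1])) + eps     # topic-term weights
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    clusters = W.argmax(axis=1)               # hard assignment, as in k-means
    topics = [[vocab[j] for j in np.argsort(H[t])[::-1][:top_n]] for t in range(k)]
    return clusters, topics


if __name__ == "__main__":
    tweets = ["pizza pasta food", "pasta pizza dinner", "food pizza",
              "football goal match", "match goal referee", "football match"]
    print(nmf_topics(tweets, k=2))
```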
Computing with words (CWW) relies on a linguistic representation of knowledge that is processed by operating at the semantic level defined through fuzzy sets. The linguistic representation of knowledge is a major issue when fuzzy rule-based models are acquired from data by some form of empirical learning. Indeed, these models are often required to exhibit interpretability, which is normally evaluated in terms of structural features, such as rule complexity and properties of fuzzy sets and partitions. In this paper we propose a different approach for evaluating interpretability that is based on the notion of cointension. The interpretability of a fuzzy rule-based model is measured in terms of the cointension degree between the explicit semantics, defined by the formal parameter settings of the model, and the implicit semantics conveyed to the reader by the linguistic representation of knowledge. Implicit semantics calls for a representation of users' knowledge, which is difficult to externalise. Nevertheless, we identify a set of properties, which we call the "logical view", that is expected to hold in the implicit semantics and is used in our approach to evaluate the cointension between explicit and implicit semantics. In practice, a new fuzzy rule base is obtained by minimising the original fuzzy rule base through logical properties. Semantic comparison is made by evaluating the performance of the two rule bases, which are expected to be similar when the two semantics are almost equivalent. If this is the case, we deduce that the logical view is applicable to the model, which can be tagged as interpretable from the cointension viewpoint. These ideas are then used to define a strategy for assessing the interpretability of fuzzy rule-based classifiers (FRBCs). The strategy has been evaluated on a set of pre-existing FRBCs, acquired by different learning processes from a well-known benchmark dataset.
Our analysis highlighted that some of them are not cointensive with users' knowledge; hence their linguistic representation is not appropriate, even though they can be tagged as interpretable from a structural point of view.
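The comparison between an original rule base and its logically minimised version can be illustrated with a toy example. This is an assumption-laden illustration, not the authors' strategy. It uses:

- A hypothetical two-input rule base over a two-set partition, LOW(v) = 1 - v and HIGH(v) = v.
- Its Boolean-minimised version, in which the two class-A rules collapse because "y is LOW or y is HIGH" is a tautology in two-valued logic.
- Max-min inference.
- An arbitrary tolerance `tol` on the accuracy gap.

Under fuzzy semantics that tautology does not hold, since max(1 - y, y) can be as low as 0.5, so rule activations differ. On this toy case, however, the decisions coincide and the model would be tagged cointensive.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def low(v: float) -> float:   # hypothetical fuzzy set LOW on [0, 1]
    return 1.0 - v


def high(v: float) -> float:  # hypothetical fuzzy set HIGH on [0, 1]
    return v


Rule = Tuple[Callable[[float, float], float], str]  # (antecedent degree, class)

ORIGINAL: List[Rule] = [
    (lambda x, y: min(low(x), low(y)), "A"),
    (lambda x, y: min(low(x), high(y)), "A"),
    (lambda x, y: high(x), "B"),
]
# Boolean minimisation: (x LOW and y LOW) or (x LOW and y HIGH) -> x LOW.
SIMPLIFIED: List[Rule] = [
    (lambda x, y: low(x), "A"),
    (lambda x, y: high(x), "B"),
]
CLASSES = ["A", "B"]


def classify(rules: List[Rule], x: float, y: float) -> str:
    """Max-min inference; ties go to the first class in CLASSES."""
    act: Dict[str, float] = {c: 0.0 for c in CLASSES}
    for degree, label in rules:
        act[label] = max(act[label], degree(x, y))
    return max(CLASSES, key=lambda c: act[c])


def cointension_check(samples: Sequence[Tuple[float, float]],
                      labels: Sequence[str], tol: float = 0.05) -> Dict[str, object]:
    """Compare the two rule bases; tol is an assumed acceptance threshold."""
    po = [classify(ORIGINAL, x, y) for x, y in samples]
    ps = [classify(SIMPLIFIED, x, y) for x, y in samples]
    n = len(samples)
    acc_o = sum(p == t for p, t in zip(po, labels)) / n
    acc_s = sum(p == t for p, t in zip(ps, labels)) / n
    agreement = sum(a == b for a, b in zip(po, ps)) / n
    return {"acc_original": acc_o, "acc_simplified": acc_s,
            "agreement": agreement, "cointensive": abs(acc_o - acc_s) <= tol}


if __name__ == "__main__":
    grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
    truth = ["A" if x < 0.5 else "B" for x, _ in grid]  # hypothetical ground truth
    print(cointension_check(grid, truth))
```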
Recommender systems assist users by quickly providing them with relevant resources according to their interests or preferences. The efficacy of a recommender system is strictly connected with the possibility of creating meaningful user profiles, including information about user preferences, interests, goals, usage data and interactive behavior. In particular, the analysis of user preferences is important to predict user behavior and make appropriate recommendations. In this paper, we present a fuzzy framework to represent, learn and update user profiles. The representation of a user profile is based on a structured model of user cognitive states, including a competence profile, a preference profile and an acquaintance profile. The strategy for deriving and updating profiles is to record the sequence of resources accessed by each user and to update the preference profiles accordingly, so as to suggest similar resources at the user's next accesses. The adaptation of the preference profile is performed continuously, but in earlier stages it is more sensitive to updates (plastic phase), while in later stages it is less sensitive (stable phase) to allow resource recommendation. Simulation results are reported to show the effectiveness of the proposed approach.
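The shift from a plastic to a stable phase can be pictured with a decaying learning rate. The sketch below makes these assumptions:

- The preference profile holds fuzzy degrees in [0, 1] over resource features.
- Each accessed resource r moves the profile p by p ← p + η_t (r − p).
- The rate is η_t = max(η_min, η_0 / (1 + t/τ)) after t accesses.
- Recommendations rank candidates by cosine similarity to the profile.

The update formula, the parameters `eta0`, `eta_min` and `tau`, and the class `PreferenceProfile` are all hypothetical, not the framework of the paper.

```python
import math
from typing import Dict, Iterable, List


class PreferenceProfile:
    """Hypothetical fuzzy preference profile with a plastic-to-stable learning rate."""

    def __init__(self, features: Iterable[str], eta0: float = 0.5,
                 eta_min: float = 0.05, tau: float = 10.0):
        self.features = list(features)
        self.pref: Dict[str, float] = {f: 0.0 for f in self.features}
        self.eta0, self.eta_min, self.tau = eta0, eta_min, tau
        self.accesses = 0

    def rate(self) -> float:
        # Large steps early on (plastic phase), small bounded steps later (stable phase).
        return max(self.eta_min, self.eta0 / (1.0 + self.accesses / self.tau))

    def update(self, resource: Dict[str, float]) -> None:
        """Move the profile towards an accessed resource (degrees in [0, 1])."""
        eta = self.rate()
        for f in self.features:
            # Convex combination, so degrees stay in [0, 1] while eta <= 1.
            self.pref[f] += eta * (resource.get(f, 0.0) - self.pref[f])
        self.accesses += 1

    def score(self, resource: Dict[str, float]) -> float:
        """Cosine similarity between the profile and a resource."""
        dot = sum(self.pref[f] * resource.get(f, 0.0) for f in self.features)
        n1 = math.sqrt(sum(v * v for v in self.pref.values()))
        n2 = math.sqrt(sum(resource.get(f, 0.0) ** 2 for f in self.features))
        return dot / (n1 * n2) if n1 and n2 else 0.0

    def recommend(self, candidates: Dict[str, Dict[str, float]], k: int = 1) -> List[str]:
        return sorted(candidates, key=lambda name: self.score(candidates[name]),
                      reverse=True)[:k]


if __name__ == "__main__":
    profile = PreferenceProfile(["math", "art"])
    for _ in range(5):
        profile.update({"math": 0.9, "art": 0.1})
    print(profile.rate(), profile.recommend({"algebra": {"math": 1.0},
                                             "painting": {"art": 1.0}}))
```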
In modern cities everything is connected to the Internet, and the amount of data available online grows dramatically. Humans face two main challenges: i) to extract valuable knowledge from Big Data; ii) to become part of the equation as active actors in the Internet of Things. Fuzzy intelligent systems are currently used in many applications in the context of Smart Cities. Now it is time to address the effective interaction between intelligent systems and citizens, with the aim of moving from Smart to Cognitive Cities. We claim that the use of interpretable fuzzy systems and natural language generation can facilitate such interaction and pave the way towards Cognitive Cities.
Decision support systems in Medicine must be easily comprehensible, both for physicians and patients. In this chapter, the authors describe how the fuzzy modeling methodology called HILK (Highly Interpretable Linguistic Knowledge) can be applied to build highly interpretable fuzzy rule-based classifiers (FRBCs) able to provide medical decision support. As a proof of concept, they describe a real-world case study concerning the development of an interpretable FRBC that can be used to predict the evolution of end-stage renal disease (ESRD) in subjects affected by Immunoglobulin A Nephropathy (IgAN). The designed classifier provides users with a number of rules that are easy to read and understand. The rules classify the prognosis of ESRD evolution in IgAN-affected subjects into three classes (short, medium, long). Experimental results show that the fuzzy classifier achieves satisfactory accuracy, compared with Multi-Layer Perceptron (MLP) neural networks, together with high interpretability of the knowledge base.
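To show how a small, readable rule base can both classify into three classes and explain its answer, here is a minimal FRBC sketch. It uses triangular fuzzy sets, min for AND, and winner-takes-all inference that also returns the text of the deciding rule. The feature names, fuzzy partition and rules are hypothetical and carry no clinical meaning. This is not the HILK methodology or the classifier described in the chapter.

```python
from typing import Callable, Dict, List, Optional, Tuple


def tri(a: float, b: float, c: float) -> Callable[[float], float]:
    """Triangular membership function with support (a, c) and peak at b."""
    def mu(v: float) -> float:
        if v == b:
            return 1.0
        if a < v < b:
            return (v - a) / (b - a)
        if b < v < c:
            return (c - v) / (c - b)
        return 0.0
    return mu


# Hypothetical three-term partition of a feature normalised to [0, 1].
TERMS = {"LOW": tri(-0.5, 0.0, 0.5), "MEDIUM": tri(0.0, 0.5, 1.0), "HIGH": tri(0.5, 1.0, 1.5)}

# Hypothetical rules on two generic features; they are placeholders, not medical knowledge.
Rule = Tuple[Dict[str, str], str]
RULES: List[Rule] = [
    ({"feature_1": "HIGH", "feature_2": "HIGH"}, "short"),
    ({"feature_1": "MEDIUM"}, "medium"),
    ({"feature_1": "LOW"}, "long"),
]


def rule_text(rule: Rule) -> str:
    """Render a rule in readable IF-THEN form."""
    antecedent, label = rule
    conds = " AND ".join(f"{f} is {t}" for f, t in antecedent.items())
    return f"IF {conds} THEN prognosis is {label}"


def classify(sample: Dict[str, float]) -> Tuple[Optional[str], Optional[str], float]:
    """Winner-takes-all: return (class, deciding rule, activation); None if no rule fires."""
    best: Tuple[Optional[str], Optional[str], float] = (None, None, 0.0)
    for rule in RULES:
        antecedent, label = rule
        degree = min(TERMS[t](sample[f]) for f, t in antecedent.items())
        if degree > best[2]:
            best = (label, rule_text(rule), degree)
    return best


if __name__ == "__main__":
    print(classify({"feature_1": 0.9, "feature_2": 0.8}))
    print(classify({"feature_1": 0.1, "feature_2": 0.9}))
```

A rule base that leaves part of the input space uncovered returns no class. This toy one does so for a high `feature_1` combined with a low `feature_2`. Coverage of the input space is therefore worth checking alongside readability.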
In this paper we illustrate the use of Nonnegative Matrix Factorization (NMF) to analyze real data derived from an e-learning context. NMF is a matrix decomposition method that extracts latent information from data in a form that humans can easily interpret. In particular, the NMF of a score matrix can automatically generate the so-called Q-matrix. In an e-learning scenario, the Q-matrix describes the abilities students must acquire to correctly answer evaluation exams. An example on real response data illustrates the effectiveness of this factorization method as a tool for Educational Data Mining (EDM).
Many fuzzy algorithms and models are aimed at extracting knowledge from data, and the acquired knowledge usually must be communicated to users. However, if such knowledge is difficult for users to understand, the acceptance of such methods may be seriously compromised. Interpretability must be the central point of fuzzy system modeling. Nowadays, interpretability is recognized as one of the most valuable properties of fuzzy systems. Thus, interpretability issues constitute a fruitful and up-to-date research line in the fuzzy community.