Claudia D'amato
Role
Fixed-term Researcher - Type B
Organization
Università degli Studi di Bari Aldo Moro
Department
Department of Computer Science (DIPARTIMENTO DI INFORMATICA)
Scientific Area
AREA 01 - Mathematical and Computer Sciences
Scientific Disciplinary Sector
INF/01 - Computer Science
ERC Sector, 1st level
Not available
ERC Sector, 2nd level
Not available
ERC Sector, 3rd level
Not available
Following previous work on inductive methods for ABox reasoning, we propose an alternative method for predicting assertions based on the available evidence and an analogical criterion. Once the neighbors of a test individual have been selected through suitable distance measures, a combination rule derived from Dempster-Shafer theory pools the evidence provided by the various neighboring individuals in order to predict unknown values in a learning problem. We show how to exploit the procedure for determining unknown class- and role-memberships or fillers for datatype properties, which may form the basis for many further ABox inductive reasoning algorithms. The work also presents an empirical evaluation of the method on real ontologies.
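As an illustration of how such a combination rule can pool neighbor evidence, the following Python sketch applies Dempster's rule over a binary membership frame; the distance-based mass assignment and the neighbor data are illustrative assumptions, not the paper's actual formulation.

```python
from itertools import product

# Frame of discernment for class-membership: the test individual either
# belongs to the target concept ('in') or to its complement ('out').
FRAME = frozenset({'in', 'out'})

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions given as
    dicts mapping frozenset focal elements to masses."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

def neighbor_mass(label, distance):
    """Hypothetical mass assignment: the closer the neighbor, the more mass
    committed to its own label; the remainder goes to the frame (ignorance)."""
    confidence = 1.0 / (1.0 + distance)
    return {frozenset({label}): confidence, FRAME: 1.0 - confidence}

# Evidence from three hypothetical neighbors of the test individual.
neighbors = [('in', 0.2), ('in', 0.5), ('out', 0.9)]
pooled = neighbor_mass(*neighbors[0])
for label, dist in neighbors[1:]:
    pooled = combine(pooled, neighbor_mass(label, dist))

print(pooled)  # mass supporting 'in', 'out', and residual ignorance
```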
An automated ontology matching methodology, supported by various machine learning techniques, is presented as implemented in the MoTo system. The methodology is two-tiered. In the first stage, a meta-learner elicits certain mappings from those predicted by single matchers, each induced by a specific base learner. Uncertain mappings are then recovered through a validation process, followed by the aggregation of the individual predictions through linguistic quantifiers. Experiments on benchmark ontologies demonstrate the effectiveness of the methodology.
One of the bottlenecks of the ontology construction process is the amount of work required from the various figures involved: domain experts contribute their knowledge, which has to be formalized by knowledge engineers so that it can be mechanized. Since the gap between these roles tends to make the process slow and burdensome, the problem may be tackled by resorting to machine learning techniques. By adopting algorithms from inductive logic programming, the effort of the domain expert can be reduced: they only have to label individual resources as instances of the target concept. From those labels, axioms can be induced, which can then be confirmed by the knowledge engineer. In this chapter, we survey existing methods in this area and illustrate three different algorithms in more detail. Some basics such as refinement operators, decision trees and information gain are described. Finally, we briefly present implementations of those algorithms.
Efficient resource retrieval is a crucial issue, particularly when semantic resource descriptions, which enable the exploitation of reasoning services during retrieval, are considered. In this context, resources are commonly retrieved by checking whether each available resource description satisfies the given query, an approach that becomes inefficient as the number of available resources grows. We propose a method for improving the retrieval process by constructing a tree index through a new conceptual clustering method for resources expressed as class definitions or as instances of classes in ontology languages. The available resource descriptions are located at the leaf nodes of the index, while inner nodes represent intensional descriptions (generalizations) of their child nodes. Retrieval is performed by following the tree branches whose nodes satisfy the query. Query answering time may thus be improved, as the number of retrieval steps can be O(log n) in the best case.
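A minimal sketch of the retrieval step, assuming a generic index-node structure and an externally supplied satisfiability check (in practice delegated to a DL reasoner); the names and exact node layout are hypothetical, not the system's data structures.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class IndexNode:
    description: object                                   # intensional description (e.g. a DL concept)
    children: List['IndexNode'] = field(default_factory=list)
    resources: List[object] = field(default_factory=list)  # resource descriptions, only at leaves

def retrieve(node: IndexNode, query, satisfies: Callable[[object, object], bool]):
    """Follow only the branches whose intensional description satisfies the
    query; collect the resource descriptions stored at the reached leaves."""
    if not satisfies(node.description, query):
        return []
    if not node.children:                 # leaf node: return its resources
        return list(node.resources)
    results = []
    for child in node.children:
        results.extend(retrieve(child, query, satisfies))
    return results

# 'satisfies' would typically wrap a reasoner call (e.g. a subsumption or
# consistency check); here any boolean predicate over (description, query) works.
```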
This work focuses on the problem of clustering resources contained in knowledge bases represented through the multi-relational standard languages typical of the Semantic Web, ultimately founded on Description Logics. The proposed solution relies on effective and language-independent dissimilarity measures based on a finite number of dimensions corresponding to a committee of discriminating features, which stands for a context and is represented by concept descriptions in Description Logics. The proposed clustering algorithm expresses the possible clusterings in tuples of central elements: in this categorical setting, we resort to the notion of medoid with respect to the given metric. These centers are iteratively adjusted following the rationale of the fuzzy clustering approach, i.e. one where the membership to each cluster is not deterministic but graded, ranging in the unit interval. This better copes with the inherent uncertainty of knowledge bases expressed in Description Logics, which adopt an open-world semantics. An extensive experimentation on a number of ontologies proves the feasibility of our method and its effectiveness in terms of the major clustering validity indices.
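The toy sketch below illustrates the fuzzy-medoid rationale (graded memberships derived from a dissimilarity measure, medoids re-selected to minimise weighted dissimilarity); the update formula and parameter names are standard fuzzy k-medoids choices and assume hashable resource identifiers, not necessarily the exact algorithm proposed here.

```python
import random

def fuzzy_medoids(items, dissim, k, m=2.0, iterations=20, seed=0):
    """Toy fuzzy k-medoids: graded memberships (fuzzifier m) computed from a
    language-independent dissimilarity 'dissim'; medoids are re-picked to
    minimise the membership-weighted dissimilarity within each cluster."""
    rng = random.Random(seed)
    medoids = rng.sample(items, k)
    for _ in range(iterations):
        # membership of each item in each cluster, graded in [0, 1]
        u = {}
        for x in items:
            d = [max(dissim(x, c), 1e-12) for c in medoids]
            u[x] = [1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(k))
                    for i in range(k)]
        # re-select each medoid as the item minimising weighted dissimilarity
        medoids = [min(items, key=lambda c: sum((u[x][i] ** m) * dissim(x, c)
                                                for x in items))
                   for i in range(k)]
    return medoids, u
```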
The paper focuses on the task of approximate classification of semantically annotated individual resources in ontological knowledge bases. The method is based on classification models built through kernel methods, a well-known class of effective statistical learning algorithms. Kernel functions encode a notion of similarity among elements of some input space. The definition of a family of parametric language-independent kernel functions for individuals occurring in an ontology allows the application of these statistical learning methods on Semantic Web knowledge bases. The classification models induced by kernel methods offer an alternative way to classify individuals with respect to the typical exact and approximate deductive reasoning procedures. The proposed statistical setting enables further inductive approaches to a variety of other tasks that can better cope with the inherent incompleteness of the knowledge bases in the Semantic Web and with their potential incoherence due to their distributed nature. The effectiveness of the proposed method is empirically proved through experiments on the task of approximate classification with real ontologies collected from standard repositories.
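A simplified illustration of a committee-based kernel over individuals; the matching scores and the normalisation below are assumptions chosen for readability, not the exact family of kernel functions defined in the paper.

```python
def committee_kernel(a, b, committee, membership, p=2):
    """Simplified kernel over individuals: each concept F in the committee
    contributes 1 when a and b agree on their (known) membership w.r.t. F,
    0.5 when at least one membership is unknown, and 0 otherwise.
    'membership(ind, concept)' returns True, False or None (unknown),
    typically obtained via an instance check performed by a DL reasoner."""
    def match(x, y):
        if x is None or y is None:
            return 0.5
        return 1.0 if x == y else 0.0
    s = sum(match(membership(a, f), membership(b, f)) for f in committee)
    return (s / len(committee)) ** p
```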
In the context of the Semantic Web, one of the most important issues related to the class-membership prediction task (through inductive models) on ontological knowledge bases concerns the imbalance of the training example distribution, mostly due to the heterogeneous nature and the incompleteness of the knowledge bases. An ensemble learning approach has been proposed to cope with this problem. However, the majority voting procedure exploited for deciding the membership does not explicitly consider the uncertainty and the conflict among the classifiers of an ensemble model. Moving from this observation, we propose to integrate Dempster-Shafer (DS) theory with ensemble learning. Specifically, we propose an algorithm for learning Evidential Terminological Random Forest models, an extension of Terminological Random Forests with DS theory. An empirical evaluation showed that: i) the resulting models perform better on datasets with many positive and negative examples and behave less conservatively than the voting-based forests; ii) the new extension decreases the variance of the results.
Nowadays, building ontologies is a time-consuming task, since they are mostly built manually. This hampers the full realization of the Semantic Web vision. In order to overcome this issue, machine learning techniques, and specifically inductive learning methods, could be fruitfully exploited to learn models from existing Web data. In this paper we survey methods for (semi-)automatically building and enriching ontologies from existing sources of information such as Linked Data, tagged data, social networks and other ontologies. In this way, a large number of ontologies could quickly become available, possibly requiring only refinement by knowledge engineers. Furthermore, inductive incremental learning techniques could be adopted to perform reasoning at large scale, where the deductive approach has shown its limitations. Indeed, incremental methods allow models to be learned from samples of data and then refined/enriched when new (samples of) data become available. While on the one hand this means abandoning sound and complete reasoning procedures in favor of uncertain conclusions, on the other hand it could make reasoning over the entire Web feasible. Moreover, the adoption of inductive learning methods could also make it possible to deal with the intrinsic uncertainty characterizing the Web, which, by its nature, may contain incomplete and/or contradictory information.
The increasing availability of structured machine-processable knowledge in the Web of Data calls for machine learning methods to support standard pattern matching and reasoning based services (such as query answering and inference). Statistical regularities can be efficiently exploited to overcome the limitations of the inherently incomplete knowledge bases distributed across the Web. This paper focuses on the problem of predicting missing class-memberships and property values of individual resources in Web ontologies. We propose a transductive inference method for inferring missing properties about individuals: given a class-membership/property value prediction problem, we address the task of identifying relations encoding similarities between individuals and efficiently propagating knowledge across these relations.
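As a rough illustration of transductive propagation, the sketch below spreads membership scores along a row-normalised relation/similarity matrix while clamping the known labels; the matrix construction, parameter names and clamping scheme are assumptions, not the specific method proposed in the paper.

```python
import numpy as np

def propagate(W, y, labeled_mask, alpha=0.9, iterations=50):
    """Toy transductive propagation: scores spread along the row-normalised
    relation/similarity matrix W, while known labels stay clamped.
    y is a numpy array holding +1/-1 for labeled individuals, 0 otherwise."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    P = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    f = y.astype(float).copy()
    for _ in range(iterations):
        f = alpha * P @ f + (1 - alpha) * y
        f[labeled_mask] = y[labeled_mask]      # clamp known memberships
    return f                                   # sign(f) gives the predicted membership
```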
The paper tackles the problem of mining Linked Open Data. The inherent lack of knowledge caused by the open-world assumption made on the semantics of the data model results in an abundance of data with uncertain classification. We present a semi-supervised machine learning approach. Specifically, a self-training strategy is adopted, which iteratively uses labeled instances to predict labels for unlabeled instances as well. The approach is empirically evaluated in an extensive experimentation involving several different algorithms, demonstrating the added value yielded by a semi-supervised approach over standard supervised methods.
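A minimal self-training loop, assuming a scikit-learn-style classifier interface (fit/predict_proba/classes_); the confidence threshold and stopping rule are illustrative choices rather than the settings used in the experiments.

```python
def self_training(model, X_labeled, y_labeled, X_unlabeled, threshold=0.9, rounds=5):
    """Toy self-training loop: at each round the model is (re)trained on the
    current labeled set, then the most confidently predicted unlabeled
    instances are moved into the labeled set with their predicted labels."""
    X_l, y_l = list(X_labeled), list(y_labeled)
    pool = list(X_unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model.fit(X_l, y_l)
        proba = model.predict_proba(pool)            # scikit-learn-style API assumed
        confident, remaining = [], []
        for x, p in zip(pool, proba):
            if p.max() >= threshold:
                confident.append((x, model.classes_[p.argmax()]))
            else:
                remaining.append(x)
        if not confident:                            # no confident predictions left
            break
        X_l.extend(x for x, _ in confident)
        y_l.extend(y for _, y in confident)
        pool = remaining
    model.fit(X_l, y_l)
    return model
```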
In the Semantic Web vision of the World Wide Web, content will not only be accessible to humans but will also be available in machine-interpretable form as ontological knowledge bases. Ontological knowledge bases enable formal querying and reasoning and, consequently, a main research focus has been the investigation of how deductive reasoning can be utilized in ontological representations to enable more advanced applications. However, purely logical methods have not yet proven to be very effective for several reasons: first, there still is the unsolved problem of scalability of reasoning to Web scale. Second, logical reasoning has problems with uncertain information, which is abundant in Semantic Web data due to its distributed and heterogeneous nature. Third, the construction of ontological knowledge bases suitable for advanced reasoning techniques is complex, which ultimately results in a lack of such expressive real-world data sets with large amounts of instance data. From another perspective, the more expressive structured representations open up new opportunities for data mining, knowledge extraction and machine learning techniques. Moving towards the idea that part of the knowledge already lies in the data, inductive methods appear promising, in particular since they can inherently handle noisy, inconsistent, uncertain and missing data. While there has been broad coverage of inducing concept structures from less structured sources (text, Web pages), as in ontology learning, given the problems mentioned above we focus on new methods for dealing with Semantic Web knowledge bases, relying on statistical inference over their standard representations. We argue that machine learning research has a wide variety of methods to offer that are applicable to different expressivity levels of Semantic Web knowledge bases: ranging from weakly expressive but widely available knowledge bases in RDF to highly expressive first-order knowledge bases, this paper surveys statistical approaches to mining the Semantic Web. We specifically cover similarity and distance-based methods, kernel machines, multivariate prediction models, relational graphical models and first-order probabilistic learning approaches and discuss their applicability to Semantic Web representations. Finally, we present selected experiments which were conducted on Semantic Web mining tasks for some of the algorithms presented before. This is intended to show the breadth and general potential of this exciting new research and application area for data mining.
In the context of semantic knowledge bases, among the possible problems that may be tackled by means of data-driven inductive strategies, one can consider those that require the prediction of unknown values of existing numeric features or the definition of new features to be derived from the data model. These problems can be cast as regression problems, so that suitable solutions can be devised based on those found for multi-relational databases. In this paper, a new framework for the induction of logical regression trees is presented. Differently from classic logical regression trees and from the recent offshoot represented by terminological classification trees, the novel terminological regression trees aim at predicting continuous values, with tests at the tree nodes expressed as Description Logic concepts. They are intended for multiple uses with knowledge bases expressed in the standard ontology languages for the Semantic Web. A top-down method for growing such trees is proposed, as well as algorithms for making predictions with the trees and deriving rules. The system that implements these methods is experimentally evaluated on ontologies selected from popular repositories.
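As a sketch of the split-selection step in a top-down induction of such trees, the snippet below scores candidate Description Logic concepts by variance reduction on the numeric target; the purity measure and the reasoner-backed 'covers' check are assumptions about one plausible design, not the system's actual heuristics.

```python
from statistics import pvariance

def best_split(examples, candidates, covers):
    """Pick the candidate concept that best separates the examples with
    respect to the numeric target, using variance reduction as purity.
    'examples' is a list of (individual, value) pairs; 'covers(concept, ind)'
    stands in for the instance check of a DL reasoner."""
    values = [v for _, v in examples]
    best, best_gain = None, 0.0
    for concept in candidates:
        left = [v for ind, v in examples if covers(concept, ind)]
        right = [v for ind, v in examples if not covers(concept, ind)]
        if not left or not right:          # concept does not split the node
            continue
        weighted = (len(left) * pvariance(left)
                    + len(right) * pvariance(right)) / len(values)
        gain = pvariance(values) - weighted
        if gain > best_gain:
            best, best_gain = concept, gain
    return best   # None if no candidate improves over the parent node
```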
Semantic Web search is currently one of the hottest research topics in both Web search and the Semantic Web. In previous work, we have presented a novel approach to Semantic Web search, which allows for evaluating ontology-based complex queries that involve reasoning over the Web relative to an underlying background ontology. We have developed the formal model behind this approach, and provided a technique for processing Semantic Web search queries, which consists of an offline ontological inference step and an online reduction to standard Web search. In this paper, we continue this line of research. We further enhance the above approach by the use of inductive rather than deductive reasoning in the offline inference step. This increases the robustness of Semantic Web search, as it adds the important ability to handle inconsistencies, noise, and incompleteness, which are all very likely to occur in distributed and heterogeneous environments such as the Web. The inductive variant also allows inferring new (not logically deducible) knowledge from training individuals. We report on a prototype implementation of both the deductive and the inductive variant of our approach in desktop search, and we provide extensive new experimental results, especially on the running time, precision, and recall of our new approach.
In the context of semantic knowledge bases, we tackle the problem of ranking resources with respect to some criterion. The proposed solution is a method for learning functions that can approximately predict the correct ranking. Differently from other related methods, which assume the ranking criteria to be explicitly expressed (e.g. as a query or a function), our approach is data-driven, producing a predictor that detects the implicit underlying criteria from assertions regarding the resources in the knowledge base. The use of specific kernel functions encoding the similarity between individuals in the context of knowledge bases allows the application of the method to ontologies in the standard representations for the Semantic Web. The method is based on a kernelized version of the Perceptron Ranking algorithm, which is suitable for batch as well as online problem settings. Moreover, differently from other approaches based on regression, the method takes advantage of the underlying ordering on the ranking labels. The reported empirical evaluation proves the effectiveness of the method at the task of predicting the rankings of single users in the Linked User Feedback dataset, by integrating knowledge from the Linked Open Data cloud during the learning process.
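A minimal sketch of a kernelised Perceptron Ranking learner with ordered thresholds; the class and method names are illustrative and the update follows the classic online PRank rule, which may differ in detail from the formulation used in the paper.

```python
class KernelPRank:
    """Toy kernelised PRank: thresholds split the score line into ordered
    rank intervals; on a mistake both the (implicit) weight vector, kept as
    support coefficients, and the thresholds are adjusted."""

    def __init__(self, kernel, n_ranks):
        self.kernel = kernel
        self.b = [0.0] * (n_ranks - 1)      # ordered thresholds
        self.support = []                   # (example, coefficient) pairs

    def score(self, x):
        return sum(c * self.kernel(s, x) for s, c in self.support)

    def predict(self, x):
        s = self.score(x)
        for r, b in enumerate(self.b, start=1):
            if s < b:
                return r
        return len(self.b) + 1

    def update(self, x, y):
        """Single online update for an example x with true rank y (1-based)."""
        s = self.score(x)
        taus = []
        for r, b in enumerate(self.b, start=1):
            y_r = 1 if y > r else -1
            taus.append(y_r if y_r * (s - b) <= 0 else 0)
        if any(taus):
            self.support.append((x, float(sum(taus))))
            self.b = [b - t for b, t in zip(self.b, taus)]
```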
We investigate the modeling of uncertain concepts via rough description logics (RDLs), which extend traditional description logics (DLs) with a mechanism to handle approximate concept definitions via lower and upper approximations of concepts based on a rough-set semantics. This allows applying RDLs to the modeling of uncertain knowledge. Since these approximations are ultimately grounded on an indiscernibility relation, we explore possible logical and numerical ways of defining such relations based on the considered knowledge. In particular, we introduce the notion of context, allowing for the definition of specific equivalence relations, which are directly used for the lower and upper approximations of concepts. The notion of context also allows for defining similarity measures, which are used for introducing a notion of tolerance into the indiscernibility. Finally, we describe several learning problems in our RDL framework.
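In rough-set terms, the lower and upper approximations of a concept C induced by an indiscernibility (equivalence) relation R take the classical form below; this is a reminder of the standard definitions rather than the specific RDL constructors introduced in the paper.

```latex
\underline{C} = \{\, x \mid [x]_R \subseteq C \,\}, \qquad
\overline{C} = \{\, x \mid [x]_R \cap C \neq \emptyset \,\}
```

The lower approximation collects the individuals whose entire indiscernibility class falls within C, while the upper approximation collects those indiscernible from at least one member of C.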
Knowledge Graphs (KGs) are a widely used formalism for representing knowledge in the Web of Data. We focus on the problem of link prediction, i.e. predicting missing links in large knowledge graphs, so as to discover new facts about the world. Representation learning models that embed entities and relation types in continuous vector spaces have recently been used to achieve new state-of-the-art link prediction results. A limiting factor of these models is that the process of learning the optimal embedding vectors can be very time-consuming, and might even require days of computation for large KGs. In this work, we propose a principled method for significantly reducing the learning time while converging to more accurate link prediction models. Furthermore, we employ the proposed method for training and evaluating a set of novel and scalable models. Our extensive evaluations show significant improvements over state-of-the-art link prediction methods on several datasets.
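For concreteness, the snippet below shows a generic translation-based scoring function and margin ranking loss of the kind used by many embedding models for link prediction; it is an illustration of the model family only, not the specific scalable models or the training-time reduction technique proposed in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_embeddings(n_entities, n_relations, dim=50):
    """Random entity/relation embeddings, to be refined during training."""
    E = rng.normal(size=(n_entities, dim))
    R = rng.normal(size=(n_relations, dim))
    return E, R

def score(E, R, h, r, t):
    """Translation-based plausibility of the triple (h, r, t):
    the smaller ||E[h] + R[r] - E[t]||, the more plausible the link."""
    return -np.linalg.norm(E[h] + R[r] - E[t])

def margin_loss(E, R, pos, neg, margin=1.0):
    """Margin-based ranking loss over a positive triple and a corrupted one."""
    return max(0.0, margin - score(E, R, *pos) + score(E, R, *neg))
```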
Considering the increasing availability of structured machine-processable knowledge in the context of the Semantic Web, relying only on purely deductive inference may be limiting. This work proposes a new method for similarity-based class-membership prediction in Description Logic knowledge bases. The underlying idea is to propagate class-membership information among similar individuals; the method is non-parametric in nature and characterised by interesting complexity properties, making it a potential candidate for large-scale transductive inference. We also evaluate its effectiveness with respect to other approaches based on inductive inference in the Semantic Web literature.