Giovanni Semeraro
Role
Full Professor (Professore Ordinario)
Organization
Università degli Studi di Bari Aldo Moro
Department
Dipartimento di Informatica
Scientific Area
AREA 01 - Scienze matematiche e informatiche (Mathematical and Computer Sciences)
Scientific Disciplinary Sector
INF/01 - Informatica (Computer Science)
ERC Sector, Level 1
Not available
ERC Sector, Level 2
Not available
ERC Sector, Level 3
Not available
The exponential growth of available online information provides computer scientists with many new challenges and opportunities. A recent trend is to analyze people's feelings, opinions and orientation about facts and brands: this is done by exploiting Sentiment Analysis techniques, whose goal is to classify the polarity of a piece of text according to the opinion of the writer. In this paper we propose a lexicon-based approach for sentiment classification of Twitter posts. Our approach is based on the exploitation of widespread lexical resources such as SentiWordNet, WordNet-Affect, MPQA and SenticNet. In the experimental session the effectiveness of the approach was evaluated against two state-of-the-art datasets. Preliminary results provide interesting outcomes and pave the way for future research in the area.
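As an illustration of the lexicon-based idea, here is a minimal sketch in Python. The lexicon entries, scores and thresholds below are toy stand-ins, not actual values from SentiWordNet, WordNet-Affect, MPQA or SenticNet, and the negation handling is deliberately simplistic:

```python
# Minimal lexicon-based polarity classifier (illustrative sketch).
# The lexicon below is a toy stand-in for resources such as SentiWordNet or MPQA.
import re

TOY_LEXICON = {          # word -> prior polarity in [-1, 1] (illustrative values)
    "love": 0.8, "great": 0.7, "good": 0.5,
    "bad": -0.6, "hate": -0.8, "terrible": -0.9,
}
NEGATIONS = {"not", "no", "never"}

def classify_polarity(tweet: str) -> str:
    tokens = re.findall(r"[a-z']+", tweet.lower())
    score, negate = 0.0, False
    for tok in tokens:
        if tok in NEGATIONS:          # simple negation handling: flip the next lexicon hit
            negate = True
            continue
        if tok in TOY_LEXICON:
            score += -TOY_LEXICON[tok] if negate else TOY_LEXICON[tok]
            negate = False
    if score > 0.1:
        return "positive"
    if score < -0.1:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    print(classify_polarity("I love this phone, the camera is great"))  # positive
    print(classify_polarity("not a good movie, terrible plot"))         # negative
```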
Museums have recognized the need to support visitors in enjoying a personalized experience when visiting artwork collections and have started to adopt recommender systems as a way to meet this requirement. Content-based recommender systems analyze features of artworks previously rated by a visitor and build a visitor model or profile, in which preferences and interests are stored, based on those features. For example, the profile of a visitor might store the names of his or her favorite painters or painting techniques, extracted from short textual descriptions associated with artworks. The user profile is then matched against the attributes of new items in order to provide personalized suggestions. The Web 2.0 (r)evolution has changed the game for personalization from the ‘elitist’ Web 1.0, written by few and read by many, to web content generated by everyone (user-generated content - UGC). One of the forms of UGC that has drawn the most attention from the research community is the folksonomy, a taxonomy generated by users who collaboratively annotate and categorize resources of interest with freely chosen keywords called tags. In this work, we investigate the problem of deciding whether folksonomies might be a valuable source of information about user interests in the context of recommending digital artworks. We present FIRSt (Folksonomy-based Item Recommender syStem), a content-based recommender system which integrates UGC through social tagging in a classic content-based model, letting users express their preferences for items by entering a numerical rating as well as by annotating items with free tags. Experiments show that the accuracy of recommendations increases when tags are exploited in the recommendation process to enrich user profiles, provided that tags are not used as a surrogate of the item descriptions, but in conjunction with them. FIRSt has been developed within the CHAT project “Cultural Heritage fruition & e-Learning applications of new Advanced (multimodal) Technologies” and it is the core of a bouquet of web services designed for personalized museum tours.
Wealth Management is a business model operated by banks and brokers that offers a broad range of investment services to individual clients, in order to help them reach their investment objectives. Wealth management services include investment advisory, subscription of mandates, sales of financial products, and collection of investment orders from clients. Due to the complexity of the task, which largely requires a deep knowledge of the financial domain, a recent trend in the area is to exploit recommendation technologies to support financial advisors and to improve the effectiveness of the process. This paper proposes a framework to support financial advisors in the task of providing clients with personalized investment strategies. Our methodology is based on the exploitation of case-based reasoning. A prototype version of the platform has been adopted to generate personalized portfolios, and the performance of the framework shows that the yield obtained by recommended portfolios exceeds that of portfolios proposed by human advisors in most experimental settings.
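A sketch of the retrieve/reuse steps of case-based reasoning applied to portfolio advice may clarify the idea. The client features, the case base and the similarity function are hypothetical; the actual framework is far richer:

```python
# Sketch of the retrieve/reuse steps of case-based reasoning for portfolio advice.
# Client features and portfolio weights are hypothetical toy data.
import math

CASE_BASE = [  # past cases: (client features, adopted portfolio weights)
    ({"age": 35, "risk": 0.7, "horizon": 10}, {"equity": 0.6, "bonds": 0.3, "cash": 0.1}),
    ({"age": 60, "risk": 0.2, "horizon": 5},  {"equity": 0.2, "bonds": 0.6, "cash": 0.2}),
    ({"age": 45, "risk": 0.5, "horizon": 8},  {"equity": 0.4, "bonds": 0.4, "cash": 0.2}),
]

def similarity(a, b):
    """Inverse Euclidean distance over roughly normalized client features."""
    d = math.sqrt(sum(((a[k] - b[k]) / (100 if k == "age" else 1)) ** 2 for k in a))
    return 1.0 / (1.0 + d)

def recommend_portfolio(new_client, k=2):
    # 1. RETRIEVE the k most similar past cases
    ranked = sorted(CASE_BASE, key=lambda case: similarity(new_client, case[0]), reverse=True)[:k]
    # 2. REUSE: similarity-weighted average of the retrieved portfolios
    total = sum(similarity(new_client, feats) for feats, _ in ranked)
    assets = {a for _, port in ranked for a in port}
    return {a: sum(similarity(new_client, feats) * port.get(a, 0.0) for feats, port in ranked) / total
            for a in assets}

print(recommend_portfolio({"age": 40, "risk": 0.6, "horizon": 9}))
```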
This paper proposes two approaches to compositional semantics in distributional semantic spaces. Both approaches conceive the semantics of complex structures, such as phrases or sentences, as being more than the sum of their terms, with syntax acting as the ‘plus’, the glue used to compose words. The former approach encodes information about syntactic dependencies directly into distributional spaces, while the latter exploits compositional operators reflecting the syntactic role of words. We present a preliminary evaluation performed on the GEMS 2011 “Compositional Semantics” dataset, with the aim of understanding the effects of these approaches when applied to simple word pairs of the kind Noun-Noun, Adjective-Noun and Verb-Noun. Experimental results corroborate our conjecture that exploiting syntax can lead to improved distributional models and compositional operators, and suggest new openings for future use in real-application scenarios.
This work presents a virtual player for the quiz game “Who Wants to Be a Millionaire?”. The virtual player demands linguistic and common sense knowledge and adopts state-of-the-art Natural Language Processing and Question Answering technologies to answer the questions. Wikipedia articles and DBpedia triples are used as knowledge sources and the answers are ranked according to several lexical, syntactic and semantic criteria. Preliminary experiments carried out on the Italian version of the board game prove that the virtual player is able to challenge human players.
This paper provides an overview of the work done in the Linked Open Data-enabled Recommender Systems challenge, in which we proposed an ensemble of algorithms based on popularity, Vector Space Model, Random Forests, Logistic Regression, and PageRank, running on a diverse set of semantic features. We ranked 1st in the top-N recommendation task, and 3rd in the tasks of rating prediction and diversity.
This paper describes a new Word Sense Disambiguation (WSD) algorithm which extends two well-known variations of the Lesk WSD method. Given a word and its context, the Lesk algorithm exploits the idea of the maximum number of shared words (maximum overlap) between the context of a word and each definition of its senses (gloss) in order to select the proper meaning. The main contribution of our approach relies on the use of a word similarity function defined on a distributional semantic space to compute the gloss-context overlap. As sense inventory we adopt BabelNet, a large multilingual semantic network built by exploiting both WordNet and Wikipedia. Besides linguistic knowledge, BabelNet also represents encyclopedic concepts coming from Wikipedia. The evaluation performed on the SemEval-2013 Multilingual Word Sense Disambiguation task shows that our algorithm goes beyond the most frequent sense baseline and the simplified version of the Lesk algorithm. Moreover, when compared with the other participants in the SemEval-2013 task, our approach is able to outperform the best system for English.
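A minimal sketch of a "distributional Lesk" step, where the gloss-context overlap is replaced by word-vector similarity. The 3-dimensional vectors and simplified glosses below are toy examples, not BabelNet data or the paper's actual similarity function:

```python
# Sketch of a "distributional Lesk": the gloss-context overlap is computed with
# word-vector similarity instead of exact string matches. Toy vectors and glosses.
import numpy as np

VEC = {  # toy word vectors; in practice these come from a distributional semantic space
    "money":   np.array([0.9, 0.1, 0.0]),
    "deposit": np.array([0.8, 0.2, 0.1]),
    "river":   np.array([0.0, 0.9, 0.3]),
    "water":   np.array([0.1, 0.8, 0.4]),
    "loan":    np.array([0.9, 0.0, 0.1]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_sense(gloss_words, context_words):
    """Soft overlap: each gloss word contributes its best similarity to any context word."""
    known_g = [w for w in gloss_words if w in VEC]
    known_c = [w for w in context_words if w in VEC]
    return sum(max(cos(VEC[g], VEC[c]) for c in known_c) for g in known_g)

SENSES = {  # two toy senses of "bank" with simplified glosses
    "bank#finance": ["money", "deposit", "loan"],
    "bank#river":   ["river", "water"],
}
context = ["deposit", "money", "account"]
best = max(SENSES, key=lambda s: score_sense(SENSES[s], context))
print(best)  # bank#finance
```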
Recommender systems are filters which suggest items or information that might be interesting to users. These systems analyze the past behavior of a user, build her profile that stores information about her interests, and exploit that profile to find potentially interesting items. The main limitation of this approach is that it may provide accurate but likely obvious suggestions, since recommended items are similar to those the user already knows. In this paper we investigate this issue, known as the overspecialization or serendipity problem, by proposing a strategy that fosters the suggestion of surprisingly interesting items the user might not have otherwise discovered. The proposed strategy enriches a graph-based recommendation algorithm with background knowledge that allows the system to deeply understand the items it deals with. The hypothesis is that the infused knowledge could help to discover hidden correlations among items that go beyond simple feature similarity and therefore promote non-obvious suggestions. Two evaluations are performed to validate this hypothesis: an in-vitro experiment on a subset of the hetrec2011-movielens-2k dataset, and a preliminary user study. Those evaluations show that the proposed strategy actually promotes non-obvious suggestions, while keeping the loss in accuracy small.
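To make the graph-based intuition concrete, here is a hedged sketch of a personalized PageRank run over a small graph linking a user, items and background-knowledge nodes (shared properties). The graph, node names and damping factor are illustrative assumptions, not the paper's exact algorithm:

```python
# Sketch of personalized PageRank over a graph that links users, items and
# background-knowledge nodes (e.g. shared properties such as genre or director).
GRAPH = {
    "user:anna":      ["item:matrix", "item:inception"],
    "item:matrix":    ["prop:sci-fi", "user:anna"],
    "item:inception": ["prop:sci-fi", "prop:nolan", "user:anna"],
    "item:memento":   ["prop:nolan"],
    "item:solaris":   ["prop:sci-fi"],
    "prop:sci-fi":    ["item:matrix", "item:inception", "item:solaris"],
    "prop:nolan":     ["item:inception", "item:memento"],
}

def personalized_pagerank(graph, source, alpha=0.85, iters=50):
    nodes = list(graph)
    rank = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - alpha) * (1.0 if n == source else 0.0) for n in nodes}
        for n in nodes:
            out = graph[n]
            for m in out:
                new[m] += alpha * rank[n] / len(out)   # spread rank along outgoing edges
        rank = new
    return rank

scores = personalized_pagerank(GRAPH, "user:anna")
seen = set(GRAPH["user:anna"])
recs = sorted((n for n in scores if n.startswith("item:") and n not in seen),
              key=scores.get, reverse=True)
print(recs)  # items reachable only through background-knowledge nodes get surfaced
```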
Thanks to the continuous growth of collaborative platforms like YouTube, Flickr and Delicious, we have recently been witnessing a rapid evolution of web dynamics towards a more 'social' vision, called Web 2.0. In this context collaborative tagging systems are rapidly emerging as one of the most promising tools. However, as tags are handled in a purely syntactical way, collaborative tagging systems suffer from typical Information Retrieval (IR) problems like polysemy and synonymy: so, in order to reduce the impact of these drawbacks and at the same time to foster the so-called tag convergence, systems that assist the user in the task of tagging are required. In this paper we present a system, called STaR, that implements an IR-based approach for tag recommendation. Our approach, mainly based on the exploitation of a state-of-the-art IR model called BM25, relies on two assumptions: first, if two or more resources share some common patterns (e.g. the same features in the textual description), we can exploit this information by supposing that they could be annotated with similar tags. Furthermore, since each user has a typical manner of labeling resources, a tag recommender might exploit this information by giving more weight to the tags she has already used to annotate similar resources. We also present an experimental evaluation, carried out using a large dataset gathered from Bibsonomy.
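A minimal sketch of the BM25 retrieval step at the heart of this kind of approach: rank already-annotated resources against a new resource's description and propose the tags of the best matches. The corpus, tags and parameter values are toy data, and the user-based evidence used by the actual system is not reproduced:

```python
# Minimal BM25 sketch: rank already-annotated resources against a new resource's
# description, then propose the tags of the best-matching ones. Toy data only.
import math
from collections import Counter

RESOURCES = [  # (textual description, tags assigned by past users)
    ("introduction to machine learning with python", {"ml", "python", "tutorial"}),
    ("deep learning for image recognition",          {"deep-learning", "vision"}),
    ("python tricks for data analysis",              {"python", "data"}),
]
K1, B = 1.2, 0.75

docs = [d.split() for d, _ in RESOURCES]
avgdl = sum(len(d) for d in docs) / len(docs)
df = Counter(t for d in docs for t in set(d))
N = len(docs)

def bm25(query_terms, doc):
    tf = Counter(doc)
    score = 0.0
    for t in query_terms:
        if t not in tf:
            continue
        idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
        score += idf * tf[t] * (K1 + 1) / (tf[t] + K1 * (1 - B + B * len(doc) / avgdl))
    return score

def recommend_tags(new_description, top_n=2):
    q = new_description.split()
    ranked = sorted(range(N), key=lambda i: bm25(q, docs[i]), reverse=True)[:top_n]
    tags = Counter()
    for i in ranked:
        tags.update(RESOURCES[i][1])
    return [t for t, _ in tags.most_common()]

print(recommend_tags("machine learning in python"))
```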
This paper proposes an approach to the construction of WordSpaces which takes into account temporal information. The proposed method is able to build a geometrical space considering several periods of time, enabling the analysis of the evolution of the meaning of a word over time. Exploiting this approach, we build a framework, called Temporal Random Indexing (TRI), that provides all the necessary tools for building WordSpaces and performing such linguistic analysis. We provide some examples of the use of our tool by analysing word meanings in two corpora: a collection of Italian books and English scientific papers about computational linguistics.
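A sketch of Random Indexing built separately per time period, with shared index vectors so that a word's vectors from different periods can be compared. The corpora, dimensionality and window size are toy values, not the TRI tool itself:

```python
# Sketch of Random Indexing built per time period, then used to compare a word's
# vectors across periods (toy corpora, tiny dimensionality).
import numpy as np

rng = np.random.default_rng(0)
DIM, NONZERO = 50, 4

def index_vector():
    """Sparse ternary random vector, as in Random Indexing."""
    v = np.zeros(DIM)
    pos = rng.choice(DIM, NONZERO, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], NONZERO)
    return v

def build_space(sentences, index_vectors, window=2):
    space = {}
    for sent in sentences:
        toks = sent.split()
        for i, w in enumerate(toks):
            ctx = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            vec = space.setdefault(w, np.zeros(DIM))
            for c in ctx:
                vec += index_vectors.setdefault(c, index_vector())
    return space

corpus_t1 = ["the apple fell from the tree", "he ate a ripe apple"]
corpus_t2 = ["the apple released a new phone", "apple announced a software update"]

shared_index = {}                      # index vectors are shared across periods
space_t1 = build_space(corpus_t1, shared_index)
space_t2 = build_space(corpus_t2, shared_index)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

print("self-similarity of 'apple' across periods:", cos(space_t1["apple"], space_t2["apple"]))
```

A low self-similarity across periods is the kind of signal that points to a shift in the word's meaning.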
Nowadays, the proliferation of geographic information systems has generated great interest in their integration. However, an integration process is not as simple as joining several systems, since any effort at information sharing runs into the problem of semantic heterogeneity, which requires the identification and representation of all semantics useful in performing schema integration. On several research lines, including research on geographic information system integration, ontologies have been introduced to facilitate knowledge sharing among various agents. In particular, one of the aspects of ontology sharing is performing some sort of mapping between ontology constructs. Further, some research suggests that we should also be able to combine ontologies, where the product of this combination will be, at the very least, the intersection of the two given ontologies. However, few approaches build the integration upon standard and normalized information, which might improve the accuracy of mappings and therefore the commitment and understandability of the integration. In this work, we propose a novel system (called GeoMergeP) to integrate geographic sources by formalizing their information as normalized ontologies. Our integral merging process, including structural, syntactic and semantic aspects, assists users in finding the most suitable correspondences. The system has been empirically tested in the context of projects of the Italian Institute for Environmental Protection and Research (ISPRA, ex APAT), providing a consistent and complete integration of their sources.
Textual similarity is a crucial aspect for many extractive text summarization methods. A bag-of-words representation does not capture the semantic relationships between concepts when comparing strongly related sentences with no words in common. To overcome this issue, in this paper we propose a centroid-based method for text summarization that exploits the compositional capabilities of word embeddings. The evaluations on multi-document and multilingual datasets prove the effectiveness of the continuous vector representation of words compared to the bag-of-words model. Despite its simplicity, our method achieves good performance even in comparison to more complex deep learning models. Our method is unsupervised and can be adopted in other summarization tasks.
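A compact sketch of the centroid-plus-embeddings idea: build a centroid from the document's content words, embed each sentence as the sum of its word vectors, and keep the sentences closest to the centroid. The 2-dimensional vectors are toy values, not real word embeddings, and the tf-idf selection of centroid words is omitted:

```python
# Sketch of centroid-based summarization with word embeddings (toy vectors only).
import numpy as np

VEC = {
    "cats":  np.array([0.9, 0.1]),  "dogs":   np.array([0.8, 0.2]),
    "pets":  np.array([0.85, 0.15]), "stock":  np.array([0.1, 0.9]),
    "market": np.array([0.05, 0.95]),
}

def embed(words):
    vs = [VEC[w] for w in words if w in VEC]
    return np.sum(vs, axis=0) if vs else np.zeros(2)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def summarize(sentences, n=1):
    centroid = embed([w for s in sentences for w in s.split()])
    ranked = sorted(sentences, key=lambda s: cos(embed(s.split()), centroid), reverse=True)
    return ranked[:n]

doc = ["cats and dogs are popular pets",
       "many pets like dogs need daily walks",
       "the stock market closed higher today"]
print(summarize(doc))   # the off-topic sentence is ranked last
```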
The chapter presents the SWAPTeam participation in the ECML/PKDD 2011 Discovery Challenge for the task on the cold start problem, focused on making recommendations for new video lectures. The developed solution uses a content-based approach because it is less sensitive to the cold start problem, which is commonly associated with pure collaborative filtering recommenders. The Challenge organizers encouraged solutions that can actually affect VideoLectures.net, thus the proposed integration strategy is hybridization by switching. In addition, the underlying idea of the proposed solution is that providing recommendations about cold items remains a risky task, thus curtailing the computational resources devoted to it is a reasonable strategy to control the performance trade-offs of a day-to-day running system. The main contribution concerns the compromise between the recommendation accuracy and the scalability of the proposed approach.
The paper presents our participation [5] in the ECML/PKDD 2011 Discovery Challenge for the task on the cold start problem. The challenge dataset was gathered from the VideoLectures.Net web site, which exploits a Recommender System (RS) to guide users during the access to its large multimedia repository of video lectures. Cold start concerns performance issues when new items and new users have to be handled by an RS, and it is commonly associated with pure collaborative filtering-based RSs. The proposed approach exploits the challenge data to predict the frequencies of pairs of cold items and old items, and then the highest values are used to provide recommendations.
Distributional semantics approaches have proven their ability to enhance the performance of overlap-based Word Sense Disambiguation algorithms. This paper shows the application of such a technique for the Italian language, by analysing the usage of two different Distributional Semantic Models built upon ItWaC and Wikipedia corpora, in conjunction with two different functions for leveraging the sense distributions. Results of the experimental evaluation show that the proposed method outperforms both the most frequent sense baseline and other state-of-the-art systems.
The effectiveness of content-based recommendation strategies tremendously depends on the representation formalism adopted to model both items and user profiles. As a consequence, techniques for semantic content representation emerged thanks to their ability to filter out the noise and to cope with the issues typical of keyword-based representations. This article presents Contextual eVSM (C-eVSM), a content-based context-aware recommendation framework that adopts a novel semantic representation based on distributional models and entity linking techniques. Our strategy is based on two insights: first, entity linking can identify the most relevant concepts mentioned in the text and can easily map them to structured information sources, thus triggering inference and reasoning on user preferences; second, distributional models can provide a lightweight semantic representation based on term co-occurrences that can bring out latent relationships between concepts by just analyzing their usage patterns in large corpora of data. The resulting framework is fully domain-independent and shows better performance than state-of-the-art algorithms in several experimental settings, confirming the validity of content-based approaches and paving the way for several future research directions.
In this paper we deal with the problem of providing users with cross-language recommendations by comparing two different content-based techniques: the first one relies on a knowledge-based word sense disambiguation algorithm that uses MultiWordNet as sense inventory, while the latter is based on the so-called distributional hypothesis and exploits a dimensionality reduction technique called Random Indexing in order to build language-independent user profiles.
The rapid growth of the so-called Web 2.0 has changed surfers' behavior. A new democratic vision emerged, in which users can actively contribute to the evolution of the Web by producing new content or enriching the existing one with user-generated metadata. In this context the use of tags, keywords freely chosen by users for describing and organizing resources, spread as a model for browsing and retrieving web content. The success of this collaborative model is justified by two factors: first, information is organized in a way that closely reflects the users' mental model; second, the absence of a controlled vocabulary reduces the users' learning curve and allows the use of evolving vocabularies. Since tags are handled in a purely syntactical way, the annotations provided by users generate a very sparse and noisy tag space that limits their effectiveness in complex tasks. Consequently, tag recommenders, with their ability to provide users with the most suitable tags for the resources to be annotated, have recently emerged as a way of speeding up the process of tag convergence. The contribution of this work is a tag recommender system implementing both a collaborative and a content-based recommendation technique. The former exploits the user and community tagging behavior for producing recommendations, while the latter exploits some heuristics to extract tags directly from the textual content of resources. Results of experiments carried out on a dataset gathered from Bibsonomy show that hybrid recommendation strategies can outperform single ones, and that the way they are combined matters for obtaining more accurate results.
This paper provides an overview of the work done in the ESWC Linked Open Data-enabled Recommender Systems challenge, in which we proposed an ensemble of algorithms based on popularity, Vector Space Model, Random Forests, Logistic Regression, and PageRank, running on a diverse set of semantic features. We ranked 1st in the top-N recommendation task, and 3rd in the tasks of rating prediction and diversity.
In several domains contextual information plays a key role in the recommendation task, since factors such as user location, time of the day, user mood, weather, etc., clearly affect user perception of a particular item. However, traditional recommendation approaches do not take into account contextual information, and this can limit the quality of the suggestions. In this paper we extend the enhanced Vector Space Model (eVSM) framework in order to model contextual information as well. Specifically, we propose two different context-aware approaches: in the first one we adapt the microprofiling technique, already evaluated in collaborative filtering, to content-based recommendations. Next, we define a contextual modeling technique based on distributional semantics: it builds a context-aware user profile that merges user preferences with a semantic vector space representation of the context itself. In the experimental evaluation we carried out an extensive series of tests in order to determine the best-performing configuration among the proposed ones. We also evaluated Contextual eVSM on a state-of-the-art dataset, and it emerged that our framework outperforms all the baselines in most of the experimental settings.
CroSeR (Cross-language Semantic Retrieval) is an IR system able to discover links between e-gov services described in different languages. CroSeR supports public administrators in linking their own source catalogs of e-gov services, described in any language, to a target catalog whose services are described in English and are available in the Linked Open Data (LOD) cloud. Our system is based on a cross-language semantic matching method that i) translates service labels into English using a machine translation tool, ii) extracts a Wikipedia-based semantic representation from the translated service labels using Explicit Semantic Analysis (ESA), and iii) evaluates the similarity between two services using their Wikipedia-based representations. The user selects a service in a source catalog and exploits the ranked list of matches suggested by CroSeR to establish a relation (of type narrower, equivalent, or broader match) with other services in the English catalog. The method is independent of the language adopted in the source catalog and it does not assume the availability of information about the services other than the very short text descriptions used as service labels. CroSeR is a web application accessible via http://siti-rack.siti.disco.unimib.it:8080/croser/.
CroSeR (Cross-language Semantic Retrieval) is an IR system able to discover links between eGov services described in different languages. CroSeR supports public administrators in linking their own source catalogs of eGov services, described in any language, to a target catalog whose services are described in English and are available in the Linked Open Data (LOD) cloud. Our system is based on a cross-language semantic matching method that i) translates service labels into English using a machine translation tool, ii) extracts a Wikipedia-based semantic representation from the translated service labels using Explicit Semantic Analysis (ESA), and iii) evaluates the similarity between two services using their Wikipedia-based representations. The user selects a service in a source catalog and exploits the ranked list of matches suggested by CroSeR to establish a relation (of type narrower, equivalent, or broader match) with other services in the English catalog. The method is independent of the language adopted in the source catalog and it does not assume the availability of information about the services other than the very short text descriptions used as service labels. CroSeR is a web application accessible via http://siti-rack.siti.disco.unimib.it:8080/croser/. The work has been partially supported by the Italian PON project PON01 00861 SMART - Services and Meta-services for smART eGovernment.
The exponential growth of the Web is the most influential factor contributing to the increasing importance of text retrieval and filtering systems. However, since information exists in many languages, users could also consider as relevant documents written in languages different from the one the query is formulated in. In this context, an emerging requirement is to sift through the increasing flood of multilingual text: this poses a renewed challenge for designing effective multilingual Information Filtering systems. How could we represent user information needs or user preferences in a language-independent way? In this paper, we compare two content-based techniques able to provide users with cross-language recommendations: the first one relies on a knowledge-based word sense disambiguation technique that uses MultiWordNet as sense inventory, while the latter is based on a dimensionality reduction technique called Random Indexing and exploits the so-called distributional hypothesis in order to build language-independent user profiles. Since the experiments conducted in a movie recommendation scenario show the effectiveness of both approaches, we also tried to highlight the strengths and weaknesses of each approach in order to identify the scenarios in which a specific technique fits better.
The large diffusion of e-gov initiatives is increasing the attention of public administrations towards the Open Data initiative. The adoption of open data in the e-gov domain produces different advantages in terms of more transparent government, development of better public services, economic growth and social value. However, the process of data opening should adopt standards and open formats. Only in this way is it possible to share experiences with other service providers, to exploit best practices from other cities or countries, and to be easily connected to the Linked Open Data (LOD) cloud. In this paper we present CroSeR (Cross-language Service Retriever), a tool able to match and retrieve cross-language e-gov services stored in the LOD cloud. The main goal of this work is to help public administrations connect their e-gov services to services, provided by other administrations, that are already connected to the LOD cloud. We adopted a Wikipedia-based semantic representation in order to overcome the problems related to matching the very short textual descriptions associated with the services. A preliminary evaluation on an open catalog of e-gov services showed that the adopted techniques are promising and are more effective than techniques based only on a keyword representation.
The exponential growth of the Web is the most influential factor contributing to the increasing importance of cross-lingual text retrieval and filtering systems. Indeed, relevant information exists in different languages, thus users need to find documents in languages different from the one the query is formulated in. In this context, an emerging requirement is to sift through the increasing flood of multilingual text: this poses a renewed challenge for designing effective multilingual Information Filtering systems. Content-based filtering systems adapt their behavior to individual users by learning their preferences from documents that were already deemed relevant. The learning process aims to construct a profile of the user that can be later exploited in selecting/recommending relevant items. User profiles are generally represented using keywords in a specific language. For example, if a user likes movies whose plots are written in Italian, content-based filtering algorithms will learn a profile for that user which contains Italian words, thus movies whose plots are written in English will not be recommended, although they might be definitely interesting. In this paper, we propose a language-independent content-based recommender system, called MARS (MultilAnguage Recommender System), that builds cross-language user profiles by shifting the traditional text representation based on keywords to a more advanced language-independent representation based on word meanings. The proposed strategy relies on a knowledge-based word sense disambiguation technique that exploits MultiWordNet as sense inventory. As a consequence, content-based user profiles become language-independent and can be exploited for recommending items represented in a language different from the one used in the content-based user profile. Experiments conducted in a movie recommendation scenario show the effectiveness of the approach.
Public administrations are aware of the advantages of sharing Open Government Data in terms of transparency, development of improved services, collaboration between stakeholders, and spurring new economic activities. Initiatives for the publication and interlinking of government service catalogs as Linked Open Data (LOD) support the interoperability among European administrations and improve the capability of foreign citizens to access services across Europe. However, linking service catalogs to reference LOD catalogs requires a significant effort from local administrations, preventing the uptake of interoperable solutions at a large scale. The web application presented in this paper is named CroSeR (Cross-language Service Retriever) and supports public bodies in the process of linking their own service catalogs to the LOD cloud. CroSeR supports different European languages and adopts a semantic representation of e-gov services based on Wikipedia. CroSeR tries to overcome the problems related to the short textual descriptions associated with a service by embodying a semantic annotation algorithm that enriches service labels with emerging Wikipedia concepts related to the service. An experimental evaluation carried out on e-gov service catalogs in five different languages shows the effectiveness of our model.
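A sketch of the matching step only: a pre-translated source service label and the target catalog entries are both represented as sparse vectors over Wikipedia concepts and compared by cosine similarity. The concept weights below are hand-made stand-ins for real Explicit Semantic Analysis output, and machine translation and the ESA index are outside the sketch:

```python
# Sketch of CroSeR-style matching with hand-made "ESA" vectors (toy data only).
import math

def esa_vector(label):
    """Hypothetical ESA lookup: label -> {wikipedia_concept: weight}."""
    FAKE_ESA = {
        "waste collection":         {"Waste_management": 0.9, "Recycling": 0.6},
        "garbage disposal service": {"Waste_management": 0.8, "Waste_collection": 0.7},
        "marriage registration":    {"Civil_marriage": 0.9, "Civil_registration": 0.8},
    }
    return FAKE_ESA.get(label, {})

def cosine(u, v):
    num = sum(u[c] * v.get(c, 0.0) for c in u)
    den = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def rank_targets(translated_source_label, target_labels):
    src = esa_vector(translated_source_label)
    scored = [(t, cosine(src, esa_vector(t))) for t in target_labels]
    return sorted(scored, key=lambda x: x[1], reverse=True)

targets = ["garbage disposal service", "marriage registration"]
print(rank_targets("waste collection", targets))   # the semantically related service ranks first
```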
The series of DART workshops provides an interactive and focused platform for researchers and practitioners for presenting and discussing new and emerging ideas. Focusing on research on new challenges in intelligent information filtering and retrieval, DART aims to investigate novel systems and tools applicable to web scenarios and semantic computing. Therefore, DART contributes to discussing and comparing suitable novel solutions based on intelligent techniques and applied in real-world applications. Information Retrieval attempts to address similar filtering and ranking problems for pieces of information such as links, pages, and documents. Information Retrieval systems generally focus on the development of global retrieval techniques, often neglecting individual user needs and preferences. Information Filtering has drastically changed the way information seekers find what they are searching for: these systems effectively prune large information spaces and help users select the items that best meet their needs, interests, preferences, and tastes. They rely strongly on the use of various machine learning tools and algorithms for learning how to rank items and predict user evaluation. Submitted proposals received two or three review reports from Program Committee members. Based on the recommendations of the reviewers, 7 full papers have been selected for publication and presentation at DART 2013. When organizing a scientific conference, one always has to count on the efforts of many volunteers. We are grateful to the members of the Program Committee who devoted a considerable amount of their time to reviewing the submissions to DART 2013. We were glad and happy to work together with highly motivated people to arrange the conference and to publish these proceedings. We appreciate the work of the Publicity Chair Fedelucio Narducci from the University of Milan-Bicocca for announcing the workshop on various lists. Special thanks to Cristina Baroglio and Matteo Baldoni for their support and help in managing the workshop organization. We hope that you find these proceedings a valuable source of information on intelligent information filtering and retrieval tools, technologies, and applications.
Recommender Systems (RSs) are software tools and techniques providing suggestions for items likely to be of use to a user [1]. They exploit adaptive and intelligent systems technologies and have already proved to be valuable for coping with the information overload problem in several application domains. However, while most of the previous research has focused on recommendation techniques and algorithms, i.e., how to compute precise and accurate recommendations, only a few studies have taken the users' perspective and considered the processes and issues related to the actual acceptance of the recommendations. Hence, characterizing and evaluating the quality of users' experience and their subjective attitudes toward the recommendations and the recommendation technologies is an important issue that merits the attention of researchers and practitioners. These issues are important and should be studied both by web technology experts and in the human factors field. The main goal of the first workshop on Decision Making and Recommendation Acceptance issues in Recommender Systems (DEMRA), held at UMAP 2011, was to stimulate the discussion around problems, challenges and research directions concerning the acceptance of recommendation technologies [2].
Users interact with recommender systems to obtain useful information about products or services that may be of interest to them. But, while users are interacting with a recommender system to fulfill a primary task, which is usually the selection of one or more items, they are facing several other decision problems. For instance, they may be requested to select specific feature values (e.g., camera size, zoom) as criteria for a search, or they could have to identify features to be used in a critiquing-based recommendation session, or they may need to select a repair proposal for inconsistent user preferences when interacting with a recommender. In all these scenarios, and in many others, users of recommender systems are facing decision tasks. The complexity of decision tasks, the limited cognitive resources of users, and the tendency to keep the overall decision effort as low as possible are modeled by theories that conjecture “bounded rationality”, i.e., users exploit decision heuristics rather than trying to make an optimal decision. Furthermore, the preferences of users will likely change throughout a recommendation session, i.e., preferences are constructed in a specific decision context and users may not fully know their preferences beforehand. Within the scope of a decision process, preferences are strongly influenced by the goals of the customer, existing cognitive constraints, and the personal experience of the customer. Due to the fact that users do not have stable preferences, the interaction mechanisms provided by a recommender system and the information shown to a user can have an enormous impact on the outcome of a decision process. Theories from decision psychology and cognitive psychology have already elaborated a number of methodological tools for explaining and predicting user behavior in these scenarios. The major goal of this workshop is to establish a platform for industry and academia to present and discuss new ideas and research results related to the topic of human decision making in recommender systems. The workshop consists of a mix of six presentations of papers, in which results of ongoing research as reported in these proceedings are presented, and two invited talks: Bart Knijnenburg presenting “Simplifying privacy decisions: towards interactive and adaptive solutions” and Jill Freyne and Shlomo Berkovsky presenting “Food Recommendations: Biases that Underpin Ratings”. The workshop is closed by a final discussion session.
This paper presents the preliminary results of a joint research project about Smart Cities. The project adopts a multi-disciplinary approach that combines artificial intelligence techniques with psychology research to monitor the current state of the city of L’Aquila after the dreadful earthquake of April 2009. This work focuses on the description of a semantic content analysis module. This component, integrated into L’Aquila’s Social Urban Network (SUN), combines Natural Language Processing (NLP) and Artificial Intelligence (AI) to deeply analyze the content produced by citizens on social platforms in order to map social data to social indicators such as cohesion, sense of belonging and so on. The research builds on the insight that social data can supply a lot of information about people’s latent feelings, opinions and sentiments. Within the project, this trustworthy snapshot of the city is used by community promoters to proactively propose initiatives aimed at empowering the social capital of the city and recovering the urban structure which was disrupted after the ‘diaspora’ of citizens into the so-called ‘new towns’.
This paper investigates the role of Distributional Semantic Models (DSMs) into a Question Answering (QA) system. Our purpose is to exploit DSMs for answer re-ranking in QuestionCube, a framework for building QA systems. DSMs model words as points in a geometric space, also known as semantic space. Words are similar if they are close in that space. Our idea is that DSMs approaches can help to compute relatedness between users’ questions and candidate answers by exploiting paradigmatic relations between words, thus providing better answer reranking. Results of the evaluation, carried out on the CLEF2010 QA dataset, prove the effectiveness of the proposed approach.
Personalized electronic program guides help users overcome information overload in the TV and video domain by exploiting recommender systems that automatically compile lists of novel and diverse video assets, based on implicitly or explicitly defined user preferences. In this context, we assume that user preferences can be specified by program genres (documentary, sports, ...) and that an asset can be labeled by one or more program genres, thus allowing an initial and coarse preselection of potentially interesting assets. As these assets may come from various sources, program genre labels may not be consistent among these sources, or not even be given at all, while we assume that each asset has a possibly short textual description. In this paper, we tackle this problem by considering whether those textual descriptions can be effectively used to automatically retrieve the most related TV shows for a specific program genre. More specifically, we compare a statistical approach called logistic regression with an enhanced version of the commonly used vector space model, called random indexing, where the latter is extended by means of a negation operator based on quantum logic. We also apply a new feature generation technique based on explicit semantic analysis for enriching the textual description associated to a TV show with additional features extracted from Wikipedia.
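As a concrete illustration of the logistic-regression route described above, here is a small sketch that classifies a TV show description into a program genre using scikit-learn (assumed to be installed) on toy training data. The random indexing variant with the quantum-logic negation operator and the explicit semantic analysis feature generation are not reproduced here:

```python
# Sketch of genre labeling from textual descriptions with logistic regression.
# Toy training data; assumes scikit-learn is available.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "wildlife filmed in the african savanna",
    "the life of emperor penguins in antarctica",
    "live coverage of the champions league final",
    "highlights of the tennis grand slam tournament",
]
train_genres = ["documentary", "documentary", "sports", "sports"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_genres)

print(model.predict(["a film crew follows lions hunting at night"]))   # documentary
print(model.predict(["the national basketball league playoffs"]))     # sports
```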
The recent explosion of Big Data is offering new chances and challenges to all those platforms that provide personalized access to information sources, such as recommender systems and personalized search engines. In this context, social networks are gaining more and more interest, since they represent a perfect source to trigger personalization tasks. Indeed, users naturally leave on these platforms a lot of data about their preferences, feelings, and friendships. Hence, those data are really valuable for addressing the cold start problem of recommender systems. On the other hand, since the content shared on social networks is noisy and heterogeneous, the extracted information must be heavily processed to build user profiles that can effectively mirror user interests and needs. In this paper we investigated the effectiveness of external knowledge derived from Wikipedia in representing both documents and user profiles in a recommendation scenario. Specifically, we compared a classical keyword-based representation with two techniques that are able to map unstructured text to Wikipedia pages. The advantage of using this representation is that documents and user profiles become richer, more human-readable, less noisy, and potentially connected to the Linked Open Data (LOD) cloud. The goal of our preliminary experimental evaluation was twofold: 1) to define the representation that best reflects user preferences; 2) to define the representation that provides the best predictive accuracy. We implemented a news recommender for a preliminary evaluation of our model. We involved more than 50 Facebook and Twitter users and we demonstrated that the encyclopedic-based representation is an effective way of modeling both user profiles and documents.
This paper investigates the role of Distributional Semantic Models (DSMs) in Question Answering (QA), and specifically in a QA system called QuestionCube. QuestionCube is a framework for QA that combines several techniques to retrieve passages containing the exact answers for natural language questions. It exploits Information Retrieval models to seek candidate answers and Natural Language Processing algorithms for the analysis of questions and candidate answers, both in English and Italian. The data source for the answer is an unstructured text document collection stored in search indices. In this paper we propose to exploit DSMs in the QuestionCube framework. In DSMs words are represented as mathematical points in a geometric space, also known as a semantic space. Words are similar if they are close in that space. Our idea is that DSM approaches can help to compute the relatedness between users' questions and candidate answers by exploiting paradigmatic relations between words. Results of an experimental evaluation carried out on the CLEF2010 QA dataset prove the effectiveness of the proposed approach.
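A hedged sketch of how a distributional score can be blended with the retrieval score to re-rank candidate answers. The word vectors, the additive composition of passage vectors and the blending weight are toy assumptions, not QuestionCube's actual pipeline:

```python
# Sketch of DSM-based answer re-ranking: blend each candidate's retrieval score
# with the cosine similarity between question and passage vectors (toy values).
import numpy as np

VEC = {
    "capital": np.array([0.9, 0.1]), "city": np.array([0.8, 0.2]),
    "rome":    np.array([0.85, 0.15]), "population": np.array([0.1, 0.9]),
    "italy":   np.array([0.7, 0.3]),
}

def embed(text):
    vs = [VEC[w] for w in text.lower().split() if w in VEC]
    return np.sum(vs, axis=0) if vs else np.zeros(2)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def rerank(question, candidates, alpha=0.5):
    """candidates: list of (passage, retrieval_score); returns the re-ranked list."""
    q = embed(question)
    rescored = [(p, alpha * s + (1 - alpha) * cos(q, embed(p))) for p, s in candidates]
    return sorted(rescored, key=lambda x: x[1], reverse=True)

question = "what is the capital of italy"
candidates = [("italy has a population of about 59 million", 0.8),
              ("rome is the capital city of italy", 0.7)]
print(rerank(question, candidates))   # the passage that actually answers the question moves up
```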
In this paper we propose an innovative Information Retrieval system able to manage temporal information. The system allows temporal constraints in a classical keyword-based search. Information about temporal events is automatically extracted from text at indexing time and stored in an ad-hoc data structure exploited by the retrieval module for searching relevant documents. Our system can search textual information that refers to specific periods of time. We perform an exploratory case study by indexing all Italian Wikipedia articles.
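A minimal sketch of keyword search with a temporal constraint: year mentions are extracted at indexing time and stored next to each document, so queries can filter by a time interval. The index layout and the regex-based extraction are illustrative assumptions, not the actual system's data structures:

```python
# Sketch of keyword search with a temporal constraint (toy in-memory index).
import re

DOCS = [
    "The French Revolution began in 1789 and reshaped Europe.",
    "The Wikipedia project was launched in 2001.",
    "Rome hosted the Olympic Games in 1960.",
]

def extract_years(text):
    return {int(y) for y in re.findall(r"\b(1[0-9]{3}|20[0-9]{2})\b", text)}

INDEX = [(doc, extract_years(doc)) for doc in DOCS]   # built at indexing time

def search(keywords, year_from=None, year_to=None):
    results = []
    for doc, years in INDEX:
        if not all(k.lower() in doc.lower() for k in keywords):
            continue
        if year_from is not None or year_to is not None:
            lo = year_from if year_from is not None else 0
            hi = year_to if year_to is not None else 9999
            if not any(lo <= y <= hi for y in years):
                continue
        results.append(doc)
    return results

print(search(["olympic"], year_from=1950, year_to=1970))
print(search(["revolution"], year_from=1900))   # no match: the event year is outside the interval
```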
A number of works have shown that the aggregation of several Information Retrieval (IR) systems works better than each system working individually. Nevertheless, early investigation in the context of the CLEF Robust-WSD task, in which semantics is involved, showed that aggregation strategies achieve only slight improvements. This paper proposes a re-ranking approach which relies on inter-document similarities. The novelty of our idea is twofold: the output of a semantic-based IR system is exploited to re-weight documents, and a new strategy based on Semantic Vectors is used to compute inter-document similarities.
E-Government is becoming more attentive towards providing personalized services to citizens so that they can benefit from better services with less time and effort. To develop citizen-centered services, a fundamental activity consists in mining needs and preferences of users by identifying homogeneous groups of users, also known as user segments, sharing similar characteristics. Since the same user often has characteristics shared by several segments, in this work we propose an approach based on fuzzy clustering for inferring user segments that could be properly exploited to offer personalized services that better satisfy user needs and their expectations. User segments are inferred starting from data, gathered by questionnaires, which essentially describe demographic characteristics of users. For each derived segment a user profile is defined which summarizes characteristics shared by users belonging to that segment. Results obtained on a case study are reported in the last part of the paper.
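A compact fuzzy c-means sketch on toy demographic vectors may make the segmentation idea concrete. The features (age, education level, internet usage, all scaled to [0, 1]), the number of segments and the fuzzifier are illustrative assumptions:

```python
# Compact fuzzy c-means sketch on toy demographic vectors. Each user ends up with a
# membership degree for every segment, so a user can belong to more than one segment.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([          # one row per citizen (toy, pre-normalized features)
    [0.2, 0.9, 0.9], [0.3, 0.8, 0.8],    # young, educated, heavy internet users
    [0.8, 0.4, 0.2], [0.9, 0.3, 0.1],    # older, lighter internet users
])
C, M, ITERS = 2, 2.0, 50                  # segments, fuzzifier, iterations

U = rng.random((len(X), C))
U /= U.sum(axis=1, keepdims=True)         # random initial memberships

for _ in range(ITERS):
    # update segment centers as membership-weighted means
    centers = (U ** M).T @ X / (U ** M).sum(axis=0)[:, None]
    # update memberships from the distances to the centers
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
    U = 1.0 / (dist ** (2 / (M - 1)))
    U /= U.sum(axis=1, keepdims=True)

print("segment centers:\n", centers.round(2))
print("membership degrees:\n", U.round(2))   # soft assignment of users to segments
```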
The combination of the use of advanced Information and Communication Technology, especially the Internet, to enable new ways of working, with the enhanced provision of information and interactive services accessible over different channels, is the foundation of a new family of information systems. In particular, this information explosion on the Web, which threatens our ability to manage information, has also affected geographic information systems. Interoperability is a key word here, since it implies an increasing level of cooperation between information sources at national, regional and local levels, and requires new methods to develop interoperable geographic systems. In this paper, an ontology-driven system (GeoMergeP) is described for the semantic integration of geographic information sources. In particular, we focus on how ontology matching can be enriched through the use of standards for implementing a semi-automatic matching approach. Then, the requirements and steps of the system are illustrated on the ISPRA (Italian Institute for Environmental Protection and Research) case study. Our preliminary results show that ontology matching can be improved, helping interoperating systems increase the reliability of exchanged and shared information.
Recommender Systems have already proved to be valuable for coping with the information overload problem in several application domains. They provide people with suggestions for items which are likely to be of interest for them; hence, a primary function of recommender systems is to help people make good choices and decisions. However, most previous research has focused on recommendation techniques and algorithms, and less attention has been devoted to the decision making processes adopted by the users and possibly supported by the system. There is still a gap between the importance that the community gives to the assessment of recommendation algorithms and the current range of ongoing research activities concerning human decision making. Different decision-psychological phenomena can influence the decision making of users of recommender systems, and research along these lines is becoming increasingly important and popular. This special issue highlights how the coupling of recommendation algorithms with the understanding of human choice and decision making theory has the potential to benefit research and practice on recommender systems and to enable users to achieve a good balance between decision accuracy and decision effort.
The purpose of the Italian Information Retrieval (IIR) workshop series is to provide an international meeting forum for stimulating and disseminating research in Information Retrieval and related disciplines, where researchers, especially early stage Italian researchers, can exchange ideas and present results in an informal way. IIR 2012 took place in Bari, Italy, at the Department of Computer Science, University of Bari Aldo Moro, on January 26‐27, 2012, following the first two successful editions in Padua (2010) and Milan (2011). We received 37 submissions, including full and short original papers with new research results, as well as short papers describing ongoing projects or presenting already published results. Most contributors to IIR 2012 were PhD students and early stage researchers. Each submission was reviewed by at least two members of the Program Committee, and 24 papers were selected on the basis of originality, technical depth, style of presentation, and impact. The 24 papers published in these proceedings cover six main topics: ranking, text classification, evaluation and geographic information retrieval, filtering, content analysis, and information retrieval applications. Twenty papers are written in English and four in Italian. We also include an abstract of the invited talk given by Roberto Navigli (Department of Computer Science, University of Rome “La Sapienza”), who presented a novel approach to Web search result clustering based on the automated discovery of word senses from raw text.
The main goal of the “Mappa Italiana dell’Intolleranza” (Italian Map of Intolerance) project was to analyze the content produced on social networks in order to measure the level of intolerance in the country, with respect to five themes: homophobia, racism, violence against women, anti-Semitism and disability. The project, coordinated by Vox - Osservatorio sui diritti, brought together the Università degli Studi di Milano, the Università La Sapienza di Roma, and the Dipartimento di Informatica of the Università degli Studi di Bari, which provided a Big Data & Content Analytics platform for the semantic analysis of social content.
Throughout the last decade, the area of Digital Libraries (DL) has attracted more and more interest from both the research and development communities. Likewise, since the release of new platforms enriches them with new features and makes DLs more powerful and effective, the number of web sites integrating these kinds of tools is rapidly growing. In this paper we propose an approach for the exploitation of digital libraries for personalization purposes in a cultural heritage scenario. Specifically, we integrated FIRSt (Folksonomy-based Item Recommender syStem), a content-based recommender system developed at the University of Bari, and Fedora, a flexible digital library architecture, in a framework for the adaptive fruition of cultural heritage implemented within the activities of the CHAT research project. In this scenario, the role of the digital library was to store information (both textual and multimedia) about paintings gathered from the Vatican Picture Gallery and to provide it in a multimodal and personalized way through a PDA device given to a user before her visit to the museum. This paper describes the system architecture of our recommender system and its integration in the framework implemented for the CHAT project, showing how this recommendation model has been applied to recommend the artworks located at the Vatican Picture Gallery (Pinacoteca Vaticana), providing users with a personalized museum tour tailored to their tastes. The experimental evaluation we performed also confirmed that these recommendation services are able to capture real user preferences, thus improving their experience in cultural heritage fruition.
Traditional Information Retrieval (IR) systems are based on bag-of-words representation. This approach retrieves relevant documents by lexical matching between query and document terms. Due to synonymy and polysemy, lexical methods produce imprecise or incomplete results. In this paper we present how named entities are integrated in SENSE (SEmantic N-levels Search Engine). SENSE is an IR system that tries to overcome the limitations of the ranked keyword approach, by introducing semantic levels which integrate (and not simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and named entities. Our aim is to prove that named entities are useful to improve retrieval performance.
This paper proposes an Information Retrieval (IR) system that integrates sense discrimination to overcome the problem of word ambiguity. Word ambiguity is a key problem for systems that have access to textual information. Semantic Vectors are able to divide the usages of a word into different meanings, by discriminating among word meanings on the basis of information available in unannotated corpora. This paper has a twofold goal: the former is to evaluate the effectiveness of an IR system based on Semantic Vectors, the latter is to describe how they have been integrated in a semantic IR framework to build semantic spaces of words and documents. To achieve the first goal, we performed an in vivo evaluation in an IR scenario and we compared the method based on sense discrimination to a method based on Word Sense Disambiguation (WSD). Contrary to sense discrimination, which aims to discriminate among different meanings not necessarily known a priori, WSD is the task of selecting a sense for a word from a set of predefined possibilities. To accomplish the second goal, we integrated Semantic Vectors in a semantic search engine called SENSE (SEmantic N-levels Search Engine).
Intelligent Information Access techniques attempt to overcome the limitations of current search devices by providing personalized information items and product/service recommendations. They normally utilize direct or indirect user input and facilitate the information search and decision processes, according to user needs, preferences and usage patterns. Recent developments at the intersection of Information Retrieval, Information Filtering, Machine Learning, User Modelling, Natural Language Processing and Human-Computer Interaction offer novel solutions that empower users to go beyond single-session lookup tasks and that aim at serving the more complex requirement: “Tell me what I don’t know that I need to know”. Information filtering systems, specifically recommender systems, have been revolutionizing the way information seekers find what they want, because they effectively prune large information spaces and help users in selecting items that best meet their needs and preferences. Recommender systems rely strongly on the use of various machine learning tools and algorithms for learning how to rank, or predict user evaluation of, items. Information Retrieval systems, on the other hand, also attempt to address similar filtering and ranking problems for pieces of information such as links, pages, and documents. But they generally focus on the development of global retrieval techniques, often neglecting individual user needs and preferences. The book aims to investigate current developments and new insights into methods, techniques and technologies for intelligent information access from a multidisciplinary perspective. It comprises six chapters authored by participants in the research event Intelligent Information Access, held in Cagliari (Italy) in December 2008. In Chapter 1, Enhancing Conversational Access to Information through a Socially Intelligent Agent, Berardina De Carolis, Irene Mazzotta and Nicole Novielli emphasize the role of Embodied Conversational Agents (ECAs) as a natural interaction metaphor for personalized and context-adapted access to information. They propose a scalable architecture for the development of ECAs able to exhibit an emotional state and/or social signs. The automatic detection of emotions in text is the problem investigated in Chapter 2, Annotating and Identifying Emotions in Text, by Carlo Strapparava and Rada Mihalcea. The authors describe the “Affective Text” task, presented at SEMEVAL-2007. The task focused on classifying emotions in news headlines, and was intended to explore the connection between emotions and lexical semantics. After illustrating the data set, the rationale of the task and a brief description of the participating systems, several experiments on the automatic annotation of emotions in text are presented. The practical applications of the task are very important: consider, for example, opinion mining and market analysis, affective computing, natural language interfaces for e-learning environments or educational games. Personalization of the ranking computed by search engines and recommender systems is the main topic of Chapter 3, Improving Ranking by Respecting the Multidimensionality and Uncertainty of User Preferences, by Bettina Berendt and Veit Koppen. The research question addressed by the authors is whether the system ranking is the “right ranking” for the user, based on the context in which she/he operates.
A general conceptualization of the ranking-evaluation task is proposed: the comparison between the ranking generated by a computational system and the “user’s ideal ranking”. Eight challenges to this simple model are discussed, leading to the conclusion that approaches for dealing with multidimensional, and often only partial, preference orders are required and that randomness could be a beneficial feature of system rankings. In Chapter 4, Hotho reviews
As an interactive intelligent system, recommender systems are developed to suggest items that match users’ preferences. Since the emergence of recommender systems, a large majority of research has focused on objective accuracy criteria and less attention has been paid to how users interact with the system and the efficacy of interface designs from users’ perspectives. The field has reached a point where it is ready to look beyond algorithms, into users’ interactions, decision making processes and overall experience. Accordingly, the goals of the workshop are to explore the human aspects of recommender systems, with a particular focus on the impact of interfaces and interaction design on decision-making and user experiences with recommender systems, and to explore methodologies to evaluate these human aspects of the recommendation process that go beyond traditional automated approaches. The aim is to bring together researchers and practitioners around the topics of designing and evaluating novel intelligent interfaces for recommender systems in order to: (1) share research and techniques, including new design technologies and evaluation methodologies; (2) identify next key challenges in the area; and (3) identify emerging topics. The workshop covers three interrelated themes: a) user interfaces (e.g. visual interfaces, explanations), b) interaction, user modeling and decision-making (e.g. decision theories, argumentation, detection and avoidance of biases), and c) evaluation (e.g. case studies and empirical evaluations). This workshop aims at creating an interdisciplinary community with a focus on interface design issues for recommender systems and at promoting collaboration opportunities between researchers and practitioners. The workshop consists of a mix of eight presentations of papers, in which results of ongoing research as reported in these proceedings are presented, and one invited talk by Julita Vassileva presenting “Visualization and User Control of Recommender Systems”. The workshop is closed by a final discussion session.
With the increasing amount of information in electronic form, the fields of Machine Learning (ML) and Data Mining (DM) continue to grow by providing new advances in theory, applications and systems. The aim of this paper is to consider some recent theoretical aspects and approaches to ML and DM, with an emphasis on Italian research.
The exponential growth of the Web is the most influential factor contributing to the increasing importance of cross-lingual text retrieval and filtering systems. Indeed, relevant information exists in different languages, thus users need to find documents in languages different from the one the query is formulated in. In this context, an emerging requirement is to sift through the increasing flood of multilingual text: this poses a renewed challenge for designing effective multilingual Information Filtering systems. Content-based filtering systems adapt their behavior to individual users by learning their preferences from documents that were already deemed relevant. The learning process aims to construct a profile of the user that can be later exploited in selecting/recommending relevant items. User profiles are generally represented using keywords in a specific language. For example, if a user likes movies whose plots are written in Italian, a content-based filtering algorithm will learn a profile for that user which contains Italian words, thus failing to recommend movies whose plots are written in English, although they might be very interesting. Moreover, keywords suffer from typical Information Retrieval-related problems such as polysemy and synonymy. In this paper, we propose a language-independent content-based recommender system, called MARS (MultilAnguage Recommender System), that builds cross-language user profiles by shifting from the traditional keyword-based text representation to a more complex language-independent representation based on word meanings. The proposed strategy relies on a knowledge-based word sense disambiguation technique that exploits MultiWordNet as sense inventory. As a consequence, content-based user profiles become language-independent and can be exploited for recommending items represented in a language different from the one used in the content-based user profile. Experiments conducted in a movie recommendation scenario show the effectiveness of the approach.
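The following minimal Python sketch illustrates the general idea of language-independent profiles described above: words in different languages are mapped to shared sense identifiers, so that a profile learned in one language can be matched against items described in another. The sense inventory and the mapping below are hypothetical stand-ins (not the MARS implementation, which relies on MultiWordNet and a WSD step).

```python
# Minimal sketch (illustrative only): language-independent profiles based on sense IDs.
from collections import Counter

# Hypothetical word -> synset-id mapping (in practice produced by word sense disambiguation).
SENSE_INVENTORY = {
    ("film", "it"): "synset:movie.n.01",
    ("movie", "en"): "synset:movie.n.01",
    ("regista", "it"): "synset:director.n.01",
    ("director", "en"): "synset:director.n.01",
}

def to_sense_bag(tokens, lang):
    """Map tokens of a given language to sense identifiers (unknown words are skipped)."""
    return Counter(SENSE_INVENTORY[(t, lang)] for t in tokens if (t, lang) in SENSE_INVENTORY)

def cosine(bag_a, bag_b):
    """Cosine similarity between two bags of senses."""
    dot = sum(bag_a[s] * bag_b[s] for s in set(bag_a) & set(bag_b))
    norm = (sum(v * v for v in bag_a.values()) ** 0.5) * (sum(v * v for v in bag_b.values()) ** 0.5)
    return dot / norm if norm else 0.0

# Profile learned from Italian plots, item described in English: matching still works.
profile = to_sense_bag(["film", "regista"], "it")
item = to_sense_bag(["movie", "director"], "en")
print(cosine(profile, item))  # 1.0 in this toy example
```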
This paper presents MyMusic, a system that exploits social media sources for generating personalized music playlists. This work is based on the idea that information extracted from social networks, such as Facebook and Last.fm, might be effectively exploited for personalization tasks. Indeed, information related to the music preferences of users can be easily gathered from social platforms and used to define a model of user interests. The use of social media is a very cheap and effective way to overcome the classical cold start problem of recommender systems. In this work we enriched social media-based playlists with new artists related to those the user already likes. Specifically, we compare two different enrichment techniques: the first leverages the knowledge stored in DBpedia, the structured version of Wikipedia, while the second is based on the content-based similarity between descriptions of artists. The final playlist is ranked and presented to the user, who can listen to the songs and express her feedback. A prototype version of MyMusic was made available online in order to carry out a preliminary user study to evaluate the best enrichment strategy. The preliminary results encourage further research along this line.
In recent years, hundreds of social network sites have been launched with both professional (e.g., LinkedIn) and non-professional (e.g., MySpace, Facebook) orientations. This has resulted in a renewed information overload problem, but it has also provided a new and unforeseen way of gathering useful, accurate and constantly updated information about user interests and tastes. Content-based recommender systems can leverage the wealth of data emerging from social networks to build user profiles in which representations of user interests are maintained. The idea proposed in this paper is to extract content-based user profiles from the data available in the LinkedIn social network, in order to obtain an image of the users' interests that can be used to recommend interesting academic research papers. A preliminary experiment provided interesting results which deserve further attention.
In this paper we present our participation as SWAPTeam in the ECML/PKDD 2011 Discovery Challenge task on the cold start problem, focused on making recommendations for new video lectures. The main idea is to use a content-based approach, because it is less sensitive to the cold start problem that commonly affects pure collaborative filtering recommenders. The main design choices concern the hybridization strategy used to integrate the developed components and their scalability performance.
The exponential growth of the Web is the most influential factor contributing to the increasing importance of cross-lingual text retrieval and filtering systems. Indeed, relevant information exists in different languages, thus users need to find documents in languages different from the one the query is formulated in. In this context, an emerging requirement is to sift through the increasing flood of multilingual text: this poses a renewed challenge for designing effective multilingual Information Filtering systems. In this paper, we propose a language-independent content-based recommender system, called MARS (MultilAnguage Recommender System), that builds cross-language user profiles by shifting from the traditional keyword-based text representation to a more complex language-independent representation based on word meanings. As a consequence, the recommender system is able to suggest items represented in a language different from the one used in the content-based user profile. Experiments conducted in a movie recommendation scenario show the effectiveness of the approach.
On the internet today, an overabundance of information can be accessed, making it difficult for users to process and evaluate options and make appropriate choices. This phenomenon is known as information overload. Over time, various methods of information filtering have been introduced in order to assist users in choosing what may be of interest to them. Recommender Systems (RS) [14] are information filtering techniques which play an important role in e-commerce, advertising, e-mail filtering, etc. Therefore, RS are an answer, though partial, to the problem of information overload. Recommendation algorithms need to be continuously updated because of a constant increase in both the quantity of information and the ways of accessing that information, which define the different contexts of information use. The search for methods more effective and more efficient than those currently known in the literature is also stimulated by the interest of industrial research in this field, as demonstrated by the Netflix Prize contest, the open competition for the best algorithm to predict user ratings for films based on previous ratings. The contest showed the superiority of mathematical methods that discover the latent factors which drive user-item similarity over classical collaborative filtering algorithms. With the ever-increasing information available in digital archives and textual databases, the challenge of implementing personalized filters has become the challenge of designing algorithms able to manage huge amounts of data for the elicitation of user needs and preferences. In recent years, matrix factorization techniques have proved to be a quite promising solution to the problem of designing efficient filtering algorithms in the Big Data era. The main contribution of this paper is an analysis of these methods, which focuses on tensor factorization techniques, as well as the definition of a method for tensor factorization suitable for recommender systems.
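As a concrete illustration of the latent-factor idea mentioned above, the following sketch shows a classic two-dimensional matrix factorization baseline trained by stochastic gradient descent; it is not the tensor factorization method proposed in the paper, and all data and hyperparameters are illustrative.

```python
# Minimal sketch of latent-factor matrix factorization (a baseline, not the paper's tensor method).
import numpy as np

def factorize(ratings, n_users, n_items, k=8, lr=0.05, reg=0.02, epochs=200, seed=0):
    """ratings: list of (user, item, rating). Returns learned user/item factor matrices."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.3, size=(n_users, k))   # user latent factors
    Q = rng.normal(scale=0.3, size=(n_items, k))   # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                  # prediction error on the observed rating
            P[u] += lr * (err * Q[i] - reg * P[u]) # gradient step with L2 regularization
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
P, Q = factorize(ratings, n_users=2, n_items=3)
print(round(float(P[0] @ Q[0]), 2))  # prediction for (user 0, item 0), close to the observed 5.0
```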
On the internet today, an overabundance of information can be accessed, making it difficult for users to process and evaluate options and make appropriate choices. This phenomenon is known as Information Overload. Over time, various methods of information filtering have been introduced in order to assist users in choosing what may be of interest to them. Recommender Systems (RS) are a technique for filtering information and play an important role in e-commerce, advertising, e-mail filtering, etc. Therefore, RS are an answer, though partial, to the problem of Information Overload. The algorithms behind recommendation techniques need to be continuously updated because of a constant increase in both the quantity of information and the available modes of access to that information, which define the different contexts of information use. The search for methods more effective and more efficient than those currently known in the literature is also stimulated by the interest of industrial research in this field, as demonstrated by the Netflix Prize competition. The company which gives its name to the award invested one million dollars to acknowledge the best collaborative filtering algorithm able to improve the accuracy of its own RS, evidently in the belief that an RS can provide a competitive advantage. The mathematical techniques discussed in this article seem to be, at present, the most feasible way to compute more efficient and accurate recommendations. The main contribution of this paper is a survey of the matrix and tensor factorization techniques adopted in the RS literature. In particular, the discussion focuses on recent applications of High Order Singular Value Decomposition (HOSVD) in the area of information filtering and retrieval (Section IV). Finally, we suggest the application of PARAFAC (PARAllel FACtor) to multidimensional data for the computation of context-aware recommendations.
Recommender Systems suggest items that are likely to be the most interesting for users, based on the feedback, i.e. ratings, they provided on items already experienced in the past. Time-aware Recommender Systems (TARS) focus on temporal context of ratings in order to track the evolution of user preferences and to adapt suggestions accordingly. In fact, some people's interests tend to persist for a long time, while others change more quickly, because they might be related to volatile information needs. In this paper, we focus on the problem of building an effective profile for short-term preferences. A simple approach is to learn the short-term model from the most recent ratings, discarding older data. It is based on the assumption that the more recent the data is, the more it contributes to find items the user will shortly be interested in. We propose an improvement of this classical model, which tracks the evolution of user interests by exploiting the content of the items, besides time information on ratings. When a new item-rating pair comes, the replacement of an older one is performed by taking into account both a decay function for user interests and content similarity between items, computed by distributional semantics models. Experimental results confirm the effectiveness of the proposed approach.
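The following minimal sketch illustrates the replacement policy for the short-term profile described above, under simplifying assumptions: an exponential decay function models rating age and plain cosine similarity between item vectors stands in for the distributional similarity used in the paper. All names and data are illustrative.

```python
# Minimal sketch: choosing which old item-rating pair to replace in a short-term profile window.
import math
import numpy as np

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

def replace_candidate(window, new_item_vec, now, half_life=30.0):
    """window: list of (item_vec, rating, timestamp). Returns the index of the entry to drop:
    entries that are both stale (low decay weight) and redundant with the new item go first."""
    scores = []
    for idx, (vec, _rating, ts) in enumerate(window):
        decay = math.exp(-math.log(2) * (now - ts) / half_life)   # freshness weight in (0, 1]
        redundancy = cosine(vec, new_item_vec)                    # content overlap with the new item
        keep_score = decay * (1.0 - redundancy)                   # low score -> good candidate to drop
        scores.append((keep_score, idx))
    return min(scores)[1]

window = [(np.array([1.0, 0.0, 0.0]), 4, 10.0),
          (np.array([0.0, 1.0, 0.0]), 5, 90.0)]
new_vec = np.array([0.9, 0.1, 0.0])
print(replace_candidate(window, new_vec, now=100.0))  # drops the stale, redundant entry (index 0)
```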
Recommender Systems try to assist users in accessing complex information spaces with respect to their long-term needs and preferences. Various recommendation techniques have been investigated and each one has its own strengths and weaknesses. In particular, content-based techniques suffer from the overspecialization problem. We propose to inject diversity into the recommendation task by exploiting the content-based user profile to spot potentially surprising suggestions. In addition, the actual selection of serendipitous items is motivated by an applicative scenario: the reference scenario concerns personalized tours in a museum, and serendipitous items are introduced through slight diversions from the context-aware tours.
This paper describes OTTHO (On the Tip of my THOught), a system designed for solving a language game, called Guillotine. The rule of the game is simple: the player observes five words, generally unrelated to each other, and in one minute she has to provide a sixth word, semantically connected to the others. The system performs retrieval from several knowledge sources, such as a dictionary, a set of proverbs, and Wikipedia to realize a knowledge infusion process. The main motivation for designing an artificial player for Guillotine is the challenge of providing the machine with the cultural and linguistic background knowledge which makes it similar to a human being, with the ability of interpreting natural language documents and reasoning on their content. Our feeling is that the approach presented in this work has a great potential for other more practical applications besides solving a language game.
Information about top-ranked documents plays a key role to improve retrieval performance. One of the most common strategies which exploits this kind of information is relevance feedback. Few works have investigated the role of negative feedback on retrieval performance. This is probably due to the difficulty of dealing with the concept of non-relevant document. This paper proposes a novel approach to document re-ranking, which relies on the concept of negative feedback represented by non-relevant documents. In our model the concept of non-relevance is defined as a quantum operator in both the classical Vector Space Model and a Semantic Document Space. The latter is induced from the original document space using a distributional approach based on Random Indexing. The evaluation carried out on a standard document collection shows the effectiveness of the proposed approach and opens new perspectives to address the problem of quantifying the concept of non-relevance.
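The following sketch illustrates negation as an orthogonal projection in a vector space, in the spirit of the quantum negation operator mentioned above; it is a simplified illustration of the operator itself, not the full re-ranking model evaluated in the paper.

```python
# Minimal sketch: removing the non-relevant component from a query vector via orthogonal projection.
import numpy as np

def negate(query_vec, nonrelevant_vec):
    """Return 'query NOT nonrelevant': the query minus its projection onto the non-relevant direction."""
    nr = nonrelevant_vec / np.linalg.norm(nonrelevant_vec)
    return query_vec - (query_vec @ nr) * nr

query = np.array([1.0, 1.0, 0.0])
nonrel = np.array([0.0, 1.0, 0.0])   # direction representing a non-relevant document
print(negate(query, nonrel))         # [1. 0. 0.]: the non-relevant component is removed
```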
This volume focuses on new challenges in distributed Information Filtering and Retrieval. It collects invited chapters and extended research contributions from the DART 2011 Workshop, held in Palermo (Italy) in September 2011 and co-located with the XII International Conference of the Italian Association for Artificial Intelligence. The main focus of DART was to discuss and compare suitable novel solutions based on intelligent techniques and applied to real-world applications. The chapters of this book present a comprehensive review of related works and of the state of the art. Authors, both practitioners and researchers, share their results on several topics such as "Multi-Agent Systems", "Natural Language Processing", "Automatic Advertisement", "Customer Interaction Analytics", and "Opinion Mining".
In this work, we propose a method for document re-ranking which exploits negative feedback represented by non-relevant documents. The concept of non-relevance is modelled through the quantum negation operator. The evaluation carried out on a standard collection shows the effectiveness of the proposed method in both the classical Vector Space Model and a Semantic Document Space.
This paper describes OTTHO (On the Tip of my THOught), an artificial player able to solve a very popular language game, called “The Guillotine”, broadcast by the Italian National TV company. The game demands knowledge covering a broad range of topics, such as politics, literature, history, proverbs, and popular culture. The rule of the game is simple: the player observes five words, generally unrelated to each other, and in one minute she has to provide a sixth word, semantically connected to the others. In order to find the solution, a human being has to perform a complex memory retrieval task within the facts retained in her own knowledge, concerning the meanings of thousands of words and their contextual relations. In order to make this task executable by machines, machine reading techniques are exploited for knowledge extraction from the web, while Artificial Intelligence techniques are used to infer new knowledge, in the form of keywords, from the extracted information.
Recommendation of financial investment strategies is a complex and knowledge-intensive task. Typically, financial advisors have to discuss at length with their wealthy clients and have to sift through several investment proposals before finding one able to completely meet the investors' needs and constraints. As a consequence, a recent trend in wealth management is to improve the advisory process by exploiting recommendation technologies. This paper proposes a framework for the recommendation of asset allocation strategies which combines case-based reasoning with a novel diversification strategy to support financial advisors in the task of proposing diverse and personalized investment portfolios. The performance of the framework has been evaluated by means of an experimental session involving 1172 real users, and results show that the yield obtained by recommended portfolios overcomes that of portfolios proposed by human advisors in most experimental settings, while meeting the preferred risk profile. Furthermore, our diversification strategy shows promising results in terms of both diversity and average yield.
Wealth management services have become a priority for most financial services organizations. As investors are pressing wealth managers to justify their value proposition, turbulence in financial markets has reinforced the need to improve the advisory offering with more customized and sophisticated services. As a consequence, a recent trend in wealth management is to improve the advisory process by exploiting recommendation technologies. However, widespread recommendation approaches, such as content-based (CB) and collaborative filtering (CF), can hardly be put into practice in this domain. In fact, each user is typically modeled through his risk profile and a few other simple features, while each financial product is described through a rating provided by credit rating agencies, an average yield and the category it belongs to. In this scenario a pure CB strategy is likely to fail, since content information is too poor and not meaningful enough to feed a CB recommendation algorithm. Furthermore, the over-specialization problem, typical of CB recommenders, may collide with the fact that turbulence and fluctuations in financial markets suggest changing and diversifying investments over time. Similarly, CF algorithms can hardly be adopted since they may lead to the well-known problem of flocking: given that user-based CF provides recommendations by assuming that a user is interested in the asset classes other people similar to her already invested in, this could move many similar users to invest in the same asset classes at the same time, making the recommendation algorithm the victim of potential trader attacks. These dynamics suggest focusing on different recommendation paradigms. Given that financial advisors have to analyze and sift through several investment portfolios before providing the user with a solution able to meet his investment goals, the insight behind our recommendation framework is to exploit case-based reasoning (CBR) to tailor investment proposals on the ground of a case base of previously proposed investments. Our recommendation process is based on the typical CBR workflow and is structured in three different steps. 1) Retrieve and Reuse: retrieval of similar portfolios is performed by representing each user through a feature vector (as features, risk profile, inferred through the standard MiFID questionnaire, investment goals, temporal goals, financial experience, and financial situation were chosen; each feature is represented on a five-point ordinal scale, from very low to very high). Next, cosine similarity is adopted to retrieve the most similar users (along with the portfolios they agreed on) from the case base. 2) Revise: the candidate solutions retrieved by the first step are typically too many to be consulted by a human advisor. Thus, the Revise step further filters this set to obtain the final solutions. To revise the candidate solutions, four techniques were compared: a basic (temporal) ranking; a Greedy diversification, which implements a greedy algorithm to select the solutions with the best compromise between quality and diversity; and FCV, a novel scoring methodology which computes how close the distribution of the asset classes in the portfolio is to the optimal one. 3) Review and Retain: in the Review step the human advisor and the client can further discuss and modify the portfolio, before generating the final solution for the user. If the yield obtained by the newly recommended portfolio is accepta
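As a small illustration of the Retrieve step described above, the following sketch encodes users as vectors of the five ordinal features and ranks cases by cosine similarity. Feature encodings, case data and portfolio labels are purely illustrative, not taken from the actual system.

```python
# Minimal sketch of the CBR Retrieve step: rank cases by cosine similarity between user vectors.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(case_base, target_user, k=2):
    """Return the k portfolios agreed by the users most similar to the target user."""
    ranked = sorted(case_base,
                    key=lambda case: cosine(np.array(case["user"]), np.array(target_user)),
                    reverse=True)
    return [case["portfolio"] for case in ranked[:k]]

# Each case: a past user (risk, goals, horizon, experience, situation on a 1-5 scale) and the agreed portfolio.
case_base = [
    {"user": [2, 3, 4, 2, 3], "portfolio": "bond-heavy"},
    {"user": [5, 4, 2, 4, 4], "portfolio": "equity-heavy"},
    {"user": [3, 3, 3, 3, 3], "portfolio": "balanced"},
]
print(retrieve(case_base, target_user=[2, 3, 4, 2, 2]))  # candidate portfolios passed on to the Revise step
```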
This paper describes the techniques used to build a virtual player for the popular TV game "Who Wants to Be a Millionaire?". The player must answer a series of multiple-choice questions posed in natural language by selecting the correct answer among four different choices. The architecture of the virtual player consists of 1) a Question Answering (QA) module, which leverages Wikipedia and DBpedia data sources to retrieve the most relevant passages of text useful to identify the correct answer to a question, 2) an Answer Scoring (AS) module, which assigns a score to each candidate answer according to different criteria based on the passages of text retrieved by the Question Answering module, and 3) a Decision Making (DM) module, which chooses the strategy for playing the game according to specific rules as well as to the scores assigned to the candidate answers. We have evaluated both the accuracy of the virtual player in correctly answering the questions of the game, and its ability to play real games in order to earn money. The experiments have been carried out on questions coming from the official Italian and English board games. The average accuracy of the virtual player for Italian is 79.64%, which is significantly better than the performance of human players, which is equal to 51.33%. The average accuracy of the virtual player for English is 76.41%. The comparison with human players is not carried out for English since successfully playing the game heavily depends on the players' knowledge about popular culture, and in this experiment we have only involved a sample of Italian players. As regards the ability to play real games, which involves the definition of a proper strategy for the usage of lifelines in order to decide whether to answer a question even in a condition of uncertainty or to retire from the game by taking the earned money, the virtual player earns € 114,531 on average for Italian, and € 88,878 for English, which greatly exceeds the average amount earned by human players (€ 5,926 for Italian).
These are the proceedings of the First Workshop on Semantic Technologies meet Recommender Systems & Big Data (SeRSy 2012), held in conjunction with the 11th International Semantic Web Conference (ISWC 2012). People generally need more and more advanced tools that go beyond those implementing the canonical search paradigm for seeking relevant information. A new search paradigm is emerging, where the user perspective is completely reversed: from finding to being found. Recommender Systems may help to support this new perspective, because they have the effect of pushing relevant objects, selected from a large space of possible options, to potentially interested users. To achieve this result, recommendation techniques generally rely on data referring to three kinds of objects: users, items and their relations. The widespread success of Semantic Web techniques, creating a Web of interoperable and machine-readable data, can also be beneficial for recommender systems. Indeed, more and more semantic data are published following the Linked Data principles, which enable links to be set up between objects in different data sources, connecting information in a single global data space – the Web of Data. Today, the Web of Data includes different types of knowledge represented in a homogeneous form – sedimentary knowledge (encyclopedic, cultural, linguistic, common-sense, …) and real-time knowledge (news, data streams, …). These data might be useful to interlink diverse information about users, items, and their relations, and to implement reasoning mechanisms that can support and improve the recommendation process. The challenge is to investigate whether and how this large amount of wide-coverage and linked semantic knowledge can significantly improve the search process in those tasks that cannot be solved merely through a straightforward matching of queries and documents. Such tasks involve finding information from large document collections, categorizing and understanding that information, and producing some product, such as an actionable decision. Examples of such tasks include understanding a health problem in order to make a medical decision, or simply deciding which laptop to buy. Recommender systems support users exactly in those complex tasks. The primary goal of the workshop is to showcase cutting-edge research at the intersection of semantic technologies and recommender systems, by taking the best of the two worlds. This combination may provide the Semantic Web community with important real-world scenarios where its potential can be effectively exploited in systems performing complex tasks. We wish to thank all authors who submitted papers and all workshop participants for fruitful discussions. We would like to thank the program committee members and external referees for their timely expertise in carefully reviewing the submissions. We would also like to thank our invited speaker Ora Lassila for his interesting and stimulating talk. October 2012 The Workshop Chairs Marco de Gemmis Tommaso Di Noia Pasquale Lops Thomas Lukasiewicz Giovanni Semeraro
Interacting with a recommender system means to take different decisions such as selecting a song/movie from a recommendation list, selecting specific feature values (e.g., camera’s size, zoom) as criteria, selecting feedback features to be critiqued in a critiquing based recommendation session, or selecting a repair proposal for inconsistent user preferences when interacting with a knowledge-based recommender. In all these scenarios, users have to solve a decision task. The complexity of decision tasks, limited cognitive resources of users, and the tendency to keep the overall decision effort as low as possible lead to the phenomenon of bounded rationality, i.e., users exploit decision heuristics rather than trying to take an optimal decision. Furthermore, preferences of users will likely change throughout a recommendation session, i.e., preferences are constructed in a specific decision environment and users do not know their preferences beforehand. Decision making under bounded rationality is a door opener for different types of non-conscious influences on the decision behavior of a user. Theories from decision psychology and cognitive psychology are trying to explain these influences, for example, decoy effects and defaults can trigger significant shifts in item selection probabilities; in group decision scenarios, the visibility of the preferences of other group members can have a significant impact on the final group decision. The major goal of this workshop was to establish a platform for industry and academia to present and discuss new ideas and research results that are related to the topic of human decision making in recommender systems. The workshop consisted of technical sessions in which results of ongoing research as reported in these proceedings were presented, a keynote talk given by Joseph A. Konstan on “Decision-Making and Recommender Systems: Failures, Successes, and Research Directions” and a wrap up session chaired by Alexander Felfernig.
The vector space model (VSM) has emerged over almost three decades as one of the most effective approaches in the area of Information Retrieval (IR), thanks to its good compromise between expressivity, effectiveness and simplicity. Although Information Retrieval and Information Filtering (IF) undoubtedly represent two related research areas, the use of the VSM in Information Filtering has been much less analyzed, especially for content-based recommender systems. The goal of this work is twofold: first, we investigate the impact of the VSM in the area of content-based recommender systems; second, since the VSM suffers from well-known problems, such as its high dimensionality and the inability to manage information coming from negative user preferences, we propose techniques able to effectively tackle these drawbacks. Specifically, we exploited Random Indexing for dimensionality reduction and the negation operator implemented in the Semantic Vectors open source package to model negative user preferences. Results of an experimental evaluation performed on these enhanced vector space models (eVSM) and the potential applications of these approaches confirm the effectiveness of the model and lead us to further investigate these techniques.
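The following sketch illustrates the core Random Indexing idea used for dimensionality reduction: each document receives a sparse random index vector, and a term vector is accumulated as the sum of the index vectors of the documents it occurs in. The corpus, dimensionality and sparsity below are illustrative, not those used in eVSM.

```python
# Minimal sketch of Random Indexing for dimensionality reduction (illustrative settings).
import numpy as np

def random_index_vector(dim=100, seeds=4, rng=None):
    """Sparse ternary vector with a few +1/-1 entries and zeros elsewhere."""
    rng = rng or np.random.default_rng()
    v = np.zeros(dim)
    positions = rng.choice(dim, size=seeds, replace=False)
    v[positions] = rng.choice([-1.0, 1.0], size=seeds)
    return v

def build_term_vectors(docs, dim=100, seeds=4, seed=42):
    rng = np.random.default_rng(seed)
    doc_index = {i: random_index_vector(dim, seeds, rng) for i in range(len(docs))}
    term_vectors = {}
    for i, doc in enumerate(docs):
        for term in doc.split():
            term_vectors.setdefault(term, np.zeros(dim))
            term_vectors[term] += doc_index[i]   # accumulate the contexts in which the term occurs
    return term_vectors

docs = ["science fiction movie", "romantic comedy movie", "science documentary"]
tv = build_term_vectors(docs)
# Terms occurring in similar documents end up with similar reduced vectors.
print(np.dot(tv["science"], tv["movie"]))
```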
Interacting with a recommender system means to take different decisions such as selecting an item from a recommendation list, selecting a specific item feature value (e.g., camera’s size, zoom) as a search criteria, selecting feedback features to be critiqued in a critiquing based recommendation session, or selecting a repair proposal for inconsistent user preferences when interacting with a knowledge-based recommender. In all these situations, users face a decision task. This workshop (Decisions@RecSys) focuses on approaches for supporting effective and efficient human decision making in different types of recommendation scenarios.
As interactive intelligent systems, recommender systems are developed to give predictions that match users’ preferences. Since the emergence of recommender systems, a large majority of research has focused on objective accuracy criteria and less attention has been paid to how users interact with the system and to the efficacy of interface designs from the end-user perspective. The field has reached a point where it is ready to look beyond algorithms, into users’ interactions, decision-making processes and overall experience. Accordingly, the goals of this workshop (IntRS@RecSys) are to explore the human aspects of recommender systems, with a particular focus on the impact of interfaces and interaction design on decision-making and user experiences with recommender systems, and to explore methodologies to evaluate these human aspects of the recommendation process that go beyond traditional automated approaches.
This paper proposes an investigation of a re-ranking strategy presented at SIGIR 2010. In that work we described a re-ranking strategy in which the output of a semantic-based IR system is used to re-weight documents by exploiting inter-document similarities computed in a vector space. The space is built using the Random Indexing technique. The effectiveness of the strategy had been evaluated in the context of the CLEF Ad-Hoc Robust-WSD Task, while in this paper we propose new experiments on the TREC Ad-Hoc Robust Track 2004.
In this paper we exploit Semantic Vectors to develop an IR system. The idea is to use semantic spaces built on terms and documents to overcome the problem of word ambiguity. Word ambiguity is a key issue for those systems which have access to textual information. Semantic Vectors are able to divide the usages of a word into different meanings, discriminating among word meanings based on information found in unannotated corpora. We provide an in vivo evaluation in an Information Retrieval scenario and we compare the proposed method with another one which exploits Word Sense Disambiguation (WSD). Contrary to sense discrimination, which is the task of discriminating among different meanings (not necessarily known a priori), WSD is the task of selecting a sense for a word from a set of predefined possibilities. The goal of the evaluation is to establish how Semantic Vectors affect retrieval performance.
Artificial Intelligence technologies are increasingly used within several software systems, ranging from Web services to mobile applications. It is no doubt true that the more AI algorithms and methods are used, the more they tend to depart from a pure "AI" spirit and end up belonging to the sphere of standard software. In a sense, AI seems strongly connected with ideas, methods and tools that are not (yet) used by the general public. On the contrary, a more realistic view of it would be a rich and pervasive set of successful paradigms and approaches. Industry currently perceives semantic technologies as a key contribution of AI to innovation. In this paper a survey of current industrial experiences is used to discuss different semantic technologies at work in heterogeneous areas, ranging from Web services to semantic search and recommender systems. The resulting picture confirms the vitality of the area and allows us to sketch a general taxonomy of approaches, which is the main contribution of this paper.
Today Recommender Systems (RSs) are commonly used for various purposes, especially in e-commerce and information filtering tools. Content-based RSs rely on the concept of similarity between items. It is a common belief that the user is interested in what is similar to what she has already bought/searched/visited. We believe that there are some contexts in which this assumption is wrong: it is the case of acquiring unsearched but still useful items or pieces of information. This is called serendipity. Our purpose is to stimulate users and facilitate these serendipitous encounters. The paper presents a hybrid recommender system that joins a content-based approach and serendipitous heuristics in order to also provide surprising suggestions. The reference scenario concerns personalized tours in a museum, and serendipitous items are introduced through slight diversions from the context-aware tours.
“The Guillotine” is a language game whose goal is to predict the unique word that is linked in some way to five words given as clues, generally unrelated to each other. The ability of the human player to find the solution depends on the richness of her cultural background. We designed an artificial player for that game, based on a large knowledge repository built by exploiting several sources available on the web, such as Wikipedia, that provide the system with the cultural and linguistic background needed to understand clues. The “brain” of the system is a spreading activation algorithm that starts processing clues, finds associations between them and words within the knowledge repository, and computes a list of candidate solutions. In this paper we focus on the problem of finding the most promising candidate solution to be provided as the final answer. We improved the spreading algorithm by means of two strategies for finding associations also between candidate solutions and clues. Those strategies allow bidirectional reasoning and select the candidate solution which is the most connected with the clues. Experiments show that the performance of the system is comparable to that of average human players.
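The following sketch illustrates spreading activation over a weighted association graph, the mechanism described above: activation starts from the clue nodes, propagates to associated words with a decay factor, and candidate solutions are ranked by the activation they accumulate. The graph, weights and decay value are illustrative toy data, not the system's knowledge repository.

```python
# Minimal sketch of spreading activation over a word-association graph (toy data).
from collections import defaultdict

# Hypothetical weighted associations (clue word -> associated words with strengths).
GRAPH = {
    "sea": {"salt": 0.8, "water": 0.9},
    "pepper": {"salt": 0.9, "spice": 0.7},
    "road": {"salt": 0.3, "car": 0.8},
    "wound": {"salt": 0.6, "blood": 0.7},
    "rock": {"salt": 0.5, "stone": 0.9},
}

def spread(clues, iterations=2, decay=0.5):
    """Propagate activation from clue nodes through the graph and rank candidate solutions."""
    activation = defaultdict(float)
    for clue in clues:
        activation[clue] = 1.0
    for _ in range(iterations):
        new_activation = defaultdict(float, activation)
        for node, value in list(activation.items()):
            for neighbor, weight in GRAPH.get(node, {}).items():
                new_activation[neighbor] += decay * value * weight   # pass on a fraction of the activation
        activation = new_activation
    candidates = {n: a for n, a in activation.items() if n not in clues}
    return sorted(candidates.items(), key=lambda x: -x[1])

print(spread(["sea", "pepper", "road", "wound", "rock"])[0])  # 'salt' is the top-ranked candidate
```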
Super-sense tagging is the task of annotating each word in a text with a super-sense, i.e. a general concept such as animal, food or person, coming from the general semantic taxonomy defined by the WordNet lexicographer classes. Due to the small set of involved concepts, the task is simpler than Word Sense Disambiguation, which identifies a specific meaning for each word. The small set of concepts allows machine learning algorithms to achieve good performance when coping with the problem of tagging. However, machine learning algorithms suffer from data-sparseness. This problem becomes more evident when lexical features are involved, because test data can contain words with low frequency (or completely absent) in training data. To overcome the sparseness problem, this paper proposes a supervised method for super-sense tagging which incorporates information coming from a distributional space of words built on a large corpus. Results obtained on two standard datasets, SemCor and SensEval-3, show the effectiveness of our approach.
This paper describes the participation of the UNIBA team in the Named Entity rEcognition and Linking (NEEL) Challenge. We propose a completely unsupervised algorithm able to recognize and link named entities in English tweets. The approach combines the simple Lesk algorithm with information coming from both a distributional semantic model and usage frequency of Wikipedia concepts. The results show encouraging performance.
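As a small illustration of the simple Lesk component mentioned above, the sketch below links a mention to the candidate entity whose gloss overlaps most with the tweet context; the glosses and candidates are hypothetical, and the actual system additionally combines a distributional semantic model and Wikipedia usage frequency.

```python
# Minimal sketch of a simplified Lesk step for entity linking (illustrative glosses).
def simplified_lesk(mention_context, candidates):
    """Pick the candidate entity whose gloss shares the most words with the tweet context."""
    context = set(mention_context.lower().split())
    best, best_overlap = None, -1
    for entity, gloss in candidates.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = entity, overlap
    return best

tweet = "watching Paris play at the stadium tonight"
candidates = {
    "Paris_(city)": "capital and most populous city of France",
    "Paris_Saint-Germain_F.C.": "professional football club that play at the Parc des Princes stadium",
}
print(simplified_lesk(tweet, candidates))  # Paris_Saint-Germain_F.C.
```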
We report the results of UNIBA participation in the first SemEval-2012 Semantic Textual Similarity task. Our systems rely on distributional models of words automatically inferred from a large corpus. We exploit three different semantic word spaces: Random Indexing (RI), Latent Semantic Analysis (LSA) over RI, and vector permutations in RI. Runs based on these spaces consistently outperform the baseline on the proposed datasets.
This paper presents the participation of the semantic N-levels search engine SENSE in the CLEF 2009 Ad Hoc Robust-WSD Task. Our aim is to demonstrate that the combination of the N-levels model and WSD can improve retrieval performance even when an effective retrieval model is adopted. To reach this aim, we worked on two different strategies. On the one hand, a model based on Okapi BM25 was adopted at each level. On the other hand, we integrated a local relevance feedback technique, called Local Context Analysis, in both indexing levels of the system (keyword and word meaning). The hypothesis that Local Context Analysis can be effective even when it works on word meanings coming from a WSD algorithm is supported by experimental results. In the monolingual task, MAP increased by about 2% when exploiting disambiguation, while CMAP increased from 4% to 9% when we used WSD in both the mono- and bilingual tasks.
This paper describes the UNIBA team participation in the Cross-Level Semantic Similarity task at SemEval 2014. We propose to combine the output of different semantic similarity measures which exploit Word Sense Disambiguation and Distributional Semantic Models, among other lexical features. The integration of similarity measures is performed by means of two supervised methods based on Gaussian Process and Support Vector Machine. Our systems obtained very encouraging results, with the best one ranked 6th out of 38 submitted systems.
This paper describes the UNIBA participation in the Semantic Textual Similarity (STS) core task 2013. We exploited three different systems for computing the similarity between two texts. One system is used as a baseline and represents the best model that emerged from our previous participation in STS 2012. That system is based on a distributional model of semantics capable of taking into account also the syntactic structures that glue words together. In addition, we investigated the use of two different learning strategies exploiting both syntactic and semantic features. The former uses a combination strategy in order to combine the best machine learning techniques trained on the 2012 training and test sets. The latter tries to overcome the limitation of working with datasets with varying characteristics by selecting only the most suitable dataset for training.
E-Government is becoming more attentive towards providing intelligent personalized services to citizens, so that they can receive better services with less time and effort. This work presents an approach for inferring user segments that could be properly exploited to offer personalized services that better satisfy user needs and expectations. User segments are derived starting from data that essentially describe demographic characteristics of users and that are gathered by questionnaires. A clustering process is performed on the gathered data in order to derive user segments, i.e. groups of users sharing similar characteristics. Finally, for each derived segment, we define a user profile that summarizes the characteristics shared by users belonging to the same segment. The suitability of the proposed approach is shown by providing results obtained on a case study.
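The following sketch illustrates the segmentation step described above under simplifying assumptions: questionnaire answers are encoded as numeric vectors, clustered with a standard k-means algorithm, and each cluster centroid acts as the summary profile of a segment. The feature set, data and number of clusters are illustrative, not those of the case study.

```python
# Minimal sketch: deriving user segments by clustering encoded questionnaire data.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical encoded demographic answers: [age, education level, ICT familiarity]
users = np.array([
    [25, 4, 5],
    [27, 5, 5],
    [63, 2, 1],
    [58, 2, 2],
    [31, 3, 4],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(users)
for segment_id, centroid in enumerate(kmeans.cluster_centers_):
    members = np.where(kmeans.labels_ == segment_id)[0]
    # The centroid summarizes the characteristics shared by users in the segment (its profile).
    print(f"segment {segment_id}: users {members.tolist()}, profile {centroid.round(1).tolist()}")
```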
A primary function of recommender systems is to help their users to make better choices and decisions. The overall goal of the workshop is to analyse and discuss novel techniques and approaches for supporting effective and efficient human decision making in different types of recommendation scenarios. The submitted papers discuss a wide range of topics, from core algorithmic issues to the management of the human computer interaction.
The exponential growth of information available on the Web calls for a pervasive presence of intelligent systems able to filter the information flow and to present the user only with the content that is most relevant to him or her. Typically, this process is carried out by analyzing the needs, tastes and preferences described in a structure called the user profile. The problem of modeling the user effectively, however, becomes increasingly challenging in the Big Data era: modern modeling platforms must be able to merge data that grow rapidly, are of different kinds (social interactions, produced content, contextual data) and come from heterogeneous sources such as mobile devices, social networks, Open Data, and Internet of Things sensors. The project aims to provide a solution to this problem by introducing a holistic view of the individual, based on techniques that merge information coming from several heterogeneous sources into a single organic representation that describes all the facets of the user and is able to trigger personalization and filtering processes more effectively. The implementation of a Holistic User Modeling platform would create added value in several application scenarios: public administrations could use the profiles to personalize their services, while exploiting user profiles would make access to cultural content personalized and more effective.