Cataldo Musto
Role
Fixed-term Researcher - Type A
Organization
Università degli Studi di Bari Aldo Moro
Department
Department of Computer Science
Scientific Area
AREA 01 - Mathematical and Computer Sciences
Scientific Disciplinary Sector
INF/01 - Computer Science
ERC Sector, Level 1
Not available
ERC Sector, Level 2
Not available
ERC Sector, Level 3
Not available
The exponential growth of available online information provides computer scientists with many new challenges and opportunities. A recent trend is to analyze people's feelings, opinions and orientations about facts and brands: this is done by exploiting Sentiment Analysis techniques, whose goal is to classify the polarity of a piece of text according to the opinion of the writer. In this paper we propose a lexicon-based approach for sentiment classification of Twitter posts. Our approach is based on the exploitation of widespread lexical resources such as SentiWordNet, WordNet-Affect, MPQA and SenticNet. In the experimental session the effectiveness of the approach was evaluated against two state-of-the-art datasets. Preliminary results provide interesting outcomes and pave the way for future research in the area.
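To make the lexicon-based idea concrete, here is a minimal, hypothetical sketch: a tiny hand-rolled polarity lexicon stands in for resources such as SentiWordNet or MPQA, word scores are summed, and a simple negation flag flips the sign. None of this reproduces the paper's actual scoring rules.

```python
# Minimal sketch of lexicon-based polarity classification (illustrative only;
# the lexicon below is a toy stand-in for SentiWordNet, MPQA, etc.).

TOY_LEXICON = {
    "good": 0.7, "great": 0.9, "love": 0.8,
    "bad": -0.7, "awful": -0.9, "hate": -0.8,
}
NEGATIONS = {"not", "no", "never"}

def classify_polarity(text: str) -> str:
    """Sum word-level polarity scores, flipping the sign after a negation."""
    score, negate = 0.0, False
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in NEGATIONS:
            negate = True
            continue
        if word in TOY_LEXICON:
            score += -TOY_LEXICON[word] if negate else TOY_LEXICON[word]
            negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(classify_polarity("I do not love this phone, it is awful"))  # negative
```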
Museums have recognized the need to support visitors in enjoying a personalized experience when visiting artwork collections, and have started to adopt recommender systems as a way to meet this requirement. Content-based recommender systems analyze features of artworks previously rated by a visitor and build a visitor model or profile, in which preferences and interests are stored, based on those features. For example, the profile of a visitor might store the names of his or her favorite painters or painting techniques, extracted from short textual descriptions associated with artworks. The user profile is then matched against the attributes of new items in order to provide personalized suggestions. The Web 2.0 (r)evolution has changed the game for personalization, from the ‘elitist’ Web 1.0, written by few and read by many, to web content generated by everyone (user-generated content - UGC). One of the forms of UGC that has drawn the most attention from the research community is the folksonomy, a taxonomy generated by users who collaboratively annotate and categorize resources of interest with freely chosen keywords called tags. In this work, we investigate whether folksonomies can be a valuable source of information about user interests in the context of recommending digital artworks. We present FIRSt (Folksonomy-based Item Recommender syStem), a content-based recommender system which integrates UGC through social tagging in a classic content-based model, letting users express their preferences for items by entering a numerical rating as well as by annotating items with free tags. Experiments show that the accuracy of recommendations increases when tags are exploited in the recommendation process to enrich user profiles, provided that tags are not used as a surrogate of the item descriptions, but in conjunction with them. FIRSt has been developed within the CHAT project “Cultural Heritage fruition & e-Learning applications of new Advanced (multimodal) Technologies” and it is the core of a bouquet of web services designed for personalized museum tours.
Wealth Management is a business model operated by banks and brokers that offers a broad range of investment services to individual clients, in order to help them reach their investment objectives. Wealth management services include investment advisory, subscription of mandates, sales of financial products, and collection of investment orders from clients. Due to the complexity of the task, which largely requires deep knowledge of the financial domain, a recent trend in the area is to exploit recommendation technologies to support financial advisors and to improve the effectiveness of the process. This paper proposes a framework to support financial advisors in the task of providing clients with personalized investment strategies. Our methodology is based on the exploitation of case-based reasoning. A prototype version of the platform has been adopted to generate personalized portfolios, and the performance of the framework shows that the yield obtained by recommended portfolios exceeds that of portfolios proposed by human advisors in most experimental settings.
This paper provides an overview of the work done in the Linked Open Data-enabled Recommender Systems challenge, in which we proposed an ensemble of algorithms based on popularity, the Vector Space Model, Random Forests, Logistic Regression, and PageRank, running on a diverse set of semantic features. We ranked 1st in the top-N recommendation task, and 3rd in the tasks of rating prediction and diversity.
Recommender systems are filters which suggest items or information that might be interesting to users. These systems analyze the past behavior of a user, build a profile that stores information about her interests, and exploit that profile to find potentially interesting items. The main limitation of this approach is that it may provide accurate but likely obvious suggestions, since recommended items are similar to those the user already knows. In this paper we investigate this issue, known as the overspecialization or serendipity problem, by proposing a strategy that fosters the suggestion of surprisingly interesting items the user might not have otherwise discovered. The proposed strategy enriches a graph-based recommendation algorithm with background knowledge that allows the system to deeply understand the items it deals with. The hypothesis is that the infused knowledge can help to discover hidden correlations among items that go beyond simple feature similarity, and therefore promote non-obvious suggestions. Two evaluations are performed to validate this hypothesis: an in-vitro experiment on a subset of the hetrec2011-movielens-2k dataset, and a preliminary user study. These evaluations show that the proposed strategy actually promotes non-obvious suggestions, while limiting the loss of accuracy.
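One common way to realize a graph-based recommender enriched with background knowledge is Personalized PageRank over an item graph whose edges mix feature similarity with knowledge-level links. The sketch below (toy data and weights, not the paper's exact algorithm) uses networkx for illustration.

```python
# Hedged sketch: Personalized PageRank on an item graph where one edge comes
# from background knowledge rather than feature similarity.
import networkx as nx

G = nx.Graph()
# Feature-similarity edges (invented weights between movies)
G.add_edge("Blade Runner", "Minority Report", weight=0.8)
G.add_edge("Blade Runner", "Alien", weight=0.6)
# Background-knowledge edge: both films adapt novels by Philip K. Dick
G.add_edge("Minority Report", "A Scanner Darkly", weight=0.5)

liked = {"Blade Runner"}
# Restart distribution concentrated on the user's liked items
personalization = {n: (1.0 if n in liked else 0.0) for n in G}
scores = nx.pagerank(G, alpha=0.85, personalization=personalization,
                     weight="weight")
recs = sorted((n for n in G if n not in liked), key=scores.get, reverse=True)
print(recs)  # "A Scanner Darkly" is reachable only via the knowledge edge
```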
Thanks to the continuous growth of collaborative platforms like YouTube, Flickr and Delicious, we are witnessing a rapid evolution of web dynamics towards a more 'social' vision, called Web 2.0. In this context collaborative tagging systems are rapidly emerging as one of the most promising tools. However, since tags are handled in a purely syntactic way, collaborative tagging systems suffer from typical Information Retrieval (IR) problems like polysemy and synonymy: in order to reduce the impact of these drawbacks, and at the same time to aid the so-called tag convergence, systems that assist the user in the task of tagging are required. In this paper we present a system, called STaR, that implements an IR-based approach for tag recommendation. Our approach, mainly based on the exploitation of a state-of-the-art IR model called BM25, relies on two assumptions: first, if two or more resources share some common patterns (e.g. the same features in the textual description), we can exploit this information by supposing that they could be annotated with similar tags. Furthermore, since each user has a typical manner of labeling resources, a tag recommender might exploit this information to give more weight to the tags she already used to annotate similar resources. We also present an experimental evaluation, carried out on a large dataset gathered from Bibsonomy.
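As an illustration of the IR-style intuition behind STaR (not its exact implementation), the sketch below ranks previously annotated resources by BM25 similarity to a new resource's text and lets each match vote for its own tags; it assumes the third-party rank_bm25 package (pip install rank-bm25).

```python
# Hedged sketch of BM25-based tag recommendation with a toy two-resource corpus.
from rank_bm25 import BM25Okapi

corpus = [
    ("semantic web ontology reasoning", ["semanticweb", "ontology"]),
    ("collaborative tagging folksonomy study", ["tagging", "folksonomy"]),
]
bm25 = BM25Okapi([doc.split() for doc, _ in corpus])

def recommend_tags(new_text: str, k: int = 3):
    scores = bm25.get_scores(new_text.split())
    tag_scores = {}
    for (_, tags), s in zip(corpus, scores):
        for t in tags:                  # each matching resource votes
            tag_scores[t] = tag_scores.get(t, 0.0) + s  # for its own tags
    return sorted(tag_scores, key=tag_scores.get, reverse=True)[:k]

print(recommend_tags("a folksonomy emerges from collaborative tagging"))
```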
The effectiveness of content-based recommendation strategies depends heavily on the representation formalism adopted to model both items and user profiles. As a consequence, techniques for semantic content representation have emerged, thanks to their ability to filter out the noise and to cope with the issues typical of keyword-based representations. This article presents Contextual eVSM (C-eVSM), a content-based context-aware recommendation framework that adopts a novel semantic representation based on distributional models and entity linking techniques. Our strategy is based on two insights: first, entity linking can identify the most relevant concepts mentioned in the text and can easily map them to structured information sources, readily triggering inference and reasoning on user preferences; second, distributional models can provide a lightweight semantic representation based on term co-occurrences that can bring out latent relationships between concepts just by analyzing their usage patterns in large corpora. The resulting framework is fully domain-independent and shows better performance than state-of-the-art algorithms in several experimental settings, confirming the validity of content-based approaches and paving the way for several future research directions.
In this paper we deal with the problem of providing users with cross-language recommendations by comparing two different content-based techniques: the first one relies on a knowledge-based word sense disambiguation algorithm that uses MultiWordNet as sense inventory, while the latter is based on the so-called distributional hypothesis and exploits a dimensionality reduction technique called Random Indexing in order to build language-independent user profiles.
The rapid growth of the so-called Web 2.0 has changed surfers' behavior. A new democratic vision emerged, in which users can actively contribute to the evolution of the Web by producing new content or enriching the existing one with user-generated metadata. In this context the use of tags, keywords freely chosen by users for describing and organizing resources, spread as a model for browsing and retrieving web content. The success of this collaborative model is justified by two factors: first, information is organized in a way that closely reflects the users' mental model; second, the absence of a controlled vocabulary reduces the users' learning curve and allows the use of evolving vocabularies. Since tags are handled in a purely syntactic way, the annotations provided by users generate a very sparse and noisy tag space that limits their effectiveness for complex tasks. Consequently, tag recommenders, with their ability to provide users with the most suitable tags for the resources to be annotated, have recently emerged as a way of speeding up the process of tag convergence. The contribution of this work is a tag recommender system implementing both a collaborative and a content-based recommendation technique. The former exploits the user and community tagging behavior for producing recommendations, while the latter exploits some heuristics to extract tags directly from the textual content of resources. Results of experiments carried out on a dataset gathered from Bibsonomy show that hybrid recommendation strategies can outperform single ones, and that the way of combining them matters for obtaining more accurate results.
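A minimal sketch of how two such recommenders might be combined, assuming each one returns a tag-to-score map; the linear weighting below is an illustrative choice, not necessarily the combination strategy studied in the paper.

```python
# Illustrative hybrid step: linearly combine a collaborative and a
# content-based tag recommender (alpha and the scores are assumptions).

def hybrid_merge(collab: dict, content: dict, alpha: float = 0.6, k: int = 5):
    """Weighted linear combination of two tag->score maps."""
    tags = set(collab) | set(content)
    combined = {t: alpha * collab.get(t, 0.0) + (1 - alpha) * content.get(t, 0.0)
                for t in tags}
    return sorted(combined, key=combined.get, reverse=True)[:k]

collab_scores = {"python": 0.9, "ir": 0.4}   # from community tagging behavior
content_scores = {"ir": 0.8, "bm25": 0.5}    # extracted from the resource text
print(hybrid_merge(collab_scores, content_scores))
```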
This paper provides an overview of the work done in the ESWC Linked Open Data-enabled Recommender Systems challenge, in which we proposed an ensemble of algorithms based on popularity, Vector Space Model, Random Forests, Logistic Regression, and PageRank, running on a diverse set of semantic features. We ranked 1st in the top-N recommendation task, and 3rd in the tasks of rating prediction and diversity.
In several domains contextual information plays a key role in the recommendation task, since factors such as user location, time of the day, user mood, weather, etc., clearly affect user perception of a particular item. However, traditional recommendation approaches do not take contextual information into account, and this can limit the quality of the suggestions. In this paper we extend the enhanced Vector Space Model (eVSM) framework in order to model contextual information as well. Specifically, we propose two different context-aware approaches: in the first one we adapt the microprofiling technique, already evaluated in collaborative filtering, to content-based recommendations. Next, we define a contextual modeling technique based on distributional semantics: it builds a context-aware user profile that merges user preferences with a semantic vector space representation of the context itself. In the experimental evaluation we carried out an extensive series of tests in order to determine the best-performing configuration among the proposed ones. We also evaluated Contextual eVSM on a state-of-the-art dataset, and it emerged that our framework outperforms all the baselines in most of the experimental settings.
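The contextual-modeling idea can be sketched as a weighted blend of a distributional user profile vector with a vector representing the current context; the vectors, dimensionality and blending weight below are toy assumptions, not the eVSM formulation itself.

```python
# Toy sketch: a context-aware profile as a blend of preference and context
# vectors, scored against an item vector via dot product on unit vectors.
import numpy as np

rng = np.random.default_rng(0)
user_profile = rng.normal(size=100)   # distributional profile vector (toy)
context_vec = rng.normal(size=100)    # e.g., a vector for "weekend evening"

def contextual_profile(profile, context, beta=0.3):
    """Blend preferences with the context vector and re-normalize."""
    v = (1 - beta) * profile + beta * context
    return v / np.linalg.norm(v)

def score(item_vec, profile_vec):
    return float(item_vec @ profile_vec)  # cosine, given unit-norm vectors

item = rng.normal(size=100)
item /= np.linalg.norm(item)
print(score(item, contextual_profile(user_profile, context_vec)))
```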
The exponential growth of the Web is the most influential factor contributing to the increasing importance of text retrieval and filtering systems. However, since information exists in many languages, users could also consider as relevant documents written in languages different from the one the query is formulated in. In this context, an emerging requirement is to sift through the increasing flood of multilingual text: this poses a renewed challenge for designing effective multilingual Information Filtering systems. How can we represent user information needs or user preferences in a language-independent way? In this paper, we compare two content-based techniques able to provide users with cross-language recommendations: the first one relies on a knowledge-based word sense disambiguation technique that uses MultiWordNet as sense inventory, while the latter is based on a dimensionality reduction technique called Random Indexing and exploits the so-called distributional hypothesis in order to build language-independent user profiles. Since the experiments conducted in a movie recommendation scenario show the effectiveness of both approaches, we also tried to highlight the strengths and weaknesses of each approach in order to identify the scenarios in which a specific technique fits better.
The exponential growth of the Web is the most influential factor contributing to the increasing importance of cross-lingual text retrieval and filtering systems. Indeed, relevant information exists in different languages, thus users need to find documents in languages different from the one the query is formulated in. In this context, an emerging requirement is to sift through the increasing flood of multilingual text: this poses a renewed challenge for designing effective multilingual Information Filtering systems. Content-based filtering systems adapt their behavior to individual users by learning their preferences from documents that were already deemed relevant. The learning process aims to construct a profile of the user that can be later exploited in selecting/recommending relevant items. User profiles are generally represented using keywords in a specific language. For example, if a user likes movies whose plots are written in Italian, content-based filtering algorithms will learn a profile for that user which contains Italian words, and thus movies whose plots are written in English will not be recommended, although they might be definitely interesting. In this paper, we propose a language-independent content-based recommender system, called MARS (MultilAnguage Recommender System), that builds cross-language user profiles by shifting from the traditional keyword-based text representation to a more advanced language-independent representation based on word meanings. The proposed strategy relies on a knowledge-based word sense disambiguation technique that exploits MultiWordNet as sense inventory. As a consequence, content-based user profiles become language-independent and can be exploited for recommending items represented in a language different from the one used in the content-based user profile. Experiments conducted in a movie recommendation scenario show the effectiveness of the approach.
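A toy illustration of why sense-based profiles are language-independent: if words from different languages map to shared sense identifiers (here a tiny hand-made table standing in for MultiWordNet plus word sense disambiguation), the resulting bag-of-synsets profile is identical across languages.

```python
# Toy synset table: the IDs are invented, standing in for MultiWordNet senses.
TOY_SYNSETS = {
    "movie": "syn#02345", "film": "syn#02345",        # EN / IT
    "actor": "syn#01822", "attore": "syn#01822",      # EN / IT
}

def to_synset_profile(text: str) -> dict:
    """Bag-of-synsets: counts survive translation because IDs are shared."""
    profile = {}
    for w in text.lower().split():
        sid = TOY_SYNSETS.get(w)
        if sid:
            profile[sid] = profile.get(sid, 0) + 1
    return profile

print(to_synset_profile("film con attore"))    # Italian text ...
print(to_synset_profile("movie with actor"))   # ... yields the same profile
```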
This paper presents the preliminary results of a joint research project about Smart Cities. This project adopts a multi-disciplinary approach that combines artificial intelligence techniques with psychology research to monitor the current state of the city of L'Aquila after the dreadful earthquake of April 2009. This work focuses on the description of a semantic content analysis module. This component, integrated into L'Aquila Social Urban Network (SUN), combines Natural Language Processing (NLP) and Artificial Intelligence (AI) to deeply analyze the content produced by citizens on social platforms, in order to map social data to social indicators such as cohesion, sense of belonging, and so on. The research builds on the insight that social data can supply a lot of information about people's latent feelings, opinions and sentiments. Within the project, this trustworthy snapshot of the city is used by community promoters to proactively propose initiatives aimed at empowering the social capital of the city and recovering the urban structure, which was disrupted after the 'diaspora' of citizens into the so-called 'new towns'.
Personalized electronic program guides help users overcome information overload in the TV and video domain by exploiting recommender systems that automatically compile lists of novel and diverse video assets, based on implicitly or explicitly defined user preferences. In this context, we assume that user preferences can be specified by program genres (documentary, sports, ...) and that an asset can be labeled by one or more program genres, thus allowing an initial and coarse preselection of potentially interesting assets. As these assets may come from various sources, program genre labels may not be consistent among these sources, or not even be given at all, while we assume that each asset has a possibly short textual description. In this paper, we tackle this problem by considering whether those textual descriptions can be effectively used to automatically retrieve the most related TV shows for a specific program genre. More specifically, we compare a statistical approach called logistic regression with an enhanced version of the commonly used vector space model, called random indexing, where the latter is extended by means of a negation operator based on quantum logic. We also apply a new feature generation technique based on explicit semantic analysis for enriching the textual description associated to a TV show with additional features extracted from Wikipedia.
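The negation operator based on quantum logic mentioned above can be sketched as orthogonal projection: to model "a AND NOT b", subtract from a its component along b, leaving a vector orthogonal to b. The vectors below are random stand-ins, not actual genre vectors from the paper.

```python
# Sketch of quantum-logic-style negation (in the spirit of the operator in the
# Semantic Vectors package): project `a` onto the subspace orthogonal to `b`.
import numpy as np

def negate(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return a minus its projection on b, i.e. a vector orthogonal to b."""
    b_unit = b / np.linalg.norm(b)
    return a - (a @ b_unit) * b_unit

rng = np.random.default_rng(1)
genre_sports = rng.normal(size=50)   # hypothetical genre vectors
genre_news = rng.normal(size=50)
query = negate(genre_sports, genre_news)  # "sports AND NOT news"
# The result has (numerically) zero component along the negated vector:
print(abs(query @ (genre_news / np.linalg.norm(genre_news))) < 1e-9)  # True
```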
The recent explosion of Big Data offers new chances and challenges to all those platforms that provide personalized access to information sources, such as recommender systems and personalized search engines. In this context, social networks are gaining more and more interest, since they represent a perfect source for triggering personalization tasks. Indeed, users naturally leave on these platforms a lot of data about their preferences, feelings, and friendships, and those data are really valuable for addressing the cold start problem of recommender systems. On the other hand, since the content shared on social networks is noisy and heterogeneous, the extracted information must be carefully processed to build user profiles that can effectively mirror user interests and needs. In this paper we investigate the effectiveness of external knowledge derived from Wikipedia in representing both documents and user profiles in a recommendation scenario. Specifically, we compare a classical keyword-based representation with two techniques that are able to map unstructured text to Wikipedia pages. The advantage of using this representation is that documents and user profiles become richer, more human-readable, less noisy, and potentially connected to the Linked Open Data (LOD) cloud. The goal of our preliminary experimental evaluation was twofold: 1) to define the representation that best reflects user preferences; 2) to define the representation that provides the best predictive accuracy. We implemented a news recommender for a preliminary evaluation of our model. We involved more than 50 Facebook and Twitter users, and we demonstrated that the encyclopedic-based representation is an effective way of modeling both user profiles and documents.
The main objective of the "Mappa Italiana dell'Intolleranza" (Italian Map of Intolerance) project was to analyze the content produced on social networks in order to measure the level of intolerance in the country, with respect to five themes: homophobia, racism, violence against women, anti-Semitism and disability. The project, coordinated by Vox - Osservatorio sui diritti, involved a synergy between the Università degli Studi di Milano, the Università La Sapienza of Rome, and the Department of Computer Science of the Università degli Studi di Bari, which provided a Big Data & Content Analytics platform for the semantic analysis of social content.
Throughout the last decade, the area of Digital Libraries (DL) has attracted more and more interest from both the research and development communities. Likewise, since the release of new platforms enriches them with new features and makes DLs more powerful and effective, the number of web sites integrating these kinds of tools is rapidly growing. In this paper we propose an approach for the exploitation of digital libraries for personalization goals in a cultural heritage scenario. Specifically, we integrated FIRSt (Folksonomy-based Item Recommender syStem), a content-based recommender system developed at the University of Bari, and Fedora, a flexible digital library architecture, in a framework for the adaptive fruition of cultural heritage implemented within the activities of the CHAT research project. In this scenario, the role of the digital library was to store information (such as textual and multimedia content) about paintings gathered from the Vatican Picture Gallery and to provide it in a multimodal and personalized way through a PDA device given to users before their visit to the museum. This paper describes the system architecture of our recommender system and its integration in the framework implemented for the CHAT project, showing how this recommendation model has been applied to recommend the artworks located at the Vatican Picture Gallery (Pinacoteca Vaticana), providing users with a personalized museum tour tailored to their tastes. The experimental evaluation we performed also confirmed that these recommendation services are able to capture real user preferences, thus improving the experience of cultural heritage fruition.
The exponential growth of the Web is the most influential factor contributing to the increasing importance of cross-lingual text retrieval and filtering systems. Indeed, relevant information exists in different languages, thus users need to find documents in languages different from the one the query is formulated in. In this context, an emerging requirement is to sift through the increasing flood of multilingual text: this poses a renewed challenge for designing effective multilingual Information Filtering systems. Content-based filtering systems adapt their behavior to individual users by learning their preferences from documents that were already deemed relevant. The learning process aims to construct a profile of the user that can be later exploited in selecting/recommending relevant items. User profiles are generally represented using keywords in a specific language. For example, if a user likes movies whose plots are written in Italian, a content-based filtering algorithm will learn a profile for that user which contains Italian words, thus failing to recommend movies whose plots are written in English, although they might be definitely interesting. Moreover, keywords suffer from typical Information Retrieval-related problems such as polysemy and synonymy. In this paper, we propose a language-independent content-based recommender system, called MARS (MultilAnguage Recommender System), that builds cross-language user profiles by shifting from the traditional keyword-based text representation to a more complex language-independent representation based on word meanings. The proposed strategy relies on a knowledge-based word sense disambiguation technique that exploits MultiWordNet as sense inventory. As a consequence, content-based user profiles become language-independent and can be exploited for recommending items represented in a language different from the one used in the content-based user profile. Experiments conducted in a movie recommendation scenario show the effectiveness of the approach.
This paper presents MyMusic, a system that exploits social media sources for generating personalized music playlists. This work is based on the idea that information extracted from social networks, such as Facebook and Last.fm, might be effectively exploited for personalization tasks. Indeed, information related to the music preferences of users can be easily gathered from social platforms and used to define a model of user interests. The use of social media is a very cheap and effective way to overcome the classical cold start problem of recommender systems. In this work we enriched social media-based playlists with new artists related to those the user already likes. Specifically, we compare two different enrichment techniques: the first leverages the knowledge stored in DBpedia, the structured version of Wikipedia, while the second is based on the content-based similarity between descriptions of artists. The final playlist is ranked and presented to the user, who can listen to the songs and express her feedback. A prototype version of MyMusic was made available online in order to carry out a preliminary user study to evaluate the best enrichment strategy. The preliminary results encourage further work along this research direction.
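A DBpedia-based enrichment step could, for instance, be implemented with a SPARQL query against the public endpoint; the property used below (dbo:associatedMusicalArtist) is an assumption for illustration, not necessarily the one adopted by MyMusic. It requires the SPARQLWrapper package and network access.

```python
# Hedged sketch: fetch artists related to a seed artist from DBpedia.
from SPARQLWrapper import SPARQLWrapper, JSON

def related_artists(artist_uri: str, limit: int = 10):
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT DISTINCT ?other WHERE {{
            <{artist_uri}> dbo:associatedMusicalArtist ?other .
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [b["other"]["value"] for b in results["results"]["bindings"]]

print(related_artists("http://dbpedia.org/resource/The_Beatles"))
```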
In recent years, hundreds of social network sites have been launched, with both professional (e.g., LinkedIn) and non-professional (e.g., MySpace, Facebook) orientations. This resulted in a renewed information overload problem, but it also provided a new and unforeseen way of gathering useful, accurate and constantly updated information about user interests and tastes. Content-based recommender systems can leverage the wealth of data emerging from social networks to build user profiles in which representations of the user interests are maintained. The idea proposed in this paper is to extract content-based user profiles from the data available in the LinkedIn social network, to obtain an image of the users' interests that can be used to recommend interesting academic research papers. A preliminary experiment provided interesting results which deserve further attention.
The exponential growth of the Web is the most influential factor contributing to the increasing importance of cross-lingual text retrieval and filtering systems. Indeed, relevant information exists in different languages, thus users need to find documents in languages different from the one the query is formulated in. In this context, an emerging requirement is to sift through the increasing flood of multilingual text: this poses a renewed challenge for designing effective multilingual Information Filtering systems. In this paper, we propose a language-independent content-based recommender system, called MARS (MultilAnguage Recommender System), that builds cross-language user profiles by shifting from the traditional keyword-based text representation to a more complex language-independent representation based on word meanings. As a consequence, the recommender system is able to suggest items represented in a language different from the one used in the content-based user profile. Experiments conducted in a movie recommendation scenario show the effectiveness of the approach.
Recommendation of financial investment strategies is a complex and knowledge-intensive task. Typically, financial advisors have to discuss at length with their wealthy clients and have to sift through several investment proposals before finding one able to completely meet the investors' needs and constraints. As a consequence, a recent trend in wealth management is to improve the advisory process by exploiting recommendation technologies. This paper proposes a framework for the recommendation of asset allocation strategies which combines case-based reasoning with a novel diversification strategy to support financial advisors in the task of proposing diverse and personalized investment portfolios. The performance of the framework has been evaluated by means of an experimental session involving 1172 real users, and the results show that the yield obtained by recommended portfolios exceeds that of portfolios proposed by human advisors in most experimental settings, while meeting the preferred risk profile. Furthermore, our diversification strategy shows promising results in terms of both diversity and average yield.
Wealth management services have become a priority for most financial services firms. As investors press wealth managers to justify their value proposition, turbulence in financial markets has reinforced the need to improve the advisory offering with more customized and sophisticated services. As a consequence, a recent trend in wealth management is to improve the advisory process by exploiting recommendation technologies. However, widespread recommendation approaches, such as content-based (CB) and collaborative filtering (CF), can hardly be put into practice in this domain. In fact, each user is typically modeled through his risk profile and a few other simple features, while each financial product is described through a rating provided by credit rating agencies, an average yield and the category it belongs to. In this scenario a pure CB strategy is likely to fail, since the content information is too poor and not meaningful enough to feed a CB recommendation algorithm. Furthermore, the over-specialization problem, typical of CB recommenders, collides with the fact that turbulence and fluctuations in financial markets suggest changing and diversifying the investments over time. Similarly, CF algorithms can hardly be adopted, since they may lead to the well-known problem of flocking: given that user-based CF provides recommendations by assuming that a user is interested in the asset classes that other people similar to her have already invested in, this could move many similar users to invest in the same asset classes at the same time, making the recommendation algorithm the victim of potential trader attacks. These dynamics suggest focusing on different recommendation paradigms. Given that financial advisors have to analyze and sift through several investment portfolios before providing the user with a solution able to meet his investment goals, the insight behind our recommendation framework is to exploit case-based reasoning (CBR) to tailor investment proposals on the ground of a case base of previously proposed investments. Our recommendation process is based on the typical CBR workflow and is structured in three different steps. 1) Retrieve and Reuse: retrieval of similar portfolios is performed by representing each user through a feature vector (as features, the risk profile, inferred through the standard MiFID questionnaire, investment goals, temporal goals, financial experience, and financial situation were chosen; each feature is represented on a five-point ordinal scale, from very low to very high). Next, cosine similarity is adopted to retrieve the most similar users (along with the portfolios they agreed upon) from the case base. 2) Revise: the candidate solutions retrieved by the first step are typically too many to be consulted by a human advisor, so the Revise step further filters this set to obtain the final solutions. To revise the candidate solutions, several techniques were compared, including: a basic (temporal) ranking; a Greedy diversification, which implements a greedy algorithm to select the solutions with the best compromise between quality and diversity; and FCV, a novel scoring methodology which computes how close the distribution of the asset classes in the portfolio is to the optimal one. 3) Review and Retain: in the Review step, the human advisor and the client can further discuss and modify the portfolio before generating the final solution for the user. If the yield obtained by the newly recommended portfolio is acceptable, it is finally retained in the case base.
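A compact sketch of steps 1 and 2 of this workflow, with invented case data: investors are five-feature vectors on the 1-5 ordinal scale, retrieval uses cosine similarity, and a greedy MMR-style pass balances quality against diversity (the 0.7/0.3 weights and the acceptance threshold are assumptions, not the paper's tuned values).

```python
# Hedged sketch of CBR retrieval + greedy diversification over a toy case base.
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Each case: (user feature vector on a 1..5 scale, agreed portfolio id)
case_base = [
    (np.array([3, 4, 2, 5, 3]), "P1"),
    (np.array([3, 4, 2, 4, 3]), "P2"),
    (np.array([1, 2, 5, 1, 2]), "P3"),
]
target = np.array([3, 4, 2, 5, 2])  # risk profile, goals, experience, ...

# Step 1 (Retrieve): rank cases by similarity to the target investor
ranked = sorted(case_base, key=lambda c: cos(target, c[0]), reverse=True)

# Step 2 (Revise, greedy variant): keep portfolios balancing quality & diversity
selected = [ranked[0]]
for cand in ranked[1:]:
    quality = cos(target, cand[0])
    diversity = min(1 - cos(cand[0], s[0]) for s in selected)
    if 0.7 * quality + 0.3 * diversity > 0.5:  # assumed acceptance threshold
        selected.append(cand)
print([pid for _, pid in selected])
```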
The vector space model (VSM) has emerged over almost three decades as one of the most effective approaches in the area of Information Retrieval (IR), thanks to its good compromise between expressivity, effectiveness and simplicity. Although Information Retrieval and Information Filtering (IF) undoubtedly represent two related research areas, the use of the VSM in Information Filtering is much less analyzed, especially for content-based recommender systems. The goal of this work is twofold: first, we investigate the impact of the VSM in the area of content-based recommender systems; second, since the VSM suffers from well-known problems, such as its high dimensionality and its inability to manage information coming from negative user preferences, we propose techniques able to effectively tackle these drawbacks. Specifically, we exploited Random Indexing for dimensionality reduction and the negation operator implemented in the Semantic Vectors open source package to model negative user preferences. The results of an experimental evaluation performed on these enhanced vector space models (eVSM), together with the potential applications of these approaches, confirm the effectiveness of the model and lead us to further investigate these techniques.
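Random Indexing, mentioned above as the dimensionality-reduction technique, can be sketched in a few lines: each term receives a sparse random ternary "index vector", and a document vector is simply the normalized sum of its terms' index vectors. The parameters below are toy values, far smaller than realistic settings.

```python
# Minimal Random Indexing sketch (toy dimensionality and sparsity).
import numpy as np

DIM, NONZERO = 100, 4
rng = np.random.default_rng(42)
_index = {}

def index_vector(term: str) -> np.ndarray:
    """Sparse ternary vector with a few random +1/-1 components per term."""
    if term not in _index:
        v = np.zeros(DIM)
        pos = rng.choice(DIM, size=NONZERO, replace=False)
        v[pos] = rng.choice([-1, 1], size=NONZERO)
        _index[term] = v
    return _index[term]

def doc_vector(text: str) -> np.ndarray:
    """Document vector: normalized sum of the index vectors of its terms."""
    v = sum(index_vector(t) for t in text.lower().split())
    return v / (np.linalg.norm(v) or 1.0)

d1 = doc_vector("space vector model for retrieval")
d2 = doc_vector("vector space model in information retrieval")
print(float(d1 @ d2))  # similarity of two near-duplicate documents
```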
Artificial Intelligence technologies are increasingly used within several software systems, ranging from Web services to mobile applications. It is no doubt true that the more AI algorithms and methods are used, the more they tend to depart from a pure "AI" spirit and come to be regarded as standard software. In a sense, AI seems strongly connected with ideas, methods and tools that are not (yet) used by the general public. On the contrary, a more realistic view would describe it as a rich and pervasive set of successful paradigms and approaches. Industry currently perceives semantic technologies as a key contribution of AI to innovation. In this paper a survey of current industrial experiences is used to discuss different semantic technologies at work in heterogeneous areas, ranging from Web services to semantic search and recommender systems. The resulting picture confirms the vitality of the area and allows us to sketch a general taxonomy of approaches, which is the main contribution of this paper.