Social Database for Biodiversity
Abstract
Motivation
Biodiversity research deals with data from many different domains (e.g., Biology, Geography, Evolutionary Studies, Genomics, Taxonomy, Environmental Sciences), which must be integrated to yield valuable biodiversity knowledge. Collecting and integrating data from so many heterogeneous resources is not a trivial task. Data are extremely scattered, heterogeneous in format and purpose, and protected in the repositories of several research institutes. Driven by the widespread web trend of sharing information among groups of people with the same interests (social networks), and by a new type of database architecture, the dynamic distributed federated database, we propose a new paradigm of data integration in the biodiversity domain. Here we present a new approach to the development of a Knowledge Base aimed at the collection, integration and analysis of biodiversity data, implemented as a product of the MBLab project.
Methods
The implementation of the Biodiversity Knowledge Base is based on the integration of several components: a robust Database Management System (IBM DB2) managing the large volume of information from public databases such as GenBank; a set of GaianDB nodes [1] to manage remote private collections of biodiversity data; and the IBM Federator Server to implement the general conceptual schema integrating all biodiversity databases available across the remote nodes of the MBLab project partners.
Results
GaianDB is a Dynamic Distributed Federated Database of sources whose growth is regulated by biologically inspired principles and graph-theoretic methods. With the GaianDB network architecture, data remain on the remote research group servers, and each database owner is responsible for their integrity, availability and sharing.
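The federated arrangement described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the GaianDB API: the `Node` class, its `connect` and `query` methods, the node names and the sample records are all hypothetical. Each node keeps its own records locally, and a query started at any node is forwarded across the graph and the matching rows are aggregated.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one federated node (not the GaianDB API)."""
    name: str
    # Records stay with their owner; they are never copied to a central store.
    records: list = field(default_factory=list)
    neighbours: list = field(default_factory=list, repr=False, compare=False)

    def connect(self, other: "Node") -> None:
        """Create an undirected link between two nodes."""
        self.neighbours.append(other)
        other.neighbours.append(self)

    def query(self, predicate, _visited=None) -> list:
        """Answer from local data, then forward to unvisited neighbours."""
        visited = set() if _visited is None else _visited
        if self.name in visited:
            return []
        visited.add(self.name)
        # Tag each hit with the node it came from.
        hits = [dict(r, source=self.name) for r in self.records if predicate(r)]
        for neighbour in self.neighbours:
            hits.extend(neighbour.query(predicate, visited))
        return hits


# Made-up sample data, for illustration only.
a = Node("node_a", [{"taxon": "Fusarium", "item": 1}])
b = Node("node_b", [{"taxon": "Fusarium", "item": 2},
                    {"taxon": "Aspergillus", "item": 3}])
c = Node("node_c", [{"taxon": "Olea", "item": 4}])
a.connect(b)
b.connect(c)

# The query enters at node_c, which holds no matching record itself.
results = c.query(lambda r: r["taxon"] == "Fusarium")
```

The `visited` set prevents a query from looping in a cyclic network. A real deployment would also have to handle access control, node failures and schema mapping, none of which is modelled here.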
Each vertex of this network is a suitable entry point that receives the user query and responds with an output aggregating the pieces of information retrieved from the different data sources spread across the network. To integrate GenBank molecular data into the MBLabDB, we built an efficient and reliable ETL (Extraction, Transformation and Load) module, implemented in the CLIPS rule-based programming language. The ETL extracts information from the feature-based GenBank entries and fits it into the MBLabDB schema. Molecular data collections are structured following a Chado-like model [2], using Sequence Ontology entities and relations. This allows data to be retrieved using the biological concepts expressed by the Sequence Ontology [3]. The main result of this work is the development of a standard conceptual schema and a knowledge base architecture tailored to biodiversity data collection, integration and analysis. The database is modeled on six main sections: Taxonomic, Individual, Collection, Supply chain, Experimental molecular data. Currently, two biodiversity data collections have been integrated using GaianDB: the ITEM Collection [4] located at the I
Apulian author
All authors
-
Pannarale P. ; Scioscia G. ; Rubino F. ; Leo P. ; Pappadà G. ; D'Elia D. ; Grillo G. ; Vicario S. ; De Caro G. ; Gisel A. ; Mulè G. ; Susca A. ; Catalano D. ; Licciulli F.
Volume/Journal title
Not available
Year of publication
2010
ISSN
Not available
ISBN
978-88-6194-079-6
Number of WoS citations
No citations
Last citation update
Not available
Number of Scopus citations
Not available
Last citation update
Not available
ERC sectors
Not available
ASJC codes
Not available