Integrating bioinformatics resources for modelling Human non-coding RNA networks
Abstract
Introduction
Non-coding RNAs (ncRNAs) serve as regulatory molecules in a variety of biological processes. They are broadly classified by size into two major categories: small non-coding RNAs (sncRNAs), such as microRNAs (miRNAs), and long non-coding RNAs (lncRNAs). The lncRNAs have a broader spectrum of functions and are, therefore, a potential new class of cancer therapeutic targets [1,2]. In addition, there are other types of ncRNAs whose roles are not yet clear: circular-RNA, lincRNA, scRNA, sense-intronic and vault-RNA. New advances in translational research will require an accurate understanding of the functional relationships between protein-coding and ncRNA categories, as well as of sponge regulatory networks [3,4]. To achieve this goal, we have built an integrated bioinformatics knowledge base that collects non-redundant annotations of human ncRNAs, their sequences and their interactors, and that provides comprehensive access to the available knowledge on ncRNAs, their interactions with other molecules and their associated diseases. As key characteristics, the database overcomes the problem of different nomenclatures used by different sources and provides new clues about ncRNA functions through interactions inferred by network reconstruction [5].
Methods
ncRNA interactions include physical relationships (i.e., molecular binding between ncRNAs and DNA, RNAs or proteins) and functional relationships (i.e., co-expression, regulation, associated diseases, statistical and functional associations). Interactions are stored in the database in the form 'ncRNA-mate', where the mate entity belongs to one of the following types: ncRNA, protein-coding RNA (pcRNA), gene, protein, pseudogene or phenotype.
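
The 'ncRNA-mate' form described above can be illustrated with a minimal Python sketch. It is not the authors' schema: the class names (`MateType`, `Kind`, `Interaction`), the helper `count_by_mate_type` and the identifiers in the example are hypothetical, chosen only to show how a typed interaction record might be represented and tallied per mate type.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class MateType(Enum):
    """Mate entity types listed in the abstract."""
    NCRNA = "ncRNA"
    PCRNA = "pcRNA"
    GENE = "gene"
    PROTEIN = "protein"
    PSEUDOGENE = "pseudogene"
    PHENOTYPE = "phenotype"


class Kind(Enum):
    """Physical (binding) versus functional relationships."""
    PHYSICAL = "physical"
    FUNCTIONAL = "functional"


@dataclass(frozen=True)
class Interaction:
    """One 'ncRNA-mate' record (hypothetical field names)."""
    ncrna_id: str
    mate_id: str
    mate_type: MateType
    kind: Kind


def count_by_mate_type(interactions):
    """Count distinct interactions per 'ncRNA-<mate type>' label."""
    # set() drops exact duplicate records before counting.
    return Counter(f"ncRNA-{i.mate_type.value}" for i in set(interactions))


# Made-up identifiers, for illustration only.
example = [
    Interaction("NC_A", "NC_B", MateType.NCRNA, Kind.PHYSICAL),
    Interaction("NC_A", "GENE_X", MateType.GENE, Kind.FUNCTIONAL),
    Interaction("NC_A", "GENE_X", MateType.GENE, Kind.FUNCTIONAL),  # duplicate
]
counts = count_by_mate_type(example)
```

Making the record immutable (`frozen=True`) keeps it hashable, so exact duplicates can be collapsed with a set; how redundancy is actually resolved in the database is described in the text, not in this sketch.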
To ensure the data quality of our interaction database, we have developed a series of Extraction, Transformation and Loading (ETL) modules that extract, collect and integrate primary annotations, sequences and interactions from different public biological resources. The extracted biological entities and their relations are modelled as a network, a mathematical object composed of nodes (entities) and edges (relations) [5]. Redundancy among entities has been identified through cross-link references and sequence similarity, using the Cleanup software [6]. Non-coding RNAs are classified into biotypes, associated with Sequence Ontology terms [7] and integrated with data on protein-coding RNAs (pcRNAs), genes, proteins, pseudogenes and phenotypes. Furthermore, we extended the cross-reference network with data provided by Ensembl [8], using the biomaRt library of Bioconductor [9].
Results
The interaction database contains the following numbers of distinct entities: 168,058 ncRNAs, 5,009 pcRNAs, 52,811 genes, 1,999 proteins, 15,940 pseudogenes and 849 phenotypes. Moreover, the numbers of interactions, grouped by mate type, are: 130,383 ncRNA-ncRNA, 55,048 ncRNA-pcRNA, 1,458,925 ncRNA-gene, 99,653 ncRNA-protein, 70,482 ncRNA-phenotype, 17,217 ncR
All authors
V. Bonnici; G. De Caro; S. Liuni; D. D'Elia; N. Bombieri; R. Giugno; F. Licciulli
Year of publication
2016