Integrating bioinformatics resources for modelling human non-coding RNA networks

Abstract

Introduction

Non-coding RNAs (ncRNAs) act as regulatory molecules in a variety of biological processes. According to their size, they are roughly classified into two major categories: small non-coding RNAs (sncRNAs), such as microRNAs (miRNAs), and long non-coding RNAs (lncRNAs). lncRNAs have a broader spectrum of functions and are, therefore, a potential new class of cancer therapeutic targets [1,2]. In addition, there are other types of ncRNAs whose role is not yet clear: circular RNA, lincRNA, scRNA, sense-intronic and vault RNA. New advances in translational research will require an accurate understanding of the functional relationships between protein-coding and ncRNA categories, as well as of sponge regulatory networks [3,4]. To achieve this goal, we have built an integrated bioinformatics knowledge base that collects non-redundant annotations, sequences and interactors of human ncRNAs. It provides comprehensive access to the available knowledge on ncRNAs, their interactions with other molecules and their associated diseases. Its key features are that it overcomes the problem of the different nomenclatures used by different sources and that it provides new clues about ncRNA functions through interactions inferred by network reconstruction [5].

Methods

ncRNA interactions include physical relationships (i.e., molecular binding between ncRNAs and DNA, RNAs or proteins) and functional relationships (i.e., co-expression, regulation, associated diseases, statistical and functional associations). Interactions are stored in the database in the form 'ncRNA-mate', where the mate entity belongs to one of the following types: ncRNA, protein-coding RNA (pcRNA), gene, protein, pseudogene and phenotype.
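
As an illustration of the 'ncRNA-mate' model, the interaction record can be sketched in Python. This is a minimal sketch under our own assumptions, not the schema of the database: the names `MateType`, `Relation`, `Interaction` and `make_interaction`, the `source` field, and the example identifiers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class MateType(Enum):
    """Entity types that may appear as the 'mate' of an ncRNA."""
    NCRNA = "ncRNA"
    PCRNA = "pcRNA"
    GENE = "gene"
    PROTEIN = "protein"
    PSEUDOGENE = "pseudogene"
    PHENOTYPE = "phenotype"


class Relation(Enum):
    """Broad interaction classes: molecular binding vs functional association."""
    PHYSICAL = "physical"
    FUNCTIONAL = "functional"


@dataclass(frozen=True)
class Interaction:
    """One 'ncRNA-mate' record; the first member is always an ncRNA."""
    ncrna_id: str
    mate_id: str
    mate_type: MateType
    relation: Relation
    source: str  # hypothetical field: public resource the record came from


def make_interaction(ncrna_id: str, mate_id: str, mate_type: str,
                     relation: str, source: str) -> Interaction:
    """Build a record from raw strings.

    MateType(...) and Relation(...) raise ValueError for values outside the
    allowed vocabularies, so malformed rows are rejected at load time.
    """
    return Interaction(ncrna_id, mate_id, MateType(mate_type),
                       Relation(relation), source)


# Hypothetical identifiers, for illustration only
example = make_interaction("ncRNA_1", "gene_1", "gene", "functional", "source_A")
print(example.mate_type.value, example.relation.value)  # gene functional
```

Encoding the mate types and relation classes as enumerations makes the controlled vocabulary explicit, so an unexpected type in a source file fails loudly during loading instead of silently creating an unclassified node.
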
To ensure the quality of the interaction data, we developed a series of Extraction, Transformation and Loading (ETL) modules that extract, collect and integrate primary annotations, sequences and interactions from different public biological resources. The extracted biological entities and their relations are modelled as a network, a mathematical object composed of nodes (entities) and edges (relations) [5]. Redundant entities were identified through cross-link references and sequence similarity using the Cleanup software [6]. Non-coding RNAs are classified into biotypes, which are associated with Sequence Ontology terms [7], and integrated with data on protein-coding RNAs (pcRNAs), genes, proteins, pseudogenes and phenotypes. Furthermore, we extended the cross-reference network with data provided by Ensembl [8], using the biomaRt library of Bioconductor [9].

Results

The total numbers of distinct entities collected in our interaction database are: 168,058 ncRNAs, 5,009 pcRNAs, 52,811 genes, 1,999 proteins, 15,940 pseudogenes and 849 phenotypes. The total numbers of interactions, broken down by mate type, are: 130,383 ncRNA-ncRNA, 55,048 ncRNA-pcRNA, 1,458,925 ncRNA-gene, 99,653 ncRNA-protein, 70,482 ncRNA-phenotype, 17,217 ncR

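The network reconstruction and cross-reference-based redundancy removal described in the Methods can be sketched with a union-find (disjoint-set) structure. This is a simplified illustration, not the Cleanup software [6] or the actual ETL modules: it merges records that share an external identifier, ignores the sequence-similarity step, and uses hypothetical function names and toy identifiers. It then counts the deduplicated interactions per mate type, in the spirit of the Results.

```python
from collections import Counter


class UnionFind:
    """Disjoint-set forest used to group records that describe the same entity."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to flatten the tree
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_by_xrefs(xrefs):
    """Map each record ID to a canonical ID (hypothetical helper).

    xrefs maps a record ID to the set of external identifiers it carries.
    Records sharing any external identifier, directly or transitively, are
    merged. Sequence similarity, also used in the real pipeline, is not modelled.
    """
    uf = UnionFind()
    first_owner = {}  # external identifier -> first record seen with it
    for record, ext_ids in xrefs.items():
        uf.find(record)  # register singleton records too
        for ext in ext_ids:
            if ext in first_owner:
                uf.union(first_owner[ext], record)
            else:
                first_owner[ext] = record
    return {record: uf.find(record) for record in xrefs}


def build_network(edges, canonical):
    """Rewrite (ncrna, mate, mate_type) edges onto canonical IDs, dropping duplicates."""
    return {(canonical.get(n, n), canonical.get(m, m), t) for n, m, t in edges}


def count_by_mate_type(network):
    """Count edges per mate type (ncRNA-gene, ncRNA-protein, ...)."""
    return Counter(t for _, _, t in network)


# Toy input with hypothetical record and cross-reference IDs
xrefs = {
    "srcA:1": {"X1"},
    "srcB:7": {"X1", "Y9"},
    "srcC:3": {"Y9"},
    "srcA:2": {"Z5"},
}
edges = [
    ("srcA:1", "g1", "gene"),
    ("srcB:7", "g1", "gene"),  # same interaction as above, from another source
    ("srcC:3", "p1", "protein"),
    ("srcA:2", "g1", "gene"),
]
canonical = merge_by_xrefs(xrefs)
network = build_network(edges, canonical)
print(count_by_mate_type(network))  # Counter({'gene': 2, 'protein': 1})
```

Union-find suits this step because cross-references link records transitively: two records with no identifier in common still collapse into one node when a third record shares an identifier with each of them, as `srcA:1` and `srcC:3` do through `srcB:7`.
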

All authors

  • V. Bonnici; G. De Caro; S. Liuni; D. D'Elia; N. Bombieri; R. Giugno; F. Licciulli

Year of publication

2016
