Persone Apulia Research Gate

2010 Annual General Meeting - Publicity & Public Relations Project Committee Report

The P&PR-PC main goal is to enhance EMBnet visibility and to establish cooperation/collaboration with other (even dissimilar) major groups, networks and societies. This is the report of the Committee's activities from November 2009 to June 2010.

2014 Annual General Meeting: Publicity & Public Relations Project Committee Report

To properly respond to the most urgent needs raised during the EMBnet workshop held in Valencia in May 2013, the Publicity & Public Relations Project Committee (P&PR PC) established two task-forces: i) a website task-force, comprising Rafael Jimenez and Cesar Bonavides-Martinez; and ii) a communication strategy task-force, comprising Vicky Schneider and Rubina Kalra. An overview of activities and achievements was given by the PC Chair, and discussed during the EMBnet 2014 workshop held in Lyon, 30 May. The programme also included a "Website hands-on" by Rafael Jimenez on "How to use the EMBnet website, add and manage content" aiming to expose members to some of its basic functions and services, and to practise their use. This article describes the achievements of the P&PR PC since June 2013, and plans for the next year.

2016 Big Data Forum for Life and Health Sciences

The discovery of the pivotal roles of non-coding RNAs (ncRNAs) in gene expression and genome maintenance represents one of the most significant revolutions of the last decades in life science research. Some ncRNA classes, such as ribosomal RNAs and transfer RNAs, have been known for a very long time; others, such as micro RNAs (miRNAs), Piwi-interacting RNAs (piRNAs), and long non-coding RNAs (lncRNAs), were discovered in more recent years. However, the discovery of the great diversity and magnitude of this family of regulators is a recent achievement. Indeed, it is only in the last decade that, thanks to the advent of next generation sequencing technologies, large-scale sequencing studies have allowed scientists to systematically analyse ncRNAs for their real size and functional activities. These studies have identified a surprisingly large number of new and diverse ncRNA genes that are emerging as central to many aspects of plant and animal gene regulation. Domenica D'Elia will present some of the most relevant and recent results in this fascinating research domain, highlighting her research and recent achievements.

454 GS-FLX TITANIUM PLATFORM: THE EXPERIENCE OF ITB-BA

Perfectly integrated in the scenario of modern research, which requires advanced technologies to be applied in a wide range of research fields, the Institute for Biomedical Technologies in Bari has equipped its Labs with both the 454 Genome Sequencer FLX Titanium by Roche and with a powerful bioinformatics platform (hardware and software facilities) for managing and analysing NGS data. To cope with the fast rate at which NGS technologies are evolving, the development and setting up of new experimental protocols and of bioinformatics analysis pipelines is a challenge that researchers have to face daily. Major obstacles in NGS "omics" research are - at experimental level - finding the best way to extract DNA or RNAs from samples and obtaining good libraries for sequencing, and - at bioinformatics level - data storage, transfer, and analysis. The classical statistical methods and computational algorithms are inadequate for analysing the large amount of sequence data produced by new NGS technologies. Novel analytical strategies are urgently needed for exploring new features of sequencing data, integrating various genomic and epigenomic data, unravelling the structure, organisation, and function of the human genome, understanding fundamental principles of genomic biology, and discovering genetic and nongenetic bases of diseases. The Genomics Research team of ITB in Bari is gaining great acquaintance with high throughput sequencing procedure and it has developed a new protocol for the preparation and amplification of representative cDNA libraries to be sequenced by NGS platforms. This protocol is patent pending in Europe. The Bioinformatics Group is focused on the development of bioinformatics tools for analysing data obtained by different NGS platforms for diverse projects spanning from molecular studies in cancer research to biodiversity studies.
In this respect, the ITB-BA, in collaboration with other CNR and Academic Institutions, is presently involved in several projects for studying the molecular biodiversity in metagenomics and metatranscriptomics within Biomedical, Food and Environmental fields. In particular we are:
- studying the transcriptome of normal and pathological samples with the aim of identifying genes, new mRNA isoforms, microRNA and genome wide mapping of transcription factors involved in the etiopathogenesis of human diseases;
- analysing the exome and the transcriptome profile in short children with particular attention to the involvement of the p53 oncosuppressor gene family members (p53, p63, p73) in the regulation of the genes involved in growth;
- investigating the possibility that only particular viral genotypes of Epstein Barr virus (EBV), that ubiquitously infects humans, can be associated with the etiopathogenesis of multiple sclerosis;
- investigating the taxonomical complexity of microbial communities living in food industry "habitats", particularly in winemaking chain, shedding li

A BIOINFORMATICS WORKFLOW FOR THE ANALYSIS OF NONCODING RNAs FROM DATA GENERATED BY DEEP-SEQUENCING

A bioinformatics workflow for the analysis of transcriptome data generated by deep-sequencing

The huge amount of transcript data produced by high-throughput sequencing requires the development and implementation of suitable bioinformatic workflows for their analysis and interpretation. These analysis workflows, which comprise different modules, should be designed specifically for the sequencing platform (Roche 454, Illumina, SOLiD) and the nature of the data (polyA or total RNA fraction, strand specificity). In the case of cDNA obtained from a total RNA preparation, in addition to polyadenylated protein-coding mRNAs, a great variety of transcript sequences can be obtained, including ribosomal RNAs, mitochondrial transcripts and a large variety of functional non-coding RNAs (ncRNAs). To deal with these data the analysis workflow should include specific modules to distinguish ncRNA fractions from the large number of other functional protein-coding transcripts. To this aim we developed an analysis pipeline that, given as input a large collection of reads (particularly from Roche 454), provides the expression profile at qualitative and quantitative level of human mtDNA, ribosomal RNAs, ncRNAs and protein-coding mRNAs.
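As a rough illustration of the final summarising step of such a pipeline, the sketch below tallies reads per transcript class and reports counts and fractions. It is not the authors' pipeline: the category labels, the function name `expression_profile`, and the assumption that an upstream mapping step has already assigned each read to a class are all hypothetical.

```python
from collections import Counter

# Hypothetical labels for the four transcript classes named in the abstract.
CATEGORIES = ("mtDNA", "rRNA", "ncRNA", "mRNA")


def expression_profile(read_assignments):
    """Summarise reads per transcript class.

    read_assignments: iterable of (read_id, category) pairs, assumed to come
    from an upstream mapping/annotation step that is not shown here.
    Returns {category: (count, fraction_of_classified_reads)}.
    """
    # Reads with a label outside CATEGORIES (e.g. unmapped) are ignored.
    counts = Counter(cat for _, cat in read_assignments if cat in CATEGORIES)
    total = sum(counts.values())
    return {
        cat: (counts[cat], counts[cat] / total if total else 0.0)
        for cat in CATEGORIES
    }


# Toy input, invented purely for illustration.
reads = [
    ("r1", "rRNA"),
    ("r2", "mRNA"),
    ("r3", "rRNA"),
    ("r4", "ncRNA"),
    ("r5", "unmapped"),
]
profile = expression_profile(reads)
```

The counts give the quantitative view and the set of non-empty classes gives the qualitative one; a real workflow would also need platform-specific quality filtering before this step.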

A novel biclustering algorithm for the discovery of meaningful biological correlations between miRNAs and mRNAs

microRNAs (miRNAs) are post-transcriptional regulators that represent one of the major regulatory gene families in animals, plants and viruses and play a key role in almost all main cellular processes. The computational prediction of miRNA target genes is important for the functional annotation of genomes and, on the other hand, functional annotation of target genes can be of great help in suggesting specific biological functions of miRNAs [1]. This work aims to contribute to the elucidation of the role of miRNAs in the regulation of gene expression, by proposing a method for the hierarchical and overlapping biclustering of miRNAs and target messenger RNAs (mRNAs). The method allows the discovery of possible miRNA:mRNA functional relationships, at different granularity levels, in large datasets produced by miRNA target site prediction algorithms, thus reducing the impact of noise on the significance of the resulting biclusters.

A novel biclustering algorithm for the discovery of meaningful biological correlations between microRNAs and their target genes

microRNAs (miRNAs) are a class of small non-coding RNAs which have been recognized as ubiquitous post-transcriptional regulators. The analysis of interactions between different miRNAs and their target genes is necessary for the understanding of miRNAs' role in the control of cell life and death. In this paper we propose a novel data mining algorithm, called HOCCLUS2, specifically designed to bicluster miRNAs and target messenger RNAs (mRNAs) on the basis of their experimentally-verified and/or predicted interactions. Indeed, existing biclustering approaches, typically used to analyze gene expression data, fail when applied to miRNA:mRNA interactions since they usually do not extract possibly overlapping biclusters (miRNAs and their target genes may have multiple roles), extract a huge amount of biclusters (difficult to browse and rank on the basis of their importance) and work on similarities of feature values (do not limit the analysis to reliable interactions). RESULTS: To overcome these limitations, HOCCLUS2 i) extracts possibly overlapping biclusters, to catch multiple roles of both miRNAs and their target genes; ii) extracts hierarchically organized biclusters, to facilitate bicluster browsing and to distinguish between universe and pathway-specific miRNAs; iii) extracts highly cohesive biclusters, to consider only reliable interactions; iv) ranks biclusters according to the functional similarities, computed on the basis of Gene Ontology, to facilitate bicluster analysis. CONCLUSIONS: Our results show that HOCCLUS2 is a valid tool to support biologists in the identification of context-specific miRNAs regulatory modules and in the detection of possibly unknown miRNAs target genes. 
Indeed, results prove that HOCCLUS2 is able to extract cohesiveness-preserving biclusters, when compared with competitive approaches, and statistically confirm (at a confidence level of 99%) that mRNAs which belong to the same biclusters are, on average, more functionally similar than mRNAs which belong to different biclusters. Finally, the hierarchy of biclusters provides useful insights to understand the intrinsic hierarchical organization of miRNAs and their potential multiple interactions on target genes.
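To make the statistical comparison concrete, the sketch below computes a two-sample t statistic between functional similarity scores of mRNA pairs within the same bicluster and pairs from different biclusters. It is an illustration only: the choice of Welch's unequal-variance form, the function name `welch_t`, and the similarity values are assumptions, not taken from HOCCLUS2.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    se2 = va / na + vb / nb                          # squared standard error
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Invented similarity scores (e.g. GO-based), for illustration only.
intra = [0.82, 0.75, 0.90, 0.78, 0.85]  # pairs in the same bicluster
inter = [0.40, 0.55, 0.35, 0.50, 0.45]  # pairs in different biclusters
t_stat, df = welch_t(intra, inter)
```

A large positive statistic supports the claim that intra-bicluster pairs are more similar on average; deciding significance at a given confidence level additionally requires the t-distribution critical value for `df`, which is not computed here.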

A NOVEL GENERAL-PURPOSE RNA-SEQ PROTOCOL OPTIMIZING THE DETECTION OF TRANSCRIPTOME EXPRESSION COMPLEXITY

Recent studies have demonstrated an unexpected complexity of transcription in eukaryotes. Indeed, the majority of the genome is transcribed and only a small fraction of these transcripts is annotated as protein coding genes and their splice variants. Therefore, high throughput transcriptome sequencing continuously identifies novel RNAs and novel classes of RNAs, which are the result of antisense, overlapping and non-coding RNA expression, demonstrating that the transcriptome captures a level of complexity that the simple genome sequence may not (1). Among next-generation sequencing platforms, the latest series of Roche 454 GS Sequencer, the GS FLX Titanium FLX+, yields over a million reads per run, each up to 700 bases long. Sequences of such length, providing connectivity information among splicing sites, in addition to enabling accurate mapping and relative quantification of mRNAs, are particularly suitable for the characterization of full-length splicing variants that may be differently expressed in physiopathological conditions (2). On the other hand, the higher throughput of the Illumina HiSeq 1000 (150 bp) and ABI SOLID (75 bp) platforms makes them particularly suitable for transcript-level quantification and for small RNA sequencing. Irrespective of the NGS platform used, the first step required for transcriptome sequencing is the construction of a cDNA library. 
Several protocols have been developed so far to this aim and each of them is suitable for sequencing on a specific platform exclusively. Here we describe a new fast and simple method (Patent pending RM2010A000293-PCT/IB2011/052369) to prepare and amplify a representative and strand-specific cDNA library starting from low input total RNA (500 ng) for RNA-Seq applications, that may be implemented with all major platforms currently available (Roche 454, Illumina, ABI/Solid). Our method includes the following steps: a) rRNA removal from total RNA; b) retrotranscription of the rRNA-depleted RNA to cDNA with custom-designed 5' phosphorylated Tag-random-octamers capable of preserving strand information; c) single-strand cDNA purification; d) ligation and amplification of the purified cDNAs, thus obtaining a high yield of concatamers around 20 kb long. These DNA molecules can be equally sequenced with both Illumina and Roche 454 sequencing platforms, allowing not only the quantitative but also the qualitative assessment of the transcriptome complexity. Moreover, we developed a suitable bioinformatic pipeline for the analysis of the sequences produced upon application of this protocol. Indeed, we developed an in-house Python script, named Tag_Find (available upon request), able to recognize the position and the type of tag found within the read sequence. The program returns two files, one containing the type of tags found and their read positions and one fastq file with non-tagged reads, cleaned up from tags. The Tag_Find efficiency
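The tag-detection idea described above can be sketched in a few lines of Python. This is not the Tag_Find script, which is only available on request: the tag sequences, the names `find_tags` and `strip_tags`, and the use of exact string matching on in-memory reads (rather than FASTQ files) are assumptions made for illustration.

```python
# Hypothetical tag sequences; the real Tag_Find tags are not given in the text.
TAGS = {"tagA": "ACGTACGT", "tagB": "TTGACCAA"}


def find_tags(seq):
    """Return a sorted list of (position, tag_name) for every exact tag match."""
    hits = []
    for name, tag in TAGS.items():
        start = seq.find(tag)
        while start != -1:
            hits.append((start, name))
            start = seq.find(tag, start + 1)
    return sorted(hits)


def strip_tags(seq, hits):
    """Remove tags at the reported positions (assumes hits do not overlap)."""
    pieces, prev = [], 0
    for pos, name in hits:
        pieces.append(seq[prev:pos])      # keep the sequence before the tag
        prev = pos + len(TAGS[name])      # skip over the tag itself
    pieces.append(seq[prev:])
    return "".join(pieces)


# Toy read built from flanking bases and the two hypothetical tags.
read = "GGG" + TAGS["tagA"] + "CCCC" + TAGS["tagB"] + "AA"
hits = find_tags(read)
clean = strip_tags(read, hits)
```

In a file-based version, `hits` would be written to the tag report and `clean` (with correspondingly trimmed quality strings) to the output FASTQ; tolerance for sequencing errors inside tags would need approximate matching, which this sketch does not attempt.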

A platform independent RNA-Seq protocol for the detection of transcriptome complexity

Recent studies have demonstrated an unexpected complexity of transcription in eukaryotes. The majority of the genome is transcribed and only a small fraction of these transcripts is annotated as protein coding genes and their splice variants. Indeed, most transcripts are the result of antisense, overlapping and non-coding RNA expression. In this frame, one of the key aims of high throughput transcriptome sequencing is the detection of all RNA species present in the cell, and the first crucial step for RNA-seq users is the choice of the strategy for cDNA library construction. The protocols developed so far provide for the use of the entire library for a single sequencing run with a specific platform. Results: We set up a unique protocol to generate and amplify a strand-specific cDNA library representative of all RNA species that may be implemented with all major platforms currently available on the market (Roche 454, Illumina, ABI/SOLiD). Our method is reproducible, fast, easy to perform and even allows starting from low input total RNA. Furthermore, we provide a suitable bioinformatics tool for the analysis of the sequences produced following this protocol. Conclusion: We tested the efficiency of our strategy, showing that our method is platform-independent, thus allowing the simultaneous analysis of the same sample with different NGS technologies, and providing an accurate quantitative and qualitative portrait of complex whole transcriptomes.

A two-stepped computational approach for miRNA-gene regulatory networks discovery

Deciphering the modular organization of gene regulatory networks is crucial for the understanding of biological processes at a system-wide level. MicroRNAs (miRNAs) represent the largest class of small non-coding RNAs (20-24 nucleotides (nt) long) acting as post-transcriptional regulators of many genes and playing a pivotal role in important biological processes, in almost all organisms and in a large number of human diseases. Computational approaches have been proven to be fundamental in miRNA research for both gene-specific and large-scale predictions of miRNA targets, for the formulation of new functional hypotheses on their biological role and to guide experimental validations. However, their effectiveness is negatively affected by high uncertainty of miRNA gene target predictions and by the complexity of rules governing miRNA functional targeting whose mechanisms still remain elusive. In order to improve predictions of miRNA targets and to support the elucidation of miRNA functional role in the context of gene regulatory networks, we have recently developed a new two-stepped computational approach. In the first step, a semi-supervised ensemble-based classifier [1] is learned from both experimentally validated interactions (positively labelled examples) and miRNA gene target predictions (MTIs) returned from several prediction algorithms (unlabelled examples). This classifier acts as a meta-classifier of unlabelled examples. As a result of the first step, a unique (meta-)prediction score is available for all possible interactions. In the second step, these prediction scores are used to identify miRNA-gene regulatory networks (MGRNs) through the biclustering algorithm HOCCLUS2 [2]. The effectiveness of the computational approach has been validated on a number of alternative combinations of competitive algorithms for the first and the second step. 
Both the predicted MTIs and the MGRNs can be queried, retrieved, exported and visualized through the web-based system ComiRNet (http://193.204.187.158:9002/). The system interface facilitates the formulation of complex queries and helps the user both in browsing bicluster hierarchies and in visualizing the interaction graph of MGRNs. The hierarchical organization of biclusters improves the interpretability of the results and emphasizes similarities among genes at different granularity levels, allowing ComiRNet users to explore many possible biological scenarios. The functional relationships suggested by miRNAs and target genes in biclusters can help to detect unknown functional similarities or synergies among miRNAs and among target genes, which can enable the discovery of new miRNA and gene functions. Acknowledgements: We would like to acknowledge the support of the European Commission through the project MAESTRA - Learning from Massive, Incompletely annotated, and Structured Data (Grant number ICT-2013-612944). This work was also funded by the "PON01 02589 - MicroMap" project and by the flagship

Analysis of Gene-microRNA Maps for Regulatory Network Discovery

BiP-Day 2013: "Prima Giornata della Bioinformatica Pugliese" - Workshop report

On 5 December 2013, a regional workshop on Bioinformatics in Apulia (BiP-Day 2013) was held in Bari (IT) under the patronage of the Italian Bioinformatics Society (BITS) and EMBnet. The aim of the workshop was to stimulate tighter collaboration between life science researchers and private biotech companies in the Apulia Region around cutting-edge topics in biological and clinical research, for which bioinformatics R&D is key. The programme was structured into three main sessions: 1) Regional development programmes and major infrastructures for Bioinformatics in the Apulia Region; 2) Bioinformatics projects in bio-medicine, biodiversity, agri-food and bioinformatics training programmes; 3) Research & Business: the importance of communication. Presentations are available from the workshop website associated with the programme (http://www.ba.itb.cnr.it/bip-day/programma/), and from the News section Presentations (http://www.ba.itb.cnr.it/bip-day/category/presentazioni/page/3/).

BITS 2012: Ninth Annual Meeting of the Bioinformatics Italian Society

BITS 2010 - VII Annual Meeting of the Bioinformatics Italian Society, Bioinformatics and Computational Biology for Life Sciences

ComiRNet - The Database of Predicted miRNA Regulatory Networks

ComiRNet: a web-based system for the analysis of miRNA-gene regulatory networks

The understanding of mechanisms and functions of microRNAs (miRNAs) is fundamental for the study of many biological processes and for the elucidation of the pathogenesis of many human diseases. Technological advances represented by high-throughput technologies, such as microarray and next-generation sequencing, have significantly aided miRNA research in the last decade. Nevertheless, the identification of true miRNA targets and the complete elucidation of the rules governing their functional targeting remain nebulous. Computational tools have been proven to be fundamental for guiding experimental validations for the discovery of new miRNAs, for the identification of their targets and for the elucidation of their regulatory mechanisms. Description: ComiRNet (Co-clustered miRNA Regulatory Networks) is a web-based database specifically designed to provide biologists and clinicians with user-friendly and effective tools for the study of miRNA-gene target interaction data and for the discovery of miRNA functions and mechanisms. Data in ComiRNet are produced by a combined computational approach based on: 1) a semi-supervised ensemble-based classifier, which learns to combine miRNA-gene target interactions (MTIs) from several prediction algorithms, and 2) the biclustering algorithm HOCCLUS2, which exploits the large set of produced predictions, with the associated probabilities, to identify overlapping and hierarchically organized biclusters that represent miRNA-gene regulatory networks (MGRNs). Conclusions: ComiRNet represents a valuable resource for elucidating the miRNAs' role in complex biological processes by exploiting data on their putative function in the context of MGRNs. ComiRNet currently stores about 5 million predicted MTIs between 934 human miRNAs and 30,875 mRNAs, as well as 15 bicluster hierarchies, each of which represents MGRNs at different levels of granularity. The database can be freely accessed at: http://comirnet.di.uniba.it.

ComiRNet: a database of predicted miRNA:mRNA interactions and regulatory networks

Computational methods are fundamental in the identification of miRNA target sites and in the reconstruction of the interacting regulatory networks they are able to control. Understanding mechanisms and functions of microRNAs (miRNAs) is pivotal for the elucidation of many biological processes and of the etiopathology of some diseases, such as tumors and neurodegenerative syndromes. ComiRNet (Co-clustered miRNA Regulatory Networks) is a new database which collects data on miRNA:mRNA interactions and interacting networks by exploiting human miRNA target predictions from 10 different databases stored in mirDIP. These data have been produced by using a combined data mining approach based on biclustering and semi-supervised ensemble-based learning techniques. ComiRNet provides a user-friendly graphical interface (GUI) for efficient query, retrieval, export, visualization and analysis of the discovered regulatory networks. Availability: ComiRNet is available at http://193.204.187.158:9002/ Method: In [1], we presented a method which learns to combine the scores of several prediction algorithms, in order to improve the reliability of the predicted interactions. The approach works in the semi-supervised ensemble learning setting which exploits information conveyed by both labeled (validated interactions, from miRTarBase [3]) and unlabeled (predicted interactions, from mirDIP) instances. The algorithm HOCCLUS2 [2] exploits the large set of produced predictions, with the associated probability, to extract a set of hierarchically organized biclusters. The construction of the hierarchy is performed by an iterative merging, considering both distance and density-based criteria. 
Extracted biclusters are also ranked on the basis of the p-values obtained by the Student's T-Test which compares intra- and inter- functional similarity of miRNA targets, computed on the basis of the gene classification provided in Gene Ontology (GO). The ComiRNet database relies on PostgreSQL DBMS, while the web-based platform is built through the Play 2.2 Java framework and the Cytoscape library. Results: ComiRNet stores approximately 5 million predicted interactions between 934 human miRNAs and 30,875 mRNAs, which are exploited in the construction of the hierarchies of biclusters representing potential miRNA regulatory networks. The ComiRNet web interface allows users to perform extraction and visualization of single interactions (with the score/probability assigned by the learning algorithm) and of biclusters of interest, as well as to easily browse whole bicluster hierarchies. Bicluster hierarchy browsing (i.e., navigation among parent and child biclusters) helps to identify intrinsic and functional relationships between different miRNAs and their predicted functional co-targeting on different groups of genes. The interface for the analysis of biclusters also provides a graph-based visualization of the predicted miRNA-gene interac

Comparison of co-regulation of DNA methylation and smallRNA expression between Arabidopsis infected with RNA and DNA virus

In plants, which are particularly sensitive to changes of environmental conditions, modulation of DNA methylation is a crucial mechanism of regulation of gene expression in response to abiotic and biotic stresses. Monitoring of the plant immune system in response to bacterial pathogen infection has demonstrated that dynamic DNA methylation changes, and not only gene imprinting, also have a regulatory effect in plant pathogen defense. Critical elements for epigenetic modifications of plant genomes are non-coding small RNAs (sRNAs); the same RNA family is also a hallmark of plant reaction to virus infection. Interestingly, sRNAs have a central role in both plant genome methylation and resistance upon virus infection; however, the interaction between sRNA expression and DNA methylation regulating the immune system in response to virus infection has not been investigated so far. To correlate dynamic DNA methylation and differential sRNA expression in response to virus infection, we have performed genome-wide methylation and sRNA expression profiling on Arabidopsis leaves systemically infected with either the DNA-genome virus Cauliflower mosaic virus or the RNA virus Cucumber mosaic virus. We developed a software package to analyze the sRNA expression and the DNA methylation profile and performed a genome-wide comparison of control and infected samples to search for regions significantly different either in the methylation profile or in the sRNA expression, or in both. In the regions where we observe significant correlation of methylation (mainly CHH methylation) and sRNA expression modifications, we found that both hypo- and hypermethylation correlated with downregulation of 21/24 nt sRNAs. These regions mostly comprised transposons and a few of them contained promoter or coding sequences of genes involved, according to gene ontology, in DNA-binding and DNA-dependent regulation of transcription and response to abiotic or biotic stimulus. 
This confirms virus-induced regulation of sRNA expression and DNA methylation. We are still in the process of data analysis, and more details about the correlation of virus-induced modifications of sRNA and DNA methylation levels will be reported.
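The region-by-region comparison of control and infected samples can be illustrated with a small Python sketch. It does not reproduce the authors' software package: simple fixed thresholds stand in for a proper significance test, and the threshold values, the pseudocount, and the function name `classify_region` are hypothetical.

```python
import math

# Hypothetical thresholds chosen only for illustration; a real analysis
# would rely on a statistical test rather than fixed cut-offs.
METH_DELTA = 0.2    # minimum absolute change in methylation fraction
SRNA_LOG2FC = 1.0   # minimum absolute log2 fold change in sRNA counts


def classify_region(meth_ctrl, meth_inf, srna_ctrl, srna_inf, pseudocount=1.0):
    """Label a genomic region by which signal differs between samples.

    meth_*: methylation fraction (0..1) in control and infected samples.
    srna_*: sRNA read counts in control and infected samples.
    Returns "both", "methylation", "sRNA" or "neither".
    """
    meth_change = meth_inf - meth_ctrl
    # The pseudocount avoids division by zero for regions without reads.
    log2fc = math.log2((srna_inf + pseudocount) / (srna_ctrl + pseudocount))
    meth_diff = abs(meth_change) >= METH_DELTA
    srna_diff = abs(log2fc) >= SRNA_LOG2FC
    if meth_diff and srna_diff:
        return "both"
    if meth_diff:
        return "methylation"
    if srna_diff:
        return "sRNA"
    return "neither"


# Invented example: hypomethylation together with sRNA downregulation.
label = classify_region(meth_ctrl=0.6, meth_inf=0.3, srna_ctrl=31, srna_inf=7)
```

Regions labelled "both" correspond to the class the abstract focuses on, where methylation changes and sRNA expression changes coincide.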

Discovery of miRNA-Gene regulatory networks by using an integrative data-mining approach

Introduction: MicroRNAs (miRNAs) represent the largest class of small non-coding RNAs (20-24 nucleotides long) acting as post-transcriptional regulators of many genes and playing a pivotal role in important biological processes, in almost all organisms and in a large number of human diseases. Computational approaches have been proven to be fundamental in miRNA research for both gene-specific and large-scale predictions of miRNA targets, for the formulation of new functional hypotheses on their biological role, for gene network discovery and to guide experimental validations. However, their effectiveness is negatively affected by high uncertainty of miRNA gene target predictions and by the complexity of rules governing miRNA functional targeting, whose mechanisms still remain elusive. In order to improve predictions of miRNA targets and to support the elucidation of miRNA functional role in the context of gene regulatory networks, we have recently developed a new two-stepped computational approach based on: i) a semi-supervised ensemble-based classifier for the prediction of miRNA target interactions (MTIs) [1] and, ii) a biclustering algorithm (HOCCLUS2) for the prediction of miRNA-gene regulatory networks (MGRNs) [2]. Data produced are available at ComiRNet, a user-friendly web-based system providing efficient query, retrieval, export, visualization and analysis of predicted MTIs and MGRNs. Method: In the first step, a semi-supervised ensemble-based classifier is learned from both experimentally validated interactions (positively labelled examples), extracted from miRTarBase [3], and miRNA gene target predictions (MTIs), returned from several prediction algorithms (unlabelled examples) and extracted from mirDIP [4]. This classifier acts as a meta-classifier of unlabelled examples. As a result of the first step, a unique (meta-)prediction score is available for all possible interactions. 
In the second step, these prediction scores are used to identify miRNA-gene regulatory networks (MGRNs) through the biclustering algorithm HOCCLUS2. HOCCLUS2 exploits the large set of produced predictions, with the associated probability, to extract a set of overlapping and hierarchically organized biclusters each one representing putative MGRNs. The construction of the hierarchy is performed by an iterative merging, considering both distance and density-based criteria. Extracted biclusters are also ranked on the basis of the p-values obtained by the Student's T-Test which compares intra- and inter- functional similarity of miRNA targets, computed on the basis of the gene classification provided in Gene Ontology (GO) [5]. The ComiRNet database relies on PostgreSQL DBMS, while the web-based platform is built through the Play 2.2 Java framework and the Cytoscape library. Results: The effectiveness of the computational approach has been validated on a number of alternative combinations of competitive algorithms for the first [1] and the second step [2]. B

DNA and RNA viruses infection in plant: two different ways to dynamically change the host's epigenetic profile

The establishment and maintenance of DNA methylation are relatively well understood whereas little is known about their dynamics and biological relevance in innate immunity [1-2]. In plants, modulation of DNA methylation might be an effective mechanism to regulate gene expression in response to abiotic and biotic stresses. Recent evidence from large-scale epigenomic approaches indicates that dynamic DNA methylation changes are not limited to gene imprinting but can regulate the plant's immune system in response to pathogens. In plants, virus infections trigger the expression of non-coding small RNAs (smRNAs), also influencing the epigenetic status of the host genome; however, the involvement of DNA methylation in the regulation of the plant immune system in response to virus infection has not been investigated so far. In this context, we are carrying out a study aiming to elucidate the impact of DNA and RNA virus infections on genomic DNA methylation in plants, and its correlation with the expression of small RNAs, by integrating the analysis of multiple "omics" datasets obtained by using next-generation sequencing technologies. In this paper we present the results of the analysis of the methylation modifications induced by virus infection on the whole genome and on coding and non-coding gene regions.

Effects of edible plant microRNAs on cancer cell proliferation: a beneficial cross-kingdom interaction

Diet in human health is no longer simple nutrition but, in the light of recent findings, it might play a pivotal role in cell health status by modulating apoptosis, detoxification, and appropriate gene response to environmental stresses. Epidemiological studies suggest a role of fruits and vegetables in protection against several diseases, and nutrients have been demonstrated to alter gene expression by DNA methylation and histone modifications [1-2]. Diet has also been found to modulate micro RNA (miRNA) expression, leading to a subsequent regulation of the effector genes [3]. Furthermore, recent studies demonstrate that some plant/food-derived microRNAs (miRNAs) regulate gene expression in a sequence-specific manner [4]. On the basis of all these findings, we have carried out a pilot study, using a combined "in-silico and wet" approach, to investigate the potential effects, and elucidate the molecular mechanisms, of edible plant miRNAs on the expression of human genes involved in cancer onset and progression. In the present paper we report the results obtained by transfecting 2 colon cancer cell lines, p53 wild type and p53 knock-out, with selected miRNAs of G. max, Z. mays and M. truncatula, which we found, by in silico analysis, to have a putative targeting activity on human oncogenes and tumor suppressor genes.

EMBnet.journal - Bioinformatics in Action

Hierarchical and Overlapping Co-Clustering of mRNA:miRNA Interactions.

microRNAs (miRNAs) are an important class of regulatory factors controlling gene expression at post-transcriptional level. Studies on interactions between different miRNAs and their target genes are of utmost importance to understand the role of miRNAs in the control of biological processes. This paper contributes to these studies by proposing a method for the extraction of co-clusters of miRNAs and messenger RNAs (mRNAs). Different from several already available co-clustering algorithms, our approach efficiently extracts a set of possibly overlapping, exhaustive and hierarchically organized co-clusters. The algorithm is well-suited for the task at hand since: i) mRNAs and miRNAs can be involved in different regulatory networks that may or may not be co-active under some conditions, ii) exhaustive co-clusters guarantee that possible co-regulations are not lost, iii) hierarchical browsing of co-clusters facilitates biologists in the interpretation of results. Results on synthetic and on real human miRNA:mRNA data show the effectiveness of the approach.

HOCCLUS2

HOCCLUS2: a data mining tool for easily handling interactions data and discovering regulatory networks

Identification of new p53 regulatory networks through NGS data analysis

Motivation: Around 50% of all human tumours carry point mutations in the p53 tumour suppressor gene, which alter p53 DNA binding specificity. In tumours with wild-type p53, p53 is often rendered functionally inert by the inactivation of its positive modulators or by the activation of negative factors, which block p53 transcriptional activities [1]. We identified a new p53 direct target gene, TRIM8, belonging to the Tripartite Motif (TRIM) protein family, defined by the presence of a RING domain, one or two B-boxes and a Coiled-Coil region. We found that TRIM8 overexpression leads, through a positive feedback loop, to p53 stabilization and p53-mediated suppression of cell proliferation. In order to identify the pathways activated by TRIM8 leading to p53 stabilization, we transiently transfected the HCT116-p53 (wt) cell line with TRIM8 and sequenced the total transcriptome by performing an NGS run on a 454 GS FLX platform. Here we report some statistics and the preliminary results of: i) read mapping on the human genome and analysis of differentially expressed genes; ii) functional analysis of differentially expressed genes. Method: Total RNA was extracted from the HCT116-p53 (wt) cell line 48h after transfection, depleted of rRNA, retro-transcribed, amplified and sequenced using the Roche GS FLX Titanium Series pyrosequencer. Genome mapping, statistics and differential expression analyses were performed using the "NGS-Trex" system (NGS Transcriptome profile Explorer) (Mignone F. et al., submitted), an automatic system designed for analyzing Next Generation Sequencing data generated from large-scale transcriptome studies. The overall procedure involves four steps: 1) creation of a project and upload of reads in multi-fasta format; 2) read mapping onto the reference genome after setup of appropriate parameters; 3) annotation of mapped reads; 4) data mining using simple query forms.
TRIM8 and FLAG data were submitted to NGS-Trex using default parameters, which can be briefly summarized as follows: reads were mapped onto the human genome (min similarity 90% and min overlap 50 nt), discarding reads mapping onto more than 10 genomic regions. Mapped reads were compared to the annotation to assign reads to genes and to identify new splice variants. Differentially expressed genes and splicing events were identified by computing a P-value associated with a hypergeometric distribution. Housekeeping genes were used to normalise read counts before identification of differentially expressed genes. The lists of genes showing differential expression in the two samples were then analysed using DAVID (v6.7), an integrated biological knowledgebase with analytic tools (text and pathway-mining tools) for the functional annotation of large gene lists [2,3]. An additional analysis of the TRIM8 and FLAG sequence samples was made for the detection and annotation of the ncRNA genome fraction. We used a bioinformatic analysis pipeline, developed by us, which is able to: 1) select ncRNA fro
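
The abstract does not say how NGS-Trex sets up the hypergeometric test, so the following is only a minimal sketch of one common formulation, under stated assumptions: the reads assigned to a gene in the two samples are treated as draws without replacement from the pooled mapped reads, and the P-value is the upper tail for sample A. The function name and all counts are hypothetical, and housekeeping-gene normalisation is not modelled.

```python
from math import comb


def hypergeom_pvalue(gene_a: int, gene_b: int, total_a: int, total_b: int) -> float:
    """Upper-tail hypergeometric P-value for one gene (illustrative assumption).

    gene_a, gene_b   : reads assigned to the gene in samples A and B
    total_a, total_b : total mapped reads in samples A and B
    Returns P(X >= gene_a), where X is the number of the gene's reads that
    fall in sample A if reads were distributed by chance alone.
    """
    n_gene = gene_a + gene_b
    denom = comb(total_a + total_b, n_gene)
    upper = min(n_gene, total_a)
    # math.comb(n, k) is 0 when k > n, so impossible splits contribute nothing.
    num = sum(
        comb(total_a, x) * comb(total_b, n_gene - x)
        for x in range(gene_a, upper + 1)
    )
    return num / denom


# Toy numbers only: 2 of 2 reads for the gene come from sample A.
p = hypergeom_pvalue(gene_a=2, gene_b=0, total_a=5, total_b=5)
print(round(p, 4))
```

Exact binomial coefficients become slow for libraries with millions of reads; a real analysis would more likely rely on a statistics library.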

Improving the Prediction of miRNA:mRNA Interactions by Exploiting Co-Clustering Methods

MicroRNAs (miRNAs) represent the largest class of small non-coding RNAs with a key role in post-transcriptional regulation of gene expression. Studies about their well-known role in embryonic and adult cell proliferation and differentiation (Ren et al., 2009) have recently been extended by works aiming at analyzing their role in several types of human cancer (Olive et al., 2010). For this reason, it is important to understand the specific biological functions and mechanisms through which they are able to ensure cell homeostasis and to control cell cycle, developmental timing and cancer progression. However, this is not a trivial task for two main reasons: the complexity of the rules governing miRNA functional targeting, which are still far from being completely elucidated, and the uncertainty of computational predictions. On the other hand, experimental validation of all potential miRNA:mRNA interactions is too expensive and time consuming if it has to be carried out for every predicted interaction. More effective tools are necessary to provide reliable predictions, also on the basis of the analysis of potential miRNA targeting in the context of functional interaction gene networks. This task cannot be solved by analyzing single interactions between miRNAs and their target genes. Indeed, in the literature there are several examples of cooperative activities, represented as multiple miRNAs binding the same group of target genes in many relevant biological processes (Pio et al., 2013). Although this aspect emphasizes the possible dependencies among different miRNAs (and/or among their target genes), most existing works on the prediction of miRNA:mRNA interactions have focused on single miRNA:mRNA pairs (see Shirdel et al., 2011 for a review), often by considering only structural features and ignoring possible functional inter-dependencies.
Due to the recognized limits of such approaches, recently, we have proposed a machine-learning based method (Pio et al., 2014) for exploiting the contribution of several prediction algorithms, by automatically combining their contribution on the basis of a model learned by exploiting both validated and predicted interactions. Although this method has been shown to be much more effective with respect to single prediction algorithms and with respect to some baseline combination approaches, the adopted strategy still does not exploit the inter-dependencies among the considered miRNAs and mRNAs (i.e., it considers independently each miRNA:mRNA interaction).

Integrating microRNA target predictions for the discovery of gene regulatory networks: a semi-supervised ensemble learning approach

MicroRNAs (miRNAs) are small non-coding RNAs which play a key role in the post-transcriptional regulation of many genes. Elucidating miRNA-regulated gene networks is crucial for the understanding of mechanisms and functions of miRNAs in many biological processes, such as cell proliferation, development, differentiation and cell homeostasis, as well as in many types of human tumors. To this aim, we have recently presented the biclustering method HOCCLUS2, for the discovery of miRNA regulatory networks. Experiments on predicted interactions revealed that the statistical and biological consistency of the obtained networks is negatively affected by the poor reliability of the output of miRNA target prediction algorithms. Recently, some learning approaches have been proposed to learn to combine the outputs of distinct prediction algorithms and improve their accuracy. However, the application of classical supervised learning algorithms presents two challenges: i) the presence of only positive examples in datasets of experimentally verified interactions and ii) the unbalanced number of labeled and unlabeled examples. Results: We present a learning algorithm that learns to combine the scores returned by several prediction algorithms, by exploiting information conveyed by validated (positively labeled only) and unlabeled examples of interactions. To face the two related challenges, we resort to a semi-supervised ensemble learning setting. Results obtained using miRTarBase as the set of labeled (positive) interactions and mirDIP as the set of unlabeled interactions show a significant improvement, over competitive approaches, in the quality of the predictions. This solution also improves the effectiveness of HOCCLUS2 in discovering biologically realistic miRNA:mRNA regulatory networks from large-scale prediction data.
Using the miR-17-92 gene cluster family as a reference system and comparing results with previous experiments, we find a large increase in the number of significantly enriched biclusters in pathways, consistent with miR-17-92 functions. Conclusion: The proposed approach proves to be fundamental for the computational discovery of miRNA regulatory networks from large-scale predictions. This paves the way to the systematic application of HOCCLUS2 for a comprehensive reconstruction of all the possible multiple interactions established by miRNAs in regulating the expression of gene networks, which would be otherwise impossible to reconstruct by considering only experimentally validated interactions.

Integrating bioinformatics resources for modelling Human non-coding RNA networks

Introduction: Non-coding RNAs (ncRNAs) serve as regulatory molecules for a variety of biological processes. They are roughly classified into two major categories according to their size: small non-coding RNAs (sncRNAs), such as microRNAs (miRNAs), and long non-coding RNAs (lncRNAs). The lncRNAs have a broader spectrum of functions and are, therefore, a potential new class of cancer therapeutic target [1,2]. In addition, there are other types of ncRNAs whose role is not yet clear: circular-RNA, lincRNA, scRNA, sense-intronic and vault-RNA. New advances in translational research will require an accurate understanding of the functional relationships between protein-coding and ncRNA categories, as well as sponge regulatory networks [3,4]. To achieve this goal, we have built an integrated bioinformatics knowledge base, collecting non-redundant annotations of human ncRNAs, sequences and interactors, which provides comprehensive access to all the knowledge available concerning ncRNAs, their interactions with other molecules and associated diseases. As key characteristics, the database overcomes the problem of different nomenclatures used by different sources and provides new clues about ncRNA functions through interactions inferred by network reconstruction [5]. Methods: ncRNA interactions include physical (i.e., molecular bindings between ncRNAs and DNA, RNAs or proteins) and functional relationships (i.e., co-expression, regulation, associated diseases, statistical and functional associations). Interactions stored in the database are in the form 'ncRNAs-mate', where the mate entity belongs to one of the following types: ncRNA, protein coding RNA (pcRNA), gene, protein, pseudogene and phenotype.
In order to ensure the data quality of our interaction database, we have developed a series of Extraction Transformation and Loading (ETL) modules able to extract, collect and integrate primary annotations, sequences and interactions from different public biological resources. The extracted biological entities and their relations are modelled as a network, a mathematical object composed of nodes (entities) and edges (relations) [5]. Entity redundancy has been identified by cross-link references and sequence similarity using the Cleanup software [6]. Non-coding RNAs are classified in biotypes, associated with Sequence Ontology terms [7] and integrated with data on protein coding RNAs (pcRNAs), genes, proteins, pseudogenes and phenotypes. Furthermore, we extended the cross-reference network with data provided by Ensembl [8], using the biomaRt library of BioConductor [9]. Results: The total numbers of distinct entities collected in our interaction database are: 168,058 ncRNAs, 5,009 pcRNAs, 52,811 genes, 1,999 proteins, 15,940 pseudogenes and 849 phenotypes. Moreover, the total numbers of interactions, based on mate type cardinalities, include: 130,383 ncRNA-ncRNA, 55,048 ncRNA-pcRNA, 1,458,925 ncRNA-gene, 99,653 ncRNA-protein, 70,482 ncRNA-phenotype, 17,217 ncR

Italian EMBnet node: AGM2011 report

ITB, the Institute of Biomedical Technologies, is an institute of the Italian National Research Council[1] (CNR); it is composed of four sections, located in Milano (from where the Institute is directed), Bari, Pisa and Padova. The ITB-Bari (Bioinformatics and Genomics) is the national node of EMBnet in Italy: Domenica D'Elia is the node manager, and Andreas Gisel is a regular member.

Learning to combine miRNA target predictions: A semi-supervised ensemble learning approach (Discussion Paper)

Link prediction in network data is a data mining task which is receiving significant attention due to its applicability in various domains. An example can be found in social network analysis, where the goal is to identify connections between users. Another application can be found in computational biology, where the goal is to identify previously unknown relationships among biological entities. For example, the identification of regulatory activities (links) among genes would allow biologists to discover possible gene regulatory networks. In the literature, several approaches for link prediction can be found, but they often fail in simultaneously considering all the possible criteria (e.g. network topology, node properties, autocorrelation among nodes). In this paper we present a semi-supervised data mining approach which learns to combine the scores returned by several link prediction algorithms. The proposed solution exploits both a small set of validated examples of links and a huge set of unlabeled links. The application we consider regards the identification of links between genes and miRNAs, which can contribute to the understanding of their roles in many biological processes. This specific application requires learning from only positively labeled examples of links and coping with the strong imbalance between labeled and unlabeled examples. Results show a significant improvement with respect to single prediction algorithms and with respect to baseline combination.
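
The abstract does not describe the learning algorithm itself, so the sketch below swaps in a different, generic technique to illustrate the setting: bagging-style positive-unlabeled learning with a nearest-centroid base learner. Each row is a vector of scores returned by several (hypothetical) link prediction algorithms for one candidate link; the function names, parameters and data are illustrative assumptions, not the authors' method.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equally long score vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, rounds=50, seed=0):
    """Score unlabeled links in [0, 1] using only positive and unlabeled data.

    In each round a random subset of the unlabeled links is treated as
    provisional negatives; the remaining (out-of-bag) links vote for the
    positive class when they lie closer to the positive centroid.
    """
    rng = random.Random(seed)
    pos_c = centroid(positives)
    k = min(len(positives), len(unlabeled) - 1)
    votes = [0] * len(unlabeled)
    seen = [0] * len(unlabeled)
    for _ in range(rounds):
        bag = set(rng.sample(range(len(unlabeled)), k))
        neg_c = centroid([unlabeled[i] for i in bag])
        for i, row in enumerate(unlabeled):
            if i in bag:
                continue  # only out-of-bag links are scored in this round
            seen[i] += 1
            if sq_dist(row, pos_c) < sq_dist(row, neg_c):
                votes[i] += 1
    return [v / s if s else 0.0 for v, s in zip(votes, seen)]


# Toy data: two predictors, three validated links, six unlabeled candidates.
validated = [[0.9, 0.8], [0.8, 0.9], [0.95, 0.85]]
candidates = [[0.9, 0.9], [0.1, 0.2], [0.2, 0.1],
              [0.15, 0.1], [0.1, 0.1], [0.2, 0.2]]
scores = pu_bagging_scores(validated, candidates)
print([round(s, 2) for s in scores])
```

Sampling bags no larger than the positive set is one simple way to cope with the imbalance between labeled and unlabeled examples; other base learners could be substituted without changing the overall scheme.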

Mining Spatial Association Rules for Composite Motif Discovery

Mining spatial association rules of multiple co-occurring motifs to discover cis-regulatory modules

Biological activities are typically co-regulated by several factors and this feature is properly reflected by higher-order structures called cis-regulatory modules (CRM) and represented by non-random clusters of regulatory motifs. Several methods have been proposed for the de novo discovery of modules. We propose an alternative approach based on the discovery of rules which define strong spatial associations between single motifs and suggest the structure of a module. Rules are expressed in a first-order logic formalism and are mined by means of an inductive logic programming (ILP) system. We also propose computational solutions to two issues: the hard discretization of numerical inter-motif distances and the choice of a minimum support threshold. All methods have been implemented and integrated in a prototypal tool designed to support biologists in the discovery and characterization of cis-regulatory modules.
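
As a rough illustration of what a minimum support threshold means for a spatial rule between two motifs, the sketch below computes the fraction of sequences in which one motif occurs within a fixed distance downstream of another. It is plain Python rather than the ILP system mentioned in the abstract, it uses exactly the kind of hard distance cut-off the abstract identifies as an issue, and the motif names, positions and thresholds are hypothetical.

```python
def rule_support(sequences, motif_a, motif_b, max_gap):
    """Fraction of sequences satisfying 'motif_b within max_gap nt after motif_a'.

    sequences : list of sequences, each a list of (motif_name, position) hits
    """
    if not sequences:
        return 0.0
    covered = 0
    for hits in sequences:
        a_pos = [pos for name, pos in hits if name == motif_a]
        b_pos = [pos for name, pos in hits if name == motif_b]
        if any(0 < b - a <= max_gap for a in a_pos for b in b_pos):
            covered += 1
    return covered / len(sequences)


# Hypothetical motif hits in four promoter sequences.
promoters = [
    [("TATA", 10), ("SP1", 40)],
    [("TATA", 5), ("SP1", 300)],
    [("SP1", 20)],
    [("TATA", 100), ("SP1", 120)],
]
support = rule_support(promoters, "TATA", "SP1", max_gap=50)
MIN_SUPPORT = 0.4  # illustrative threshold
print(support, support >= MIN_SUPPORT)  # 0.5 True
```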

mirNET: a web-based system for the analysis of miRNA:mRNA regulatory networks

Motivation: Understanding mechanisms and functions of microRNAs (miRNAs) is fundamental for the elucidation of many biological processes and of the etiopathology of some diseases, such as tumors and neurodegenerative syndromes. We have developed a new biclustering algorithm, HOCCLUS2 [1], which is able to significantly correlate multiple miRNAs and their target genes to identify potential miRNA:mRNA regulatory networks. More recently, we developed a new probabilistic classifier [2] working in the semi-supervised ensemble learning setting, which allowed us to apply HOCCLUS2 on large-scale prediction data. In order to allow researchers to exploit the obtained results, we have started to develop a web-based system, called mirNET, for the efficient query, retrieval, export, visualization and analysis of the discovered regulatory networks. Method: In [2], we presented a method which learns to combine the scores of several prediction algorithms, in order to improve the reliability of the predicted interactions. The approach works in the semi-supervised ensemble learning setting, which exploits information conveyed by both labeled (validated interactions, from miRTarBase) and unlabeled (predicted interactions, from mirDIP) instances. The algorithm HOCCLUS2 exploits the large set of produced predictions, with the associated probability, to extract a set of hierarchically organized biclusters.
The construction of the hierarchy is performed by an iterative merging, considering both distance and density-based criteria. Extracted biclusters are also ranked on the basis of the p-values obtained by Student's t-test, which compares intra- and inter-functional similarity of miRNA targets, computed on the basis of the gene classification provided in Gene Ontology (GO). The mirNET database relies on the PostgreSQL DBMS, while the web-based platform is built with the Play 2.2 Java framework and the Cytoscape library. Results: The mirNET database stores the set of interactions identified in [2] and the biclusters extracted by HOCCLUS2 from this set of interactions, with different parameters. In particular, mirNET stores approximately 5 million predicted interactions between 934 human miRNAs and 30,875 mRNAs, which are exploited in the construction of the hierarchies of biclusters representing potential miRNA regulatory networks. The mirNET web interface allows users to perform extraction and visualization of single interactions (with the score/probability assigned by the learning algorithm) and of biclusters of interest, as well as to easily browse whole bicluster hierarchies. Bicluster hierarchy browsing (i.e., navigation among parent and child biclusters) helps to identify the intrinsic hierarchical organization of miRNAs in each specific context. The interface for the analysis of biclusters also provides a graph-based visualization of the predicted miRNA-gene interaction network. The database query system provides a series of filters to facilitate and re

Modern research requires the use of standards: the CHARME project and its aims

Introduction: One of the most pressing needs of modern research is efficient data sharing and integration, but the tools and legislation on standards and Standard Operating Procedures (SOPs) currently available to scientists are still not sufficiently comprehensive, efficient and well harmonised. The production of omics data by High Throughput Next Generation Sequencing Technologies (HT-NGS) is generating enormous amounts of data at a rate much higher than that at which scientists can analyse and interpret them. This imbalance prevents the scientific community from exploiting the full potential that this "data production revolution" provides. The use of standards for the production and publication of research data is essential to maximize the results of research efforts and technology transfer, because only standards can ensure quality, efficient sharing and data interoperability. Reproducibility is essential for good research practice, and reproducibility of data and experimental procedures can be obtained only if research data are produced and published by adhering to well established quality standards. Standards and SOPs must be key elements of any research project and be adopted in lab procedures as well as for in silico data production, storage and analysis. The CHARME project, "Harmonising standardisation strategies to increase efficiency and competitiveness of European life-science research", is a COST Action (CA 15110) whose main goal is to unite experts from all areas of scientific research and strategic development (academia, industry, policy, legal, ethical, etc.), joining their expertise to address needs and challenges along the value chain for life sciences research across Europe.
The objectives are to address the main gaps in standards and SOPs in different research domains, co-ordinate current research efforts in this field, integrate different stakeholder groups in CHARME's activities, and co-develop a common research roadmap on quality management and standardisation, in order to provide the European Commission with support for positioning Europe as a "leading partner" in international standardisation and standardisation activities in life sciences, including input for technology transfer and cooperation with private enterprises. Methods: To achieve CHARME's 4-year vision, an integrated project strategy has been designed to ensure de-centralised decision-making and enhanced cooperation between the different stakeholders and partners. The leverage of the COST Action CHARME relates to four pillars: 1) the creation of a network of all relevant stakeholder groups involved in standardisation, to exchange and harmonise activities; 2) the development of a cross-cutting education and training strategy to raise awareness and facilitate the implementation of standards and SOPs; 3) strengthening of innovation creation and technology transfer; 4) strategy development to urge the implementation of standards and SOPs. This

Multi-type clustering for the identification of lncRNA-disease relationships

Introduction: High-throughput sequencing technology, alongside new or improved computational methods, has been crucial for rapid advances in functional genomics. Among the most important results obtained thanks to the introduction of these new technologies is the discovery of thousands of non-coding RNAs (ncRNAs) whose function is pivotal for the fine-tuning of the expression of many genes that guide cell development, differentiation, apoptosis and proliferation [2]. Consequently, in the last decade, the number of papers reporting evidence of ncRNA involvement in complex human diseases, such as cancer, has grown at an exponential rate. Among the different classes of ncRNAs, the most investigated one is that of microRNAs (miRNAs), which are small molecules (20-22 nt long) that regulate the expression of genes through the modulation of the translation of their transcripts [4]. Much less is known about the functional involvement of long non-coding RNAs (lncRNAs), represented by RNA molecules longer than 200 nt, which have recently been discovered to have a plethora of regulatory functions spanning from chromatin modifications to post-transcriptional regulation [8]. However, the number of lncRNAs for which a functional characterization is available is still quite small. Assessing the role and, especially, the molecular mechanisms underlying the involvement of lncRNAs in human diseases is not a trivial task. Most existing approaches are based on expensive experimental evaluations or on computational methods which exploit known/verified relationships between lncRNAs and diseases [6]. However, because of the complex functional interactions that lncRNAs can establish with other regulatory RNAs (i.e., miRNAs) or proteins, considering only the evidence of a direct relationship between lncRNAs and diseases may be very limiting.
Some recent works have started to consider further related information, but they do not consider possible dependencies among the relationships and instead analyze single relationships independently. This corresponds to the assumption that all the instances follow the same probability distribution and are independent of each other. In this case such an assumption is easily violated, since different lncRNAs can be involved in the development of the same disease, and different diseases can be related to each other on the basis of the involvement of common lncRNAs or other regulatory entities such as miRNAs. To overcome these limitations, we propose a computational method which is able to predict possibly unknown relationships between lncRNAs and diseases by exploiting different information about a heterogeneous set of (related) biological entities. In particular, we focus on lncRNAs, miRNAs, target genes and diseases, as well as on known relationships among these entities (see Figure 1). The proposed method is based on a clustering algorithm which is able to group objects of multiple types and to predict possibly

nc-aReNA: an integrated bioinformatics platform for non-coding RNA-seq data classification and annotation

High-throughput technologies (HT), such as microarray and especially Next-Generation Sequencing (NGS) technologies, have provided tremendous potential for profiling protein-coding and non-protein-coding RNAs (ncRNAs). Recent reports of the ENCODE project underline that while 80% of the human genome is transcribed, only 2% is protein coding, suggesting that the vast majority of the genome is transcribed as non-protein-coding RNA. We present the development of a web-based bioinformatics platform, nc-aReNA, for the mapping, classification and annotation of human and mouse ncRNAs from HT-NGS data. The platform is based on a data-warehouse approach and workflow environment that includes data quality control, genome and nc-RNAome sequence alignment, differential expression profiling analysis and statistics of classified data. Methods: The nc-aReNA architecture is based on a modular analysis pipeline, flanked by a data-warehouse, for the classification and annotation of small RNA-seq data. The pipeline takes as input the sequenced reads in FASTQ format. After the initial steps of adaptor removal and quality check, the input reads are mapped to an in-house non-redundant ncRNA reference database (http://ncRNAdb.ba.itb.cnr.it) which collects and integrates ncRNA gene lists, from MGI (Mouse Genome Informatics) and HGNC (HUGO Gene Nomenclature Committee), with sequences and biotype annotations from VEGA (Vertebrate Genome Annotation), ENSEMBL, RefSeq, RFam (for tRNA sequences) and miRBase (for miRNAs). NGS reads mapped in this step are classified using Sequence Ontology (SO) (Eilbeck K. et al., 2005).
Unmapped reads are aligned to the reference genome and tagged to the corresponding genomic locus. Integrated statistics are used for the calculation of RPM (Reads Per Million), fold changes and False Discovery Rate (FDR) corrected p-values, and for differential expression analysis of all (or user-chosen) ncRNA classes, by comparing two or more experimental conditions or time-course data. An additional module, called "miRNA identification", provides the analysis of all unmapped miRNA-like reads by means of the miRDeep2 software. All the analysis results and annotations are stored in a data-warehouse implemented with Infobright (http://www.infobright.org). A user-friendly web-based Graphical User Interface (GUI), developed using the JAVA platform, guides the user in the submission process and displays results in tables and graphs.
Results: The main features of nc-aReNA are:
- identification and classification of reads in known functional ncRNA categories in SO;
- identification and filtering of reads mapping to ribosomal RNAs and mtDNA transcripts;
- RPM calculation for each known ncRNA;
- the export of user-selected classes of ncRNA for further specific investigation;
- quantification of ncRNA expression and differential expression analysis for all identified ncRNA classes;
- graphical visualization of sample expression profiles;
- additional annot
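
The statistics named in the abstract (RPM, fold changes and FDR-corrected p-values) can be sketched in a few lines. This is a minimal illustration, not the nc-aReNA implementation: it assumes that RPM is computed against the sum of the supplied counts, that fold change is a log2 ratio with a pseudocount, and that FDR correction follows the Benjamini-Hochberg procedure. All function names and read counts are hypothetical.

```python
from math import log2


def rpm(counts):
    """Reads Per Million: scale each count by the total of the supplied counts."""
    total = sum(counts.values())
    return {name: n * 1_000_000 / total for name, n in counts.items()}


def log2_fold_change(rpm_a, rpm_b, pseudo=1.0):
    """log2(B / A) per ncRNA, with a pseudocount to avoid division by zero."""
    names = set(rpm_a) | set(rpm_b)
    return {
        name: log2((rpm_b.get(name, 0.0) + pseudo) / (rpm_a.get(name, 0.0) + pseudo))
        for name in names
    }


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical read counts for two ncRNAs in one sample.
sample = {"miR-21": 500, "tRNA-Gly": 1500}
print(rpm(sample))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```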

Network reconstruction for the identification of miRNA:mRNA interaction networks

Network reconstruction from data is a data mining task which is receiving significant attention due to its applicability in several domains. For example, it can be applied in social network analysis, where the goal is to identify connections among users and, thus, sub-communities. Another example can be found in computational biology, where the goal is to identify previously unknown relationships among biological entities and, thus, relevant interaction networks. Such a task is usually solved by adopting methods for link prediction and for the identification of relevant sub-networks. Focusing on the biological domain, in [4] and [3] we proposed two methods for learning to combine the output of several link prediction algorithms and for the identification of biologically significant interaction networks involving two important types of RNA molecules, i.e. microRNAs (miRNAs) and messenger RNAs (mRNAs). The relevance of this application comes from the importance of identifying (previously unknown) regulatory and cooperation activities for the understanding of the biological roles of miRNAs and mRNAs. In this paper, we review the contribution given by the combination of the proposed methods for network reconstruction and the solutions we adopt in order to meet the specific challenges of the domain we consider. © 2014 Springer-Verlag.

NonCode aReNA DB: a non-redundant and integrated collection of non-coding RNAs

MOTIVATION: The recent availability of next generation sequencing (NGS) technologies has provided the scientific community with an unprecedented opportunity for large-scale genome analysis in a large number of organisms. One of the most challenging tasks for bioinformaticians is to develop tools that provide biologists with easy access to curated and non-redundant collections of sequence data. Non-coding RNAs, for a long time believed to be non-functional, are emerging as the largest and most important family of gene regulators. METHODS: NonCode aReNA DataBase is a comprehensive and non-redundant source of manually curated and automatically annotated ncRNA transcripts collected from major public resources. The database is built through a set of automated ETL (Extraction Transformation Loading) processes which extract and collect data from VEGA, ENSEMBL, RefSeq, miRBase, GtRNAdb and piRNABank. The automatic process also guarantees recurring updates. The identification of redundant sequences is made by analyzing both cross-link references and sequence similarity. Furthermore, non-coding RNA sequences have been classified in diverse biotypes and associated with Sequence Ontology terms. NonCode aReNA DataBase was originally developed as a component of a bigger project, represented by a datawarehouse and an analysis workflow, for the functional annotation of ncRNAs from NGS data. RESULTS: NonCode aReNA Database is currently available as a web resource at http://ncrnadb.ba.itb.cnr.it/. The database can be queried using multi-criteria and ontological search, through an easy-to-use web interface. Query results can be exported as non-redundant collections of ncRNA transcripts. Currently NonCode aReNA DataBase contains 134,908 human ncRNAs classified in 24 biotypes, and future updates will include transcripts of Mus musculus and Arabidopsis thaliana.

Plant responses to two diverse viruses involve different DNA methylation profiles

The establishment and maintenance of DNA methylation are relatively well understood, whereas little is known about their dynamics and biological relevance in innate immunity. In plants, modulation of DNA methylation might be an effective mechanism to regulate gene expression in response to abiotic and biotic stresses. Recent evidence from large-scale epigenomic approaches indicates that dynamic DNA methylation changes are not limited to gene imprinting but can regulate the plant's immune system in response to pathogens. In plants, virus infections trigger expression and regulation of non-coding small RNAs, and genomic regions are epigenetically modified through the action of the same molecules; however, the involvement of DNA methylation in the regulation of the plant immune system in response to virus infection had not been investigated before. We have examined for the first time the impact of virus infections on genomic DNA methylation and the correlation with small RNA regulation and gene expression by integrating the analysis of multiple "omics" datasets based on next-generation sequencing platforms. To investigate the possibility that DNA methylation dynamically responds to virus infection, we performed whole-genome bisulfite sequencing on Arabidopsis leaves systemically infected with either the DNA genome virus Cauliflower mosaic virus (CaMV-Arabidopsis) or the RNA virus Cucumber mosaic virus (CMV-Arabidopsis). Single-base resolution methylome analysis revealed more than 3.7 million methyl-cytosines (mCs) for the control plant. Interestingly, in CMV-Arabidopsis we found 300,000 more mCs (hypermethylated) and in CaMV-Arabidopsis 700,000 fewer mCs (hypomethylated). Focusing on differentially methylated regions (DMRs, 250 nt in length), we observed a balanced distribution of hyper- and hypomethylation in CG and CHH context in CMV-Arabidopsis (total DMRs 2700), whereas in CaMV-Arabidopsis we found predominantly hypomethylated DMRs in CHH context (total DMRs 5600).
Gene features including coding, non-coding and promoter sequences were assigned to unique gene identifiers according to the TAIR nomenclature. Among differentially methylated gene features, promoter regions were the vast majority, accounting, in specific mC contexts, for up to 80% of the total. The whole gene ID dataset was subjected to gene functional enrichment analysis using the DAVID package tool. Interestingly, specific functional categories such as "plant defense" and "auxin signalling pathway" were significantly enriched. The correlation between the DNA methylation status and the transcriptional modulation of those genes is under investigation. A comparison between the methylation profiles induced by CaMV and CMV infections revealed conspicuous qualitative and quantitative differences. Taken together, our results indicate that RNA- and DNA-genome virus infections induce different regulation of DNA methylation and, at least in part, different immune responses in Arabidopsis.
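To put the reported methyl-cytosine counts in proportion, the short Python sketch below converts the differences quoted in the abstract into relative changes against the control. It assumes a baseline of exactly 3.7 million mCs, although the abstract only states "more than 3.7 million", so the percentages are approximate upper bounds in magnitude; the names `CONTROL_MCS`, `delta_mcs` and `relative_change` are illustrative and do not come from the study.

```python
# Approximate baseline: the abstract reports "more than 3.7 million" mCs in the control.
CONTROL_MCS = 3_700_000

# Differences in mC counts relative to the control, as quoted in the abstract.
delta_mcs = {"CMV": +300_000, "CaMV": -700_000}


def relative_change(delta, baseline=CONTROL_MCS):
    """Return the change in mC count as a percentage of the baseline."""
    return 100.0 * delta / baseline


for virus, delta in delta_mcs.items():
    print(f"{virus}-Arabidopsis: {relative_change(delta):+.1f}% mCs vs control")
```

Under this assumption the CMV infection corresponds to roughly 8% more mCs than the control and the CaMV infection to roughly 19% fewer.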

Preface to Abstract Book - BITS 2010, VII Annual Meeting of the Bioinformatics Italian Society

Preface to Abstract Book - Next Generation Sequencing Workshop

Publicity and Public Relations Project Committee: AGM2011 report

The main mission of the P&PR PC is to nurture and promote EMBnet's image at large. The P&PR PC is responsible for promoting all types of EMBnet activity, for advertising the products and services provided by the EMBnet community, for proposing and developing new strategies to enhance EMBnet's visibility, and for managing public relations with EMBnet communities and related networks/societies. In this document, we report the proposals, activities and achievements of the committee from June 2010 to May 2011.

Report of the EMBnet AGM 2011 Workshop

The 2011 AGM workshop took place at the Instituto Gulbenkian de Ciência (IGC) in Oeiras, Portugal, from 23-25 May (Figure 1). The goal of the workshop was to build on the demonstrable progress made during the previous year, in particular by helping to deliver on some of the plans outlined during the 2010 AGM. It was also an opportunity to build on our commitment to take EMBnet forward by embracing new partners and new activities. The following pages summarise the workshop content, discussions and conclusions.

Semi-supervised ensemble learning to boost miRNA target predictions

The huge amount of data produced by Next Generation Sequencing (NGS) technologies is providing scientists with an unprecedented potential to investigate and shed light on the remote secrets of genomes. We have developed a new tool based on biclustering techniques, HOCCLUS2, which is able to significantly correlate multiple miRNAs and their predicted targets to detect potential miRNA:mRNA regulatory modules. However, experiments performed on predicted interactions showed that the noise (i.e., false positives) introduced by prediction algorithms can substantially affect the significance of the discovered modules. To overcome this issue, we have developed a probabilistic method that builds a more reliable dataset by combining data produced by several well-known prediction algorithms. The main goal of this work is to combine the prediction scores of several prediction algorithms into a single, stronger classifier, in order to improve the reliability of the obtained predictions. This tool could greatly help in interpreting NGS miRNA profile analyses with respect to their effects, by using genome-wide predictions of their targets.
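The idea of merging scores from several target predictors can be illustrated with a deliberately simple Python sketch. It is not the semi-supervised ensemble method described in the abstract: it only rescales each predictor's scores to a common range and averages them. The predictor names, the miRNA and gene labels, the scores, and the assumption that a higher score means a more confident prediction are all hypothetical.

```python
from statistics import mean


def min_max_normalise(scores):
    """Rescale a {pair: score} mapping to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores equal: treat every pair as equally supported
        return {pair: 1.0 for pair in scores}
    return {pair: (s - lo) / (hi - lo) for pair, s in scores.items()}


def combine_predictions(predictors):
    """Average the normalised scores; a pair missing from a predictor counts as 0."""
    normalised = [min_max_normalise(scores) for scores in predictors.values()]
    all_pairs = set().union(*[n.keys() for n in normalised])
    return {
        pair: mean([n.get(pair, 0.0) for n in normalised])
        for pair in all_pairs
    }


# Hypothetical scores from two made-up predictors on different scales.
predictors = {
    "tool_a": {("miR-1", "GENE1"): 90, ("miR-1", "GENE2"): 50, ("miR-2", "GENE1"): 10},
    "tool_b": {("miR-1", "GENE1"): 0.8, ("miR-2", "GENE1"): 0.2},
}
combined = combine_predictions(predictors)
for pair, score in sorted(combined.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

Counting a missing pair as 0 penalises interactions that only one tool predicts; a learned combination, as proposed in the abstract, would instead estimate how much each predictor should be trusted.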

Social Database for Biodiversity

Biodiversity research deals with data from many different domains (e.g., Biology, Geography, Evolutionary Studies, Genomics, Taxonomy, Environmental Sciences, etc.), which need to be integrated to produce valuable Biodiversity knowledge. Collecting and integrating data from so many heterogeneous resources is not a trivial task. Data are extremely scattered, heterogeneous in format and purpose, and protected in the repositories of several research institutes. Driven by the widespread web trend of sharing information by bringing together people with the same interests (social networks), and by a new type of database architecture defined as a dynamic distributed federated database, we propose a new paradigm of data integration in the Biodiversity domain. Here we present a new approach to the development of a Knowledge Base aimed at the collection, integration and analysis of biodiversity data, implemented as a product of the MBLab project.

Standardisation in Life Sciences - with CHARME towards a unified standardisation European strategy

Motivation: Standardisation and quality management are important drivers in the life sciences and biotechnology, as only data generated with minimum quality assurance can be easily implemented into industrial applications. Standards assure and ensure that data become easily accessible, shareable and comparable along the value chain. Reproducibility, standards and standard operating procedures (SOPs) in data generation and analysis are challenging topics of modern research and bioinformatics. Only by the use of common standards will European life science research improve its efficiency and competitiveness. In the past years several initiatives have been launched to develop and implement standards in life science research. Unfortunately, these efforts remain fragmented and largely disconnected. The COST Action CHARME (CA15110) is designed as a central interface between existing standardisation initiatives, and as an intermediary between existing parallel efforts. The goals of CHARME are: 1) to develop a framework for standardisation of standards and formats in life sciences; 2) community building/networking of scientific standardisation initiatives; 3) development of a common understanding/definition of the subject matter; 4) to create a common platform for all stakeholders, for sustainable and efficient cooperation on standardisation and standardisation; 5) mediation between the scientific standardisation initiatives and the competent standardisation bodies and standards committees' activities (including input from stakeholders, e.g. standardisation bodies, policymakers, regulators, users); 6) the analysis and classification of existing (or developing) community standards in the field of systems biology and computational modelling (and beyond); 7) to support the positioning of Europe as a "leading partner" in international standardisation and standardisation activities in the life sciences (including input for future market applications and cooperation with private enterprises).
Methods: CHARME's pan-European network unites experts from all areas of scientific research and strategic development (academia, industry, policy, legal, ethical, etc.), joining their expertise to address needs and challenges along the value chain for life sciences across Europe. To achieve the Action's 4-year vision, an integrated project strategy has been designed to ensure de-centralised decision-making and enhanced cooperation between the different stakeholders and partners. The leverage of the COST Action CHARME relates to four pillars: 1) the creation of a network of all relevant stakeholder groups involved in standardisation, to exchange and harmonise activities; 2) the development of a cross-cutting education and training strategy to raise awareness and facilitate the implementation of standards and SOPs; 3) strengthening of innovation creation and technology transfer; 4) strategy development to urge the implementation of standards and SOPs. This will be achieve

The integration of microRNA target data by biclustering techniques opens new roads for signaling networks analysis

MicroRNAs (miRNAs) are key modulators of gene expression. In addition to their recognised role in embryonic and adult cell proliferation and differentiation (Ren et al., 2009), many recent studies on diverse types of human cancer have demonstrated that miRNAs are functionally integrated into those oncogenic pathways that are central to tumorigenesis (Olive et al., 2010). Although microarray profiling and next generation sequencing technologies have allowed researchers to discover many of their structural and functional features, as well as many new miRNAs, the current challenge is to understand their specific biological functions and the mechanisms through which they ensure cell homeostasis and control developmental timing and cancer progression. This is not a trivial task, because the post-transcriptional regulation of gene expression mediated by miRNAs is rarely resolved by a simple one-to-one interaction between a miRNA and a target gene. It is much more complex, often involving multiple binding of the same miRNA and/or of different miRNAs in a cooperative manner. The combinatorial effects of different miRNAs on the same gene, or on different genes of the same pathway, are an essential part of the mechanism through which they are able to fine-tune signaling pathways (Inui et al., 2010). Indeed, the effect of a miRNA may change depending on which other miRNAs are co-expressed or silenced, which in turn depends on the specific context in which the cell, the tissue or the organism is considered. This makes the interpretation of miRNA expression profiles very difficult, and a mere analysis of the list of differentially expressed genes cannot provide enough information to elucidate the multiplicity of potential miRNA:mRNA interactions. In this context, the exploitation of data mining techniques, and in particular of biclustering algorithms, is considered a useful approach to search for correlations among miRNAs and mRNAs. 
However, as each miRNA may target hundreds of genes, the selection of the most significant results for further experimental validation still remains a challenging task for many biologists. The proposed method, which is implemented in the system HOCCLUS2, has been designed to analyse data on miRNA:mRNA interactions (derived from expression arrays or from large sets of predictions) in order to detect significant co-regulatory partnerships. In particular, the aim is to provide biologists with a tool which can support them in two challenging tasks, that is, the detection of actual miRNA target genes and the identification of the context-specific co-associations of different miRNAs. A further contribution consists in the ranking of the extracted biclusters on the basis of the semantic similarity between the target genes, which allows biologists to easily select the most significant results from a biological viewpoint. Availability: http://www.di.uniba.it/~ceci/micFiles/systems/HOCCLU

The NonCode aReNA DB: a non-redundant and integrated collection of non-coding RNAs

The recent availability of high-throughput technologies, like next generation sequencing (NGS) platforms, has provided the scientific community with an unprecedented opportunity for large-scale analysis of genomes in a large number of organisms. However, one of the most challenging tasks for bioinformaticians is to develop tools that provide biologists with easy access to curated and non-redundant collections of sequence data. Non-coding RNAs, for a long time believed to be non-functional, are emerging as the largest and most important family of gene regulators. NonCode aReNA Database is a comprehensive and non-redundant source of manually curated and automatically annotated ncRNA transcripts. Originally developed as a component of a bigger project, comprising a data warehouse for the functional annotation of ncRNAs from NGS data, NonCode aReNA DB is currently available as a web resource at http://ncrnadb.ba.itb.cnr.it/. Sequences have been classified in diverse biotypes and associated with Sequence Ontology terms. The database can be queried by using multi-criteria and ontological search, through an easy-to-use web interface, and data can be exported as non-redundant collections of transcripts annotated in VEGA, ENSEMBL, RefSeq, miRBase, GtRNAdb and piRNABank. The database is updated through an automatic pipeline and the last update was in January 2015. Presently NonCode aReNA DB contains 134,908 human ncRNAs classified in 24 biotypes, and the next update will include transcripts of Mus musculus and Arabidopsis thaliana. Acknowledgements: This work was supported by the Italian MIUR Flagship Project "Epigen".
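One way to picture a non-redundant collection built from several source databases is sketched below in Python. This is only an illustration under a strong assumption, namely that two records are redundant when their sequences are identical after normalising case and the U/T alphabet; the abstract does not state the criteria actually used by NonCode aReNA DB. The accessions (`ID_A`, `ID_B`, `ID_C`) and sequences are made up.

```python
from collections import defaultdict


def build_non_redundant(records):
    """Group records sharing an identical sequence, keeping every source accession."""
    merged = defaultdict(list)
    for source, accession, sequence in records:
        # Normalise case and the RNA/DNA alphabet so equivalent spellings collide.
        key = sequence.upper().replace("U", "T")
        merged[key].append((source, accession))
    return dict(merged)


# Made-up records: (source database, accession, sequence).
records = [
    ("miRBase", "ID_A", "UGAGGUAGUAGG"),
    ("ENSEMBL", "ID_B", "TGAGGTAGTAGG"),  # same molecule, written in the DNA alphabet
    ("RefSeq", "ID_C", "ACGUACGUACGU"),
]
collection = build_non_redundant(records)
for sequence, sources in collection.items():
    print(sequence, sources)
```

Keeping the list of source accessions for each unique sequence preserves the cross-references to the original databases while storing each transcript only once.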

Role

Organisation

Department

Scientific Area

Scientific Disciplinary Sector

ERC Sector, level 1

ERC Sector, level 2

ERC Sector, level 3

50 PUBLICATIONS

2010 Annual General Meeting - Publicity & Public...

2014 Annual General Meeting: Publicity & Public Relations...

2016 Big Data Forum for Life and Health...

454 GS-FLX TITANIUM PLATFORM: THE EXPERIENCE OF ITB-BA

A BIOINFORMATICS WORKFLOW FOR THE ANALYSIS OF NONCODING...

A bioinformatics workflow for the analysis of transcriptome...

A novel biclustering algorithm for the discovery of...

A novel biclustering algorithm for the discovery of...

A NOVEL GENERAL-PURPOSE RNA-SEQ PROTOCOL OPTIMIZING THE DETECTION...

A platform independent RNA-Seq protocol for the detection...

A two-stepped computational approach for miRNA-gene regulatory networks...

Analysis of Gene-microRNA Maps for Regulatory Network Discovery

BiP-Day 2013: "Prima Giornata della Bioinformatica Pugliese" -...

BITS 2012: Ninth Annual Meeting of the Bioinformatics...

BITS 2010 - VII Annual Meeting of the...

ComiRNet - The Database of Predicted miRNA Regulatory...

ComiRNet: a web-based system for the analysis of...

ComiRNet: a database of predicted miRNA:mRNA interactions and...

Comparison of co-regulation of DNA methylation and smallRNA...

Discovery of miRNA-Gene regulatory networks by using an...

DNA and RNA viruses infection in plant: two...

Effects of edible plant microRNAs on cancer cell...

EMBnet.journal - Bioinformatics in Action

Hierarchical and Overlapping Co-Clustering of mRNA:miRNA Interactions.

HOCCLUS2

HOCCLU2: a data mining tool for easily handling...

Identification of new p53 regulatory networks through NGS...

Improving the Prediction of miRNA:mRNA Interactions by Exploiting...

Integrating microRNA target predictions for the discovery of...

Integrating bioinformatics resources for modelling Human non-coding RNA...

Italian EMBnet node: AGM2011 report

Learning to combine miRNA target predictions: A semi-supervised...

Mining Spatial Association Rules for Composite Motif Discovery

Mining spatial association rules of multiple co-occurring motifs...

mirNET: a web-based system for the analysis of...

Modern research requires the use of standards: the...

Multi-type clustering for the identification of lncRNA-disease relationships

nc-aReNA: an integrated bioinformatics platform for non-coding RNA-seq...

Network reconstruction for the identification of miRNA:mRNA interaction...

NonCode aReNA DB: a non-redundant and integrated collection...

Plant responses to two diverse viruses involve different...

Preface to Abstract Book - BITS 2010, VII...

Preface to Abstract Book - Next Generation Sequencing...

Publicity and Public Relations Project Committee: AGM2011 report

Report of the EMBnet AGM 2011 Workshop

Semi-supervised ensemble learning to boost miRNA target predictions

Social Database for Biodiversity

Standardisation in Life Sciences - with CHARME towards...

The integration of microRNA target data by biclustering...

The NonCode aReNA DB: a non-redundant and integrated...

0 PROJECTS

0 PATENTS

0 SPIN-OFFS