People - Apulia Research Gate

Sabino Liuni


Role

Level I - Senior Technologist (Dirigente Tecnologo)

Organization

Consiglio Nazionale delle Ricerche

Department

Not available

Scientific Area

AREA 05 - Biological Sciences

Scientific Disciplinary Sector

BIO/11 - Molecular Biology

ERC Sector, Level 1

LS - LIFE SCIENCES

ERC Sector, Level 2

LS2 Genetics, Genomics, Bioinformatics and Systems Biology: Molecular and population genetics, genomics, transcriptomics, proteomics, metabolomics, bioinformatics, computational biology, biostatistics, biological modelling and simulation, systems biology, genetic epidemiology

ERC Sector, Level 3

LS2_10 Bioinformatics

A BIOINFORMATICS WORKFLOW FOR THE ANALYSIS OF NONCODING RNAs FROM DATA GENERATED BY DEEP-SEQUENCING

A BIOINFORMATIC APPROACH FOR NGS DATA TO ANALYZE AND VISUALIZE CHROMOSOMAL FUSION EVENTS IN HUMAN BREAST CANCER.

Cancer is a multi-stage process often driven by the progressive accumulation of genomic rearrangements, which can result in cells acquiring cancer properties such as invasive and metastatic behavior. Many cancer-associated genes are the result of complex somatic and inherited chromosomal rearrangements that produce aberrant transcripts or defects in transcription [1-5]. Classical approaches for the identification of genome rearrangements, such as G-banded cytogenetics, spectral karyotyping and FISH, have poor sensitivity, while copy number arrays can identify only imbalanced breakpoints and do not describe the genome structure resulting from the events that may cause the breakpoints. The aim of this project is to obtain, by applying the paired-end mapping (PEM) approach to massively parallel sequencing, a high-resolution virtual karyotype of the genome of a breast cancer patient whose transcriptomic portrait we obtained previously [6]. The introduction of massively parallel high-throughput sequencing (HTS) techniques has created a broad range of new and exciting research applications by dramatically increasing the output of sequencing data. In recent years, continuous technical improvements in next-generation sequencing technology have made RNA sequencing (RNA-seq) particularly effective for the detection of gene fusions, which are involved in several diseases. Gene fusions are found in many cancer types, and they have proved to be prognostic biomarkers in several studies [7-9].
In addition, gene fusions often have a direct functional impact on the molecular processes in the cell [10]. Several analysis steps are needed to process the data provided by the sequencer and to use them for robust gene fusion detection. We propose a workflow to analyze NGS paired-end sequences in order to identify candidate fusions between different genes, looking for fusion events that occur on the same chromosome (intra-chromosomal rearrangements). The basic idea is to map the reads onto the reference genome, study the insert size distribution of the read pairs to locate its peak, and select all mapped pairs whose insert size is far from the observed peak. In this way we select paired-end sequences that map to regions of the genome far from each other and may connect different genes.
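
The selection step described above can be illustrated with a minimal Python sketch. It is not the workflow's actual implementation: the function names, the histogram bin width and the `max_deviation` cut-off are hypothetical, and the insert sizes are assumed to have already been extracted from the alignments as (pair identifier, insert size) tuples.

```python
from collections import Counter


def insert_size_peak(insert_sizes, bin_width=10):
    """Return the centre of the most populated bin of the insert-size histogram."""
    bins = Counter(abs(size) // bin_width for size in insert_sizes)
    # Most populated bin; ties are resolved in favour of the smaller insert size.
    top_bin, _ = max(bins.items(), key=lambda item: (item[1], -item[0]))
    return top_bin * bin_width + bin_width / 2


def candidate_fusion_pairs(pairs, max_deviation=1000, bin_width=10):
    """Select the read pairs whose insert size lies far from the observed peak.

    pairs: list of (pair_id, insert_size) tuples for pairs mapped on the
    same chromosome. max_deviation is an illustrative cut-off (assumption).
    """
    peak = insert_size_peak([size for _, size in pairs], bin_width)
    return [pid for pid, size in pairs if abs(abs(size) - peak) > max_deviation]


if __name__ == "__main__":
    pairs = [("p1", 300), ("p2", 310), ("p3", 305), ("p4", 295), ("p5", 250000)]
    print(candidate_fusion_pairs(pairs))
```

In practice the cut-off would be chosen from the spread of the observed distribution rather than fixed in advance.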

A bioinformatics workflow for the analysis of transcriptome data generated by deep-sequencing

The huge amount of transcript data produced by high-throughput sequencing requires the development and implementation of suitable bioinformatic workflows for their analysis and interpretation. These analysis workflows, which include different modules, should be specifically designed according to the sequencing platform (Roche 454, Illumina, SOLiD) and the nature of the data (polyA or total RNA fraction, strand specificity). In the case of cDNA obtained from a total RNA preparation, in addition to polyadenylated protein-coding mRNAs, a great variety of transcript sequences can be obtained, including ribosomal RNAs, mitochondrial transcripts and a large variety of functional non-coding RNAs (ncRNAs). To deal with these data, the analysis workflow should include specific modules to distinguish ncRNA fractions from the large number of other functional protein-coding transcripts. To this aim we developed an analysis pipeline that, given as input a large collection of reads (particularly from Roche 454), provides the qualitative and quantitative expression profile of human mtDNA, ribosomal RNAs, ncRNAs and protein-coding mRNAs.

A fuzzy method for RNA-Seq differential expression analysis in presence of multireads

When the reads obtained from high-throughput RNA sequencing are mapped against a reference database, a significant proportion of them - known as multireads - can map to more than one reference sequence. These multireads originate from gene duplications, repetitive regions or overlapping genes. Removing the multireads from the mapping results, in RNA-Seq analyses, causes an underestimation of the read counts, while estimating the real read count can lead to false positives during the detection of differentially expressed sequences.
Results: We present an innovative approach to deal with multireads and evaluate differential expression events, entirely based on fuzzy set theory. Since multireads cause uncertainty in the estimation of read counts during gene expression computation, they can also reduce the reliability of differential expression analysis results by producing false positives. Our method manages the uncertainty in gene expression estimation by defining fuzzy read counts, and it evaluates the possibility that a gene is differentially expressed with three fuzzy concepts: over-expression, same-expression and under-expression. The output of the method is a list of differentially expressed genes enriched with information about the uncertainty of the results due to the presence of multireads. We have tested the method on RNA-Seq data designed for case-control studies and we have compared the results with those of other existing tools for read count estimation and differential expression analysis.
Conclusions: Managing multireads with fuzzy sets makes it possible to obtain a list of differential expression events that takes into account the uncertainty in the results caused by the presence of multireads. Such additional information can be used by biologists when they have to select the most relevant differential expression events to validate with laboratory assays.
Our method can be used to compute reliable differential expression events and to highlight possible false positives in the lists of differentially expressed genes computed with other tools.
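
To make the idea of uncertain read counts concrete, the following Python sketch represents each gene count as an interval and derives a degree for each of the three concepts. It is only an illustration under simplifying assumptions, not the formulation used by the method: the interval representation, the pseudocount, the log2 fold-change threshold and all function names are hypothetical.

```python
import math


def fuzzy_count(unique_reads, multireads):
    """Interval (lower, upper): multireads may or may not belong to the gene."""
    return (unique_reads, unique_reads + multireads)


def log2_fold_change_range(case, control, pseudocount=1.0):
    """Smallest and largest log2 fold change compatible with the two intervals."""
    low = math.log2((case[0] + pseudocount) / (control[1] + pseudocount))
    high = math.log2((case[1] + pseudocount) / (control[0] + pseudocount))
    return low, high


def expression_degrees(case, control, threshold=1.0):
    """Share of the fold-change range falling in each of the three concepts."""
    low, high = log2_fold_change_range(case, control)
    width = high - low
    if width == 0:  # no multireads: the fold change is a single crisp value
        return {"over": float(low > threshold),
                "same": float(-threshold <= low <= threshold),
                "under": float(low < -threshold)}
    over = max(0.0, high - max(low, threshold)) / width
    under = max(0.0, min(high, -threshold) - low) / width
    same = max(0.0, min(high, threshold) - max(low, -threshold)) / width
    return {"over": over, "same": same, "under": under}


if __name__ == "__main__":
    case = fuzzy_count(31, 0)      # 31 unique reads, no multireads
    control = fuzzy_count(7, 24)   # 7 unique reads, 24 multireads
    print(expression_degrees(case, control))
```

A gene whose degrees are split between "over" and "same", as in the usage example, is one whose differential expression call depends on how the multireads are assigned.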

A NOVEL GENERAL-PURPOSE RNA-SEQ PROTOCOL OPTIMIZING THE DETECTION OF TRANSCRIPTOME EXPRESSION COMPLEXITY

Recent studies have demonstrated an unexpected complexity of transcription in eukaryotes. Indeed the majority of the genome is transcribed and only a small fraction of these transcripts is annotated as protein coding genes and their splice variants. Therefore high throughput transcriptome sequencing continuously identifies novel RNAs and novel classes of RNAs, which are the result of antisense, overlapping and non-coding RNA expression, demonstrating that the transcriptome captures a level of complexity that the simple genome sequence may not (1). Among next-generation sequencing platforms, the latest series of Roche 454 GS Sequencer, the GS FLX Titanium FLX+, yields over a million reads per run, each with a length of up to 700 bases. Sequences of such length, providing connectivity information among splicing sites, in addition to enabling accurate mapping and relative quantification of mRNAs, are particularly suitable for the characterization of full-length splicing variants that may be differently expressed in physiopathological conditions (2). On the other hand, the higher throughput of the Illumina HiSeq 1000 (150 bp) and ABI SOLiD (75 bp) platforms makes them particularly suitable for transcript-level quantification and for small RNA sequencing. Irrespective of the NGS platform used, the first step required for transcriptome sequencing is the construction of a cDNA library.
Several protocols have been developed so far to this aim and each of them is suitable for sequencing on a specific platform exclusively. Here we describe a new fast and simple method (Patent pending RM2010A000293-PCT/IB2011/052369) to prepare and amplify a representative and strand-specific cDNA library starting from low input total RNA (500 ng) for RNA-Seq applications, that may be implemented with all major platforms currently available (Roche 454, Illumina, ABI/SOLiD). Our method includes the following steps: a) rRNA removal from total RNA; b) retrotranscription of the rRNA-depleted RNA to cDNA with custom-designed 5' phosphorylated Tag-random-octamers capable of preserving strand information; c) single-strand cDNA purification; d) ligation and amplification of the purified cDNAs, thus obtaining a high yield of concatamers around 20 kb long. These DNA molecules can be sequenced equally well with Illumina and Roche 454 sequencing platforms, allowing not only the quantitative but also the qualitative assessment of the transcriptome complexity. Moreover, we developed a suitable bioinformatic pipeline for the analysis of the sequences produced upon application of this protocol. Indeed, we developed an in-house Python script, named Tag_Find (available upon request), able to recognize the position and the type of tag found within the read sequence. The program returns two files, one containing the types of tags found and their positions in the reads and one fastq file with non-tagged reads, cleaned up from tags. The Tag_Find efficiency

A platform independent RNA-Seq protocol for the detection of transcriptome complexity

Recent studies have demonstrated an unexpected complexity of transcription in eukaryotes. The majority of the genome is transcribed and only a small fraction of these transcripts is annotated as protein coding genes and their splice variants. Indeed, most transcripts are the result of antisense, overlapping and non-coding RNA expression. In this frame, one of the key aims of high throughput transcriptome sequencing is the detection of all RNA species present in the cell, and the first crucial step for RNA-seq users is the choice of the strategy for cDNA library construction. The protocols developed so far require the entire library to be used for a single sequencing run on a specific platform.
Results: We set up a unique protocol to generate and amplify a strand-specific cDNA library representative of all RNA species that may be implemented with all major platforms currently available on the market (Roche 454, Illumina, ABI/SOLiD). Our method is reproducible, fast, easy to perform and can even start from low input total RNA. Furthermore, we provide a suitable bioinformatics tool for the analysis of the sequences produced following this protocol.
Conclusion: We tested the efficiency of our strategy, showing that our method is platform-independent, thus allowing the simultaneous analysis of the same sample with different NGS technologies, and providing an accurate quantitative and qualitative portrait of complex whole transcriptomes.

BEAT: Bioinformatics Exon Array Tool to store, analyze and visualize Affymetrix GeneChip Human Exon Array data from disease experiments

It is known from recent studies that more than 90% of human multi-exon genes are subject to Alternative Splicing (AS), a key molecular mechanism in which multiple transcripts may be generated from a single gene. It is widely recognized that a breakdown in AS mechanisms plays an important role in cellular differentiation and pathologies. Polymerase Chain Reactions, microarrays and sequencing technologies have been applied to the study of transcript diversity arising from alternative expression. Last generation Affymetrix GeneChip Human Exon 1.0 ST Arrays offer a more detailed view of the gene expression profile providing information on the AS patterns. The exon array technology, with more than five million data points, can detect approximately one million exons, and it allows performing analyses at both gene and exon level. In this paper we describe BEAT, an integrated user-friendly bioinformatics framework to store, analyze and visualize exon arrays datasets. It combines a data warehouse approach with some rigorous statistical methods for assessing the AS of genes involved in diseases. Meta statistics are proposed as a novel approach to explore the analysis results. BEAT is available at http://beat.ba.itb.cnr.it.
Results: BEAT is a web tool which allows uploading and analyzing exon array datasets using standard statistical methods and an easy-to-use graphical web front-end. BEAT has been tested on a dataset with 173 samples and tuned using new datasets of exon array experiments from 28 colorectal cancer and 26 renal cell cancer samples produced at the Medical Genetics Unit of IRCCS Casa Sollievo della Sofferenza. To highlight all possible AS events, alternative names, accession Ids, Gene Ontology terms and biochemical pathways annotations are integrated with exon and gene level expression plots.
The user can customize the results choosing custom thresholds for the statistical parameters and exploiting the available clinical data of the samples for a multivariate AS analysis.
Conclusions: Despite exon array chips being widely used for transcriptomics studies, there is a lack of analysis tools offering advanced statistical features and requiring no programming knowledge. BEAT provides a user-friendly platform for a comprehensive study of AS events in human diseases, displaying the analysis results with easily interpretable and interactive tables and graphics.

Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing

Technical advances such as the development of molecular cloning, Sanger sequencing, PCR and oligonucleotide microarrays are key to our current capacity to sequence, annotate and study complete organismal genomes. Recent years have seen the development of a variety of so-called 'next-generation' sequencing platforms, with several others anticipated to become available shortly. The previously unimaginable scale and economy of these methods, coupled with their enthusiastic uptake by the scientific community and the potential for further improvements in accuracy and read length, suggest that these technologies are destined to make a huge and ongoing impact upon genomic and post-genomic biology. However, like the analysis of microarray data and the assembly and annotation of complete genome sequences from conventional sequencing data, the management and analysis of next-generation sequencing data requires (and indeed has already driven) the development of informatics tools able to assemble, map, and interpret huge quantities of relatively or extremely short nucleotide sequence data. Here we provide a broad overview of bioinformatics approaches that have been introduced for several genomics and functional genomics applications of next-generation sequencing.

BiP-Day 2013: "Prima Giornata della Bioinformatica Pugliese" - Workshop report

On 5 December 2013, a regional workshop on Bioinformatics in Apulia (BiP-Day 2013) was held in Bari (IT) under the patronage of the Italian Bioinformatics Society (BITS) and EMBnet. The aim of the workshop was to stimulate tighter collaboration between life science researchers and private biotech companies in the Apulia Region around cutting-edge topics in biological and clinical research, for which bioinformatics R&D is key. The programme was structured into three main sessions: 1) Regional development programmes and major infrastructures for Bioinformatics in the Apulia Region; 2) Bioinformatics projects in bio-medicine, biodiversity, agri-food and bioinformatics training programmes; 3) Research & Business: the importance of communication. Presentations are available from the programme page of the workshop website (http://www.ba.itb.cnr.it/bip-day/programma/), and from the Presentations category of the News section (http://www.ba.itb.cnr.it/bip-day/category/presentazioni/page/3/).

Effects of edible plant microRNAs on cancer cell proliferation: a beneficial cross-kingdom interaction

Diet in human health is no longer simple nutrition but, in the light of recent findings, it might play a pivotal role in cell health status by modulating apoptosis, detoxification, and appropriate gene response to environmental stresses. Epidemiological studies suggest a role of fruits and vegetables in protection against several diseases, and nutrients have been demonstrated to alter gene expression by DNA methylation and histone modifications [1-2]. Diet has also been found to modulate microRNA (miRNA) expression, leading to a subsequent regulation of the effector genes [3]. Furthermore, recent studies demonstrate that some plant/food-derived miRNAs regulate gene expression in a sequence-specific manner [4]. On the basis of all these findings, we have carried out a pilot study, using a combined "in-silico and wet" approach, to investigate the potential effects, and elucidate the molecular mechanisms, of edible plant miRNAs on the expression of human genes involved in cancer onset and progression. In the present paper we report the results obtained by transfecting 2 colon cancer cell lines, p53 wild type and p53 knock-out, with selected miRNAs of G. max, Z. mays and M. truncatula, which we found, by in silico analysis, to have a putative targeting activity on human oncogenes and tumor suppressor genes.

Identification of new p53 regulatory networks through NGS data analysis

Motivation: Around 50% of all human tumours carry point mutations in the p53 tumour suppressor gene, which alter p53 DNA binding specificity. In tumours with p53 wild type, p53 is often rendered functionally inert by the inactivation of its positive modulators or by the activation of negative factors, which block p53 transcriptional activities [1]. We identified a new p53 direct target gene, TRIM8, belonging to the Tripartite Motif (TRIM) protein family, defined by the presence of a RING domain, one or two B-boxes and a Coiled-Coil region. We found that TRIM8 overexpression leads, through a positive feedback loop, to p53 stabilization and p53-mediated suppression of cell proliferation. In order to identify the pathways activated by TRIM8 leading to p53 stabilization, we transiently transfected the HCT116-p53 (wt) cell line with TRIM8 and sequenced the total transcriptome by performing an NGS run on a 454 GS FLX platform. Here we report some statistics and the preliminary results of: i) reads mapping on the human genome and analysis of differentially expressed genes; ii) functional analysis of differentially expressed genes.
Method: Total RNA was extracted from the HCT116-p53 (wt) cell line 48h after transfection, depleted of rRNA, retro-transcribed, amplified and sequenced by using the pyrosequencer Roche GS FLX Titanium Series. Genome mapping, statistics and differential expression analyses were performed by using the "NGS-Trex" system (NGS Transcriptome profile Explorer) (Mignone F. et al., submitted), an automatic system designed for analyzing Next Generation Sequencing data generated from large-scale transcriptome studies. The overall procedure involves four steps: 1) creation of a project and upload of reads in a multi-fasta format; 2) reads mapping onto the reference genome after setup of appropriate parameters; 3) annotation of mapped reads; 4) data mining by using simple query forms.
TRIM8 and FLAG data were submitted to NGS-Trex using default parameters that can be briefly summarized as follows: reads were mapped onto the human genome (min similarity 90% and min overlap 50 nt), discarding reads mapping onto more than 10 genomic regions. Mapped reads were compared to the annotation to assign reads to genes and to identify new splice variants. Differentially expressed genes and splicing events were identified by computing a P-value associated with a hypergeometric distribution. Housekeeping genes were used to normalise read counts before identification of differentially expressed genes. The lists of genes showing a differential expression in the two samples were then analysed by using DAVID (v6.7), an integrated biological knowledgebase with analytic tools (text and pathway-mining tools) for large gene list functional annotation [2,3]. An additional analysis on TRIM8 and FLAG sequence samples was made for the detection and annotation of the ncRNA genome fraction. We used a bioinformatic analysis pipeline, developed by us, which is able to: 1) select ncRNA fro

Integrating bioinformatics resources for modelling Human non-coding RNA networks

Introduction: Non-coding RNAs (ncRNAs) serve as regulatory molecules for a variety of biological processes. They are roughly classified into two major categories according to their size: small non-coding RNAs (sncRNAs), such as microRNAs (miRNAs), and long non-coding RNAs (lncRNAs). The lncRNAs have a broader spectrum of functions and are, therefore, a potential new class of cancer therapeutic target [1,2]. In addition there are other types of ncRNAs whose role is not yet clear: circular-RNA, lincRNA, scRNA, sense-intronic and vault-RNA. New advances in translational research will require an accurate understanding of the functional relationships between protein-coding and ncRNA categories, as well as sponge regulatory networks [3,4]. To achieve this goal, we have built an integrated bioinformatics knowledge base, collecting non-redundant annotations of human ncRNAs, sequences and interactors, which provides comprehensive access to all the knowledge available concerning ncRNAs, their interactions with other molecules and associated diseases. As key characteristics, the database overcomes the problem of different nomenclatures used by different sources and provides new clues about ncRNA functions through interactions inferred by network reconstruction [5].
Methods: ncRNA interactions include physical (i.e. molecular bindings between ncRNAs and DNA, RNAs or proteins) and functional relationships (i.e. co-expression, regulation, associated diseases, statistical and functional associations). Interactions stored in the database are in the form 'ncRNAs-mate', where the mate entity belongs to one of the following types: ncRNA, protein coding RNA (pcRNA), gene, protein, pseudogene and phenotype.
In order to ensure the data quality of our interaction database we have developed a series of Extraction Transformation and Loading (ETL) modules able to extract, collect and integrate primary annotations, sequences and interactions from different public biological resources. The extracted biological entities and their relations are modelled as a network, a mathematical object composed of nodes (entities) and edges (relations) [5]. Entity redundancy has been identified by cross-link references and sequence similarity using the Cleanup software [6]. Non-coding RNAs are classified in biotypes, associated with Sequence Ontology terms [7] and integrated with data on protein coding RNAs (pcRNAs), genes, proteins, pseudogenes and phenotypes. Furthermore, we extended the cross-reference network with data provided by Ensembl [8], using the biomaRt library of BioConductor [9].
Results: The total numbers of different entities collected in our interaction database are: 168,058 ncRNAs, 5,009 pcRNAs, 52,811 genes, 1,999 proteins, 15,940 pseudogenes and 849 phenotypes. Moreover, the total numbers of interactions, based on mate type cardinalities, include: 130,383 ncRNA-ncRNA, 55,048 ncRNA-pcRNA, 1,458,925 ncRNA-gene, 99,653 ncRNA-protein, 70,482 ncRNA-phenotype, 17,217 ncR

ITSoneDB: a specialized ITS1 database for amplicon-based metagenomic characterization of environmental fungal communities

Motivations. Metagenomics is advancing rapidly thanks to the advent of high-throughput next-generation sequencing (NGS) technologies, which allow an unprecedented large-scale identification of microorganisms living in almost every environment. In particular, the amplicon-based metagenomic approach is increasingly used to explore the diversity of fungal environmental communities. At the species level, a number of studies have used the non-conserved internal transcribed spacers (ITS) 1 and 2 of the ribosomal RNA gene cluster as genetic markers to explore the fungal taxonomic diversity. In particular, ITS1 is gaining popularity as a better species-discriminating marker in Fungi because of its higher variability compared to ITS2. Starting from the total DNA extracted from any environmental sample, this locus can be easily amplified with taxonomically universal primers and sequenced by means of high-throughput next generation platforms. Reference databases and robust supporting taxonomies are crucial in assigning phylogenetic affiliation to the huge amount of produced sequences. Even if a large number of ITS1 sequences are collected in public databases, a specialized resource focused particularly on this region, where sequence identity, boundaries and taxonomic assignment are validated, is still needed at present. In this work we present ITSoneDB, a new comprehensive collection of ITS1 sequences belonging to the Fungi Kingdom.
Methods. ITSoneDB has been generated and populated using a multi-step Python workflow. In the first step the ribosomal RNA gene cluster sequences of Fungi including the target ITS1 region were retrieved from GenBank. Then, ITS1 start and end boundaries were extracted from the Feature Table annotations, if available.
In order to infer, validate and, where necessary, redesign the ITS1 location, Hidden Markov Model (HMM) profiles of flanking genes for 18S and 5.8S ribosomal RNA, generated from their reference alignments stored in the RFAM database, were mapped on the entire collection of retrieved nucleotide sequences, by means of the hmmsearch tool from the HMMER 3.0 package.
Results. At present, ITSoneDB includes 405,433 taxonomically arranged sequence entries, each provided with ITS1 start and end positions defined by GenBank annotations and/or the HMM-based method. The ITSoneDB front-end is a JAVA platform-based website for data browsing and downloading. The database can be queried by species or taxon name, GenBank accession ID or by "expanding" the target rank on a detailed fungal taxonomical tree. The complete ITS1 sequence dataset collected in ITSoneDB is available in Fasta format and users can extract and locally save all or selected queried ITS1 sequences for further analysis.
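
The boundary inference described in the Methods can be sketched in a few lines of Python. This is a hedged illustration, not the ITSoneDB workflow itself: it assumes the end of the 18S match and the start of the 5.8S match have already been parsed from the HMM search output as 1-based inclusive coordinates on the same strand, and the function name `infer_its1` is hypothetical.

```python
def infer_its1(sequence, ssu_18s_end, rrna_58s_start):
    """Infer ITS1 as the region between the 18S and 5.8S rRNA genes.

    Coordinates are 1-based and inclusive. Returns (start, end, subsequence),
    or None when the two flanking matches leave no room for a spacer.
    """
    start = ssu_18s_end + 1
    end = rrna_58s_start - 1
    if start > end:
        return None
    return start, end, sequence[start - 1:end]


if __name__ == "__main__":
    # Toy sequence: 4 nt of "18S", 4 nt of spacer, 4 nt of "5.8S".
    toy = "AAAA" + "CCGG" + "TTTT"
    print(infer_its1(toy, ssu_18s_end=4, rrna_58s_start=9))
```

Coordinates inferred this way can then be compared with the GenBank annotation, when one exists, to validate or revise the annotated boundaries.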

Managing NGS differential expression uncertainty with fuzzy sets

When the reads obtained from high-throughput sequencing are mapped against a reference database, some of them - known as multireads - can map to more than one reference sequence. This occurs because genomes contain many repeated portions and reads are generally shorter than reference sequences. Removing the multireads from the mapping results causes an underestimation of the read counts, while estimating the real read count can lead to false positives during the detection of differentially expressed sequences.

Meta-Analysis of Differential Connectivity in Gene Co-Expression Networks in Multiple Sclerosis

Differential gene expression analyses to investigate multiple sclerosis (MS) molecular pathogenesis cannot detect genes harboring genetic and/or epigenetic modifications that change the gene functions without affecting their expression. Differential co-expression network approaches may capture changes in functional interactions resulting from these alterations. We re-analyzed 595 mRNA arrays from publicly available datasets by studying changes in gene co-expression networks in MS and in response to interferon (IFN)-beta treatment. Interestingly, MS networks show a reduced connectivity relative to the healthy condition, and the treatment activates the transcription of genes and increases their connectivity in MS patients. Importantly, the analysis of changes in gene connectivity in MS patients provides new evidence of association for genes already implicated in MS by single-nucleotide polymorphism studies and that do not show differential expression. This is the case of amiloride-sensitive cation channel 1 neuronal (ACCN1) that shows a reduced number of interacting partners in MS networks, and it is known for its role in synaptic transmission and central nervous system (CNS) development. Furthermore, our study confirms a deregulation of the vitamin D system: among the transcription factors that potentially regulate the deregulated genes, we find TCF3 and SP1 that are both involved in vitamin D3-induced p27Kip1 expression. Unveiling differential network properties allows us to gain systems-level insights into disease mechanisms and may suggest putative targets for the treatment.
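
As a rough illustration of what a change in connectivity means, the Python sketch below builds a co-expression network for each condition by thresholding pairwise Pearson correlations and compares the number of partners of each gene. The abstract does not detail the procedure used in the study, so the correlation threshold, the data layout and the function names are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def connectivity(expression, min_abs_r=0.8):
    """Number of co-expressed partners per gene.

    expression: dict mapping gene name -> list of expression values, one per
    sample. Two genes are linked when |r| >= min_abs_r (illustrative cut-off).
    """
    genes = sorted(expression)
    degree = {gene: 0 for gene in genes}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if abs(pearson(expression[g1], expression[g2])) >= min_abs_r:
                degree[g1] += 1
                degree[g2] += 1
    return degree


def differential_connectivity(healthy, disease, min_abs_r=0.8):
    """Per-gene change in connectivity (disease minus healthy)."""
    h = connectivity(healthy, min_abs_r)
    d = connectivity(disease, min_abs_r)
    return {gene: d[gene] - h[gene] for gene in h if gene in d}


if __name__ == "__main__":
    healthy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
    disease = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, -1, -1, 1]}
    print(differential_connectivity(healthy, disease))
```

In the toy data, gene C keeps a similar range of expression values in both conditions but loses all of its partners in the disease network, which is the kind of change a differential expression test alone would not flag.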

nc-aReNA: an integrated bioinformatics platform for non-coding RNA-seq data classification and annotation

High-throughput technologies (HT), such as microarray and especially Next-Generation Sequencing (NGS) technologies, have provided tremendous potential for profiling protein-coding and non-protein-coding RNAs (ncRNAs). Recent reports of the ENCODE project underline that while 80% of the human genome is transcribed, only 2% is protein coding, suggesting that the vast majority of the genome is transcribed as non-protein-coding RNA. We present the development of a web-based bioinformatics platform, nc-aReNA, for the mapping, classification and annotation of human and mouse ncRNAs from HT-NGS data. The platform is based on a data-warehouse approach and workflow environment that includes data quality control, genome and nc-RNAome sequence alignment, differential expression profiling analysis and statistics of classified data.
Methods: The nc-aReNA architecture is based on a modular analysis pipeline, flanked by a data-warehouse, for the classification and annotation of small RNA-seq data. The pipeline takes as input the sequenced reads in FASTQ format. After the initial steps of adaptor removal and quality check, the input reads are mapped to an in-house non-redundant ncRNA reference database (http://ncRNAdb.ba.itb.cnr.it) which collects and integrates ncRNA gene lists, from MGI (Mouse Genome Informatics) and HGNC (Human Genome Nomenclature Committee), with sequences and biotype annotations from VEGA (Vertebrate Genome Annotation), ENSEMBL, RefSeq, RFam (for tRNA sequence) and miRBase (for miRNA). NGS reads mapped in this step are classified by using Sequence Ontology (SO) (Eilbeck K. et al., 2005).
Unmapped reads are aligned to the reference genome and tagged to the corresponding genomic locus. Integrated statistics are used for RPM (Reads Per Million), fold changes and False Discovery Rate (FDR) corrected p-values calculation and differential expression analysis of all (or user-chosen) ncRNA classes, by comparing two or more experimental conditions or time-course data. An additional module, called "miRNA identification", provides the analysis of all unmapped miRNA-like reads by means of the miRDeep2 software. All the analysis results and annotations are stored in a data-warehouse implemented with Infobright (http://www.infobright.org). A user-friendly web-based Graphical User Interface (GUI), developed by using the JAVA platform, guides the user in the submission process and displays results in tables and graphs.
Results: The main features of nc-aReNA are:
- identification and classification of reads in known functional ncRNA categories in SO;
- identification and filtering of reads mapping to ribosomal RNAs and mtDNA transcripts;
- RPMs calculation for each known ncRNA;
- the export of user-selected classes of ncRNA for further specific investigation;
- quantification of ncRNAs expression and differential expression analysis for all identified ncRNA classes;
- graphical visualization of sample expression profiles;
- additional annot

NonCode aReNA DB: a non-redundant and integrated collection of non-coding RNAs

MOTIVATION: The recent availability of next generation sequencing (NGS) technologies has provided the scientific community with an unprecedented opportunity for large-scale analysis of genomes in a large number of organisms. One of the most challenging tasks for bioinformaticians is to develop tools that provide biologists with easy access to curated and non-redundant collections of sequence data. Non-coding RNAs, for a long time believed to be non-functional, are emerging as the largest and most important family of gene regulators.
METHODS: NonCode aReNA DataBase is a comprehensive and non-redundant source of manually curated and automatically annotated ncRNA transcripts collected from major public resources. The database is built through a set of automated ETL (Extraction, Transformation, Loading) processes which extract and collect data from VEGA, ENSEMBL, RefSeq, miRBase, GtRNAdb and piRNABank. The automatic process also guarantees recurring updates. Redundant sequences are identified by analyzing both cross-link references and sequence similarity. Furthermore, non-coding RNA sequences have been classified in diverse biotypes and associated with Sequence Ontology terms. NonCode aReNA DataBase was originally developed as a component of a bigger project, comprising a data warehouse and an analysis workflow, for the functional annotation of ncRNAs from NGS data.
RESULTS: NonCode aReNA Database is currently available as a web resource at http://ncrnadb.ba.itb.cnr.it/. The database can be queried by using multi-criteria and ontological search, through an easy-to-use web interface. Query results can be exported as non-redundant collections of ncRNA transcripts. Currently NonCode aReNA DataBase contains 134,908 human ncRNAs classified in 24 biotypes, and the next updates will include transcripts of Mus musculus and Arabidopsis thaliana.
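The redundancy step described in METHODS (cross-link references combined with sequence similarity) can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`id`, `sequence`, `xrefs`) is hypothetical, and exact sequence identity after normalizing U to T stands in for whatever similarity criterion the database actually applies, which the abstract does not specify.

```python
def group_redundant(records):
    """Group records that share a cross-reference or an identical sequence.

    records: list of dicts with the hypothetical keys
             'id' (str), 'sequence' (str) and 'xrefs' (list of str).
    Returns a list of groups, each a list of record ids.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as root so groups keep input order.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}
    for i, rec in enumerate(records):
        # Treat RNA and DNA spellings of the same sequence as identical.
        normalized = rec["sequence"].upper().replace("U", "T")
        keys = [("seq", normalized), ("xref", rec["id"])]
        keys += [("xref", x) for x in rec["xrefs"]]
        for key in keys:
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec["id"])
    return list(groups.values())
```

Each returned group would then be collapsed into one non-redundant entry that keeps the identifiers of all contributing sources.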

PlantPIs - An Interactive Web Resource on Plant Protease Inhibitors

PlantPIs is a web querying system for a database collection of plant protease inhibitor data. Protease inhibitors in plants are naturally occurring proteins that inhibit the function of endogenous and exogenous proteases. In this paper we report the design and development of a web framework providing a clear and very flexible way of querying plant protease inhibitor data. The web resource is based on a relational database, containing publicly accessible data on plant protease inhibitors, and a graphical user interface providing all the necessary browsing tools, including a data exporting function. PlantPIs contains information extracted principally from the MEROPS database, filtered, annotated and compared with data stored in other public protein and gene databases, using both automated techniques and domain expert evaluations. The data are organized to allow flexible and easy access to the stored information. The database is accessible at http://www.plantpis.ba.itb.cnr.it/.

Reference databases for taxonomic assignment in metagenomics

Metagenomics is providing unprecedented access to environmental microbial diversity. The amplicon-based metagenomics approach involves the PCR-targeted sequencing of a genetic locus that meets several requirements. Namely, it must be ubiquitous in the taxonomic range of interest, variable enough to discriminate between different species but flanked by highly conserved sequences, and of suitable size to be sequenced through next-generation platforms. The internal transcribed spacers 1 and 2 (ITS1 and ITS2) of the ribosomal DNA operon and one or more hyper-variable regions of the 16S ribosomal RNA gene are typically used to identify fungal and bacterial species, respectively. In this context, reliable reference databases and taxonomies are crucial to assign amplicon sequence reads to the correct phylogenetic ranks. Several resources provide consistent phylogenetic classification of publicly available 16S ribosomal DNA sequences, whereas the state of ribosomal internal transcribed spacer reference databases is notably less advanced. In this review, we aim to give an overview of existing reference resources for both types of markers, highlighting strengths and possible shortcomings of their use for metagenomics purposes. Moreover, we present a new database, ITSoneDB, of well annotated and phylogenetically classified ITS1 sequences to be used as a reference collection in metagenomic studies of environmental fungal communities. ITSoneDB is available for download and browsing at http://itsonedb.ba.itb.cnr.it/.

Regulation of the expression of CLU isoforms in endometrial proliferative diseases

Clusterin (CLU) is a nearly ubiquitous multifunctional protein synthesized in different functionally divergent isoforms, sCLU and nCLU, playing a crucial role by keeping a balance between cell proliferation and death. Studying CLU expression in vivo, we found a higher mRNA expression in both neoplastic and hyperplastic tissues in comparison to normal endometria; in particular, by RT-qPCR we demonstrated an increase of the specific sCLU isoform in the neoplastic and hyperplastic endometrial diseases. On the contrary, no CLU increase was detected at the protein level. The CLU gene transcriptional activity was upregulated in the hyperplastic and neoplastic tissues, indicating the existence of a fine post-transcriptional regulation of CLU expression possibly responsible for the protein decrease in the malignant disease. A specific CLU immunoreactivity was present in all the endometrial glandular cells in comparison to the other cellular compartments, where CLU immunoreactivity was lower or absent. In conclusion, our results suggest the existence of a complex regulatory mechanism of CLU gene expression during the progression from normal to malignant cells, possibly contributing to endometrial carcinogenesis. Moreover, the specific alteration of the sCLU:nCLU ratio associated with the pathological stage suggests a possible use of CLU as a molecular biomarker for the diagnosis/prognosis of endometrial proliferative diseases.

The NonCode aReNA DB: a non-redundant and integrated collection of non-coding RNAs

The recent availability of high-throughput technologies, like next generation sequencing (NGS) platforms, has provided the scientific community with an unprecedented opportunity for large-scale analysis of genomes in a large number of organisms. However, one of the most challenging tasks for bioinformaticians is to develop tools that provide biologists with easy access to curated and non-redundant collections of sequence data. Non-coding RNAs, for a long time believed to be non-functional, are emerging as the largest and most important family of gene regulators. NonCode aReNA Database is a comprehensive and non-redundant source of manually curated and automatically annotated ncRNA transcripts. Originally developed as a component of a bigger project, comprising a data warehouse for the functional annotation of ncRNAs from NGS data, NonCode aReNA DB is currently available as a web resource at http://ncrnadb.ba.itb.cnr.it/. Sequences have been classified in diverse biotypes and associated with Sequence Ontology terms. The database can be queried by using multi-criteria and ontological search, through an easy-to-use web interface, and data can be exported as non-redundant collections of transcripts annotated in VEGA, ENSEMBL, RefSeq, miRBase, GtRNAdb and piRNABank. The database is updated through an automatic pipeline and the last update was in January 2015. Presently NonCode aReNA DB contains 134,908 human ncRNAs classified in 24 biotypes, and the next update will include transcripts of Mus musculus and Arabidopsis thaliana.
Acknowledgements: This work was supported by the Italian MIUR Flagship Project "Epigen".

TRIM8 restores p53 tumour suppressor function by blunting N-MYC activity in chemo-resistant tumours

TRIM8 plays a key role in controlling the p53 molecular switch that sustains the transcriptional activation of cell cycle arrest genes and the response to chemotherapeutic drugs. The mechanisms that regulate TRIM8, especially in cancers like clear cell Renal Cell Carcinoma (ccRCC) and colorectal cancer (CRC) where it is expressed at low levels, are still unknown. However, recent studies suggest the potential involvement of some microRNAs belonging to miR-17-92 and its paralogous clusters, which could include TRIM8 in a more complex pathway.
Methods: We used RCC and CRC cell models for in vitro experiments, and ccRCC patients and xenograft transplanted mice for in vivo assessments. To measure microRNA levels we performed RT-qPCR, while steady-state levels of TRIM8, p53, p21 and N-MYC were quantified at the protein level by Western blotting as well as at the transcript level by RT-qPCR. Luciferase reporter assays were performed to assess the interaction between TRIM8 and specific miRNAs, and the potential effects of this interaction on TRIM8 expression. Moreover, we treated our cell models with conventional chemotherapeutic drugs or tyrosine kinase inhibitors, and measured their response in terms of cell proliferation by MTT and colony suppression assays.
Results: We showed that TRIM8 is a target of miR-17-5p and miR-106b-5p, whose expression is promoted by N-MYC, and that alterations of their levels affect cell proliferation by acting on the stability of TRIM8 transcripts, as confirmed in ccRCC patients and cell lines. In addition, by reducing the levels of miR-17-5p/miR-106b-5p, we increased the chemo-sensitivity of RCC/CRC-derived cells to anti-tumour drugs used in the clinic. Intriguingly, this occurs, on one hand, by recovering the p53 tumour suppressor activity in a TRIM8-dependent fashion and, on the other hand, by promoting the transcription of miR-34a, which turns off the oncogenic action of N-MYC. This ultimately leads to a reduction or block of cell proliferation, observed also in colon cancer xenografts overexpressing TRIM8.
Conclusions: In this paper we provided evidence that TRIM8 and its regulators miR-17-5p and miR-106b-5p participate in a feedback loop controlling cell proliferation through the reciprocal modulation of p53, miR-34a and N-MYC. Our experiments pointed out that this axis is pivotal in defining the drug responsiveness of cancers such as ccRCC and CRC.

UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs

The 5' and 3' untranslated regions of eukaryotic mRNAs (UTRs) play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5' and 3' untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated and also collated as the UTRsite database, where more specific information on the functional motifs and cross-links to interacting regulatory proteins are provided. In the current update, the UTR entries have been organized in a gene-centric structure to better visualize and retrieve 5' and 3' UTR variants generated by alternative initiation and termination of transcription and alternative splicing. Experimentally validated miRNA targets and conserved sequence elements are also annotated. The integration of UTRdb with genomic data has allowed the implementation of an efficient annotation system and a powerful retrieval resource for the selection and extraction of specific UTR subsets. All internet resources implemented for retrieval and functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs are accessible at http://utrdb.ba.itb.cnr.it/.

WoPPER: Web server for Position Related data analysis of gene Expression in Prokaryotes.

The structural and conformational organization of chromosomes is crucial for gene expression regulation in both eukaryotes and prokaryotes. To date, gene expression data generated using either microarrays or RNA sequencing are available for many bacterial genomes. However, differential gene expression is usually investigated with methods that consider each gene independently, thus not taking into account the physical localization of genes along a bacterial chromosome. Here, we present WoPPER, a web tool integrating gene expression and genomic annotations to identify differentially expressed chromosomal regions in bacteria. RNA-sequencing or microarray-based gene expression data are provided as input, along with gene annotations. The user can select genomic annotations from an internal database including 2780 bacterial strains, or provide custom genomic annotations. The analysis produces as output the lists of positionally related genes showing a coordinated trend of differential expression. Graphical representations, including a circular plot of the analyzed chromosome, allow intuitive browsing of the results. The analysis procedure is based on our previously published R package PREDA. The release of this tool is timely and relevant for the scientific community, as WoPPER will fill an existing gap in prokaryotic gene expression data analysis and visualization tools. WoPPER is open to all users and can be reached at the following URL: https://WoPPER.ba.itb.cnr.it.
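The idea of positionally related genes with a coordinated trend of differential expression can be illustrated with a deliberately simplified Python sketch. It is not the PREDA procedure, whose details the abstract does not give: the moving-average smoothing, the fixed threshold, the restriction to up-regulated regions, the linear (non-circular) treatment of the chromosome and all names below are assumptions of this example.

```python
def smooth(values, window=3):
    """Centered moving average; the window shrinks at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def coordinated_regions(genes, threshold=1.0, window=3, min_genes=3):
    """Find runs of neighbouring genes with a smoothed up-regulation signal.

    genes: list of (name, start_position, log2_fold_change) tuples.
    Returns a list of regions, each a list of gene names in chromosomal order.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    smoothed = smooth([g[2] for g in ordered], window)

    regions, current = [], []
    for gene, value in zip(ordered, smoothed):
        if value >= threshold:
            current.append(gene[0])
        else:
            # The run ended: keep it only if it spans enough genes.
            if len(current) >= min_genes:
                regions.append(current)
            current = []
    if len(current) >= min_genes:
        regions.append(current)
    return regions
```

Smoothing along the chromosome lets a moderately changed gene be retained when its neighbours change in the same direction, which per-gene testing would miss. A real analysis would also need a statistical assessment of each region rather than a fixed cutoff.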

Development of a multiplex technology platform for molecular diagnostics, portable and automated, based on Lab-on-chip instrument logic, enabling multiparametric applications in the field of infectious diseases

In clinical diagnostic practice, there is a growing need for multiparametric molecular diagnostic tests that, starting from a single biological sample, allow a given pathogenic or opportunistic agent to be identified or excluded among a set of possible candidates of different nature and, potentially, with different chemical targets. Likewise, the growing body of genetic and genomic data on the variability within a single systematic group enables, and increasingly points toward, multiparametric tests also for the diagnosis of individual pathogens and for the genetic-level identification of virulence factors and drug or antibiotic resistance. The maturing of standardizable results in this new field is also accompanied by the need for new diagnostic tools, primarily molecular, to trace these complex algorithms and define their main markers/indicators, which are typically multiparametric. In light of this, the project envisages the development of a series of tests, panels and diagnostic profiles for infectious diseases and for physiological conditions influenced by populations of bacteria or other microorganisms, based on the identification of nucleic acids and/or circulating or tissue markers of another nature (proteins or other macromolecules) specific to the selected agents. The diagnostic profiles will be aimed at the rapid and specific identification, in defined clinical settings (e.g. respiratory diseases, sepsis, colon tumours), of one or more candidate agents among a panel of possible aetiological agents potentially responsible for a given clinical picture, with a view to screening for, confirming or excluding pathogens. The tests will therefore have a multiplex configuration and will adopt a technical solution based on multiparametric devices.
The analytical contents of the multiparametric panels will be derived from original sources after a careful survey of the published biological information, in order to have precise and potentially original data for obtaining diagnostic results that are accurate, informative and representative of natural populations and, therefore, to have competitive methods with different degrees of analytical resolution for different application areas (screening, characterization, targeted therapy, prognosis). Research on the genomic characterization of the targets of interest included in the menus of the individual diagnostic panels will therefore be accompanied by bioinformatic processing of the information obtained, in order to derive diagnostic tools. The diagnostic panels will first be validated on existing, well-established platforms (panels of methods based on multiplex real-time PCR with homogeneous thermal profiles, and the Luminex bead array platform), for an initial validation of the specificity and accuracy of the reagents developed and in order to have intermediate results (diagnostic methods on established platforms) that already have application potential. In parallel with the research modules on diagnostic content, a device or series of devices based on the lab-on-a-chip (LOC) concept will be developed, able to host multiparametric tests with a degree of complexity scalable from 10 to 100 distinct determinations on the same biological sample.
The device will be designed as a lab-on-a-chip based on the following technical and design principles:
- microfluidic solutions for the mobilization and handling of reagents
- handling mechanisms, mechanics, electronics and thermal dynamics, to be selected among different options
- integration of biological sample processing/extraction, reverse transcription and amplification (isothermal PCR) of nucleic acids
- detection by means of innovative systems
- a microprocessor for data processing and formulation of results

Advanced Biomedical Mechatronics Systems for Diagnosis and Medical Therapy Based on Virtual and Augmented Reality, Microelectronics, and High-Throughput Robotic Laboratories.

The VIRTUALAB research project aims to study, develop and validate advanced diagnostic and therapeutic techniques in medicine by drawing on a knowledge base, or set of enabling technologies, such as mechatronics, the processing of images and of biomedical and pathophysiological signals, virtual reality, robotics, microelectronics, sensor technology and software. It also pursues the integration of mechatronic technologies with the know-how of the mechatronics industry, directed toward new applications.

24 PUBLICATIONS

A BIOINFORMATICS WORKFLOW FOR THE ANALYSIS OF NONCODING...

A BIOINFORMATIC APPROACH FOR NGS DATA TO ANALIZE...

A bioinformatics workflow for the analysis of transcriptome...

A fuzzy method for RNA-Seq differential expression analysis...

A NOVEL GENERAL-PURPOSE RNA-SEQ PROTOCOL OPTIMIZING THE DETECTION...

A platform independent RNA-Seq protocol for the detection...

BEAT: Bioinformatics Exon Array Tool to store, analyze...

Bioinformatics approaches for genomics and post genomics applications...

BiP-Day 2013: "Prima Giornata della Bioinformatica Pugliese" -...

Effects of edible plant microRNAs on cancer cell...

Identification of new p53 regulatory networks through NGS...

Integrating bioinformatics resources for modelling Human non-coding RNA...

ITSoneDB: a specialized ITS1 database for amplicon-based metagenomic...

Managing NGS differential expression uncertainty with fuzzy sets

Meta-Analysis of Differential Connectivity in Gene Co-Expression Networks...

nc-aReNA: an integrated bioinformatics platform for non-coding RNA-seq...

NonCode aReNA DB: a non-redundant and integrated collection...

PlantPIs - An Interactive Web Resource on Plant...

Reference databases for taxonomic assignment in metagenomics

Regulation of the expression of CLU isoforms in...

The NonCode aReNA DB: a non-redundant and integrated...

TRIM8 restores p53 tumour suppressor function by blunting...

UTRdb and UTRsite (RELEASE 2010): a collection of...

WoPPER: Web server for Position Related data analysis...

2 PROJECTS

Development of a multiplex technology platform for diagnostics...

Advanced Biomedical Mechatronics Systems for Diagnosis and...

0 PATENTS

0 SPIN-OFFS