Ernesto Picardi
Role
Researcher
Organization
Università degli Studi di Bari Aldo Moro
Department
DIPARTIMENTO DI BIOSCIENZE, BIOTECNOLOGIE E BIOFARMACEUTICA
Scientific Area
AREA 05 - Biological Sciences
Scientific Disciplinary Sector
BIO/11 - Molecular Biology
ERC Sector, Level 1
Not Available
ERC Sector, Level 2
Not Available
ERC Sector, Level 3
Not Available
RNA editing is a post-transcriptional process occurring in a wide range of organisms. In the human brain, A-to-I RNA editing, in which individual adenosine (A) bases in pre-mRNA are modified to yield inosine (I), is the most frequent event. By modulating gene expression, RNA editing is essential for cellular homeostasis. Indeed, its deregulation has been linked to several neurological and neurodegenerative diseases. To date, many RNA editing sites have been identified by next generation sequencing technologies employing massive transcriptome sequencing together with whole genome or exome sequencing. While genome and transcriptome reads are not always available for single individuals, RNA-Seq data are widespread through public databases and represent a relevant source of yet unexplored RNA editing sites. In this context, we propose a simple computational strategy to identify genomic positions enriched in novel hypothetical RNA editing events by means of a new two-step mapping procedure requiring only RNA-Seq data, with no genomic reads and no a priori knowledge of RNA editing characteristics. We assessed the suitability of our procedure by confirming A-to-I candidates using conventional Sanger sequencing and performing RNA-Seq as well as whole exome sequencing of human spinal cord tissue from a single individual.
A cDNA of 312 bp, similar to polygalacturonase-inhibiting proteins (PGIPs), was isolated by cDNA-amplified fragment length polymorphism (cDNA-AFLP) from pea roots infected with the cyst nematode Heterodera goettingiana. The deduced amino acid sequence obtained from the complete Pspgip1 coding sequence was very similar to PGIPs described from several other plant species, and was identical in both MG103738 and Progress 9 genotypes, resistant and susceptible to H. goettingiana, respectively. Reverse transcription-polymerase chain reaction (RT-PCR) expression analysis revealed the differential regulation of the Pspgip1 gene in the two genotypes in response to wounding and nematode challenge. Mechanical wounding induced Pspgip1 expression in MG103738 within 8 h, but this response was delayed in Progress 9. In contrast, the response to nematode infection was more complex. The transcription of Pspgip1 was triggered rapidly in both genotypes, but the expression level returned to levels observed in uninfected plants more quickly in susceptible than in resistant roots. In addition, in situ hybridization showed that Pspgip1 was expressed in the cortical cells damaged as a result of nematode invasion in both genotypes. However, it was specifically localized in the cells bordering the nematode-induced syncytia in resistant roots. This suggests a role for this gene in counteracting nematode establishment inside the root.
BACKGROUND: Recent studies have demonstrated an unexpected complexity of transcription in eukaryotes. The majority of the genome is transcribed and only a small fraction of these transcripts is annotated as protein coding genes and their splice variants. Indeed, most transcripts are the result of antisense, overlapping and non-coding RNA expression. In this frame, one of the key aims of high throughput transcriptome sequencing is the detection of all RNA species present in the cell, and the first crucial step for RNA-seq users is the choice of the strategy for cDNA library construction. The protocols developed so far use the entire library for a single sequencing run on a specific platform. RESULTS: We set up a unique protocol to generate and amplify a strand-specific cDNA library representative of all RNA species that may be implemented with all major platforms currently available on the market (Roche 454, Illumina, ABI/SOLiD). Our method is reproducible, fast, easy to perform and even allows starting from low-input total RNA. Furthermore, we provide a suitable bioinformatics tool for the analysis of the sequences produced following this protocol. CONCLUSION: We tested the efficiency of our strategy, showing that our method is platform-independent, thus allowing the simultaneous analysis of the same sample with different NGS technologies, and providing an accurate quantitative and qualitative portrait of complex whole transcriptomes.
Alternative splicing is emerging as a major mechanism for the expansion of the transcriptome and proteome diversity, particularly in human and other vertebrates. However, the proportion of alternative transcripts and proteins actually endowed with functional activity is currently highly debated. We present here a new release of ASPicDB which now provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256 939 protein variants from 17 191 multi-exon genes have been extensively annotated through state of the art machine learning tools providing information of the protein type (globular and transmembrane), localization, presence of PFAM domains, signal peptides, GPI-anchor propeptides, transmembrane and coiled-coil segments. Furthermore, full-length variants can be now specifically selected based on the annotation of CAGE-tags and polyA signal and/or polyA sites, marking transcription initiation and termination sites, respectively. The retrieval can be carried out at gene, transcript, exon, protein or splice site level allowing the selection of data sets fulfilling one or more features settled by the user. The retrieval interface also enables the selection of protein variants showing specific differences in the annotated features. ASPicDB is available at http://www.caspur.it/ASPicDB/.
New genomes are being sequenced at an increasingly rapid rate, far outpacing the rate at which manual gene annotation can be performed. Automated genome annotation is thus necessitated by this growth in genome projects; however, full-fledged annotation systems are usually home-grown and customized to a particular genome. There is thus a renewed need for accurate ab initio gene prediction methods. However, it is apparent that fully ab initio methods fall short of the required level of sensitivity and specificity for a quality annotation. Evidence in the form of expressed sequences gives the single biggest improvement in accuracy when used to inform gene predictions. Here, we present a lightweight pipeline for first-pass gene prediction on newly sequenced genomes. The two main components are ASPic, a program that derives highly accurate, albeit not necessarily complete, EST-based transcript annotations from EST alignments, and GeneID, a standard gene prediction program, which we have modified to take intron annotations as evidence. The introns output by ASPic CDS predictions are given to GeneID to constrain the exon-chaining process and produce predictions consistent with the underlying EST alignments. The pipeline was successfully tested on the entire C. elegans genome and the 44 ENCODE human pilot regions.
De novo high-throughput pyrosequencing was used to detect and characterize 2009 pandemic influenza A (H1N1) virus directly in nasopharyngeal swabs in the context of the microbial community. Data were generated with a prior sequence-independent amplification by 454 pyrosequencing on the GS-FLX platform (Roche). Influenza A assembled reads allowed near full-length genome reconstruction with the simultaneous analysis of site-specific heterogeneity. The molecular approach applied proved to be a powerful tool to characterize the new pandemic H1N1 influenza virus in clinical samples. This approach could be of great value in identifying possibly new reassortants that may occur in the near future.
Philadelphia (Ph+) positive leukaemias are an example of haematological malignant diseases where different chromosomal rearrangements involving both BCR and ABL1 genes generate a variety of chimeric proteins (BCR/ABL1 p210, p190 and p230) which are considered pathological "biomarkers". In addition to these three, there is a variety of fusion transcripts whose origin may depend either on diverse genetic rearrangement or on alternative/atypical splicing of the main mRNAs or on the occurrence of single-point mutations. Although the therapy of Philadelphia+ leukaemias based on Imatinib represents a triumph of medicine, not all patients benefit from the drug, and some show resistance or intolerance. Furthermore, interruption of Imatinib administration is often followed by clinical relapse, suggesting a failure in the eradication of residual leukaemic stem cells. Therefore, while targeted therapy is searching for new and improved pharmacological inhibitors covering all the possible mutations in the kinase domain, there is an urgent need to identify alternative molecular targets to develop other specific and effective therapeutic approaches. In this review we discuss the importance of recent advances based on the discovery of novel BCR/ABL1 variants and their potential role as new targets/biomarkers of Ph+ leukaemias in the light of the current therapeutic trends. The limits of the pharmacological inhibitors used for treating the disease can be overcome by considering targets other than the kinase enzyme. Our evaluations highlight the potential of alternative perspectives in the therapy of Ph+ leukaemias.
Technical advances such as the development of molecular cloning, Sanger sequencing, PCR and oligonucleotide microarrays are key to our current capacity to sequence, annotate and study complete organismal genomes. Recent years have seen the development of a variety of so-called 'next-generation' sequencing platforms, with several others anticipated to become available shortly. The previously unimaginable scale and economy of these methods, coupled with their enthusiastic uptake by the scientific community and the potential for further improvements in accuracy and read length, suggest that these technologies are destined to make a huge and ongoing impact upon genomic and post-genomic biology. However, like the analysis of microarray data and the assembly and annotation of complete genome sequences from conventional sequencing data, the management and analysis of next-generation sequencing data requires (and indeed has already driven) the development of informatics tools able to assemble, map, and interpret huge quantities of relatively or extremely short nucleotide sequence data. Here we provide a broad overview of bioinformatics approaches that have been introduced for several genomics and functional genomics applications of next-generation sequencing.
Differences in the inherent properties of adipose tissue-derived stem cells (ASC) may contribute to the biological specificity of the subcutaneous (Sc) and visceral (V) adipose tissue depots. In this study, three distinct subpopulations of ASC, i.e. ASCSVF, ASCBottom, and ASCCeiling, were isolated from Sc and V fat biopsies of non-obese subjects, and their gene expression and functional characteristics were investigated. Genome-wide mRNA expression profiles of ASCSVF, ASCBottom and ASCCeiling from Sc fat were significantly different as compared to their homologous subsets of V-ASCs. Furthermore, ASCSVF, ASCCeiling and ASCBottom from the same fat depot were also distinct from each other. In this respect, both principal component analysis and hierarchical clusters analysis showed that ASCCeiling and ASCSVF shared a similar pattern of closely related genes, which was highly different when compared to that of ASCBottom. However, larger variations in gene expression were found in inter-depot than in intra-depot comparisons. The analysis of connectivity of genes differently expressed in each ASC subset demonstrated that, although there was some overlap, there was also a clear distinction between each Sc-ASC and their corresponding V-ASC subsets, and among ASCSVF, ASCBottom, and ASCCeiling of Sc or V fat depots in regard to networks associated with regulation of cell cycle, cell organization and development, inflammation and metabolic responses. Finally, the release of several cytokines and growth factors in the ASC cultured medium also showed both inter- and intra-depot differences. Thus, ASCCeiling and ASCBottom can be identified as two genetically and functionally heterogeneous ASC populations in addition to the ASCSVF, with ASCBottom showing the highest degree of unmatched gene expression. 
On the other hand, inter-depot differences seem to prevail over intra-depot differences in the ASC gene expression assets and network functions, contributing to the high degree of specificity of Sc and V adipose tissue in humans.
The genome sequence of a Sphingobium strain capable of tolerating high concentrations of Ni ions, and exhibiting natural kanamycin resistance, is presented. The presence of a transposon-derived kanamycin resistance gene and several genes for efflux-mediated metal resistance may explain the observed characteristics of the new Sphingobium isolate.
ExpEdit is a web application for assessing RNA editing in humans at known or user-specified sites supported by transcript data obtained by RNA-Seq experiments. Either mapping data (in SAM/BAM format) or sequence reads [in FASTQ/short read archive (SRA) format] can be provided as input to carry out a comparative analysis against a large collection of known editing sites collected in the DARNED database as well as other user-provided potentially edited positions. Results are shown as dynamic tables containing University of California, Santa Cruz (UCSC) links for a quick examination of the genomic context.
Clear cell renal cell carcinoma (ccRCC) is the most common malignant renal epithelial tumor and also the most deadly. To identify molecular changes occurring in ccRCC, in the present study we performed a genome-wide analysis of its entire complement of mRNAs. Gene and exon-level analyses were carried out by means of the Affymetrix Exon Array platform. To achieve a reliable detection of differentially expressed cassette exons, we implemented a novel methodology that considered contiguous combinations of exon triplets: candidate differentially expressed cassette exons were identified when the expression level was significantly different only in the central exon of the triplet. More detailed analyses were performed for selected genes using quantitative RT-PCR and confocal laser scanning microscopy. Our analysis detected over 2,000 differentially expressed genes and about 250 alternatively spliced genes showing differential inclusion of specific cassette exons between tumor and non-tumoral tissues. We demonstrated the presence in ccRCC of an altered expression of the PTP4A3, LAMA4, KCNJ1 and TCF21 genes (at both transcript and protein level). Furthermore, we confirmed, at the mRNA level, the involvement of CAV2 and SFRP genes that have previously been identified. At exon level, among potential candidates we validated a differentially included cassette exon in the DAB2 gene with a significant increase of the DAB2 p96 splice variant as compared to the p67 isoform. Based on the results obtained, and their robustness according to both statistical analysis and literature surveys, we believe that a combination of gene/isoform expression signatures may remarkably contribute, after suitable validation, to a more effective and reliable definition of molecular biomarkers for ccRCC early diagnosis, prognosis and prediction of therapeutic response.
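The exon-triplet rule described in this abstract can be illustrated with a minimal Python sketch. It assumes that each exon of a gene has already been flagged as significantly different or not between tumor and non-tumoral tissue; the function name and the boolean input are hypothetical and are not taken from the published methodology, which works on Exon Array expression values.

```python
from typing import List


def candidate_cassette_exons(significant: List[bool]) -> List[int]:
    """Return indices of exons that are the only significant member
    of a contiguous exon triplet (left, centre, right)."""
    hits = []
    # The first and last exons cannot be the centre of a triplet.
    for i in range(1, len(significant) - 1):
        left, centre, right = significant[i - 1], significant[i], significant[i + 1]
        if centre and not left and not right:
            hits.append(i)
    return hits


# Hypothetical per-exon flags for one gene, in genomic order.
flags = [False, True, False, False, True, True, False]
candidates = candidate_cassette_exons(flags)
print(candidates)
```

Exons 4 and 5 are not reported here because each has a significant neighbour, which is more consistent with a gene-level or multi-exon change than with a single cassette exon.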
Epstein-Barr virus (EBV) latently infects the majority of the human population and is implicated as a causal or contributory factor in numerous diseases. We sequenced 27 complete EBV genomes from a cohort of Multiple Sclerosis (MS) patients and healthy controls from Italy, although no variants showed a statistically significant association with MS. Taking advantage of the availability of ~130 EBV genomes with known geographical origins, we reveal a striking geographic distribution of EBV sub-populations with distinct allele frequency distributions. We discuss mechanisms that potentially explain these observations, and their implications for understanding the association of EBV with human disease.
RNA editing is a widespread post-transcriptional molecular phenomenon that can increase proteomic diversity, by modifying the sequence of completely or partially non-functional primary transcripts, through a variety of mechanistically and evolutionarily unrelated pathways. Editing by base substitution has been investigated in both animals and plants. However, conventional strategies based on directed Sanger sequencing are time-consuming and effectively preclude genome wide identification of RNA editing and assessment of partial and tissue-specific editing sites. In contrast, the high-throughput RNA-Seq approach allows the generation of a comprehensive landscape of RNA editing at the genome level. Short reads from Solexa/Illumina GA and ABI SOLiD platforms have been used to investigate the editing pattern in mitochondria of Vitis vinifera providing significant support for 401 C-to-U conversions in coding regions and an additional 44 modifications in non-coding RNAs. Moreover, 76% of all C-to-U conversions in coding genes represent partial RNA editing events and 28% of them were shown to be significantly tissue specific. Solexa/Illumina and SOLiD platforms showed different characteristics with respect to the specific issue of large-scale editing analysis, and the combined approach presented here reduces the false positive rate of discovery of editing events.
MitoZoa is a relational database collecting curated metazoan entries of complete or nearly complete mitochondrial genomes (mtDNA), specifically designed to assist comparative studies of mitochondrial genome-level features in a given taxon or in congeneric species of Metazoa. The principal novelties of MitoZoa are extensive corrections/improvements of the mtDNA annotations and the possibility of easily searching for data on: (1) gene order, a genomic feature useful as phylogenetic marker; (2) sequence, size and location of non-coding regions, likely containing the regulatory signals for mtDNA replication and transcription; (3) mt features/sequences of congeneric species, where saturation phenomena in nucleotide substitutions and gene order changes are expected to be absent or at least minimal. In addition, MitoZoa allows the exploration of basic mt features such as molecule topology, genetic code, gene content, and compositional parameters of the entire genome. Finally, in order to facilitate downstream analyses of retrieved data, MitoZoa entry lists can be visualized and downloaded in a tabular format, while sequences and gene order data are provided in FASTA and FASTA-like formats, respectively. The MitoZoa database is available at http://www.caspur.it/mitozoa. (C) 2010 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this purpose, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets. Availability and implementation: The MToolBox package is available at https://sourceforge.net/projects/mtoolbox/.
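As a rough illustration of what an allele-specific heteroplasmy value expresses, the Python sketch below computes, for one mitochondrial position, the fraction of reads supporting each allele. This is an assumption-laden simplification: the function name and the input dictionary are hypothetical, and MToolBox's actual computation (including quality filtering and confidence intervals) is not reproduced here.

```python
from typing import Dict


def allele_heteroplasmy(allele_counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of reads supporting each allele at one position.

    Returns an empty dict when the position has no coverage.
    """
    depth = sum(allele_counts.values())
    if depth == 0:
        return {}
    return {allele: count / depth for allele, count in allele_counts.items()}


# Hypothetical read counts at a single mtDNA site.
site = {"A": 90, "G": 10}
fractions = allele_heteroplasmy(site)
print(fractions)
```

Reporting one fraction per allele, and not a single value per site, is what allows positions with more than two observed alleles to be described without ambiguity.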
Background: A challenging issue in designing computational methods for predicting the structure of a gene into exons and introns from a cluster of transcript (EST, mRNA) sequences is guaranteeing accuracy as well as efficiency in time and space when large clusters of more than 20,000 ESTs and genes longer than 1 Mb are processed. Traditionally, the problem has been faced by combining different tools, not specifically designed for this task. Results: We propose a fast method based on ad hoc procedures for solving the problem. Our method combines two ideas: a novel algorithm of proved small time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of EST sequences, those that allow the inference of splice site junctions largely confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, that is, sequences obtained from paths of a graph structure, called the embedding graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the length of P and T and in the size of the output. The method was implemented in the PIntron package. PIntron requires as input a genomic sequence or region and a set of EST and/or mRNA sequences. Besides the prediction of the full-length transcript isoforms potentially expressed by the gene, the PIntron package includes a module for the CDS annotation of the predicted transcripts. Conclusions: PIntron, the software tool implementing our methodology, is available at http://www.algolab.eu/PIntron under GNU AGPL. PIntron has been shown to outperform state-of-the-art methods, and to quickly process some critical genes. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when benchmarked with ENCODE annotations.
Adenine to Inosine RNA editing is a widespread co- and post-transcriptional mechanism mediated by ADAR enzymes acting on double stranded RNA. It has a plethora of biological effects, appears to be particularly pervasive in humans with respect to other mammals, and is implicated in a number of diverse human pathologies. Here we present the first human inosinome atlas comprising 3,041,422 A-to-I events identified in six tissues from three healthy individuals. Matched directional total-RNA-Seq and whole genome sequence datasets were generated and analysed within a dedicated computational framework, also capable of detecting hyper-edited reads. Inosinome profiles are tissue specific and edited gene sets consistently show enrichment of genes involved in neurological disorders and cancer. Overall frequency of editing also varies, but is strongly correlated with ADAR expression levels. The inosinome database is available at: http://srv00.ibbe.cnr.it/editing/.
BACKGROUND: The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). METHODS: In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), and detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). RESULTS: Through a user-friendly web interface, the RAP workflow can be suitably customized by the user and is automatically executed on our cloud computing environment. This strategy allows access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user's needs.
RNA editing is a post-transcriptional molecular process whereby the information in a genetic message is modified from that in the corresponding DNA template by means of nucleotide substitutions, insertions and/or deletions. It occurs mostly in organelles by clade-specific diverse and unrelated biochemical mechanisms. RNA editing events have been annotated in primary databases such as GenBank and at a more sophisticated level in the specialized databases REDIdb, dbRES and EdRNA. At present, REDIdb is the only freely available database that focuses on the organellar RNA editing process and annotates each editing modification in its biological context. Here we present an updated and upgraded release of REDIdb with a web interface refurbished with graphical and computational facilities that improve RNA editing investigations. Details of the REDIdb features and novelties are illustrated and compared to other RNA editing databases. REDIdb can be freely queried at http://biologia.unical.it/py_script/REDIdb/.
SUMMARY: The reliable detection of RNA editing sites from massive sequencing data remains challenging and, although several methodologies have been proposed, no computational tools have been released to date. Here, we introduce REDItools, a suite of Python scripts to perform high-throughput investigation of RNA editing using next-generation sequencing data. Availability and implementation: REDItools are written in the Python programming language and are freely available at http://code.google.com/p/reditools/. CONTACT: ernesto.picardi@uniba.it or graziano.pesole@uniba.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
RNA editing by A-to-I deamination is the prominent co-/post-transcriptional modification in humans. It is carried out by ADAR enzymes and contributes to both transcriptomic and proteomic expansion. RNA editing has pivotal cellular effects and its deregulation has been linked to a variety of human disorders including neurological and neurodegenerative diseases and cancer. Despite its biological relevance, many physiological and functional aspects of RNA editing are still elusive. Here, we present REDIportal, available online at http://srv00.recas.ba.infn.it/atlas/, the largest and most comprehensive collection of RNA editing in humans, including more than 4.5 million A-to-I events detected in 55 body sites from thousands of RNAseq experiments. REDIportal embeds the RADAR database and represents the first editing resource designed to answer functional questions, enabling the inspection and browsing of editing levels in a variety of human samples, tissues and body sites. In contrast with previous RNA editing databases, REDIportal comprises its own browser (JBrowse) that allows users to explore A-to-I changes in their genomic context, emphasizing repetitive elements in which RNA editing is prominent.
Rhodobacter sphaeroides has long been investigated for its adaptive capacities to different environmental and nutritional conditions, including the presence of heavy metals, which make it a valuable model organism for understanding bacterial adaptation to metal stress conditions and for future environmental applications, such as bioremediation of polluted sites. To further characterize the capability of R. sphaeroides to cope with high cobalt ion concentrations, we combined the selection of adaptive defective mutants, carried out by negative selection of transposon insertional libraries on 5 mM Co(2+)-enriched solid medium, with the analysis of growth capacities and transcriptome profiling of a selected mutant (R95). A comparative analysis of results from the mutant and wild-type strains clearly indicated that the adaptive ability of R. sphaeroides strongly relies on its ability to exploit any available energy-supplying metabolism, being able to behave as a photo- or chemotrophic microorganism. The selected R95 mutant, indeed, exhibits a severe down-expression of an ABC sugar transporter, which proves nonpermissive for its growth in cobalt-enriched media under aerobic conditions. Interestingly, the defective expression of the transporter does not have dramatic effects on the growth ability of the mutant when cultivated under photosynthetic conditions.
ADARs are key proteins for hematopoietic stem cell self-renewal and for survival of differentiating progenitor cells. However, their specific role in myeloid cell maturation has been poorly investigated. Here, we show that ADAR1 is present at basal level in the primary myeloid leukemia cells obtained from patients at diagnosis as well as in myeloid U-937 and THP1 cell lines and its expression correlates with the editing levels. Upon phorbol-myristate acetate (PMA) or VitaminD3/GM-CSF-driven differentiation, both ADAR1 and ADAR2 enzymes are up-regulated, with a concomitant global increase of A-to-I RNA editing. ADAR1-silencing caused an editing decrease at specific ADAR1 target genes, without, however, interfering with cell differentiation or with ADAR2 activity. Remarkably, ADAR2 is absent in the undifferentiated cell stage, due to its elimination through the ubiquitin-proteasome pathway, being strongly up-regulated at the end of the differentiation process. Of note, peripheral blood monocytes display editing events at the selected targets similar to those found in differentiated cell lines. Taken together, the data indicate that ADAR enzymes play important and distinct roles. Leukemia accepted article preview online, 09 May 2017. doi:10.1038/leu.2017.134.
RNA sequencing (RNA-Seq) has become the experimental standard in transcriptome studies. While most of the bioinformatic pipelines for the analysis of RNA-Seq data and the identification of significant changes in transcript abundance are based on the comparison of two conditions, it is common practice to perform several experiments in parallel (e.g. from different individuals, developmental stages, tissues), for the identification of genes showing a significant variation of expression across all the conditions studied. In this work we present RNentropy, a methodology based on information theory devised for this task, which given expression estimates from any number of RNA-Seq samples and conditions identifies genes or transcripts with a significant variation of expression across all the conditions studied, together with the samples in which they are over- or under-expressed. To show the capabilities offered by our methodology, we applied it to different RNA-Seq datasets: 48 biological replicates of two different yeast conditions; samples extracted from six human tissues of three individuals; seven different mouse brain cell types; human liver samples from six individuals. Results, and their comparison to different state of the art bioinformatic methods, show that RNentropy can provide a quick and in depth analysis of significant changes in gene expression profiles over any number of conditions.
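To give a feel for how information theory can summarize expression across many samples, the Python sketch below computes the Shannon entropy of one gene's expression distribution. This is only an illustration of the general idea and is not the RNentropy statistic or its significance test; the function name and the example values are hypothetical.

```python
import math
from typing import Sequence


def expression_entropy(values: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a gene's expression across samples.

    Values are normalized to sum to 1; zero-expression samples
    contribute nothing to the sum.
    """
    total = sum(values)
    if total <= 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical expression estimates over four samples.
uniform = expression_entropy([10, 10, 10, 10])  # evenly expressed gene
specific = expression_entropy([40, 0, 0, 0])    # expressed in one sample only
print(uniform, specific)
```

Under this simplified view, a gene expressed evenly across samples reaches the maximum entropy (log2 of the number of samples), while a gene restricted to one sample has entropy zero, so low values point to sample-specific expression.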
While RNA editing by A-to-I deamination is a requisite for neuronal function in humans, it is underinvestigated in single cells. Here we fill this gap by analysing RNA editing profiles of single cells from the brain cortex of living human subjects. We show that RNA editing levels per cell are bimodally distributed and distinguish between major brain cell types, thus providing new insights into neuronal dynamics.
A comprehensive knowledge of all the factors involved in splicing, both proteins and RNAs, and of their interaction network is crucial for reaching a better understanding of this process and its functions. A large part of relevant information is buried in the literature or collected in various different databases. By hand-curated screenings of literature and databases, we retrieved experimentally validated data on 71 human RNA-binding splicing regulatory proteins and organized them into a database called ‘SpliceAid-F’ (http://www.caspur.it/SpliceAidF/). For each splicing factor (SF), the database reports its functional domains, its protein and chemical interactors and its expression data. Furthermore, we collected experimentally validated RNA–SF interactions, including relevant information on the RNA-binding sites, such as the genes where these sites lie, their genomic coordinates, the splicing effects, the experimental procedures used, as well as the corresponding bibliographic references. We also collected information from experiments showing no RNA–SF binding, at least in the assayed conditions. In total, SpliceAid-F contains 4227 interactions, 2590 RNA-binding sites and 1141 ‘no-binding’ sites, including information on cellular contexts and conditions where binding was tested. The data collected in SpliceAid-F can provide significant information to explain an observed splicing pattern as well as the effect of mutations in functional regulatory elements.
RNA editing is a post-transcriptional/co-transcriptional molecular phenomenon whereby a genetic message is modified from the corresponding DNA template by means of substitutions, insertions, and/or deletions. It occurs in a variety of organisms and different cellular locations through evolutionarily and biochemically unrelated proteins. RNA editing has a plethora of biological effects, including the modulation of alternative splicing and fine-tuning of gene expression. RNA editing events by base substitutions can be detected on a genomic scale by NGS technologies through the REDItools package, an ad hoc suite of Python scripts to study RNA editing using RNA-Seq and DNA-Seq data or RNA-Seq data alone. REDItools implements effective filters to minimize biases due to sequencing errors, mapping errors, and SNPs. The package is freely available at the Google Code repository (http://code.google.com/p/reditools/) and released under the MIT license. In the present unit we show three basic protocols corresponding to the three main REDItools scripts.
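To make the quantity behind such analyses concrete, here is a minimal sketch of a per-site A-to-I editing level. It is not REDItools code: the function name `editing_level` and the `min_coverage` default are assumptions chosen for illustration. Because inosine is read as guanosine during sequencing, the editing level at a genomic adenosine is commonly estimated as the fraction of G-supporting reads among the A- and G-supporting reads.

```python
def editing_level(a_reads, g_reads, min_coverage=10):
    """Estimate the A-to-I editing level at one genomic adenosine.

    a_reads: RNA-Seq reads supporting the unedited base (A)
    g_reads: RNA-Seq reads supporting the edited base (I, read as G)
    min_coverage: hypothetical cutoff below which no estimate is returned
    """
    if a_reads < 0 or g_reads < 0:
        raise ValueError("read counts must be non-negative")
    coverage = a_reads + g_reads
    if coverage < min_coverage:
        return None  # too few reads for a meaningful estimate
    return g_reads / coverage


if __name__ == "__main__":
    print(editing_level(a_reads=15, g_reads=5))  # well-covered site
    print(editing_level(a_reads=3, g_reads=2))   # below the coverage cutoff
```

A real analysis would also have to separate genuine editing from SNPs, sequencing errors, and mapping errors, which is the filtering role the abstract attributes to REDItools.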
The 5' and 3' untranslated regions of eukaryotic mRNAs (UTRs) play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5' and 3' untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated and also collated in the UTRsite database, where more specific information on the functional motifs and cross-links to interacting regulatory proteins are provided. In the current update, the UTR entries have been organized in a gene-centric structure to better visualize and retrieve 5' and 3'UTR variants generated by alternative initiation and termination of transcription and alternative splicing. Experimentally validated miRNA targets and conserved sequence elements are also annotated. The integration of UTRdb with genomic data has allowed the implementation of an efficient annotation system and a powerful retrieval resource for the selection and extraction of specific UTR subsets. All internet resources implemented for retrieval and functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs are accessible at http://utrdb.ba.itb.cnr.it/.
BACKGROUND: The advent of massively parallel sequencing technologies (Next Generation Sequencing, NGS) profoundly modified the landscape of human genetics. In particular, Whole Exome Sequencing (WES) is the NGS branch that focuses on the exonic regions of eukaryotic genomes; exomes are ideal for helping us understand high-penetrance allelic variation and its relationship to phenotype. A complete WES analysis involves several steps which need to be suitably designed and arranged into an efficient pipeline. Managing an NGS analysis pipeline and the huge amount of data it produces requires non-trivial IT skills and computational power. RESULTS: Our web resource WEP (Whole-Exome sequencing Pipeline web tool) performs a complete WES pipeline and provides easy access through a web interface to intermediate and final results. The WEP pipeline is composed of several steps: 1) verification of input integrity and quality checks, read trimming and filtering; 2) gapped alignment; 3) BAM conversion, sorting and indexing; 4) duplicates removal; 5) alignment optimization around insertion/deletion (indel) positions; 6) recalibration of quality scores; 7) single nucleotide and deletion/insertion polymorphism (SNP and DIP) variant calling; 8) variant annotation; 9) result storage into custom databases to allow cross-linking and intersections, statistics and much more. In order to overcome the challenge of managing large amounts of data and to maximize the biological information extracted from them, our tool restricts the number of final results by filtering data with customizable thresholds, facilitating the identification of functionally significant variants. Default threshold values, tuned according to the most common literature published in recent years, are also provided at the completion of the analysis. CONCLUSIONS: Through our tool a user can perform the whole analysis without knowing the underlying hardware and software architecture, dealing with both paired and single end data. The interface provides easy and intuitive access for data submission and a user-friendly web interface for annotated variant visualization. Users without IT expertise can access through WEP the most updated and tested WES algorithms, tuned to maximize the quality of called variants while minimizing artifacts and false positives. The web tool is available at the following web address: http://www.caspur.it/wep.
Alzheimer's Disease (AD) is the most common cause of dementia affecting the elderly population worldwide. We have performed a comprehensive transcriptome profiling of Late-Onset AD (LOAD) patients using second generation sequencing technologies, identifying 2,064 genes, 47 lncRNAs and 4 miRNAs whose expression is specifically deregulated in the hippocampal region of LOAD patients. Moreover, analyzing the hippocampal, temporal and frontal regions from the same LOAD patients, we identify specific sets of deregulated miRNAs for each region, and we confirm that the miR-132/212 cluster is deregulated in each of these regions in LOAD patients, consistent with these miRNAs playing a role in AD pathogenesis. Notably, a luciferase assay indicates that miR-184 is able to target the 3'UTR of NR4A2, which is known to be involved in cognitive functions and long-term memory and whose expression levels are inversely correlated with those of miR-184 in the hippocampus. Finally, RNA editing analysis reveals a general RNA editing decrease in LOAD hippocampus, with 14 recoding sites significantly and differentially edited in 11 genes. Our data underline specific transcriptional changes in LOAD brain and provide an important source of information for understanding the molecular changes characterizing LOAD progression.