Graziano Pesole
Role
Full Professor
Organization
Università degli Studi di Bari Aldo Moro
Department
DIPARTIMENTO DI BIOSCIENZE, BIOTECNOLOGIE E BIOFARMACEUTICA
Scientific Area
Area 05 - Biological Sciences
Scientific Disciplinary Sector
BIO/11 - Molecular Biology
ERC Sector, Level 1
Not available
ERC Sector, Level 2
Not available
ERC Sector, Level 3
Not available
Gene expression regulatory elements are scattered in gene promoters and pre-mRNAs. In particular, RNA elements lying in untranslated regions (5' and 3' UTRs) are poorly studied because of their peculiar features (i.e., a combination of primary and secondary structure elements), which also pose remarkable computational challenges. Several years ago, we began collecting experimentally characterized UTR regulatory elements, developing the specialized database UTRsite. This paper describes detailed guidelines for annotating cis-regulatory elements in 5' and 3' untranslated regions (UTRs) by computational analyses, retracing all the main steps used by UTRsite curators.
RNA editing is a post-transcriptional process occurring in a wide range of organisms. In the human brain, A-to-I RNA editing, in which individual adenosine (A) bases in pre-mRNA are modified to yield inosine (I), is the most frequent event. By modulating gene expression, RNA editing is essential for cellular homeostasis. Indeed, its deregulation has been linked to several neurological and neurodegenerative diseases. To date, many RNA editing sites have been identified by next-generation sequencing technologies employing massive transcriptome sequencing together with whole genome or exome sequencing. While genome and transcriptome reads are not always available for single individuals, RNA-Seq data are widely available in public databases and represent a relevant source of yet unexplored RNA editing sites. In this context, we propose a simple computational strategy to identify genomic positions enriched in novel hypothetical RNA editing events by means of a new two-step mapping procedure requiring only RNA-Seq data, with no a priori knowledge of RNA editing characteristics and no genomic reads. We assessed the suitability of our procedure by confirming A-to-I candidates using conventional Sanger sequencing and by performing RNA-Seq as well as whole exome sequencing of human spinal cord tissue from a single individual.
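As a rough illustration of the signal such a strategy looks for, the sketch below counts A-to-G mismatches per reference position from reads that are already aligned (sequencers read inosine as guanosine, so A-to-I editing appears as A-to-G). It is not the two-step mapping procedure described above: the input format, the function name candidate_editing_sites, and the coverage and frequency thresholds are all assumptions made for illustration.

```python
from collections import defaultdict


def candidate_editing_sites(reference, aligned_reads, min_cov=4, min_freq=0.1):
    """Return {position: (g_count, coverage)} for reference 'A' positions with A-to-G support.

    reference     -- reference sequence (forward strand) as a string
    aligned_reads -- iterable of (start, read_seq) pairs; ungapped, 0-based start
                     (a hypothetical, simplified stand-in for SAM/BAM records)
    min_cov       -- minimum number of reads covering the position (assumed value)
    min_freq      -- minimum fraction of covering reads showing G (assumed value)
    """
    coverage = defaultdict(int)
    g_count = defaultdict(int)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Only reference adenosines can show A-to-I (read as A-to-G) editing.
            if pos >= len(reference) or reference[pos] != "A":
                continue
            coverage[pos] += 1
            if base == "G":
                g_count[pos] += 1
    sites = {}
    for pos, cov in coverage.items():
        if cov >= min_cov and g_count[pos] / cov >= min_freq:
            sites[pos] = (g_count[pos], cov)
    return sites
```

A real analysis would also have to separate genuine editing from genomic SNPs, sequencing errors and strand effects, which this sketch ignores.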
BACKGROUND: Recent studies have demonstrated an unexpected complexity of transcription in eukaryotes. The majority of the genome is transcribed, and only a small fraction of these transcripts is annotated as protein-coding genes and their splice variants. Indeed, most transcripts are the result of antisense, overlapping and non-coding RNA expression. In this context, one of the key aims of high-throughput transcriptome sequencing is the detection of all RNA species present in the cell, and the first crucial step for RNA-seq users is the choice of strategy for cDNA library construction. The protocols developed so far use the entire library for a single sequencing run on a specific platform. RESULTS: We set up a unique protocol to generate and amplify a strand-specific cDNA library representative of all RNA species that may be used with all major platforms currently available on the market (Roche 454, Illumina, ABI/SOLiD). Our method is reproducible, fast and easy to perform, and even allows starting from low-input total RNA. Furthermore, we provide a suitable bioinformatics tool for the analysis of the sequences produced following this protocol. CONCLUSION: We tested the efficiency of our strategy, showing that our method is platform-independent, thus allowing the simultaneous analysis of the same sample with different NGS technologies and providing an accurate quantitative and qualitative portrait of complex whole transcriptomes.
Ascidians are a fascinating group of filter-feeding marine chordates characterized by rapid evolution of both the sequence and the structure of their nuclear and mitochondrial genomes. Moreover, they include several model organisms used to investigate complex biological processes in chordates. To study the evolutionary dynamics of ascidians at short phylogenetic distances, we sequenced 13 new mitogenomes and analyzed them, together with 15 other available mitogenomes, using a novel approach involving detailed whole-mitogenome comparisons of conspecific and congeneric pairs. The evolutionary rate was quite homogeneous at both the intra-specific and the congeneric level, and the lowest congeneric rates were found in cryptic (morphologically indistinguishable) and in morphologically very similar species pairs. Moreover, congeneric nonsynonymous rates (dN) were up to two orders of magnitude higher than in intra-species pairs. Overall, a clear-cut gap sets conspecific pairs apart from congeneric pairs. These evolutionary peculiarities made it easy to identify an extraordinary intra-specific variability in the model ascidian Botryllus schlosseri, where most pairs show a dN value between those observed at the intra-species and congeneric levels, yet consistently lower than that of the C. intestinalis cryptic species pair. These data suggest ongoing speciation events producing genetically distinct B. schlosseri entities. Remarkably, these ongoing speciation events were undetectable by the cox1 barcode fragment, demonstrating that, at low phylogenetic distances, the whole mitogenome has a higher resolving power than cox1. Our study shows that whole-mitogenome comparative analyses, performed on a suitable sample of congeneric and intra-species pairs, may allow the detection not only of cryptic species but also of ongoing speciation events.
Alternative splicing is emerging as a major mechanism for the expansion of transcriptome and proteome diversity, particularly in human and other vertebrates. However, the proportion of alternative transcripts and proteins actually endowed with functional activity is currently highly debated. We present here a new release of ASPicDB, which now provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256 939 protein variants from 17 191 multi-exon genes have been extensively annotated through state-of-the-art machine learning tools providing information on the protein type (globular and transmembrane), localization, presence of PFAM domains, signal peptides, GPI-anchor propeptides, transmembrane and coiled-coil segments. Furthermore, full-length variants can now be specifically selected based on the annotation of CAGE-tags and polyA signal and/or polyA sites, marking transcription initiation and termination sites, respectively. The retrieval can be carried out at gene, transcript, exon, protein or splice site level, allowing the selection of data sets fulfilling one or more features set by the user. The retrieval interface also enables the selection of protein variants showing specific differences in the annotated features. ASPicDB is available at http://www.caspur.it/ASPicDB/.
New genomes are being sequenced at an increasingly rapid rate, far outpacing the rate at which manual gene annotation can be performed. Automated genome annotation is thus necessitated by this growth in genome projects; however, full-fledged annotation systems are usually home-grown and customized to a particular genome. There is thus a renewed need for accurate ab initio gene prediction methods. However, it is apparent that fully ab initio methods fall short of the required level of sensitivity and specificity for a quality annotation. Evidence in the form of expressed sequences gives the single biggest improvement in accuracy when used to inform gene predictions. Here, we present a lightweight pipeline for first-pass gene prediction on newly sequenced genomes. The two main components are ASPic, a program that derives highly accurate, albeit not necessarily complete, EST-based transcript annotations from EST alignments, and GeneID, a standard gene prediction program, which we have modified to take intron annotations as evidence. The introns output by ASPic CDS predictions are given to GeneID to constrain the exon-chaining process and produce predictions consistent with the underlying EST alignments. The pipeline was successfully tested on the entire C. elegans genome and the 44 ENCODE human pilot regions.
De novo high-throughput pyrosequencing was used to detect and characterize the 2009 pandemic influenza A (H1N1) virus directly in nasopharyngeal swabs in the context of the microbial community. Data were generated by 454 pyrosequencing on the GS-FLX platform (Roche) after sequence-independent amplification. Assembled influenza A reads allowed near full-length genome reconstruction with the simultaneous analysis of site-specific heterogeneity. The molecular approach applied proved to be a powerful tool to characterize the new pandemic H1N1 influenza virus in clinical samples. This approach could be of great value in identifying possible new reassortants that may occur in the near future.
Background: Recent discoveries have highlighted the fact that alternative splicing and alternative transcripts are the rule, rather than the exception, in metazoan genes. Since multiple transcript and protein variants expressed by the same gene are, by definition, structurally distinct and need not be functionally equivalent, the concept of gene orthology should be extended to the transcript level in order to describe evolutionary relationships between structurally similar transcript variants. In other words, the identification of true orthology relationships between gene products should now progress beyond primary sequence, and "splicing orthology", consisting of ancestrally shared exon-intron structures, is required to define orthologous isoforms at the transcript level. Results: As a first step in this direction, in this work we performed a large-scale human-mouse gene comparison with a twofold goal: first, to assess whether and to what extent traditional gene annotations such as RefSeq capture genuine splicing orthology; second, to provide a more detailed annotation and quantification of true human-mouse orthologous transcripts, defined as transcripts of orthologous genes exhibiting the same splicing patterns. Conclusions: We observed an identical exon/intron structure for 32% of human and mouse orthologous genes. This figure increases to 87% using less stringent criteria for gene structure similarity, thus implying that for about 13% of the human RefSeq annotated genes (and about 25% of the corresponding transcripts) we could not identify any mouse transcript showing sufficient similarity to be confidently assigned as a splicing ortholog. Our data suggest that current gene and transcript data may still be rather incomplete - with several splicing variants still unknown.
The observation that alternative splicing produces large numbers of alternative transcripts and proteins, some of them conserved across species and others truly species-specific, suggests that, still maintaining the conventional definition of gene orthology, a new concept of "splicing orthology" can be defined at transcript level.
Philadelphia-positive (Ph+) leukaemias are an example of haematological malignant diseases where different chromosomal rearrangements involving both the BCR and ABL1 genes generate a variety of chimeric proteins (BCR/ABL1 p210, p190 and p230) which are considered pathological "biomarkers". In addition to these three, there is a variety of fusion transcripts whose origin may depend on diverse genetic rearrangements, on alternative/atypical splicing of the main mRNAs, or on the occurrence of single-point mutations. Although Imatinib-based therapy of Ph+ leukaemias represents a triumph of medicine, not all patients benefit from the drug, and some show resistance or intolerance. Furthermore, interruption of Imatinib administration is often followed by clinical relapse, suggesting a failure to eradicate residual leukaemic stem cells. Therefore, while targeted therapy is searching for new and improved pharmacological inhibitors covering all the possible mutations in the kinase domain, there is an urgent need to identify alternative molecular targets to develop other specific and effective therapeutic approaches. In this review we discuss the importance of recent advances based on the discovery of novel BCR/ABL1 variants and their potential role as new targets/biomarkers of Ph+ leukaemias in the light of current therapeutic trends. The limits of the pharmacological inhibitors used for treating the disease can be overcome by considering targets other than the kinase enzyme. Our evaluations highlight the potential of alternative perspectives in the therapy of Ph+ leukaemias.
Technical advances such as the development of molecular cloning, Sanger sequencing, PCR and oligonucleotide microarrays are key to our current capacity to sequence, annotate and study complete organismal genomes. Recent years have seen the development of a variety of so-called 'next-generation' sequencing platforms, with several others anticipated to become available shortly. The previously unimaginable scale and economy of these methods, coupled with their enthusiastic uptake by the scientific community and the potential for further improvements in accuracy and read length, suggest that these technologies are destined to make a huge and ongoing impact upon genomic and post-genomic biology. However, like the analysis of microarray data and the assembly and annotation of complete genome sequences from conventional sequencing data, the management and analysis of next-generation sequencing data requires (and indeed has already driven) the development of informatics tools able to assemble, map, and interpret huge quantities of relatively or extremely short nucleotide sequence data. Here we provide a broad overview of bioinformatics approaches that have been introduced for several genomics and functional genomics applications of next-generation sequencing.
Substantial advances in microbiology, molecular evolution and biodiversity have been made in recent years thanks to metagenomics, which makes it possible to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the taxonomic structure of the microbiome, a target-based metagenomic approach, here also referred to as meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) in the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. Access to suitable computational systems for the large-scale bioinformatic analysis of HTS data is currently one of the major challenges in advanced meta-barcoding projects.
Historically, genome-wide and molecular characterization of the genus Listeria has concentrated on the important human pathogen Listeria monocytogenes and a small number of closely related species, together termed Listeria sensu strictu. More recently, a number of genome sequences for more basal, and nonpathogenic, members of the Listeria genus have become available, facilitating a wider perspective on the evolution of pathogenicity and genome level evolutionary dynamics within the entire genus (termed Listeria sensu lato). Here, we have sequenced the genomes of additional Listeria fleischmannii and Listeria newyorkensis isolates and explored the dynamics of genome evolution in Listeria sensu lato. Our analyses suggest that acquisition of genetic material through gene duplication and divergence as well as through lateral gene transfer (mostly from outside Listeria) is widespread throughout the genus. Novel genetic material is apparently subject to rapid turnover. Multiple lines of evidence point to significant differences in evolutionary dynamics between the most basal Listeria subclade and all other congeners, including both sensu strictu and other sensu lato isolates. Strikingly, these differences are likely attributable to stochastic, population-level processes and contribute to observed variation in genome size across the genus. Notably, our analyses indicate that the common ancestor of Listeria sensu lato lacked flagella, which were acquired by lateral gene transfer by a common ancestor of Listeria grayi and Listeria sensu strictu, whereas a recently functionally characterized pathogenicity island, responsible for the capacity to produce cobalamin and utilize ethanolamine/propane-2-diol, was acquired in an ancestor of Listeria sensu strictu.
Currently, there is very little information available regarding the microbiome associated with the wine production chain. Here, we used an amplicon sequencing approach based on high-throughput sequencing (HTS) to obtain a comprehensive assessment of the bacterial community associated with the production of three Apulian red wines, from grape to final product. The relationships among grape variety, the microbial community, and fermentation were investigated. Moreover, the winery microbiota was evaluated in comparison with the autochthonous vineyard species that persist until the end of the winemaking process. The analysis highlighted the remarkable dynamics within the microbial communities during fermentation. A common microbial core shared among the examined wine varieties was observed, and the unique taxonomic signature of each wine appellation was revealed. New species belonging to the genus Halomonas were also reported. This study demonstrates the potential of this metagenomic approach, supported by optimized protocols, for identifying the biodiversity of the wine supply chain. The developed experimental pipeline offers new prospects for other research fields in which a comprehensive view of microbial community complexity and dynamics is desirable.
Collagen VI myopathies are genetic disorders caused by mutations in collagen 6 A1, A2 and A3 genes, ranging from the severe Ullrich congenital muscular dystrophy to the milder Bethlem myopathy, which is recapitulated by collagen-VI-null (Col6a1(-/-)) mice. Abnormalities in mitochondria and autophagic pathway have been proposed as pathogenic causes of collagen VI myopathies, but the link between collagen VI defects and these metabolic circuits remains unknown. To unravel the expression profiling perturbation in muscles with collagen VI myopathies, we performed a deep RNA profiling in both Col6a1(-/-) mice and patients with collagen VI pathology. The interactome map identified common pathways suggesting a previously undetected connection between circadian genes and collagen VI pathology. Intriguingly, Bmal1(-/-) (also known as Arntl) mice, a well-characterized model displaying arrhythmic circadian rhythms, showed profound deregulation of the collagen VI pathway and of autophagy-related genes. The involvement of circadian rhythms in collagen VI myopathies is new and links autophagy and mitochondrial abnormalities. It also opens new avenues for therapies of hereditary myopathies to modulate the molecular clock or potential gene-environment interactions that might modify muscle damage pathogenesis.
The transcription factor interferon regulatory factor 6 (IRF6) regulates craniofacial development and epidermal proliferation. We recently showed that IRF6 is a component of a regulatory feedback loop that controls the proliferative potential of epidermal cells. IRF6 is transcriptionally activated by p63 and induces its proteasome-mediated down-regulation, thereby limiting keratinocyte proliferative potential. We hypothesized that IRF6 may also be involved in skin carcinogenesis. Hence, we analyzed IRF6 expression in a large series of squamous cell carcinomas (SCCs) and found a strong down-regulation of IRF6 that correlated with tumor invasive and differentiation status. IRF6 down-regulation in SCC cell lines and primary tumors correlates with methylation on a CpG dinucleotide island located in its promoter region. To identify the molecular mechanisms regulating IRF6 potential tumor suppressive activity, we performed a genome-wide analysis by combining ChIP sequencing for IRF6 binding sites and gene expression profiling in primary human keratinocytes after siRNA-mediated IRF6 depletion. We observed dysregulation of cell cycle-related genes and genes involved in differentiation, cell adhesion, and cell-cell contact. Many of these genes were direct IRF6 targets. We also performed in vitro invasion assays showing that IRF6 down-regulation promotes invasive behavior and that reintroduction of IRF6 into SCC cells strongly inhibits cell growth. These results indicate a function for IRF6 in suppression of tumorigenesis in stratified epithelia.
Differences in the inherent properties of adipose tissue-derived stem cells (ASC) may contribute to the biological specificity of the subcutaneous (Sc) and visceral (V) adipose tissue depots. In this study, three distinct subpopulations of ASC, i.e. ASCSVF, ASCBottom, and ASCCeiling, were isolated from Sc and V fat biopsies of non-obese subjects, and their gene expression and functional characteristics were investigated. Genome-wide mRNA expression profiles of ASCSVF, ASCBottom and ASCCeiling from Sc fat were significantly different from those of their homologous V-ASC subsets. Furthermore, ASCSVF, ASCCeiling and ASCBottom from the same fat depot were also distinct from each other. In this respect, both principal component analysis and hierarchical cluster analysis showed that ASCCeiling and ASCSVF shared a similar pattern of closely related genes, which was highly different from that of ASCBottom. However, larger variations in gene expression were found in inter-depot than in intra-depot comparisons. The analysis of connectivity of genes differentially expressed in each ASC subset demonstrated that, although there was some overlap, there was also a clear distinction between each Sc-ASC and its corresponding V-ASC subset, and among ASCSVF, ASCBottom, and ASCCeiling of Sc or V fat depots, with regard to networks associated with regulation of cell cycle, cell organization and development, inflammation and metabolic responses. Finally, the release of several cytokines and growth factors in the ASC culture medium also showed both inter- and intra-depot differences. Thus, ASCCeiling and ASCBottom can be identified as two genetically and functionally heterogeneous ASC populations in addition to the ASCSVF, with ASCBottom showing the highest degree of unmatched gene expression.
On the other hand, inter-depot differences seem to prevail over intra-depot differences in the ASC gene expression assets and network functions, contributing to the high degree of specificity of Sc and V adipose tissue in humans.
We tracked temporal changes in protist diversity at the Long Term Ecological Research (LTER)-MC site in the Gulf of Naples (Mediterranean Sea) on eight dates in 2011 using a metabarcoding approach. The ILLUMINA analysis of the V4 and V9 fragments of the 18S rDNA produced 869,522 and 1,410,071 sequences resulting in 6,517 and 6,519 OTUs, respectively. Marked compositional variations were recorded across the year, with less than 2% of OTUs shared among all samples and similar patterns for the two marker tags. Alveolata, Stramenopiles and Rhizaria were the most represented groups. A comparison with light microscopy data indicated an over-representation of Dinophyta in the sequence dataset, whereas Bacillariophyta showed comparable taxonomic patterns between sequences and light microscopy data. Shannon diversity values were stable from February to September, increasing thereafter with a peak in December. Community variance was mainly explained by seasonality (as temperature), trophic status (as chlorophyll a), and influence of coastal waters (as salinity). Overall, the background knowledge of the system provided a sound context for the result interpretation, showing that LTER sites provide an ideal setting for the HTS-metabarcoding characterization of protist assemblages and their relationships with environmental variations.
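The two summary statistics mentioned above, Shannon diversity per sample and the share of OTUs common to all samples, can be sketched as follows. This is a minimal illustration rather than the study's pipeline: the function names, the input layout, and the use of the natural logarithm are assumptions, since the abstract does not state which log base or software was used.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(otu_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in otu_counts if c > 0)


def shared_otu_fraction(samples):
    """Fraction of all observed OTUs that are present in every sample.

    samples -- dict mapping a sample label to {otu_id: read_count}
               (hypothetical layout of an OTU table)
    """
    present = [{otu for otu, c in counts.items() if c > 0} for counts in samples.values()]
    all_otus = set().union(*present)
    if not all_otus:
        return 0.0
    shared = set.intersection(*present)
    return len(shared) / len(all_otus)
```

For two equally abundant OTUs, shannon_index returns ln 2; a community dominated by one OTU approaches zero.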
A metagenomic fosmid expression library was established in Escherichia coli from environmental DNA (eDNA) of a shallow hot vent sediment sample collected from the Levante Bay, Vulcano Island (Aeolian archipelago). Using activity-based screening assays, we assessed 9600 fosmid clones, corresponding to approximately 350 Mbp of cloned eDNA, for lipases/esterases/lactamases, haloalkane and haloacid dehalogenases, and glycoside hydrolases. Thirty-four positive fosmid clones were selected from a total of 120 positive hits and sequenced to yield ca. 1360 kbp of high-quality assemblies. Fosmid inserts were attributed to members of ten bacterial phyla: Proteobacteria, Bacteroidetes, Acidobacteria, Firmicutes, Verrucomicrobia, Chloroflexi, Spirochaetes, Thermotogae, Armatimonadetes, and Planctomycetes. Of ca. 200 proteins with high biotechnological potential identified therein, we characterized in detail three distinct α/β-hydrolases (LIPESV12_9, LIPESV12_24, LIPESV12_26) and one new α-arabinopyranosidase (GLV12_5). All LIPESV12 enzymes showed distinct substrate specificities when tested against 43 structurally diverse esters and 4 p-nitrophenol carboxyl esters. Of 16 different glycosides tested, GLV12_5 hydrolysed only p-nitrophenol-α-(L)-arabinopyranose, with a high specific activity of about 2.7 kU/mg protein. Most of the α/β-hydrolases were thermophilic and showed a high tolerance to, and high activities in the presence of, numerous heavy metal ions. Among them, LIPESV12_24 was the best temperature-adapted, retaining its activity after 40 min of incubation at 90 °C. Furthermore, the enzymes were active in organic solvents (e.g., >30% methanol). Both LIPESV12_24 and LIPESV12_26 had the GXSXG pentapeptides and the Ser-Asp-His catalytic triads typical of carboxylesterases of EC 3.1.1.1.
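The GXSXG pentapeptide mentioned above is simple enough to scan for with a regular expression. The sketch below is only an illustration of such a scan; the function name find_gxsxg and the example sequence in the usage note are made up, and a motif hit on its own does not establish that a protein is a carboxylesterase.

```python
import re

# Lookahead so that overlapping occurrences are all reported; "." stands for X (any residue).
GXSXG = re.compile(r"(?=(G.S.G))")


def find_gxsxg(protein_seq):
    """Return (0-based start, pentapeptide) for every G-X-S-X-G occurrence in a protein sequence."""
    return [(m.start(), m.group(1)) for m in GXSXG.finditer(protein_seq.upper())]
```

For instance, find_gxsxg("MKGHSAGLLGDSGGQ") reports the pentapeptides GHSAG at position 2 and GDSGG at position 9.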
The genome sequence of a Sphingobium strain capable of tolerating high concentrations of Ni ions, and exhibiting natural kanamycin resistance, is presented. The presence of a transposon-derived kanamycin resistance gene and several genes for efflux-mediated metal resistance may explain the observed characteristics of the new Sphingobium isolate.
Ochratoxin A (OTA) is a nephrotoxic and potentially carcinogenic mycotoxin produced by several species of Aspergillus and Penicillium. It is one of the major mycotoxins contaminating grain, grapes and a variety of food products, and the development of methods for reducing pre- and post-harvest contamination has drawn considerable attention. In the current study, we isolated and sequenced the genome of a novel free-living Acinetobacter strain able to degrade OTA. Biochemical studies suggest that the degradation reaction proceeds via peptide bond hydrolysis.
ExpEdit is a web application for assessing RNA editing in humans at known or user-specified sites supported by transcript data obtained from RNA-Seq experiments. Mapping data (in SAM/BAM format) or sequence reads [in FASTQ/short read archive (SRA) format] can be provided as input to carry out a comparative analysis against a large collection of known editing sites collected in the DARNED database, as well as other user-provided potentially edited positions. Results are shown as dynamic tables containing University of California, Santa Cruz (UCSC) links for a quick examination of the genomic context.
Regulated mRNA translation plays a key role in control of cell cycle progression in a variety of physiological and pathological processes, including in the self-renewal and survival of stem cells and cancer stem cells. While targeting mRNA translation presents an attractive strategy for control of aberrant cell cycle progression, mRNA translation is an underdeveloped therapeutic target. Regulated mRNAs are typically controlled through interaction with multiple RNA binding proteins (RBPs) but the mechanisms by which the functions of distinct RBPs bound to a common target mRNA are coordinated are poorly understood. The challenge now is to gain insight into these mechanisms of coordination and to identify the molecular mediators that integrate multiple, often conflicting, inputs. A first step includes the identification of altered mRNA ribonucleoprotein complex components that assemble on mRNAs bound by multiple, distinct RBPs compared to those recruited by individual RBPs. This review builds upon our knowledge of combinatorial control of mRNA translation during the maturation of oocytes from Xenopus laevis, to address molecular strategies that may mediate RBP diplomacy and conflict resolution for coordinated control of mRNA translational output. Continued study of regulated ribonucleoprotein complex dynamics promises valuable new insights into mRNA translational control and may suggest novel therapeutic strategies for the treatment of disease.
Comparisons of draft genome sequences of three geographically distinct isolates of Fusarium fujikuroi with two recently published genome sequences from the same species suggest diverse profiles of secondary metabolite production within F. fujikuroi. Species- and lineage-specific genes, many of which appear to exhibit expression profiles that are consistent with roles in host-pathogen interactions and adaptation to environmental changes, are concentrated in subtelomeric regions. These genomic compartments also exhibit distinct gene densities and compositional characteristics with respect to other genomic partitions, and likely play a role in the generation of molecular diversity. Our data provide additional evidence that gene duplication, divergence, and differential loss play important roles in F. fujikuroi genome evolution and suggest that hundreds of lineage-specific genes might have been acquired through horizontal gene transfer.
Clear cell renal cell carcinoma (ccRCC) is the most common malignant renal epithelial tumor and also the most deadly. To identify molecular changes occurring in ccRCC, in the present study we performed a genome-wide analysis of its entire complement of mRNAs. Gene- and exon-level analyses were carried out by means of the Affymetrix Exon Array platform. To achieve a reliable detection of differentially expressed cassette exons, we implemented a novel methodology that considered contiguous exon triplets: candidate differentially expressed cassette exons were identified when the expression level was significantly different only in the central exon of the triplet. More detailed analyses were performed for selected genes using quantitative RT-PCR and confocal laser scanning microscopy. Our analysis detected over 2,000 differentially expressed genes, and about 250 genes that were alternatively spliced and showed differential inclusion of specific cassette exons between tumor and non-tumoral tissues. We demonstrated the presence in ccRCC of an altered expression of the PTP4A3, LAMA4, KCNJ1 and TCF21 genes (at both transcript and protein level). Furthermore, we confirmed, at the mRNA level, the involvement of CAV2 and SFRP genes that have previously been identified. At exon level, among potential candidates we validated a differentially included cassette exon in the DAB2 gene, with a significant increase of the DAB2 p96 splice variant as compared to the p67 isoform. Based on the results obtained, and their robustness according to both statistical analysis and literature surveys, we believe that a combination of gene/isoform expression signatures may contribute remarkably, after suitable validation, to a more effective and reliable definition of molecular biomarkers for ccRCC early diagnosis, prognosis and prediction of therapeutic response.
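The exon-triplet criterion described above can be sketched in a few lines. This is an illustration under stated assumptions, not the published implementation: it takes per-exon p-values as given (the abstract does not say which statistical test produced them), and the function name cassette_exon_candidates and the 0.05 threshold are hypothetical.

```python
def cassette_exon_candidates(exon_pvalues, alpha=0.05):
    """Return indices of exons that differ significantly while both neighbours do not.

    exon_pvalues -- per-exon p-values for tumor vs. non-tumor expression,
                    ordered along the transcript (hypothetical input)
    alpha        -- significance threshold (assumed value)
    """
    candidates = []
    # Slide over contiguous triplets; first and last exons can never be central.
    for i in range(1, len(exon_pvalues) - 1):
        left, centre, right = exon_pvalues[i - 1:i + 2]
        if centre < alpha and left >= alpha and right >= alpha:
            candidates.append(i)
    return candidates
```

Requiring the flanking exons to be unchanged is what separates a cassette-exon signal from a change in expression of the whole gene, where neighbouring exons would shift together.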
Epstein-Barr virus (EBV) latently infects the majority of the human population and is implicated as a causal or contributory factor in numerous diseases. We sequenced 27 complete EBV genomes from a cohort of Multiple Sclerosis (MS) patients and healthy controls from Italy; no variants showed a statistically significant association with MS. Taking advantage of the availability of ~130 EBV genomes with known geographical origins, we reveal a striking geographic distribution of EBV sub-populations with distinct allele frequency distributions. We discuss mechanisms that potentially explain these observations, and their implications for understanding the association of EBV with human disease.
The few sequenced mitochondrial (mt) genomes of the class Ascidiacea (Chordata, Tunicata), mostly belonging to congeneric species of the Phlebobranchia order, show extraordinary gene order rearrangements. In order to assess if this hypervariability in gene order is a general feature of Ascidiacea, we report here the gene arrangement of five ascidians belonging to the Aplousobranchia and Stolidobranchia orders. Our data show that Ascidiacea are characterized by: 1) extensive gene order rearrangements both within and between the three major lineages; 2) lack of significant similarities to the gene order of other deuterostomes; and 3) an extent of rearrangements comparable with that of Mollusca (especially the Gastropoda, Bivalvia, and Scaphopoda classes), a phylum with highly rearranged mtDNAs. The only conserved feature is the location of all genes on the same strand, which suggests that selective constraints are related to the mt transcription. Finally, a higher mobility of the tRNA genes is undetectable because of saturation effect, and only the partially conserved cox2-cob gene block seems to retain some phylogenetic signals.
Background: Much evidence indicates that alternative splicing, the mechanism which produces mRNAs and proteins with different structures and functions from the same gene, is altered in cancer cells. Thus, the identification and characterization of cancer-specific splice variants may give a strong impulse to the discovery of novel diagnostic and prognostic tumour biomarkers, as well as of new targets for more selective and effective therapies. Results: We present here a genome-wide analysis of the alternative splicing pattern of human genes through a computational analysis of normal and cancer-specific ESTs from seventeen anatomical groups, using data available in AspicDB, a database resource for the analysis of alternative splicing in humans. By using a statistical methodology, normal and cancer-specific genes, splice sites and cassette exons were predicted in silico. The condition association of some of the novel normal/tumoral cassette exons was experimentally verified by RT-qPCR assays in the same anatomical system where they were predicted. Remarkably, the presence in vivo of the predicted alternative transcripts, specific for the nervous system, was confirmed in patients affected by glioblastoma. Conclusion: This study presents a novel computational methodology for the identification of tumor-associated transcript variants to be used as cancer molecular biomarkers, provides its experimental validation, and reports specific biomarkers for glioblastoma.
Omenn syndrome (OS) is caused by hypomorphic Rag mutations and characterized by a profound immunodeficiency associated with autoimmune-like manifestations. Both in humans and mice, OS is mediated by oligoclonal activated T and B cells. The role of microbial signals in disease pathogenesis is debated. Here, we show that Rag2(R229Q) knock-in mice developed an inflammatory bowel disease affecting both the small bowel and colon. Lymphocytes were sufficient for disease induction, as intestinal CD4 T cells with a Th1/Th17 phenotype reproduced the pathological picture when transplanted into immunocompromised hosts. Moreover, oral tolerance was impaired in Rag2(R229Q) mice, and transfer of wild-type (WT) regulatory T cells ameliorated bowel inflammation. Mucosal immunoglobulin A (IgA) deficiency in the gut resulted in enhanced absorption of microbial products and altered composition of commensal communities. The Rag2(R229Q) microbiota further contributed to the immunopathology because its transplant into WT recipients promoted Th1/Th17 immune response. Consistently, long-term dosing of broad-spectrum antibiotics (ABXs) in Rag2(R229Q) mice ameliorated intestinal and systemic autoimmunity by diminishing the frequency of mucosal and circulating gut-tropic CCR9(+) Th1 and Th17 T cells. Remarkably, serum hyper-IgE, a hallmark of the disease, was also normalized by ABX treatment. These results indicate that intestinal microbes may play a critical role in the distinctive immune dysregulation of OS.
RNA editing is a widespread post-transcriptional molecular phenomenon that can increase proteomic diversity, by modifying the sequence of completely or partially non-functional primary transcripts, through a variety of mechanistically and evolutionarily unrelated pathways. Editing by base substitution has been investigated in both animals and plants. However, conventional strategies based on directed Sanger sequencing are time-consuming and effectively preclude genome wide identification of RNA editing and assessment of partial and tissue-specific editing sites. In contrast, the high-throughput RNA-Seq approach allows the generation of a comprehensive landscape of RNA editing at the genome level. Short reads from Solexa/Illumina GA and ABI SOLiD platforms have been used to investigate the editing pattern in mitochondria of Vitis vinifera providing significant support for 401 C-to-U conversions in coding regions and an additional 44 modifications in non-coding RNAs. Moreover, 76% of all C-to-U conversions in coding genes represent partial RNA editing events and 28% of them were shown to be significantly tissue specific. Solexa/Illumina and SOLiD platforms showed different characteristics with respect to the specific issue of large-scale editing analysis, and the combined approach presented here reduces the false positive rate of discovery of editing events.
Celiac disease (CD)-associated duodenal dysbiosis has not yet been clearly defined, and the mechanisms by which CD-associated dysbiosis could concur to CD development or exacerbation are unknown. In this study, we analyzed the duodenal microbiome of CD patients.
Polyploidization is a widespread mechanism in eukaryotes and is predominant in flowering plants. Polyploids are very common among plants and are produced by multiplication of a genome derived from a single species (autopolyploids) or combination of two or more divergent genomes from different species (allopolyploids). Alteration in DNA methylation could regulate gene expression or other important epigenetic processes including dosage compensation, genomic imprinting, nucleolar dominance, de-repression of dormant transposable elements and alterations in chromatin structure among others. Polyploidization is known to involve altered DNA methylation in plants and recent studies provided evidence for changes in newly formed allopolyploids. Several genes responsible for the DNA methylation status have been studied so far. We investigated some members of the three major DNA methyltransferase family genes (MET) and their expression pattern in tetraploid plants of alfalfa (Medicago sativa L. 2n=4x=32) obtained by crossing diploid parents producing 2n gametes (bilateral sexual polyploidization). Gene expression changes induced by polyploidization have been investigated using qRT-PCR and expression data have been validated to identify genes whose expression is affected by ploidy changes. We also carried out an evolutionary analysis on a collection of methyltransferase homologous genes identified by database searching using M. sativa genes as queries. A comprehensive overview of the distribution of conserved domains was obtained as well as phylogenetic relationships between members of the MET family. Previous observations indicate that the genome methylation status is affected by polyploidization in alfalfa. This expression study may contribute to answering the question of how this occurs, and provide useful information for crop improvement.
The MITOchondrial genome database of metaZOAns (MitoZoa) is a public resource for comparative analyses of metazoan mitochondrial genomes (mtDNA) at both the sequence and genomic organizational levels. The main characteristics of the MitoZoa database are the careful revision of mtDNA entry annotations and the possibility of retrieving gene order and non-coding region (NCR) data in appropriate formats. The MitoZoa retrieval system enables basic and complex queries at various taxonomic levels using different search menus. MitoZoa 2.0 has been enhanced in several aspects, including: a re-annotation pipeline to check the correctness of protein-coding gene predictions; a standardized annotation of introns and of precursor ORFs whose functionality is post-transcriptionally recovered by RNA editing or programmed translational frameshifting; updates of taxon-related fields and a BLAST sequence similarity search tool. Database novelties and the definition of standard mtDNA annotation rules, together with the user-friendly retrieval system and the BLAST service, make MitoZoa a valuable resource for comparative and evolutionary analyses as well as a reference database to assist in the annotation of novel mtDNA sequences. MitoZoa is freely accessible at http://www.caspur.it/mitozoa.
MitoZoa is a relational database collecting curated metazoan entries of complete or nearly complete mitochondrial genomes (mtDNA), specifically designed to assist comparative studies of mitochondrial genome-level features in a given taxon or in congeneric species of Metazoa. The principal novelties of MitoZoa are extensive corrections/improvements of the mtDNA annotations and the possibility of easily searching for data on: (1) gene order, a genomic feature useful as phylogenetic marker; (2) sequence, size and location of non-coding regions, likely containing the regulatory signals for mtDNA replication and transcription; (3) mt features/sequences of congeneric species, where saturation phenomena in nucleotide substitutions and gene order changes are expected to be absent or at least minimal. In addition, MitoZoa allows the exploration of basic mt features such as molecule topology, genetic code, gene content, and compositional parameters of the entire genome. Finally, in order to facilitate downstream analyses of retrieved data, MitoZoa entry lists can be visualized and downloaded in a tabular format, while sequences and gene order data are provided in FASTA and FASTA-like formats, respectively. The MitoZoa database is available at http://www.caspur.it/mitozoa. (C) 2010 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
Here we present the MSA-PAD application, a DNA multiple sequence alignment framework that uses PFAM protein domain information to align DNA sequences encoding either single or multiple protein domains. MSA-PAD has two alignment options: gene and genome mode.
Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this purpose, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets. Availability and implementation: The MToolBox package is available at https://sourceforge.net/projects/mtoolbox/.
Background: A challenging issue in designing computational methods for predicting the gene structure into exons and introns from a cluster of transcript (EST, mRNA) sequences, is guaranteeing accuracy as well as efficiency in time and space, when large clusters of more than 20,000 ESTs and genes longer than 1 Mb are processed. Traditionally, the problem has been faced by combining different tools, not specifically designed for this task. Results: We propose a fast method based on ad hoc procedures for solving the problem. Our method combines two ideas: a novel algorithm of proved small time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of EST sequences, those allowing to infer splice site junctions that are largely confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, that are sequences obtained from paths of a graph structure, called embedding graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the length of P and T and in the size of the output. The method was implemented into the PIntron package. PIntron requires as input a genomic sequence or region and a set of EST and/or mRNA sequences. Besides the prediction of the full-length transcript isoforms potentially expressed by the gene, the PIntron package includes a module for the CDS annotation of the predicted transcripts. Conclusions: PIntron, the software tool implementing our methodology, is available at http://www.algolab.eu/PIntron under GNU AGPL. PIntron has been shown to outperform state-of-the-art methods, and to quickly process some critical genes. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when benchmarked with ENCODE annotations.
Adenine to Inosine RNA editing is a widespread co- and post-transcriptional mechanism mediated by ADAR enzymes acting on double stranded RNA. It has a plethora of biological effects, appears to be particularly pervasive in humans with respect to other mammals, and is implicated in a number of diverse human pathologies. Here we present the first human inosinome atlas comprising 3,041,422 A-to-I events identified in six tissues from three healthy individuals. Matched directional total-RNA-Seq and whole genome sequence datasets were generated and analysed within a dedicated computational framework, also capable of detecting hyper-edited reads. Inosinome profiles are tissue specific and edited gene sets consistently show enrichment of genes involved in neurological disorders and cancer. Overall frequency of editing also varies, but is strongly correlated with ADAR expression levels. The inosinome database is available at: http://srv00.ibbe.cnr.it/editing/.
BACKGROUND: The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). METHODS: In order to provide researchers with an effective and user-friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). RESULTS: Through a user-friendly web interface, the RAP workflow can be suitably customized by the user and is automatically executed on our cloud computing environment. This strategy gives access to bioinformatics tools and computational resources without requiring specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user's needs.
SUMMARY: The reliable detection of RNA editing sites from massive sequencing data remains challenging and, although several methodologies have been proposed, no computational tools have been released to date. Here, we introduce REDItools, a suite of Python scripts to perform high-throughput investigation of RNA editing using next-generation sequencing data. Availability and implementation: REDItools are written in the Python programming language and freely available at http://code.google.com/p/reditools/. CONTACT: ernesto.picardi@uniba.it or graziano.pesole@uniba.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
RNA editing by A-to-I deamination is the prominent co-/post-transcriptional modification in humans. It is carried out by ADAR enzymes and contributes to both transcriptomic and proteomic expansion. RNA editing has pivotal cellular effects and its deregulation has been linked to a variety of human disorders including neurological and neurodegenerative diseases and cancer. Despite its biological relevance, many physiological and functional aspects of RNA editing are still elusive. Here, we present REDIportal, available online at http://srv00.recas.ba.infn.it/atlas/, the largest and most comprehensive collection of RNA editing in humans, including more than 4.5 million A-to-I events detected in 55 body sites from thousands of RNAseq experiments. REDIportal embeds the RADAR database and represents the first editing resource designed to answer functional questions, enabling the inspection and browsing of editing levels in a variety of human samples, tissues and body sites. In contrast with previous RNA editing databases, REDIportal comprises its own browser (JBrowse) that allows users to explore A-to-I changes in their genomic context, emphasizing repetitive elements in which RNA editing is prominent.
Metagenomics is providing an unprecedented access to the environmental microbial diversity. The amplicon-based metagenomics approach involves the PCR-targeted sequencing of a genetic locus fitting different features. Namely, it must be ubiquitous in the taxonomic range of interest, variable enough to discriminate between different species but flanked by highly conserved sequences, and of suitable size to be sequenced through next-generation platforms. The internal transcribed spacers 1 and 2 (ITS1 and ITS2) of the ribosomal DNA operon and one or more hyper-variable regions of 16S ribosomal RNA gene are typically used to identify fungal and bacterial species, respectively. In this context, reliable reference databases and taxonomies are crucial to assign amplicon sequence reads to the correct phylogenetic ranks. Several resources provide consistent phylogenetic classification of publicly available 16S ribosomal DNA sequences, whereas the state of ribosomal internal transcribed spacers reference databases is notably less advanced. In this review, we aim to give an overview of existing reference resources for both types of markers, highlighting strengths and possible shortcomings of their use for metagenomics purposes. Moreover, we present a new database, ITSoneDB, of well annotated and phylogenetically classified ITS1 sequences to be used as a reference collection in metagenomic studies of environmental fungal communities. ITSoneDB is available for download and browsing at http://itsonedb.ba.itb.cnr.it/.
Clusterin (CLU) is a nearly ubiquitous multifunctional protein synthesized in different functionally divergent isoforms, sCLU and nCLU, playing a crucial role by keeping a balance between cell proliferation and death. Studying in vivo CLU expression we found a higher mRNA expression both in neoplastic and hyperplastic tissues in comparison to normal endometria; in particular, by RT-qPCR we demonstrated an increase of the specific sCLU isoform in the neoplastic and hyperplastic endometrial diseases. On the contrary, no CLU increase was detected at the protein level. The CLU gene transcriptional activity was upregulated in the hyperplastic and neoplastic tissues, indicating the existence of a fine post-transcriptional regulation of CLU expression possibly responsible for the protein decrease in the malignant disease. A specific CLU immunoreactivity was present in all the endometrial glandular cells in comparison to the other cellular compartments where CLU immunoreactivity was lower or absent. In conclusion, our results suggest the existence of a complex regulatory mechanism of CLU gene expression during the progression from normal to malignant cells, possibly contributing to endometrial carcinogenesis. Moreover, the specific alteration of the sCLU:nCLU ratio associated with the pathological stage suggests a possible usage of CLU as a molecular biomarker for the diagnosis/prognosis of endometrial proliferative diseases.
ADARs are key proteins for hematopoietic stem cell self-renewal and for survival of differentiating progenitor cells. However, their specific role in myeloid cell maturation has been poorly investigated. Here, we show that ADAR1 is present at basal level in the primary myeloid leukemia cells obtained from patients at diagnosis as well as in myeloid U-937 and THP1 cell lines and its expression correlates with the editing levels. Upon phorbol-myristate acetate (PMA) or VitaminD3/GM-CSF-driven differentiation, both ADAR1 and ADAR2 enzymes are up-regulated, with a concomitant global increase of A-to-I RNA editing. ADAR1-silencing caused an editing decrease at specific ADAR1 target genes, without, however, interfering with cell differentiation or with ADAR2 activity. Remarkably, ADAR2 is absent in the undifferentiated cell stage, due to its elimination through the ubiquitin-proteasome pathway, being strongly up-regulated at the end of the differentiation process. Of note, peripheral blood monocytes display editing events at the selected targets similar to those found in differentiated cell lines. Taken together, the data indicate that ADAR enzymes play important and distinct roles. Leukemia accepted article preview online, 09 May 2017. doi:10.1038/leu.2017.134.
RNA sequencing (RNA-Seq) has become the experimental standard in transcriptome studies. While most of the bioinformatic pipelines for the analysis of RNA-Seq data and the identification of significant changes in transcript abundance are based on the comparison of two conditions, it is common practice to perform several experiments in parallel (e.g. from different individuals, developmental stages, tissues), for the identification of genes showing a significant variation of expression across all the conditions studied. In this work we present RNentropy, a methodology based on information theory devised for this task, which given expression estimates from any number of RNA-Seq samples and conditions identifies genes or transcripts with a significant variation of expression across all the conditions studied, together with the samples in which they are over- or under-expressed. To show the capabilities offered by our methodology, we applied it to different RNA-Seq datasets: 48 biological replicates of two different yeast conditions; samples extracted from six human tissues of three individuals; seven different mouse brain cell types; human liver samples from six individuals. Results, and their comparison to different state of the art bioinformatic methods, show that RNentropy can provide a quick and in depth analysis of significant changes in gene expression profiles over any number of conditions.
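As a generic illustration of how information theory can summarize expression variation across many samples, the sketch below computes the Shannon entropy of a gene's expression profile. This is not the RNentropy statistic itself, which the abstract does not specify; the function names and example values are assumptions made for illustration only.

```python
import math
from typing import Sequence


def expression_entropy(values: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a gene's expression profile.

    Each sample's expression is converted to its share of the total,
    and the entropy of that distribution is returned.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("expression values must sum to a positive number")
    entropy = 0.0
    for v in values:
        if v > 0:  # zero-expression samples contribute nothing
            p = v / total
            entropy -= p * math.log2(p)
    return entropy


def normalised_entropy(values: Sequence[float]) -> float:
    """Entropy rescaled to [0, 1]: 1 means uniform expression across
    samples, values near 0 mean expression concentrated in few samples."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two samples are required")
    return expression_entropy(values) / math.log2(n)


# Hypothetical expression estimates for two genes over four samples.
print(normalised_entropy([5.0, 5.0, 5.0, 5.0]))   # evenly expressed gene
print(normalised_entropy([40.0, 1.0, 1.0, 1.0]))  # sample-specific gene
```

A low normalised entropy flags a gene whose expression is concentrated in a subset of samples; a full method would additionally need a significance test and replicate handling, which this sketch omits.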
While RNA editing by A-to-I deamination is a requisite for neuronal function in humans, it is under-investigated in single cells. Here we fill this gap by analysing RNA editing profiles of single cells from the brain cortex of living human subjects. We show that RNA editing levels per cell are bimodally distributed and distinguish between major brain cell types, thus providing new insights into neuronal dynamics.
A comprehensive knowledge of all the factors involved in splicing, both proteins and RNAs, and of their interaction network is crucial for reaching a better understanding of this process and its functions. A large part of relevant information is buried in the literature or collected in various different databases. By hand-curated screenings of literature and databases, we retrieved experimentally validated data on 71 human RNA-binding splicing regulatory proteins and organized them into a database called ‘SpliceAid-F’ (http://www.caspur.it/SpliceAidF/). For each splicing factor (SF), the database reports its functional domains, its protein and chemical interactors and its expression data. Furthermore, we collected experimentally validated RNA–SF interactions, including relevant information on the RNA-binding sites, such as the genes where these sites lie, their genomic coordinates, the splicing effects, the experimental procedures used, as well as the corresponding bibliographic references. We also collected information from experiments showing no RNA–SF binding, at least in the assayed conditions. In total, SpliceAid-F contains 4227 interactions, 2590 RNA-binding sites and 1141 ‘no-binding’ sites, including information on cellular contexts and conditions where binding was tested. The data collected in SpliceAid-F can provide significant information to explain an observed splicing pattern as well as the effect of mutations in functional regulatory elements.
p53 is a central hub in controlling cell proliferation. To maintain genome integrity in response to cellular stress, p53 directly regulates the transcription of genes involved in cell cycle arrest, DNA repair, apoptosis and/or senescence. An array of post-translational modifications and protein-protein interactions modulates its stability and activities in order to avoid malignant transformation. However, to date, it is still not clear how cells decide their own fate in response to different types of stress. Here we describe that the human TRIM8 protein, a member of the TRIM family, is a new modulator of the p53-mediated tumor suppression mechanism. We show that under stress conditions, such as UV exposure, p53 induced the expression of TRIM8, which, in turn, stabilized p53, leading to cell cycle arrest and reduction of cell proliferation through enhancement of CDKN1A (p21) and GADD45 expression. TRIM8 silencing reduced the capacity of p53 to activate genes involved in cell cycle arrest and DNA repair in response to cellular stress. Concurrently, TRIM8 overexpression induced the degradation of the MDM2 protein, the principal regulator of p53 stability. Co-immunoprecipitation experiments showed that TRIM8 physically interacted with p53, impairing its interaction with MDM2. Altogether, our results reveal a previously unknown regulatory pathway controlling p53 activity and suggest TRIM8 as a novel therapeutic target to enhance p53 tumor suppressor activity.
In some tumours, despite a wild-type p53 gene, the p53 pathway is inactivated by alterations in its regulators or by unknown mechanisms, leading to resistance to cytotoxic therapies. Understanding the mechanisms of functional inactivation of wild-type p53 in these tumours may help to define prospective targets for treating cancer by restoring p53 activity. Recently, we identified TRIM8 as a new p53 modulator, which stabilizes p53 impairing its association with MDM2 and inducing the reduction of cell proliferation. In this paper we demonstrated that TRIM8 deficit dramatically impairs p53-mediated cellular responses to chemotherapeutic drugs and that TRIM8 is down regulated in patients affected by clear cell Renal Cell Carcinoma (ccRCC), an aggressive drug-resistant cancer showing wild-type p53. These results suggest that down regulation of TRIM8 might be an alternative way to suppress p53 activity in RCC. Interestingly, we show that TRIM8 expression recovery in RCC cell lines renders these cells sensitive to chemotherapeutic treatments following p53 pathway re-activation. These findings provide the first mechanistic link between TRIM8 and the drug resistance of ccRCC and suggest more generally that TRIM8 could be used as enhancer of the chemotherapy efficacy in cancers where p53 is wild-type and its pathway is defective.
RNA editing is a post-transcriptional/co-transcriptional molecular phenomenon whereby a genetic message is modified from the corresponding DNA template by means of substitutions, insertions, and/or deletions. It occurs in a variety of organisms and different cellular locations through evolutionally and biochemically unrelated proteins. RNA editing has a plethora of biological effects including the modulation of alternative splicing and fine-tuning of gene expression. RNA editing events by base substitutions can be detected on a genomic scale by NGS technologies through the REDItools package, an ad hoc suite of Python scripts to study RNA editing using RNA-Seq and DNA-Seq data or RNA-Seq data alone. REDItools implement effective filters to minimize biases due to sequencing errors, mapping errors, and SNPs. The package is freely available at Google Code repository (http://code.google.com/p/reditools/) and released under the MIT license. In the present unit we show three basic protocols corresponding to three main REDItools scripts.
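The quantity usually reported for a base-substitution editing site, the editing level, can be sketched as follows. This example does not use or reproduce the REDItools scripts or their options; the function name, the coverage threshold and the input layout (a dictionary of per-base read counts at one position) are assumptions for illustration. For A-to-I editing the edited base is read as G in sequencing data.

```python
from typing import Dict, Optional


def editing_level(
    base_counts: Dict[str, int],
    ref: str = "A",
    edited: str = "G",
    min_coverage: int = 10,
) -> Optional[float]:
    """Fraction of reads supporting the edited base at one position.

    base_counts:  RNA-Seq read counts per base at the position (assumed input).
    min_coverage: minimum number of informative reads (illustrative default).
    Returns None when coverage is too low to give a meaningful estimate.
    """
    ref_n = base_counts.get(ref, 0)
    edited_n = base_counts.get(edited, 0)
    coverage = ref_n + edited_n
    if coverage < min_coverage:
        return None
    return edited_n / coverage


# Hypothetical read counts at a candidate A-to-I site.
print(editing_level({"A": 30, "G": 10, "C": 0, "T": 0}))
```

A coverage filter of this kind addresses only one source of bias; distinguishing genuine editing from SNPs additionally requires genomic data or a catalogue of known polymorphisms, which is outside this sketch.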
The 5' and 3' untranslated regions of eukaryotic mRNAs (UTRs) play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5' and 3' untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated and also collated as the UTRsite database where more specific information on the functional motifs and cross-links to interacting regulatory protein are provided. In the current update, the UTR entries have been organized in a gene-centric structure to better visualize and retrieve 5' and 3'UTR variants generated by alternative initiation and termination of transcription and alternative splicing. Experimentally validated miRNA targets and conserved sequence elements are also annotated. The integration of UTRdb with genomic data has allowed the implementation of an efficient annotation system and a powerful retrieval resource for the selection and extraction of specific UTR subsets. All internet resources implemented for retrieval and functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs are accessible at http://utrdb.ba.itb.cnr.it/.
BACKGROUND: The advent of massively parallel sequencing technologies (Next Generation Sequencing, NGS) profoundly modified the landscape of human genetics. In particular, Whole Exome Sequencing (WES) is the NGS branch that focuses on the exonic regions of eukaryotic genomes; exomes are ideal for understanding high-penetrance allelic variation and its relationship to phenotype. A complete WES analysis involves several steps which need to be suitably designed and arranged into an efficient pipeline. Managing an NGS analysis pipeline and the huge amount of data it produces requires non-trivial IT skills and computational power. RESULTS: Our web resource WEP (Whole-Exome sequencing Pipeline web tool) performs a complete WES pipeline and provides easy access through its interface to intermediate and final results. The WEP pipeline is composed of several steps: 1) verification of input integrity and quality checks, read trimming and filtering; 2) gapped alignment; 3) BAM conversion, sorting and indexing; 4) duplicates removal; 5) alignment optimization around insertion/deletion (indel) positions; 6) recalibration of quality scores; 7) single nucleotide and deletion/insertion polymorphism (SNP and DIP) variant calling; 8) variant annotation; 9) result storage into custom databases to allow cross-linking and intersections, statistics and much more. To overcome the challenge of managing large amounts of data and to maximize the biological information extracted from them, our tool restricts the number of final results by filtering data according to customizable thresholds, facilitating the identification of functionally significant variants. Default threshold values, tuned according to the most common literature published in recent years, are also provided at the completion of the analysis. CONCLUSIONS: Through our tool a user can perform the whole analysis without knowing the underlying hardware and software architecture, dealing with both paired-end and single-end data. The interface provides easy and intuitive access for data submission and a user-friendly web interface for annotated variant visualization. Users without IT expertise can access through WEP the most up-to-date and tested WES algorithms, tuned to maximize the quality of called variants while minimizing artifacts and false positives. The web tool is available at the following web address: http://www.caspur.it/wep.
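The final filtering step, in which called variants are reduced by customizable thresholds with defaults supplied, can be sketched as follows. This is a minimal illustration and not WEP's implementation: the `Variant` fields, the threshold names and the default values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A called variant (hypothetical minimal record)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float  # variant call quality score
    depth: int      # read depth at the site


# Hypothetical defaults; not taken from WEP.
DEFAULT_THRESHOLDS = {"min_quality": 30.0, "min_depth": 10}


def filter_variants(variants, thresholds=None):
    """Keep variants meeting every threshold; user values override the defaults."""
    t = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    return [
        v for v in variants
        if v.quality >= t["min_quality"] and v.depth >= t["min_depth"]
    ]
```

Calling `filter_variants(variants)` applies the defaults, while `filter_variants(variants, {"min_depth": 5})` relaxes only the depth requirement and leaves the quality threshold at its default.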
Alzheimer's Disease (AD) is the most common cause of dementia affecting the elderly population worldwide. We have performed a comprehensive transcriptome profiling of Late-Onset AD (LOAD) patients using second generation sequencing technologies, identifying 2,064 genes, 47 lncRNAs and 4 miRNAs whose expression is specifically deregulated in the hippocampal region of LOAD patients. Moreover, analyzing the hippocampal, temporal and frontal regions from the same LOAD patients, we identify specific sets of deregulated miRNAs for each region, and we confirm that the miR-132/212 cluster is deregulated in each of these regions in LOAD patients, consistent with these miRNAs playing a role in AD pathogenesis. Notably, a luciferase assay indicates that miR-184 is able to target the 3'UTR of NR4A2, a gene which is known to be involved in cognitive functions and long-term memory and whose expression levels are inversely correlated with those of miR-184 in the hippocampus. Finally, RNA editing analysis reveals a general RNA editing decrease in LOAD hippocampus, with 14 recoding sites significantly and differentially edited in 11 genes. Our data underline specific transcriptional changes in LOAD brain and provide an important source of information for understanding the molecular changes characterizing LOAD progression.
The huge amount of transcript data produced by high-throughput sequencing requires the development and implementation of suitable bioinformatic workflows for their analysis and interpretation. These analysis workflows, which include different modules, should be specifically designed according to the sequencing platform (Roche 454, Illumina, SOLiD) and the nature of the data (polyA or total RNA fraction, strand specificity). In the case of cDNA obtained from a total RNA preparation, in addition to polyadenylated protein-coding mRNAs, a great variety of transcript sequences can be obtained, including ribosomal RNAs, mitochondrial transcripts and a large variety of functional non-coding RNAs (ncRNAs). To deal with these data, the analysis workflow should include specific modules to distinguish ncRNA fractions from the large number of other functional protein-coding transcripts. To this aim we developed an analysis pipeline that, given as input a large collection of reads (particularly from Roche 454), provides the expression profile at qualitative and quantitative level of human mtDNA, ribosomal RNAs, ncRNAs and protein-coding mRNAs.
Recent studies have demonstrated an unexpected complexity of transcription in eukaryotes. The majority of the genome is transcribed and only a small fraction of these transcripts is annotated as protein coding genes and their splice variants. Indeed, most transcripts are the result of antisense, overlapping and non-coding RNA expression. In this frame, one of the key aims of high throughput transcriptome sequencing is the detection of all RNA species present in the cell, and the first crucial step for RNA-seq users is the choice of the strategy for cDNA library construction. The protocols developed so far use the entire library for a single sequencing run with a specific platform. Results: We set up a unique protocol to generate and amplify a strand-specific cDNA library representative of all RNA species that may be implemented with all major platforms currently available on the market (Roche 454, Illumina, ABI/SOLiD). Our method is reproducible, fast, easy to perform and even allows starting from low-input total RNA. Furthermore, we provide a suitable bioinformatics tool for the analysis of the sequences produced following this protocol. Conclusion: We tested the efficiency of our strategy, showing that our method is platform-independent, thus allowing the simultaneous analysis of the same sample with different NGS technologies, and providing an accurate quantitative and qualitative portrait of complex whole transcriptomes.
Ascidians are a fascinating group of filter-feeding marine chordates characterized by rapid evolution of both sequences and structure of their nuclear and mitochondrial genomes. Moreover, they include several model organisms used to investigate complex biological processes in chordates. To study the evolutionary dynamics of ascidians at short phylogenetic distances, we sequenced 13 new mitogenomes and analyzed them, together with 15 other available mitogenomes, using a novel approach involving detailed whole-mitogenome comparisons of conspecific and congeneric pairs. The evolutionary rate was quite homogeneous at both intraspecific and congeneric level, and the lowest congeneric rates were found in cryptic (morphologically indistinguishable) and in morphologically very similar species pairs. Moreover, congeneric nonsynonymous rates (dN) were up to two orders of magnitude higher than in intraspecies pairs. Overall, a clear-cut gap sets apart conspecific from congeneric pairs. These evolutionary peculiarities made it easy to identify an extraordinary intraspecific variability in the model ascidian Botryllus schlosseri, where most pairs show a dN value between that observed at intraspecies and congeneric level, yet consistently lower than that of the Ciona intestinalis cryptic species pair. These data suggest ongoing speciation events producing genetically distinct B. schlosseri entities. Remarkably, these ongoing speciation events were undetectable by the cox1 barcode fragment, demonstrating that, at low phylogenetic distances, the whole mitogenome has a higher resolving power than cox1. Our study shows that whole-mitogenome comparative analyses, performed on a suitable sample of congeneric and intraspecies pairs, may allow the detection not only of cryptic species but also of ongoing speciation events.
Alternative splicing is emerging as a major mechanism for the expansion of the transcriptome and proteome diversity, particularly in human and other vertebrates. However, the proportion of alternative transcripts and proteins actually endowed with functional activity is currently highly debated. We present here a new release of ASPicDB which now provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256 939 protein variants from 17 191 multi-exon genes have been extensively annotated through state-of-the-art machine learning tools providing information on the protein type (globular and transmembrane), localization, presence of PFAM domains, signal peptides, GPI-anchor propeptides, transmembrane and coiled-coil segments. Furthermore, full-length variants can now be specifically selected based on the annotation of CAGE-tags and polyA signal and/or polyA sites, marking transcription initiation and termination sites, respectively. The retrieval can be carried out at gene, transcript, exon, protein or splice site level, allowing the selection of data sets fulfilling one or more features set by the user. The retrieval interface also enables the selection of protein variants showing specific differences in the annotated features. ASPicDB is available at http://www.caspur.it/ASPicDB/.
Alternative splicing (AS) is a basic molecular phenomenon that increases the functional complexity of higher eukaryotic transcriptomes. Indeed, through AS individual gene loci can generate multiple RNAs from the same pre-mRNA. AS has been investigated in a variety of clinical and pathological studies, such as the transcriptome regulation in cancer. In human, recent works based on massive RNA sequencing indicate that >95% of pre-mRNAs are processed to yield multiple transcripts. Given the biological relevance of AS, several computational efforts have been made, leading to the implementation of novel algorithms and specific specialized databases. Here we describe the web application ASPicDB that allows the recovery of detailed biological information about the splicing mechanism. ASPicDB provides powerful querying systems to interrogate AS events at gene, transcript, and protein levels. Finally, ASPicDB includes web visualization instruments to browse and export results for further off-line analyses.
New genomes are being sequenced at an increasingly rapid rate, far outpacing the rate at which manual gene annotation can be performed. Automated genome annotation is thus necessitated by this growth in genome projects; however, full-fledged annotation systems are usually home-grown and customized to a particular genome. There is thus a renewed need for accurate ab initio gene prediction methods. However, it is apparent that fully ab initio methods fall short of the required level of sensitivity and specificity for a quality annotation. Evidence in the form of expressed sequences gives the single biggest improvement in accuracy when used to inform gene predictions. Here, we present a lightweight pipeline for first-pass gene prediction on newly sequenced genomes. The two main components are ASPic, a program that derives highly accurate, albeit not necessarily complete, EST-based transcript annotations from EST alignments, and GeneID, a standard gene prediction program, which we have modified to take intron annotations as evidence. The introns output by ASPic CDS predictions are given to GeneID to constrain the exon-chaining process and produce predictions consistent with the underlying EST alignments. The pipeline was successfully tested on the entire C. elegans genome and the 44 ENCODE human pilot regions.
De novo high-throughput pyrosequencing was used to detect and characterize 2009 pandemic influenza A (H1N1) virus directly in nasopharyngeal swabs in the context of the microbial community. Data were generated with a prior sequence-independent amplification by 454 pyrosequencing on the GS-FLX platform (Roche). Influenza A assembled reads allowed near full-length genome reconstruction with the simultaneous analysis of site-specific heterogeneity. The molecular approach applied proved to be a powerful tool to characterize the new pandemic H1N1 influenza virus in clinical samples. This approach could be of great value in identifying possibly new reassortants that may occur in the near future.
Recent discoveries have highlighted the fact that alternative splicing and alternative transcripts are the rule, rather than the exception, in metazoan genes. Since multiple transcript and protein variants expressed by the same gene are, by definition, structurally distinct and need not be functionally equivalent, the concept of gene orthology should be extended to the transcript level in order to describe evolutionary relationships between structurally similar transcript variants. In other words, the identification of true orthology relationships between gene products now should progress beyond primary sequence, and "splicing orthology", consisting of ancestrally shared exon-intron structures, is required to define orthologous isoforms at transcript level. Results: As a starting step in this direction, in this work we performed a large-scale human-mouse gene comparison with a twofold goal: first, to assess whether and to what extent traditional gene annotations such as RefSeq capture genuine splicing orthology; second, to provide a more detailed annotation and quantification of true human-mouse orthologous transcripts, defined as transcripts of orthologous genes exhibiting the same splicing patterns. Conclusions: We observed an identical exon/intron structure for 32% of human and mouse orthologous genes. This figure increases to 87% using less stringent criteria for gene structure similarity, thus implying that for about 13% of the human RefSeq annotated genes (and about 25% of the corresponding transcripts) we could not identify any mouse transcript showing sufficient similarity to be confidently assigned as a splicing ortholog. Our data suggest that current gene and transcript data may still be rather incomplete - with several splicing variants still unknown. The observation that alternative splicing produces large numbers of alternative transcripts and proteins, some of them conserved across species and others truly species-specific, suggests that, while still maintaining the conventional definition of gene orthology, a new concept of "splicing orthology" can be defined at transcript level. © 2010 Zambelli et al; licensee BioMed Central Ltd.
Philadelphia-positive (Ph+) leukaemias are an example of haematological malignant diseases where different chromosomal rearrangements involving both BCR and ABL1 genes generate a variety of chimeric proteins (BCR/ABL1 p210, p190 and p230) which are considered pathological "biomarkers". In addition to these three, there is a variety of fusion transcripts whose origin may depend either on diverse genetic rearrangements or on alternative/atypical splicing of the main mRNAs or on the occurrence of single-point mutations. Although the therapy of Ph+ leukaemias based on Imatinib represents a triumph of medicine, not all patients benefit from the drug, and some show resistance or intolerance. Furthermore, interruption of Imatinib administration is often followed by clinical relapse, suggesting a failure in the eradication of residual leukaemic stem cells. Therefore, while targeted therapy is searching for new and improved pharmacological inhibitors covering all the possible mutations in the kinase domain, there is an urgent need to identify alternative molecular targets to develop other specific and effective therapeutic approaches. In this review we discuss the importance of recent advances based on the discovery of novel BCR/ABL1 variants and their potential role as new targets/biomarkers of Ph+ leukaemias in the light of the current therapeutic trends. The limits of the pharmacological inhibitors used for treating the disease can be overcome by considering targets other than the kinase enzyme. Our evaluations highlight the potential of alternative perspectives in the therapy of Ph+ leukaemias.
Technical advances such as the development of molecular cloning, Sanger sequencing, PCR and oligonucleotide microarrays are key to our current capacity to sequence, annotate and study complete organismal genomes. Recent years have seen the development of a variety of so-called 'next-generation' sequencing platforms, with several others anticipated to become available shortly. The previously unimaginable scale and economy of these methods, coupled with their enthusiastic uptake by the scientific community and the potential for further improvements in accuracy and read length, suggest that these technologies are destined to make a huge and ongoing impact upon genomic and post-genomic biology. However, like the analysis of microarray data and the assembly and annotation of complete genome sequences from conventional sequencing data, the management and analysis of next-generation sequencing data requires (and indeed has already driven) the development of informatics tools able to assemble, map, and interpret huge quantities of relatively or extremely short nucleotide sequence data. Here we provide a broad overview of bioinformatics approaches that have been introduced for several genomics and functional genomics applications of next-generation sequencing.
High-throughput DNA sequencing is increasing the number of publicly available complete genomes, even though a precise gene catalogue for each organism is not yet available. In this context, computational gene finders play a key role in producing a first and cost-effective annotation. A large collection of gene prediction tools is now available to the scientific community and, despite their number, they can be divided into two main categories: (1) ab initio and (2) evidence based. In the following, we provide an overview of the main methodologies in these categories for predicting correct exon-intron structures of eukaryotic genes. We also take into account new strategies that commonly refine ab initio predictions by employing comparative genomics or other evidence such as expression data. Finally, we briefly introduce metrics for the in-house evaluation of gene predictions in terms of sensitivity and specificity at nucleotide, exon, and gene levels.
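The nucleotide-level evaluation mentioned above can be sketched as follows. This is a minimal illustration that assumes the convention common in gene-prediction benchmarking, where sensitivity is TP / (TP + FN) and "specificity" is computed as TP / (TP + FP); the function names and the inclusive `(start, end)` exon representation are assumptions made for the example.

```python
def positions(exons):
    """Expand inclusive (start, end) exon intervals into a set of nucleotide positions."""
    return {p for start, end in exons for p in range(start, end + 1)}


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at nucleotide level.

    Sensitivity: fraction of annotated coding nucleotides that were predicted.
    Specificity: fraction of predicted coding nucleotides that are annotated
    (the TP / (TP + FP) convention assumed here).
    """
    truth = positions(annotated_exons)
    predicted = positions(predicted_exons)
    true_positives = len(truth & predicted)
    sensitivity = true_positives / len(truth) if truth else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity
```

For an annotation with exons (1, 10) and (21, 30) and a prediction with exons (1, 10) and (26, 30), 15 of the 20 annotated nucleotides are recovered and every predicted nucleotide is annotated, giving a sensitivity of 0.75 and a specificity of 1.0.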
MicroRNAs are short (~21 base) single stranded RNAs that, in plants, are generally coded by specific genes and cleaved specifically from hairpin precursors. MicroRNAs are critical for the regulation of multiple developmental, stress related and other physiological processes in plants. The recent annotation of the genome of the grapevine (Vitis vinifera L.) allowed the identification of many putative conserved microRNA precursors, grouped into multiple gene families. Results: Here we use oligonucleotide arrays to provide the first indication that many of these microRNAs show differential expression patterns between tissues and during the maturation of fruit in the grapevine. Furthermore we demonstrate that whole transcriptome sequencing and deep-sequencing of small RNA fractions can be used both to identify which microRNA precursors are expressed in different tissues and to estimate genomic coordinates and patterns of splicing and alternative splicing for many primary miRNA transcripts. Conclusions: Our results show that many microRNAs are differentially expressed in different tissues and during fruit maturation in the grapevine. Furthermore, the demonstration that whole transcriptome sequencing can be used to identify candidate splicing events and approximate primary microRNA transcript coordinates represents a significant step towards the large-scale elucidation of mechanisms regulating the expression of microRNAs at the transcriptional and post-transcriptional levels. © 2010 Mica et al; licensee BioMed Central Ltd.
The transcription factor interferon regulatory factor 6 (IRF6) regulates craniofacial development and epidermal proliferation. We recently showed that IRF6 is a component of a regulatory feedback loop that controls the proliferative potential of epidermal cells. IRF6 is transcriptionally activated by p63 and induces its proteasome-mediated down-regulation, thereby limiting keratinocyte proliferative potential. We hypothesized that IRF6 may also be involved in skin carcinogenesis. Hence, we analyzed IRF6 expression in a large series of squamous cell carcinomas (SCCs) and found a strong down-regulation of IRF6 that correlated with tumor invasiveness and differentiation status. IRF6 down-regulation in SCC cell lines and primary tumors correlates with methylation of a CpG dinucleotide island located in its promoter region. To identify the molecular mechanisms regulating the potential tumor suppressive activity of IRF6, we performed a genome-wide analysis by combining ChIP sequencing for IRF6 binding sites and gene expression profiling in primary human keratinocytes after siRNA-mediated IRF6 depletion. We observed dysregulation of cell cycle-related genes and genes involved in differentiation, cell adhesion, and cell-cell contact. Many of these genes were direct IRF6 targets. We also performed in vitro invasion assays showing that IRF6 down-regulation promotes invasive behavior and that reintroduction of IRF6 into SCC cells strongly inhibits cell growth. These results indicate a function for IRF6 in suppression of tumorigenesis in stratified epithelia.
Differences in the inherent properties of adipose tissue-derived stem cells (ASC) may contribute to the biological specificity of the subcutaneous (Sc) and visceral (V) adipose tissue depots. In this study, three distinct subpopulations of ASC, i.e. ASC(SVF), ASC(Bottom), and ASC(Ceiling), were isolated from Sc and V fat biopsies of non-obese subjects, and their gene expression and functional characteristics were investigated. Genome-wide mRNA expression profiles of ASC(SVF), ASC(Bottom) and ASC(Ceiling) from Sc fat were significantly different as compared to their homologous subsets of V-ASCs. Furthermore, ASC(SVF), ASC(Ceiling) and ASC(Bottom) from the same fat depot were also distinct from each other. In this respect, both principal component analysis and hierarchical cluster analysis showed that ASC(Ceiling) and ASC(SVF) shared a similar pattern of closely related genes, which was highly different when compared to that of ASC(Bottom). However, larger variations in gene expression were found in inter-depot than in intra-depot comparisons. The analysis of connectivity of genes differently expressed in each ASC subset demonstrated that, although there was some overlap, there was also a clear distinction between each Sc-ASC and their corresponding V-ASC subsets, and among ASC(SVF), ASC(Bottom), and ASC(Ceiling) of Sc or V fat depots in regard to networks associated with regulation of cell cycle, cell organization and development, inflammation and metabolic responses. Finally, the release of several cytokines and growth factors in the ASC cultured medium also showed both inter- and intra-depot differences. Thus, ASC(Ceiling) and ASC(Bottom) can be identified as two genetically and functionally heterogeneous ASC populations in addition to the ASC(SVF), with ASC(Bottom) showing the highest degree of unmatched gene expression. On the other hand, inter-depot differences seem to prevail over intra-depot differences in the ASC gene expression patterns and network functions, contributing to the high degree of specificity of Sc and V adipose tissue in humans.
The genome sequence of a Sphingobium strain capable of tolerating high concentrations of Ni ions, and exhibiting natural kanamycin resistance, is presented. The presence of a transposon-derived kanamycin resistance gene and several genes for efflux-mediated metal resistance may explain the observed characteristics of the new Sphingobium isolate.
Ochratoxin A (OTA) is a nephrotoxic and potentially carcinogenic mycotoxin produced by several species of Aspergillus and Penicillium. It is one of the major mycotoxins contaminating grain, grapes and a variety of food products, and the development of methods for reducing pre- and post-harvest contamination has drawn considerable attention. In the current study, we isolated and sequenced the genome of a novel free-living Acinetobacter strain able to degrade OTA. Biochemical studies suggest that the degradation reaction proceeds via peptide bond hydrolysis.
Summary: ExpEdit is a web application for assessing RNA editing in human at known or user-specified sites supported by transcript data obtained by RNA-Seq experiments. Either mapping data (in SAM/BAM format) or sequence reads [in FASTQ/short read archive (SRA) format] can be provided as input to carry out a comparative analysis against a large collection of known editing sites collected in the DARNED database as well as other user-provided potentially edited positions. Results are shown as dynamic tables containing University of California, Santa Cruz (UCSC) links for a quick examination of the genomic context. © The Author 2011. Published by Oxford University Press. All rights reserved.
By a combination of rapid amplification of cDNA ends (RACE) and reverse transcription-polymerase chain reaction (RT-PCR) we identified three T cell receptor delta variable (TRDV) subgroups and five joining (TRDJ) genes expressed in spleen, tonsils and blood of Camelus dromedarius. We provide evidence that the high diversity in sequence and length of the third complementarity determining region (CDR3) is a major component of the TR delta chain variability. Moreover, the identification of the corresponding germline genes allowed us to find out for the first time in a mammalian organism that productively rearranged TRDV genes undergo somatic mutation: the mutation rate of the analysed TRDV4 region is 0.013 per base pair in spleen and 0.009 in blood. The point mutations are scattered throughout the length of the variable domain from framework region FR1 to FR4. This random distribution of the amino acid changes, instead of its CDR clustering observed in immunoglobulins (IG), indicates that somatic mutation in dromedary, while contributing to the development of the TRDV repertoire, is not under antigen selection. © 2011 Elsevier Ltd.
Comparisons of draft genome sequences of three geographically distinct isolates of Fusarium fujikuroi with two recently published genome sequences from the same species suggest diverse profiles of secondary metabolite production within F. fujikuroi. Species- and lineage-specific genes, many of which appear to exhibit expression profiles that are consistent with roles in host-pathogen interactions and adaptation to environmental changes, are concentrated in sub-telomeric regions. These genomic compartments also exhibit distinct gene densities and compositional characteristics with respect to other genomic partitions, and likely play a role in the generation of molecular diversity. Our data provide additional evidence that gene duplication, divergence and differential loss play important roles in F. fujikuroi genome evolution and suggest that hundreds of lineage-specific genes might have been acquired through horizontal gene transfer.
Clear cell renal cell carcinoma (ccRCC) is the most common malignant renal epithelial tumor and also the most deadly. To identify molecular changes occurring in ccRCC, in the present study we performed a genome-wide analysis of its entire complement of mRNAs. Gene and exon-level analyses were carried out by means of the Affymetrix Exon Array platform. To achieve a reliable detection of differentially expressed cassette exons, we implemented a novel methodology that considered contiguous combinations of exon triplets: candidate differentially expressed cassette exons were identified when the expression level was significantly different only in the central exon of the triplet. More detailed analyses were performed for selected genes using quantitative RT-PCR and confocal laser scanning microscopy. Our analysis detected over 2,000 differentially expressed genes and about 250 alternatively spliced genes showing differential inclusion of specific cassette exons when comparing tumor and non-tumoral tissues. We demonstrated the presence in ccRCC of an altered expression of the PTP4A3, LAMA4, KCNJ1 and TCF21 genes (at both transcript and protein level). Furthermore, we confirmed, at the mRNA level, the involvement of CAV2 and SFRP genes that have previously been identified. At exon level, among potential candidates we validated a differentially included cassette exon in the DAB2 gene, with a significant increase of the DAB2 p96 splice variant as compared to the p67 isoform. Based on the results obtained, and their robustness according to both statistical analysis and literature surveys, we believe that a combined gene/isoform expression signature may remarkably contribute, after suitable validation, to a more effective and reliable definition of molecular biomarkers for ccRCC early diagnosis, prognosis and prediction of therapeutic response.
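The exon-triplet criterion described above can be sketched in a few lines. This is only an illustration of the selection rule, not the authors' implementation: the function name is hypothetical, and the input is assumed to be a list of per-exon flags (True when the exon's expression differs significantly between tumor and non-tumoral tissue, by whatever statistical test is used), ordered along the gene.

```python
def candidate_cassette_exons(significant):
    """Return indices of exons that are significant only at the centre of a triplet.

    `significant` holds one boolean per exon, in genomic order. An exon is a
    candidate when it is flagged while both of its immediate neighbours are not.
    The first and last exons have no complete triplet and are never reported.
    """
    return [
        i
        for i in range(1, len(significant) - 1)
        if significant[i] and not significant[i - 1] and not significant[i + 1]
    ]
```

For the flags `[False, True, False, False, True, True, False]`, only exon index 1 is reported: exons 4 and 5 are both flagged, so neither is the sole significant member of its triplet, which is more consistent with a gene-level expression change than with differential inclusion of a single cassette exon.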
The few sequenced mitochondrial (mt) genomes of the class Ascidiacea (Chordata, Tunicata), mostly belonging to congeneric species of the Phlebobranchia order, show extraordinary gene order rearrangements. In order to assess whether this hypervariability in gene order is a general feature of Ascidiacea, we report here the gene arrangement of five ascidians belonging to the Aplousobranchia and Stolidobranchia orders. Our data show that Ascidiacea are characterized by: 1) extensive gene order rearrangements both within and between the three major lineages; 2) lack of significant similarities to the gene order of other deuterostomes; and 3) an extent of rearrangements comparable with that of Mollusca (especially the Gastropoda, Bivalvia, and Scaphopoda classes), a phylum with highly rearranged mtDNAs. The only conserved feature is the location of all genes on the same strand, which suggests that selective constraints are related to mt transcription. Finally, a higher mobility of the tRNA genes is undetectable because of a saturation effect, and only the partially conserved cox2-cob gene block seems to retain some phylogenetic signal.
Considerable evidence indicates that alternative splicing, the mechanism which produces mRNAs and proteins with different structures and functions from the same gene, is altered in cancer cells. Thus, the identification and characterization of cancer-specific splice variants may give a strong impulse to the discovery of novel diagnostic and prognostic tumour biomarkers, as well as of new targets for more selective and effective therapies. Results: We present here a genome-wide analysis of the alternative splicing pattern of human genes through a computational analysis of normal and cancer-specific ESTs from seventeen anatomical groups, using data available in AspicDB, a database resource for the analysis of alternative splicing in human. By using a statistical methodology, normal and cancer-specific genes, splice sites and cassette exons were predicted in silico. The condition association of some of the novel normal/tumoral cassette exons was experimentally verified by RT-qPCR assays in the same anatomical system where they were predicted. Remarkably, the presence in vivo of the predicted alternative transcripts, specific for the nervous system, was confirmed in patients affected by glioblastoma. Conclusion: This study presents a novel computational methodology for the identification of tumor-associated transcript variants to be used as cancer molecular biomarkers, provides its experimental validation, and reports specific biomarkers for glioblastoma.
RNA editing is a widespread post-transcriptional molecular phenomenon that can increase proteomic diversity, by modifying the sequence of completely or partially non-functional primary transcripts, through a variety of mechanistically and evolutionarily unrelated pathways. Editing by base substitution has been investigated in both animals and plants. However, conventional strategies based on directed Sanger sequencing are time-consuming and effectively preclude genome wide identification of RNA editing and assessment of partial and tissue-specific editing sites. In contrast, the high-throughput RNA-Seq approach allows the generation of a comprehensive landscape of RNA editing at the genome level. Short reads from Solexa/Illumina GA and ABI SOLiD platforms have been used to investigate the editing pattern in mitochondria of Vitis vinifera providing significant support for 401 C-to-U conversions in coding regions and an additional 44 modifications in non-coding RNAs. Moreover, 76% of all C-to-U conversions in coding genes represent partial RNA editing events and 28% of them were shown to be significantly tissue specific. Solexa/Illumina and SOLiD platforms showed different characteristics with respect to the specific issue of large-scale editing analysis, and the combined approach presented here reduces the false positive rate of discovery of editing events.
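To make the notion of partial editing in the abstract above concrete, here is a minimal sketch that computes an editing level at one site from read counts. It assumes the reads supporting the unedited base (C) and the edited base (U, read as T) have already been counted; the function names and the 0.1 and 0.9 bounds are hypothetical choices for illustration, not the criteria used in the study.

```python
def editing_level(unedited_reads: int, edited_reads: int) -> float:
    """Fraction of reads at a site that carry the edited base."""
    total = unedited_reads + edited_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return edited_reads / total


def is_partial(level: float, low: float = 0.1, high: float = 0.9) -> bool:
    """Call a site partially edited when its level lies between two
    illustrative bounds (assumed values, not taken from the paper)."""
    return low <= level <= high


# Made-up counts: 25 reads with C and 75 reads with T at the same site.
level = editing_level(25, 75)
print(level, is_partial(level))
```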
MitoZoa is a relational database collecting curated metazoan entries of complete or nearly complete mitochondrial genomes (mtDNA), specifically designed to assist comparative studies of mitochondrial genome-level features in a given taxon or in congeneric species of Metazoa. The principal novelties of MitoZoa are extensive corrections/improvements of the mtDNA annotations and the possibility of easily searching for data on: (1) gene order, a genomic feature useful as phylogenetic marker; (2) sequence, size and location of non-coding regions, likely containing the regulatory signals for mtDNA replication and transcription; (3) mt features/sequences of congeneric species, where saturation phenomena in nucleotide substitutions and gene order changes are expected to be absent or at least minimal. In addition, MitoZoa allows the exploration of basic mt features such as molecule topology, genetic code, gene content, and compositional parameters of the entire genome. Finally, in order to facilitate downstream analyses of retrieved data, MitoZoa entry lists can be visualized and downloaded in a tabular format, while sequences and gene order data are provided in FASTA and FASTA-like formats, respectively. The MitoZoa database is available at http://www.caspur.it/mitozoa. © 2010 Mitochondria Research Society.
Motif discovery has been one of the most widely studied problems in bioinformatics ever since genomic and protein sequences have been available. In particular, its application to the de novo prediction of putative over-represented transcription factor binding sites in nucleotide sequences has been, and still is, one of the most challenging flavors of the problem. Recently, novel experimental techniques like chromatin immunoprecipitation (ChIP) have been introduced, permitting the genome-wide identification of protein-DNA interactions. ChIP, applied to transcription factors and coupled with genome tiling arrays (ChIP on Chip) or next-generation sequencing technologies (ChIP-Seq) has opened new avenues in research, as well as posed new challenges to bioinformaticians developing algorithms and methods for motif discovery.
Here we present the MSA-PAD application, a DNA multiple sequence alignment framework that uses PFAM protein domain information to align DNA sequences encoding either single or multiple protein domains. MSA-PAD has two alignment options: gene and genome mode.
Alternative splicing is one of the most important research targets in biology because it can explain the remarkably higher complexity of the transcriptome and proteome of human and other metazoa with respect to their gene complement. The advent of DNA sequencing made it possible to store the whole human genome in a ~3 Gb database, which bioinformaticians can investigate with suitable methods to decipher its informational content. In particular, ad hoc tools are needed to predict and validate possible splice sites in human genes. Indeed, recent experiments found that almost 15% of genetic diseases are caused by mutations affecting the alternative splicing pattern. This work proposes two tools, GeneParser and BowtieParser, for investigating the alternative splicing pattern, and specifically exon skipping events, through the analysis of next generation sequencing data. © 2010 Springer-Verlag Berlin Heidelberg.
Next-Generation Sequencing (NGS) technology has exceptionally increased the ability to sequence DNA in a massively parallel and cost-effective manner. Nevertheless, NGS data analysis requires bioinformatics skills and computational resources well beyond the possibilities of many "wet biology" laboratories. Moreover, most projects require only a few sequencing cycles and standard tools or workflows to carry out suitable analyses for the identification and annotation of genes, transcripts and splice variants found in the biological samples under investigation. These projects can benefit from easy-to-use systems that automatically analyse sequences and mine data without requiring a strong bioinformatics background or hardware infrastructure. Results: To address this issue we developed an automatic system targeted to the analysis of NGS data obtained from large-scale transcriptome studies. This system, which we named NGS-Trex (NGS Transcriptome profile explorer), is available through a simple web interface at http://www.ngs-trex.org and allows the user to upload raw sequences and easily obtain an accurate characterization of the transcriptome profile after setting the few parameters required to tune the analysis procedure. The system is also able to assess differential expression at both gene and transcript level (i.e. splicing isoforms) by comparing the expression profiles of different samples. By using simple query forms the user can obtain lists of genes, transcripts and splice sites ranked and filtered according to several criteria. Data can be viewed as tables, text files or through a simple genome browser which helps the visual inspection of the data. Conclusions: NGS-Trex is a simple tool for RNA-Seq data analysis mainly targeted to "wet biology" researchers with limited bioinformatics skills. It offers simple data mining tools to explore the transcriptome profiles of samples investigated with NGS technologies.
Chromatin immunoprecipitation followed by sequencing with next-generation technologies (ChIP-Seq) has become the de facto standard for building genome-wide maps of regions bound by a given transcription factor (TF). The regions identified, however, have to be further analyzed to determine the actual DNA-binding sites for the TF, as well as sites for other TFs belonging to the same TF complex or in general co-operating or interacting with it in transcription regulation. PscanChIP is a web server that, starting from a collection of genomic regions derived from a ChIP-Seq experiment, scans them using motif descriptors like JASPAR or TRANSFAC position-specific frequency matrices, or descriptors uploaded by users, and it evaluates both motif enrichment and positional bias within the regions according to different measures and criteria. PscanChIP can successfully identify not only the actual binding sites for the TF investigated by a ChIP-Seq experiment but also secondary motifs corresponding to other TFs that tend to bind the same regions, and, if present, precise positional correlations among their respective sites. The web interface is free for use, and there is no login requirement. It is available at http://www.beaconlab.it/pscan_chip_dev.
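As an illustration of scanning a region with a position-specific frequency matrix, as mentioned in the abstract above, the sketch below slides a small matrix along a sequence and reports the best-scoring window. The scoring scheme (mean frequency of the observed bases) and the function name are simplifying assumptions for this example; they are not PscanChIP's actual scoring or enrichment statistics.

```python
from typing import Dict, List, Tuple


def best_match(sequence: str, pfm: List[Dict[str, float]]) -> Tuple[float, int]:
    """Return (score, offset) of the best-scoring window in the sequence.

    pfm holds one dict per motif column, mapping a base to its frequency.
    The score of a window is the mean frequency of the bases it contains.
    """
    width = len(pfm)
    if width == 0 or len(sequence) < width:
        raise ValueError("sequence shorter than motif, or empty motif")
    best = (-1.0, -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(base, 0.0) for col, base in zip(pfm, window)) / width
        if score > best[0]:  # ties keep the leftmost window
            best = (score, start)
    return best


# Toy matrix that only accepts the motif TGA (made-up, not from JASPAR or TRANSFAC).
toy_pfm = [{"T": 1.0}, {"G": 1.0}, {"A": 1.0}]
print(best_match("CCTGACC", toy_pfm))
```

The returned offset is the piece of information a positional-bias analysis would build on, for example by comparing it with the centre of the region.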
The reliable detection of RNA editing sites from massive sequencing data remains challenging and, although several methodologies have been proposed, no computational tools have been released to date. Here, we introduce REDItools, a suite of Python scripts to perform high-throughput investigation of RNA editing using next-generation sequencing data.
Metagenomics is providing an unprecedented access to the environmental microbial diversity. The amplicon-based metagenomics approach involves the PCR-targeted sequencing of a genetic locus fitting different features. Namely, it must be ubiquitous in the taxonomic range of interest, variable enough to discriminate between different species but flanked by highly conserved sequences, and of suitable size to be sequenced through next-generation platforms. The internal transcribed spacers 1 and 2 (ITS1 and ITS2) of the ribosomal DNA operon and one or more hyper-variable regions of 16S ribosomal RNA gene are typically used to identify fungal and bacterial species, respectively. In this context, reliable reference databases and taxonomies are crucial to assign amplicon sequence reads to the correct phylogenetic ranks. Several resources provide consistent phylogenetic classification of publicly available 16S ribosomal DNA sequences, whereas the state of ribosomal internal transcribed spacers reference databases is notably less advanced. In this review, we aim to give an overview of existing reference resources for both types of markers, highlighting strengths and possible shortcomings of their use for metagenomics purposes. Moreover, we present a new database, ITSoneDB, of well annotated and phylogenetically classified ITS1 sequences to be used as a reference collection in metagenomic studies of environmental fungal communities. ITSoneDB is available for download and browsing at http://itsonedb.ba.itb.cnr.it/.
Clusterin (CLU) is a nearly ubiquitous multifunctional protein synthesized in different functionally divergent isoforms, sCLU and nCLU, playing a crucial role by keeping a balance between cell proliferation and death. Studying in vivo CLU expression, we found a higher mRNA expression in both neoplastic and hyperplastic tissues in comparison to normal endometria; in particular, by RT-qPCR we demonstrated an increase of the specific sCLU isoform in the neoplastic and hyperplastic endometrial diseases. On the contrary, no CLU increase was detected at the protein level. The CLU gene transcriptional activity was upregulated in the hyperplastic and neoplastic tissues, indicating the existence of a fine post-transcriptional regulation of CLU expression possibly responsible for the protein decrease in the malignant disease. A specific CLU immunoreactivity was present in all the endometrial glandular cells in comparison to the other cellular compartments, where CLU immunoreactivity was lower or absent. In conclusion, our results suggest the existence of a complex regulatory mechanism of CLU gene expression during the progression from normal to malignant cells, possibly contributing to endometrial carcinogenesis. Moreover, the specific alteration of the sCLU:nCLU ratio associated with the pathological stage suggests a possible use of CLU as a molecular biomarker for the diagnosis/prognosis of endometrial proliferative diseases.
A comprehensive knowledge of all the factors involved in splicing, both proteins and RNAs, and of their interaction network is crucial for reaching a better understanding of this process and its functions. A large part of relevant information is buried in the literature or collected in various different databases. By hand-curated screenings of literature and databases, we retrieved experimentally validated data on 71 human RNA-binding splicing regulatory proteins and organized them into a database called 'SpliceAid-F' (http://www.caspur.it/SpliceAidF/). For each splicing factor (SF), the database reports its functional domains, its protein and chemical interactors and its expression data. Furthermore, we collected experimentally validated RNA-SF interactions, including relevant information on the RNA-binding sites, such as the genes where these sites lie, their genomic coordinates, the splicing effects, the experimental procedures used, as well as the corresponding bibliographic references. We also collected information from experiments showing no RNA-SF binding, at least in the assayed conditions. In total, SpliceAid-F contains 4227 interactions, 2590 RNA-binding sites and 1141 'no-binding' sites, including information on cellular contexts and conditions where binding was tested. The data collected in SpliceAid-F can provide significant information to explain an observed splicing pattern as well as the effect of mutations in functional regulatory elements.
Eukaryotic cells contain a population of mitochondria, variable in number and shape, which in turn contain multiple copies of a tiny compact genome (mtDNA) whose expression and function are strictly coordinated with those of the nuclear genome. mtDNA copy number varies between different cell or tissue types, both in response to overall metabolic and bioenergetic demands and as a consequence or cause of specific pathological conditions. Here we present a novel and reliable methodology to assess the effective mtDNA copy number per diploid genome by investigating off-target reads obtained by whole-exome sequencing (WES) experiments. We also investigate whether and how mtDNA copy number correlates with mitochondrial mass, respiratory activity and expression levels. Analyzing six different tissues from three age- and sex-matched human individuals, we found a highly significant linear correlation between mtDNA copy number estimated by qPCR and the frequency of mtDNA off-target WES reads. Furthermore, mtDNA copy number showed highly significant correlation with mitochondrial gene expression levels as measured by RNA-Seq as well as with mitochondrial mass and respiratory activity. Our methodology thus makes feasible, at a large scale, the investigation of mtDNA copy number in diverse cell types, tissues and pathological conditions or in response to specific treatments.
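The comparison described in the abstract above, between the frequency of mtDNA off-target WES reads and qPCR copy-number estimates, can be sketched as follows. The read counts and qPCR values are made-up numbers, and the two helper functions are hypothetical names introduced only for this example; the actual study may normalize reads differently.

```python
from math import sqrt
from typing import Sequence


def mt_read_fraction(mt_reads: int, total_reads: int) -> float:
    """Fraction of all WES reads that map to the mitochondrial genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mt_reads / total_reads


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up example: three samples, one million WES reads each.
fractions = [mt_read_fraction(m, 1_000_000) for m in (1000, 2000, 4000)]
qpcr_copy_number = [500.0, 1000.0, 2000.0]
print(pearson_r(fractions, qpcr_copy_number))
```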
Essential biodiversity variables (EBVs) have been proposed by the Group on Earth Observations Biodiversity Observation Network (GEO BON) to identify a minimum set of essential measurements that are required for studying, monitoring and reporting biodiversity and ecosystem change. Despite the initial conceptualisation, however, the practical implementation of EBVs remains challenging. There is much discussion about the concept and implementation of EBVs: which variables are meaningful; which data are needed and available; at which spatial, temporal and topical scales can EBVs be calculated; and how sensitive are EBVs to variations in underlying data? To advance scientific progress in implementing EBVs we propose that both scientists and research infrastructure operators need to cooperate globally to serve and process the essential large datasets for calculating EBVs. We introduce GLOBIS-B (GLOBal Infrastructures for Supporting Biodiversity research), a global cooperation funded by the Horizon 2020 research and innovation framework programme of the European Commission. The main aim of GLOBIS-B is to bring together biodiversity scientists, global research infrastructure operators and legal interoperability experts to identify the research needs and infrastructure services underpinning the concept of EBVs. The project will facilitate the multi-lateral cooperation of biodiversity research infrastructures worldwide and identify the required primary data, analysis tools, methodologies and legal and technical bottlenecks to develop an agenda for research and infrastructure development to compute EBVs. This requires development of standards, protocols and workflows that are 'self-documenting' and openly shared to allow the discovery and analysis of data across large spatial extents and different temporal resolutions. 
The interoperability of existing biodiversity research infrastructures will be crucial for integrating the necessary biodiversity data to calculate EBVs, and to advance our ability to assess progress towards the Aichi targets for 2020 of the Convention on Biological Diversity (CBD).
p53 is a central hub in controlling cell proliferation. To maintain genome integrity in response to cellular stress, p53 directly regulates the transcription of genes involved in cell cycle arrest, DNA repair, apoptosis and/or senescence. An array of post-translational modifications and protein-protein interactions modulates its stability and activities in order to avoid malignant transformation. However, to date it is still not clear how cells decide their own fate in response to different types of stress. We described here that the human TRIM8 protein, a member of the TRIM family, is a new modulator of the p53-mediated tumor suppression mechanism. We showed that under stress conditions, such as UV exposure, p53 induced the expression of TRIM8, which in turn stabilized p53 leading to cell cycle arrest and reduction of cell proliferation through enhancement of CDKN1A (p21) and GADD45 expression. TRIM8 silencing reduced the capacity of p53 to activate genes involved in cell cycle arrest and DNA repair, in response to cellular stress. Concurrently, TRIM8 overexpression induced the degradation of the MDM2 protein, the principal regulator of p53 stability. Co-immunoprecipitation experiments showed that TRIM8 physically interacted with p53, impairing its interaction with MDM2. Altogether, our results reveal a previously unknown regulatory pathway controlling p53 activity and suggest TRIM8 as a novel therapeutic target to enhance p53 tumor suppressor activity.
In some tumours, despite a wild-type p53 gene, the p53 pathway is inactivated by alterations in its regulators or by unknown mechanisms, leading to resistance to cytotoxic therapies. Understanding the mechanisms of functional inactivation of wild-type p53 in these tumours may help to define prospective targets for treating cancer by restoring p53 activity. Recently, we identified TRIM8 as a new p53 modulator, which stabilizes p53 impairing its association with MDM2 and inducing the reduction of cell proliferation. In this paper we demonstrated that TRIM8 deficit dramatically impairs p53-mediated cellular responses to chemotherapeutic drugs and that TRIM8 is down regulated in patients affected by clear cell Renal Cell Carcinoma (ccRCC), an aggressive drug-resistant cancer showing wild-type p53. These results suggest that down regulation of TRIM8 might be an alternative way to suppress p53 activity in RCC. Interestingly, we show that TRIM8 expression recovery in RCC cell lines renders these cells sensitive to chemotherapeutic treatments following p53 pathway re-activation. These findings provide the first mechanistic link between TRIM8 and the drug resistance of ccRCC and suggest more generally that TRIM8 could be used as enhancer of the chemotherapy efficacy in cancers where p53 is wild-type and its pathway is defective.
The advent of massively parallel sequencing technologies (Next Generation Sequencing, NGS) profoundly modified the landscape of human genetics. In particular, Whole Exome Sequencing (WES) is the NGS branch that focuses on the exonic regions of eukaryotic genomes; exomes are ideal to help us understand high-penetrance allelic variation and its relationship to phenotype. A complete WES analysis involves several steps which need to be suitably designed and arranged into an efficient pipeline. Managing an NGS analysis pipeline and the huge amount of data it produces requires non-trivial IT skills and computational power. Results: Our web resource WEP (Whole-Exome sequencing Pipeline web tool) performs a complete WES pipeline and provides easy access through a web interface to intermediate and final results. The WEP pipeline is composed of several steps: 1) verification of input integrity and quality checks, read trimming and filtering; 2) gapped alignment; 3) BAM conversion, sorting and indexing; 4) duplicates removal; 5) alignment optimization around insertion/deletion (indel) positions; 6) recalibration of quality scores; 7) single nucleotide and deletion/insertion polymorphism (SNP and DIP) variant calling; 8) variant annotation; 9) result storage into custom databases to allow cross-linking and intersections, statistics and much more. In order to overcome the challenge of managing large amounts of data and to maximize the biological information extracted from them, our tool restricts the number of final results by filtering data with customizable thresholds, facilitating the identification of functionally significant variants. Default threshold values, tuned according to the most common values in recently published literature, are also provided when the analysis completes. Conclusions: Through our tool a user can perform the whole analysis without knowing the underlying hardware and software architecture, dealing with both paired and single end data. 
The interface provides easy and intuitive access for data submission and user-friendly visualization of annotated variants. Through WEP, users without IT expertise can access up-to-date and tested WES algorithms, tuned to maximize the quality of called variants while minimizing artifacts and false positives. The web tool is available at the following web address: http://www.caspur.it/wep
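The threshold-based filtering of final results mentioned in the WEP abstract can be pictured with a minimal sketch. The record fields ("quality", "depth") and the default cut-offs below are assumptions made for illustration; they are not WEP's actual schema or default values.

```python
from typing import Dict, List, Union

Variant = Dict[str, Union[str, float, int]]


def filter_variants(
    variants: List[Variant],
    min_quality: float = 30.0,  # hypothetical default, not WEP's
    min_depth: int = 10,        # hypothetical default, not WEP's
) -> List[Variant]:
    """Keep only variants that pass both user-adjustable thresholds."""
    return [
        v for v in variants
        if v["quality"] >= min_quality and v["depth"] >= min_depth
    ]


# Made-up variant calls.
calls = [
    {"id": "v1", "quality": 50.0, "depth": 20},
    {"id": "v2", "quality": 10.0, "depth": 40},
    {"id": "v3", "quality": 60.0, "depth": 5},
]
print([v["id"] for v in filter_variants(calls)])
```

Exposing the cut-offs as parameters with defaults mirrors the abstract's description of customizable thresholds that come with suggested starting values.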