Monica Santamaria
Role
Level III - Researcher
Organization
Consiglio Nazionale delle Ricerche (National Research Council)
Department
Not Available
Scientific Area
AREA 05 - Biological Sciences
Scientific-Disciplinary Sector
BIO/11 - Molecular Biology
ERC Sector, Level 1
LS - LIFE SCIENCES
ERC Sector, Level 2
LS2 Genetics, Genomics, Bioinformatics and Systems Biology: Molecular and population genetics, genomics, transcriptomics, proteomics, metabolomics, bioinformatics, computational biology, biostatistics, biological modelling and simulation, systems biology, genetic epidemiology
ERC Sector, Level 3
LS2_10 Bioinformatics
Substantial advances in microbiology, molecular evolution and biodiversity have been made in recent years thanks to metagenomics, which makes it possible to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the taxonomic structure of the microbiome, a target-based metagenomic approach, here also referred to as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) across the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. Access to proper computational systems for the large-scale bioinformatic analysis of HTS data currently represents one of the major challenges in advanced Meta-barcoding projects. RESULTS: BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities through a completely automated workflow. The workflow comprises all the fundamental steps required in an appropriately designed Meta-barcoding HTS-based experiment, from raw sequence data upload and cleaning to final taxonomic identification. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from raw sequencing data generated on either the Roche 454 or the Illumina HTS platform, following a separate path for each platform. BioMaS is implemented as a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data). CONCLUSION: BioMaS is a user-friendly pipeline for Meta-barcoding HTS data analysis, specifically designed for users without particular computing skills.
A comparative benchmark, carried out using a simulated dataset designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in the extent and accuracy of deep taxonomic sequence assignments.
Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced analytical tools creates many challenges for the modern scientist. Across the wider biological sciences, presenting such capabilities on the Internet (as "Web services") and using scientific workflow systems to compose them for particular tasks is a practical way to carry out robust "in silico" science. However, use of this approach in biodiversity science and ecology has thus far been quite limited.
Much biodiversity data is collected worldwide, but it remains challenging to assemble the scattered knowledge for assessing biodiversity status and trends. The concept of Essential Biodiversity Variables (EBVs) was introduced to structure biodiversity monitoring globally, and to harmonize and standardize biodiversity data from disparate sources to capture a minimum set of critical variables required to study, report and manage biodiversity change. Here, we assess the challenges of a 'Big Data' approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance. The majority of currently available data on species distributions derives from incidentally reported observations or from surveys where presence-only or presence-absence data are sampled repeatedly with standardized protocols. Most abundance data come from opportunistic population counts or from population time series using standardized protocols (e.g. repeated surveys of the same population from single or multiple sites). Enormous complexity exists in integrating these heterogeneous, multi-source data sets across space, time, taxa and different sampling methods. Integration of such data into global EBV data products requires correcting biases introduced by imperfect detection and varying sampling effort, dealing with different spatial resolution and extents, harmonizing measurement units from different data sources or sampling methods, applying statistical tools and models for spatial inter- or extrapolation, and quantifying sources of uncertainty and errors in data and models. To support the development of EBVs by the Group on Earth Observations Biodiversity Observation Network (GEO BON), we identify 11 key workflow steps that will operationalize the process of building EBV data products within and across research infrastructures worldwide. 
These workflow steps take multiple sequential activities into account, including identification and aggregation of various raw data sources, data quality control, taxonomic name matching and statistical modelling of integrated data. We illustrate these steps with concrete examples from existing citizen science and professional monitoring projects, including eBird, the Tropical Ecology Assessment and Monitoring network, the Living Planet Index and the Baltic Sea zooplankton monitoring. The identified workflow steps are applicable to both terrestrial and aquatic systems and a broad range of spatial, temporal and taxonomic scales. They depend on clear, findable and accessible metadata, and we provide an overview of current data and metadata standards. Several challenges remain to be solved for building global EBV data products: (i) developing tools and models for combining heterogeneous, multi-source data sets and filling data gaps in geographic, temporal and taxonomic coverage, (ii) integrating emerging methods and technologies for data collection such as citizen science, sensor networks, D
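Taxonomic name matching is one of the workflow steps listed above. The sketch below illustrates the idea under loudly stated assumptions: verbatim names are reduced to a "Genus epithet" binomial, then looked up first among accepted names and then in a synonym table. The functions `normalize_binomial` and `match_name` and the toy checklist are hypothetical; real name-matching services also handle authorship parsing, abbreviations, misspellings and infraspecific ranks, which this sketch deliberately ignores.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical tokenizer: runs of letters, optionally hyphenated
# (e.g. "novae-angliae"), so hyphenated epithets stay whole.
_TOKEN = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def normalize_binomial(raw: str) -> Optional[str]:
    """Reduce a verbatim name to 'Genus epithet' form.

    Authorship, years and anything after the second token are dropped.
    Returns None if fewer than two tokens are found.
    """
    tokens = _TOKEN.findall(raw)
    if len(tokens) < 2:
        return None
    return f"{tokens[0].capitalize()} {tokens[1].lower()}"


def match_name(raw: str, accepted: Set[str], synonyms: Dict[str, str]) -> Optional[str]:
    """Map a verbatim name to an accepted name in a reference checklist.

    Tries an exact match on the normalized binomial first, then a synonym
    lookup; returns None when the name cannot be resolved.
    """
    name = normalize_binomial(raw)
    if name is None:
        return None
    if name in accepted:
        return name
    return synonyms.get(name)


if __name__ == "__main__":
    # Toy checklist for illustration only.
    accepted = {"Parus major", "Cyanistes caeruleus"}
    synonyms = {"Parus caeruleus": "Cyanistes caeruleus"}
    for verbatim in ["Parus caeruleus Linnaeus, 1758", "parus  MAJOR", "Aves"]:
        print(repr(verbatim), "->", match_name(verbatim, accepted, synonyms))
```

Resolving synonyms to a single accepted name is what lets records from different sources (e.g. citizen science and professional surveys) be aggregated under one taxon before statistical modelling.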
Currently, there is very little information available regarding the microbiome associated with the wine production chain. Here, we used an amplicon sequencing approach based on high-throughput sequencing (HTS) to obtain a comprehensive assessment of the bacterial community associated with the production of three Apulian red wines, from grape to final product. The relationships among grape variety, the microbial community, and fermentation were investigated. Moreover, the winery microbiota was compared with the autochthonous vineyard species that persist until the end of the winemaking process. The analysis highlighted remarkable dynamics within the microbial communities during fermentation. A common microbial core shared among the examined wine varieties was observed, and the unique taxonomic signature of each wine appellation was revealed. New species belonging to the genus Halomonas were also reported. This study demonstrates the potential of this metagenomic approach, supported by optimized protocols, for identifying the biodiversity of the wine supply chain. The developed experimental pipeline offers new prospects for other research fields in which a comprehensive view of microbial community complexity and dynamics is desirable.
Multiple sequence alignment (MSA) is a fundamental component of many DNA sequence analyses, including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments achieve higher precision and robustness. Here we describe the use of the upgraded version (2.0) of MSA-PAD, a DNA multiple sequence alignment framework able to align DNA sequences coding for single or multiple protein domains, guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome", which account for coding-domain order and genomic rearrangements, respectively. New options in the present version allow the MSA to be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to include custom protein profiles, chosen according to the user's interest, that may not be present in the PFAM database. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/.
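A common way to let protein information guide a DNA alignment is to align at the amino-acid level and then project each aligned protein back onto its codons, so that gaps always span whole triplets and the DNA alignment stays in frame. The sketch below shows only that projection step; it is a generic illustration of protein-guided DNA alignment, not MSA-PAD's implementation, which works with PFAM or user-supplied domain profiles and the "Gene"/"Genome" strategies. The function names are hypothetical, and the code assumes each CDS has no trailing stop codon unless the protein also ends with `*`.

```python
from typing import Dict


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Project a gapped protein sequence back onto its coding DNA.

    Every residue column is replaced by its codon and every gap column
    by a triplet gap ('---'), so the DNA alignment stays in frame.
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError("protein and CDS lengths disagree")
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter) for aa in aligned_protein)


def protein_guided_dna_alignment(protein_aln: Dict[str, str],
                                 cds: Dict[str, str]) -> Dict[str, str]:
    """Build a codon-aware DNA alignment from an existing protein alignment."""
    return {name: thread_codons(row, cds[name]) for name, row in protein_aln.items()}


if __name__ == "__main__":
    protein_aln = {"seq1": "MK-V", "seq2": "MKLV"}
    cds = {"seq1": "ATGAAAGTT", "seq2": "ATGAAACTGGTT"}
    for name, row in protein_guided_dna_alignment(protein_aln, cds).items():
        print(name, row)
```

In the toy example, the gap opposite `L` in `seq1` becomes `---` in the DNA row, giving `ATGAAA---GTT` aligned against `ATGAAACTGGTT`; a DNA-only aligner would be free to place that 3-base gap out of frame.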
A holistic understanding of environmental communities is the new challenge of metagenomics. Accordingly, the amplicon-based or metabarcoding approach, widely applied to investigate bacterial microbiomes, is now being extended to the eukaryotic world. Indeed, the analysis of metabarcoding data may provide a comprehensive assessment of both bacterial and eukaryotic composition in a variety of environments, including the human body. In this respect, whereas hypervariable regions of the 16S rRNA gene are the de facto standard barcode for bacteria, the Internal Transcribed Spacer 1 (ITS1) of the ribosomal RNA gene cluster has shown high potential for discriminating eukaryotes at deep taxonomic levels. Because metabarcoding data analysis relies on the availability of a well-curated barcode reference resource, a comprehensive collection of ITS1 sequences supplied with robust taxonomies is highly needed. To address this issue, we created ITSoneDB (available at http://itsonedb.cloud.ba.infn.it/), which in its current version hosts 985,240 ITS1 sequences spanning more than 134,000 eukaryotic species. Each ITS1 is mapped onto the NCBI reference taxonomy, with its start and end positions precisely annotated. ITSoneDB has been developed in agreement with the FAIR guidelines, enabling users to query and download its content through a simple web interface and to access relevant metadata by cross-linking to the European Nucleotide Archive.
Motivations. Metagenomics has grown explosively since the advent of high-throughput next-generation sequencing (NGS) technologies, which allow an unprecedented large-scale identification of microorganisms living in almost every environment. In particular, the amplicon-based metagenomic approach is increasingly used to explore the diversity of fungal environmental communities. At the species level, a number of studies have used the non-conserved internal transcribed spacers (ITS) 1 and 2 of the ribosomal RNA gene cluster as genetic markers to explore fungal taxonomic diversity. In particular, ITS1 is gaining popularity as a better species-discriminating marker in Fungi because of its higher variability compared to ITS2. Starting from the total DNA extracted from any environmental sample, this locus can be easily amplified with taxonomically universal primers and sequenced on high-throughput next-generation platforms. Reference databases and robust supporting taxonomies are crucial for assigning phylogenetic affiliation to the huge number of sequences produced. Even though a large number of ITS1 sequences are collected in public databases, a specialized resource focused on this region, in which sequence identity, boundaries and taxonomic assignment are validated, is still needed. In this work we present ITSoneDB, a new comprehensive collection of ITS1 sequences belonging to the Fungi kingdom. Methods. ITSoneDB has been generated and populated using a multi-step Python workflow. In the first step, ribosomal RNA gene cluster sequences of Fungi including the target ITS1 region were retrieved from GenBank. Then, ITS1 start and end boundaries were extracted from the Feature Table annotations, when available.
To infer, validate and, where necessary, redesign the ITS1 location, Hidden Markov Model (HMM) profiles of the flanking 18S and 5.8S ribosomal RNA genes, generated from their reference alignments stored in the RFAM database, were mapped onto the entire collection of retrieved nucleotide sequences using the hmmsearch tool from the HMMER 3.0 package. Results. At present, ITSoneDB includes 405,433 taxonomically arranged sequence entries, each provided with both ITS1 start and end positions defined by GenBank annotations and/or the HMM-based method. The ITSoneDB front end is a Java-based website for data browsing and downloading. The database can be queried by species or taxon name, by GenBank accession ID, or by "expanding" the target rank on a detailed fungal taxonomic tree. The complete ITS1 sequence dataset collected in ITSoneDB is available in FASTA format, and users can extract and locally save all or selected queried ITS1 sequences for further analysis.
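Because ITS1 lies between the 18S and 5.8S rRNA genes, its boundaries follow directly from where the two flanking profiles hit: ITS1 starts one base after the end of the 18S hit and ends one base before the start of the 5.8S hit. The sketch below assumes both hits are already available as 1-based inclusive coordinates on the same (forward) strand, however they were obtained; reverse-complemented hits are not handled. The `reconcile` rule and its 10-base `tolerance` are hypothetical, since the abstract does not state how GenBank annotations and HMM-inferred boundaries are reconciled in ITSoneDB.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Bounds = Tuple[int, int]  # (start, end), 1-based, inclusive


@dataclass
class Hit:
    """Coordinates of a profile hit on a nucleotide sequence (1-based, inclusive)."""
    start: int
    end: int


def infer_its1(ssu: Optional[Hit], r58s: Optional[Hit], seq_len: int) -> Optional[Bounds]:
    """ITS1 lies between the end of the 18S hit and the start of the 5.8S hit.

    Returns None if either flank is missing, the hits overlap or are in the
    wrong order, or the inferred region falls outside the sequence.
    """
    if ssu is None or r58s is None:
        return None
    start, end = ssu.end + 1, r58s.start - 1
    if start > end or start < 1 or end > seq_len:
        return None
    return start, end


def reconcile(annotated: Optional[Bounds], inferred: Optional[Bounds],
              tolerance: int = 10) -> Optional[Bounds]:
    """Hypothetical validation rule: keep the GenBank annotation when it agrees
    with the HMM-inferred boundaries within `tolerance` bases, otherwise
    replace it with the inferred boundaries."""
    if inferred is None:
        return annotated
    if annotated is None:
        return inferred
    if all(abs(a - b) <= tolerance for a, b in zip(annotated, inferred)):
        return annotated
    return inferred


def extract(seq: str, bounds: Bounds) -> str:
    """Slice a 1-based inclusive region out of a Python string."""
    start, end = bounds
    return seq[start - 1:end]
```

For a 100-base sequence with an 18S hit at 1-40 and a 5.8S hit at 71-100, `infer_its1` returns `(41, 70)` and `extract` yields the 30-base spacer.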
Summary: Shotgun metagenomics by high-throughput sequencing may allow deep and accurate characterization of host-associated total microbiomes, including bacteria, viruses, protists and fungi. However, the analysis of such sequencing data is still extremely challenging in terms of both overall accuracy and computational efficiency, and current methodologies show substantial variability in misclassification rate and resolution at lower taxonomic ranks, or are limited to specific life domains (e.g. only bacteria). We present here MetaShot, a workflow for assessing total microbiome composition from host-associated shotgun sequence data, and show its overall optimal accuracy by analyzing both simulated and real datasets.
Metagenomics is providing unprecedented access to environmental microbial diversity. The amplicon-based metagenomics approach involves the PCR-targeted sequencing of a genetic locus that must meet several requirements. Namely, it must be ubiquitous in the taxonomic range of interest, variable enough to discriminate between different species but flanked by highly conserved sequences, and of a suitable size to be sequenced on next-generation platforms. The internal transcribed spacers 1 and 2 (ITS1 and ITS2) of the ribosomal DNA operon and one or more hypervariable regions of the 16S ribosomal RNA gene are typically used to identify fungal and bacterial species, respectively. In this context, reliable reference databases and taxonomies are crucial to assign amplicon sequence reads to the correct phylogenetic ranks. Several resources provide consistent phylogenetic classification of publicly available 16S ribosomal DNA sequences, whereas the state of reference databases for the ribosomal internal transcribed spacers is notably less advanced. In this review, we aim to give an overview of existing reference resources for both types of markers, highlighting strengths and possible shortcomings of their use for metagenomics purposes. Moreover, we present a new database, ITSoneDB, of well-annotated and phylogenetically classified ITS1 sequences to be used as a reference collection in metagenomic studies of environmental fungal communities. ITSoneDB is available for download and browsing at http://itsonedb.ba.itb.cnr.it/.
Mitochondrial DNA (mtDNA) mutations have been implicated in disease, aging and cancer, and have also been exploited for evolutionary and forensic investigations. When investigating mtDNA mutations, the peculiar aspects of mitochondrial genetics, such as heteroplasmy and the threshold effect, require suitable approaches that are sensitive enough to detect low-level heteroplasmy and precise enough to quantify the exact mutational load. To establish the optimal approach for evaluating heteroplasmy, six methods were experimentally compared for their capacity to reveal and quantify mtDNA variants. The drawbacks and advantages of cloning, Fluorescent PCR (F-PCR), denaturing High Performance Liquid Chromatography (dHPLC), quantitative Real-Time PCR (qRTPCR), High Resolution Melting (HRM) and 454 pyrosequencing were determined. In particular, the detection and quantification of a mutation in a difficult sequence context were investigated through the analysis of an insertion in a homopolymeric stretch (m.3571insC). (C) 2011 Elsevier Inc. All rights reserved.
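For read-count-based methods such as 454 pyrosequencing, the mutational load at a site can be expressed as the fraction of informative reads carrying the variant. The sketch below computes that fraction and attaches a Wilson score interval, a standard binomial confidence interval chosen here for illustration rather than taken from the study. It assumes every read is an independent, error-free observation; in practice, homopolymer errors such as those affecting an insertion like m.3571insC can bias the counts, which this sketch does not model. Function names are hypothetical.

```python
import math
from typing import Tuple


def heteroplasmy_fraction(variant_reads: int, reference_reads: int) -> float:
    """Mutational load at one site: variant reads over all informative reads."""
    total = variant_reads + reference_reads
    if total == 0:
        raise ValueError("no informative reads at this site")
    return variant_reads / total


def wilson_interval(variant_reads: int, total_reads: int,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    n = total_reads
    p = variant_reads / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clip to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    load = heteroplasmy_fraction(30, 970)
    low, high = wilson_interval(30, 1000)
    print(f"load={load:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

The interval makes the sensitivity requirement concrete: with 1,000 reads, a 3% load is clearly distinguishable from zero, whereas with only a few dozen reads the same observed fraction would be compatible with no heteroplasmy at all.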
The rapid expansion of multicellular native and alien species outbreaks in aquatic and terrestrial ecosystems (bioinvasions) may produce significant impacts on bacterial community dynamics and nutrient pathways, with major ecological implications. In aquatic ecosystems, bioinvasions may cause adverse effects on water quality resulting from changes in biological, chemical and physical properties linked to significant transformations of microbial taxonomic and functional diversity. Here we used an effective and highly sensitive experimental strategy, bypassing the efficiency bottleneck of the traditional bacterial isolation and culturing method, to identify changes in the planktonic microbial community inhabiting a marine coastal lagoon (Varano, Adriatic Sea) under the influence of an outbreak-forming alien jellyfish species. Water samples were collected from two areas that differed in their degree of confinement within the lagoon and in jellyfish density (W, up to 12.4 medusae m⁻³; E, up to 0.03 medusae m⁻³) to conduct a snapshot microbiome analysis by a metagenomic approach. After extracting the genetic material from the environmental water samples, we deep-sequenced metagenomic amplicons of the V5-V6 region of the bacterial 16S rRNA gene on an Illumina MiSeq platform. Experiments were carried out in triplicate, so six libraries of dual-indexed 420 bp amplicons were successfully sequenced on the MiSeq platform using a 2 × 250 bp paired-end sequencing strategy. Approximately 7.5 million paired-end reads (i.e. 15 million total reads) were generated, with an average of 2.5 million reads (1.25 M pairs) per sample replicate. The sequence data, analyzed through a novel bioinformatics pipeline (BioMaS), showed that the structure of the resident bacterial community was significantly affected by the occurrence of jellyfish outbreaks.
Clear qualitative and quantitative differences were found between the western and eastern areas (characterized by many and few jellyfish, respectively), with 84 families, 153 genera and 324 species in the W samples, and 104 families, 199 genera and 331 species in the E samples. Significant differences between the two sampling areas were detected in particular in the occurrence of 16 families, 22 genera and 61 species of microbial taxa. This is the first time that an NGS platform has been used to screen the impact of jellyfish bioinvasions on the aquatic microbiome, providing a preliminary assessment of jellyfish-driven changes in functional and structural microbial biodiversity.
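The sequencing figures above can be checked with simple arithmetic: two areas sampled in triplicate give six libraries, so 7.5 million read pairs correspond to 1.25 million pairs (2.5 million reads) per library, and 2 × 250 bp mates on a 420 bp amplicon overlap by 2 × 250 − 420 = 80 bp, enough for the mates to be merged into one full-length read. The sketch assumes that the 420 bp length is the region actually read by both mates; if it includes adapter or index sequence, the real overlap differs. The helper names are hypothetical.

```python
from typing import Tuple


def reads_per_library(total_pairs: int, n_libraries: int) -> Tuple[float, float]:
    """Average read pairs and individual reads per library."""
    pairs = total_pairs / n_libraries
    return pairs, 2 * pairs


def mate_overlap(amplicon_len: int, read_len: int) -> int:
    """Bases covered by both mates of a pair; a negative value means a gap."""
    return 2 * read_len - amplicon_len


if __name__ == "__main__":
    n_libraries = 2 * 3  # two sampling areas (W, E) x three replicates
    pairs, reads = reads_per_library(7_500_000, n_libraries)
    print(f"{pairs:,.0f} pairs = {reads:,.0f} reads per library")
    print(f"mate overlap: {mate_overlap(420, 250)} bp")
```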
Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonized way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (i) material sampling, (ii) material sequencing, (iii) data analysis, and (iv) data archiving and publishing. Taking examples from marine research, we summarize essential variables used to describe material sampling processes and sequencing procedures in a metagenomics experiment. These aspects of metagenomics dataset generation have been to some extent addressed by the scientific community, but greater awareness and adoption is still needed. We emphasize the lack of standards relating to reporting how metagenomics datasets are analysed and how the metagenomics data analysis outputs should be archived and published. We propose best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing.
Essential biodiversity variables (EBVs) have been proposed by the Group on Earth Observations Biodiversity Observation Network (GEO BON) to identify a minimum set of essential measurements that are required for studying, monitoring and reporting biodiversity and ecosystem change. Despite this initial conceptualisation, the practical implementation of EBVs remains challenging. There is much discussion about the concept and implementation of EBVs: which variables are meaningful; which data are needed and available; at which spatial, temporal and topical scales can EBVs be calculated; and how sensitive are EBVs to variations in underlying data? To advance scientific progress in implementing EBVs, we propose that scientists and research infrastructure operators cooperate globally to serve and process the large essential datasets for calculating EBVs. We introduce GLOBIS-B (GLOBal Infrastructures for Supporting Biodiversity research), a global cooperation funded by the Horizon 2020 research and innovation framework programme of the European Commission. The main aim of GLOBIS-B is to bring together biodiversity scientists, global research infrastructure operators and legal interoperability experts to identify the research needs and infrastructure services underpinning the concept of EBVs. The project will facilitate the multi-lateral cooperation of biodiversity research infrastructures worldwide and identify the required primary data, analysis tools, methodologies and legal and technical bottlenecks to develop an agenda for research and infrastructure development to compute EBVs. This requires the development of standards, protocols and workflows that are 'self-documenting' and openly shared to allow the discovery and analysis of data across large spatial extents and different temporal resolutions.
The interoperability of existing biodiversity research infrastructures will be crucial for integrating the necessary biodiversity data to calculate EBVs, and to advance our ability to assess progress towards the Aichi targets for 2020 of the Convention on Biological Diversity (CBD).