Clustering Techniques for Revealing Gene Expression Patterns

Abstract

The possible applications of modeling and simulation in bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Molecular biologists need robust computational tools to build models that can learn to recognize DNA and amino acid sequences and assign protein structures to given sequences. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that show similar expression patterns. In this Chapter we describe the main clustering algorithms developed for analyzing gene expression data, comparing their results with the classification derived from the application of unsupervised neural networks.

Of particular interest in the analysis of gene expression data is the search for correlated patterns, which is typically done by cluster analysis. DNA microarray technologies (Lockhart et al., 1996) allow thousands of genes to be monitored quickly and efficiently. These technologies have opened new ways of exploring an organism from a genome-wide perspective. In particular, the study of gene expression across a complete genome (such as that of Saccharomyces cerevisiae) is now possible. Studies using DNA microarrays (Perou et al., 1999) have also been carried out, up to the complete mapping of the human genome. The production of targeted drugs and the identification of drugs are other areas that can benefit significantly from these techniques. One problem inherent in the use of DNA microarray technology is the huge amount of data produced, the analysis of which is a significant problem in itself. Several approaches are used in the analysis of gene expression data, grouped into two areas: clustering and classification.
Clustering is a purely data-driven activity that uses only data from the study or experiment to group measurements together. Classification, in contrast, uses additional data, including heuristics, to assign measurements to groups. Among these approaches, the statistical methods commonly applied to microarray data are Hierarchical Clustering (Sneath & Sokal, 1973) and (Unsupervised) Neural Networks (Herrero et al., 2001). The identification of the optimal method for analyzing these data is still a topic of discussion. In this Chapter we examine some methods for gene co-expression analysis, such as "correlation graphs" and supervised and unsupervised clustering methods. The next section is a brief exposition of the background of clustering techniques. Then we detail the clustering algorithm based on correlation graphs. Next we examine the application of supervised and unsupervised techniques. The Chapter ends with some final considerations and further research directions.
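
The abstract does not spell out the correlation-graph algorithm, so the following is only a minimal sketch of the general idea of grouping co-expressed genes, not the Chapter's method. It assumes that two genes are linked when the Pearson correlation of their expression profiles reaches a threshold, and that each connected component of the resulting graph is read as one cluster. The function names, the threshold of 0.9, and the toy profiles are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    norm = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return sum(a * b for a, b in zip(dx, dy)) / norm


def correlation_graph_clusters(profiles, threshold=0.9):
    """Group genes into connected components of a thresholded correlation graph."""
    genes = list(profiles)
    neighbours = {g: set() for g in genes}
    # Add an undirected edge between every pair of sufficiently correlated genes.
    for g, h in combinations(genes, 2):
        if pearson(profiles[g], profiles[h]) >= threshold:
            neighbours[g].add(h)
            neighbours[h].add(g)

    # Collect connected components with an iterative depth-first search.
    clusters, seen = [], set()
    for start in genes:
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Toy expression profiles (made-up values, four conditions per gene).
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.1, 2.0],
}
clusters = correlation_graph_clusters(toy_profiles, threshold=0.9)
print(clusters)
```

On this toy input the two rising profiles fall into one group and the two falling profiles into another. Using only the data to form groups, with no labels, is what makes this clustering in the sense described above rather than classification. The choice of threshold strongly affects the outcome and would need to be tuned on real data.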


All authors

  • C. Gallo, V. Capozzi



Year of publication

2014
