MCM-Based Clustering for Time-Course Gene Expression Data

Online Journal of Bioinformatics©

Volume 5 : 102-128, 2004

Wu FX, Zhang WJ, Kusalik AJ

Division of Biomedical Engineering, Department of Computer Science, University of Saskatchewan, Saskatoon, Canada

ABSTRACT

Wu FX, Zhang WJ, Kusalik AJ. MCM-based clustering for time-course gene expression data. Online J Bioinformatics 5: 102-128, 2004. Time-course gene expression data contain important information at the molecular level about underlying biological processes. A huge body of such data has been, and will continue to be, produced by microarray experiments. The challenge now is how to mine such data and extract useful information from them. Cluster analysis has played an important role in analyzing time-course gene expression data and has proven useful. However, most clustering techniques do not consider the inherent time dependence (dynamics) of time-course gene expression data. Accounting for these dynamics in cluster analysis should lead to high-quality clustering. This paper proposes a model-based clustering method, called the MCM-based clustering method, for time-course gene expression data. The proposed method uses Markov chain models (MCMs) to account for the inherent dynamics. It is assumed that genes in the same cluster were generated by the same MCM. For a given number of clusters, the proposed method uses the EM algorithm to find the cluster models and an assignment of genes to these models that maximizes their posterior probabilities. Using the Bayesian Information Criterion (BIC) for model selection, the proposed method can automatically determine the number of clusters in a dataset. Further, this study employs the adjusted Rand index (ARI) to evaluate the quality of clustering. The performance of the proposed method is demonstrated by comparing it with the k-means method on a synthetic and a real-life time-course gene expression dataset. The results indicate that the MCM-based clustering method can be a useful tool for clustering time-course gene expression data and can obtain higher-quality clustering than other methods (e.g. the k-means method).

KEYWORDS: MCM-Clustering, Time-Course, Gene expression
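
The pipeline outlined in the abstract (one Markov chain model per cluster, EM fitting, assignment by maximum posterior probability, BIC to choose the number of clusters) can be illustrated with a minimal sketch. The abstract does not give the exact model, so everything specific here is an assumption made for illustration: expression profiles are taken to be already discretized into a small number of states, each cluster is a first-order discrete Markov chain, a small pseudo-count smooths the estimates, and BIC is written in the "lower is better" form. The names fit_mcm_mixture and simulate are hypothetical and are not the authors' implementation.

```python
import numpy as np

def fit_mcm_mixture(seqs, n_states, n_clusters, n_iter=50, seed=0):
    """EM for a mixture of first-order Markov chains (illustrative sketch).

    seqs: int array (n_genes, n_timepoints), values in [0, n_states).
    Returns (labels, log_likelihood, bic), with BIC in 'lower is better' form.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_time = seqs.shape
    # Per-gene sufficient statistics: one-hot initial state, transition counts.
    init = np.eye(n_states)[seqs[:, 0]]
    counts = np.zeros((n_genes, n_states, n_states))
    genes = np.arange(n_genes)
    for t in range(n_time - 1):
        np.add.at(counts, (genes, seqs[:, t], seqs[:, t + 1]), 1.0)
    flat = counts.reshape(n_genes, -1)

    # Start from random soft assignments of genes to clusters.
    resp = rng.dirichlet(np.ones(n_clusters), size=n_genes)
    eps = 1e-3  # assumed pseudo-count; keeps probabilities away from zero
    for _ in range(n_iter):
        # M-step: mixing weights, initial-state and transition probabilities.
        weights = resp.sum(axis=0) / n_genes
        pi = resp.T @ init + eps
        pi /= pi.sum(axis=1, keepdims=True)
        trans = (resp.T @ flat).reshape(n_clusters, n_states, n_states) + eps
        trans /= trans.sum(axis=2, keepdims=True)
        # E-step: posterior probability of each cluster for each gene.
        log_joint = (np.log(weights + 1e-300)
                     + init @ np.log(pi).T
                     + flat @ np.log(trans).reshape(n_clusters, -1).T)
        mx = log_joint.max(axis=1, keepdims=True)
        log_norm = mx + np.log(np.exp(log_joint - mx).sum(axis=1, keepdims=True))
        resp = np.exp(log_joint - log_norm)

    log_lik = float(log_norm.sum())
    # Free parameters: mixing weights, initial distributions, transition rows.
    n_params = (n_clusters - 1) + n_clusters * (
        (n_states - 1) + n_states * (n_states - 1))
    bic = -2.0 * log_lik + n_params * np.log(n_genes)
    # Assign each gene to the cluster with the highest posterior probability.
    return resp.argmax(axis=1), log_lik, bic

def simulate(trans, n_genes, n_time, rng):
    """Draw state sequences from a single Markov chain (synthetic data only)."""
    n_states = trans.shape[0]
    seqs = np.zeros((n_genes, n_time), dtype=int)
    seqs[:, 0] = rng.integers(0, n_states, size=n_genes)
    for t in range(1, n_time):
        for g in range(n_genes):
            seqs[g, t] = rng.choice(n_states, p=trans[seqs[g, t - 1]])
    return seqs

# Synthetic example: a "sticky" chain versus a "switching" chain.
rng = np.random.default_rng(1)
sticky = np.array([[0.9, 0.1], [0.1, 0.9]])
switching = np.array([[0.1, 0.9], [0.9, 0.1]])
seqs = np.vstack([simulate(sticky, 40, 20, rng),
                  simulate(switching, 40, 20, rng)])
truth = np.repeat([0, 1], 40)

labels, _, bic2 = fit_mcm_mixture(seqs, n_states=2, n_clusters=2)
_, _, bic1 = fit_mcm_mixture(seqs, n_states=2, n_clusters=1)
# Cluster labels are arbitrary, so score agreement up to relabeling.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
```

In this sketch the number of clusters would be chosen by fitting several candidate values and keeping the one with the lowest BIC. Real expression measurements are continuous, so a discretization step (or a continuous-state model) would be needed before such a sketch could be applied. The adjusted Rand index mentioned in the abstract is a separate, permutation-invariant agreement score and is not computed here.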
