Evangelia Komisopoulou

Clustering Methods For Accurate Background/Foreground Estimation in cDNA Microarray Images

Date: 07/12/2004

ABSTRACT

DNA microarray technologies allow the expression of tens of thousands of genes to be monitored simultaneously. Their applications range from the study of gene expression in yeast under different environmental stress conditions to the comparison of gene expression profiles of tumors in cancer patients. Image analysis is needed to extract and quantitatively characterize the relative abundance of mRNA in microarray images. Errors made in estimating the expression levels of individual genes propagate and may seriously compromise the results of subsequent processing steps that attempt to cluster genes according to observed similarities in expression patterns.

Microarray images challenge existing analysis methods in many ways, since gene spots often exhibit serious imperfections such as irregular contours, "donut" shapes, artifacts, and low or heterogeneous intensities. These problems, which are inherent in the microarray experiment, make assigning a single value to the foreground and background intensity of a gene spot a challenging and error-prone task. New approaches are needed to ensure the high accuracy of gene expression data.

The main goal of this study was to investigate the potential and limitations of simple unsupervised clustering methods for estimating the background and foreground intensities of microarray spots. A new method is proposed that processes each channel (red, green) independently and first applies 1D K-Means clustering. Then, based on the relation among the extracted class statistics, it separates the spots into three categories. Finally, depending on the category (which depends on whether an artifact is present and, if so, on its size), the most appropriate clustering algorithm (1D K-Means or 1D EM) and statistic (mode or median) is selected to provide trustworthy background and foreground intensity estimates for each spot. We tested the validity of our method on a large number of microarrays with spots of varying quality, comparing its results to the subjective "ground truth" and to estimates obtained with ScanAlyze, a public-domain software package written by Michael Eisen.
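The method is described here only at the level of an abstract. As a minimal sketch, assuming plain Python with only the standard library, the code below illustrates the two building blocks it names for one channel of one spot: 1D K-Means with two classes and a two-component 1D Gaussian-mixture EM, each followed by a median or mode summary of the resulting classes. Treating the darker class as background and the brighter one as foreground is an assumption, as are all function names (`kmeans_1d`, `em_gmm_1d`, `estimate_spot`), the quantile initialisation, the variance floor and the convergence tolerance; none of this is the thesis's implementation. The three-category rule that selects the algorithm and statistic is not reproduced, because the abstract does not specify it.

```python
# Hypothetical sketch of per-channel spot background/foreground estimation;
# not the thesis's code. The category rule that picks method/stat is omitted.
import math
import statistics


def kmeans_1d(values, k=2, iters=100):
    """Lloyd's K-Means on scalar pixel intensities.

    Centroids start at evenly spaced quantiles (an assumed initialisation).
    In 1D the centroids stay in ascending order, so label 0 is the darkest
    class (assumed background) and label k-1 the brightest (assumed foreground).
    """
    data = sorted(values)
    n = len(data)
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]

    def assign(cs):
        # Each pixel joins its nearest centroid (ties go to the lower index).
        return [min(range(k), key=lambda j: abs(x - cs[j])) for x in values]

    for _ in range(iters):
        labels = assign(centroids)
        new = []
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            # Update step: class mean (keep the old centroid if a class empties).
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return centroids, assign(centroids)


def em_gmm_1d(values, iters=200, tol=1e-8, var_floor=1e-6):
    """EM for a two-component 1D Gaussian mixture, initialised from K-Means.

    Returns (means, labels) with component 0 the darker one; labels are hard
    assignments by maximum responsibility.
    """
    xs = [float(x) for x in values]
    mu, init = kmeans_1d(xs, k=2)
    mu, var, w = list(mu), [], []
    for j in range(2):
        members = [x for x, lab in zip(xs, init) if lab == j] or xs
        var.append(max(statistics.pvariance(members), var_floor))
        w.append(len(members))
    w = [c / sum(w) for c in w]

    def pdf(x, m, v):
        return math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)

    prev_ll, resp = -math.inf, []
    for _ in range(iters):
        # E-step: posterior probability of each component for each pixel.
        resp, ll = [], 0.0
        for x in xs:
            p = [w[j] * pdf(x, mu[j], var[j]) for j in range(2)]
            s = sum(p) or 1e-300  # guard against underflow for extreme outliers
            resp.append([pj / s for pj in p])
            ll += math.log(s)
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            sq = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs))
            var[j] = max(sq / nj, var_floor)
            w[j] = nj / len(xs)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    if mu[0] > mu[1]:  # keep component 0 as the darker (background) class
        mu, labels = mu[::-1], [1 - lab for lab in labels]
    return mu, labels


def estimate_spot(pixels, method="kmeans", stat="median"):
    """Return (background, foreground) for one channel of one spot.

    Assumes both classes end up non-empty.
    """
    if method == "kmeans":
        _, labels = kmeans_1d(pixels, k=2)
    elif method == "em":
        _, labels = em_gmm_1d(pixels)
    else:
        raise ValueError(f"unknown method: {method}")
    summarise = {"median": statistics.median, "mode": statistics.mode}[stat]
    bg = [x for x, lab in zip(pixels, labels) if lab == 0]
    fg = [x for x, lab in zip(pixels, labels) if lab == 1]
    return summarise(bg), summarise(fg)


if __name__ == "__main__":
    # Hypothetical pixel intensities: dim background plus a bright spot.
    pixels = [100, 102, 98, 100, 101, 99, 100, 100, 1000, 1010, 990, 1000, 1005, 995]
    print(estimate_spot(pixels, "kmeans", "median"))  # (100.0, 1000.0)
    print(estimate_spot(pixels, "em", "mode"))  # (100, 1000)
```

In this sketch the caller chooses `method` and `stat` for each spot; in the proposed method that choice is driven by the spot's category, which the sketch deliberately leaves open. The median is the more robust summary when a class contains a minority of artifact pixels, while the mode tracks the most frequent intensity level.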

Thesis Committee:
Prof. D. Brooks,
Prof. J. Dy,
Prof. E.S. Manolakos (thesis advisor)