Computer Science Faculty Publications


Evaluation of Keyword Selection on Gene Clustering in Biomedical Literature Mining

Document Type

Conference Proceeding


Conference proceeding from the Fifth IASTED Conference on Computational Intelligence, August 2010, pp. 119-124.

We describe two statistical metrics, Z-score and a variant of the familiar TF-IDF, which are appropriate for identifying keywords associated with genes by mining a collection of MEDLINE® abstracts. We describe experiments in clustering genes based on the identified keyword features that different genes share with each other. The quality of clustering is measured by comparing the clusters generated by a clustering algorithm against expert-defined clusters. We evaluate the quality of clustering based on keyword features identified by the two different metrics, as well as combinations of the keywords derived from the metrics. We present these results and our analysis.
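The abstract does not give the formulas behind either metric, so the following is only a minimal sketch of how keyword scoring of this kind might look. It assumes a plain tf * log(N / df) weighting with each gene's pooled abstracts treated as one document (not necessarily the TF-IDF variant used in the paper), and a Z-score that compares the fraction of a gene's abstracts containing a word against the mean and standard deviation of that fraction across all genes. The function names, the whitespace tokenizer, and the Jaccard overlap used to compare keyword sets are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real pipelines filter stop words)."""
    return text.lower().split()


def tfidf_scores(gene_abstracts):
    """Score each (gene, word) as tf * log(N / df), pooling each gene's abstracts."""
    n_genes = len(gene_abstracts)
    tf = {g: Counter(w for a in docs for w in tokenize(a))
          for g, docs in gene_abstracts.items()}
    df = Counter()  # number of genes whose abstracts mention the word
    for counts in tf.values():
        df.update(counts.keys())
    return {g: {w: c * math.log(n_genes / df[w]) for w, c in counts.items()}
            for g, counts in tf.items()}


def z_scores(gene_abstracts):
    """Z-score of the fraction of a gene's abstracts containing a word,
    measured against the distribution of that fraction over all genes."""
    frac = {}
    for g, docs in gene_abstracts.items():
        containing = Counter()
        for a in docs:
            containing.update(set(tokenize(a)))
        frac[g] = {w: c / len(docs) for w, c in containing.items()}
    vocab = set().union(*frac.values())
    out = {g: {} for g in frac}
    for w in vocab:
        vals = [frac[g].get(w, 0.0) for g in frac]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        if std == 0:
            continue  # word is equally common for every gene, so it is uninformative
        for g in frac:
            if w in frac[g]:
                out[g][w] = (frac[g][w] - mean) / std
    return out


def top_keywords(scores, k):
    """Keep the k highest-scoring words per gene (ties broken alphabetically)."""
    return {g: set(sorted(ws, key=lambda w: (-ws[w], w))[:k])
            for g, ws in scores.items()}


def jaccard(a, b):
    """Overlap of two keyword sets; one possible similarity for clustering genes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented toy input with hypothetical gene names, for illustration only.
toy = {
    "geneA": ["kinase signaling pathway", "kinase activity"],
    "geneB": ["kinase phosphorylation pathway"],
    "geneC": ["membrane transport protein"],
}
keywords = top_keywords(tfidf_scores(toy), k=3)
similarity_ab = jaccard(keywords["geneA"], keywords["geneB"])
```

Pairwise similarities such as `similarity_ab` could feed any standard clustering algorithm, and the resulting clusters could then be compared against expert-defined groupings, which is the kind of evaluation the abstract describes.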

Publication Date

August 2010

Publication Title

Fifth IASTED Conference on Computational Intelligence


Publisher

ACTA Press
