Evaluation of Keyword Selection on Gene Clustering in Biomedical Literature Mining
Conference proceeding from the Fifth IASTED Conference on Computational Intelligence, August 2010, pp. 119-124.
We describe two statistical metrics, Z-score and a variant of the familiar TF-IDF, both suited to identifying keywords associated with genes by mining a collection of MEDLINE® abstracts. We then describe experiments in clustering genes based on the keyword features they share. Clustering quality is measured by comparing the clusters produced by a clustering algorithm against expert-defined clusters. We evaluate this quality for keyword features identified by each metric separately, as well as for combinations of the keywords derived from the two metrics, and we present the results and our analysis.
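As a rough illustration of the two kinds of keyword scoring named in the abstract, the following minimal Python sketch scores words per gene. It is not the paper's method: the abstract does not define its TF-IDF variant or its Z-score formula, so this sketch assumes the textbook TF-IDF (raw term count times log inverse gene frequency) and a Z-score of each word's relative frequency against its mean and standard deviation across genes. The function names, the `gene_docs` input layout, and the sample tokens are all hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(gene_docs):
    """Textbook TF-IDF per gene (an assumption, not the paper's variant).

    gene_docs maps a gene name to the list of tokens pooled from the
    abstracts associated with that gene.
    """
    n_genes = len(gene_docs)
    counts = {g: Counter(toks) for g, toks in gene_docs.items()}
    # Number of genes whose pooled abstracts contain each word.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    return {
        g: {w: tf * math.log(n_genes / df[w]) for w, tf in c.items()}
        for g, c in counts.items()
    }


def z_scores(gene_docs):
    """Z-score of each word's relative frequency for a gene, measured
    against that word's mean and population standard deviation across
    all genes (an assumed formulation)."""
    counts = {g: Counter(toks) for g, toks in gene_docs.items()}
    totals = {g: max(len(toks), 1) for g, toks in gene_docs.items()}
    vocab = set().union(*counts.values())
    scores = {g: {} for g in gene_docs}
    for w in vocab:
        freqs = {g: counts[g][w] / totals[g] for g in gene_docs}
        mean = sum(freqs.values()) / len(freqs)
        var = sum((f - mean) ** 2 for f in freqs.values()) / len(freqs)
        std = math.sqrt(var)
        for g, f in freqs.items():
            # A word with identical frequency in every gene carries no signal.
            scores[g][w] = (f - mean) / std if std > 0 else 0.0
    return scores


def top_keywords(word_scores, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    return sorted(word_scores, key=lambda w: (-word_scores[w], w))[:k]


if __name__ == "__main__":
    # Hypothetical toy data standing in for tokenized MEDLINE abstracts.
    docs = {
        "BRCA1": ["dna", "repair", "dna", "cancer"],
        "TP53": ["apoptosis", "cancer", "cancer", "cycle"],
        "INS": ["glucose", "insulin", "glucose", "cancer"],
    }
    for gene, scores in tfidf_scores(docs).items():
        print(gene, top_keywords(scores, 2))
```

The keyword sets selected by either score (or their union or intersection) could then serve as the feature vectors on which genes are clustered. Which combination works best is the question the paper evaluates.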
Dasigi, Venu G.; Karam, O.; and Pydimarri, S., "Evaluation of Keyword Selection on Gene Clustering in Biomedical Literature Mining" (2010). Computer Science Faculty Publications. 15.