College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
中文摘要
In traditional data clustering, similarity of a cluster of objects is measured by distance between objects. Such measures are not appropriate for categorical data. A new clustering criterion to determine the similarity between points with categorical attributes is presented. Furthermore, a new clustering algorithm for categorical attributes is addressed. A single scan of the dataset yields a good clustering, and more additional passes can be used to improve the quality further.
In traditional data clustering, similarity of a cluster of objects is measured by distance between objects. Such measures are not appropriate for categorical data. A new clustering criterion to determine the similarity between points with categorical attributes is presented. Furthermore, a new clustering algorithm for categorical attributes is addressed. A single scan of the dataset yields a good clustering, and more additional passes can be used to improve the quality further.