Remember Me
Or use your Academic/Social account:


Or use your Academic/Social account:


You have just completed your registration at OpenAire.

Before you can login to the site, you will need to activate your account. An e-mail will be sent to you with the proper instructions.


Please note that this site is currently undergoing Beta testing.
Any new content you create is not guaranteed to be present to the final version of the site upon release.

Thank you for your patience,
OpenAire Dev Team.

Close This Message


Verify Password:
Verify E-mail:
*All Fields Are Required.
Please Verify You Are Human:
fbtwitterlinkedinvimeoflicker grey 14rssslideshare1
Pettinger, David; Di Fatta, Giuseppe (2009)
Languages: English
Types: Unknown
Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of its size and dimensionality, it is necessary to have efficient clustering methods. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K-clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree) to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation for the KD-Tree based K-Means algorithm and address its load balancing issues.
  • The results below are discovered through our pilot algorithms. Let us know how we are doing!

    • [1] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264-323, 1999.
    • [2] M. Baker, B. Carpenter, and A. Shafi. MPJ Express: Towards thread safe Java HPC. Proceedings of the IEEE International Conference on Cluster Computing, 2006.
    • [3] J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509-517, 1975.
    • [4] P. S. Bradley and U. M. Fayyad. Refining initial points for kmeans clustering. Proceedings of Fifteenth Intl. Conference on Machine Learning, pages 91-99, 1998.
    • [5] D. Foti, D. Lipari, C. Pizzuti, and D. Talia. Scalable parallel clustering for data mining on multicomputers. Proceedings of the 15th IPDPS 2000 Workshops on Parallel and Distributed Processing, pages 390-398, 2000.
    • [6] D. Judd, P. K. McKinley, and A. K. Jain. Large-scale parallel data clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):871-876, Aug. 1998.
    • [7] I. S. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In LargeScale Parallel Data Mining, Lecture Notes in Computer Science, 1759:245-260, Mar. 2000.
    • [8] J.B. Kruskal and M. Wish. Multidimensional Scaling. Sage University Paper series on Quantitative Applications in the Social Sciences, Beverly Hills and London: Sage Publications, 07-011, 1978.
    • [9] K. Alsabti, S. Ranka, and V. Singh. An efficient k-means clustering algorithm. Proceedings of First Workshop on High Performance Data Mining, 1998.
    • [10] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability.
    • [11] A. W. Moore. Efficient memory based learning for robot control. PhD thesis, 1991.
    • [12] D. Pelleg and A. Moore. Accelerating exact k-means algorithms with geometric reasoning. Proceedings of Fifth ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining, pages 277-281, July 1999.
    • [13] D. Pelleg and A. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. Proceedings of the Seventeenth Intl. Conference on Machine Learning, pages 727-734, 2000.
    • [14] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):881- 892, July 2002.
  • No related research data.
  • No similar publications.

Share - Bookmark

Cite this article