Hassan, Diman; Aickelin, Uwe; Wagner, Christian (2014)
Languages: English
Types: Unknown
Subjects: Computer Science - Computational Engineering, Finance, and Science, Computer Science - Databases
Distance metrics are widely used across research areas and applications such as bioinformatics and data mining. Some metrics, such as the pq-gram and Edit Distance metrics, are used specifically for data with a hierarchical structure; others, such as the geometric and Hamming metrics, are used for non-hierarchical data. We have applied these metrics to The Health Improvement Network (THIN) database, which contains some hierarchical data. For the first group of metrics, the THIN data has to be converted into a tree-like structure; for the second group, the data are converted into a frequency table or matrix. For all metrics, all distances are then computed and normalised. Based on this particular data set, our research question is: which of these metrics is useful for THIN data? This paper compares the metrics, particularly the pq-gram metric, in finding similarities between patients' data. It also investigates similar patients, that is, those with the same close distances, as well as the metrics' suitability for clustering the whole patient population. Our results show that the two groups of metrics perform differently because they represent different structures of the data. Nevertheless, all the metrics could capture some similar patient data and discriminate sufficiently well when clustering the patient population with the $k$-means clustering algorithm.
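The non-hierarchical pipeline described in the abstract (frequency table, pairwise distances, normalisation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the toy records and codes are hypothetical and are not THIN data, the Hamming distance is taken as the number of table columns in which two patients' counts differ, and normalisation divides by the largest observed distance, a choice the abstract does not specify.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy records (patient id -> list of codes); not THIN data.
records = {
    "p1": ["A01", "A01", "B02"],
    "p2": ["A01", "B02", "B02"],
    "p3": ["C03"],
}


def frequency_table(records):
    """Return (sorted code vocabulary, {patient: count vector})."""
    codes = sorted({code for items in records.values() for code in items})
    table = {}
    for patient, items in records.items():
        counts = Counter(items)
        table[patient] = [counts[code] for code in codes]
    return codes, table


def euclidean(u, v):
    """Geometric (Euclidean) distance between two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hamming(u, v):
    """Number of positions at which the two count vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def normalised_distances(table, metric):
    """Pairwise distances scaled into [0, 1] by the largest distance.

    Dividing by the maximum is an assumed normalisation scheme.
    """
    raw = {
        (p, q): metric(table[p], table[q])
        for p, q in combinations(sorted(table), 2)
    }
    largest = max(raw.values()) or 1  # avoid dividing by zero
    return {pair: d / largest for pair, d in raw.items()}


codes, table = frequency_table(records)
for name, metric in (("euclidean", euclidean), ("hamming", hamming)):
    print(name, normalised_distances(table, metric))
```

In this toy example both metrics place p1 and p2 closer to each other than either is to p3, which shares no codes with them. The normalised distance matrices could then be passed to a clustering step such as $k$-means.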