Chika, Nwagwu Honour
Languages: English
Types: Doctoral thesis
Semantic and traditional databases are vulnerable to Inconsistent or Incomplete Data (IID). A data set stored in a traditional or semantic database is queried to retrieve records in a tabular format, where each row contains an object and its associated fields (columns). However, a large set of records retrieved from a noisy data set may be wrongly analysed. For example, a data analyst who fails to notice inconsistency or incompleteness in the data may treat inconsistent data as consistent, or incomplete data as complete. The presence of IID can therefore undermine the analysis of a large data set, and reliance is placed on the data analyst to identify and visualise the IID in it.

The IID issues are heightened under the open world assumption, as is evident in semantic or Resource Description Framework (RDF) databases. Under the closed world assumption of traditional databases, data are assumed to be complete, an assumption with its own issues; under the open world assumption, missing data may simply be unknown, so IID has to be tolerated from the outset. Formal Concept Analysis (FCA) can be used to deal with IID in such databases because it is a mathematical method that uses a lattice structure to reveal the associations among the objects and attributes in a data set.

The existing FCA approaches that can be used to deal with IID in RDF databases include fault tolerance, Dau's approach, and the CUBIST approaches. The new FCA approaches, developed in the course of this study, include association rules and the semi-automated and automated methods in FcaBedrock. To underpin this work, a series of empirical studies was carried out based on the single case study methodology. The case study, the Edinburgh Mouse Atlas Gene Expression Database (EMAGE), provided the real-life context required by that methodology.
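The FCA notions mentioned above (objects, attributes, and the concepts that the lattice relates) can be illustrated with a small sketch. It assumes a hypothetical toy context of genes and tissues, not EMAGE data, and enumerates concepts naively by closing every attribute subset; dedicated concept miners are far more efficient. The `rule_confidence` helper shows the usual confidence measure for an association rule between attribute sets; it is an illustration of the general idea, not the association-rule approach developed in the thesis.

```python
from itertools import combinations

# Hypothetical toy formal context (illustration only, not EMAGE data):
# each object maps to the set of binary attributes it has.
CONTEXT = {
    "gene1": {"heart", "liver"},
    "gene2": {"heart"},
    "gene3": {"liver", "brain"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def extent(attrs):
    """Objects that have every attribute in attrs (the derivation of an attribute set)."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def intent(objs):
    """Attributes shared by every object in objs (the derivation of an object set)."""
    if not objs:
        return frozenset(ATTRIBUTES)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def formal_concepts():
    """Naively enumerate all (extent, intent) concepts by closing every attribute subset."""
    concepts = set()
    attrs = sorted(ATTRIBUTES)
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            ext = extent(frozenset(subset))
            concepts.add((ext, intent(ext)))
    return concepts


def rule_confidence(premise, conclusion):
    """Confidence of premise -> conclusion: objects having both / objects having the premise."""
    premise_support = len(extent(frozenset(premise)))
    if premise_support == 0:
        return 0.0
    both = extent(frozenset(premise) | frozenset(conclusion))
    return len(both) / premise_support


for ext, itt in sorted(formal_concepts(), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(itt))
print(rule_confidence({"liver"}, {"heart"}))
```

Ordering these concepts by inclusion of their extents yields the concept lattice. A rule whose confidence falls just short of 1 points at a few objects that break an otherwise regular pattern, which is one plausible place to look for inconsistent or incomplete records.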
The existing and new FCA approaches were used to identify and visualise the IID in the EMAGE RDF data set. The empirical studies revealed that the existing approaches to dealing with IID in EMAGE are tedious and do not allow the IID to be easily visualised in the database. They also revealed that the existing FCA approaches for dealing with IID do not visualise the IID in a data set exclusively. The new FCA approaches, notably the semi-automated and automated FcaBedrock methods, differ in this respect: they can separate out, and thus exclusively visualise, the IID in objects associated with the many-valued attributes that characterise such data sets. Exclusive visualisation of the IID enables data analysts to identify the IID in their data set holistically and thereby avoid mistaken conclusions.

The aim was to discover how effective each FCA approach is in identifying and visualising IID, answering the research question: "How can FCA tools and techniques be used in identifying and visualising IID in RDF data?" The automated FcaBedrock approach emerged as the best means of visually identifying IID in an RDF data set. The CUBIST approaches and the semi-automated approach ranked 2nd and 3rd, respectively, whilst Dau's approach ranked 4th. Although the subject of IID in a semantic technology setting could be explored further, it can be concluded that the automated FcaBedrock approach best identifies and visualises the IID in an RDF, and thus semantic, data set.
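The idea of separating out IID in objects with many-valued attributes can be sketched by converting many-valued data into binary FCA attributes and adding explicit markers for missing or conflicting values. This is a minimal sketch under stated assumptions, not FcaBedrock's actual implementation: the triples are hypothetical, "incomplete" is taken to mean no value for an attribute, "inconsistent" is taken to mean more than one value for an attribute assumed to be single-valued, and the `attr:MISSING` / `attr:INCONSISTENT` naming is invented for illustration.

```python
from collections import defaultdict

# Hypothetical (object, attribute, value) triples, e.g. as they might be
# extracted from an RDF data set. Illustration only, not EMAGE data.
TRIPLES = [
    ("assay1", "stage", "TS12"),
    ("assay1", "strength", "strong"),
    ("assay2", "stage", "TS14"),      # no "strength" value: incomplete
    ("assay3", "stage", "TS12"),
    ("assay3", "stage", "TS15"),      # two stages: inconsistent (assumed single-valued)
    ("assay3", "strength", "weak"),
]
MANY_VALUED = ["stage", "strength"]


def scale(triples, attributes):
    """Nominally scale many-valued attributes into binary 'attr:value' attributes,
    adding hypothetical 'attr:MISSING' and 'attr:INCONSISTENT' markers for IID."""
    values = defaultdict(lambda: defaultdict(set))
    for obj, attr, val in triples:
        values[obj][attr].add(val)
    context = {}
    for obj, by_attr in values.items():
        binary = set()
        for attr in attributes:
            vals = by_attr.get(attr, set())
            if not vals:
                binary.add(f"{attr}:MISSING")
            elif len(vals) > 1:
                binary.add(f"{attr}:INCONSISTENT")
                binary.update(f"{attr}:{v}" for v in vals)
            else:
                binary.add(f"{attr}:{next(iter(vals))}")
        context[obj] = binary
    return context


def iid_objects(context):
    """Objects carrying at least one IID marker attribute, sorted by name."""
    markers = (":MISSING", ":INCONSISTENT")
    return sorted(o for o, attrs in context.items()
                  if any(a.endswith(markers) for a in attrs))


ctx = scale(TRIPLES, MANY_VALUED)
for obj, attrs in ctx.items():
    print(obj, sorted(attrs))
print(iid_objects(ctx))
```

Because each marker becomes an ordinary attribute of the binary context, the concept generated by, say, `strength:MISSING` groups exactly the objects lacking that value, so a lattice drawn over the marker attributes alone shows the IID separately from the rest of the data.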