Danilova, N.; Stupples, D. (2012)
Publisher: IEEE
Languages: English
Types: Article
Subjects: QA75
The quality of decisions made in business and government relates directly to the quality of the information used to formulate them. This information may be retrieved from an organization's knowledge base (Intranet) or from the World Wide Web. Information held on an intelligence service's Intranet can be manipulated efficiently by technologies based on either semantics, such as ontologies, or statistics, such as meaning-based computing. These technologies require complex processing of large amounts of textual information. However, they cannot currently be applied effectively to Web-based search because of various obstacles, such as the lack of semantic tagging. The new approach proposed in this paper supports Web-based search for intelligence information using evidence-based natural language processing (NLP). It combines traditional NLP methods for filtering Web-search results, Grounded Theory to test the completeness of the evidence, and Evidential Analysis to test the quality of the gathered information. The enriched information derived from the Web search is then transferred to the intelligence service's knowledge base for handling by an effective Intranet search system, substantially increasing the information available for intelligence analysis. The paper will show that the quality of retrieved information is significantly enhanced by the discovery of previously unknown facts derived from known facts.
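The abstract does not say which traditional NLP methods are used to filter Web-search results, so the following is only a minimal sketch of one common choice: remove stop words, build term-frequency vectors, and keep the results whose cosine similarity to the query exceeds a threshold. The function names, the tiny stop list, and the threshold of 0.2 are all assumptions made for illustration, not details taken from the paper.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; a real list would be much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "by", "with"}


def term_vector(text):
    """Lower-case the text, tokenise on letters/digits, drop stop words, count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(u, v):
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_results(query, snippets, threshold=0.2):
    """Return the snippets scoring at least `threshold`, most similar first.

    `threshold` is a hypothetical tuning parameter chosen for this sketch.
    """
    q = term_vector(query)
    scored = [(cosine(q, term_vector(s)), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored if score >= threshold]


if __name__ == "__main__":
    results = [
        "Semantic tagging of web pages",
        "Cheap flights to London",
        "Web search for intelligence analysis",
    ]
    for snippet in filter_results("web search intelligence", results):
        print(snippet)
```

In this sketch the unrelated snippet shares no terms with the query and is dropped, while the two snippets that overlap with the query are kept and ordered by similarity. The Grounded Theory and Evidential Analysis stages described in the abstract are not modelled here.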
