Barresi, S
Languages: English
Types: Doctoral thesis
The advances in data collection and the increasing amount of unstructured and unlabeled text documents have led to the need for better disambiguation and indexing techniques, which allow for the effective and intelligent organisation of large amounts of documents into a small number of significant clusters, facilitating the analysis, browsing, and searching of document collections. Traditionally, document clustering systems have relied on bag-of-words and term frequency approaches to represent and subsequently classify documents, by only taking into account document syntax and with no consideration for semantic aspects. To address this issue, more complex indexing and clustering techniques, which consider the semantic associations between the words contained in a document and differentiate the degree of semantic importance of terms during the classification process, need to be further investigated in order to enable appropriate and automatic contextualisation of text documents and information.

This research proposes a new indexing technique, which can be used to effectively represent, and subsequently cluster, collections of unstructured or structured documents. The presented technique aims at overcoming some of the major problems related to the bag-of-words approach, such as its lack of consideration for synonyms as well as its usual failure in differentiating the degree of semantic importance of terms. The main idea behind the proposed technique is to map each document into a lower dimensional space by considering the semantic associations between the words contained in the document. To address the semantic problems posed by traditional indexing, the investigated method focuses on word sense disambiguation and document concepts. The proposed technique extracts concepts from documents and uses a set of these concepts as indexing units, achieving vector dimensionality reduction as well as more cohesive and separated clusters. Good results are also achieved in terms of purity and entropy, and when compared with similar studies in the field of semantic-based concept indexing.
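The synonym problem mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's technique: the word-to-concept table `CONCEPTS`, the helper names, and the two sample documents are all hypothetical, and a simple dictionary lookup stands in for the word sense disambiguation step. It only shows why two documents that share no words look unrelated under bag-of-words yet similar once words are mapped to shared concepts.

```python
from collections import Counter
from math import sqrt

# Hypothetical word-to-concept table (an assumption for illustration only).
# A real system would obtain concepts through word sense disambiguation.
CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "fast": "speed",
    "quick": "speed",
}


def bag_of_words(text):
    """Represent a document as raw term counts."""
    return Counter(text.lower().split())


def concept_index(text):
    """Represent a document as counts of the concepts its words map to.

    Words without an entry in CONCEPTS are dropped, so the vector lives
    in a smaller space than the bag-of-words vector.
    """
    return Counter(CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


doc1 = "fast car"
doc2 = "quick automobile"

# No shared terms, so the bag-of-words similarity is zero.
print(cosine(bag_of_words(doc1), bag_of_words(doc2)))

# Both documents map to the same two concepts, so similarity is close to 1.
print(cosine(concept_index(doc1), concept_index(doc2)))
```

In this toy setting the four distinct terms collapse into two concepts, which is a miniature version of the dimensionality reduction the abstract describes.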
