Sanchez Garcia, Moises Noe
Languages: English
Types: Doctoral thesis
Subjects: TJ
Data mining refers to the automation of data analysis to extract patterns from large amounts of data. A major breakthrough in modelling natural patterns is the recognition that nature is fractal, not Euclidean. Fractals are capable of modelling self-similarity, infinite detail, infinite length and the absence of smoothness. This research was aimed at simplifying the discovery and detection of groups in data using fractal dimension. Three data mining tasks were addressed efficiently: the first defines groups of instances (clustering), the second selects useful features from non-defined (unsupervised) groups of instances, and the third selects useful features from pre-defined (supervised) groups of instances. Improvements are shown on two data mining classification models: hierarchical clustering and Artificial Neural Networks (ANN). For clustering tasks, a new two-phase clustering algorithm based on the Fractal Dimension (FD), compactness and closeness of clusters is presented. The proposed method, which uses self-similarity properties of the data, first divides the data into sufficiently large sub-clusters with high compactness. In the second phase, the algorithm merges the sub-clusters that are close to each other and have similar complexity. The final clusters are obtained in a natural and fully deterministic way. The selection of different feature subspaces leads to different cluster interpretations. An unsupervised embedded feature selection algorithm, able to detect relevant and redundant features, is presented. This algorithm is based on the concept of fractal dimension. The relevance of each feature is quantified using a newly proposed entropy measure, which is less complex than the current state-of-the-art technology. The proposed algorithm is able to maintain, and in some cases improve, the quality of the clusters in reduced feature spaces.
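To make the notion of fractal dimension concrete, the sketch below estimates it by box counting: count the occupied grid cells N(r) at several cell sizes r and take the slope of log N(r) against log(1/r). This is only an illustration under stated assumptions. The abstract does not say which FD estimator the thesis uses, and the names `box_count`, `box_counting_dimension` and the sample data are hypothetical.

```python
import math
import random


def box_count(points, box_size):
    """Count the grid cells of side `box_size` that contain at least one point."""
    return len({tuple(math.floor(c / box_size) for c in p) for p in points})


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(r) versus log(1/r) over the given box sizes."""
    xs = [math.log(1.0 / r) for r in box_sizes]
    ys = [math.log(box_count(points, r)) for r in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical sample data: a diagonal line and a filled unit square in 2-D.
rng = random.Random(0)
line = [(t, t) for t in (rng.random() for _ in range(5000))]
square = [(rng.random(), rng.random()) for _ in range(5000)]

sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
fd_line = box_counting_dimension(line, sizes)      # close to 1
fd_square = box_counting_dimension(square, sizes)  # close to 2
```

Although both data sets have two features, the line yields a dimension near 1 and the square a dimension near 2, which is the sense in which FD measures the intrinsic complexity of a cluster rather than the number of features that describe it.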
For supervised feature selection in classification, a new algorithm is proposed that simultaneously maximises the relevance and minimises the redundancy of the features. This algorithm combines the FD and Mutual Information (MI) techniques to create a new measure of feature usefulness and to produce a simpler, non-heuristic algorithm. The similar nature of the two techniques, FD and MI, makes the proposed algorithm more suitable for a straightforward global analysis of the data.
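The relevance and redundancy trade-off can be illustrated with mutual information alone. The sketch below is a generic greedy selection in the style of max-relevance, min-redundancy methods, not the combined FD and MI measure proposed in the thesis, whose definition the abstract does not give. The function names and the toy data are hypothetical, and features are assumed to be discrete.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(features, labels, k):
    """Greedily pick k features, scoring each candidate as its relevance to the
    labels minus its mean redundancy with the features already selected."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: the class is determined jointly by "a" and "b";
# "a_copy" duplicates "a" and "c" carries no information about the class.
features = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b":      [0, 0, 1, 1, 0, 0, 1, 1],
    "c":      [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [0, 0, 1, 1, 2, 2, 3, 3]

chosen = select_features(features, labels, 2)
```

On this toy data, `a_copy` is as relevant to the class as `a` but fully redundant with it, so once `a` has been chosen the second pick is `b`, while the uninformative `c` is never preferred.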
  • The results below are discovered through our pilot algorithms.

    • 2.2.2 Number of clusters
    • 2.2.3 Cluster Validity
    • 2.3 Feature Selection Approaches
    • 2.3.1 Filter Approach
    • 2.3.2 Wrapper Approach
    • 2.3.3 Embedded approach
    • 2.4 Unsupervised Feature Selection Approaches
    • 2.5 Unsupervised Feature Relevance
    • 2.6 Statistical Entropy
    • 2.6.1 Entropy in Data Mining
    • 2.6.2 Entropy as Relevance Measure
    • 2.7 Search Techniques Overview
    • 2.7.1 Sequential Selection Algorithms (SSA)
    • 2.7.1 Second generation of SSA
    • 2.8 Relevance in Features
    • 2.8.1 Relevance Approaches in FS
    • 2.8.2 Relevance versus Optimality
    • 2.9 Feature Redundancy
    • 2.10 Mutual Information
    • 2.11 Redundancy Feature Analysis
    • 2.10 Redundancy Feature Analysis
    • 2.11 Summary
    • 3.3 Fractal Clustering
    • 3.4 Proposed Algorithm
    • 3.4.1 Second Phase
    • 3.4.2 Stopping Criterion
    • 3.4.3 Number of clusters
    • 3.4.4 Refining Step
    • 3.4.5 Evaluation Measure