Ren, L. (2008)
Languages: English
Types: Doctoral thesis
Subjects: T
Despite the success of connectionist systems in prediction and classification problems, critics argue that their lack of symbol processing and explanation capability makes them less competitive than symbolic systems. Rule extraction from neural networks makes the interpretation of the behaviour of connectionist networks possible by relating sub-symbolic and symbolic processing. However, most rule extraction methods focus only on specific neural network architectures and show limited generalization performance. The Support Vector Machine is a supervised learning method that has recently been applied successfully in many areas, and offers excellent generalization ability in comparison with other neural network, statistical, or symbolic machine learning models. In this thesis, an algorithm called Geometric and Oracle-Based Support Vector Machines Rule Extraction (GOSE) is proposed to overcome the limitations of other rule-extraction methods by extracting comprehensible models from Support Vector Machines (SVMs). This algorithm views extraction as a geometric task: given a trained SVM network, GOSE queries synthetic instances and derives conjunctive rules by approximating an optimization problem. The extracted rule set also represents an approximation of the SVM classification boundary. Unlike previous work on SVM rule extraction, GOSE is broadly applicable to different networks and problems because it relies on neither training examples nor network architectures. A theoretical proof guarantees that GOSE is capable of approximating the behaviour of SVM networks. Empirical experiments are conducted on different SVM networks, from binary to multi-class classifiers, in various classification domains. The experimental results demonstrate that GOSE can extract comprehensible rules with high levels of accuracy and fidelity with respect to their corresponding networks. GOSE also exhibits superior consistency.

After analyzing and applying several optimizing measures, the complexity of GOSE was improved. In brief, GOSE provides a novel way to explain how an SVM network functions.
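The oracle-based querying idea the abstract describes can be illustrated with a minimal sketch. This is not the GOSE algorithm itself: here a hand-rolled linear decision function stands in for a trained SVM oracle, and each class is summarized by an axis-aligned bounding box over the queried synthetic instances, which reads off as one conjunctive interval rule per class. All names and the toy boundary are assumptions for illustration only.

```python
# Hypothetical sketch of oracle-based rule extraction: label synthetic
# points with a black-box classifier, then summarize each class as an
# axis-aligned box, i.e. a conjunctive interval rule.
import random

random.seed(0)

def oracle(x):
    # Stand-in for a trained SVM's predict() (assumed toy boundary:
    # class 1 iff x0 + x1 > 1).
    return 1 if x[0] + x[1] > 1.0 else 0

# Query the oracle on synthetic instances drawn over the input domain.
samples = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(1000)]

boxes = {}  # class label -> ([min per dim], [max per dim])
for x in samples:
    y = oracle(x)
    lo, hi = boxes.setdefault(y, ([1.0, 1.0], [0.0, 0.0]))
    for d in range(2):
        lo[d] = min(lo[d], x[d])
        hi[d] = max(hi[d], x[d])

# Each box is a conjunction: IF lo0 <= x0 <= hi0 AND lo1 <= x1 <= hi1 THEN class.
for label, (lo, hi) in sorted(boxes.items()):
    conds = " AND ".join(
        f"{l:.2f} <= x{d} <= {h:.2f}" for d, (l, h) in enumerate(zip(lo, hi))
    )
    print(f"IF {conds} THEN class {label}")
```

Because the method only calls the oracle, it is independent of the classifier's internals, which mirrors the architecture-independence claim above; the real algorithm refines regions against the decision boundary rather than taking raw bounding boxes, which overlap.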
