Publisher: Springer
Languages: English
Types: Part of book or chapter of book
Subjects: chem, aintel, ge
It is increasingly clear that machine learning algorithms need to be integrated into an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and the computer provides guidance on the experiments being performed. In this chapter, we summarise several key challenges in integrating machine learning and data mining algorithms into methods for the discovery of Quantitative Structure-Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss how to represent molecular data so that knowledge discovery tools can analyse it; and we discuss how machine learning and data mining algorithms can be adapted to guide QSAR experiments.