Publisher: IEEE
Languages: English
Types: Unknown
Subjects: QH426, QA75
This paper presents a novel approach based on the analysis of genetic variants from publicly available genetic profiles and a manually curated database, the National Human Genome Research Institute Catalog. Using data science techniques, genetic variants are identified in the collected participant profiles and then indexed as risk variants against the National Human Genome Research Institute Catalog. The indexed genetic variants, or Single Nucleotide Polymorphisms (SNPs), are used as inputs to various machine learning algorithms for the prediction of obesity. The body mass index status of participants is divided into two classes, Normal Class and Risk Class. Dimensionality reduction is performed to generate a set of principal variables (13 SNPs) to which the machine learning methods are applied. The models are evaluated using receiver operating characteristic curves and the area under the curve. Gradient boosting, generalized linear models, classification and regression trees, K-nearest neighbours, support vector machines, random forests and a multilayer neural network are comparatively assessed in terms of their ability to identify the most important factors among the initial 6622 variables describing genetic variants, age and gender, and to classify a subject into one of the body mass index related classes defined in this study. Our simulation results indicate that the support vector machine achieved a high accuracy of 90.5%.
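The two-class labelling and the evaluation by area under the ROC curve described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the body mass index cutoff separating the Normal Class from the Risk Class, so the threshold of 25 kg/m^2 (the WHO overweight cutoff) is an assumption, and the function names, toy BMI values and classifier scores are hypothetical.

```python
from typing import Sequence

# Assumption: BMI >= 25 kg/m^2 (WHO overweight cutoff) marks the Risk Class.
# The abstract does not give the cutoff actually used in the study.
RISK_BMI_THRESHOLD = 25.0


def bmi_class(bmi: float, threshold: float = RISK_BMI_THRESHOLD) -> int:
    """Return 1 for the Risk Class and 0 for the Normal Class."""
    return 1 if bmi >= threshold else 0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Computes the fraction of (risk, normal) pairs in which the risk subject
    receives the higher score; tied scores count as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: BMI values and scores from some fitted classifier.
bmis = [21.4, 31.0, 24.9, 27.3, 19.8, 35.2]
scores = [0.10, 0.80, 0.40, 0.35, 0.20, 0.90]

labels = [bmi_class(b) for b in bmis]
auc = roc_auc(labels, scores)

# Accuracy at a 0.5 decision threshold (also an assumption of this sketch).
accuracy = sum(int(s >= 0.5) == y for y, s in zip(labels, scores)) / len(labels)

print(f"labels={labels} auc={auc:.3f} accuracy={accuracy:.3f}")
```

The pairwise formulation is used here only because it needs no external library; it gives the same value as integrating the ROC curve. Any of the classifiers named in the abstract could supply the scores.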