Enhancing food & cosmetics: OpenAIRE Research Graph for consumer health
Ingredio is a mobile app by DENTICA LTD that allows consumers to capture the ingredients on a cosmetics or food product label with their smartphone and informs them within seconds about potential hazards of those ingredients, based on open peer-reviewed data. Ingredio serves the growing share of consumers who want to know what the products they use contain and who embrace healthy eating and cosmetics free of hazardous chemicals; it also serves companies that want to use healthier and greener ingredients in their products. Ingredio puts a chemical library in consumers' hands through a unique mobile app: a powerful, fast and reliable guide to greener, cleaner and healthier food and cosmetics choices. The app can be used without geographical restrictions for chemical ingredients listed in English.
The purpose of this work is to use and expand the Ingredio technology by working with the OpenAIRE Research Graph, exploiting the more than 30 million full-text research results provided by OpenAIRE (publications, patents and products, covering datasets, software and other types of output), in order to generate richer information on the chemical ingredients of food and cosmetics, taking advantage of the OpenAIRE APIs and available technical support. The final aim is to enrich the OpenAIRE Research Graph with new linked data that can be used seamlessly by consumers who embrace a healthy lifestyle, by organic product companies, and by companies that want to produce safer products and improve their practices.
Consumers are increasingly worried about the potential hazards of chemical ingredients in the products they use every day. Currently, there is no easy way for them to learn about the potential risks of product ingredients, either because the chemical names are too complex or because the ingredients are encoded on the label (e.g. E302). Although this information is publicly available, retrieving it is challenging for consumers because the sources are complex, and understanding it is difficult because the descriptions are highly technical.
Ingredio also offers its services to companies that want to promote organic products and a healthy lifestyle and/or provide greener, environmentally friendly products to consumers.
How it works
We offer a unique solution that exploits the open data in the OpenAIRE Research Graph to find correlations between chemical ingredients found in food and cosmetics and allergies, toxicity and irritation, in order to inform consumers about the potential hazards these ingredients pose to their health. Combined with our text-mining expertise, the data adds value for all stakeholders in the food and cosmetics industry, from consumers to leading food and cosmetics manufacturers and small organic consumer-goods producers. By bringing this information to light, we enhance the OpenAIRE Research Graph and further improve our own text-mining, machine learning and web-crawling algorithms.
Objective 1. Develop text mining and Machine Learning algorithms to extract OpenAIRE data that link chemical ingredients of food and cosmetics to allergies, irritation, cancer, and toxicity.
We used the OpenAIRE data model and its entities, such as the CERIF semantic layer, objects, data sources, linking entities and types, to define structured values for entity properties. We built a machine learning algorithm that classifies documents describing chemical toxicity in food and cosmetics consumer products, and used the OpenAIRE Research Graph to retrieve the documents of interest containing relevant toxicity data. As a training set, we used information from our own database as well as relevant peer-reviewed literature from PubMed. The classifier achieved 92% recall in validation; we then applied it to the OpenAIRE Research Graph dump publications.gz and collected data on potential hazards of food and cosmetics chemical ingredients. We merged the new information into Ingredio's existing database of 8,500 chemicals and made it available to European consumers through our app. Following the OpenAIRE metadata schemas, we created a dump with this information so that it can be integrated into the OpenAIRE Research Graph in line with OpenAIRE standards and guidelines for interoperability between systems and for findability and accessibility across infrastructure boundaries, ensuring transparent and reliable sharing and reuse of research. The algorithms were built to support long-term automation: we will periodically scrape and mine the OpenAIRE Research Graph, since it already holds more than 30 million entries and keeps growing.
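The text reports the classifier's recall but not which learning algorithm it uses. As a minimal sketch of the train-then-validate workflow, assuming a multinomial naive Bayes classifier swapped in purely for illustration, the code below trains on labeled abstracts and computes recall (true positives divided by all truly relevant documents) on a held-out set. The texts, the labels `"toxicity"`/`"other"` and all function names are hypothetical, and the toy data are far too small to say anything about real performance.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens; a deliberately simple stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing (illustrative choice)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n_docs)  # class prior
            for w in tokenize(doc):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def recall(y_true, y_pred, positive="toxicity"):
    """Fraction of truly positive documents that the classifier labels positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical toy training and validation data.
train_docs = [
    "parabens caused skin irritation and allergic dermatitis in patients",
    "sodium nitrite exposure increased cancer risk in rats",
    "formaldehyde releasing preservatives are toxic and irritating",
    "survey of consumer shopping habits in supermarkets",
    "marketing strategies for online retail platforms",
    "a new algorithm for scheduling delivery trucks",
]
train_labels = ["toxicity"] * 3 + ["other"] * 3
model = NaiveBayes().fit(train_docs, train_labels)

val_docs = ["methylparaben induced allergic contact dermatitis", "retail pricing survey"]
val_labels = ["toxicity", "other"]
preds = [model.predict(d) for d in val_docs]
print(preds, recall(val_labels, preds))  # ['toxicity', 'other'] 1.0
```

Recall is a sensible metric to emphasize for this kind of screening, because a missed hazard paper is lost for good while an irrelevant one can still be filtered out later; in practice it would be reported together with precision.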
Objective 2. Support the curation of OpenAIRE data with appropriate metadata schemas for efficient integration of information into the OpenAIRE Research Graph. Metadata was used as defined by OpenAIRE to make scientific results findable and accessible across infrastructure boundaries, ensuring transparent and reliable sharing and reuse of research.
To achieve this goal, we provided our code to OpenAIRE through their GitHub branch under an appropriate license. The service was integrated into the European Open Science Cloud through NI4OS-Europe at http://ingredio.ni4os.eu. The Ingredio database was significantly enriched, which will strengthen our competitiveness as an SME. Supporting the metadata curation of the OpenAIRE Research Graph serves two main purposes. First, we want to enable users and other relevant stakeholders to search for potential hazards of chemical ingredients in food and cosmetics through the OpenAIRE Research Graph entity registry that we will create. Second, we plan to use this data as a basis for improving our scoring function and data availability in our app.
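The text states that the new chemical-hazard information was exported as a dump following OpenAIRE metadata schemas, but it does not reproduce that schema. The sketch below only illustrates the general pattern of writing linked records to a gzip-compressed JSON Lines file and streaming them back; the `HazardLink` record, its field names, the file name and the placeholder identifier are all hypothetical and are not the OpenAIRE schema.

```python
import gzip
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class HazardLink:
    """Hypothetical chemical-to-hazard assertion (NOT the OpenAIRE schema)."""
    chemical: str           # ingredient name as it appears on a label
    hazard: str             # e.g. "allergy", "irritation", "cancer" or "toxicity"
    publication_id: str     # identifier of the supporting publication record
    causal_sentences: int   # number of supporting sentences classified as causal


def write_links(path, links):
    """Write one JSON object per line to a gzip-compressed file (JSON Lines)."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for link in links:
            fh.write(json.dumps(asdict(link), ensure_ascii=False) + "\n")


def read_links(path):
    """Stream records back one line at a time, without loading the whole file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield HazardLink(**json.loads(line))


# Usage with a placeholder record (illustrative values only).
links = [HazardLink("Compound A", "allergy", "placeholder-id-0001", 2)]
path = os.path.join(tempfile.mkdtemp(), "hazard_links.jsonl.gz")
write_links(path, links)
restored = list(read_links(path))
print(restored == links)  # True
```

Streaming line by line keeps memory use flat, which matters when a dump grows with a graph of tens of millions of records.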
The technical goals were:
#1: Identifying new chemical ingredients from the OpenAIRE data to enrich the Ingredio Database and the OpenAIRE Research Graph
We constructed a model that takes as input an OpenAIRE paper classified as relevant and outputs the best candidate words representing chemical compounds. To build it, we used named-entity recognition (NER), also called entity extraction, which locates named entities in unstructured text and classifies them into predefined categories such as person names, organizations or locations; here, the category of interest is chemical compounds. We implemented NER with Bidirectional Encoder Representations from Transformers (BERT). To train the BERT model, we created a set of peer-reviewed articles that contain chemical ingredients, annotated with part-of-speech tags.
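A BERT token classifier of this kind emits one tag per token. A common convention, assumed here and not stated in the text, is the BIO scheme: `B-CHEM` opens a chemical mention, `I-CHEM` continues it and `O` marks everything else. The sketch below shows only the post-processing step that turns such tags into candidate chemical names; the model itself is not shown, and the tag names and example tags are hypothetical.

```python
def bio_to_entities(tokens, tags, label="CHEM"):
    """Group tokens tagged B-<label>/I-<label> into entity strings.

    Assumes word-level tokens and the BIO tagging scheme; subword pieces
    produced by a BERT tokenizer would first need to be merged back into words.
    """
    begin, inside = f"B-{label}", f"I-{label}"
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == begin:
            if current:                      # a new entity starts right after another
                entities.append(" ".join(current))
            current = [token]
        elif tag == inside and current:      # continue the open entity
            current.append(token)
        else:                                # "O", another label, or a stray I- tag
            if current:
                entities.append(" ".join(current))
            current = []
    if current:                              # entity running to the end of the sentence
        entities.append(" ".join(current))
    return entities


# Hypothetical model output for one sentence (tags are illustrative, not real predictions).
tokens = ["Exposure", "to", "sodium", "lauryl", "sulfate", "and",
          "benzyl", "alcohol", "caused", "irritation", "."]
tags = ["O", "O", "B-CHEM", "I-CHEM", "I-CHEM", "O",
        "B-CHEM", "I-CHEM", "O", "O", "O"]
print(bio_to_entities(tokens, tags))  # ['sodium lauryl sulfate', 'benzyl alcohol']
```

Dropping a stray `I-` tag that has no preceding `B-` is one design choice; another is to treat it as the start of a new entity.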
#2: Understanding the relation of chemical ingredients with the provided information
We built a model that correlates the provided information with a potential hazard or its absence. This amounts to determining whether a compound has a positive association with notions such as cancer, irritation, allergies and toxicity. To infer compound toxicity, we transformed the problem into a classification task: identifying whether a sentence expresses a meaning of the form "Compound A causes B". We refer to such sentences as causal. We assume that if a compound meets the following criteria multiple times, we can be more certain that it is causally related to the adverse effect: both the name of the compound and the name of an adverse effect appear in the same sentence, and the sentence is classified as causal.
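The aggregation rule described above can be sketched as follows: for each (compound, adverse effect) pair, count the sentences that mention both and are classified as causal, and keep the pairs that reach a minimum count. The trained causal-sentence classifier is replaced here by a hypothetical keyword rule (`is_causal`); the substring matching, the effect stems and the threshold `min_support=2` are likewise assumptions for illustration only.

```python
from collections import Counter

# Crude lower-case stems for the adverse effects named in the text (assumption).
EFFECT_STEMS = {"allergy": "allerg", "irritation": "irritat",
                "cancer": "cancer", "toxicity": "toxic"}


def is_causal(sentence):
    """Hypothetical keyword stand-in for the trained causal-sentence classifier."""
    cues = ("causes", "caused", "induces", "induced", "leads to")
    return any(cue in sentence.lower() for cue in cues)


def causal_support(sentences, compounds, classifier=is_causal):
    """Count causal sentences that mention both a compound and an adverse effect."""
    support = Counter()
    for sentence in sentences:
        if not classifier(sentence):
            continue
        text = sentence.lower()
        for compound in compounds:
            if compound.lower() not in text:
                continue
            for effect, stem in EFFECT_STEMS.items():
                if stem in text:
                    support[(compound, effect)] += 1
    return support


def confident_links(support, min_support=2):
    """Keep (compound, effect) pairs backed by at least `min_support` causal sentences."""
    return sorted(pair for pair, count in support.items() if count >= min_support)


sentences = [
    "Compound A caused allergic dermatitis in patients.",
    "In a patch test, compound A induced allergic reactions.",
    "Compound A is used as a preservative in shampoos.",
    "Compound B caused mild irritation in one subject.",
]
support = causal_support(sentences, ["Compound A", "Compound B"])
print(confident_links(support))  # [('Compound A', 'allergy')]
```

Passing a real classifier through the `classifier` parameter leaves the counting logic unchanged. The substring matching is deliberately crude and would, for example, also count "non-toxic" as a toxicity mention.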
To demonstrate our solution, we have deployed a web server on the European Open Science Cloud (NI4OS-Europe) at http://www.ingredio.ni4os.eu. There, users can input biomedical text and receive a classification of whether it links chemical ingredients of food and cosmetics with allergies, irritation, cancer or toxicity. Users can also input biomedical text to extract chemical compound names. A user manual is also available. The Ingredio app can be downloaded for free from Google Play and the App Store.
User groups and the impact of enhancing Ingredio
Ingredio provides services to consumers that embrace a healthy lifestyle, organic product companies, and companies that want to produce safer products and improve their practices. Our focus is to provide informed choices to consumers on the products they buy, feature organic product companies, and work with cosmetics and food companies on using healthier and greener ingredients for their products. The whole concept and workflow of Ingredio is attractive for commercialization since it is cost- and time-efficient, with high success rates in providing informed choices.
Consumers of organic products, vegans, new moms and followers of a healthy lifestyle are our main targets for lead generation. These consumers already believe in a healthier life and are more likely not only to use our app to confirm their views but also to evangelize it on social media.
Currently, 300 million people in the EU and US buy organic products. Of these, about 70% (~200 million) own a smartphone and access the internet daily (see Appendix C). The US, Germany and France lead the organic market with market sizes of €27.1, €7.9 and €4.8 billion, respectively. Although the Ingredio app can be used anywhere in the world, these markets will be our main focus, especially for marketing, as they are expected to generate the highest earnings under our B2C plan. Moreover, vegetarians and vegans are estimated to reach 200 million people by 2020. Our services target these markets and aim to bridge the gap between consumers' concerns and companies' products. An analysis of our target groups is given in the following sections.
The application domain of Ingredio
Using the OpenAIRE data, we have enhanced the Ingredio app while also contributing useful information to the OpenAIRE Research Graph. This information is valuable because the consumer market is shifting toward healthier products. However, as discussed in the competition section, mobile apps that assess the safety of chemical ingredients are very limited on the market, and all of them are geographically restricted because they rely on scanning barcodes associated with specific countries. Moreover, the market is fragmented into separate apps for food ingredients and for cosmetics ingredients.
As a chemical lab in the hands of consumers and companies for everyday use, Ingredio can be used worldwide and combines the features of competing apps in one single app.
Our action plan to penetrate the market has three basic phases: a) develop our app and build the foundations for successful market entry, b) increase the total number of users without disturbing them with ads, and c) provide the community with extra products and services. Services similar to Ingredio's are currently very limited on the market, which opens up a significant market opportunity.
Sustainability of Ingredio
Based on our primary market research, we offer our app for free and provide, via in-app purchase, a series of services including an advanced personal profile, detoxification guides, subscriptions to personalized healthy diets, and alternative product offers. Among our B2B products, the directory of bio/organic shops is offered for an annual fee. This report adds value for companies seeking an in-depth analysis covering consumers, regulations, the hazard score of each ingredient, the latest research findings on each ingredient, suggestions on which ingredients to replace, and much more detailed information that can complement in-house corporate chemistry labs.
The sustainability of our solution is based on the Ingredio Products, which create revenues through a B2C and a B2B model.
Related OpenAIRE Products/Services
- OpenAIRE Research Graph
- Information Inference Service (IIS), a subsystem of OpenAIRE (specific branch)
- OpenAIRE Research Graph Dumps
- OpenAIRE semantic linkage
- Webserver: http://ingredio.ni4os.eu
How to use the Ingredio app
- Download and launch the app from Google Play or the App Store
- Use the capture or upload icon to snap the ingredients of your product
- Scan a product's ingredients
- Wait for results
- Check which ingredients are safe or hazardous
- Read all about the ingredients listed in results
- Check ingredients from anywhere in the world
Training, Manuals, Helpful Material
- Link to the code
- YouTube Videos
- Website: http://www.ingred.io
- Github: https://github.com/ingredio
Dr. Zoe Cournia, Founder of Ingredio, is a chemist with over 15 years of research experience. She is the supervisor of this project and coordinator of the team. Drawing on her knowledge and experience in chemistry, she curated the articles used to train the machine learning algorithm. Zoe also evaluates the classifier by checking whether its outputs are correctly classified, designs the project and supervises all activities.
Eleni Cournia, MBA, is the CEO of DENTICA LLC, which develops the Ingredio product. Eleni has overseen the financial operations of the project and has worked with financial assistants on its day-to-day logistics, personnel contracts, etc.
Alexis Chatzigoulas, MSc, is a developer at Ingredio and a machine learning expert. Alexis contributes to the project by proposing different approaches and ideas for each issue we have encountered so far.
Makis Kounadis is a developer at Ingredio. He works as an advanced Android developer and a junior machine learning engineer. In this project, he programmed the prototype mentioned above: he created the scripts for the training set, applied data preprocessing, built the machine learning model, exported the classified articles and merged them with the Ingredio database.
Dimitris Papakonstantinou, MSc, is a developer at Ingredio and a machine learning expert. Dimitris develops and tests different machine learning approaches while also providing new ideas and methods to achieve the most accurate results.
We would like to thank the OpenAIRE team for their help and continuous support, in particular Harry Dimitropoulos, Marek Horst, Yannis Foufulas, Andrea Manocci, Alessia Bardi and Nektaria Berikou. Moreover, we would like to thank the NI4OS-Europe project of the European Open Science Cloud for providing us with access to a web server on which we could upload and run our code, and for their support. In particular, we would like to thank Dusan Vugragovic, Anastas Mishev, S Spasov, Emmanouil Atanasov, Marija Durchova and Kostas Koumantaros.