

May 3, 2021


Opscidia’s ontology generator



The purpose

A key challenge in building efficient text-mining tools is obtaining proper domain-specific ontologies that can serve as building blocks for other text-mining applications. Such ontologies have many uses for researchers, innovators, librarians, and others.

The value of the Ontology Generator 

Many tools can text-mine non-technical texts and bring out their interesting concepts and main topics. Some of these tools rely on ontologies, for instance to perform Named-Entity Recognition. Most of the time these ontologies are produced by experts, and very often they simply do not exist when the topic is too narrow or technical.
The project addresses the difficult challenge of detecting scientific topics in a text corpus for which no human-made ontology exists. Researchers, innovation professionals and librarians face this kind of challenge when they try to classify documents in a very specialized domain, or when they look for the next promising technologies in a specific domain, for example.
Hence, a generated domain-specific ontology has many uses: as the first layer of larger intelligence tools, as a classification tree for documents, or as a tool to explore a domain.

The process

Within the framework of the OpenAIRE Open Innovation call, Opscidia was selected to propose a solution addressing this need. Opscidia is a French start-up created in 2019 with the mission of leveraging Open Science to increase the diffusion of research results and promote their reuse for innovation, public policy design and general public information.

The solution proposed by Opscidia is an ontology generator that consists of three layers:

  • Data collection layer: this layer mostly consists of harvesting the resources (through the API or dumps of specific OpenAIRE communities).
  • Concept detection layer: a simple, unsupervised algorithm extracts the concepts related to a seed concept entered by the user and organizes them into a hierarchy. It scales easily with both the amount of data and the number of users and requests.
  • Visualization layer: a visualization tool displays the produced ontology graphically and links it back to the documents of the corpus from which the ontology was created.
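
As an illustration of what a concept detection layer of this kind could look like, the following Python sketch ranks the terms that co-occur with a seed concept in a toy corpus and tests a simple parent/child relation between two terms. It is not Opscidia's algorithm, which is not detailed here: the co-occurrence ranking, the subsumption heuristic and its 0.8 threshold, the stopword list, the sample documents and all function names are assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "are", "was", "from", "that", "this"}

# Toy corpus standing in for harvested abstracts (invented for illustration).
DOCS = [
    "Vaccine trials for coronavirus",
    "Coronavirus vaccine and antibody response",
    "Coronavirus transmission in hospitals",
    "Antibody response to influenza vaccine",
]


def tokenize(text):
    """Return the set of lowercase tokens longer than two letters, minus stopwords."""
    return {
        word
        for word in re.findall(r"[a-z]+", text.lower())
        if len(word) > 2 and word not in STOPWORDS
    }


def related_concepts(docs, seed, top_n=5):
    """Rank terms by the number of seed-containing documents that mention them."""
    seed = seed.lower()
    counts = Counter()
    for doc in docs:
        terms = tokenize(doc)
        if seed in terms:
            counts.update(terms - {seed})
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def subsumes(docs, parent, child, threshold=0.8):
    """Subsumption heuristic: parent appears in most documents that contain
    child, while child does not appear in every document that contains parent."""
    doc_terms = [tokenize(doc) for doc in docs]
    with_child = [terms for terms in doc_terms if child in terms]
    with_parent = [terms for terms in doc_terms if parent in terms]
    if not with_child or not with_parent:
        return False
    parent_given_child = sum(parent in t for t in with_child) / len(with_child)
    child_given_parent = sum(child in t for t in with_parent) / len(with_parent)
    return parent_given_child >= threshold and child_given_parent < 1.0


if __name__ == "__main__":
    print(related_concepts(DOCS, "coronavirus", top_n=3))
    print(subsumes(DOCS, "vaccine", "antibody"))
```

Because it only counts term occurrences, a sketch like this needs no labelled data, which is in the spirit of the unsupervised approach described above. A production system would work on multi-word expressions and far larger corpora.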

Figure: Opscidia architecture

The results of the Ontology Generator

A simple tool for the semi-automatic creation of domain-specific ontologies has been built.
It takes a concept as input and extracts, from a subset of the OpenAIRE graph, a hierarchical list of concepts associated with it. This list is displayed through a simple visualization layer and linked back to the scientific literature through the OpenAIRE graph.
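
To make the shape of such an output concrete, here is a minimal Python sketch of a hierarchical concept list in which every concept keeps references to the documents it came from. The `ConceptNode` class, the helper functions and the `doc-…` identifiers are all hypothetical and are not taken from the actual tool or from the OpenAIRE graph.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConceptNode:
    """One concept in the hierarchy, with links back to source documents."""
    label: str
    doc_ids: List[str] = field(default_factory=list)  # hypothetical identifiers
    children: List["ConceptNode"] = field(default_factory=list)


def render(node: ConceptNode, depth: int = 0) -> List[str]:
    """Return the hierarchy as indented lines, one per concept."""
    lines = [f"{'  ' * depth}{node.label} [{len(node.doc_ids)}]"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def find_documents(node: ConceptNode, label: str) -> Optional[List[str]]:
    """Depth-first search for a concept; return its document ids, or None."""
    if node.label == label:
        return node.doc_ids
    for child in node.children:
        found = find_documents(child, label)
        if found is not None:
            return found
    return None


# Invented example: a seed concept with one sub-concept and one sub-sub-concept.
ROOT = ConceptNode(
    "coronavirus",
    ["doc-1", "doc-2", "doc-3"],
    [ConceptNode("vaccine", ["doc-1", "doc-2"], [ConceptNode("antibody", ["doc-2"])])],
)

if __name__ == "__main__":
    print("\n".join(render(ROOT)))
    print(find_documents(ROOT, "antibody"))
```

Keeping the document identifiers on each node is what would let a visualization layer jump from a concept back to the literature.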

Figure: Opscidia ontology generator

You can also watch the following video, which demonstrates how the Opscidia tool operates within OpenAIRE.

The solution is available at:

User groups and the impact of the Ontology Generator on them

Such services can benefit different groups of people willing to take advantage of recent innovations:

  • For research funders and policymakers, to detect future technological trends
  • For researchers, to improve innovative text-mining approaches
  • For media and citizens, for example to fight against fake news

For this last example, Opscidia is supported by the Vietsch Foundation to build a Science Checker, a free-to-access search engine that verifies scientific claims by analysing the relevant available scientific literature.

The application domain of the Ontology Generator

The exact same method could of course be applied to other OpenAIRE communities, and this is one possible continuation of this project. Similar approaches have also been used to help detect emerging technological trends in a project with the Future and Emerging Technologies (FET) group of the European Commission. The method can also be used to identify scientific documents useful for building a literature review. Other applications include integration into various downstream artificial intelligence pipelines, the enrichment of an existing ontology, etc.

Sustainability of the Ontology Generator

The Ontology Generator is hosted and maintained by the Opscidia team within its services production environment. The Opscidia team is more than enthusiastic to extend this service beyond this project. Some possible extensions have already been drafted for other OpenAIRE communities and within the EOSC environment.

Please feel free to contact the Opscidia team if you have any suggestions.


OpenAIRE COVID-19 gateway:

One of the easiest use cases of this application is to promote the exploration of a community collection from OpenAIRE Connect. The application built during this project is useful to draw a map of the concepts of the collection and to link it to the content of the collection.
As an example, the methodology developed during the project has been applied to the OpenAIRE COVID-19 collection and integrated into the OpenAIRE COVID-19 gateway. The result can be accessed here.

You can watch the COVID-19 screencast on YouTube in the video above.


The requirements for use are minimal: users just need a browser and internet access.


YouTube Video (same as above):

GitHub: (access needs to be granted; please ask the Opscidia team)


The team members are:

  • Sylvain Massip, CEO of Opscidia
  • Charles Letaillieur, CTO of Opscidia
  • Frejus Laleye, Timothée Babinet and Loic Rakotoson, developers at Opscidia

Contact person: Sylvain Massip


Exchanges with the OpenAIRE technical and management teams were extremely useful throughout the project. We thank them for their availability.
