How will we do this? Together with our colleagues from CNR in Pisa and ARC in Athens, in the work package devoted to Knowledge Extraction Services (WP10), we will improve and expand the Information Inference Service created in the OpenAIREplus project.

We will make improvements to the inference infrastructure: add visual workflow management and improve quality assurance. We will extend the existing document content analysis functionality to extract information about the structure of the document, the affiliations of the authors, and the sentiment of the citations. We will also enhance our automatic document classification and introduce clustering of similar documents. In addition, we will search for new types of links to outside knowledge bases, i.e., third-party, domain-specific repositories describing genes, chemicals, organisms, etc. Some solutions will be built from scratch; others will be based on software developed by the partners, such as CERMINE and MadIS.

Code on GitHub

Finally, we will work on better uptake of the project's deliverables by making our results even more discoverable and usable by the general public. To that end, we plan to migrate our code to GitHub (star our repository now: https://github.com/openaire/openaire-mining) and to publish our data sets on Zenodo. Both the source code and the data sets will be available under open licenses, of course!

The first deliverables in our work package are scheduled for August 2015. We'll keep you up to date on our research on this blog, so stay tuned!
By Łukasz Bolikowski and Mateusz Kobos, ADA Lab, ICM, University of Warsaw.