Task 10.1: IIS workflow management
This task further enhances the Information Inference Service (IIS), a flexible framework for large-scale information processing initially developed in OpenAIREplus. It will integrate enhanced resource management tools for efficient workflow execution and provide support for so-called “training workflows” for automated, periodic calibration of machine learning algorithms. It will develop a visual workflow editor for the administration of the IIS components. It will accommodate changes and enhancements to the OpenAIRE data model made in WP8, and will improve the processes that feed results back to the production system.
Task 10.2: Classification and clustering
The existing supervised classification and visualization modules, trained on a wide range of categorizations, will be modified and applied to large datasets of publications, taking into consideration the temporal dimension of the content analytics results. The unsupervised mechanisms (topic modelling) will be integrated in the IIS and enhanced with spatiotemporal modelling for trend analysis over the years or across countries. Semi-supervised analysis based on topic modelling will incorporate labelled and structured information (e.g., PDB codes, MeSH terms, ACM classification). Cross-language techniques will be introduced and applied with the above mechanisms to publications in regional repositories and OA journals.
Task 10.3: Entity resolution and linking
This task will develop state-of-the-art methods for extracting information from individual publications. It will include further calibration of existing metadata extraction tools (CERMINE) and will extend them with capabilities for author-organization affiliation parsing. It will also include domain-specific concept mining (gene symbols, chemicals, organisms), citation sentiment analysis using the CiTO ontology for the next generation of metrics, and rudimentary narrative pattern analysis (the structure of a document, e.g., for classification purposes). A further enhancement will link publications with software and data repositories (BioMedBridges, GitHub, Zenodo, PSI datasets). This task will also develop vocabularies and ontologies for representing the information extracted in these tasks in semantic formats. The results will make it possible to share and distribute information about scholarly communication while maintaining a high level of interoperability with existing and planned systems.
Task 10.4: Scholarly communication network analysis
Based on the information extracted in T10.2 and T10.3 (citations and classification/clustering), this task will focus on creating a comprehensive map of relations in academia between people (e.g., open citation index, co-authorship network), institutions, publications, funding sources, topics, data sets, software, etc. It will create knowledge networks based on discipline, time, and location to identify structuring effects on specific research areas (e.g., health:diabetes), lasting effects of networks, and the creation of scientific societies or structures. It will employ advanced graph mining algorithms (e.g., PageRank) to provide deeper insight into the analysis. For known author-organization affiliations, the publication topic-author network will be further mapped to the European member states in order to chart the involvement and interconnection of organizations. These tools will be used by experts in specific research fields to analyse and evaluate scientific quality and assess long-term impact.
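As an illustration of the kind of graph mining mentioned for this task, the following is a minimal sketch of PageRank computed by power iteration over a citation graph. It is not the project's implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy `citations` data are all assumptions chosen for the example.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over (citing, cited) pairs.

    Illustrative sketch only; parameters are assumed defaults.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each paper passes an equal share of its rank to every paper it cites.
        for citing in nodes:
            for cited in out_links[citing]:
                new_rank[cited] += damping * rank[citing] / len(out_links[citing])
        rank = new_rank
    return rank


# Hypothetical toy citation graph: (citing paper, cited paper).
citations = [("A", "C"), ("B", "C"), ("C", "D"), ("A", "D")]
scores = pagerank(citations)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 3))
```

In this toy graph, paper D ranks highest because it is cited both directly by A and by the well-cited paper C. The same idea could be applied to co-authorship or institution networks by changing what the edges represent.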