Research Projects

A BETTER project for exploiting Big Data in Earth Observation

The main objective of BETTER is to implement an EO Big Data intermediate service layer that harnesses the potential of the Copernicus and Sentinel European EO data, driven directly by the needs of its users. The value will be demonstrated through a series of use cases, identified in a number of workshops and hackathons with potential users of this free and valuable resource. Big Data architectures, based on the BigDataEurope platform, will be customised for each use case.

More information


Project duration

November 2017 – October 2020

Link to the project page

Big Data Europe

Big Data Europe will provide support mechanisms for all the major aspects of a data value chain: the employed data and technology assets, the participating roles and the established or evolving processes. The effectiveness of these support mechanisms will be assessed in different domains pertaining to Europe's major societal challenges, with respect to the needs and requirements of the related communities. To this end, BigDataEurope focuses on providing an integrated stack of tools to manipulate, publish and use large-scale data resources; these tools can be installed and used freely in a customised data processing chain with minimal knowledge of the technologies involved. The project integrates and industrially hardens key open-source Big Data technologies and European research prototypes into a Big Data Integrator Platform: an ecosystem of specifications and reference implementations that is attractive to current players from all parts of the data value chain while also lowering the entry barrier for new businesses.


DIACHRON intends to address and cope with issues arising from the evolution of data, such as: (a) monitoring the changes of LOD datasets (tracking the evolution); (b) identifying the cause of a dataset's evolution with respect to the real-world evolution of the entities it describes (provenance problem); (c) repairing various data deficiencies (curation problem); (d) temporal and spatial quality assessment of the harvested LOD datasets and determination of the dataset versions that need to be preserved (appraisal); (e) archiving multiple versions of data and citing them accordingly to make references to previous data feasible (archiving and citation); (f) retrieving and querying previous versions (time-travelling queries). The DIACHRON solution aims not only to store previous versions for preservation in case they are needed in the future, but to create a live repository of the data that captures and highlights data evolution by keeping all data (current and previous) accessible, combined with a toolset that handles the full life cycle of the Data Web.

European Data Science Academy (EDSA)

The European Data Science Academy (EDSA) will establish a virtuous learning production cycle whereby we: a) analyse the required sector specific skillsets for data analysts across the main industrial sectors in Europe; b) develop modular and adaptable data science curricula to meet these needs; and c) deliver training supported by multi-platform and multilingual learning resources based on our curricula. The curricula and learning resources will be continuously evaluated by pedagogical and data science experts during both development and deployment.


Linked-Data-basierte Kriminalanalyse (Linked Data based crime analysis)

The Web not only gives rise to new forms of crime, it also enables new technology for crime investigation. Suspects leave traces on the Web, items are sold and bought on the Web, and a wealth of public open data about organizations and places is available on the Web. LiDaKrA aims at a holistic approach to extract, link and fuse crime-relevant information from public and private sources such as: the Web in general, the Social Web (social networks, blogs or wikis), the Deep Web (eCommerce databases such as eBay or Amazon Marketplace), the Dark Web (information from the Tor network) and the Data Web (open data such as DBpedia or GeoNames). The technical components will be implemented in an integrated platform and evaluated in concrete use cases with multiple stakeholders from crime investigation authorities.


Enabling Linked Data and Analytics for SMEs by renovating public sector information

The LinDA project addresses one of the most significant challenges in the usage and publication of Linked Data: the renovation and conversion of existing data formats into structures that support the semantic enrichment and interlinking of data. The set of tools provided by LinDA will assist enterprises, especially SMEs which often cannot afford the development and maintenance of dedicated information analysis and management departments, in efficiently developing novel data analytical services that are linked to the available public data, thereby helping to improve their competitiveness and stimulating the emergence of innovative business models.



LUCID is a BMBF-funded two-year project which will change the way partners in supply chain networks communicate with each other. In LUCID we research and develop Linked Data technologies that allow partners in supply chains to describe their work, their company and their products for other participants. This allows for building distributed networks of supply chain partners on the Web without a centralized infrastructure. LUCID is funded by the German Federal Ministry of Education and Research (BMBF) in the KMU-innovativ: Informations- und Kommunikationstechnologien initiative, which is part of the IKT 2020 - Forschung für Innovation funding programme.

Open Data Incubator for Europe (ODINE)

Europe is supporting the next generation of digital businesses. The Open Data Incubator for Europe (ODINE), the first of its kind, is part of that support: it helps European citizens build sustainable businesses using open data. It offers up to €100,000 in funding and sets up an environment and EU-wide network, including business angels, VCs and funding agencies, to support small and medium enterprises and startups in creating commercial value from open data.

The project aims to provide a generic framework and concrete tools for supporting financial transparency, thus enhancing accountability within public administrations and reducing the possibility of corruption. A key challenge for the project is to provide a framework that is scalable, easy to use, flexible and attractive. We will apply the project concept to three pilot scenarios targeting three different applications related to public spending: journalism, transparency initiatives and citizen engagement. The project will involve various stakeholders, including but not limited to public administrations, citizens, NGOs, media organisations and public service companies.


The business engine for IoT pilots: turning the Internet of Things in Europe into an economically successful and socially accepted vibrant ecosystem.

The vision is to build a broad and vibrant ecosystem around the pilot projects that increases the collaboration between them, generates economic impact through new innovative business models and creates trust in the Internet of Things through transparent information about societal challenges such as privacy and security implications. The vision will be reached by focusing on the following objectives: supporting the collaboration and knowledge exchange between pilots and other relevant EU projects (e.g. FIWARE); building a bridge between pilots and relevant stakeholders (e.g. potential customers such as European small and medium-sized enterprises (SMEs), entrepreneurs and developers, but also researchers and policy makers) and thus expanding the ecosystem further; and setting the ground for upcoming business-building activities by creating awareness and by facilitating and fostering societal acceptance (e.g. by running a variety of innovation activities). Building on these developments, the initiative will set the ground for the development of concrete new customer-oriented businesses based on the emerging pilots. These business models will be derived through a proven, systematic, user-centric ideation and validation process, increasing their market acceptance and success rate. The emerging new businesses shall have a substantial economic impact in Europe.


Semantics, Coordination and Reasoning

SeReCo (Semantics, Coordination and Reasoning) is a German-French doctoral college.

The scientific purpose of SeReCo is to explore the spectrum of technologies related to semantics, reasoning and coordination in distributed and open environments (such as the Web).

Our research covers the following topics:

  • Semantic data modeling: Linked Data, ontologies, annotations
  • Reasoning on the Semantic Web
  • Knowledge extraction, knowledge modeling and knowledge integration
  • Adaptive applications and coordination
  • Management and protection of identity, privacy and confidentiality
  • Multi-agent-based modeling and programming of open and decentralized systems
  • Coordination models and technologies
  • Self-organizing systems
  • Context-aware mobile applications (using mobile devices and sensors)

Several application domains will be addressed, including social network systems, e-health, enterprise information integration, open-access scientific publishing, and innovation platforms.


SlideWiki is a collaboration platform which enables communities to build, share and play online presentations. In addition to importing PowerPoint presentations, it supports authoring of interactive online slides using HTML and LaTeX. Slides and their containers (called decks) are versioned, thereby enabling change tracking. Users can create their own themes on top of existing themes or re-use others' themes.

SlideWiki aims to exploit the wisdom, creativity and productivity of the crowd for the creation of high-quality, rich, engaging educational content. With SlideWiki, users can create and collaborate on slides, diagrams and assessments, and arrange this content into richly structured course presentations.

SlideWiki empowers communities of educators to author, share and re-use sophisticated educational content in a truly collaborative way. Existing presentations can be imported and transformed into interactive courses using HTML and LaTeX. All content in SlideWiki is versioned, thereby allowing users to revise, adapt and re-mix it. Self-test questions can be attached to each individual slide and are aggregated on the presentation level into comprehensive self-assessment tests. Users can create their own presentation themes. SlideWiki supports the semi-automatic translation of courses into more than 50 languages.

With SlideWiki we aim to make educational content dramatically more accessible, interactive, engaging and of higher quality. More information about SlideWiki can be found in the documentation.


Smart infrastructures and citizens’ participation in the digital society are increasingly data-driven. Sharing, connecting, managing, analysing and understanding data on the Web will enable better services for citizens, communities and industry. However, turning web data into successful services for the public and private sector requires skilled web and data scientists, and it still requires further research. WDAqua aims at advancing the state of the art by intertwining training, research and innovation efforts, centred around data-driven question answering. Question answering is immediately useful to a wide audience of end users, and we will demonstrate this in settings including e-commerce, public sector information, publishing and smart cities. Steps to answering a question are (1) understanding a spoken question, (2) analysing the question’s text, (3) finding data to answer the question, and (4) presenting the answer(s). Every individual research project in WDAqua connects at least two of these steps.
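The four steps above can be sketched as a minimal pipeline. This is an illustrative toy, not WDAqua's implementation: all function names and the tiny fact base are assumptions, and a real system would use speech recognition, NLP parsing and a SPARQL endpoint instead of these stand-ins.

```python
# Minimal sketch of a four-step question answering pipeline, mirroring
# the steps described above. All names and data are illustrative.

# Toy knowledge base: (subject, predicate, object) facts.
FACTS = [
    ("Bonn", "locatedIn", "Germany"),
    ("Berlin", "locatedIn", "Germany"),
    ("Berlin", "capitalOf", "Germany"),
]

def transcribe(audio: str) -> str:
    """Step 1: turn a spoken question into text (stubbed here)."""
    return audio  # pretend the audio has already been transcribed

def analyse(question: str) -> dict:
    """Step 2: extract a predicate and object from the question text."""
    # Naive pattern: "What is the capital of X?"
    if question.lower().startswith("what is the capital of"):
        country = question.rstrip("?").split()[-1]
        return {"predicate": "capitalOf", "object": country}
    raise ValueError("unsupported question shape")

def find_answers(query: dict) -> list:
    """Step 3: match the analysed query against the data."""
    return [s for (s, p, o) in FACTS
            if p == query["predicate"] and o == query["object"]]

def present(answers: list) -> str:
    """Step 4: render the answer(s) for the user."""
    return ", ".join(answers) if answers else "No answer found."

def answer(audio: str) -> str:
    return present(find_answers(analyse(transcribe(audio))))
```

For example, `answer("What is the capital of Germany?")` walks through all four steps and returns `"Berlin"`.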


Open Mobility Vocabulary

Future mobility poses new challenges for innovative data-based services. Some examples are route planning according to energy aspects, or multimodal mobility services combining sharing services, public transportation and taxis in complex environments. The development of such services requires the integration of various data sources, e.g. map data, vehicle data, weather data, mobility service descriptions, event information, etc. These data sets often have proprietary data structures. The MobiVoc initiative intends to enable data exchange among all available data sources by providing a powerful vocabulary for modeling mobility data.


A crowdsourcing platform to provide information about scientific events, research groups, tools, journals etc.

The semantic wiki OpenResearch aims at making the world of science more visible and accessible. Information about scientific events, research projects, publishers, journals etc. is scattered across the Web. For researchers (especially young ones without decades of experience) it is often difficult to find the relevant venues, people or tools. Moreover, research is increasingly dynamic and multi-disciplinary, so the boundaries between communities blur and new research directions emerge. With this semantic wiki, we aim to make information about scientific events, research groups, tools, journals etc. more accessible. OpenResearch is not restricted to any field of science.

Besides conducting their actual research, scholars often need to search for matching, high-profile scientific events to publish their research results, for projects relevant to their research, for potential partners and related research schools, for funding possibilities that support their particular research agenda, or for available tools supporting their research methodology. For lack of better support, they rely heavily on individual experience and informal community wisdom, do simple Web or mailing list searches, and are stuck with simplistic rankings such as calls for papers sorted by deadline. To better support researchers, we propose OpenResearch, a crowd-sourced collection of structured information related to research. You can think of OpenResearch as the yellow pages, event calendar, news wire and project directory of science, all in one. Everybody can easily participate and all information is freely available (also for download and republication).


Supply Chain Operations Reference (SCOR) vocabulary

Advanced, highly specialized economies require instant, robust and efficient information flows within their value-added and supply chain networks. Especially in the context of the recent Industry 4.0, smart manufacturing and cyber-physical systems initiatives, more efficient and effective information exchange in supply networks is of paramount importance. The Supply Chain Operations Reference (SCOR) model is a cross-industry approach that lays the groundwork for this goal by defining a conceptual model for supply-chain-related information. We describe a semantics-based approach for facilitating information flows in supply networks and enabling a round-trip between the definition of metrics and KPIs and their automated execution and propagation. It is centered around the SCORVoc vocabulary, which represents the Supply Chain Council's SCOR standard entirely as an RDF vocabulary.


Skills and Recruitment Ontology (SARO)

The Skills and Recruitment Ontology (SARO) is a domain ontology representing occupations, skills and recruitment. It is modeled by considering several similar context models, but is mainly inspired by the European Skills, Competences, Qualifications and Occupations ontology (ESCO). The ontology is structured along four dimensions: job posts, skills, qualifications and users.

Job posts refer to job advertisements by organizations. Advertised job openings comprise various essential attributes, such as the job role, title, the relevant sector and other related descriptions (e.g. job location, date posted, working hours, etc.).

One of the most important job requirements, usually explicitly defined, is the list of qualifications fitting the role, including the fundamental skills required to fulfil it. SARO also describes the proficiency level for each skill.

Skills are harnessed by different groups of users depending on their tasks. For example, an educator or trainer could develop training resources related to certain skills or competences. In order to do so, specific skills can be chosen by considering the skill gap of another user group, e.g. domain specialists.

The ontology also considers and maintains two additional registers for Awarding Body (defined by ESCO) and Curriculum. The former is an official or otherwise recognized institution, organization or company which is able to provide qualifications and certifications. Based on these, curricula can be formed.


Interest-based RDF update propagation framework

iRap is an RDF update propagation framework that propagates only interesting parts of an update from the source dataset to the target dataset. iRap filters interesting parts of changesets from the source dataset based on graph-pattern-based interest expressions registered by a target dataset user.

Many LOD datasets, such as DBpedia and LinkedGeoData, are voluminous and process large amounts of requests from diverse applications. Replication of Linked Data datasets enhances the flexibility of information sharing and integration infrastructures. Since hosting a replica of large datasets such as DBpedia and LinkedGeoData is costly, organizations might want to host only a relevant subset of the data. However, due to the evolving nature of these datasets in terms of content and ontology, maintaining a consistent and up-to-date replica of the relevant data is a challenge. We present an approach, and its implementation, for interest-based RDF update propagation, which propagates only the interesting parts of updates from the source to the target dataset. Our approach is based on a formal definition of graph-pattern-based interest expressions that are used to filter the interesting parts of updates from the source dataset.
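The core idea, filtering a changeset against a registered interest expression, can be sketched as follows. This is a deliberate simplification: real iRap interest expressions are SPARQL graph patterns, reduced here to single triple patterns with wildcards, and the example triples are invented.

```python
# Sketch of interest-based update propagation: only triples in a
# changeset that match a registered interest pattern are propagated
# to the target replica. A pattern uses None as a wildcard; real iRap
# interests are full SPARQL graph patterns, not single triple patterns.

def matches(triple, pattern):
    """True if every non-wildcard position of the pattern equals the triple."""
    return all(p is None or p == t for t, p in zip(triple, pattern))

def propagate(changeset, interests):
    """Return the subset of the changeset matching any registered interest."""
    return [t for t in changeset
            if any(matches(t, pat) for pat in interests)]

# A target replica only interested in population facts:
interests = [(None, "population", None)]

changeset = [
    ("Bonn", "population", "330000"),
    ("Bonn", "mayor", "K. Doerner"),
    ("Leipzig", "population", "600000"),
]
```

Here `propagate(changeset, interests)` keeps only the two population triples, so the replica stays up to date on exactly the data it cares about.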


A Visualization Approach Employing Automatic Binding of Linked Data to Visualizations

As the Web of Data is growing steadily, the demand for user-friendly means for exploring, analyzing and visualizing Linked Data is also increasing. The key challenge for visualizing Linked Data consists in providing a clear overview of the data and supporting non-technical users in finding and configuring suitable visualizations for a specified subset of the data. 

In order to accomplish this, we are developing a largely automatic visualization workflow based on state-of-the-art Linked Data approaches, which guides users through the process of creating visualizations by automatically categorizing and binding data to a visualization's parameters.

The LinkDAViz visualization workflow starts with the selection and exploration of an RDF dataset. After specifying the part of the data to be visualized, a ranked list of recommended visualizations is computed and presented to the user. Finally, when one of the recommendations is selected, the resulting visualization is displayed, ready for customization.

The screencast is available here.

OpenCourseWare observatory

A survey to assess the quality of OpenCourseWare, as a basis for the SlideWiki project.

The OpenCourseWare observatory is currently a survey to assess the quality of OpenCourseWare (OCW). A number of selected courses from different OCW systems are assessed based on predefined metrics. The objective of this study is to determine the quality of OCW, which helps to identify renowned OCW creators and publishers, diagnose the strengths and weaknesses of particular OCW, evaluate the employed creation and curation methods, and predict the future performance of OCW.


The Web of Data contains a wealth of knowledge belonging to a large number of domains, but retrieving data from such precious interlinked knowledge bases is an issue. By taking the structure of the data into account, the upcoming generation of search engines is expected to approach question answering systems, which directly answer user questions. However, developing question answering over these interlinked data sources is still challenging because of two inherent characteristics: first, different datasets employ heterogeneous schemas, and each one may contain only part of the answer to a certain question; second, constructing a federated formal query across different datasets requires exploiting links between these datasets on both the schema and instance levels. In this respect, several challenges are raised, such as resource disambiguation, vocabulary mismatch, inference and link traversal. In this dissertation, we address these challenges in order to build a question answering system for Linked Data. We present our question answering system Sina, which transforms user-supplied queries (i.e. either natural language queries or keyword queries) into conjunctive SPARQL queries over a set of interlinked data sources. Sina tackles the following challenges:

  • Query segmentation
  • Query disambiguation
  • Query reformulation
  • Formal query construction
  • Data fusion on Linked Data
  • Query cleaning
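A heavily simplified sketch of the formal query construction step: given keywords that have already been mapped to RDF terms, build a conjunctive SPARQL query. The mapping table, prefixes and URIs below are illustrative assumptions; Sina's actual segmentation and disambiguation are far more involved and operate over the interlinked datasets themselves.

```python
# Sketch of conjunctive SPARQL query construction from keywords that
# have already been disambiguated to RDF terms. The MAPPING table and
# the dbo:/dbr: terms are illustrative, not Sina's actual machinery.

MAPPING = {  # hypothetical keyword -> triple-pattern mapping
    "river": ("?x", "rdf:type", "dbo:River"),
    "germany": ("?x", "dbo:country", "dbr:Germany"),
}

def build_sparql(keywords):
    """Join one triple pattern per recognised keyword into a WHERE clause."""
    patterns = [MAPPING[k.lower()] for k in keywords if k.lower() in MAPPING]
    body = " .\n  ".join(" ".join(t) for t in patterns)
    return "SELECT DISTINCT ?x WHERE {\n  %s .\n}" % body

# "river Germany" -> a conjunctive query for rivers located in Germany
query = build_sparql(["river", "Germany"])
```

The resulting query conjoins both patterns on the shared variable `?x`, which is exactly what makes the query conjunctive.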

Demand and Supply Cloud

Balancing the demand and supply of data through stakeholder match-making

The data value chain is an important concept that involves identifying the various activities and roles in manufacturing a non-tangible data product. In our information society, data increasingly becomes a commodity and the basis for many products and services. With this portal we strive to balance the demand and supply of data, with the aim of generating a new Economic Data Ecosystem that has the Web of Data as its core. Through the Demand and Supply as a Service (DSAAS), we enable data producers to advertise the data they produce, and thus data consumers can search for the data they require. The data consumers can also publish a request for specific data, if this is not already provided by a producer. In this way, we aim to enable and encourage data re-use and exploitation, providing the means to generate value through a data product.
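The match-making idea can be sketched as follows: producers advertise dataset descriptions, consumers register requests, and the portal pairs them up. The data model and the tag-overlap matching rule are assumptions for illustration, not the actual DSAAS schema.

```python
# Toy match-making between data supply (producer offers) and data
# demand (consumer requests), matched on overlapping tags. The fields
# and matching rule are illustrative, not the actual DSAAS data model.

offers = [
    {"producer": "CityOfBonn", "dataset": "bus-timetables",
     "tags": {"transport", "timetable", "bonn"}},
    {"producer": "WeatherOrg", "dataset": "daily-forecasts",
     "tags": {"weather", "forecast"}},
]

requests = [
    {"consumer": "MobilityStartup", "tags": {"transport", "bonn"}},
]

def match(offers, requests):
    """Pair each request with every offer sharing at least one tag."""
    return [(r["consumer"], o["dataset"])
            for r in requests for o in offers
            if r["tags"] & o["tags"]]
```

With the sample data, the startup's request is matched to the bus timetable dataset but not to the weather data, since only the former shares tags with the request.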


Linked Data Publication and Consumption Framework

A major obstacle to the wider use of semantic technology is the perceived complexity of RDF data for stakeholders who are not familiar with the Linked Data paradigm, or are otherwise unaware of a dataset's underlying schema. In order to help overcome this barrier, we introduce the concept of RDF softening, which aims to preserve the semantic richness of the data model while catering for simplified and workable views of the data. We address the softening objective with the ExConQuer Framework, which facilitates the publication and consumption of RDF in a variety of generic formats. Through the Query Builder Tool, we aim to lower the entry barrier for any stakeholder requiring the use of Linked Open Data: the user can explore existing Linked Data, generate a SPARQL query, and then download and convert the results into a number of formats. Through the PAM Tool, the user is able to explore existing queries executed on various datasets through filters, and re-load them in the Query Builder Tool to edit or re-run them.


A Quality Assessment Framework for Linked Open Datasets

Luzzu is a Quality Assessment Framework that provides an integrated platform that: (1) assesses Linked Data quality using a library of generic and user-provided domain specific quality metrics in a scalable manner; (2) provides queryable quality metadata on the assessed datasets; (3) assembles detailed quality reports on assessed datasets. Furthermore, we aim to create an infrastructure that:

  • can be easily extended by users by creating their custom and domain-specific pluggable metrics, either by employing a novel declarative quality metric specification language or conventional imperative plugins;
  • employs a comprehensive ontology framework for representing and exchanging all quality related information in the assessment workflow;
  • implements quality-driven dataset ranking algorithms facilitating use-case-driven discovery and retrieval.
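The shape of such a pluggable metric can be sketched as follows. Note the hedge: actual Luzzu metrics are Java classes implementing the framework's metric interface, so the class and method names below are an illustrative Python analogue, and the metric itself (fraction of triples with HTTP(S) subject IRIs) is a toy example.

```python
# Sketch of a pluggable, stream-based quality metric in the spirit of
# Luzzu's metric interface (real Luzzu metrics are Java classes; the
# names here are illustrative). This toy metric computes the fraction
# of triples whose subject is an HTTP(S) IRI.

class HttpSubjectMetric:
    """Observes triples one by one, then reports a [0, 1] quality value."""

    def __init__(self):
        self.total = 0
        self.http_subjects = 0

    def compute(self, triple):
        """Process a single (subject, predicate, object) triple."""
        subject = triple[0]
        self.total += 1
        if subject.startswith(("http://", "https://")):
            self.http_subjects += 1

    def metric_value(self):
        """Aggregate the observations into a single quality score."""
        return self.http_subjects / self.total if self.total else 0.0

metric = HttpSubjectMetric()
for t in [("http://example.org/a", "p", "o"),
          ("_:blank1", "p", "o"),
          ("https://example.org/b", "p", "o"),
          ("_:blank2", "p", "o")]:
    metric.compute(t)
```

The stream-based design matters for scalability: the metric never holds the whole dataset in memory, only its running counts.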


Web-based semantic annotation tool for PDF documents

SemAnn allows you to semantically annotate (using RDF triples) text in PDFs. These annotations are then used for recommending similar PDF documents that the reader might find relevant.

SemAnn is an open-source, web-based semantic annotation tool for PDF files with a special focus on academic publications. SemAnn allows users to collaboratively annotate text, thus making the knowledge contained in those PDF files accessible as RDF graphs for further querying. The tool can be used with arbitrary ontologies as annotation vocabularies. The user can enter annotations of various levels of expressivity, from simple typed annotations (e.g. annotations typed as DBpedia resources or ontology classes) to relationships between annotations themselves (e.g. describing the citation context of an annotation). The structural context of annotations is made available for querying through the tool's capability of tracking the hierarchy of annotations. This enables reasoners to answer questions such as "find papers where the problem statement addresses dynamic programming languages". The tool is hence capable of viewing annotations in the context of scientific discourse, such as the motivation or problem statement (but is not limited to this). With its recommendations of similar papers, SemAnn provides an immediate benefit in return for the effort of annotating. The justification of recommendations includes information about matches by structural context. The code is available on GitHub.
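Conceptually, each annotation relates a text span in a PDF to an ontology term via RDF triples. The sketch below represents triples as plain tuples; the `ex:` properties and annotation IRIs are hypothetical placeholders, not SemAnn's actual vocabulary.

```python
# Sketch of representing a SemAnn-style annotation as RDF triples,
# using plain tuples instead of an RDF library. The ex: properties
# and annotation IRIs are hypothetical, not SemAnn's real vocabulary.

def annotate(doc_iri, page, text, concept_iri, ann_id):
    """Build triples linking a text span in a document to a concept."""
    ann = "http://example.org/annotation/%s" % ann_id  # hypothetical IRI
    return [
        (ann, "ex:inDocument", doc_iri),
        (ann, "ex:onPage", str(page)),
        (ann, "ex:exactText", text),
        (ann, "ex:hasTopic", concept_iri),
    ]

triples = annotate(
    "http://example.org/papers/42.pdf", 3,
    "dynamic programming languages",
    "http://dbpedia.org/resource/Dynamic_programming_language",
    "a1",
)
```

Once annotations take this form, a graph query can answer questions like the one above by joining annotations on their topic and structural context.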


Semantic Tags for Open Data Portals

Semantic Tags for Open Data Portals interlinks several Open Data Portals through Global Semantic Tags.

Semantic Tags for Open Data Portals is a platform that hosts Global Tags, interlinking several Open Data Portals (ODPs), both from governments and civil society, through meaningful tags. Using the MUTO ontology, local tags in each portal can be connected to the STODaP server, making it easy for users to access several data sources tagged similarly in different portals. The relation between tags is driven by translations and synonyms, and also by co-occurrence.
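The linking of local portal tags to shared global tags via translations and synonyms can be sketched as follows. The global tag IRIs and synonym tables are invented examples, and real STODaP links are expressed with the MUTO ontology rather than Python dictionaries.

```python
# Sketch of linking local Open Data Portal tags to shared global tags
# via synonym/translation tables, in the spirit of STODaP. The tag
# IRIs and synonym sets below are illustrative, not STODaP data.

GLOBAL_TAGS = {
    "http://example.org/tag/health": {"health", "saude", "gesundheit"},
    "http://example.org/tag/budget": {"budget", "orcamento", "haushalt"},
}

def link_local_tag(local_tag):
    """Return the global tag IRI whose synonym set contains the local tag."""
    needle = local_tag.lower()
    for iri, synonyms in GLOBAL_TAGS.items():
        if needle in synonyms:
            return iri
    return None

def portals_for(global_iri, portal_tags):
    """Find portals using any local tag linked to the given global tag."""
    return sorted(p for p, tags in portal_tags.items()
                  if any(link_local_tag(t) == global_iri for t in tags))
```

For example, with `portal_tags = {"dados.gov.br": ["saude"], "govdata.de": ["gesundheit"]}`, a search for the global health tag finds both portals even though their local tags are in different languages.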

The work is a partnership between Greco, at the Federal University of Rio de Janeiro, and EIS, at the University of Bonn and Fraunhofer IAIS.