PSNC/HPC4Poland, LINKED DATA PIPELINES SERVICE

TYPE
Data, AI & Technology, Product Development
REGION
Poland, Wielkopolska
LANGUAGE
English

The main goal of these pipelines is to define and deploy (semi-)automatic processes that carry out the steps needed to transform input datasets from various heterogeneous sources and publish them as Linked Data, i.e. automatic processes for data preparation and integration using Linked Data. To this end, the pipelines connect data processing components that either transform data into RDF or translate queries between SPARQL and the native data access interface, link the resulting data, and include the mapping specifications needed to process the input datasets. Each pipeline instance is configured to support a specific type of input dataset (e.g. same format, model and delivery form).

SERVICE DESCRIPTION

The service enables the execution of pipelines having the following design goals:

  • Capability of a pipeline to be directly re-executable and re-applicable (e.g. to extended or updated datasets)
  • Easy reusability of a pipeline
  • Easy adaptation of a pipeline for new input datasets
  • Execution of a pipeline that is automated as far as possible, with fully automated processes as the final target

Pipelines should support both (mostly) static data and dynamic data (e.g. sensor data). Following the best practices for Linked Data publication, these pipelines:

  i) take as input selected datasets that are collected from heterogeneous sources (shapefiles, GeoJSON, CSV, relational databases, RESTful APIs),
  ii) curate and/or pre-process the datasets when needed,
  iii) select and/or create/extend the vocabularies (e.g. ontologies) for the representation of data in semantic format,
  iv) process and transform the datasets into RDF triples according to the underlying ontologies,
  v) perform any necessary post-processing operations on the RDF data,
  vi) identify links with other datasets, and
  vii) publish the generated datasets as Linked Data and apply the required access control mechanisms.
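As an illustration only, steps i, ii, iv and v can be sketched as small chained functions that turn a CSV input into N-Triples lines. This is a minimal sketch, not the service's implementation: the `http://example.org/` namespace, the `MAPPING` dictionary, the column names and all function names are hypothetical, and the subject identifier is assumed to be safe to embed in an IRI.

```python
import csv
import io

# Hypothetical namespace and mapping; neither comes from the service itself.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MAPPING = {
    "subject_column": "id",
    "class": BASE + "Sensor",
    "predicates": {"name": BASE + "name", "region": BASE + "region"},
}

# Small inline dataset: one valid row, one row without a key, one duplicate.
SAMPLE = (
    "id,name,region\n"
    "1, Station A ,Wielkopolska\n"
    ",orphan,nowhere\n"
    "1, Station A ,Wielkopolska\n"
)


def read_rows(text):
    """Step i: read the input dataset (CSV in this sketch)."""
    return list(csv.DictReader(io.StringIO(text)))


def curate(rows):
    """Step ii: trim whitespace and drop rows that lack a subject key."""
    cleaned = [{k: (v or "").strip() for k, v in row.items()} for row in rows]
    return [row for row in cleaned if row[MAPPING["subject_column"]]]


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_triples(rows):
    """Step iv: turn each row into N-Triples lines according to MAPPING."""
    triples = []
    for row in rows:
        # Assumes the key needs no IRI escaping (e.g. a plain integer).
        subject = f"<{BASE}sensor/{row[MAPPING['subject_column']]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{MAPPING['class']}> .")
        for column, predicate in MAPPING["predicates"].items():
            if row.get(column):
                literal = escape_literal(row[column])
                triples.append(f'{subject} <{predicate}> "{literal}" .')
    return triples


def post_process(triples):
    """Step v: remove duplicate triples while keeping their order."""
    return list(dict.fromkeys(triples))


def run_pipeline(text):
    """Chain the steps; re-running on an updated input repeats the same process."""
    return post_process(to_triples(curate(read_rows(text))))


if __name__ == "__main__":
    for line in run_pipeline(SAMPLE):
        print(line)
```

Keeping the mapping as data rather than code is one way to make such a pipeline easier to adapt to a new input dataset: only `MAPPING` would need to change in this sketch, while the processing functions stay the same.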

The transformation process depends on several aspects of the data: the format of the available input data, the purpose (target use case) of the transformation, and the volatility of the data (how dynamic it is). For each specific pipeline task, various components were used to reach the final goal of transforming the data into Linked Data. The list of relevant components identified and used in each pipeline instance will be discussed in the later subsections of this deliverable.

The choice of the most suitable tools for transforming the source data also depends on the targeted usage of the transformed Linked Data; the goal of accessing the data integrated with other datasets likewise influences which tools are preferred. Finally, the transformation methods and related tools are further determined by how often the data changes (i.e. its rate of change).

Based on the above-mentioned characteristics, i.e. the mode/format of the input data sources, there are broadly two main approaches to transforming a dataset:

  • Data upgrade or semantic lifting, which consists of generating RDF data from the source dataset according to mapping descriptions and then storing it in a semantic triple store.
  • On-the-fly query transformation, which allows evaluating SPARQL [2] queries over a virtual RDF dataset by rewriting those queries into the source query language according to the mapping descriptions. In this scenario, the data physically stays at its source and a new layer is provided to enable access to it over the virtual RDF dataset. This applies mainly to highly dynamic relational datasets (e.g. sensor data) or RESTful APIs.
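The second approach can be illustrated with a deliberately small sketch that rewrites a single triple pattern of the form `?s <predicate> ?o` into SQL and evaluates it against the source database, so that no RDF is materialised. It is an assumption-laden toy, not the service's query translator: the `readings` table, the `MAPPING` dictionary, the `http://example.org/` IRIs and the function names are all hypothetical, and real SPARQL parsing is out of scope here.

```python
import sqlite3

# Hypothetical mapping: predicate IRI -> (table, key column, value column).
MAPPING = {
    "http://example.org/temperature": ("readings", "sensor_id", "temperature"),
}
SUBJECT_PREFIX = "http://example.org/sensor/"


def rewrite(predicate):
    """Rewrite the pattern `?s <predicate> ?o` into an SQL query string."""
    # Table and column names come from the trusted mapping, not from user input.
    table, key_column, value_column = MAPPING[predicate]
    return (
        f"SELECT {key_column}, {value_column} FROM {table} "
        f"WHERE {value_column} IS NOT NULL ORDER BY {key_column}"
    )


def evaluate(connection, predicate):
    """Run the rewritten query and return (subject IRI, value) bindings."""
    rows = connection.execute(rewrite(predicate)).fetchall()
    return [(SUBJECT_PREFIX + str(key), value) for key, value in rows]


def demo_connection():
    """Build an in-memory stand-in for a dynamic relational source."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE readings (sensor_id INTEGER, temperature REAL)"
    )
    connection.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(1, 21.5), (2, None), (3, 19.0)],
    )
    return connection


if __name__ == "__main__":
    for subject, value in evaluate(demo_connection(), "http://example.org/temperature"):
        print(subject, value)
```

Because each call queries the source directly, newly inserted sensor readings are visible immediately, which is why this style suits highly dynamic data better than regenerating a triple store.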

SPECIAL ACCESS CONDITIONS

No

PREREQUISITES

Mapping tools embedded in the service.

CASE EXAMPLES

Data integration, data upgrade, Linked Data publication, knowledge discovery.

SERVICE CAN BE COMBINED WITH

Data Quality components, Data analytics components.

SERVICE OFFERED BY

MEMBER
PSNC/HPC4 Poland
TYPE
DIH
COUNTRY
Poland
