SERVICES

The Data Cycle Hub, Access and support to Big Data/AI stack environment

TYPE
Technology provision
REGION
Valencia
LANG
English, Spanish

Access to, and support for, a ready-to-use Big Data and AI experimental environment based on open source technologies such as Spark and Jupyter, for developing and deploying big data processing and analytics applications. The service provides access to this infrastructure, together with experimentation support services.

SERVICE DESCRIPTION

This service provides access to a data processing experimentation environment based on open source technologies. In particular, it deploys a system consisting of a Jupyter instance and one or several virtual machines hosting a Spark service. The user specifies the number of machines and the computing resources required for each.
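As a rough illustration of what such a request involves, the sketch below models a cluster specification in Python. The class and field names (ClusterSpec, machine_count, cores_per_machine, memory_gb_per_machine) are hypothetical and are not part of the service's actual interface; the example only shows how the number of machines and the per-machine resources combine into the total capacity of an experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of a requested Spark cluster (illustrative names)."""
    machine_count: int          # number of virtual machines hosting Spark
    cores_per_machine: int      # CPU cores requested for each machine
    memory_gb_per_machine: int  # memory in GB requested for each machine

    def __post_init__(self):
        # Reject requests that could not host a Spark service at all.
        if self.machine_count < 1:
            raise ValueError("at least one Spark machine is required")
        if self.cores_per_machine < 1 or self.memory_gb_per_machine < 1:
            raise ValueError("each machine needs at least 1 core and 1 GB of memory")

    def total_cores(self) -> int:
        return self.machine_count * self.cores_per_machine

    def total_memory_gb(self) -> int:
        return self.machine_count * self.memory_gb_per_machine


# Example request: three machines with 4 cores and 16 GB each.
spec = ClusterSpec(machine_count=3, cores_per_machine=4, memory_gb_per_machine=16)
print(spec.total_cores(), spec.total_memory_gb())
```

Validating the request up front is a design choice of this sketch, not a documented behaviour of the service.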

Features:

  • The Spark cluster is provisioned fully configured, with interconnected master and worker nodes; the user defines both the number of nodes and the resources of each one.
  • HDFS is installed on all Spark nodes, making datasets accessible during an experiment.
  • Password-secured.
  • Sample notebooks are provided, with basic instructions illustrating HDFS interaction and Spark session and context initialization.
  • The technology is deployed automatically from Ansible playbooks, which create a dedicated security group and network interface for the cluster deployment.
  • Additional Python libraries can be installed.
  • Cluster monitoring.
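The sample notebooks themselves are not reproduced here. As a minimal sketch of the kind of Spark session initialization and HDFS access they are described as covering, the following Python code may help. All helper names (spark_conf, hdfs_uri, create_session), the host name "spark-master", and the dataset path are hypothetical. Port 7077 is the default for a Spark standalone master and 8020 is a common HDFS NameNode port, but the ports used by an actual deployment of this service are an assumption. PySpark is imported only inside create_session, so the configuration helpers can be tried without a cluster.

```python
def spark_conf(master_host, executor_cores, executor_memory_gb, master_port=7077):
    """Build Spark settings for a standalone cluster (host and port are assumptions)."""
    return {
        "spark.master": f"spark://{master_host}:{master_port}",
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": f"{executor_memory_gb}g",
    }


def hdfs_uri(namenode_host, path, port=8020):
    """Build an HDFS URI for a dataset path (NameNode port is an assumption)."""
    return f"hdfs://{namenode_host}:{port}/{path.lstrip('/')}"


def create_session(app_name, conf):
    """Create a Spark session from a settings dict; requires pyspark to be installed."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = spark_conf("spark-master", executor_cores=4, executor_memory_gb=16)
dataset = hdfs_uri("spark-master", "/data/measurements.csv")
print(conf["spark.master"])
print(dataset)

# Inside a notebook connected to a running cluster, one might then write:
#   spark = create_session("experiment", conf)
#   sc = spark.sparkContext                      # the Spark context
#   df = spark.read.csv(dataset, header=True)    # read a dataset stored in HDFS
```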

The service includes technical support for the configuration and deployment of the experimentation infrastructure, as well as issue reporting.

SPECIAL ACCESS CONDITIONS

Access is provided via SSH.

PREREQUISITES

The user must be registered.

CASE EXAMPLES

A user has a large dataset that needs to be analyzed but has neither the required computing resources nor the know-how to deploy a Spark service. This service provides a ready-to-use Big Data analysis stack (with Spark); the data can then be uploaded and the analysis performed using Jupyter notebooks.

SERVICE OFFERED BY

MEMBER
The Data Cycle Hub
TYPE
DIH
COUNTRY
Spain
