Access to, and support for, a ready-to-use Big Data and AI experimental environment based on open source technologies such as Spark and Jupyter, for developing and deploying big data processing and analytics applications. The service provides access to the infrastructure together with experimentation support services.
SERVICE DESCRIPTION
This service provides access to a data processing experimentation environment based on open source technologies. In particular, it deploys a system consisting of a Jupyter instance and one or more virtual machines hosting a Spark service. The user specifies the number of machines and the required computing resources.
Features:
- The Spark cluster is provisioned fully configured, with interconnected master and worker nodes; the user defines both the number of nodes and the resources of each one.
- HDFS is installed on all Spark nodes, giving experiments access to their datasets.
- Password-secured.
- Sample notebooks are provided with basic instructions that illustrate HDFS interaction and the initialization of the Spark session and context.
- The technology is deployed automatically from Ansible playbooks, which create a dedicated security group and network interface for the cluster deployment.
- Python libraries can be installed as needed.
- Cluster monitoring.
The service includes technical support for configuring and deploying the experimentation infrastructure, as well as issue reporting.
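As an illustration of the kind of steps the sample notebooks cover (Spark session and context initialization, and addressing data in HDFS), a minimal Python sketch follows. It is not taken from the provided notebooks. It assumes a Spark standalone master on the default port 7077 and an HDFS NameNode on port 8020; the host name `spark-master`, the helper names, and the file path are hypothetical and should be replaced with the values of the actual deployment.

```python
def spark_master_url(host: str, port: int = 7077) -> str:
    """Build a Spark master URL (assumes standalone mode; 7077 is its default port)."""
    return f"spark://{host}:{port}"


def hdfs_uri(namenode: str, path: str, port: int = 8020) -> str:
    """Build an HDFS URI for a path (assumes the NameNode listens on port 8020)."""
    return f"hdfs://{namenode}:{port}/{path.lstrip('/')}"


def create_session(master_host: str, app_name: str = "experiment"):
    """Create (or reuse) a Spark session and return it with its Spark context."""
    # Imported here so the URL helpers above work even where PySpark is absent.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(spark_master_url(master_host))
        .appName(app_name)
        .getOrCreate()
    )
    return spark, spark.sparkContext
```

In a notebook, this could be used as `spark, sc = create_session("spark-master")`, followed by `spark.read.csv(hdfs_uri("spark-master", "/data/sample.csv"), header=True)` to load a dataset stored in HDFS.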
SPECIAL ACCESS CONDITIONS
Access is provided via SSH.
PREREQUISITES
The user must be registered.
CASE EXAMPLES
A user has a large dataset that needs to be analyzed, but has neither the required computing resources nor the know-how to deploy a Spark service. This service provides a ready-to-use Big Data analysis stack (with Spark); the data can then be uploaded and the analysis performed using Jupyter notebooks.
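The workflow in this example (upload the data, then analyze it from a notebook) might look roughly like the following Python sketch. It assumes that the `hdfs` command-line client is available on the machine the user is logged in to, that the dataset is a CSV file with a header row, and that a Spark session already exists. The function names, file names, and column names are hypothetical.

```python
import subprocess


def hdfs_upload_commands(local_path: str, hdfs_dir: str) -> list:
    """Return the `hdfs dfs` commands that create hdfs_dir and copy a file into it."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],            # create target directory
        ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir],  # copy, overwriting if present
    ]


def upload(local_path: str, hdfs_dir: str) -> None:
    """Run the upload commands, raising an error if any of them fails."""
    for cmd in hdfs_upload_commands(local_path, hdfs_dir):
        subprocess.run(cmd, check=True)


def mean_by_group(spark, dataset_uri: str, group_col: str, value_col: str):
    """Read a CSV dataset and return the mean of value_col per group_col."""
    # Imported here so the upload helpers work even where PySpark is absent.
    from pyspark.sql import functions as F

    df = spark.read.csv(dataset_uri, header=True, inferSchema=True)
    return df.groupBy(group_col).agg(F.avg(value_col).alias(f"mean_{value_col}"))
```

For instance, after `upload("measurements.csv", "/experiments/demo")`, a notebook cell could call `mean_by_group(spark, "hdfs:///experiments/demo/measurements.csv", "sensor", "value").show()` to run the aggregation on the cluster.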
SERVICE CAN BE COMBINED WITH
It can be combined with any service that needs to process large amounts of data (with Spark) or perform machine learning tasks in Python (with Jupyter). Examples include databases, visualization tools, and data ingestion tools.