TeraLab provides dedicated, secure, sovereign, distributed environments for big data processing. TeraLab proposes a twofold offer: on the one hand, advice and support for the definition and specification of a customized distributed environment; on the other hand, a turn-key infrastructure to host such a big data configuration. TeraLab favours, whenever possible, popular open-source solutions such as Hadoop and Spark.
For each project hosted in TeraLab, we provision a workspace, which is a dedicated network of virtual servers. For this service, your workspace will be tailored for distributed big data processing and storage. A workspace is highly customizable in terms of CPU, RAM, hot storage, cold storage (backup and archive), Linux distribution, and security configuration.
The TeraLab team has experience in setting up distributed clusters with state-of-the-art tools such as Hadoop/Spark, MongoDB, or Elasticsearch. Additional big data solutions can be installed upon agreement. This service can be coupled with data science tools for distributed computing, such as PySpark, or with notebook environments such as JupyterLab.
The user has a choice between two types of distributed environments, both natively supported by the Hadoop ecosystem:
- Single-node pseudo-distributed: a good fit for early-stage development and prototyping. It allows the user to get familiar with the platform and to translate classic code into distributed code. In this case, a more powerful single node can be supplied.
- Multi-node distributed cluster: better suited to more mature projects already implemented in distributed mode. It allows the user to experiment with different configurations to optimize performance. In this option, the full allocated processing and storage capacity is divided among the nodes. The workspace is delivered with an initial configuration of the big data tools; the user is responsible for any later customization.
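To illustrate the "classic code into distributed code" translation mentioned above, the sketch below contrasts a plain sequential word count with the same logic rewritten as independent per-record transformations plus an associative merge. This map/reduce shape is what frameworks like Spark can parallelise across nodes; the example itself is a hypothetical pure-Python sketch, not TeraLab-specific code.

```python
from collections import Counter
from functools import reduce

lines = ["big data on teralab", "big data processing", "data everywhere"]

# Classic, sequential version: one loop walks over all the data.
classic = Counter()
for line in lines:
    for word in line.split():
        classic[word] += 1

# Map/reduce version: each record is transformed independently (map),
# then partial results are combined with an associative operation
# (reduce). Only this second shape can be distributed across nodes.
mapped = [Counter(line.split()) for line in lines]       # map step
reduced = reduce(lambda a, b: a + b, mapped, Counter())  # reduce step

assert classic == reduced
print(reduced["data"])  # each line contributes its own partial count
```

In Spark, the same structure would typically be expressed with `rdd.map(...)` followed by `reduceByKey(...)`; the key point is that the per-record step must not depend on any shared mutable state.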
SPECIAL ACCESS CONDITIONS
Acceptance of TeraLab's Terms and Conditions
A typical use case is a company analysing millions of historical records that can be processed in parallel.
For over five years, the French postal service has made its mail routing data available for anomaly detection and analysis and for routing process optimisation. TeraLab has provided secure big data clusters to selected academics and data science startups for processing several terabytes of such routing data.
SERVICE CAN BE COMBINED WITH
Data Science Experimentation Kit.
Users of this service are kindly asked to include in their proposal an estimate of their required infrastructure (number of VMs, and the CPU, RAM, and storage for each VM).
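For the multi-node option, where the full allocated capacity is divided among the nodes, a simple even split gives the per-VM figures to list in the proposal. The helper below is a purely illustrative sketch with made-up numbers, not a TeraLab sizing template.

```python
# Hypothetical helper for sizing a proposal: given a total allocation
# and a node count, derive the per-VM figures to list in the request.
# All numbers are illustrative, not TeraLab defaults.

def per_node_resources(total_vcpu, total_ram_gb, total_storage_tb, nodes):
    """Split a total allocation evenly across the nodes of a cluster."""
    return {
        "vms": nodes,
        "vcpu_per_vm": total_vcpu // nodes,
        "ram_gb_per_vm": total_ram_gb // nodes,
        "storage_tb_per_vm": total_storage_tb / nodes,
    }

# Example: a 4-node cluster carved out of 64 vCPU / 256 GB RAM / 8 TB.
print(per_node_resources(64, 256, 8, 4))
# {'vms': 4, 'vcpu_per_vm': 16, 'ram_gb_per_vm': 64, 'storage_tb_per_vm': 2.0}
```

In practice the master node is often sized differently from the workers, so an uneven split may be more realistic; the even split above is just the simplest starting point.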