Hopsworks is an open-source big data platform developed by Logical Clocks (https://www.logicalclocks.com) and hosted by RISE as a service within the ICE Connect platform. It enables developers to implement and operate large-scale ML pipelines: end-to-end orchestration covering data ingestion, data preparation, training, and model serving, as well as distributed deep learning across multiple GPUs.
Hopsworks integrates popular frameworks from the Apache Hadoop ecosystem and beyond, for example Apache Spark, TensorFlow, Airflow, Kafka, and many others. All services provided by Hopsworks can be accessed through either a REST API or a user interface. The real value of Hopsworks, however, is that it makes big data and AI frameworks easier to use by introducing new concepts for collaborative data science (projects, users, and datasets) and ubiquitous support for TLS certificates, which opens the platform to integration with the outside world (IoT/mobile devices and external applications).
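As a minimal sketch of programmatic access, every Hopsworks service can be reached over HTTPS with an authenticated REST call. The host name and API key below are placeholders, and the `/hopsworks-api/api` path prefix and `ApiKey` authorization scheme are assumptions about a typical deployment, not guaranteed endpoints:

```python
import urllib.request

# Placeholder host for illustration only.
HOPSWORKS_HOST = "https://hopsworks.example.com"

def dataset_request(project_id: int) -> urllib.request.Request:
    """Build an authenticated request for a project's datasets.

    The /hopsworks-api/api path prefix and the ApiKey authorization
    scheme are assumptions about a typical Hopsworks deployment.
    """
    url = f"{HOPSWORKS_HOST}/hopsworks-api/api/project/{project_id}/dataset"
    return urllib.request.Request(
        url, headers={"Authorization": "ApiKey <your-key>"}
    )

# Sending the request (urllib.request.urlopen(dataset_request(42)))
# would require a running Hopsworks instance and a valid API key.
req = dataset_request(42)
print(req.full_url)
```

Because every connection is protected by TLS, the same style of call works equally from external applications or IoT/mobile devices.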
Hopsworks is accessed via a user interface that gives users access to their data and services. Each user is assigned a Hopsworks project, which is a sandbox containing datasets, other users, and code. Users familiar with GitHub will recognize a project as the equivalent of a GitHub repository: project members themselves manage membership and decide what code and data belong in the project.
In a project, a user can have the role of “data owner” (the administrator) or “data scientist”. Data scientists are restricted to uploading programs, running programs, and visualizing results; they are not allowed to change the membership of the project or to import/export data from it. This enables data owners to manage the analysis of sensitive datasets within the confines of a project by inviting a data scientist into the project to carry out the analysis.
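The two-role model described above can be sketched as a simple permission table. This is an illustrative paraphrase of the text, not the Hopsworks API; the role and action names are invented for the example:

```python
# Hypothetical permission table paraphrasing the Hopsworks project roles:
# data owners administer the project, data scientists only analyze.
ROLE_PERMISSIONS = {
    "data_owner": {
        "upload_program", "run_program", "visualize_results",
        "manage_membership", "import_export_data",
    },
    "data_scientist": {
        "upload_program", "run_program", "visualize_results",
    },
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role may perform the given action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

For example, `is_allowed("data_scientist", "run_program")` is true, while `is_allowed("data_scientist", "import_export_data")` is false, which is exactly the restriction that keeps sensitive data inside the project sandbox.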
- Feature Store - data warehouse for ML
- Distributed Deep Learning - faster with more GPUs
- HopsFS - NVMe Speed with Big Data
- Horizontally Scalable - ingestion, data prep, training, serving
- Notebooks for development - First-class Python Support
- Versioning - Code, infrastructure, data
- Model Serving on Kubernetes - TF Serving, MLeap, SkLearn
- End-to-End ML Pipelines - Orchestrated by Airflow
- Secure Multi-tenancy - Project-based restricted Access
- Encryption - TLS/SSL everywhere
- AI-Asset Governance - models, experiments, data, GPUs
- Healthcare – manage and analyze genomic and medical data.
- Finance – provide AI capabilities for fraud detection.
- Betting – provide AI capabilities to address challenges from regulators and deal with cybercrime.
- Any other use case involving massive amounts of data and data analytics.