Looking for a Kubernetes-as-a-service solution for running GPU-accelerated AI workloads? The RISE ICE Kubernetes service, part of the ICE Connect platform, makes it possible to deploy and manage containerized workloads at scale.
The service is based on the Rancher Kubernetes platform (https://rancher.com) and runs on a powerful compute cluster with more than 1100 Intel Xeon CPU cores, 4 TB of memory, 2 PB of storage and more than 140 NVIDIA GPUs, offering a staggering 3 petaflops of performance. The cluster is multi-tenant and shared among all users, making it easy and cost-efficient to scale elastic workloads to meet user needs.
Kubernetes is an open-source container orchestration engine that automates the management of Docker containers and microservices. It offers service discovery, storage orchestration, automated rollouts and rollbacks, secret management and self-healing, as well as many other features such as automatic scheduling to get the most out of the hardware. Simply put, Kubernetes provides the tools users need to build and deploy reliable, scalable distributed applications.
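As a minimal sketch of what such a workload looks like, the following Kubernetes Deployment manifest runs a replicated container and requests one GPU via the standard `nvidia.com/gpu` extended resource. The names and the image tag are illustrative assumptions, not part of the service documentation:

```yaml
# Sketch of a Deployment requesting one NVIDIA GPU.
# Names (demo-train) and the image are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-train
spec:
  replicas: 2                     # Kubernetes keeps two pods running (self-healing)
  selector:
    matchLabels:
      app: demo-train
  template:
    metadata:
      labels:
        app: demo-train
    spec:
      containers:
        - name: trainer
          image: tensorflow/tensorflow:latest-gpu  # public image, used here for illustration
          resources:
            limits:
              nvidia.com/gpu: 1   # ask the scheduler for one of the cluster's GPUs
```

If a pod crashes or a node fails, the Deployment's controller replaces the pod automatically, which is the self-healing behavior described above.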
The RISE ICE Kubernetes service builds on the Rancher platform to offer additional features to the Kubernetes stack. This includes user management, access control, cluster monitoring and of course workload management.
Each user is offered a unique Rancher project, to which a set amount of resources is assigned via a quota limit. Within the project, users can deploy Kubernetes workloads separated and isolated from those of other users. Rancher’s UI and REST APIs let users manage their workloads, for example by setting quota limits to control the maximum resources a workload may consume from the total project quota, managing storage volumes, or controlling network routing to expose Kubernetes services publicly to Internet users or to other workloads.
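Under the hood, Rancher enforces such per-workload limits with standard Kubernetes `ResourceQuota` objects in the namespaces belonging to a project. A minimal sketch, with purely illustrative names and values:

```yaml
# Sketch of a namespace-level quota carving out a slice of a project's
# total allocation. Namespace name and limits are assumed for illustration.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: workload-quota
  namespace: my-project-ns        # hypothetical namespace inside the user's project
spec:
  hard:
    requests.cpu: "8"             # at most 8 CPU cores requested in total
    requests.memory: 32Gi         # at most 32 GiB of memory requested
    requests.nvidia.com/gpu: "2"  # at most 2 GPUs across all pods in the namespace
```

Pods that would push the namespace past these limits are rejected at admission time, so one workload cannot starve the rest of the project.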
The RISE ICE Kubernetes service also makes it possible to deploy and manage so-called Helm applications. Via a built-in “App store”, users can publish their Kubernetes applications as Helm charts and make them available to other users. With a few clicks, users can easily set up, deploy and run complex applications, which are typically up and running within seconds. Several Helm charts are already available, for example one that runs GPU-accelerated Jupyter notebooks. Several popular machine learning frameworks and toolkits, including TensorFlow and PyTorch, are supported.
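A Helm chart is configured through a values file whose keys are defined by the chart itself. As a hedged sketch of what configuring such a Jupyter chart might look like, the keys and the image below are assumptions for illustration, not the actual chart from the catalog:

```yaml
# Hypothetical values.yaml overrides for a Jupyter notebook Helm chart;
# the keys and image are illustrative, not the catalog chart's real schema.
image:
  repository: jupyter/tensorflow-notebook  # public notebook image with TensorFlow
  tag: latest
resources:
  limits:
    nvidia.com/gpu: 1   # give the notebook server one GPU
```

In Rancher's App store the same overrides are typically entered through a form in the UI, so users rarely need to edit the values file by hand.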
Typical use cases range from large-scale machine learning to generic software-as-a-service applications.