This repo covers Kubeflow Environment with LABs: Kubeflow GUI, Jupyter Notebooks on pods, Kubeflow Pipelines, Experiments, KALE, KATIB (AutoML: Hyperparameter Tuning), KFServe (Model Serving), Training Operators (Distributed Training), Projects, etc.

omerbsezer/Fast-Kubeflow

Fast-Kubeflow

This repo covers the Kubeflow environment with LABs: Kubeflow GUI, Jupyter Notebooks running on Kubernetes Pods, Kubeflow Pipelines, KALE (Kubeflow Automated PipeLines Engine), KATIB (AutoML: Finding Best Hyperparameter Values), KFServe (Model Serving), Training Operators (Distributed Training), Projects, etc. More usage scenarios will be added over time.

Kubeflow is a powerful tool that runs on Kubernetes (K8s) with containers (process isolation, scaling, distributed and parallel training). Kubeflow can be installed on-premises (WSL2 or MiniKF) or in the cloud (AWS, Azure, GCP; ref: https://www.kubeflow.org/docs/started/installing-kubeflow/).

This repo makes it easy to learn Kubeflow and run the projects on your local machine with MiniKF, VirtualBox 6.1.40, and Vagrant, free of charge (minimum: 16 GB RAM, 6 CPU cores, 70-80 GB disk space).

Prerequisite

  • Knowledge of:
    • Container technology (Docker). You can learn it here => Fast-Docker
    • Container orchestration technology (Kubernetes). You can learn it here => Fast-Kubernetes

Keywords: Kubeflow, Pipeline, MLOps, AIOps, Distributed Training, Model Serving, ML Containers.

Quick Look (HowTo): Scenarios - Hands-on LABs

Table of Contents

Motivation

Why should we use / learn Kubeflow?

  • Kubeflow uses containers on Kubernetes to run the steps of machine learning and deep learning algorithms on computer clusters.
  • Kubeflow provides machine learning (ML) data pipelines.
  • It saves pipelines, experiments, runs (experiment tracking on Kubeflow), and models (model deployment).
  • It provides easy, repeatable, portable deployments on a diverse infrastructure (for example, experimenting on a laptop, then moving to an on-premises cluster or to the cloud).
  • Kubeflow supports deploying and managing loosely coupled microservices and scaling them based on demand.
  • Kubeflow is a free, open-source platform that runs on-premises or on any cloud (AWS, Google Cloud, Azure, etc.).
  • It includes Jupyter Notebooks for developing ML algorithms and a user interface for viewing pipelines.
  • "Kubeflow started as an open sourcing of the way Google ran TensorFlow internally, based on a pipeline called TensorFlow Extended. It began as just a simpler way to run TensorFlow jobs on Kubernetes, but has since expanded to be a multi-architecture, multi-cloud framework for running entire machine learning pipelines." (ref: kubeflow.org)
  • Kubeflow applied to become a CNCF incubating project; this was announced on 24 October 2022 (ref: opensource.googleblog.com).
  • Distributed and parallel training are becoming more important day by day because the number of model parameters keeps increasing (especially in deep learning models: billions to trillions of parameters). More parameters can give better results, but they also lengthen training and require more computing power. With Kubeflow, Kubernetes, and containers, distributed training can be run across many GPUs. Please have a look at the Training-Operators (Distributed Training) part for details.
  • CERN uses Kubeflow and Training Operators to speed up training (3D-GAN) by running it in parallel on multiple GPUs (time for a single training run: from 2.5 days = 60 hours down to 30 minutes; video/presentation: https://www.youtube.com/watch?v=HuWt1N8NFzU).

What is Kubeflow?

  • "The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable." (ref: kubeflow.org)
  • "Kubeflow has developed into an end-to-end, extendable ML platform, with multiple distinct components to address specific stages of the ML lifecycle: model development (Kubeflow Notebooks), model training (Kubeflow Pipelines and Kubeflow Training Operator), model serving (KServe), and automated machine learning (Katib)" (ref: opensource.googleblog.com).
  • Kubeflow is a type of ML data pipeline application, like Airflow, that lets you create ML data pipelines (saving models and artifacts, running them multiple times).

How Does Kubeflow Work?

  • Kubeflow runs on the Kubernetes platform with Docker containers.

  • Kubernetes builds a cluster of nodes from many servers and PCs. Kubeflow is a distributed application (~35 pods) running on the Kubernetes platform. If several nodes are connected to the Kubernetes cluster, the Kubeflow pods run on different nodes.

  • Containers hold the Python machine learning (ML) code for each step of the ML pipeline (e.g. data download function, decision tree classifier, logistic regression classifier, evaluation step, etc.)

    image

  • One container's outputs can be connected to another container's inputs. This makes it possible to build a DAG (Directed Acyclic Graph) of containers. Each function can run in a separate container.

    image (ref: kubeflow-pipelines towardsdatascience)

  • To understand in detail how Kubeflow works, you should learn:

      1. Docker Containers
      2. Kubernetes
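
The idea of wiring one step's output into another step's input can be sketched in plain Python, without Kubeflow. The step functions, the `STEPS` table, and `run_pipeline` below are hypothetical names made up for illustration; in Kubeflow each step would run in its own container, while here they are ordinary function calls ordered with the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps. In Kubeflow each one would be a container;
# here they are plain functions so the DAG idea can be run locally.
def download_data():
    return [1.0, 2.0, 3.0, 4.0]

def train_model(data):
    # Toy "model": the mean of the data.
    return sum(data) / len(data)

def evaluate(data, model):
    # Toy metric: largest absolute deviation from the model.
    return max(abs(x - model) for x in data)

# step name -> (function, upstream steps whose outputs become its inputs)
STEPS = {
    "download": (download_data, []),
    "train": (train_model, ["download"]),
    "evaluate": (evaluate, ["download", "train"]),
}

def run_pipeline(steps):
    # Map every step to its predecessors, then order the DAG topologically.
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    order = list(TopologicalSorter(graph).static_order())
    outputs = {}
    for name in order:
        func, deps = steps[name]
        # Feed the outputs of the upstream steps in as positional inputs.
        outputs[name] = func(*(outputs[dep] for dep in deps))
    return order, outputs

order, outputs = run_pipeline(STEPS)
print(order)
print(outputs["evaluate"])
```

Because every step only depends on the outputs of earlier steps, the graph has no cycles and a valid execution order always exists; `TopologicalSorter` raises `CycleError` otherwise.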

What is a Container (Docker)?

  • Docker is a tool that reduces the gap between the development and deployment phases of the software development cycle.

  • Docker is like a VM, but a container is more lightweight than a VM (no kernel of its own, only a small app and file system, portable).

    • Two features added to the Linux kernel in the 2000s make Docker possible:
      • Namespaces: isolate processes.
      • Control groups: isolate and limit resource usage (CPU, memory) for each process.
  • Without Docker containers, each VM consumes 30% of the resources (memory, CPU).

    image (Ref: Docker.com)

    image (Ref: docs.docker.com)

  • To learn about Docker and Containers, please go to this repo: Fast-Docker

What is Kubernetes?

  • "Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available." (Ref: Kubernetes.io)

    image (Ref: Kubernetes.io)

    image (Ref: Kubernetes.io)

  • To learn about Kubernetes, please go to this repo: https://github.com/omerbsezer/Fast-Kubernetes

Installing Kubeflow

Kubeflow Basics

  • Kubeflow is a distributed ML application that contains the following parts:
    • Kubeflow Jupyter Notebook (creating multiple notebook pods)
    • Kubeflow Pipelines
    • KALE (Kubeflow Automated PipeLines Engine)
    • Kubeflow Runs and Experiments (which store all runs and experiments)
    • KATIB (AutoML: Finding Best Hyperparameter Values)
    • KFServe (Model Serving)
    • Training-Operators (Distributed Training)

Kubeflow Jupyter Notebook

  • Kubeflow creates notebooks using containers and K8s pods.

  • When launching a new notebook, the user can configure:

    • which image is used as the base image of the notebook pod,
    • how many CPU cores and how much RAM the notebook pod should use,
    • whether the notebook pod should use a GPU, if one is available in the K8s cluster,
    • how much volume space (workspace volume) the notebook pod should use,
    • whether the existing volume space should be shared with other notebook pods,
    • whether a persistent volume should be used (PV, PVC with NFS volume),
    • which environment variables or secrets should be reachable from the notebook pod,
    • on which server in the cluster, and alongside which pods, the notebook pod should run (K8s affinity, tolerations).

    image

    image

  • Launching the notebook creates a pod, and we can connect to it to open the notebook.

    image

    image

    image

  • In MiniKF, creating a notebook pod automatically triggers the creation of a volume (with the ROK storage class); the user can browse the files and even download them.

    image
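
The notebook options listed above (base image, CPU and RAM, GPU, workspace volume, environment variables) end up as fields of the Kubernetes object behind the notebook pod. The sketch below builds such an object as a plain Python dict. The function `build_notebook_manifest`, its parameters, the mount path `/home/jovyan`, and all example values are hypothetical; the layout follows the usual shape of a Kubeflow Notebook custom resource (a pod template under `spec.template.spec`), but the exact field names should be checked against the version installed in your cluster.

```python
import json

def build_notebook_manifest(name, namespace, image, cpu, memory,
                            workspace_pvc, gpus=0, env=None):
    """Return a dict shaped like a Kubeflow Notebook resource (assumed layout)."""
    resources = {"requests": {"cpu": cpu, "memory": memory}}
    if gpus:
        # Assumption: GPUs are requested through the NVIDIA extended resource.
        resources["limits"] = {"nvidia.com/gpu": gpus}
    container = {
        "name": name,
        "image": image,
        "resources": resources,
        "env": [{"name": k, "value": v} for k, v in (env or {}).items()],
        # Hypothetical mount path for the workspace volume.
        "volumeMounts": [{"name": "workspace", "mountPath": "/home/jovyan"}],
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "Notebook",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"template": {"spec": {
            "containers": [container],
            "volumes": [{
                "name": "workspace",
                "persistentVolumeClaim": {"claimName": workspace_pvc},
            }],
        }}},
    }

# Example values are made up for illustration.
manifest = build_notebook_manifest(
    name="demo-notebook",
    namespace="kubeflow-user",
    image="example.org/jupyter-scipy:latest",
    cpu="2",
    memory="4Gi",
    workspace_pvc="demo-workspace",
    gpus=1,
    env={"DATA_DIR": "/home/jovyan/data"},
)
print(json.dumps(manifest, indent=2))
```

In practice the Kubeflow GUI fills in these fields from the launch form, so writing the manifest by hand is only needed for automation.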

Kubeflow Pipeline

  • Kubeflow Pipelines is based on Argo Workflows, which is a container-native workflow engine for Kubernetes.

  • Kubeflow Pipelines consists of (ref: Kubeflow-Book):

    • Python SDK: allows you to create and manipulate pipelines and their components using the Kubeflow Pipelines domain-specific language.
    • DSL compiler: allows you to transform your pipeline defined in Python code into a static configuration reflected in a YAML file.
    • Pipeline Service: creates a pipeline run from the static configuration or YAML file.
    • Kubernetes Resources: the pipeline service connects to the Kubernetes API in order to define the resources needed to run the pipeline defined in the YAML file.
    • Artifact Storage: Kubeflow Pipelines stores metadata and artifacts. Metadata such as experiments, jobs, runs, and metrics are stored in a MySQL database. Artifacts such as pipeline packages, large-scale metrics, and views are stored in an artifact store such as a MinIO server.
  • Have a look at it:
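
The division of labour in the list above (SDK, compiler, pipeline service, Kubernetes resources) can be illustrated with a toy model. This is not the Kubeflow Pipelines SDK: `ToyPipeline`, `compile_pipeline`, and `plan_run` are hypothetical names, the image and commands are made up, and JSON stands in for the YAML file so that the sketch needs only the standard library.

```python
import json

class ToyPipeline:
    """'SDK' role: describe the pipeline steps in Python (not the kfp API)."""
    def __init__(self, name):
        self.name = name
        self.steps = []

    def add_step(self, name, image, command, after=()):
        self.steps.append({"name": name, "image": image,
                           "command": list(command), "after": list(after)})
        return name

def compile_pipeline(pipeline):
    """'Compiler' role: turn the Python object into a static text document."""
    return json.dumps({"pipeline": pipeline.name, "steps": pipeline.steps},
                      indent=2)

def plan_run(static_config, run_id):
    """'Pipeline service' role: read only the static document and derive
    one pod-like resource description per step for this run."""
    spec = json.loads(static_config)
    return [
        {"pod_name": f"{spec['pipeline']}-{run_id}-{step['name']}",
         "image": step["image"],
         "command": step["command"],
         "wait_for": step["after"]}
        for step in spec["steps"]
    ]

pipe = ToyPipeline("demo")
fetch = pipe.add_step("fetch", "python:3.11", ["python", "fetch.py"])
pipe.add_step("train", "python:3.11", ["python", "train.py"], after=[fetch])

static_config = compile_pipeline(pipe)
pods = plan_run(static_config, run_id="run1")
print(static_config)
print([pod["pod_name"] for pod in pods])
```

The point of the split is that `plan_run` never sees the Python objects, only the compiled document, so the same static file can be stored, shared, and run many times.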

KALE (Kubeflow Automated PipeLines Engine)

  • KALE (Kubeflow Automated pipeLines Engine) is a project that aims to simplify the data science experience of deploying Kubeflow Pipelines workflows.

  • Kale provides a simple UI to define Kubeflow Pipelines workflows directly from your JupyterLab interface, without the need to change a single line of code (ref: https://github.com/kubeflow-kale/kale).

  • With KALE, each cell is tagged and a workflow is created by connecting the cells; after compilation, a Kubeflow Pipeline is created and run.

  • KALE helps data scientists run their code on Kubeflow quickly without creating any containers manually.

    image (ref: KALE Tags)

  • Have a look at the KALE and KATIB project:
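
How tagged cells can become pipeline steps is sketched below in plain Python. It assumes cells carry tags of the form `block:<step>` and `prev:<step>` in their metadata and that an untagged cell belongs to the step before it; treat that tag format as an assumption to verify against the Kale documentation. The function `cells_to_steps` and the example cells are hypothetical, and the cell sources are only collected as text, never executed.

```python
def cells_to_steps(cells):
    """Group notebook cells into steps based on their tags (assumed format)."""
    steps = {}      # step name -> {"code": [...], "prev": [...]}
    current = None  # step that untagged cells are appended to
    for cell in cells:
        tags = cell.get("metadata", {}).get("tags", [])
        block = [t.split(":", 1)[1] for t in tags if t.startswith("block:")]
        prev = [t.split(":", 1)[1] for t in tags if t.startswith("prev:")]
        if block:
            current = block[0]
            steps.setdefault(current, {"code": [], "prev": []})
        if current is None:
            # Cells before the first tagged step are ignored in this sketch.
            continue
        steps[current]["code"].append(cell["source"])
        for name in prev:
            if name not in steps[current]["prev"]:
                steps[current]["prev"].append(name)
    return steps

# Example cells, shaped like entries of a notebook's "cells" list.
cells = [
    {"source": "import pandas as pd", "metadata": {"tags": ["imports"]}},
    {"source": "df = load()", "metadata": {"tags": ["block:load_data"]}},
    {"source": "df = df.dropna()", "metadata": {}},
    {"source": "model = fit(df)",
     "metadata": {"tags": ["block:train", "prev:load_data"]}},
]

steps = cells_to_steps(cells)
for name, step in steps.items():
    print(name, "depends on", step["prev"])
```

Each entry of `steps` then corresponds to one pipeline step, and the `prev` lists give the edges of the workflow graph.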

KATIB (AutoML: Finding Best Hyperparameter Values)

KFServe (Model Serving)

Training-Operators (Distributed Training)

Minio (Object Storage) and ROK (Data Management Platform)

Project 1: Creating ML Pipeline with Custom Docker Images (Decision Tree, Logistic Regression, SVM, Naive Bayes, XGBoost)

Project 2: KALE (Kubeflow Automated PipeLines Engine) and KATIB (AutoML: Finding Best Hyperparameter Values)

Project 3: KALE (Kubeflow Automated PipeLines Engine) and KServe (Model Serving) for Model Prediction

Project 4: Distributed Training with TensorFlow (MNIST data)

Other Useful Resources Related to Kubeflow

References
