
Introduce KubeVirt live migration enhancement #1348

Open · wants to merge 1 commit into main
Conversation

@mansam (Contributor) commented on Feb 13, 2025

KubeVirt Live Migration

Summary

Implement a pipeline for orchestrating Live Migration between Kubernetes clusters.
This pipeline represents a third migration type (cold, warm, live) and the first
that has an entirely provider-specific implementation. The cluster admin will be
responsible for establishing connectivity between the source and target clusters.
KubeVirt will be responsible for the migration mechanics, including storage migration.
Forklift will only need to create resources on the source and target clusters and
wait for migration to complete.

Motivation

Migrating between clusters without VM downtime is a clear benefit to users. The motivation
to do this orchestration with Forklift is that it already does the hard work of building
the inventory of resources on the source, mapping source resources to the destination,
and managing the migration pipeline.

Goals

  • Orchestrate live migration of a KubeVirt VM from one cluster to another via a "live" Plan.
    • Ensure necessary shared resources are accessible on the destination, including instance types,
      SSH keys, and configmaps that may be mounted by multiple VMs.

Non-Goals

  • Automatically establish intercluster connectivity
  • Migrate resources unrelated to VMs that may be necessary for application availability
    after migration (services, routes, etc.)
  • Implement live migration for providers other than KubeVirt

Proposal

User Stories

Story 1

As a cluster admin, I want to migrate a VM from one cluster to another to rebalance workloads
without downtime.

Implementation Overview

Forklift was designed with the assumption that the migration process is approximately
the same for each source hypervisor. That assumption led to a design in which all providers
share the same two migration pipelines (cold and warm), with provider-specific implementations
of pipeline steps. It has become clear over time that the assumption has not held: a substantial
amount of provider-specific branching has been added to the pipelines, as well as branching
within the shared steps to deal with storage- or provider-specific idiosyncrasies.

KubeVirt live migration requires a workflow so different from cold and warm migration that it
is not reasonable to repurpose the existing pipelines; a new pipeline needs to be implemented.
Moreover, the live migration pipeline is entirely provider-specific. Even if it were possible to
implement live migration for another source hypervisor, its requirements would differ enough that
the pipeline implemented for the KubeVirt provider would not be reusable. Given these considerations,
it is necessary to design and implement a flow for using provider-specific migration pipelines.

Migration Prerequisites

Connectivity

The source and target clusters need to be connected such that KubeVirt can communicate cluster-to-cluster
to transfer state. Submariner is one option for this. In any case, configuring connectivity is an administrator
responsibility outside the scope of Forklift.

VirtualMachineInstanceTypes and VirtualMachinePreferences

Validation should check whether the target cluster has VirtualMachineInstanceTypes and VirtualMachinePreferences
that match those used by the VMs on the source cluster. This can be done by looking for resources with
the same name as those referenced by the source VMs, and then comparing the contents to see if they are
identical. If the referenced resources are not present or do not match, appropriate warnings should be raised.
Whether this should be a hard stop on the migration could be configured at the provider level.
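
A minimal sketch of this check, assuming controller-runtime clients for both clusters and the
kubevirt.io/api instance type package; the helper name and error handling are illustrative only,
and only the specs are compared since metadata always differs between clusters:

package validation

import (
	"context"

	"k8s.io/apimachinery/pkg/api/equality"
	k8serr "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	instancetypev1beta1 "kubevirt.io/api/instancetype/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// instanceTypeMatches reports whether a VirtualMachineInstancetype referenced by a
// source VM exists on the target cluster (found) and has an identical spec (matches).
func instanceTypeMatches(ctx context.Context, source, target client.Client, namespace, name string) (found, matches bool, err error) {
	src := &instancetypev1beta1.VirtualMachineInstancetype{}
	if err = source.Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, src); err != nil {
		return
	}
	dst := &instancetypev1beta1.VirtualMachineInstancetype{}
	if err = target.Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, dst); err != nil {
		if k8serr.IsNotFound(err) {
			err = nil // missing on the target: raise a warning condition rather than fail outright
		}
		return
	}
	// Compare specs only; metadata (UID, resourceVersion, etc.) always differs.
	return true, equality.Semantic.DeepEqual(src.Spec, dst.Spec), nil
}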

Proposed Migration Pipeline

        {Name: Started},
        {Name: PreHook, All: HasPreHook},
        {Name: CreateEmptyDataVolumes},
        {Name: EnsureResources}, // secrets and configmaps
        {Name: CreateStandbyVM},
        {Name: CreateTargetMigration},
        {Name: WaitForTargetMigration},
        {Name: CreateSourceMigration},
        {Name: WaitForStateTransfer},
        {Name: PostHook, All: HasPostHook},
        {Name: Completed}
  • CreateEmptyDataVolumes: KubeVirt handles the storage migration itself, so Forklift only needs
    to create blank target DataVolumes. (A sketch of these resources follows this list.)
  • EnsureResources: Any secrets or configmaps that are mounted by the VM on the source need to be
    duplicated to the target namespace. Multiple VMs could rely on the same configmap or secret, so Forklift
    will allow this step to pass if secrets or configmaps with the correct names (and Forklift labels) already
    exist.
  • CreateStandbyVM: The target VM needs to be created mounting the blank disks and any secrets or configmaps.
    It also needs to be created in the running state, with a special KubeVirt annotation indicating that the VM is to be
    started in migration target mode.
  • CreateTargetMigration: A VirtualMachineInstanceMigration needs to be created in the target cluster.
  • WaitForTargetMigration: Once the target VMIM is reconciled and ready, it will present a migration endpoint
    to use for the state transfer.
  • CreateSourceMigration: A VMIM must be created in the source cluster, specifying the source VM and the migration
    endpoint from the target migration.
  • WaitForStateTransfer: Once the source VMIM is created, KubeVirt will handle the state transfer and
    Forklift only needs to wait for the destination VM to report ready. KubeVirt will handle shutdown of the
    source VM.
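
To make the first few steps concrete, here is a minimal sketch of the blank DataVolume created in
CreateEmptyDataVolumes and the target-side VMIM created in CreateTargetMigration, using the CDI and
KubeVirt Go APIs. The names and helper functions are illustrative, exact field types can vary
slightly across CDI versions, and the cross-cluster (receive/endpoint) fields of the VMIM are
omitted because that part of the KubeVirt API is still settling upstream:

package live

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	virtv1 "kubevirt.io/api/core/v1"
	cdiv1 "kubevirt.io/containerized-data-importer-api/pkg/apis/core/v1beta1"
)

// blankDataVolume builds an empty target DataVolume that KubeVirt fills during
// storage migration; size (and storage class) would be mapped from the source disk.
func blankDataVolume(name, namespace, size string) *cdiv1.DataVolume {
	return &cdiv1.DataVolume{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: cdiv1.DataVolumeSpec{
			Source: &cdiv1.DataVolumeSource{Blank: &cdiv1.DataVolumeBlankImage{}},
			Storage: &cdiv1.StorageSpec{
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{corev1.ResourceStorage: resource.MustParse(size)},
				},
			},
		},
	}
}

// targetMigration builds the VirtualMachineInstanceMigration created on the target
// cluster; the endpoint it reports is then referenced by the source-side migration.
func targetMigration(vmName, namespace string) *virtv1.VirtualMachineInstanceMigration {
	return &virtv1.VirtualMachineInstanceMigration{
		ObjectMeta: metav1.ObjectMeta{Name: vmName + "-target", Namespace: namespace},
		Spec:       virtv1.VirtualMachineInstanceMigrationSpec{VMIName: vmName},
	}
}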

CR Changes

The current implementation of the Plan CR has a boolean to indicate a warm migration, so the CR
needs to be extended to support other migration types. An optional string field should be added to
accept a migration type; when populated, it takes precedence over the boolean flag.
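
A sketch of what the additive change and precedence rule could look like; the field name and its
values here are illustrative, not final:

type PlanSpec struct {
	// Warm is the existing flag selecting warm migration.
	Warm bool `json:"warm,omitempty"`
	// Type optionally names the migration type: "cold", "warm", or "live".
	// When set, it takes precedence over Warm.
	Type string `json:"type,omitempty"`
	// ... existing fields omitted ...
}

// MigrationType resolves the effective migration type for a plan.
func (s *PlanSpec) MigrationType() string {
	switch {
	case s.Type != "":
		return s.Type
	case s.Warm:
		return "warm"
	default:
		return "cold"
	}
}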

Provider adapter changes

The provider adapter interface needs to be expanded to handle provider-specific migration paths.
A new "Migrator" component would be responsible for indicating whether the provider supports a given
migration path and whether it provides its own implementation of any portions of the migration path.

A draft of the new component interface might look something like this:

type Migrator interface {
	// SupportsPath reports whether the provider implements the given migration type.
	SupportsPath(path string) bool
	// Itinerary returns the provider's itinerary for the path, if it supplies one.
	Itinerary(path string) (libitr.Itinerary, bool)
	// Pipeline returns the provider's pipeline for the path, if it supplies one.
	Pipeline(path string) (libitr.Pipeline, bool)
	// Predicate returns the provider's step predicate for the path, if it supplies one.
	Predicate(path string) (libitr.Predicate, bool)
	// Phase runs the provider-specific handling for the VM's current phase.
	Phase(vmStatus *planapi.VMStatus) error
}

The migration runner in plan/migration.go would be updated to defer to the provider
implementation if available. (Integration would be at the points where the itinerary
is selected, the pipeline is generated, and where individual phases are executed.)

Approaching it in this way allows the provider adapter to take responsibility for
portions of the migration flow (or the entire flow) without requiring a full reimplementation
of the migration flows for each provider all at once.
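
For example, the itinerary selection in the runner might defer like this; a sketch only, where
r.migrator and genericItinerary are illustrative names rather than existing Forklift symbols:

func (r *Migration) itinerary(path string) libitr.Itinerary {
	// Prefer a provider-specific itinerary when the adapter's Migrator offers one.
	if r.migrator != nil && r.migrator.SupportsPath(path) {
		if itinerary, found := r.migrator.Itinerary(path); found {
			return itinerary
		}
	}
	// Otherwise fall back to the shared cold/warm itineraries.
	return genericItinerary(path)
}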

Security, Risks, and Mitigations

Forklift will require new access to read and create VirtualMachineInstanceMigration instances
on the source and target clusters. Otherwise, the usual security risks for cluster-to-cluster migrations apply: Forklift
has significant access to secrets and other resources on both clusters, and we need to ensure
that the user deploying the migration plan has the appropriate rights in the source and target
namespaces.
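
The additional access amounts to roughly one new RBAC rule on both clusters; a sketch, showing only
the new VirtualMachineInstanceMigration permissions:

package roles

import rbacv1 "k8s.io/api/rbac/v1"

// Rule granting Forklift read and create access to VMIMs, added to the roles
// it uses on both the source and target clusters.
var vmimRule = rbacv1.PolicyRule{
	APIGroups: []string{"kubevirt.io"},
	Resources: []string{"virtualmachineinstancemigrations"},
	Verbs:     []string{"get", "list", "watch", "create"},
}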

Design Details

Test Plan

Unit tests will be written to ensure that the Migrator component logic
behaves correctly and that the migration runner defers to provider-specific
implementations correctly.

Integration tests need to be written to ensure that the KubeVirt live migration path
succeeds.

Upgrade / Downgrade Strategy

This enhancement requires an operator change to deploy a revised Plan CR and
updated controller image. Existing plans are compatible with the updated controller;
plans created using the new migration type field will appear to the old version of
the controller as though they were cold migrations. No special handling is required
to upgrade or downgrade since the changes are purely additive.

Signed-off-by: Sam Lucidi <slucidi@redhat.com>
@mansam mansam requested review from mnecas and yaacov as code owners February 13, 2025 21:18
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 15.37%. Comparing base (f1fe5d0) to head (3bb2a38).
Report is 17 commits behind head on main.


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1348      +/-   ##
==========================================
- Coverage   15.45%   15.37%   -0.09%     
==========================================
  Files         112      112              
  Lines       23377    23837     +460     
==========================================
+ Hits         3613     3664      +51     
- Misses      19479    19888     +409     
  Partials      285      285              
Flag: unittests | Coverage Δ: 15.37% <ø> (-0.09%) ⬇️


Comment on the Summary ("This pipeline represents a third migration type (cold, warm live) and the first ...") from a Member:

NP: cold, warm, live
Comment on lines +49 to +50 ("Migrate resources unrelated to VMs that may be necessary for application availability after migration (services, routes, etc)") from a Member:

This sounds like a good idea for an additional RFE.

Comment on lines +134 to +136 (the "Provider adapter changes" paragraph) from a Member:

I'm not fully understanding what the path is supposed to represent; is it the migration type?

@mansam (Contributor, Author) replied on Feb 17, 2025:

Yes, migration path == migration type.

@mnecas (Member) commented on Feb 19, 2025

@mansam do you know the status of the needed features in Kubevirt? Could you please link them here?
