Introduce KubeVirt live migration enhancement #1348
Conversation
Signed-off-by: Sam Lucidi <slucidi@redhat.com>
> This pipeline represents a third migration type (cold, warm live) and the first

NP: cold, warm, live
> * Migrate resources unrelated to VMs that may be necessary for application availability after migration (services, routes, etc)

This sounds like a good idea for an additional RFE.
> The provider adapter interface needs to be expanded to handle provider-specific migration paths. A new "Migrator" component would be responsible for indicating whether the provider supports a given migration path and whether it provides its own implementation of any portions of the migration path.

I'm not fully understanding what the path is supposed to represent, is it the migration type?

Yes, migration path == migration type.

@mansam do you know the status of the needed features in Kubevirt? Could you please link them here?
KubeVirt Live Migration
Summary
Implement a pipeline for orchestrating Live Migration between Kubernetes clusters.
This pipeline represents a third migration type (cold, warm, live) and the first
that has an entirely provider-specific implementation. The cluster admin will be
responsible for establishing connectivity between the source and target clusters.
KubeVirt will be responsible for the migration mechanics, including storage migration.
Forklift will only need to create resources on the source and target clusters and
wait for migration to complete.
Motivation
Migrating between clusters without VM downtime is a clear benefit to users. The motivation
to do this orchestration with Forklift is that it already does the hard work of building
the inventory of resources on the source, mapping source resources to the destination,
and managing the migration pipeline.
Goals
* Live migrate VMs from one KubeVirt cluster to another.
* Migrate resources associated with the VMs, such as secrets, ssh keys, and configmaps which may be mounted by multiple VMs.
Non-Goals
* Migrate resources unrelated to VMs that may be necessary for application availability after migration (services, routes, etc)
Proposal
User Stories
Story 1
As a cluster admin, I want to migrate a VM from one cluster to another to rebalance workloads
without downtime.
Implementation Overview
Forklift was designed with an assumption that the migration process is approximately
the same for each source hypervisor. This assumption led to a design where the providers
all share the same two (cold, warm) migration pipelines with provider-specific implementations
of pipeline steps. It has become clear that this assumption has not held. A substantial
amount of provider-specific branching has been added to the pipelines over time, as well as branching
within the shared steps to deal with storage- or provider-specific idiosyncrasies.
KubeVirt live migration requires a workflow that is so different from cold and warm migration that it
is not reasonable to repurpose the existing pipelines for live migration; a new pipeline needs to
be implemented. Moreover, the live migration pipeline is entirely provider specific. Even if it
were possible to implement live migration for another source hypervisor, it would be so different
in requirements that the pipeline implemented for the KubeVirt provider would not be usable. Due to these
considerations, it is necessary to design and implement a flow for using provider-specific migration pipelines.
Migration Prerequisites
Connectivity
The source and target clusters need to be connected such that KubeVirt can communicate cluster-to-cluster
to transfer state. Submariner is one option for this. In any case, configuring connectivity is an administrator
responsibility outside the scope of Forklift.
VirtualMachineInstanceTypes and VirtualMachinePreferences
Validation should check whether the target cluster has VirtualMachineInstanceTypes and VirtualMachinePreferences
that match those used by the VMs on the source cluster. This can be done by looking for resources with
the same name as those referenced by the source VMs, and then comparing the contents to see if they are
identical. If the referenced resources are not present or do not match, appropriate warnings should be raised.
Whether this should be a hard stop on the migration could be configured at the provider level.
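As a rough illustration, assuming the referenced specs have already been fetched and decoded into comparable values, the check could look like the following sketch (the types and function names here are illustrative, not Forklift code):

```go
package validation

import "reflect"

// Warning describes an instance type or preference referenced by a source VM
// that is missing on the target cluster or differs from the source copy.
type Warning struct {
	Name   string
	Reason string
}

// CompareByName looks up each source-referenced resource by name on the
// target cluster and compares the decoded specs for equality.
func CompareByName(source, target map[string]interface{}) []Warning {
	var warnings []Warning
	for name, srcSpec := range source {
		dstSpec, found := target[name]
		switch {
		case !found:
			warnings = append(warnings, Warning{Name: name, Reason: "not found on target cluster"})
		case !reflect.DeepEqual(srcSpec, dstSpec):
			warnings = append(warnings, Warning{Name: name, Reason: "target copy does not match source"})
		}
	}
	return warnings
}
```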
Proposed Migration Pipeline
* Create blank DataVolumes on the target cluster. KubeVirt handles the storage migration itself, so all that is necessary for Forklift to do is create blank target DataVolumes.
* Copy ConfigMaps and Secrets mounted by the source VM. These need to be duplicated to the target namespace. Multiple VMs could rely on the same configmap or secret, so Forklift will allow this step to pass if secrets or configmaps with the correct names (and Forklift labels) already exist.
* Create the target VirtualMachine. It needs to be created in the running state, with a special KubeVirt annotation indicating that the VM is to be started in migration target mode.
* Create a VirtualMachineInstanceMigration on the target cluster, which provides an endpoint to use for the state transfer.
* Create a VirtualMachineInstanceMigration on the source cluster, referencing the endpoint from the target migration.
* Wait for the migration to complete. Forklift only needs to wait for the destination VM to report ready. KubeVirt will handle shutdown of the source VM.
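For illustration, the steps above could be expressed as an ordered itinerary like the following sketch (the phase names are assumptions, not the actual Forklift phase identifiers):

```go
package plan

// liveMigrationItinerary is a hypothetical ordering of the phases described
// above; the names are placeholders, not real Forklift phases.
var liveMigrationItinerary = []string{
	"CreateTargetDataVolumes",  // blank DataVolumes only; KubeVirt migrates the storage
	"CopyConfigMapsAndSecrets", // passes if correctly named and labeled copies already exist
	"CreateTargetVM",           // running state, annotated to start in migration target mode
	"CreateTargetMigration",    // VirtualMachineInstanceMigration on the target cluster
	"CreateSourceMigration",    // VirtualMachineInstanceMigration on the source, using the target's endpoint
	"WaitForTargetVMReady",     // KubeVirt shuts down the source VM when the transfer completes
}
```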
CR Changes
The current implementation of the Plan CR has a boolean to indicate a warm migration, so the CR
needs to be extended to support other migration types. An optional string field must be added to accept a migration
type which, if populated, takes precedence over the boolean flag.
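A rough sketch of what the extended spec and precedence rule could look like, with illustrative field and type names rather than the actual CR definition:

```go
package api

// MigrationType is the value of the new optional string field.
type MigrationType string

const (
	MigrationCold MigrationType = "cold"
	MigrationWarm MigrationType = "warm"
	MigrationLive MigrationType = "live"
)

// PlanSpec sketches only the fields relevant to this change; the real Plan
// CR contains many more.
type PlanSpec struct {
	// Warm is the existing boolean flag selecting warm migration.
	Warm bool `json:"warm,omitempty"`
	// Type, when populated, takes precedence over the Warm flag.
	Type MigrationType `json:"type,omitempty"`
}

// EffectiveType resolves the migration type with the precedence rule above.
func (s PlanSpec) EffectiveType() MigrationType {
	if s.Type != "" {
		return s.Type
	}
	if s.Warm {
		return MigrationWarm
	}
	return MigrationCold
}
```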
Provider adapter changes
The provider adapter interface needs to be expanded to handle provider-specific migration paths.
A new "Migrator" component would be responsible for indicating whether the provider supports a given
migration path and whether it provides its own implementation of any portions of the migration path.
A draft of the new component interface might look something like this:
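One hedged sketch of such an interface, with method names that are assumptions rather than the actual Forklift API:

```go
package adapter

// MigrationPath identifies a migration type: cold, warm, or live.
type MigrationPath string

// Migrator is a hypothetical sketch of the new provider adapter component;
// the method names are illustrative only.
type Migrator interface {
	// SupportsPath reports whether the provider can perform the given
	// migration path at all.
	SupportsPath(path MigrationPath) bool

	// Itinerary returns a provider-specific list of pipeline phases for the
	// path; an empty result means the shared pipeline should be used.
	Itinerary(path MigrationPath) []string

	// RunPhase executes the provider's implementation of a single phase.
	// handled is false when the shared implementation should run instead.
	RunPhase(vmID string, phase string) (handled bool, err error)
}
```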
The migration runner in plan/migration.go would be updated to defer to the provider implementation if available.
(Integration would be at the points where the itinerary is selected, the pipeline is generated, and where
individual phases are executed.)
Approaching it in this way allows the provider adapter to take responsibility for
portions of the migration flow (or the entire flow) without requiring a full reimplementation
of the migration flows for each provider all at once.
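Assuming the hypothetical Migrator sketched above, the integration points might look roughly like this (the runner type and fallback helpers are placeholders, not the real code in plan/migration.go):

```go
package plan

// Migrator mirrors the hypothetical interface sketched above.
type Migrator interface {
	SupportsPath(path string) bool
	Itinerary(path string) []string
	RunPhase(vmID, phase string) (handled bool, err error)
}

// runner stands in for the migration runner; the real type and fields differ.
type runner struct {
	migrator Migrator // nil when the provider supplies no custom Migrator
	path     string
}

// itinerary selects a provider-specific itinerary when one is offered and
// otherwise falls back to the shared pipeline definition.
func (r *runner) itinerary() []string {
	if r.migrator != nil && r.migrator.SupportsPath(r.path) {
		if steps := r.migrator.Itinerary(r.path); len(steps) > 0 {
			return steps
		}
	}
	return sharedItinerary(r.path)
}

// executePhase defers to the provider implementation of a phase when the
// provider reports the phase as handled.
func (r *runner) executePhase(vmID, phase string) error {
	if r.migrator != nil && r.migrator.SupportsPath(r.path) {
		handled, err := r.migrator.RunPhase(vmID, phase)
		if handled || err != nil {
			return err
		}
	}
	return runSharedPhase(vmID, phase)
}

// Placeholders standing in for the existing shared pipeline logic.
func sharedItinerary(path string) []string    { return nil }
func runSharedPhase(vmID, phase string) error { return nil }
```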
Security, Risks, and Mitigations
Forklift will require new access to read and create VirtualMachineInstanceMigration instances
on the source and target clusters. Otherwise, the usual security risks apply for cluster-to-cluster migrations: Forklift
has significant access to secrets and other resources on both clusters, and we need to ensure
that the user deploying the migration plan has the appropriate rights in the source and target
namespaces.
Design Details
Test Plan
Unit tests will be written to ensure that the Migrator component logic
behaves correctly and that the migration runner defers to provider-specific
implementations correctly.
Integration tests need to be written to ensure that the KubeVirt live migration path
succeeds.
Upgrade / Downgrade Strategy
This enhancement requires an operator change to deploy a revised Plan CR and
updated controller image. Existing plans are compatible with the updated controller;
plans created using the new migration type field will appear to the old version of
the controller as though they were cold migrations. No special handling is required
to upgrade or downgrade since the changes are purely additive.