[DEPRECATED] Converging on desired load balancer state
UPDATE: Please see the Convergence Specification for a summary.
The `converge` function takes as inputs the state of the world from Nova and the load balancers. This is enough information to let `converge` know if any servers need to be created, deleted, or updated (fixing their names, metadata, or which load balancers they belong to).

As such, the parameter `load_balancers_with_cheese` contains a list of node details for each load balancer that is relevant to the scaling group being converged. This could be all the load balancers for a tenant, or it could just be an optimized list if there are load balancers we know don't matter (just as `servers_with_cheese` could be a list of all servers for the tenant, or just all servers belonging to a particular scaling group).
`converge` cares about all the load balancers so it can check that:
1. the load balancers specified in the launch config indeed still exist (if not, the group should be put into an error state, since we cannot scale up again)
2. all the servers are already on the right load balancers
3. deleted servers have been removed from the relevant load balancers
"Converging on the desired load balancer state" means ensuring (2) and (3).
First, some terminology...
The desired load balancer state for server `X` means a set of up to 5 load balancer configurations, each configuration containing:
- the ID of the load balancer that `X` should be added to (note that `X`'s ServiceNet IP is what gets added to the load balancer - the LB service does not know about servers as such)
- which port on `X` should be attached
- what weight `X` should be added with
- whether `X` should be moved to draining before being removed from the load balancer, and if so, the maximum amount of time autoscale should wait before forcibly removing it from the load balancer
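A minimal sketch of one such configuration as a data structure (field names are illustrative; this is not the actual otter model):

```python
from typing import NamedTuple, Optional


class LBConfig(NamedTuple):
    """One desired load balancer configuration for a server (illustrative)."""
    lb_id: int                                # load balancer to attach the
                                              # server's ServiceNet IP to
    port: int                                 # port on the server to attach
    weight: int = 1                           # weight the node is added with
    draining_timeout: Optional[float] = None  # max seconds to wait in draining
                                              # before forcible removal;
                                              # None means no draining


# A server's desired load balancer state is then up to 5 of these:
desired_lb_state = {LBConfig(lb_id=1234, port=80),
                    LBConfig(lb_id=5678, port=8080, weight=5,
                             draining_timeout=30.0)}
```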
The desired LB state, as defined above, specifies which load balancers the server should be added to. Does it mean those are the only load balancers the server should be added to?
Should the server be removed from all load balancers the tenant has that are not in the desired LB list?
I'm not sure, but it doesn't sound unreasonable. I'm generally convinced that either otter or the user is responsible for the box, but not both. This drastically reduces the valid state space. --lvh
This depends on whether we want to support only rolling updates (in the AWS/heat sense - replacing all servers that do not conform to the current configuration), or whether we also want to support non-rolling updates to maintain compatibility with the current behavior (more on this further down).
The desired load balancer state is always specified by the current launch config. For any particular server:
- If it is an old server, and out of date:
  - we create a new server that should be added to the new set of load balancers
  - we delete the old server (if it is already active, then we delete it later, after the new server is up)
- If the server configuration matches the current launch config's server args, we make sure it has been added to the current launch config's load balancers and removed from all other load balancers
No matter whether the server is outdated or consistent with the current launch config, we look at the desired load balancer state on a server by server basis:
- we can get this by storing the desired load balancer info in the metadata, and counting that as our source of truth for each server
- we can version launch configs, and correlate each server with the launch config version that generated it. The stored launch config is then the source of truth for each server. This would require us to pass to the `converge` function all launch configs that may still be relevant, though, and to keep track of launch config relevancy.
We ensure that each server is indeed attached to all the configured load balancers in its desired load balancer state.
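As an illustration of the first option above (the metadata key and JSON layout are made up for this sketch; they are not the keys otter actually writes):

```python
import json

# Hypothetical sketch: store the desired LB state in the server's Nova
# metadata at launch time, so that converge can treat the server itself as
# the source of truth.  Nova metadata values must be strings, hence the
# json.dumps round-trip.
desired_lbs = [{"loadBalancerId": 1234, "port": 80},
               {"loadBalancerId": 5678, "port": 8080, "weight": 5}]

server_metadata = {"rax:autoscale:lb:desired": json.dumps(desired_lbs)}

# During convergence, read it back and compare against the nodes the CLB
# service actually reports for this server's ServiceNet IP.
stored_desired_lbs = json.loads(server_metadata["rax:autoscale:lb:desired"])
```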
The steps for converging on load balancer state, given the state of the world, are:
- `converge` builds, for each server, a mapping from the IDs of the load balancers it is currently on to the node details (how that server was added to the load balancer, what its port is, etc.)
- given these mappings and each server's desired load balancer state (more on this later), it builds a series of load balancer `IStep`s to converge on, for each server
To clarify, the mapping looks somewhat like this?
{$server_id: {$load_balancer_ids: {"port": port, "how_added": "???"}}}? --lvh
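A sketch of that per-server mapping from the first step (key names and node fields are illustrative, mirroring roughly what a CLB node listing returns):

```python
# Illustrative shape of the mapping converge builds: for each server, the
# load balancers its ServiceNet IP is currently on, keyed by load balancer
# ID, with the node details needed to compare against the desired state.
current_lb_nodes = {
    "server-uuid-1": {
        1234: {"node_id": 111, "address": "10.0.0.5", "port": 80,
               "condition": "ENABLED", "weight": 1},
        5678: {"node_id": 222, "address": "10.0.0.5", "port": 8080,
               "condition": "ENABLED", "weight": 5},
    },
    "server-uuid-2": {
        1234: {"node_id": 333, "address": "10.0.0.6", "port": 80,
               "condition": "ENABLED", "weight": 1},
    },
}
```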
Note that the CLB service will return a 422 for some number of seconds after an update. This makes updating many nodes on a single load balancer tedious and slow.
Rather than building a single `IStep` for each node change, converge can group all the adds and deletes for a particular load balancer and then issue a single add and/or delete call to each load balancer (see this previous discussion). For instance, instead of a separate request for each of:
- Add server 1's IP to load balancer 1
- Add server 2's IP to load balancer 1
- Delete server 1's IP from load balancer 3
- Delete server 3's IP from load balancer 3
We can do the following 2 requests:
- Add server 1 and server 2's IPs to load balancer 1
- Delete server 1 and server 3's IPs from load balancer 3
Sounds great :-) Wouldn't we be able to do that as an "optimizing compiler" step that takes the `IStep`s and turns them into these more intelligent requests? I guess that does imply a function `to_request`, and not just a method `IStep.as_request`, but that doesn't sound like a problem :-) That way we can write the stupid version now and the smart version later. --lvh
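A minimal sketch of such an "optimizing compiler" pass (the step classes and the bulk-request tuples are hypothetical stand-ins, not otter's actual `IStep` implementations):

```python
from collections import defaultdict
from typing import NamedTuple


class AddNode(NamedTuple):
    """Hypothetical step: add one (address, port) node to a load balancer."""
    lb_id: int
    address: str
    port: int


class RemoveNode(NamedTuple):
    """Hypothetical step: remove one node from a load balancer."""
    lb_id: int
    node_id: int


def optimize_lb_steps(steps):
    """Group per-node steps into one bulk add and one bulk delete per LB."""
    adds = defaultdict(list)
    removes = defaultdict(list)
    other = []
    for step in steps:
        if isinstance(step, AddNode):
            adds[step.lb_id].append((step.address, step.port))
        elif isinstance(step, RemoveNode):
            removes[step.lb_id].append(step.node_id)
        else:
            other.append(step)
    bulk_adds = [("bulk-add", lb_id, nodes) for lb_id, nodes in adds.items()]
    bulk_removes = [("bulk-remove", lb_id, node_ids)
                    for lb_id, node_ids in removes.items()]
    return other + bulk_adds + bulk_removes
```

With this, the four single-node requests in the example above collapse into one bulk add for load balancer 1 and one bulk delete for load balancer 3.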
Similarly to the 'is the desired load balancer state exclusive' question:
- Do we only remove the server from the load balancers in its desired state?
- Do we remove the server from all load balancers on the tenant?
Both should be doable if `servers_with_cheese` contains the deleted servers obtained by querying Nova's server details with `changes-since`. That listing includes the IP address the server had as well as its metadata.
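A rough sketch of turning such a listing into removal work (the response fields follow Nova's list-servers-detail format; the function itself is hypothetical):

```python
def removals_for_deleted_servers(servers_detail, lb_nodes_by_lb):
    """
    Hypothetical helper: for every server the changes-since listing reports
    as DELETED, find load balancer nodes that still point at the ServiceNet
    IP it used to have, and emit (lb_id, node_id) pairs to remove.
    `lb_nodes_by_lb` is {lb_id: [node details]} as queried from CLB.
    """
    deleted_ips = {addr["addr"]
                   for server in servers_detail
                   if server.get("status") == "DELETED"
                   for addr in server.get("addresses", {}).get("private", [])}
    return [(lb_id, node["node_id"])
            for lb_id, nodes in lb_nodes_by_lb.items()
            for node in nodes
            if node["address"] in deleted_ips]
```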
The argument for removing the deleted server from all load balancers is that the server is gone, and its IP is no longer valid.
@manishtomar brought up a good point: we have to remove the IP from the load balancers in a timely manner after the server is deleted, else the IP may be reused (possibly after 4 hours or so) and there is a remote possibility that some other server with that same IP ends up attached to a load balancer it doesn't belong on.
That is indeed a good point. Maybe the idea that otter is either entirely responsible for some machines and their LBs or not at all could help here, as well. I.e. if we assume that the converger knows the total desired state of the world (all LBs it manages, all machines that should be attached to those LBs), it should also be able to remove stale IPs. Since convergence happens Often(TM), we could leverage that to fix it even if CLB doesn't always listen to our request, or a worker fails, or something like that. --lvh
This section describes the current behavior of autoscale with respect to preserving old/out-of-date servers.
In the current autoscaling system, if the launch config is updated, only new servers launched after the change will be consistent with the new launch config. We do not delete the old ones unless there is a scale down event. There is no automatic replacement of old servers.
We want to make sure that each server consistently reflects some full launch config though. For example, if we have launch config A:
- server configuration A
- load balancers: 1, 2
And as it's scaling up, the user changes the launch config to B:
- server configuration B
- load balancers: 3, 4
We never want a server of configuration A to be attached to load balancers 3 and 4.
This is currently implemented by propagating the launch config in memory all the way through a launch server job.
The current behavior may have been a misfeature, but it's possible that there are people using it to do green/blue deployments. For example:
- For the first deploy, specify launch config 1, scale up by 5 servers.
- For the second deploy, specify launch config 2, scale up by 5 servers.
- For the third deploy, specify launch config 3, scale down by 5 servers (thus deleting those created by the first deploy) and then scale up by 5 servers.
Before changing to rolling updates only, we'd like to see if there are any users utilizing the system in this way.
At t=0, a scaling group has capacity 5, load balancers `{A, B}`, and server args `v1`.

At t=1, the user configures new load balancers `{C, D}` and server args `v2`.
There are two possible approaches:
- "Evict": the machines are deleted and removed from
{A, B}
. 5v2
machines are provisioned. Once they are up, they are attached to load balancers{C,D}
. - "Transplant": the machines are moved from
{A,B}
to{C,D}
. A rolling upgrade is performed to change thev1
machines tov2
.
This question only arises when changing both server args and load balancers at the same time. Everyone agrees that if only the load balancers change, you simply move the servers; if only the server args change, you do a regular converge (rolling or non-rolling).
The downside of "evict" behavior is that there's a (longer) gap where there are no machines to answer requests.
The downside of "transplant" behavior is that you get a transient state that the user didn't ask for, i.e. a v1
machine on the new load balancers. (This state is transient because the converger would still be performing a rolling update.)
The user can get "transplant" behavior if the default is "evict". The user first switches the load balancers from {A, B}
to {C, D}
. The converger moves the servers. Once the servers are moved, the user switches the server args. The converger performs a rolling update from v1
to v2
.
The user can get "evict" behavior if the default is "transplant". The user first sets the desired capacity to 0, waits until all of the machines are gone, and then changes both server args and load balancers at the same time. Another, perhaps even easier way, is to simply create a new scaling group and axe the old one.
Neither of these options is great UX, particularly because they involve waiting until something reasonably internal has happened; but we can fix that with an API call if need be. The only question is what the default behavior should be.
It is probably also worth noting that this is really an edge case, and it probably isn't worth bending over backwards for.
I have a personal preference for "transplant". I think it would be easier to implement, because it eliminates an edge case. When you configure new load balancers in "transplant", you always move machines to the current load balancer, regardless of what happens to the server args. After that, you just do a regular converge for the machines on that server args, as if the load balancer change never happened. When you're doing "evict", you change that behavior depending on whether or not this is a new server args, so that's an extra code path. --lvh