Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(maint) Compare Environment#name in compiler #9136

Merged
merged 1 commit into from
Nov 15, 2023

Conversation

justinstoller
Copy link
Member

During catalog compilation, after loading the requested environment we load the node and server specified environment. Previously, We would then compare the Environment objects to determine if the requested environment matched the server specified environment. However, if the environment cache is flushed between those two environments being loaded they could be different object instances representing the same desired environment.

To resolve this race condition we now check the environment names (Environment#name canoncalizes to symbols to avoid symbol/string mismatch).

@justinstoller justinstoller requested a review from a team as a code owner October 26, 2023 15:55
@justinstoller justinstoller added the blocked PRs blocked on work external to the PR itself label Oct 26, 2023
@justinstoller
Copy link
Member Author

This is an attempt to resolve support request PE-36986 , a quick review would be appreciated but I want to wait until we hear back from the customer before merging.

@joshcooper
Copy link
Contributor

However, if the environment cache is flushed between those two environments being loaded they could be different object instances representing the same desired environment.

In theory, the Puppet::Node::Environment class overrides ==, eql? and hash

def ==(other)
return true if other.kind_of?(Puppet::Node::Environment) &&
self.name == other.name &&
self.full_modulepath == other.full_modulepath &&
self.manifest == other.manifest
end
alias eql? ==
def hash
[self.class, name, full_modulepath, manifest].hash
end

So it should be possible to check if two environment objects have the same value. But the check depends on the manifest and full_modulepath. When the env cache is flushed and the environment is reloaded, does it have a different manifest or modulepath due to versioned code symlinks?

That said, I think comparing environment names is correct, since all we care about is whether the agent's requested environment matches the server-specified environment (based on classification rules).

Tangent 1: It seems strange that we're using kind_of? instead of instance_of? as the former means you can compare remote environment with a non-remote one.

irb(main):016:0> a = Puppet::Node::Environment.create(:production, [])
=> <Puppet::Node::Environment:442580 @name="production" @manifest="no_manifest" @modulepath="" >
irb(main):017:0> b = Puppet::Node::Environment.remote(:production)
=> <Puppet::Node::Environment::Remote:458740 @name="production" @manifest="no_manifest" @modulepath="" >
irb(main):018:0> a == b
=> true

We're probably relying on that in various places, but it seems like a footgun.

Tangent 2: In theory, cached environment names should always be symbols, but it wasn't always the case https://puppet.atlassian.net/browse/PUP-10955.

@justinstoller
Copy link
Member Author

justinstoller commented Oct 26, 2023

@joshcooper great callout about overriding ==, and when using versioned dirs, yes the manifest/module paths will be different. I assume we should not alter that implementation and in this case explicitly compare names. Do you think so? If so, I'll add some of your findings to the comments.

@joshcooper
Copy link
Contributor

I assume we should not alter that implementation and in this case explicitly compare names

Yeah I'm pretty something is relying on being able to compare real Environment and fake Environment::Remote objects, probably in the configurer. So I'm 👍 to comparing names, since in this case we only care if the agent requested a different environment name than the server's node terminus said to use for compilation.

@justinstoller
Copy link
Member Author

Heard back from @vchepkov and it looks like this is working so far. Even if it does turn out not to be the solution to his problem I think it is still a legit bug that should be fixed and doesn't appear to cause regressions. I've updated the comment and commit message based on Josh's comments. I think this is ready to go now.

@justinstoller justinstoller removed the blocked PRs blocked on work external to the PR itself label Oct 31, 2023
@joshcooper
Copy link
Contributor

LGTM, could you add/update a test around here so that different Environment objects (with the same name) are set on the node and request? I think it's most realistic to set a remote environment (created via Puppet::Node::Environment.remote(name)) on the request.

@joshcooper joshcooper added triaged Jira issue has been created for this and removed triaged Jira issue has been created for this labels Oct 31, 2023
During catalog compilation, after loading the requested environment we
load the node and server specified environment. Previously, We would
then compare the Environment objects to determine if the requested
environment matched the server specified environment. However, if the
environment cache is flushed between those two environments being loaded
they could be different object instances representing the same desired
environment. Environment objects define a `==` method but require the
name and modulepath to be the same. If we are running with versioned
environment dirs those modulepaths may be different (ie different
versions of the same environment).

To resolve this race condition we now check the environment names
(Environment#name canoncalizes to symbols to avoid symbol/string
mismatch). The catalog already uses the server specified environment
which, if we are running with versioned environment dirs and have
different versions of the same environment, will be the most up to date
version of the environment.
@justinstoller
Copy link
Member Author

justinstoller commented Oct 31, 2023

Thanks, Josh! I've added a test now. (fwiw, I've also changed the comparison back to checking instances and saw the test fail)

@vchepkov
Copy link
Contributor

vchepkov commented Nov 3, 2023

FYI, running this patch since October, 26th, issues with empty catalog hasn't reoccured since

@justinstoller
Copy link
Member Author

I think this is waiting on an approval from @puppetlabs/phoenix , let me know if there's something else I can or should do!

@joshcooper joshcooper merged commit f739469 into puppetlabs:7.x Nov 15, 2023
12 checks passed
@joshcooper joshcooper added the bug Something isn't working label Nov 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants