Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Squashed commit of adding support for cli plugin validation reporting. #5665

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

edwbuck
Copy link
Contributor

@edwbuck edwbuck commented Nov 26, 2024

Closes #5626

Signed-off-by: Edwin Buck <edwbuck@gmail.com>
Signed-off-by: Edwin Buck <edwbuck@gmail.com>
@amartinezfayo amartinezfayo self-assigned this Nov 26, 2024
@amartinezfayo amartinezfayo added this to the 1.11.1 milestone Nov 26, 2024
This unit test crashes the server with an intentinally bad config.
The problem was, it crashed in the wrong location.

Signed-off-by: Edwin Buck <edwbuck@gmail.com>
Fix import order on fakedatastore.go
Comparison on boolean only, not boolean == true on server.go
remove whitespace in "// nolint" for listener.go, ca.go, which
was just added due to baseline merge conflict.

Signed-off-by: Edwin Buck <edwbuck@gmail.com>
Signed-off-by: Edwin Buck <edwbuck@gmail.com>
Signed-off-by: Edwin Buck <edwbuck@gmail.com>
@edwbuck
Copy link
Contributor Author

edwbuck commented Nov 27, 2024

@amartinezfayo All checks are passing. If you get the review in, I'll make an effort to address it prior to the holidays.

Signed-off-by: Edwin Buck <edwbuck@gmail.com>
Copy link
Member

@amartinezfayo amartinezfayo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @edwbuck for this contribution and all the effort done to have validation of plugin configurations implemented in SPIRE.
My main concern is around the execution of the Server.run function as part of the validate command. I think that this is problematic. I've added a comment about that.

I noticed that the validate command is not implemented in the agent. Are you planning to add that separately?

I also noticed that validation of plugin configurations is not exercised in unit tests. We should add tests for that.

pkg/common/catalog/catalog.go Outdated Show resolved Hide resolved
pkg/common/catalog/catalog.go Outdated Show resolved Hide resolved
pkg/common/catalog/catalog.go Outdated Show resolved Hide resolved
pkg/common/catalog/catalog.go Outdated Show resolved Hide resolved
pkg/common/catalog/plugin.go Outdated Show resolved Hide resolved
pkg/common/catalog/catalog.go Outdated Show resolved Hide resolved
ctx = context.Background()
}

err = s.Run(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runs a new server instance. I think that this approach is problematic in multiple ways. It requires to maintain the ValidateOnly setting in the config, and have different flows of the run function depending of its value. Since we don't want to run a server instance, executing this function doesn't seem to be appropriate when running the validate command.
This requires that when the run function is updated, we need to remember that it's also called as part of the validate command, and consider if the new code can be executed as part of the validation or not.

I believe that as is in this PR, running the validate command will connect to the backend database, run migrations, create the data directory if needed and start the runtime collector for metrics. It also has the side effect of output the logs as if the server is running, which is a surprising behavior as part of a validate command, and makes the parsing of the output more difficult.
Although it could be changed to not do those things, I think that is an example of how problematic this can be if we miss to update the run function with the appropriate ValidateOnly condition. Running a non intended database migration can be particularly problematic.

I think that a better approach would be to not involve the run function as part of the execution of a validate command.
The validate command would just 1) Load the server/agent config as is it currently does (which validates the server/agent config) and 2) Go through each configured plugin and call the plugin's Validate function.

Copy link
Contributor Author

@edwbuck edwbuck Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't want to run a server instance

I think there is a disconnect here. The entire plan since the beginning was to run a limited server instance that could load the plugins, and then run the plugin Validate() funcs, as detailed over the last six months of development. This was mentioned in multiple contributor's meetings, and never met a hint of "we don't want to do this". The exact words were "we will launch a new server instance in such a way it doesn't interfere with the running server instance, it will load the plugins, validate the configuration, and then shut down."

The approach you suggested was the initial approach, and it didn't work. In a call with @azdagron on October 29th, the reason it didn't work was explained to me:

  1. The server config cannot launch the plugins without exposing various server services that the plugins will attempt to connect to prior to the ability to call Validate(). These included a datastore connection, a metrics connection, and a logging connection.
  2. The Plugin launching isn't partitioned in such a way that common/catalog can be used without the same data in server/catalog.
  3. The server/catalog depends on the various plugin setups that are specific to the server.
  4. Those setups include the server having initialized the support services the plugins may require. Passing nil into these support service setups doesn't work.

Andrew directed the solution to include a "ValidateOnly" flag that previously didn't exist, which is why the submission took a couple of extra weeks to refactor and debug. While the solution seems less elegant than other approaches, it actually resulted in less code (meaning less maintenance footprint).

The ValidateOnly returns from Run() to shutdown the server as soon as the plugins validate, before any of the processing of data is performed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed response. I've analyzed this and I still think that we can and should avoid executing Server.run as part of the validate command.
I believe that we can achieve this by having a separate function for validation (one for the server, one for the agent) that is similar to the catalog.Load function but it calls the Validate function instead of Configure. We should also be able to use non-functional loggers and metrics, avoiding to output logs. Validation of the datastore plugin configuration should also be possible, I think.

I've done a quick POC of how this would work, it can be found here: main...amartinezfayo:validate-server-cli
It can validate all configured regular plugins and the sql datastore plugin, without the need of opening database connections, creating the data directory or going through migrations. It should be updated to leverage all the notes from the Validate response and not only the error.

@@ -854,6 +858,10 @@ func (ds *Plugin) Configure(_ context.Context, hclConfiguration string) error {
return ds.openConnections(config)
}

func (ds *Plugin) Validate(_ context.Context, _ catalog.CoreConfig, hclConfiguration string) (*catalog.ValidateResult, error) {
return nil, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we decode the configuration and perform the validation based on the result of configuration.Validate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but it should go into a future Issue. The entire SQLStore plugins aren't 100% conformant with the "real" plugins, and I did some work to make them more conformant. However, going through each one and actually implementing a proper handling of "configure / validate" is a outside the scope of this effort.

cmd/spire-server/cli/validate/validate.go Show resolved Hide resolved
pkg/common/catalog/configure.go Outdated Show resolved Hide resolved
Signed-off-by: Edwin Buck <edwbuck@gmail.com>
@edwbuck
Copy link
Contributor Author

edwbuck commented Dec 3, 2024

Thank you @edwbuck for this contribution and all the effort done to have validation of plugin configurations implemented in SPIRE.
My main concern is around the execution of the Server.run function as part of the validate command. I think that this is problematic. I've added a comment about that.

That's because the prior attempts to add the functionality as Server.validate function(s) led to not enough of the server coming up to properly load the plugins. The intent was always to "launch a server instance with just enough to validate the plugins and shutdown". The solution here uses the ValidateOnly flag (suggested by maintainer @azdagron ) to shut the server down prior to it doing any processing. Alas, plugins can't be loaded into the server correctly unless the server has a datastore, a logging proxy, and a metrics server proxy.

Prior attempts along the path of creating new Server.validate function(s) also led to a large amount of code duplication, and while I don't enjoy the "ValidateOnly" flag, the code is less likely to require duplicate changes, increasing maintenance.

If we can identify other areas where we want to avoid staring items, we can always check it against ValidateOnly. My analysis of the system is not 100% complete, I didn't see any unnecessary systems launch (although I would happily accept suggestions for squelching the logging).

I noticed that the validate command is not implemented in the agent. Are you planning to add that separately?

The work on the server took a lot of time and effort. I would like to implement this for the Agent, but agent plugin verification is not as critical as server plugin verification. I've filed #5677 and will work on it as time permits.

I also noticed that validation of plugin configurations is not exercised in unit tests. We should add tests for that.

The entire validation commands in the cli are unlike any other cli command. They borrow heavily from the run command, and much of the logic is already covered in unit test under the run commands' unit tests. That's why we cannot use our typical unit testing of cli commands for the validate commands (and probably why nobody valued fixing it enough to make it work).

If we want to cut this sharing and refactor it into making a common, shared cli config / config file loading, then it will be significantly easier to structure the testing appropriately. Again, my goal was not to redo the cli commands as they exist, I'm trying to keep the commit minimal. I'm open to opening a new Issue / PR to fix this, but I would like to keep it outside of the scope of this effort (getting the plugins to report their validation notes).

@edwbuck edwbuck force-pushed the squash-validation-cli branch from 152d99b to 7064969 Compare December 3, 2024 18:20
Signed-off-by: Edwin Buck <edwbuck@gmail.com>
Tests include:

1. One minimal config file that should succeed.
2. One minimal config file that has an error in the disk keymanager.

Signed-off-by: Edwin Buck <edwbuck@gmail.com>
The logging output is redirected to /dev/null, so the actual
validation output can more easily be read.

Signed-off-by: Edwin Buck <edwbuck@gmail.com>
@edwbuck
Copy link
Contributor Author

edwbuck commented Dec 4, 2024

@amartinezfayo Thank you for you attentiveness and direction on this PR. I'm attempting to get the requested changes in as early as possible, to give you additional time to review them.

I'm tracking the remaining items here:

  • Remove logging output as it adds noise to the validation output.
  • Make all Validation plugin funcs return without error, using only the Valid flag to show if the plugin configuration is valid / invalid.
  • Ensure that the sql datastore is launched without the ability to perform database migrations
  • Restructure the call patterns such that the ValidateOnly flag doesn't exist.

The suggestion @amartinezfayo made to redirect logs to /dev/null has been followed and pushed; however, /dev/null is a distinctly POSIX construct, and I need the equivalent for a Windows system, so we don't release the validate command to potentially attempt to write to /dev/null on a Windows platform.

The last one "Restructure call patterns" will remove the ValidateOnly flag, but I additionally added fields to the same configure struct to capture the output of validation. With these additional fields, are they also to be removed?

This permits getting the entire output of the notes, and aligns
with the desired output in the code review.

As we don't know which plugin note was the failure, an additional
note is added when the plugin's configuration is invalid.

Signed-off-by: Edwin Buck <edwbuck@gmail.com>
@amartinezfayo
Copy link
Member

however, /dev/null is a distinctly POSIX construct, and I need the equivalent for a Windows system, so we don't release the validate command to potentially attempt to write to /dev/null on a Windows platform.

We should use the os.DevNull constant, that will be replaced by "/dev/null" on Unix-like systems and "NUL" in Windows, which is the way to refer a null device.

The last one "Restructure call patterns" will remove the ValidateOnly flag, but I additionally added fields to the same configure struct to capture the output of validation. With these additional fields, are they also to be removed?

Yes, I don't think that we need those additional fields and I think that they should also be removed. The output of the validation should be returned by the function called by the command (using the configv1.ValidateResponse type), without having to add additional fields in the server configuration. The main...amartinezfayo:validate-server-cli branch is updated to reflect that.

@amartinezfayo amartinezfayo removed this from the 1.11.1 milestone Dec 5, 2024
@amartinezfayo amartinezfayo added this to the 1.11.2 milestone Dec 5, 2024
Signed-off-by: Edwin Buck <edwbuck@gmail.com>
@rturner3 rturner3 modified the milestones: 1.11.2, 1.12.0 Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants