Agent operations framework adr #64

CDimonaco · 2025-01-27T10:24:37Z

This PR includes the adr for operations framework on the agent side.

nelsonkopliku

Nice, thanks! Just dropping a couple of questions for me to better understand.

nelsonkopliku · 2025-01-27T13:59:01Z

adr/0017-agent-operations-framework.md

+Following the new epics and product decisions, we need to build a framework capable of executing write operations on SAP machines.
+To achieve this, we have decided to use the agent, as it runs on the machines with root privileges, has all the necessary permissions to execute commands, and includes infrastructure to consume messages from other components of Trento.
+
+Unlike fact-gathering operations, we have chosen to delegate this new development to a dedicated library rather than embedding the code directly into the agent's codebase. This approach ensures better separation of concerns and allows the release of additional tools supporting operations without affecting the agent's codebase.


This approach ensures better separation of concerns and allows the release of additional tools supporting operations without affecting the agent's codebase.

The agent would still need to be rebuilt in order to get new features from an updated version of the library, correct?

Exactly is a golang library, it's like the gatherers of the checks, they are built into the code (Except there are no plugins for operators)

Yes, correct.

By the way, this part, more than the Context chapter is a technical decision, so I would write it in the Decision chapter as-is.

nelsonkopliku · 2025-01-27T14:00:56Z

adr/0017-agent-operations-framework.md

+The requirements for this new development are very specific:
+
+- Operations must be atomic and include rollback capabilities.
+- Operations must accept arguments to enable actions in different contexts without requiring dedicated code for each case.


question: I believe I am missing some details here

Operations must accept arguments to enable actions in different contexts without requiring dedicated code for each case

Could you help out with an example or point to some extra information to properly understand, please?

Take the saptuneapplynote operator, it should not be hardcoded on a single note but accept the note as argument so it can be used with different notes on different machines (It's already like that)

The first operation we are working on is saptune. Applying saptune solutions specifically.
Different hosts need different solutions. We have HANA, NETWEAVER, S4HANA, HANA=S4HANA, etc.
So we are going to send this value as argument. This way, the operation will do a slightly different thing based on the argument.

nelsonkopliku · 2025-01-27T14:06:52Z

adr/0017-agent-operations-framework.md

+The Registry manages all available operators.
+Each operator has a version. By default, the latest version is fetched if no specific version is requested.
+
+The operator naming convention is: `<operatorname>@<version>`


question: is the operatorname just the operationID mentioned after, or are they two different things?

Yes we can say that the operatorname is like the name of the gatherer the operationID is like the execution_id of a check execution a unique identifier for an operation

different things. The operatorname is a unique value that identifies the operator.
The operationID is the ID of the current operation execution so to say

arbulu89

Hey @CDimonaco
Great job.
I have noted some nitpick things.
Besides that, I miss what the operator returns as information.
I understand that it returns the diff and some sort of result: if the operation went OK.
I remember that at some point we talked about returning the stage where the operator stopped.
At the end, wanda needs to know if the operation in this agent"

Updated something
Didn't update
Rolled back
Failed (so the user needs to check)

arbulu89 · 2025-01-27T15:26:09Z

adr/0017-agent-operations-framework.md

+Following the new epics and product decisions, we need to build a framework capable of executing write operations on SAP machines.
+To achieve this, we have decided to use the agent, as it runs on the machines with root privileges, has all the necessary permissions to execute commands, and includes infrastructure to consume messages from other components of Trento.
+
+Unlike fact-gathering operations, we have chosen to delegate this new development to a dedicated library rather than embedding the code directly into the agent's codebase. This approach ensures better separation of concerns and allows the release of additional tools supporting operations without affecting the agent's codebase.


Yes, correct.

By the way, this part, more than the Context chapter is a technical decision, so I would write it in the Decision chapter as-is.

arbulu89 · 2025-01-27T15:29:04Z

adr/0017-agent-operations-framework.md

+The requirements for this new development are very specific:
+
+- Operations must be atomic and include rollback capabilities.
+- Operations must accept arguments to enable actions in different contexts without requiring dedicated code for each case.


The first operation we are working on is saptune. Applying saptune solutions specifically.
Different hosts need different solutions. We have HANA, NETWEAVER, S4HANA, HANA=S4HANA, etc.
So we are going to send this value as argument. This way, the operation will do a slightly different thing based on the argument.

arbulu89 · 2025-01-27T15:39:33Z

adr/0017-agent-operations-framework.md

+
+
+These methods are invoked by the Executor, which wraps the operator. All operators are exposed through a constructor function, returning operators already wrapped in an Executor.
+Executor


question: Should this Executor go as header?

arbulu89 · 2025-01-27T15:40:25Z

adr/0017-agent-operations-framework.md

+These methods are invoked by the Executor, which wraps the operator. All operators are exposed through a constructor function, returning operators already wrapped in an Executor.
+Executor
+The Executor is a wrapper around an operator that manages operations transactionally.
+For library users, the Executor is transparent—operators are already wrapped within an Executor.


typo: i think this transparent-operators is some sort of typo

arbulu89 · 2025-01-27T15:41:26Z

adr/0017-agent-operations-framework.md

+The Registry manages all available operators.
+Each operator has a version. By default, the latest version is fetched if no specific version is requested.
+
+The operator naming convention is: `<operatorname>@<version>`


different things. The operatorname is a unique value that identifies the operator.
The operationID is the ID of the current operation execution so to say

arbulu89 · 2025-01-27T15:42:32Z

adr/0017-agent-operations-framework.md

+
+The library handling operations is named [workbench](https://github.com/trento-project/workbench).
+
+## Glossary


Honestly, I don't really like this Glossary chapter. It is rather confusing and I think the subsequent next chapters perfectly explain all the details.
I don't think we need such a summary.
I would personally remove it

CDimonaco · 2025-01-27T15:47:21Z

Hey @CDimonaco Great job. I have noted some nitpick things. Besides that, I miss what the operator returns as information. I understand that it returns the diff and some sort of result: if the operation went OK. I remember that at some point we talked about returning the stage where the operator stopped. At the end, wanda needs to know if the operation in this agent"
1. Updated something

2. Didn't update

3. Rolled back

4. Failed (so the user needs to check)

Yep we have the failed step I forgot to mention in the doc, thx!

CDimonaco · 2025-01-28T10:37:53Z

@arbulu89 I addressed the feedbacks, wdyt?

arbulu89

Perfect @CDimonaco
Super clear to me.
I just recommended 2 additional changes. Specially the operator/operation distinction. I don't like it, as it mixes things. Let's simply stick to the operator term here

arbulu89 · 2025-01-28T12:44:22Z

adr/0017-agent-operations-framework.md

+
+### Example:
+
+- **Operator**: `saptuneapplysolution` - Applies a SAP Tune solution using the solution name as an argument.


I don't really like this example. I think it is more confusing than helping, as both the operator and operation as mixed, and in essence both things are the same. I would have this as:

Find the next operator as example: `saptuneapplysolution` is an operator that applies saptune solutions. It will apply the solution given as argument. If a solution is already applied, no action is taken. In case of an error, revert the solution specifies as an argument to return to a state where no solutions are applied.

nitpick: Applies a saptune solution, it is the tool name. Let's stick to it.
Nobody knows it as SAP Tune hehe
The same for the next line

arbulu89 · 2025-01-28T12:51:37Z

adr/0017-agent-operations-framework.md

+
+If rollback fails, an error is returned without further action.
+
+Each operator implements these phases by satisfying the `phaser` interface:


I would put this Each operator implements... phrase and the code snippet in the main Operator chapter, after listing the 4 phases. Otherwise it looks like it belongs to the rollback sub chapter

balanza

Great job guys. That's a great feature to be added to the project.

As a little feedback, I struggle a bit with naming: operations, operator, write operations, operationId, operator name... just to name a few. Furthermore, such terms are common in the daily jargon and they might be easily misused and misunderstood.
I'd rather have a glossary for such terms to avoid ambiguities, or to choose less-ambigue names.

balanza · 2025-01-28T12:01:31Z

adr/0017-agent-operations-framework.md

+- Operations must be transactional, including distinct steps for prerequisites verification, commit, rollback, and validation of changes applied during the commit phase.
+- Operations must be versioned, with backward compatibility ensured for previous versions in the event of updates.
+- Operations must be idempotent.


By Operations we mean the single action performed by an agent instance, the workflow step that does an action on every agent instance, the entire workflow, something else?

This is true. I would rename this to Operator.
At the end, the agent is all about operators. I would try to avoid any Operation wording.
At the end, and operation is a set of steps. And each step requests a operator execution in the agents.
The agents, by know doesn't have the knowledge of the Operation as a whole. It is only responsible of knowing and executing this small pieces of code now called Operators

balanza · 2025-01-28T13:44:13Z

adr/0017-agent-operations-framework.md

+The operator follows a transactional workflow, which includes the following distinct phases:
+
+- **PLAN**
+- **COMMIT**
+- **VERIFY**
+- **ROLLBACK**


This applies to a single write operation, right? If an operator executes multiple write operations, will they be in the same transaction?

Also, so far we are not considering distributed transactions across multiple operators in the same operations, correct?

This applies to a single write operation, right? If an operator executes multiple write operations, will they be in the same transaction?

I guess one operator can do more than one "write" operation in the same transaction, so we can rollback the whole process. The example is the corosync file change and reload. It is the same like re-configuring nginx (easier example). You change the nginx configuration file and restart the daemon. If the restart fails, you restore the initial config file and restart. I guess we could include all this in the same operator. Otherwise things start getting complex. We didn't have yet the chance to implement such a thing though, so this can change.

Also, so far we are not considering distributed transactions across multiple operators in the same operations, correct?

damn, now I feel you pair with the terminology XD
The operator shouldn't never know that there are other operator executions in the same agent or operation. That's the meaning of "atomic" I guess. The operator only knows to do one single thing deterministically.
That's the difficulty on implementing complex multi-step operations and their rollbacks. That rollbacking operations with multiple steps is difficult. That's why we are postponing this implementation and design

CDimonaco added the documentation Improvements or additions to documentation label Jan 27, 2025

CDimonaco requested review from balanza and arbulu89 January 27, 2025 10:24

CDimonaco self-assigned this Jan 27, 2025

nelsonkopliku reviewed Jan 27, 2025

View reviewed changes

arbulu89 reviewed Jan 27, 2025

View reviewed changes

Agent operations framework adr

205dda5

CDimonaco force-pushed the agent_operations_framework_adr branch from 8277adc to 205dda5 Compare January 28, 2025 10:37

arbulu89 approved these changes Jan 28, 2025

View reviewed changes

balanza reviewed Jan 28, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agent operations framework adr #64

Agent operations framework adr #64

CDimonaco commented Jan 27, 2025

nelsonkopliku left a comment

nelsonkopliku Jan 27, 2025

CDimonaco Jan 27, 2025

arbulu89 Jan 27, 2025

nelsonkopliku Jan 27, 2025

CDimonaco Jan 27, 2025

arbulu89 Jan 27, 2025

nelsonkopliku Jan 27, 2025

CDimonaco Jan 27, 2025

arbulu89 Jan 27, 2025

arbulu89 left a comment

arbulu89 Jan 27, 2025

arbulu89 Jan 27, 2025

arbulu89 Jan 27, 2025

CDimonaco Jan 27, 2025

arbulu89 Jan 27, 2025

arbulu89 Jan 27, 2025

arbulu89 Jan 27, 2025

CDimonaco commented Jan 27, 2025

CDimonaco commented Jan 28, 2025

arbulu89 left a comment

arbulu89 Jan 28, 2025

arbulu89 Jan 28, 2025

balanza left a comment

balanza Jan 28, 2025

arbulu89 Jan 28, 2025

balanza Jan 28, 2025

arbulu89 Jan 28, 2025



		These methods are invoked by the Executor, which wraps the operator. All operators are exposed through a constructor function, returning operators already wrapped in an Executor.
		Executor


		The library handling operations is named [workbench](https://github.com/trento-project/workbench).

		## Glossary


		### Example:

		- Operator: `saptuneapplysolution` - Applies a SAP Tune solution using the solution name as an argument.


		If rollback fails, an error is returned without further action.

		Each operator implements these phases by satisfying the `phaser` interface:

Agent operations framework adr #64

Are you sure you want to change the base?

Agent operations framework adr #64

Conversation

CDimonaco commented Jan 27, 2025

nelsonkopliku left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arbulu89 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CDimonaco commented Jan 27, 2025

CDimonaco commented Jan 28, 2025

arbulu89 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

balanza left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment