Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi Instance Plugins #1324

Closed
wants to merge 13 commits into from
Closed

Conversation

narategithub
Copy link
Collaborator

This was a copy-paste typo. It should be
LDMS_STREAM_EVENT_UNSUBSCRIBE_STATUS, not
LDMS_STREAM_EVENT_SUBSCRIBE_STATUS.
This patch add multi-instance plugin infrastructure in ldmsd.
- Add plugin instance.
- Update format for consistency.
NOTE: This passed `ldms-test/multi_store_sos_test`, which tests both
      multi-instance loading and traditional loading.
NOTE: This passed `ldms-test/multi_json_stream_sampler_test` which tests
      both multi-instance loading and traditional loading. It also tests
      terminating the plugin instance.
When ldmsd daemonized (fork + exec), `zap_init()` function is called
again from the pthread atfork call chain. Because the logger has already
been registered before daemonizing, `ovis_log_register()` failed to
register the same logger after ldmsd was daemonized. This patch adds a
checking condition to not register the logger if we alread have the log
handle.
This passed `ldms-test/multi_store_csv_test`.
This passed `ldms-test/multi_store_avro_kafka_test`.
- Make procnetdev2 multiple instances
- Remove the 32 device list limitation
- Add `excludes` option to exclude the listed interfaces
  - For example, excludes=lo,eth0

NOTE: passed 'ldms-test/multi_procnetdev2_test'
@narategithub narategithub requested a review from tom95858 January 15, 2024 16:52
@morrone
Copy link
Collaborator

morrone commented Jan 17, 2024

I would really, really like to see some changes to the approach taken here.

First of all, I would love to see plugins only have to have implement known API functions, rather than bundling function pointers into an ldmsd_plugin struct. The ldmsd_plugin struct, and how it tracks the functions pointers from a plugin, are ldmsd implementation details and should not be exposed to the plugin author.

Next, I would really, really, really want to see us do away with this practice of forcing plugin authors to play tricks of wrapping the ldmsd_plugin struct with hidden state data after the end. The plugin instance's state data should be passed back explicitly when the intance is initialized (for instance, it could be the return value from config()). The state data for an instance should be explicitly passed in to every subsequent call. This is pretty standard in most C code that I see. Moving things from hidden-and-implicit to being explicit in the API will make things much easier to understand for new plugin authors.

We need a new plugin API call to allow the plugin to express whether or not it is multi-instance capable, and ldmsd should guarantee that if the answer is "no", that it will not allow attempts to configure more than one instance.

I think it may be in scope, but maybe this is more debateable: we need the API calls fully documented in the header file, including their semantics. As things stand, the semantics are muddy at best. For insance, it should absolutely be enforced by ldmsd that it will never call deconstructor functions on an instance that has not had its constructor called. The sample() functon should never becalled on something that has not had its constructor called, or after the destructor has run. We shouldn't have to have every single plugin author reinvent the wheel when we can handle these semantics once in ldmsd, and document the semantics.

@morrone morrone self-requested a review January 18, 2024 00:04
Copy link
Collaborator

@morrone morrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see API changes.

@morrone
Copy link
Collaborator

morrone commented Jan 18, 2024

I don't know how much further review I should continue to do. If my high level requests are addressed, I think a number of things will naturally fall out of that and get better.

For instance, the lack of symmetry and clarity in just allocating and freeing the memory for plugin instances needs to be addressed. The plugin author (at least for a store that I am looking at right now) needs to allocate the instance state in get_plugin_instance(). But there is no matching put_plugin_instance(). Instead the author must free the memory in ptr->store.base.term().

We need to call "ldmsd_store_alloc()" in get_plugin_instance(). We would then expect to call ldmsd_store_free() in term(), right? But instead we need to call ldmsd_store_cleanup() and then free().

extern int ldmsd_config_init(char *name);
struct ldmsd_plugin_cfg *ldmsd_get_plugin(char *name);
struct ldmsd_plugin *ldmsd_get_plugin(const char *name);
void ldmsd_put_plugin(struct ldmsd_plugin *pi);
Copy link
Collaborator

@morrone morrone Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of having both ldmsd_put_plugin and ldmsd_plugin_put. Too close in naming, and there is no documentation here about what they do either.

@morrone
Copy link
Collaborator

morrone commented Jan 18, 2024

What are the symantics of sample() in the new paradigm? Does sample() need to be reentrant, or will ldmsd ensure that it is not required?

@narategithub
Copy link
Collaborator Author

@morrone Thank you for a detailed review :)

I think the structure extension is quite standard practice. And I'm not sure if we should force the names of the functions. I think @tom95858 can elaborate on this matter better than me.

After plugin creation, all calls from ldmsd to the plugin always supply the plugin structure that was created by the plugin. It is the same as self in Python. The plugin can track its state in the extended part of the structure.

I agree with you that the plugin creation is quite cumbersome. I'll change the ldmsd_sampler_alloc() and ldmsd_store_alloc() to also receive the necessary options and all common initialization can be done in there.

The existing plugins that does not have get_plugin_instance() is guaranteed to have only 1 instance. If the plugin implements get_plugin_instance(), then it must support multi instances.

sample() is thread safe by plugin instance. What I mean is that sample() function of a plugin (e.g. procnetdev2) can be called simultaneously but for different instances of that function.

There is also a lack of developer documentation that shall be addressed. The asymmetric that you mention should be either addressed or explained in the next update.

Again, thank you for the review :)

@morrone
Copy link
Collaborator

morrone commented Jan 19, 2024

I think the structure extension is quite standard practice.

I can't speak to whether it is common in some circles, but it certainly isn't standard. I think you would be hard pressed to find that called out as a best practice in text books. In fact in most uses, it is not the best approach. There is no good reason to make the plugin's state data a hidden data at the end of ldmsd's internal pointer structure, when we can easily make them separate and explicit.

I think there are various reasons why this is wrong from a code readability and maintainability standpoint, but those would take a long discussion of softrware development pracitices to make a good case for. Here is a clear technical reason why this is a bad approach:

This approach virtually guarantees that plugin ABI compatibility is impossible. Any time we add a function to the API, or tweak the structure in any way, all plugins must be recompiled and tracked against a specific build of LDMS.

So one big reason that I believe that my proposed approach to plugin API design is that it entirely allows for making backwards-compatible API improvements in a clean and straight-forward way. New optional API functions do not break the ABI.

@morrone
Copy link
Collaborator

morrone commented Jan 19, 2024

Temporarily, to move the conversation forward, I'll set aside the idea of exporting all the of the API functions in a plugin module (rather than exporting one function that returns the struct of API function pointers). I think we can address my other major concern pretty easily.

To try to expand on why I think it is bad practice, in this case, to hide an instance's data at the end of the API function pointer structure, I think basically we are failing to have good separation of concerns. We are creating a mishmash of different concepts that don't really go together. In this case, I think there should be a clear division between:

  1. Loading and discovering a plugin's exported functions (and by "plugin", here I mean the shared-library module file)
  2. Launching and interacting with instances (of a sampler, store, etc) using the plugin's exported functions

In the proposed change, these are combined. But we should only load the module/plugin and learn its' exported functions once. On the other hand, instances and their state need to be capable of multiple interactions.

This PR proposes that we replace get_plugin() with get_plugin_instance(). Instead of that, how about we keep get_plugin(), and make its job to just return the struct of API function pointers. And we do not hide any plugin state hanging off the end of that structure.

Next we introduce a new pair of functions to the API struct:

  /* Construct a new instance
   * RETURN: Opaque pointer to the new instance's internal state, or NULL is unable to do so
   */
  void *instance_init([whatever parameters]);

  /* Destroy an instance 
    * IN self: Opaque pointer to the instance's internal state
    */
  void instance_fini(void *self);

By doing this, we eliminate any need to combine the API pointer struct (which is something that is only used internally in ldmsd), with the instance state (which is only something used internally in the plugin).

Since we are changing the API, I'd like to take the opportunity to improve naming a bit. Instead of calling it "get_plugin", maybe the name could be:

ldmsdplug_import

And perhaps we could then at least export a second function named something like:

ldmsdplug_export

I think these suggestions untangle the the two concepts (a module and an instance), improves encapsulation, and helps to introduce a stronger degree of degree of symmetry to the API.

@tom95858
Copy link
Collaborator

Hi @narategithub, we have mixed the ldmsd_noun_verb_ function naming and ldmsd_verb_noun_ in some places. We should make these consistent in the API. I think most of the API is ldmsd_noun_verb().

@tom95858
Copy link
Collaborator

@morrone, @narategithub let's get a bit more specific:

Requirements

  • Support a single plugin with multiple configurations
  • Support existing plugins that do not support multiple configurations
  • Make it easier, not harder to write plugins
  • Better documentation

Currently Non Requirements (but may be desirable)

  • Support running OVIS-4.3.x plugins in a OVIS-4.4.2 ldmsd
    • This creates an impossibly complicated test matrix that I don't believe that we want to undertake
  • Completely hide a plugin's context data inside a structure understood only by the plugin
    • It is currently not the case that a plugin's data is entirely private to the plugin, the most obvious example is the sample interval and offset. These are known to the updater in order to schedule calls to sample()
    • We could however, have plugin functions that export the data that needs to be shared, e.g. interval/offset.

Other Thoughts

Having a base structure that is overloaded to contain additional data for those plugins/libraries/clients that understand the extensions to the structure is quite common in my experience; especially in the Linux kernel. I don't have any numbers on how often and/or if this paradigm is frowned upon in academic circles. That said we could trivially avoid this with a context pointer in the self pointer that is handed around.

Let's maybe plan a white-board meeting on Zoom to toss around ideas?

@morrone
Copy link
Collaborator

morrone commented Jan 23, 2024

I don't think anyone is suggesting the "Currently Non Requirements", so I think we're all good there.

Having a base structure that is overloaded to contain additional data for those plugins/libraries/clients that understand the extensions to the structure is quite common in my experience; especially in the Linux kernel.

First, the kernel is not always the best example for user-space programming decisions. The kernel needs to deal with lower level details that often explain what they are doing. And often the kernel has very good reasons for the niche techniques that it employs that would not also apply here. And of course historically some of the kernel's C usage has been gcc-specific rather than generally C-compliant. And I think if we look closer at those use cases, we'll find that they don't apply here.

I don't think it is only academics that would frown. :)

This isn't really "subclassing". We're in C, not C++, and there are probably quite a few more things that would need to be done to make this qualify as subclassing. And more importantly, we are talking about designing an API, and we are not setting up proper abstraction across that API.

But let's consider for a moment that this is subclassing. If that were the case, then its an example of bad subclassing. It is a mistake in code abstraction. The plugin never calls a single method in the class. It doesn't even employ any of the data in the base classes. The only data it uses is its own internal data, which is hidden at the end of the class. Combining unrelated classes in a memory management trick would be less than desirable coding practice. So there is a clear division in use of data and methods, but for some reason, this code smashes them together.

This is going to be strange and unexpected for any developer new to this code base. I can tell you it already was when I started on LDMS, and this moves in the directions of even more unusual. It violates the Principle of Least Surprise. And like you said, one or our goals is "Make it easier, not harder to write plugins".

And like I said, and I do think this is important for any API design, we are designing an API in which backwards compatible evolution of the API (retaining ABI compatibility) is just about impossible.

I agree that putting a context pointer in "self" would be better, but far better still would be for "self" to be the context pointer.

I'm up for a Zoom meeting!

@baallan
Copy link
Collaborator

baallan commented Jan 30, 2024

Setting aside my thoughts about the instance model mechanics, where is the discussion of the content (methods and their lifecycle semantics for multinstance plugins) of the APIs for samplers, stores (and if we're ambitious, transforms?) I didn't see it in the headers and at least one thing I did see in a header (get_set) rather disturbs me. Maybe the cart is before the horse?

for example, it would strongly simplify sampler and stream plugins (eliminate the coder needing to enforce data safety with explicit use of pthreads) if the "what will be called when and what will not" was fully documented (as has come up in among other discussions, the dcgm plugin).

for example, taking a sampler which does not work by subscribing to a stream, we should have the framework guarantee an order of operations something like:

load
config[,config*]
start
sample
stop
[config*; start; stop]
term

meaning that stuff in [] is possible/optional and that the ordering above is strict (config and term are not called while between start and stop; config is called at least once; stop is called before term; nothing else is called while a sample is in progress). The central problem we currently have is that config/stop/term can all happen due to the command line while the sample loop is on a timer.

@morrone
Copy link
Collaborator

morrone commented Jan 31, 2024

@baallan I do generally agree with all of that. I was alluding to the same thing when I questioned the semantics earlier.

It looks like get_set() is something that is being brought forward into the updated API, when it really should just be eliminated. I'm not even sure of its existing semantics, let alone what they might be in the new API. Plenty of the new samplers just return NULL to get_set(), so it can't be very important?


struct ldmsd_plugin *get_plugin()
static void __once()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existence of this __once() function tells us that there is a problem with the API design. No plugin author should need to do this. We are failing to give the author a place to do global initialization (and also missing the matching place to tear down global state at the end) that will be used, regardless of how many instances there may be.

We could fix this by going with my suggestion (or something similar) of having ldmsd_import()/ldmsd_export() calls (that are independant of the instance_init()/instance_fini() methods).

@baallan
Copy link
Collaborator

baallan commented Jan 31, 2024

Re get_set:
If it exists, something like it should be a utility function provided by the framework (which is managing underlying lists anyway) and not put on the plugin author to implement.

I can picture the utility of 3 functions (pseudocode, not derived from any current code) managing lists of objects:

// let people find the plugins in which they may be interested
ldmsd_plugin_class_iterator ldmsd_plugin_class_iterator_get()

// let people find the instances of a specific factory plugin in which they may be interested
ldmsd_plugin_class_instance_iterator ldmsd_plugin_class_instance_iterator_get(ldmsd_plugin_class c)

// let people find the entire list of sets registered by a single plugin instance (whether factory or singleton)
ldmsd_plugin_instance_sets_iterator_get(ldmsd_plugin_instance i)

yes of course this is c and not c++, so maybe we should introduce the word factory (singleton plugins and factory plugins) to clarify the behavior patterns for new developers. For some samplers and stores, singleton really is the obvious implementation and it should be easy, while for others the factory pattern may be the most appropriate.
And of course in the // comments by 'people' I really mean generalized plugins which do things like transform data as it flows through the pipeline.

I don't see the value in trying to make a single set of function names which encapsulates both singleton and factory plugins. That just introduces ambiguities and conditional behavior points in the documentation of api functions, which makes the documentation hard to write and maintain.

@tom95858 tom95858 closed this Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants