Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add number shards by node #246

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Rajaeelfarsi
Copy link

@Rajaeelfarsi Rajaeelfarsi commented Jan 5, 2024

Description

#189

Following the issue opened by arob1n, I made some modifications to the source code to add the metric that gives the number of shard per node. I've named the metric nodes_shards_number.


  • All my commits include DCO.
    DCO stands for Developer Certificate of Origin and it is your declaration that your contribution is correctly attributed and licensed. Please read more about how to attach DCO to your commits here (spoiler alert: in most cases it is as simple as using -s option when doing git commit).
    Please be aware that commits without DCO will cause failure of PR CI workflow and can not be merged.

Rajaeelfarsi and others added 3 commits January 4, 2024 17:09
Signed-off-by: Rajaeelfarsi <elfarsirajae@gmail.com>
Signed-off-by: Rajaeelfarsi <elfarsirajae@gmail.com>
Copy link
Collaborator

@lukas-vlcek lukas-vlcek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Rajaeelfarsi

First of all thanks a lot for your PR. (And my apologies that I did not get to it earlier... 🙏 )

Looking at the proposed implementation – is there actually any reason why we need to exclude UNASSIGNED shards and then sum all the rest under a single category? Did you consider more general approach and simply report counts for all shard states? Similarly to how it is done at the cluster level?

It is always possible to create your own summary metrics using Prometheus recording rules. So this way you can provide accumulated summary for all non-unassigned shards. (And if you are using jsonnet then you can use it to include your recording rules into mixin such as one found in mixin branch of this repository. Or you can use different mechanism, but the point is that it is enough if the plugin/exporter provides the raw metrics and then you can build a custom metrics on top of it directly in Prometheus.)

WDYT?

I am happy to help with that. Let me know if you want to give it try or I can jump on it and take your commit into a new PR.

Reagrds,
Lukáš

@Rajaeelfarsi
Copy link
Author

Rajaeelfarsi commented Feb 9, 2024

Hi @lukas-vlcek,

Thank you for your response. I really appreciate your feedback and guidance.

I wanted to mention that I am relatively new to Opensearch and development in general, so I may not have a deep understanding of all the intricacies yet. However, I am eager to learn and improve.

Regarding the corrections I plan to make, I based my implementation on the API GET /_cat/shards, which provides the different states(types) of a shard:

(Default) State of the shard. Returned values are:

**INITIALIZING**: The shard is recovering from a peer shard or gateway.
**RELOCATING**: The shard is relocating.
**STARTED**: The shard has started.
**UNASSIGNED**: The shard is not assigned to any node.

I will follow your example for the cluster and implement the same approach for each node.

However, I have a question regarding the inclusion of the "unassigned" state. As far as I understand, a shard in the "unassigned" state is not assigned to any node, unlike the cluster level where there can be unassigned shards. Therefore, I am unsure about the need to include the "unassigned" state for each node, as nodes do not typically contain shards with the "unassigned" state.

If you could provide further clarification on why I should include the "unassigned" state for each node, I would greatly appreciate it.

Thank you once again for your support.

Best regards, Rajae

@Baarsgaard
Copy link

Baarsgaard commented Apr 18, 2024

Would you consider also exposing the currently configured cluster.max_shards_per_node?
This would allow for a dead simple alert when you're nearing the limit and not require an update to your monitoring if you change the max_shards_per_node.
Other than parsing the metric when scraping, it should be almost free to store in any vector database as it's essentially a static number.

@lukas-vlcek
Copy link
Collaborator

@Baarsgaard Makes sense!

@Baarsgaard
Copy link

Baarsgaard commented Apr 22, 2024

nodes do not typically contain shards with the "unassigned" state.

This is the exact reason I would include unassigned shards as well, simply because it's atypical and I would therefore like it exposed.

@smbambling
Copy link

Any status on this, I would love to be able to have per node shard monitoring / tracking as I move away from ElasticSearch

@lukas-vlcek
Copy link
Collaborator

Okey, let's make some progress with this over the next week. 💪

@smbambling
Copy link

Any movement on this, this is a key monitoring feature we are currently missing.

@smbambling
Copy link

I unfortunately an unable to assist with Java coding and it looks like @Rajaeelfarsi might be a little busy at the moment and unable to complete this PR. Is anyone able to pick up where he left off ? This feature will be key to some more robust monitoring

@dbuteau
Copy link

dbuteau commented Oct 22, 2024

Would it be possible to accept as is and maybe open feature request for other missing metrics ?

IMHO this interesting contrib should not be blocked because it did not fulfil all new idea which appeared

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants