-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ES-10063 Add multi-project support for more stats APIs #127650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ES-10063 Add multi-project support for more stats APIs #127650
Conversation
This affects the following APIs: - `GET _nodes/stats`: - For `indices`, it now prefixes the index name with the project ID (for non-default projects). Previously, it didn't tell you which project an index was in, and it failed if two projects had the same index name. - For `ingest`, it now gets the pipeline and processor stats for all projects, and prefixes the pipeline ID with the project ID. Previously, it only got them for the default project. - `GET /_cluster/stats`: - For `ingest`, it now aggregates the pipeline and processor stats for all projects. Previously, it only got them for the default project. - `GET /_info`: - For `ingest`, same as for `GET /_nodes/stats`. This is done by making `IndicesService.stats()` and `IngestService.stats()` include project IDs in the `NodeIndicesStats` and `IngestStats` objects they return, and making those stats objects incorporate the project IDs when converting to XContent. The transitive callers of these two methods are rather extensive (including all callers to `NodeService.stats()`, all callers of `TransportNodesStatsAction`, and so on). To ensure the change is safe, the callers were all checked out, and they fall into the following cases: - The behaviour change is one of the desired enhancements described above. - There is no behaviour change because it was getting node stats but neither `indices` nor `ingest` stats were requested. - There is no behaviour change because it was getting `indices` and/or `ingest` stats but only using aggregate values. - In `MachineLearningUsageTransportAction` and `TransportGetTrainedModelsStatsAction`, the `IngestStats` returned will return stats from all projects instead of just the default with this change, but they have been changed to filter the non-default project stats out, so this change is a noop there. (These actions are not MP-ready yet.) - `MonitoringService` will be affected, but this is the legacy monitoring module which is not in use anywhere that MP is going to be enabled. (If anything, the behaviour is probably improved by this change, as it will now include project IDs, rather than producing ambiguous unqualified results and failing in the case of duplicates.)
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this, Pete! I did a first round of review by just looking at the production files. I'll have a look at the tests later on. This feels like the right direction; my comments are mostly small and/or thinking out loud.
for (Map.Entry< | ||
ProjectId, | ||
Map<String, List<org.elasticsearch.ingest.IngestStats.ProcessorStat>>> processorStatsForProject : nodeStat | ||
.getIngestStats() | ||
.processorStats() | ||
.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we extract some variables here to keep this a bit more readable? The multi-line types inside a for loop declaration is a bit intense. Same goes for the loop below.
Also, is there a reason this class is fully qualified instead of imported? I don't immediately see any name conflicts but maybe I'm missing something. I still think it's better to extract variables - I'm personally not a big fan of multi-line for loop declarations but maybe that's just me - but it would already help to import the class.
String, | ||
List<org.elasticsearch.ingest.IngestStats.ProcessorStat>> processorStats : processorStatsForProject.getValue() | ||
.entrySet()) { | ||
pipelineIdsByProject.computeIfAbsent(projectId, k -> new HashSet<>()).add(processorStats.getKey()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to keep a map of all the pipeline IDs, right? Since we only need the count in the end, we can just keep one set when we loop over the processors in the project and then add the size of the set to an int/long variable that's initialized outside all the loops (technically it could cause an integer overflow, but I don't think that's something we really need to protect against here).
* Constructs an instance. If {@code projectsByIndex} argument is non-null, then the index-to-project map will be stored, and the | ||
* project IDs will be prepended to the index names when converting this instance to XContent (except when it is the default project). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean empty
instead of non-null
. Also, I think the then the index-to-project map will be stored
part is not relevant anymore as we're always storing that field? But maybe I'm misinterpreting it.
if (Objects.equals(projectId, Metadata.DEFAULT_PROJECT_ID)) { | ||
return index.getName(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit on the fence about this approach. Thinking out loud: on the one hand, this maintains BwC with little effort. On the other hand, if we ever have indices in the default project this'll be a bit off. It's been mentioned a handful of times that we might want to consider (so, super hypothetically) having a shared cluster-level GeoIP index to avoid downloading the same DB over again. We shouldn't make this decision purely based on the GeoIP index, but the idea of the faint possibility of cluster-level indices is something to think about. At the same time, this approach doesn't "break" anything in that case either, it just makes the output inconsistent. So, we either try to account for that situation proactively now, or we decide to go with this approach and we solve that issue if/when it comes. Any thoughts?
static List<PipelineStat> merge(List<PipelineStat> first, List<PipelineStat> second) { | ||
var totalsPerPipeline = new HashMap<String, PipelineStat>(); | ||
record MergeKey(ProjectId projectId, String pipelineId) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL you could define records inside methods! Do you have any idea how that compiles? I'm mostly curious if the compiler extracts that record definition somewhere or whether it's going to redefine that record on every invocation of this method.
@@ -11,7 +11,7 @@ esplugin { | |||
|
|||
dependencies { | |||
testImplementation project(path: ':test:test-clusters') | |||
clusterModules project(':test:external-modules:test-multi-project') | |||
clusterModules(project(':test:external-modules:test-multi-project'), project(":modules:ingest-common")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the more common syntax is:
clusterModules(project(':test:external-modules:test-multi-project'), project(":modules:ingest-common")) | |
clusterModules project(':test:external-modules:test-multi-project') | |
clusterModules project(":modules:ingest-common") |
@@ -79,7 +79,7 @@ static NodeIndicesStats buildNodeIndicesStats(RoutingNode routingNode, long byte | |||
IndexShardStats shardStats = new IndexShardStats(shardId, new ShardStats[] { shardStat }); | |||
indexStats.computeIfAbsent(shardId.getIndex(), k -> new ArrayList<>()).add(shardStats); | |||
} | |||
return new NodeIndicesStats(COMMON_STATS, Map.of(), indexStats, true); | |||
return new NodeIndicesStats(COMMON_STATS, Map.of(), indexStats, Map.of(), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we add a (small) comment here why were using Map.of()
for the projectsByIndex
parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some YAML tests from the ingest-common
module are muted in this file because they depend on stats APIs:
'^test/ingest/15_info_ingest/*', // uses cluster info API
'^test/ingest/70_bulk/Test bulk request with default pipeline', // uses stats API
'^test/ingest/70_bulk/Test bulk request without default pipeline', // uses stats API
However, because we decided to prefix the project ID, we won't be able to unmute those tests... I don't immediately have a good suggestion, but I thought I'd mention it so we're aware. If we decide that that is OK, we should move these lines to the bottom so they're next to the comment explaining why some stats API tests are muted in MP.
return indices.values() | ||
.stream() | ||
.map(AbstractIndexComponent::index) | ||
.collect(Collectors.toMap(index -> index, index -> clusterService.state().metadata().projectFor(index).id())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it's worth it to build this map manually instead of using Streams, for performance reasons. I believe the stats API is called pretty regularly by monitoring agents (but I could be mistaken), and if we're doing this for all indices in all projects, the difference between Streams and non-streams might be noticeable. What do you think?
return indices.values() | ||
.stream() | ||
.map(AbstractIndexComponent::index) | ||
.collect(Collectors.toMap(index -> index, index -> clusterService.state().metadata().projectFor(index).id())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud: by using Metadata#projectFor
, we throw an exception if the index is not found. I'm not familiar enough with the IndicesService
to know whether there is a possibility of the service being "behind" the state in ClusterService
. For instance, when an index is deleted from the cluster state, is the indices service updated atomically? Or if an index is added, is it first added to the cluster state or the IndicesService
? If you don't already have an answer to those questions (which is completely fine), I think it's worth having a look to be sure what kind of behavior we could see here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall! I haven't gone over the tests yet, will run another pass for the tests tomorrow.
public NodeIndicesStats( | ||
CommonStats oldStats, | ||
Map<Index, CommonStats> statsByIndex, | ||
Map<Index, List<IndexShardStats>> statsByShard, | ||
Map<Index, ProjectId> projectsByIndex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like this approach of passing in the index to project map
This affects the following APIs:
GET _nodes/stats
:indices
, it now prefixes the index name with the project ID (for non-default projects). Previously, it didn't tell you which project an index was in, and it failed if two projects had the same index name.ingest
, it now gets the pipeline and processor stats for all projects, and prefixes the pipeline ID with the project ID. Previously, it only got them for the default project.GET /_cluster/stats
:ingest
, it now aggregates the pipeline and processor stats for all projects. Previously, it only got them for the default project.GET /_info
:ingest
, same as forGET /_nodes/stats
.It does not add multi-project support for the
repositories
section on_nodes/stats
, since repositories aren't MP-ready yet AFAIK.This is done by making
IndicesService.stats()
andIngestService.stats()
include project IDs in theNodeIndicesStats
andIngestStats
objects they return, and making those stats objects incorporate the project IDs when converting to XContent.The transitive callers of these two methods are rather extensive (including all callers to
NodeService.stats()
, all callers ofTransportNodesStatsAction
, and so on). To ensure the change is safe, the callers were all checked out, and they fall into the following cases:indices
noringest
stats were requested.indices
and/oringest
stats but only using aggregate values.MachineLearningUsageTransportAction
andTransportGetTrainedModelsStatsAction
, theIngestStats
returned will return stats from all projects instead of just the default with this change, but they have been changed to filter the non-default project stats out, so this change is a noop there. (These actions are not MP-ready yet.)MonitoringService
will be affected, but this is the legacy monitoring module which is not in use anywhere that MP is going to be enabled. (If anything, the behaviour is probably improved by this change, as it will now include project IDs, rather than producing ambiguous unqualified results and failing in the case of duplicates.)