Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit parallel execution of a stage's layer #574

Merged
merged 1 commit into from
Jan 31, 2025

Conversation

maoueh
Copy link
Contributor

@maoueh maoueh commented Jan 28, 2025

Previously, the engine was executing modules in a stage's layer all in parallel. So if you had 20 independent mapper modules, they were all run in parallel.

This was hindering performance on high load where a lot of CPU cycles can be consumed will the machine has limited physical cores available.

We now change that behavior, development mode will not execute any modules in parallel, never. For production mode, we now limit to 2 parallel execution. A future update will make that value dynamic based on the subscription of the request.

Copy link
Contributor

@sduchesneau sduchesneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just ensure tier2 gets the same treatment passed down from tier1, and the rest is very good!

service/tier1.go Outdated
requestDetails.MaxParallelJobs = count
}
}
if parallelExecutors := auth.Get("X-Sf-Substreams-Stage-Layer-Parallel-Executor-Max-Count"); parallelExecutors != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to:

  1. implement the same parsing in tier2.go
  2. ensure we pass that header down to tier2 request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done please check

@maoueh maoueh force-pushed the feature/limit-execution-parallelism branch from e9bbdb1 to 2fcfb47 Compare January 30, 2025 21:43
Copy link
Contributor

@sduchesneau sduchesneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in comment, and I suggest moving the IsInDevelopment logic to the code evaluating the max number of workers for proper separation of concerns.

Other than that, LGTM !


const safeguardMaxStageLayerParallelExecutorCount = 16

// MaxParallelJobs returns the maximum number of parallel executors (e.g. go routines) that can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo, should be MaxStageLayerParallelExecutor

logging.Logger(ctx, p.stores.logger).Debug("executing stage's layers", zap.Int("layer_count", len(p.StagedModuleExecutors)), zap.Uint64("max_parallel_executor", maxParallelExecutor))

for _, layer := range p.StagedModuleExecutors {
if isDevelopmentMode || maxParallelExecutor <= 1 || len(layer) <= 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we care here about isDevelopmentMode ?

The MaxStageLayerParallelExecutor should check that we are in development mode and return 1. This logic doesn't belong here.

}

// If unset, provide default value which is 2 for now
if details.MaxStageLayerParallelExecutor == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentionned earlier, add the check for IsInDevelopmentMode here and return 1 ...

Previously, the engine was executing modules in a stage's layer all in parallel. So if you had 20 independent mapper modules, they were all run in parallel.

This was hindering performance on high load where a lot of CPU cycles can be consumed will the machine has limited physical cores available.

We now change that behavior, development mode will not execute any modules in parallel, never. For production mode, we now limit to 2 parallel execution. A future update will make that value dynamic based on the subscription of the request.
@maoueh maoueh force-pushed the feature/limit-execution-parallelism branch from 2fcfb47 to 207dae8 Compare January 31, 2025 21:20
@maoueh maoueh merged commit 6ba42ce into develop Jan 31, 2025
5 checks passed
@maoueh maoueh deleted the feature/limit-execution-parallelism branch January 31, 2025 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants