PILOT-6724: add hard limit for queueing jobs into thread pool #199
Merged
QXgu
reviewed
Jan 27, 2025
QXgu
approved these changes
Jan 27, 2025
Summary
After testing, the issue appears only on Windows, which does not impose a hard per-process memory limit, so the CLI can use as much memory as is available, whereas on Linux it has a memory budget of around 8 GB. However, the problem is not only the system setup: the code logic itself has a serious defect, and the CLI should never use that much memory. Investigation showed that the cause is improper logic in the threaded chunk upload. The CLI uses the apply_async function to queue calls to the chunk upload function chunk_upload. It keeps adding jobs to the pool, and those jobs are held in memory until the limit is reached.
The problem is this: chunk_upload takes the chunk as a parameter, so every queued job holds its chunk (a part of the file) in memory. Jobs are created much faster than they are consumed (uploaded), so the queue eventually eats up the whole memory budget.
The solution is to limit the number of waiting jobs (concurrency) with a semaphore. I ran the following test with different thread counts:
In both cases, job creation is much faster than job consumption, and there is no performance drop. So I decided to add an environment variable num_of_jobs with a default value of 20, which we can adjust later if users need a higher number of waiting jobs.
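For reviewers who want to see the idea in isolation, here is a minimal sketch of bounding the number of queued jobs with a semaphore. It is not the code from this PR: the names `upload_all` and `max_pending`, the stand-in body of `chunk_upload`, and the way `num_of_jobs` is read from the environment are all assumptions made for illustration.

```python
import os
import threading
from multiprocessing.pool import ThreadPool

# Assumption: the limit is read from an env var named as in the summary above.
MAX_PENDING_JOBS = int(os.environ.get("num_of_jobs", "20"))


def chunk_upload(chunk: bytes) -> int:
    """Stand-in for the real upload; returns the number of bytes handled."""
    return len(chunk)


def upload_all(chunks, upload=chunk_upload, num_threads=4,
               max_pending=MAX_PENDING_JOBS) -> int:
    """Queue one job per chunk, never holding more than max_pending jobs
    (queued or running) at once. Hypothetical helper, not the CLI's API."""
    slots = threading.BoundedSemaphore(max_pending)

    def release_slot(_result_or_error):
        # Called by the pool when a job finishes, whether it succeeded or failed.
        slots.release()

    results = []
    with ThreadPool(num_threads) as pool:
        for chunk in chunks:
            # Blocks the producer while max_pending jobs are outstanding,
            # so at most that many chunks are referenced by the pool.
            slots.acquire()
            results.append(
                pool.apply_async(
                    upload,
                    (chunk,),
                    callback=release_slot,
                    error_callback=release_slot,
                )
            )
        pool.close()
        pool.join()
        # get() re-raises any exception raised inside a job.
        return sum(r.get() for r in results)
```

Because `chunks` can be a generator, the next chunk is only read once a slot is free, so memory use stays roughly proportional to `max_pending` times the chunk size rather than to the file size.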
JIRA Issues
PILOT-6724
Type of Change
Testing
Are there any new or updated tests to validate the changes?
Test Directions
No test cases updated