Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PILOT-6724: add hard limit for queueing jobs into thread pool #199

Merged
merged 3 commits into from
Jan 27, 2025

Conversation

colorzzr
Copy link
Member

Summary

After testing, the issue is only limited in windows platform, which doesn’t have hard limit of memory usage of each process. It will allow cli to use memory as much as possible, while linux will have around 8GB memory budge for it. However, the issue is not only limited with system setups, the code logic itself has huge defect which cli shouldn’t use that much memory at all. After investigation, the issue was caused by improper logic inside threading chunk upload. Cli uses apply_aync function to queue chunk uploading functions chunk_upload. Cli will keep creating function into pool and those function will be hold by memory until reaching limits.

The problem is here: The function chunk_upload will take chunk as parameter, that means it will use memory to hold chunk (partial of the file). The job creating speed is way more faster than job consuming speed(job uploading). Therefore, it eventually eats up all memory budget.

The solution is to limit the number of waiting job (concurrency) by semaphore package. I made following test regarding with different threading number:

threading 2 with 20 waiting jobs limitation.

threading 6 with 20 waiting jobs limitation.

In both cases, the job creation speed is much faster than its consumption speed. And there is no performance drop. So I decided to create a default envvar num_of_jobs with default number 20, which we can adjust it a later if users need higher number of waiting jobs.

JIRA Issues

PILOT-6724

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Refactor or reformatting

Testing

Are there any new or updated tests to validate the changes?

  • Yes
  • No

Test Directions

No test cases updated

@colorzzr colorzzr added the bug Something isn't working label Jan 27, 2025
@colorzzr colorzzr requested a review from QXgu January 27, 2025 16:38
@colorzzr colorzzr self-assigned this Jan 27, 2025
Copy link

github-actions bot commented Jan 27, 2025

Coverage

Coverage Report
FileStmtsMissCoverMissing
__init__.py00100% 
pilotcli.py0590%5, 7–10, 12–18, 21–22, 24–25, 27–28, 30–34, 36–39, 41, 43–44, 46–47, 50, 53–57, 59, 61–64, 66–71, 74–80, 83–85
commands
   __init__.py00100% 
   container_registry.py01959%16, 22–24, 31–33, 41–44, 50–53, 62–65
   dataset.py0989%20, 38–39, 47, 82–83, 92, 126, 130
   entry_point.py01185%41, 47, 75, 98–100, 102–106
   file.py03986%47, 145–148, 152, 157, 160–162, 220, 304–311, 313, 315, 329–335, 337, 366, 368, 372, 418, 421, 436–438, 444–445
   folder.py0196%23
   project.py0295%18, 61
   user.py0982%27, 44–45, 53–54, 62–63, 84–85
configs
   __init__.py00100% 
   app_config.py00100% 
   config.py00100% 
   user_config.py02183%35, 70, 79, 90, 117, 130, 140, 146, 150, 154, 158, 162, 166, 170, 174, 178, 182, 186, 190, 194, 198
   utils.py03536%13–14, 16–19, 22–23, 25, 27, 31–32, 34, 36–37, 40, 43, 45–47, 49–51, 54–57, 59, 67, 69–72, 74–75
models
   __init__.py00100% 
   enums.py00100% 
   item.py00100% 
   service_meta_class.py0180%9
   singleton.py00100% 
   upload_form.py0433%28, 40–42
resources
   custom_error.py00100% 
   custom_help.py00100% 
services
   __init__.py00100% 
services/clients
   __init__.py00100% 
   base_auth_client.py0196%53
   base_client.py0295%52, 103
services/container_registry_manager
   container_registry_manager.py010817%19–20, 23–26, 29–33, 36–44, 48–60, 62–64, 68–82, 84–86, 90–99, 101–103, 107–110, 125–136, 140–143, 147–162, 164, 166–169
services/crypto
   __init__.py00100% 
   crypto.py01940%35, 43–47, 49, 59–61, 69–75, 77, 79
services/dataset_manager
   dataset_detail.py02271%38, 59, 66–71, 73–75, 77–86, 88
   dataset_download.py02282%53, 63–65, 74–76, 92, 94–95, 97–98, 100, 102–104, 112–115, 149, 164
   dataset_list.py0881%36–41, 43, 52
   model.py00100% 
services/file_manager
   __init__.py00100% 
   file_list.py01285%60–66, 86–88, 100, 102
   file_manifests.py08030%18–23, 41–45, 49–55, 57, 61–66, 68, 72–73, 77–81, 83, 86, 88–89, 91–92, 94–97, 102–108, 113–118, 121–127, 129–131, 134–140, 142–144, 146–149
   file_tag.py03931%23, 27–31, 33, 36–48, 50–52, 54–55, 59–61, 64–68, 70–73, 75–76
services/file_manager/file_download
   __init__.py00100% 
   download_client.py010555%52–61, 79–80, 122–126, 128–130, 142–143, 148–149, 153–162, 165–170, 180–181, 212, 217, 223–233, 235–236, 239–240, 245–246, 250–255, 259–260, 263–265, 267–278, 285, 290, 305, 309–311, 313–326, 328
   model.py00100% 
services/file_manager/file_metadata
   __init__.py00100% 
   file_metadata_client.py0395%98–100
   folder_client.py0491%50, 79–81
services/file_manager/file_move
   __init__.py00100% 
   file_move_client.py03759%60–64, 66, 74, 79–83, 86–94, 96, 98–103, 114, 116–120, 122, 172–173
services/file_manager/file_trash
   __init__.py00100% 
   file_trash_client.py0296%69, 90
   utils.py0392%39, 63, 70
services/file_manager/file_upload
   __init__.py00100% 
   exception.py00100% 
   file_upload.py02885%38–45, 103–105, 117, 146–147, 151, 156, 192, 215–217, 258–259, 261–263, 343, 346, 350
   models.py0591%24, 44, 150–152
   upload_client.py04577%102–105, 145, 217, 231–245, 247, 249–253, 255–259, 261–262, 398, 416, 418, 424–425, 430–433, 435–436
   upload_validator.py01957%26–32, 35–41, 44–45, 50, 52, 54
services/logger_services
   __init__.py00100% 
   debugging_log.py00100% 
   log_functions.py00100% 
services/output_manager
   __init__.py00100% 
   error_handler.py00100% 
   help_page.py0791%13–17, 19, 21
   message_handler.py05673%24, 46, 51, 66–68, 73, 83, 88, 101, 106, 113, 118, 123, 127, 143, 178, 188, 197, 215, 237, 281–292, 303–308, 310–316, 327, 331–332, 336–337, 339–340, 344, 348, 352
services/project_manager
   __init__.py00100% 
   project.py0684%40, 46–47, 49–51
services/user_authentication
   __init__.py00100% 
   decorator.py0293%27, 30
   token_manager.py0987%27, 46–50, 77–78, 95
   user_login_logout.py01684%29–30, 41, 126, 130–136, 138–140, 144–145
utils
   __init__.py00100% 
   aggregated.py02784%56, 66–68, 82–84, 116, 130–131, 144, 168–175, 177–178, 206, 214, 231–232, 240, 264
TOTAL364089775% 

@colorzzr colorzzr merged commit a5558cc into develop Jan 27, 2025
1 check passed
@colorzzr colorzzr deleted the PILOT-6724 branch January 27, 2025 20:24
@colorzzr colorzzr restored the PILOT-6724 branch January 27, 2025 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants