Improve image pulling process #2276

Open
fregataa opened this issue Jun 15, 2024 · 1 comment
Labels: comp:agent (Related to Agent component), comp:manager (Related to Manager component), type:feature (Add new features)

Comments

@fregataa
Member

fregataa commented Jun 15, 2024

TODO steps

Note

Step 1 targets 24.03, the remaining steps target 24.09.

Step 1

  • Allow superadmins to forcibly change the status of a kernel from PULLING to CANCELLED
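
A minimal sketch of the forced transition in Step 1, to make the intended guard conditions concrete. The `KernelStatus` members, `ForceCancelError`, and `force_cancel_pulling` are hypothetical names chosen for illustration, not the project's actual API.

```python
from enum import Enum


class KernelStatus(Enum):
    # Hypothetical subset of kernel statuses, for illustration only.
    SCHEDULED = "SCHEDULED"
    PULLING = "PULLING"
    RUNNING = "RUNNING"
    CANCELLED = "CANCELLED"


class ForceCancelError(Exception):
    """Raised when a forced cancellation is not permitted."""


def force_cancel_pulling(status: KernelStatus, *, is_superadmin: bool) -> KernelStatus:
    """Return the new status for a forced PULLING -> CANCELLED transition."""
    if not is_superadmin:
        raise ForceCancelError("only superadmins may force-cancel a kernel")
    if status is not KernelStatus.PULLING:
        raise ForceCancelError(f"cannot force-cancel a kernel in {status.name}")
    return KernelStatus.CANCELLED
```

Restricting the source status to PULLING keeps the override narrow: it only unblocks kernels stuck on an image pull and leaves other transitions to the normal lifecycle.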

Step 2

  • Implement a check-and-pull API
    It checks whether an agent already has a specific image.
    If the agent does not have the image, it starts a background task that pulls the image and returns the background task ID.
    After the background task finishes, the agent should dispatch an ImagePulled event.

  • Check images early when creating kernels
    When a kernel is ready to be created, call the check-and-pull API first. If it returns a background task ID, stop the kernel creation process there and set the kernel status to PULLING. Otherwise, proceed to create the kernel.

  • check-pulling-kernels loop
    The manager should run a global asynchronous background loop that checks all kernels in PULLING status and starts each kernel once its image has been pulled successfully.

  • Update USER_RESOURCE_OCCUPYING_KERNEL_STATUSES
    Exclude the PULLING kernel status from USER_RESOURCE_OCCUPYING_KERNEL_STATUSES, since kernels in PULLING status do not occupy any resources.
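
The Step 2 items above could be sketched roughly as follows. This is an in-memory toy under stated assumptions, not the real agent or manager code: the `Agent` class, its `events` list (a stand-in for the event bus), `prepare_kernel`, the "PREPARING" status, and the contents of `OCCUPYING_BEFORE` are all hypothetical.

```python
from __future__ import annotations

import asyncio
import uuid


class Agent:
    """Toy stand-in for an agent; all names here are hypothetical."""

    def __init__(self, local_images: set[str]) -> None:
        self.local_images = set(local_images)
        self.events: list[tuple[str, str]] = []  # stand-in for the event bus
        self.tasks: dict[str, asyncio.Task] = {}

    async def _pull(self, image: str) -> None:
        await asyncio.sleep(0)  # placeholder for the real image pull
        self.local_images.add(image)
        self.events.append(("ImagePulled", image))

    async def check_and_pull(self, image: str) -> str | None:
        """Return None if the image is present, else a background task ID."""
        if image in self.local_images:
            return None
        task_id = uuid.uuid4().hex
        self.tasks[task_id] = asyncio.create_task(self._pull(image))
        return task_id


async def prepare_kernel(agent: Agent, image: str) -> str:
    """Early image check: return the status the kernel should move to."""
    task_id = await agent.check_and_pull(image)
    if task_id is not None:
        return "PULLING"  # stop here; a later loop resumes this kernel
    return "PREPARING"  # hypothetical status: image present, go on to create


# Hypothetical "before" value; the real constant lives in the manager code.
OCCUPYING_BEFORE = ("PULLING", "PREPARING", "RUNNING")
# Proposed change: kernels in PULLING hold no resources, so drop that status.
USER_RESOURCE_OCCUPYING_KERNEL_STATUSES = tuple(
    s for s in OCCUPYING_BEFORE if s != "PULLING"
)
```

Returning the task ID immediately is what lets the manager release the kernel creation path while the pull continues on the agent.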

As-is

sequenceDiagram
    loop "prepare" loop
        activate Manager
        Manager->>Manager: Fetch "SCHEDULED" sessions
        Manager->>+Agent: Create kernel
        opt need to pull
            Agent->>Event bus: Produce "Pulling" event
            Agent->>Agent: Pull the image and wait till it finishes
            Agent->>Event bus: Produce "Preparing" event
        end
        Agent->>Agent: Create kernel
        Agent-->>-Manager: Return kernel creation info
        deactivate Manager
    end

To-do

sequenceDiagram
    loop "check-readiness" loop
        activate Manager
        Manager->>Manager: Fetch "SCHEDULED" sessions
        Manager->>Agent: check-and-pull
        activate Agent
        Note right of Agent: Run in background task
        Agent-->>Manager: Return background task id
        deactivate Manager
    end
    opt need to pull
        Agent->>Event bus: Produce "Pulling" event
        Agent->>Agent: Pull the image and wait till it finishes
    end
    Agent->>Event bus: Produce "Pull finished" event
    deactivate Agent

    loop "create-kernel" loop
        activate Manager
        Manager->>Manager: Fetch "READY-TO-CREATE" sessions
        Manager->>+Agent: Create kernel
        Agent->>Agent: Create kernel
        Agent-->>-Manager: Return kernel creation info
        deactivate Manager
    end
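
The two manager loops in the to-do diagram can be read as a small state machine. The sketch below shows one pass ("tick") of each loop over in-memory sessions. It assumes that a session moves from PULLING to READY-TO-CREATE when the "Pull finished" event arrives, which the diagram implies but does not state. `Session`, `Manager`, the tick methods, and the final "RUNNING" status are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    id: str
    image: str
    status: str = "SCHEDULED"


@dataclass
class Manager:
    """Toy manager holding sessions in memory; every name is hypothetical."""

    sessions: list[Session] = field(default_factory=list)
    pulled_images: set[str] = field(default_factory=set)

    def on_pull_finished(self, image: str) -> None:
        # Handler for the "Pull finished" event from the event bus.
        self.pulled_images.add(image)

    def check_readiness_tick(self, request_pull: Callable[[str], object]) -> None:
        # One pass of the "check-readiness" loop over SCHEDULED sessions.
        for s in self.sessions:
            if s.status == "SCHEDULED":
                request_pull(s.image)  # returns at once; the pull runs in background
                s.status = "PULLING"

    def promote_tick(self) -> None:
        # Assumed step: PULLING sessions whose image arrived become READY-TO-CREATE.
        for s in self.sessions:
            if s.status == "PULLING" and s.image in self.pulled_images:
                s.status = "READY-TO-CREATE"

    def create_kernel_tick(self, create_kernel: Callable[[Session], object]) -> None:
        # One pass of the "create-kernel" loop over READY-TO-CREATE sessions.
        for s in self.sessions:
            if s.status == "READY-TO-CREATE":
                create_kernel(s)
                s.status = "RUNNING"  # hypothetical post-creation status
```

Because neither loop waits on a pull, a slow image download for one session no longer delays kernel creation for sessions whose images are already present.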

Step 3

Step 4

@fregataa fregataa self-assigned this Jun 15, 2024
@fregataa fregataa changed the title from "Detach image pulling process from kernel creation" to "Improve image pulling process" Jun 15, 2024
@achimnol achimnol added the type:feature, comp:manager, and comp:agent labels Jun 17, 2024
@achimnol achimnol added this to the 24.09 milestone Jun 17, 2024
@achimnol
Member

achimnol commented Jun 17, 2024

Step 1 targets 24.03 while the remaining steps target 24.09.
(cc: @adrysn @xyloon @kmkwon94)
