High provision time on Vertex AI pipelines #127
Replies: 3 comments 2 replies
-
We have been experiencing the same behaviour with Vertex AI recently. There is not much we can do on our side, as it's a cloud provider issue 🤷🏻‍♂️
-
Yeah, I also thought about grouping nodes, but that would be quite limiting. There are persistent resources (https://cloud.google.com/vertex-ai/docs/training/persistent-resource-train), which supposedly do the job, provided you create the resources manually and delete them after the pipeline run. The beta CLI exposes this as a parameter on CustomJob, which is the same type of job created from a PipelineJob (I could be wrong), and the beta SDK has the same on the CustomJob class, so maybe there is a way to pass that parameter down.
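For reference, a rough sketch of what passing that down could look like with the Python SDK. All names here (project, image, resource id) are placeholders, and whether `run()` actually accepts `persistent_resource_id` depends on the beta SDK version, so treat this as an assumption rather than a confirmed API:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder worker pool spec for a single Kedro node running in a container.
worker_pool_specs = [
    {
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        "container_spec": {
            "image_uri": "gcr.io/my-project/kedro-image:latest",
            "command": ["kedro", "run", "--nodes", "preprocess_node"],
        },
    }
]

job = aiplatform.CustomJob(
    display_name="kedro-node-on-persistent-resource",
    worker_pool_specs=worker_pool_specs,
)

# Assumes the persistent resource was created beforehand (e.g. with
# `gcloud beta ai persistent-resources create ...`) and is deleted after the run.
# Whether run() takes persistent_resource_id depends on the installed (beta) SDK.
job.run(persistent_resource_id="my-persistent-resource")
```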
-
Then a node shouldn't be a simple function, right? In the Kedro documentation, the basic example for a node is:
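(filling in the snippet from memory, so the exact example in the docs may differ slightly:)

```python
from kedro.pipeline import node


def return_greeting():
    return "Hello"


return_greeting_node = node(func=return_greeting, inputs=None, outputs="my_salutation")
```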
Shouldn't the docs separate the work into more comprehensive steps? Maybe all data preprocessing in a single node and training in another, as in the sketch below. Would that improve performance?
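Something along these lines, where node boundaries follow coarse stages rather than individual functions (the dataset names and toy logic are just placeholders):

```python
import pandas as pd

from kedro.pipeline import node, pipeline


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    # All cleaning / feature steps collapsed into one node, so Vertex AI
    # only provisions one pod for the whole preprocessing stage.
    df = raw.dropna().reset_index(drop=True)
    df["row_id"] = range(len(df))  # stand-in for real feature engineering
    return df


def train(features: pd.DataFrame) -> dict:
    # Stand-in for model training; returns something the catalog can persist.
    return {"n_rows": len(features)}


grouped_pipeline = pipeline(
    [
        node(preprocess, inputs="raw_data", outputs="features", name="preprocess"),
        node(train, inputs="features", outputs="model_metrics", name="train"),
    ]
)
```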
-
Hi,
I have a pipeline that contains about 40 nodes. Before deploying it from Kedro, I tried a basic pipeline and found that a pod takes about 2 minutes to be provisioned. So 40 × 2 min = 80 minutes of waiting for the pipeline, assuming no parallelization.
That's a bit too much for a task that takes 5 minutes in total when run locally.
Is there a way to reuse the same provisioned pod that I can configure, or are you using some other Kedro-related workaround to avoid this provisioning-time disaster?
Thanks