Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve speed for workflows with hundreds of items #264

Closed
wants to merge 6 commits into from

Conversation

rebrowning
Copy link

@rebrowning rebrowning commented Jan 18, 2024

I have a test where I make a curl request to a simple endpoint that returns a json payload containing 500 items like:
{"data": f"this is a sentence that has a counter {count} as well as some other text"}. The changes to orquesta/specs/native/v1/models.py shave about half the time of the render function by doing a deep copy of the full object before we iterate over every single item, since all we really care about is setting the item (the rest of the payload does not change), so far from my testing it appears we can safely perform this action. This may be something that works nicely when combined with the changes in this PR: #256 , though they fix different issues from what I can see.

Copy link
Contributor

@guzzijones guzzijones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some follow up needed.

@@ -184,7 +184,8 @@ def get_staged_tasks(self, filtered=True):
if not filtered:
return self.staged

return [x for x in self.staged if x["ready"] and not x.get("completed", False)]
resp = [x for x in self.staged if x["ready"] and not x.get("completed", False)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is not needed, correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, I'll revert some of the additional changes that were made (just moved variables out of return statements to get timing on them, didn't move back)


item_ctx_value = ctx_util.set_current_item(item_ctx_value, item)
action = expr_base.evaluate(self.action, item_ctx_value)
gen_input = expr_base.evaluate(getattr(self, "input", {}), item_ctx_value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not needed? were you just testing the variable value here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

item_ctx_value is needed i see. 195 and 196 can be reverted.

@@ -156,6 +156,8 @@ def has_retry(self):
def render(self, in_ctx):
action_specs = []

item_ctx_value = ctx_util.copy_context(in_ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having the copy here will slow down non with items . move the copy in the else clause.

if context and not isinstance(context, dict):
raise TypeError("The context is not type of dict.")

ctx = {**context}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a shallow instead of deep copy , correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually leave this function as is and just do the operation in render.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'll move it over. And yes it's a shallow copy, but intentional. The rest of the sub-objects that are copied in a deep copy don't actually change between iteration from what I saw in testing, but the top level is what needs to change which a shallow copy still facilitates.

Copy link
Author

@rebrowning rebrowning Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guzzijones thoughts on undoing the changes to set_current_item, that function is not used anywhere else in orquesta. set_current_item would not work at the current location that copy_context is being used, so we could either remove the set_current_item function and move the logic into the render function, or leave set_current_item in the current form

had to resend with the correct profile, switched to my laptop and forgot to switch

@guzzijones
Copy link
Contributor

we actually don't even need the shallow copy. I am working on an addition to my nocopy branch with added benchmarks.

@guzzijones
Copy link
Contributor

here is what I ended up with

@rebrowning
Copy link
Author

rebrowning commented Jan 23, 2024

That's clever @guzzijones I forgot that evaluate was essentially doing it's own deep copy. Makes sense though since that was the next bottleneck in the render function.

What do we need to get your branch moved forward? Is there anything I can help with there? I'm not sure I'd want to merge this branch if your will cover this change plus the rest of the deep copy clean up.

I know you also have a branch in the st2 repo for performance improvements. Are any of your orquesta changes dependent on that st2 change?

I think the next performance improvement in the render function (if it's worth the dev cycles) would need a redis connection. My thought is caching the generated object for a task that has many items. That object created by render is identical from render to render from what I saw. This would mean passing the redis configuration through though, and I haven't looked into how much of an effort that'd be.

@guzzijones
Copy link
Contributor

I will run another build of st2 with the orquesta improvements. If those tests pass that is a good sign. We still need approval from another maintainer, though.

Redis caching is a good idea, but there are lots more deep copies that could be shallow copies before that.

@guzzijones
Copy link
Contributor

orquesta changes are not dependent on st2 changes.
The st2 changes are related to zstd compression for parameters and removal of a duplicate document for liveaction.

@rebrowning rebrowning closed this Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants