Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task tree: backend performance improvements #255

Merged
merged 8 commits into from
May 16, 2024
Merged

Conversation

psrok1
Copy link
Member

@psrok1 psrok1 commented May 13, 2024

This PR is follow-up after #207

Instead of making extra mappings, we can simply generate task uids in format {root_uid}:uid. Most tasks in the Karton environment are "produced" by karton-system while routing, so performance gain will be significant even if most consumers are not upgraded yet to the latest version of Karton.

The only disadvantage is that task identifiers will be a bit lengthy e.g. {57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331. This may have some impact on the size of logs and Redis memory usage

[2024-05-14 12:38:06,884][INFO] Received new task - {57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331
{"error": null, "headers": {"origin": "", "quality": "high", "receiver": "karton.wait-for-it", "type": "task"}, "headers_persistent": {"quality": "high"}, "last_update": 1715683086.8855345, "orig_uid": "{57d3630e-c86d-4931-b071-d1fafb02610d}:57d3630e-c86d-4931-b071-d1fafb02610d", "parent_uid": null, "payload": {"payload": {"echo": "off"}}, "payload_persistent": {"__headers_persistent": {"quality": "high"}}, "priority": "normal", "root_uid": "57d3630e-c86d-4931-b071-d1fafb02610d", "status": "Started", "uid": "{57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331"}
[2024-05-14 12:38:16,886][INFO] Task done - {57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331
[2024-05-14 12:38:16,892][INFO] Received new task - {3d41423a-a99b-43ba-8e89-6c9dcfd71a50}:cbf7c5ec-97da-446d-b65f-1d72997ffada

On the other hand, having root_uid in logs is pretty useful to track issues on analysis level and correlate logs with actual samples, so this side-effect is an advantage as well.

To actually get a performance gain from that change, other changes in karton.inspect and karton.backend have been included as well:

  • New KartonBackend method iter_task_tree that allows to gather all tasks related with provided root_uid. It also lists tasks with legacy uids (matching karton.task:*[^:]*) that should be mostly unrouted tasks waiting for karton-system processing.
  • KartonState is now lazy and gathers information about all tasks in system only when one of KartonState properties has been used. In addition, there is extra method get_analysis that allows to get tasks coming only from the specified root_uid.
  • KartonState has turned off resource parsing by default (parse_resources=False). Converting __karton_resource__ payload entries to actual Resource objects is unnecessary for dashboard and analysis status checking. In the same time, it enables much faster deserialization of tasks.

Changes planned after merging this PR:

  • Convert {root_uid}:uid back to uid in karton-dashboard views to make its views less bloated with long UIDs
  • Use KartonState.get_analysis in MWDB to make analysis status gathering much quicker.

@psrok1 psrok1 linked an issue May 13, 2024 that may be closed by this pull request
@psrok1 psrok1 marked this pull request as ready for review May 14, 2024 15:20
Comment on lines 559 to 561
Task.unserialize(task_data, backend=self)
if parse_resources
else Task.unserialize(task_data, parse_resources=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it just

Suggested change
Task.unserialize(task_data, backend=self)
if parse_resources
else Task.unserialize(task_data, parse_resources=False)
Task.unserialize(task_data, backend=self, parse_resources=parse_resources)

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely, backend=self is not passed for one of the cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but on the other hand there's nothing wrong with not passing the backend parameter when parse_resources=False. It's just not required by the unserialize function in that case

karton/core/backend.py Outdated Show resolved Hide resolved
karton/core/backend.py Outdated Show resolved Hide resolved
karton/core/backend.py Show resolved Hide resolved
@psrok1 psrok1 force-pushed the feature/task-tree-uid branch from 791b9d3 to 4bf2bb5 Compare May 15, 2024 13:10
@psrok1 psrok1 requested review from msm-cert and nazywam May 15, 2024 14:48
@psrok1 psrok1 merged commit c7d56b3 into master May 16, 2024
6 checks passed
@psrok1 psrok1 deleted the feature/task-tree-uid branch May 16, 2024 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get task tree status
3 participants