-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Task tree: backend performance improvements #255
Conversation
karton/core/backend.py
Outdated
Task.unserialize(task_data, backend=self) | ||
if parse_resources | ||
else Task.unserialize(task_data, parse_resources=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it just
Task.unserialize(task_data, backend=self) | |
if parse_resources | |
else Task.unserialize(task_data, parse_resources=False) | |
Task.unserialize(task_data, backend=self, parse_resources=parse_resources) |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely, backend=self
is not passed for one of the cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but on the other hand there's nothing wrong with not passing the backend parameter when parse_resources=False. It's just not required by the unserialize function in that case
791b9d3
to
4bf2bb5
Compare
This PR is follow-up after #207
Instead of making extra mappings, we can simply generate task uids in format
{root_uid}:uid
. Most tasks in the Karton environment are "produced" by karton-system while routing, so performance gain will be significant even if most consumers are not upgraded yet to the latest version of Karton.The only disadvantage is that task identifiers will be a bit lengthy e.g.
{57d3630e-c86d-4931-b071-d1fafb02610d}:f03f9927-dd6a-4275-8706-f922efb7a331
. This may have some impact on the size of logs and Redis memory usageOn the other hand, having root_uid in logs is pretty useful to track issues on analysis level and correlate logs with actual samples, so this side-effect is an advantage as well.
To actually get a performance gain from that change, other changes in
karton.inspect
andkarton.backend
have been included as well:iter_task_tree
that allows to gather all tasks related with providedroot_uid
. It also lists tasks with legacy uids (matchingkarton.task:*[^:]*
) that should be mostly unrouted tasks waiting forkarton-system
processing.KartonState
is now lazy and gathers information about all tasks in system only when one ofKartonState
properties has been used. In addition, there is extra methodget_analysis
that allows to get tasks coming only from the specified root_uid.KartonState
has turned off resource parsing by default (parse_resources=False
). Converting__karton_resource__
payload entries to actual Resource objects is unnecessary for dashboard and analysis status checking. In the same time, it enables much faster deserialization of tasks.Changes planned after merging this PR:
{root_uid}:uid
back touid
in karton-dashboard views to make its views less bloated with long UIDsKartonState.get_analysis
in MWDB to make analysis status gathering much quicker.