-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
speeding up deep graphs #2
Comments
There is potential for speedup by multithreading parts of GFAffix, especially if the bottle neck is the graph traversal. I'd be interested in knowing which step in the algorithm affects the running time on these deep graphs. The graph traversal can be parallelized, but it's not embarrassingly parallelizable, as the graph editing (which is intertwined in the graph traversal) can only be done safely in a single thread. |
If you give me some sample dataset I can try to perform runtime analysis and try to improve running time by parallelized or not. |
@ekg do you have such a "deep graph" handy? |
Or just smaller but similar graph. |
Yes, but they are big. I don't think you need to use a deep graph to see if
parallelism can help. Probably a chunk of the MHC will be enough to
profile, or any graph that you've already used for testing that takes e.g.
5-10s to process.
…On Thu, May 19, 2022, 14:42 Daniel Doerr ***@***.***> wrote:
@ekg <https://github.com/ekg> do you have such a "deep graph" handy?
—
Reply to this email directly, view it on GitHub
<#2 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABDQEM2VS4UROFRWBP62ILVKYZKHANCNFSM5WEPZ5WQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Do these deep graphs have a particular high node degree, or is the number of nodes exceptionally high? |
High node degree and path depth. 2k fold coverage
…On Thu, May 19, 2022, 15:07 Daniel Doerr ***@***.***> wrote:
Do these deep graphs have a particular high node degree, or is the number
of nodes exceptionally high?
—
Reply to this email directly, view it on GitHub
<#2 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABDQEKGUXR4IN3CXK47MALVKY4KRANCNFSM5WEPZ5WQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
GFAffix should scale linearly with path depth, overall this shouldn't dominate the runtime. But node degree is certainly a bottle neck. |
@ekg Could you give me advise how to build a similar graph, type of data, pipeline, tools, etc… |
@natir sorry to take a while here. I may need to share this somehow, but the graph is rather large and will take 4 days to build if you do so from scratch. I'll see if I can get a simpler test case together. |
@ekg Just so you know that your issue is not forgotten: I have a solution for parallelizing GFAffix and will work on this sometime soonish. |
@danydoerr any updates on the parallelization? |
@AndreaGuarracino thanks for asking. I'm preparing a release that should be out in the next few days. Do you want to have already a binary to test on? I'm also changing the cli... |
Of course, please 'exit' the binary xD
…________________________________
From: Daniel Doerr ***@***.***>
Sent: Friday, December 20, 2024 03:31
To: marschall-lab/GFAffix ***@***.***>
Cc: Andrea Guarracino ***@***.***>; Mention ***@***.***>
Subject: Re: [marschall-lab/GFAffix] speeding up deep graphs (Issue #2)
@AndreaGuarracino<https://github.com/AndreaGuarracino> thanks for asking. I'm preparing a release that should be out in the next few days. Do you want to have already a binary to test on? I'm also changing the cli...
—
Reply to this email directly, view it on GitHub<#2 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AO26XHWFI4VP52KTV7UBVPD2GPPYTAVCNFSM6AAAAABT56CEJKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNJWGYYTGMBUGY>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Thx! Now I've noticed your Looks extremely great. With 1 thread it is already a bit faster than the current |
Yes. That is the current development branch from which I generated the binary. There should be a small speed up when using up to 4 threads (`-p4`)
|
@AndreaGuarracino do you have ungfaffixed graphs of the new HPRC assemblies that I can test on? |
Not yet, they'll come soon. I've started using the new gfaffix everywhere, so it is already "in production".
…________________________________
From: Daniel Doerr ***@***.***>
Sent: Saturday, December 21, 2024 03:10
To: marschall-lab/GFAffix ***@***.***>
Cc: Andrea Guarracino ***@***.***>; Mention ***@***.***>
Subject: Re: [marschall-lab/GFAffix] speeding up deep graphs (Issue #2)
@AndreaGuarracino<https://github.com/AndreaGuarracino> do you have ungfaffixed graphs of the new HPRC assemblies that I can test on?
—
Reply to this email directly, view it on GitHub<#2 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AO26XHUH5LQIME267YIGIUT2GUWBRAVCNFSM6AAAAABT56CEJKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNJYGA2TQNJUHA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
New version is released now. It still requires improvement in speed and memory usage for these large graphs. The bottleneck is now only the i/o part of gfaffix, and I already have some concrete ideas how to improve it. I'll leave this issue open for now. |
On deep graphs (2k-fold) I'm seeing gfaffix taking quite a bit of time. It's essentially single threaded, right? Is there a possible way to adapt it to operate in parallel?
The text was updated successfully, but these errors were encountered: