speeding up deep graphs #2

ekg · 2022-05-17T10:52:30Z

On deep graphs (2k-fold) I'm seeing gfaffix taking quite a bit of time. It's essentially single threaded, right? Is there a possible way to adapt it to operate in parallel?

danydoerr · 2022-05-18T12:25:12Z

There is potential for speedup by multithreading parts of GFAffix, especially if the bottleneck is the graph traversal. I'd be interested in knowing which step of the algorithm dominates the running time on these deep graphs. The graph traversal can be parallelized, but it's not embarrassingly parallel, because the graph editing (which is intertwined with the traversal) can only be done safely in a single thread.
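To illustrate the constraint described above, here is a minimal sketch of a "parallel read-only scan, serial edit" split. It is not GFAffix's actual code: the `Graph` struct, the `has_shared_prefix` check, and the placeholder edit in `apply_edits` are all hypothetical stand-ins, and only the Rust standard library is assumed.

```rust
use std::thread;

/// Hypothetical toy graph: `seqs[v]` is the sequence of node `v`,
/// `succ[v]` lists its successors, `collapsed` records placeholder edits.
struct Graph {
    seqs: Vec<String>,
    succ: Vec<Vec<usize>>,
    collapsed: Vec<usize>,
}

/// Read-only check: do at least two successors of `v` start with the same base?
fn has_shared_prefix(g: &Graph, v: usize) -> bool {
    let mut seen = [false; 256];
    for &w in &g.succ[v] {
        if let Some(&b) = g.seqs[w].as_bytes().first() {
            if seen[b as usize] {
                return true;
            }
            seen[b as usize] = true;
        }
    }
    false
}

/// Phase 1 (parallel): worker threads scan disjoint chunks of node ids.
/// The graph is only borrowed immutably, so no locking is needed.
fn find_candidates(g: &Graph, threads: usize) -> Vec<usize> {
    let ids: Vec<usize> = (0..g.seqs.len()).collect();
    let threads = threads.max(1);
    let chunk = ((ids.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        // Spawn every worker first, then join them in chunk order.
        let handles: Vec<_> = ids
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || {
                    part.iter()
                        .copied()
                        .filter(|&v| has_shared_prefix(g, v))
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("scan worker panicked"))
            .collect()
    })
}

/// Phase 2 (serial): edits need `&mut Graph`, so they run on one thread.
/// Each candidate is re-checked because an earlier edit may have changed it.
/// The "edit" is a placeholder that only records the node id.
fn apply_edits(g: &mut Graph, candidates: &[usize]) -> usize {
    let mut applied = 0;
    for &v in candidates {
        if has_shared_prefix(g, v) {
            g.collapsed.push(v); // stand-in for a real collapse operation
            applied += 1;
        }
    }
    applied
}
```

A caller would run `find_candidates(&g, n_threads)` and pass the result to `apply_edits(&mut g, &candidates)`, repeating until no candidates remain. Whether such a split pays off depends on how much time the scan takes relative to the serial edits, which is exactly what profiling would need to show.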

natir · 2022-05-19T12:38:19Z

If you give me a sample dataset, I can profile the run time and try to improve it, with or without parallelization.

danydoerr · 2022-05-19T12:41:57Z

@ekg do you have such a "deep graph" handy?

natir · 2022-05-19T12:47:10Z

Or just a smaller but similar graph.

ekg · 2022-05-19T12:49:13Z

Yes, but they are big. I don't think you need to use a deep graph to see if parallelism can help. Probably a chunk of the MHC will be enough to profile, or any graph that you've already used for testing that takes e.g. 5-10s to process.


danydoerr · 2022-05-19T13:07:38Z

Do these deep graphs have a particularly high node degree, or is the number of nodes exceptionally high?

ekg · 2022-05-19T13:09:25Z

High node degree and path depth: 2k-fold coverage.


danydoerr · 2022-05-19T13:11:43Z

GFAffix should scale linearly with path depth, so overall this shouldn't dominate the runtime. But node degree is certainly a bottleneck.

natir · 2022-05-19T20:00:01Z

@ekg Could you give me advice on how to build a similar graph: type of data, pipeline, tools, etc.?

ekg · 2022-06-08T10:16:50Z

@natir sorry to take a while here. I may need to share this somehow, but the graph is rather large and will take 4 days to build if you do so from scratch. I'll see if I can get a simpler test case together.

danydoerr · 2023-03-24T13:25:59Z

@ekg Just so you know that your issue is not forgotten: I have a solution for parallelizing GFAffix and will work on this sometime soonish.

AndreaGuarracino · 2024-12-19T22:30:01Z

@danydoerr any updates on the parallelization?

danydoerr · 2024-12-20T09:31:03Z

@AndreaGuarracino thanks for asking. I'm preparing a release that should be out in the next few days. Would you like a binary to test in the meantime? I'm also changing the CLI...

AndreaGuarracino · 2024-12-20T15:33:07Z

Of course, please 'exit' the binary xD


danydoerr · 2024-12-20T16:38:01Z

@AndreaGuarracino voilà gfaffix-0.2.0-prerelease.tar.gz

AndreaGuarracino · 2024-12-20T20:14:55Z

Thx! Now I've noticed your refined_deduplication branch!

Looks extremely great. With 1 thread it is already a bit faster than the current main branch. And it is also slimmer in memory.

danydoerr · 2024-12-20T20:42:14Z

Yes. That is the current development branch from which I generated the binary. There should be a small speedup when using up to 4 threads (`-p4`).

danydoerr · 2024-12-21T09:10:25Z

@AndreaGuarracino do you have ungfaffixed graphs of the new HPRC assemblies that I can test on?

AndreaGuarracino · 2024-12-21T16:52:22Z

Not yet, they'll come soon. I've started using the new gfaffix everywhere, so it is already "in production".


danydoerr · 2025-01-04T09:30:39Z

The new version is released now. It still needs improvement in speed and memory usage for these large graphs. The only remaining bottleneck is the I/O part of gfaffix, and I already have some concrete ideas for how to improve it. I'll leave this issue open for now.
