Error when using -super5 with Muscle3D #85

Open
pentamorfico opened this issue Nov 21, 2024 · 6 comments

pentamorfico commented Nov 21, 2024

Hi Robert!

First of all, thank you so much for the time you dedicate to software development and to making bioinformaticians' lives easier!

I'm trying to align around 90k PDBs from AlphaFold using Muscle 3D, running the following commands on a machine with 2 TB of RAM and 128 threads.

reseek -pdb2mega second_round/ -output second_round.mega && muscle -super5 second_round.mega -output second_round.afa

However, I always get the following error:

Mega::GetProfileByLabel(Cluster2)

The cluster number changes from run to run.

Is -super5 compatible with Muscle3D? When I run -align with >1k sequences, I get the warning ">1k sequences, may be slow or use excessive memory, consider using -super5".

I also tried smaller alignments (~200 PDBs) and got the same error. When I use -align instead of -super5, everything works nicely!

Is it advisable to run Muscle 3D with >90k sequences without the -super5 command? What would be the best strategy for this?

Thanks for your time,
Mario

rcedgar (Owner) commented Nov 21, 2024

-super5 is for amino acid (aa) sequences; for structures use -super7, which is briefly documented in the repo README as follows:

# for up to ~10,000 structures
reseek -convert STRUCTS -bca structs.bca
reseek -pdb2mega structs.bca -output structs.mega
reseek -distmx structs.bca -output structs.distmx
muscle -super7 structs.mega -distmxin structs.distmx -reseek -output structs.afa
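
The four steps above can be wrapped in a small script so that a failed step stops the run. This is a minimal sketch, not part of the README: the `run` and `super7_pipeline` function names, the `DRY_RUN` switch, and the output prefix argument are assumptions made for illustration. Only the reseek and muscle command lines themselves are taken from the steps above.

```shell
#!/usr/bin/env bash
# Sketch: run (or just print) the -super7 structure pipeline quoted above.
# run, super7_pipeline and DRY_RUN are hypothetical names; the reseek and
# muscle flags are exactly those shown in the README excerpt.
set -eu

# Print the command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

super7_pipeline() {
    local structs="$1"   # input structures, as passed to reseek -convert
    local prefix="$2"    # basename for intermediate and output files

    run reseek -convert "$structs" -bca "$prefix.bca"
    run reseek -pdb2mega "$prefix.bca" -output "$prefix.mega"
    run reseek -distmx "$prefix.bca" -output "$prefix.distmx"
    run muscle -super7 "$prefix.mega" -distmxin "$prefix.distmx" -reseek -output "$prefix.afa"
}

super7_pipeline "${1:-STRUCTS}" "${2:-structs}"
```

By default the script only prints the commands, so they can be inspected first; setting `DRY_RUN=0` would execute them, and `set -e` would then abort at the first step that fails.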

I haven't tried 90k structures. I think there is a good chance it will work, though the alignment might be better if you cluster the structures first. Reseek has an undocumented clustering command; if you want to give that a try, let me know and I'll sketch out how to use it.

The usage message given by muscle does not explain this, and the documentation on the website does not mention structures at all yet, so the documentation could certainly be improved here.

If you find the 90k alignment useful, I'd be interested to learn more, maybe you could email me?

pentamorfico (Author) commented

Hey @rcedgar, thank you so much for the super-fast reply!

Sorry, I completely missed that part of the README. I think I was too excited to try it out and jumped straight to launching it!

I will try again and let you know whether it works. For now, I'm interested in having the 90k structure-based alignment to compare with a sequence-based alignment. If that doesn't work, I will try reseek clustering first. I already compared FoldMason vs Muscle3D on a smaller set and Muscle worked way better for me.

I will definitely send you an email with more information about the project in case you are interested!

Best,
Mario

pentamorfico (Author) commented

Hi @rcedgar, I tried again with these commands:

reseek -convert second_round/ -bca second_round.bca
reseek -pdb2mega second_round.bca -output second_round.mega
reseek -distmx second_round.bca -output second_round.distmx -verysensitive
muscle -super7 second_round.mega -distmxin second_round.distmx -reseek -output second_round.afa

And I get the following error:

---Fatal error---
Distance matrix too sparse

rcedgar (Owner) commented Nov 22, 2024

This means that the structures are very divergent. I think it's unlikely you will be able to make a meaningful multiple structure alignment (MStA) here. Happy to discuss further if you send me an email.

pentamorfico (Author) commented

Hi @rcedgar, I've checked and the distance matrix second_round.distmx is completely empty: it contains only the IDs of the proteins. I will investigate in more depth what could be causing this.

I've sent you an email with some more info!
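
An empty matrix like the one described above can be caught before the muscle step by counting how many lines carry anything beyond a label. The sketch below assumes a whitespace- or tab-separated text file in which a label-only line has a single field. That layout is an assumption, not a documented reseek format, and `distmx_has_values` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check that a distance-matrix file holds more than bare IDs.
# ASSUMPTION: whitespace-separated text where a line containing only a
# label has exactly one field. Adjust if the real format differs.
set -eu

# Prints the number of lines with at least two fields and
# returns non-zero when there are none.
distmx_has_values() {
    local file="$1"
    local n
    n="$(awk 'NF >= 2 { c++ } END { print c + 0 }' "$file")"
    echo "$n"
    [ "$n" -gt 0 ]
}
```

For example, `distmx_has_values second_round.distmx >/dev/null || echo "no distances found"` would flag the file before -super7 is launched. If the file has a multi-field header line, the count would need to exclude it.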

rcedgar (Owner) commented Nov 28, 2024

This seems to be an issue with the conda build. Close as resolved?
