BuddySuite does not work well on very large files, because the Buddy classes read everything into memory.
File size can be pre-determined, and perhaps files can be managed as handles. This will likely require a major rewrite.
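As a rough sketch of the size check suggested above: the file size can be read from the filesystem before anything is parsed, and the caller can then decide whether to load the contents or keep only a handle. The names `LARGE_FILE_BYTES`, `is_large_file`, and `load_or_open` are hypothetical and not part of BuddySuite, and the threshold is an arbitrary placeholder.

```python
import os

# Hypothetical cutoff; a sensible value depends on the RAM available.
LARGE_FILE_BYTES = 500 * 1024 ** 2


def is_large_file(path, threshold=LARGE_FILE_BYTES):
    """Return True if the file on disk is at least `threshold` bytes."""
    return os.path.getsize(path) >= threshold


def load_or_open(path, threshold=LARGE_FILE_BYTES):
    """Return the full text for small files, or an open handle for large ones.

    For large files the caller is responsible for streaming from the
    handle and closing it afterwards.
    """
    if is_large_file(path, threshold):
        return open(path, "r")
    with open(path, "r") as handle:
        return handle.read()
```

Returning two different types from one function is only for illustration; a real design would probably wrap both cases behind a single interface.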
The best way to do this would probably be to read the files in buffered chunks, which would require rewriting pretty much everything. I'm also not sure how you would parse a chunk of a sequence file, especially in alignment formats. If you want to cut down on memory usage, it's probably easiest to look for places where data is being copied unnecessarily. I noticed before that BuddySuite sometimes uses several times more memory than the file size, especially with larger files, so there's probably room for improvement. On the other hand, a lot of that could be Biopython's fault, in which case it would be hard to fix.
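For plain FASTA at least, reading record by record rather than in fixed-size chunks sidesteps the question of how to parse a partial chunk. The sketch below is a minimal illustration using only the standard library, not BuddySuite or Biopython code; `iter_fasta` and `count_residues` are hypothetical names. It does not cover interleaved alignment formats, where one sequence is spread across several blocks of the file, so that concern still stands.

```python
def iter_fasta(handle):
    """Yield (header, sequence) tuples one record at a time.

    Only the current record is held in memory, so peak usage is bounded
    by the longest single sequence rather than by the file size.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    # Emit the final record once the file is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def count_residues(handle):
    """Example consumer: total sequence length without storing any records."""
    return sum(len(seq) for _, seq in iter_fasta(handle))
```

Anything that needs all records at once (sorting, for example) would still have to spill to disk or to an index of some kind.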
As for SQLite databases, BuddySuite could load a massive file once and store it in a SQLite database for reuse. That should limit the overhead on future runs, because the sequences could be read straight from the database without parsing the file or loading everything into memory at once. I'm not entirely sure this would yield a performance increase, though. It depends on how SQLite works under the hood: if it loads everything into memory, then nothing is gained except skipping the parsing step.
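A minimal sketch of the caching idea, using Python's built-in `sqlite3` module. The table layout and the function names `build_sequence_db` and `fetch_sequence` are assumptions made for illustration, not anything BuddySuite provides. On the question above: an on-disk SQLite database is read page by page through a cache rather than loaded whole, so a lookup by primary key should only touch the pages it needs.

```python
import sqlite3


def build_sequence_db(records, db_path=":memory:"):
    """Store (id, sequence) pairs in a SQLite table and return the connection.

    `records` can be any iterable, including a generator, so the source
    file never has to be held in memory in full. Pass a file path as
    `db_path` to make the database reusable across runs.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "id TEXT PRIMARY KEY, seq TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO sequences (id, seq) VALUES (?, ?)",
            records,
        )
    return conn


def fetch_sequence(conn, seq_id):
    """Return the sequence stored under `seq_id`, or None if it is absent."""
    row = conn.execute(
        "SELECT seq FROM sequences WHERE id = ?", (seq_id,)
    ).fetchone()
    return row[0] if row else None
```

Whether this beats re-parsing in practice would need measuring; very long sequences stored as single TEXT values are still read in full when fetched.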
Any way to leverage SQLite databases?
This may be useful (not sure yet)
https://github.com/mdshw5/pyfaidx
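To show the general idea behind indexed FASTA access, here is a standard-library sketch that records the byte offset of each header and then seeks straight to a record. This is not pyfaidx's API or its index format; `build_offset_index` and `fetch_record` are hypothetical names, and the sketch assumes ASCII FASTA with unique record names.

```python
def build_offset_index(path):
    """Map each record name to the byte offset of its header line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:
                    # The name is the first whitespace-delimited token.
                    index[fields[0].decode("ascii")] = offset
    return index


def fetch_record(path, index, name):
    """Seek to one record and return its sequence as a string."""
    chunks = []
    with open(path, "rb") as handle:
        handle.seek(index[name])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")
```

Only the index (names and offsets) stays in memory, so the cost scales with the number of records rather than with the file size.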