-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SSHash as git submodule #2
Comments
I think this makes sense, but right now there are some other changes that go a bit deeper that we would need to resolve. For example a few other files in the sshash source tree have been modified — for example, the builder was modified to consume the cuttlefish reduced GFA format, some functions were added to the utilities file, and some classes were made friends of other classes because of access requirements. Also, I just a couple of days ago moved a bunch of logging messages to I agree having composability of source code as well as conceptually would be very nice. We could figure out what outside of the standard API would need to be exposed and figure out a minimal way to do that. |
Oh ok sure, I forgot about the modified builder! |
Keep track to modifications/additions to SSHash:
others? |
It's even simpler; each line is basically just a numeric identifier, followed by the unitig it labels.
Yes, and we can decide what code belongs in
Yup — not sure what exactly has changed here, but not too much I think.
So that one I just sort of lept in on, as I'm super biased in favor of |
Oh yes, I think you passed me some examples some time ago.
we just have
Yes, exactly the same I have in mind. That will also help very much in conveying the main take-home message of the library:
Sure, I like it and it's much more evolved than my simple |
I think we should do this ASAP! :) |
Hi @jermp! Thanks for the updates. I agree 100%. The longer we wait, the more painful the eventual re-integration with upstream will become. However, right now there are several things we'd have to add / modify regarding specific interfaces to interact cleanly with the piscem code. Mostly what's listed above regarding the logger (which is actually a bit more complicated because now we have spdlog but in a custom namespace to avoid C++ multiple symbol silliness when we compile piscem-cpp along with cuttlefish), and additionally there are 2 other tweaks that I'm aware of offhand. First, there are a few added functions in this branch to assist in fast cached search (and to account for the observations we've made before that one must be extra careful when using caching on non-subsequent k-mers), and also an alteration to the default nucleotide encoding to use A:0, C:1, G:2, T:3 to match with the Kmer class used in the rest of the code. I think we should create an sshash-submodule-integration branch, and once we have integrated, modified, and tested sshash as a submodule, we can merge that back into dev (and main). --Rob |
Ok, I'll create a new branch for SSHash,
For this matter, I think the cleanest way to proceed is to rely on the already present logging mechanism for SSHash (we can make sure that nothing is printed if
For this one, we could have a compiler flag for SSHash which specifies what encoding to use. What do you think? |
Awesome!
I agree this is the cleanest solution, though not totally ideal (as we won't get logging out of the sshash part and diagnosing any weird errors may be harder). Perhaps we can add a new command line flag like
Sure that seems easy enough to add to the build script. Once we start the integration we'll presumably run into the functions that are present in the piscem fork that aren't upstream. For those we can decide whether or not they are worth adding upstream to sshash, or if we should reorganize the relevant call sites in piscem. |
Ok, branch created and solved the second issue of the encoding of nucleotides. See commit here jermp/sshash@fd2cc09 |
Awesome! I've created a corresponding branch of this repo, where we can make relevant integration changes on this side. You should have write access, but let me know if you don't. |
Perfect. The new branch is up to date with |
Ok done it! I've taken the liberty to also clean the outdated README :D Will write a new one from scratch. |
Ok, so I think we should have a checklist of differences and things to tackle (below is what I can think of now off the top of my head, but feel free to add more and I will add more if I recall any) :
|
I think that it would be better if SSHash were used as a submodule, to better highlight the "separation of concerns".
The only current modifications are done in the streaming query object, but that can also be reflected in the SSHash repo.
Or, are there any hard limits to this?
For the moment, we can leave everything as it is, of course.
But in the long term, I think piscem should highlight the composition SSHash+contig-table.
The text was updated successfully, but these errors were encountered: