Estimate length of long homopolymers in DNA #63

Open
dweemx opened this issue Jan 29, 2024 · 5 comments

@dweemx

dweemx commented Jan 29, 2024

Hello,
Is it possible to use SquiggleKit to estimate the length of homopolymers in each read?

I guess I would have to use the Segmenter tool together with SquigglePull? However, I'm not sure which options to use. I would much appreciate it if you could help me with this.

@Psy-Fer
Owner

Psy-Fer commented Jan 29, 2024

Hello,

Yes, segmenter would probably be the tool to use. The length could be estimated from the sequencing speed and sampling rate, although it would be a pretty rough estimate.

James
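
A minimal sketch of the estimate described above, assuming the segment length in raw samples is already known. The function name and the example values (4000 Hz sampling, 450 bases/s) are illustrative assumptions and not part of SquiggleKit; the real values should come from the run's own metadata.

```python
def estimate_homopolymer_length(n_samples, sample_rate_hz, speed_bases_per_s):
    """Rough base count for a signal segment spanning n_samples raw samples.

    Assumes a constant translocation speed, which is only an approximation:
    the motor speed varies between reads and within a read.
    """
    if n_samples < 0 or sample_rate_hz <= 0 or speed_bases_per_s <= 0:
        raise ValueError("n_samples must be >= 0 and both rates must be > 0")
    dwell_seconds = n_samples / sample_rate_hz  # time the segment spent in the pore
    return dwell_seconds * speed_bases_per_s    # bases translocated in that time


# Hypothetical segment of 1200 samples; both rates are placeholder values.
length = estimate_homopolymer_length(1200, sample_rate_hz=4000, speed_bases_per_s=450)
print(f"approx. {length:.0f} bases")
```

Because the speed is treated as constant, the result is best read as an order-of-magnitude estimate rather than an exact length.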

@dweemx
Author

dweemx commented Jan 30, 2024

OK, thank you for the prompt response. I guess that would still be a better estimate than estimating the homopolymer length at the read level?

Is the tool (Segmenter) agnostic to the chemistry (e.g. Kit 14)?

I'm not familiar with the different parameters that can be set when running Segmenter.py. I expect my long poly(A/T) stretches to be between 80 bp and 200 bp. Would you suggest changing any of the default parameter values?

I read the documentation, but I'm not sure how to set the following parameters for estimating homopolymers:

  • -k --stall | False
  • -j --stall_start | 300
  • -g --gap | False
  • -b --gap_dist

If I understand correctly, the output of Segmenter.py is a TSV file containing the sample window of each homopolymer, which I would then combine with the sequencing speed and sampling rate?
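
As a sanity check on the expected 80 bp to 200 bp range, the same relationship can be inverted to see roughly how many raw samples such a homopolymer should span. This is a sketch only: the helper name is hypothetical, and the 5000 Hz sampling rate and 400 bases/s speed are placeholder assumptions that should be replaced with the values from the actual run.

```python
def expected_sample_window(min_bases, max_bases, sample_rate_hz, speed_bases_per_s):
    """Return the (low, high) number of raw samples a homopolymer of
    min_bases..max_bases would span at a constant translocation speed."""
    if min_bases > max_bases:
        raise ValueError("min_bases must not exceed max_bases")
    if sample_rate_hz <= 0 or speed_bases_per_s <= 0:
        raise ValueError("both rates must be > 0")
    samples_per_base = sample_rate_hz / speed_bases_per_s
    return round(min_bases * samples_per_base), round(max_bases * samples_per_base)


# 80 and 200 bp come from the question above; the two rates are assumptions.
low, high = expected_sample_window(80, 200, sample_rate_hz=5000, speed_bases_per_s=400)
print(f"expect segments of roughly {low} to {high} samples")
```

Segments far outside this window are probably not the homopolymer of interest, although the window itself is only as good as the assumed speed.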

@Psy-Fer
Owner

Psy-Fer commented Jan 30, 2024

Hey,

Ahh yes, something like that. Let me have a look and get back to you; it's been a few years...

Also are you going from fast5, slow5, or pod5 files?

James

@dweemx
Author

dweemx commented Jan 30, 2024

I have pod5 files (but otherwise I could also work with fast5 files with a simple conversion)

Thanks a lot for looking into this

@dweemx dweemx changed the title Estimate Length of Long Homopolymers in DNA Estimate length of long homopolymers in DNA Jan 30, 2024
@Psy-Fer
Owner

Psy-Fer commented Jan 31, 2024

Hey, do you have an example read you could share with me that has one of these homopolymers in it?

It would help me a lot in working out what the expected bounds for detection should be. Originally, segmenter was designed around R9.4.1 cDNA (using our RAGE-seq 10X single cell data). So it's probably worth letting me run a regular DNA read containing one of the homopolymers you want to measure, so I can make sure everything is in order.

James
