How to pick the path with longer reads when multiple circular genome structures are produced #93

Closed
hungweichen0327 opened this issue Aug 25, 2021 · 4 comments

Comments

@hungweichen0327

Dear @Kinggerm ,

Thank you for the convenient and useful software. I used GetOrganelle to assemble the chloroplast genome of my plant species and got 6 fasta files with different paths. The output fastg and log files are below:

get_org.log.txt
compressed_fastg.zip

The figure below was obtained from Bandage using extended_K105.assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg.

[Figure: Bandage view of extended_K105.assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg]

I quickly read issues #25 and #86, but I am not sure whether I should use plastome_arch_info.py in my case. Besides, I found this warning in the log file:

WARNING: Please check the existence of those isomers by using reads mapping (library information) or longer reads.

I also have Nanopore long-read data for my plant. Should I map the long reads to the circular genomes generated by GetOrganelle? A chloroplast genome has also been published for another accession of this plant species.

Do you have any suggestions for resolving the multiple circular genome structures in my case? (Adjust the GetOrganelle parameters and run again? Map the Nanopore long reads to this new genome? Use the complete reference chloroplast genome of the same species?)

Thank you for the help!

@Kinggerm
Owner

You can simply choose the results of repeat_pattern1 in your attached case. You need neither plastome_arch_info.py nor Nanopore long reads.
You can refer to GetOrganelle-paper-Fig2 and FAQ scenario, which your case matches exactly.

Let me know if you have further questions; otherwise, feel free to close this issue.

@hungweichen0327
Author

Dear @Kinggerm,

Thank you for the quick reply. After reading the "Why are there so many *path_sequence.fasta files?" part of the FAQ scenario, I am not sure I understand it fully. I would like to confirm three additional points:

  1. In the FAQ scenario, you mentioned that "If you are assembling the plastome with IRs, the assembly result should be two equimolar isomeric sequences. Both of them are right and coexist in the plant (Palmer 1983; Walker et al. 2015)." Does that mean there are two kinds of chloroplast genome in a plant and they coexist? And do scientists usually use one of them to represent the chloroplast sequence of their target plant accession?

  2. The reason I got six fasta files with different paths rather than just two is the other repeats in the chloroplast genome. In the FAQ scenario, you mentioned that "In this Fig 2 plastome case, the two symmetrical configurations (inside the green dashed-line box) out of all six configurations are more likely to be the two real plastome structures." How can I know that the remaining four configurations are not real plastomes?
    You also mentioned that "A function that incorporates long library reads or long-read sequencing data in estimating the proportion of all candidate isomers is planned for future versions of GetOrganelle." Does that mean only one (or two) of the six configurations would be the real plastome?

  3. A chloroplast genome has already been published for my plant species, but from a different accession. Is it necessary to align against this reference chloroplast genome and choose the most similar of the six configurations?

I am sorry if these are basic questions. This is my first time assembling a plant chloroplast genome, and I only recently got to know the chloroplast structure. Once again, thanks for the useful software. I am looking forward to your reply. Thank you!

@Kinggerm
Owner

@hungweichen0327

  1. Correct.
  2. IRs are generally symmetric, so the other four candidate configurations, which have asymmetric (or, put differently, shorter) IRs, are less likely to be real. You can further confirm this by using a longer library or PCR.
  3. Usually, you can do that to make downstream analysis easier.
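
For point 3, here is a minimal sketch of one way to rank the candidate paths against a published reference. It is not part of GetOrganelle, and the function names (`shared_kmer_score`, `pick_most_similar`) are hypothetical. It counts the strand-specific k-mers each candidate shares with the reference, trying both strands of the candidate, so a candidate whose single-copy regions have the same relative orientation as the reference should tend to score highest. It assumes the sequences are already loaded as plain strings and ignores k-mers that span the point where the circular sequence was linearized.

```python
# Hypothetical helpers for illustration only; not a GetOrganelle API.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of strand-specific k-mers of a linear sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_score(candidate, reference, k=21):
    """Number of reference k-mers found in the candidate, best of both strands."""
    ref_kmers = kmers(reference.upper(), k)
    cand = candidate.upper()
    forward = len(kmers(cand, k) & ref_kmers)
    reverse = len(kmers(revcomp(cand), k) & ref_kmers)
    return max(forward, reverse)


def pick_most_similar(candidates, reference, k=21):
    """Name of the candidate sharing the most k-mers with the reference.

    `candidates` maps a name (for example a path file name) to its sequence.
    Ties are resolved in favour of the first candidate in the mapping.
    """
    return max(
        candidates,
        key=lambda name: shared_kmer_score(candidates[name], reference, k),
    )
```

A whole-genome alignment against the reference would be a more thorough check; this sketch only gives a quick ranking, and the default `k=21` is an arbitrary assumption.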

Hope this helps.

@hungweichen0327
Author

Dear @Kinggerm,

Yes, your reply answered my questions precisely. Thank you for the help; I will close the issue.
