How to pick the path with longer reads when multiple circular genome structures are produced #93

hungweichen0327 · 2021-08-25T02:07:07Z

Thank you for the convenient and useful software. I used GetOrganelle to assemble the chloroplast genome of my plant species and got six FASTA files, each with a different path. The output FASTG and log files are attached below:

get_org.log.txt
compressed_fastg.zip

The figure below was obtained from Bandage using extended_K105.assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg.

I quickly read issues #25 and #86, but I am not sure whether I should use plastome_arch_info.py in my case. In addition, the log file contains this warning:

WARNING: Please check the existence of those isomers by using reads mapping (library information) or longer reads.

I also have Nanopore long-read data for my plant. Should I map the long reads to the circular genomes generated by GetOrganelle? A chloroplast genome has also been published for this species, from a different accession.

Do you have any suggestions for resolving the multiple circular genome structures in my case? (Adjust the GetOrganelle parameters and run again? Map Nanopore long reads to the new genome? Use the complete reference chloroplast genome of the same species?)

Thank you for the help!

Kinggerm · 2021-08-25T17:16:18Z

In your attached case, you can simply choose the repeat_pattern1 results. You need neither plastome_arch_info.py nor Nanopore long reads.
See GetOrganelle-paper-Fig2 and the FAQ scenario, which your case matches exactly.

Let me know if you have further questions; otherwise, feel free to close this issue.
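
If there are many output files, the selection above can be scripted. This is only a minimal sketch: it assumes the result FASTA names end in `path_sequence.fasta` and carry a `repeat_pattern<N>.` tag, and both the helper name `pick_repeat_pattern` and the folder name in the usage comment are hypothetical. Check the file names in your own output folder first.

```python
from pathlib import Path


def pick_repeat_pattern(output_dir, pattern_id=1):
    """Return the *path_sequence.fasta files tagged with the given repeat pattern.

    Assumption: file names contain 'repeat_pattern<N>.'; adjust the tag if
    your output folder uses a different naming scheme.
    """
    # The trailing dot keeps repeat_pattern1 from also matching repeat_pattern10.
    tag = f"repeat_pattern{pattern_id}."
    candidates = Path(output_dir).glob("*path_sequence.fasta")
    return sorted(p for p in candidates if tag in p.name)


# Usage (hypothetical folder name):
# for fasta in pick_repeat_pattern("my_getorganelle_output"):
#     print(fasta.name)
```

With an IR-containing plastome, this would be expected to leave the two isomeric sequences of that repeat pattern, and either one can be used.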

hungweichen0327 · 2021-08-26T02:37:02Z

Dear @Kinggerm,

Thank you for the quick reply. After reading the "Why are there so many *path_sequence.fasta files?" part of the FAQ scenario, I am not sure I understand it fully. I would like to confirm three additional points:

1. In the FAQ scenario, you mentioned that "If you are assembling the plastome with IRs, the assembly result should be two equimolar isomeric sequences. Both of them are right and coexist in the plant (Palmer 1983; Walker et al. 2015)." Does that mean there are two kinds of chloroplast genome in a plant and that they coexist? And do scientists usually pick one of them to represent the chloroplast sequence of their target accession?
2. I got six FASTA files with different paths, rather than just two, because of the other repeats in the chloroplast genome. In the FAQ scenario, you mentioned that "In this Fig 2 plastome case, the two symmetrical configurations (inside the green dashed-line box) out of all six configurations are more likely to be the two real plastome structures." How do I know that the other four configurations are not real plastomes? You also mentioned that "A function that incorporates long library reads or long-read sequencing data in estimating the proportion of all candidate isomers is planned for future versions of GetOrganelle." Does that mean only one (or two) of the six configurations would be the real plastome?
3. A chloroplast genome has already been published for my plant species, but from a different accession. Is it necessary to align the six configurations against this reference and choose the most similar one?

I am sorry if these are basic questions. This is my first time assembling a plant chloroplast genome, and I have only recently started learning about chloroplast structure. Once again, thanks for the useful software. I am looking forward to your reply. Thank you!

Kinggerm · 2021-08-27T17:47:11Z

@hungweichen0327

1. Correct.
2. IRs are generally symmetric, so the other four candidate configurations, which have asymmetric (or, put differently, shorter) IRs, are less likely to be real. You can confirm this further with a longer library or PCR.
3. Usually, you can do that to make downstream analysis easier.

Hope this helps.
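
To make the "two equimolar isomers" point concrete, here is a toy sketch, not GetOrganelle code. It assumes the usual LSC / IR / SSC / inverted-IR layout; the function names and the short sequences are made up for illustration. The two isomers share the same segments and differ only in the orientation of the SSC relative to the LSC.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A/C/G/T only, case preserved)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def plastome_isomers(lsc: str, ir: str, ssc: str):
    """Build the two flip-flop isomers of an IR-containing plastome.

    Layout assumed: LSC + IR + SSC + reverse-complemented IR.
    The second isomer carries the SSC in the opposite orientation.
    """
    ir_inverted = reverse_complement(ir)
    isomer_a = lsc + ir + ssc + ir_inverted
    isomer_b = lsc + ir + reverse_complement(ssc) + ir_inverted
    return isomer_a, isomer_b


# Toy segments (hypothetical, far shorter than real plastome regions).
a, b = plastome_isomers(lsc="ATGCCGTA", ir="GGGTTT", ssc="ACCTG")
print(a)  # ATGCCGTAGGGTTTACCTGAAACCC
print(b)  # ATGCCGTAGGGTTTCAGGTAAACCC
```

Both sequences have the same length and the same two full-length, symmetric IR copies, which is why either one can represent the accession. The asymmetric candidates would not fit this layout with equally long IR copies.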

hungweichen0327 · 2021-08-30T03:46:52Z

Dear @Kinggerm,

Yes, that answers my questions precisely. Thank you for the help; I will close the issue.

hungweichen0327 closed this as completed Aug 30, 2021
