Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes Issue #94 - #108

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Fixes Issue #94 - #108

wants to merge 4 commits into from

Conversation

Klangina
Copy link
Contributor

@Klangina Klangina commented Oct 22, 2024

  • Refactored and merged R/tree.R CreateFA2Tree and ConvertFA2Tree.
  • Parametrized every hardcoded value and path in ConvertFA2Tree
  • Refactored convertAlignment2Trees to support running both modes.
  • Backward compatible
  • Updated all documentation for R/tree.R

Description

What kind of change(s) are included?

  • Feature (adds or updates new capabilities)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • I have added comments to my code to help provide understanding.
  • I have added a test which covers the code changes found within this PR.
  • I have deleted all non-relevant text in this pull request template.
  • Reviewer assignment: Tag a relevant team member to review and approve the changes.

@Klangina
Copy link
Contributor Author

@jananiravi I have made the changes. Kindly review/guide me for further steps if required in this Issue.

 - Refactored  and merged R/tree.R CreateFA2Tree and ConvertFA2Tree.
- Parametrized every hardcoded value and path in ConvertFA2Tree
- Refactored convertAlignment2Trees to support running both modes.
- Backward compatible
- Updated all documentation for R/tree.R
@Klangina
Copy link
Contributor Author

FIxed the commit message to reflect the correct issue number

@Klangina Klangina changed the title Fixes Issue #90 - Fixes Issue #94 - Oct 22, 2024
Copy link
Contributor

@epbrenner epbrenner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple thoughts and suggestions as I passed through. I'm not 100% certain on how the tree generation is meant to flow from the older script you're refactoring, so please ask me for clarification or feel free to just toss suggestions if they don't align with the actual logic going on here!

R/tree.R Outdated
# estimate a distance matrix using a Jules-Cantor Model
dna_dist <- dist.ml(prot10, model = "JC69")
# estimate a distance matrix using a Jules-Cantor Model
dna_dist <- dist.ml(prot_subset, model = dna_model)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dna_dist <- dist.ml(prot_subset, model = dna_model)
prot_dist <- dist.ml(prot_subset, model = mt)

I know the original outdated code references DNA here, but I believe this should all be protein based on the inputs. If the change to prot_dist is correct, the code below will also need modification to reflect the variable name changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jananiravi : Can you confirm if the above can be considered correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epbrenner are these the unused tree generators? Using dna could be a stub? Maybe change to protein for now (will work with current molevolvr) or make it a function argument to take dna or protein fasta sequences -- future-proofing the function given starting with DNA seqs will be part of molevolvr sooner rather than later. Does that answer your question, @Klangina ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure @jananiravi, I will create an argument to switch between DNA and protein sequences and keep the default as protein. Will update the pr shortly.

R/tree.R Outdated
plot(prot_NJ, main = "Neighbor Joining")
## Neighbor Joining, UPGMA, and Maximum Parsimony
# quick and dirty UPGMA and NJ trees
prot_UPGMA <- upgma(dna_dist)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prot_UPGMA <- upgma(dna_dist)
prot_UPGMA <- upgma(prot_dist)

R/tree.R Outdated
## Neighbor Joining, UPGMA, and Maximum Parsimony
# quick and dirty UPGMA and NJ trees
prot_UPGMA <- upgma(dna_dist)
prot_NJ <- NJ(dna_dist)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prot_NJ <- NJ(dna_dist)
prot_NJ <- NJ(prot_dist)

R/tree.R Outdated
# ml estimation w/ distance matrix
fit <- pml(prot_NJ, prot_subset)
print(fit)
fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement)
fit_optimized <- optim.pml(fit, model = fit, rearrangement = rearrangement)

Or something similar. JC here stands for Jukes-Cantor, which is a DNA substitution model, so removing that from the variable. I believe the model in model = ml_model will actually be the prior tree generated by pml(), which would be fit here, and then optim.pml() optimizes the previous model. Do double-check me on this, though!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, switching model based on DNA/protein input could be part of the function, too.

R/tree.R Outdated
fit <- pml(prot_NJ, prot_subset)
print(fit)
fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement)
logLik(fitJC)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logLik(fitJC)
logLik(fit_optimized)

If above suggestion is correct.

R/tree.R Outdated
bs = bootstrap_reps, optNni = TRUE, multicore = TRUE,
control = pml.control(trace = 0)
)
plotBS(midpoint(fitJC$tree), bs, p = bootstrap_p, type = plot_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
plotBS(midpoint(fitJC$tree), bs, p = bootstrap_p, type = plot_type)
plotBS(midpoint(fit_optimized$tree), bs, p = bootstrap_p, type = plot_type)

R/tree.R Outdated
prot_subset_NJ <- NJ(prot_subset_dm)
fit2 <- pml(prot_subset_NJ, data = prot_subset)
print(fit2)
fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement)
fit_optimized2 <- optim.pml(fit2, model = fit2, rearrangement = rearrangement)

As before, if model should be the output of pml() then update. Discard if I'm mistaken though!

R/tree.R Outdated
fit2 <- pml(prot_subset_NJ, data = prot_subset)
print(fit2)
fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement)
logLik(fitJC2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logLik(fitJC2)
logLik(fit_optimized2)

R/tree.R Outdated
print(fit2)
fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement)
logLik(fitJC2)
bs_subset <- bootstrap.pml(fitJC2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bs_subset <- bootstrap.pml(fitJC2,
bs_subset <- bootstrap.pml(fit_optimized2,

R/tree.R Outdated
bs = bootstrap_reps, optNni = TRUE, multicore = TRUE,
control = pml.control(trace = 0)
)
bs2 <- plotBS(midpoint(fitJC2$tree), bs, p = bootstrap_p, type = plot_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bs2 <- plotBS(midpoint(fitJC2$tree), bs, p = bootstrap_p, type = plot_type)
bs2 <- plotBS(midpoint(fit_optimized2$tree), bs, p = bootstrap_p, type = plot_type)

- renamed dna_dist to seq_dist
- renamed fitJC and fitJC2 to fit_optimised and fit_optimised2 to make variable names more general.
- Updated code and documentation accordingly.
@Klangina
Copy link
Contributor Author

@epbrenner @jananiravi : Made changes as per recommendations before. Kindly review.

… following two issues:

createFA2Tree: no visible global function definition for ‘midpoint’
createFA2Tree: no visible binding for global variable ‘fit_optimized2’
@Klangina Klangina mentioned this pull request Oct 28, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants