-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes Issue #94 - #108
base: main
Are you sure you want to change the base?
Fixes Issue #94 - #108
Conversation
@jananiravi I have made the changes. Kindly review/guide me for further steps if required in this Issue. |
- Refactored and merged R/tree.R CreateFA2Tree and ConvertFA2Tree. - Parametrized every hardcoded value and path in ConvertFA2Tree - Refactored convertAlignment2Trees to support running both modes. - Backward compatible - Updated all documentation for R/tree.R
FIxed the commit message to reflect the correct issue number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple thoughts and suggestions as I passed through. I'm not 100% certain on how the tree generation is meant to flow from the older script you're refactoring, so please ask me for clarification or feel free to just toss suggestions if they don't align with the actual logic going on here!
R/tree.R
Outdated
# estimate a distance matrix using a Jules-Cantor Model | ||
dna_dist <- dist.ml(prot10, model = "JC69") | ||
# estimate a distance matrix using a Jules-Cantor Model | ||
dna_dist <- dist.ml(prot_subset, model = dna_model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dna_dist <- dist.ml(prot_subset, model = dna_model) | |
prot_dist <- dist.ml(prot_subset, model = mt) |
I know the original outdated code references DNA here, but I believe this should all be protein based on the inputs. If the change to prot_dist is correct, the code below will also need modification to reflect the variable name changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jananiravi : Can you confirm if the above can be considered correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@epbrenner are these the unused tree generators? Using dna could be a stub? Maybe change to protein for now (will work with current molevolvr) or make it a function argument to take dna or protein fasta sequences -- future-proofing the function given starting with DNA seqs will be part of molevolvr sooner rather than later. Does that answer your question, @Klangina ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure @jananiravi, I will create an argument to switch between DNA and protein sequences and keep the default as protein. Will update the pr shortly.
R/tree.R
Outdated
plot(prot_NJ, main = "Neighbor Joining") | ||
## Neighbor Joining, UPGMA, and Maximum Parsimony | ||
# quick and dirty UPGMA and NJ trees | ||
prot_UPGMA <- upgma(dna_dist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prot_UPGMA <- upgma(dna_dist) | |
prot_UPGMA <- upgma(prot_dist) |
R/tree.R
Outdated
## Neighbor Joining, UPGMA, and Maximum Parsimony | ||
# quick and dirty UPGMA and NJ trees | ||
prot_UPGMA <- upgma(dna_dist) | ||
prot_NJ <- NJ(dna_dist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prot_NJ <- NJ(dna_dist) | |
prot_NJ <- NJ(prot_dist) |
R/tree.R
Outdated
# ml estimation w/ distance matrix | ||
fit <- pml(prot_NJ, prot_subset) | ||
print(fit) | ||
fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement) | |
fit_optimized <- optim.pml(fit, model = fit, rearrangement = rearrangement) |
Or something similar. JC here stands for Jukes-Cantor, which is a DNA substitution model, so removing that from the variable. I believe the model in model = ml_model
will actually be the prior tree generated by pml()
, which would be fit
here, and then optim.pml()
optimizes the previous model. Do double-check me on this, though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, switching model based on DNA/protein input could be part of the function, too.
R/tree.R
Outdated
fit <- pml(prot_NJ, prot_subset) | ||
print(fit) | ||
fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement) | ||
logLik(fitJC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logLik(fitJC) | |
logLik(fit_optimized) |
If above suggestion is correct.
R/tree.R
Outdated
bs = bootstrap_reps, optNni = TRUE, multicore = TRUE, | ||
control = pml.control(trace = 0) | ||
) | ||
plotBS(midpoint(fitJC$tree), bs, p = bootstrap_p, type = plot_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plotBS(midpoint(fitJC$tree), bs, p = bootstrap_p, type = plot_type) | |
plotBS(midpoint(fit_optimized$tree), bs, p = bootstrap_p, type = plot_type) |
R/tree.R
Outdated
prot_subset_NJ <- NJ(prot_subset_dm) | ||
fit2 <- pml(prot_subset_NJ, data = prot_subset) | ||
print(fit2) | ||
fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) | |
fit_optimized2 <- optim.pml(fit2, model = fit2, rearrangement = rearrangement) |
As before, if model
should be the output of pml()
then update. Discard if I'm mistaken though!
R/tree.R
Outdated
fit2 <- pml(prot_subset_NJ, data = prot_subset) | ||
print(fit2) | ||
fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) | ||
logLik(fitJC2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logLik(fitJC2) | |
logLik(fit_optimized2) |
R/tree.R
Outdated
print(fit2) | ||
fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) | ||
logLik(fitJC2) | ||
bs_subset <- bootstrap.pml(fitJC2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bs_subset <- bootstrap.pml(fitJC2, | |
bs_subset <- bootstrap.pml(fit_optimized2, |
R/tree.R
Outdated
bs = bootstrap_reps, optNni = TRUE, multicore = TRUE, | ||
control = pml.control(trace = 0) | ||
) | ||
bs2 <- plotBS(midpoint(fitJC2$tree), bs, p = bootstrap_p, type = plot_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bs2 <- plotBS(midpoint(fitJC2$tree), bs, p = bootstrap_p, type = plot_type) | |
bs2 <- plotBS(midpoint(fit_optimized2$tree), bs, p = bootstrap_p, type = plot_type) |
- renamed dna_dist to seq_dist - renamed fitJC and fitJC2 to fit_optimised and fit_optimised2 to make variable names more general. - Updated code and documentation accordingly.
@epbrenner @jananiravi : Made changes as per recommendations before. Kindly review. |
… following two issues: createFA2Tree: no visible global function definition for ‘midpoint’ createFA2Tree: no visible binding for global variable ‘fit_optimized2’
Description
What kind of change(s) are included?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.