-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
develop dosage caller #7
Conversation
|
||
dosage <- rep(NA, length(called.alleles)); | ||
for (i in 1:length(called.alleles)) { | ||
if ((split.alleles$called.allele.a[i] == missing.label) & (split.alleles$called.allele.b[i] == missing.label)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now this dosage calling algorithm does not explicitly account for a case where one of the alleles is known and the other is missing e.g. A/. or ./A. I don't recall if this is a possible result from variant calling. I would expect that this is not possible, and if it was, I don't really know how I would interpret that result. As it is, the algorithm just counts the number of risk alleles so if A is a risk allele the above examples would result in dosage 1. @tyamaguchi-ucla would you mind chiming in briefly on whether you think a dosage calling algorithm should allow this type of input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alkaZeltser There are so many variant callers, so it's possible we might encounter this case, although I'm not sure if I've seen one. I suggest we treat any edge cases as warnings (still skip them) and update the package if necessary.
@forbiddenpersimmon Sorry I realized that I really dropped the ball on formatting checks in this function. Added more checks and lots of new tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice changes to the function! It's clearer now what checks it is performing on the input vectors. Tests also are more comprehensive, LGTM on that front!
There's one more lintr error related to whitespace to fix. Do you have an extension installed to highlight trailing spaces? I would recommend installing one, if not.
Added just one more test, for using real data. I'll check with Taka about the ./A case on Friday during our 1:1 and then merge. |
I do! But it's kinda subtle and I still miss it sometimes lol! |
@forbiddenpersimmon I made one more adjustment to handle the half missing genotype case. Would you mind just tracing through the main allele calling logic one more time, to make sure I didn't miss something? |
Missing semi-colon here. Logic looks good with new case! |
This PR contains one new function: the dosage caller. It converts a genotype input (in the form of alphabetical alleles) into 0/1/2 dosages relative to the specified risk allele.
Also unit tests as per usual.
I have read the code review guidelines and the code review best practice on GitHub check-list.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)-[brief_description_of_branch].
I have set up or verified the branch protection rule following the github standards before opening this pull request.
I have added the changes included in this pull request to
NEWS
under the next release version or unreleased, and updated the date.I have updated the version number in
metadata.yaml
andDESCRIPTION
.Both
R CMD build
andR CMD check
run successfully.Testing Results
All tests pass.