-
@laurahspencer Do you have any insight into parameters since you're working on something similar?
-
Related question! I ran this code:
I thought the wildcard would allow multiple samples to be processed, but it looks like BS-Snper only processes one file at a time. Is that normal for SNP calling? If I have multiple samples I want to consider when calling SNPs, do I call SNPs for each sample separately, then concatenate the data?
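If it does turn out that each sample has to be run on its own, one way to handle it is to loop over the BAM files and build one command per sample. Below is a minimal sketch of that pattern. The function name `per_sample_commands` and the `{bam}` / `{sample}` placeholders are hypothetical, and the template is meant to be whatever single-sample command already works for you; no BS-Snper flags are assumed here.

```python
from pathlib import Path
import shlex


def per_sample_commands(bam_paths, template):
    """Return one shell command string per BAM file.

    `template` is your own working single-sample command, with {bam}
    standing in for the input path and {sample} for the file name
    without its final extension. Both placeholders are hypothetical
    conventions for this sketch.
    """
    commands = []
    for bam in sorted(Path(p) for p in bam_paths):
        commands.append(
            template.format(
                bam=shlex.quote(str(bam)),    # quote paths with spaces etc.
                sample=shlex.quote(bam.stem),  # "a.bam" -> "a"
            )
        )
    return commands
```

For example, `per_sample_commands(Path("bams").glob("*.bam"), "my_caller {bam} > {sample}.vcf")` gives a list of commands (here with a made-up `my_caller`) that can be printed for a sanity check before running each one with `subprocess.run(cmd, shell=True, check=True)`.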
-
One of my fave SNP filtering papers is this one:
@yaaminiv if you are just trying to identify C/T SNPs to remove them, you probably don't need to worry too much about making sure you have high-confidence SNPs (that is, SNPs that are not due to genotyping error). But you are right, you probably only want to remove SNPs found in at least 2 individuals. You can use BS-Snper to make a VCF file for each individual separately, then combine the VCFs with vcf-merge from VCFtools (it expects bgzipped, tabix-indexed input files). You can then use vcftools to filter your combined VCF to SNPs with a Minor Allele Count of at least 3 (--mac 3).
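One thing worth keeping in mind with a minor allele count filter: it counts allele copies, not individuals, so "found in at least 2 individuals" and a `--mac` cutoff are related but not identical. As I understand the VCFtools manual, `--mac` keeps sites whose minor allele count is greater than or equal to the given value. Here is a small sketch of the difference for a single biallelic site. The function names are hypothetical and this only illustrates the counting; it is not how vcftools is implemented.

```python
def _alleles(gt):
    """Split a VCF-style GT string such as "0/1" or "1|1" into alleles."""
    return gt.replace("|", "/").split("/")


def minor_allele_count(genotypes):
    """Minor allele count at one biallelic site.

    `genotypes` holds one GT string per individual: "0" is the reference
    allele, "1" the alternate allele, "." a missing call (skipped).
    """
    ref = alt = 0
    for gt in genotypes:
        for allele in _alleles(gt):
            if allele == "0":
                ref += 1
            elif allele == "1":
                alt += 1
    return min(ref, alt)


def individuals_with_alt(genotypes):
    """Number of individuals carrying at least one alternate allele."""
    return sum("1" in _alleles(gt) for gt in genotypes)


# Two heterozygous carriers: 2 individuals, but only 2 allele copies.
two_hets = ["0/0", "0/0", "0/1", "0/1"]
# One heterozygote plus one homozygote: 2 individuals, 3 allele copies.
het_and_hom = ["0/0", "0/1", "1/1", "./."]
```

With these inputs, `two_hets` has a minor allele count of 2 and `het_and_hom` has 3, even though the alternate allele shows up in 2 individuals in both cases. So a cutoff of 3 would drop the first site, and a cutoff of 2 is the one that guarantees keeping every site seen in 2 heterozygous individuals.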
-
I'm very late to the party, but I just wanted to add that I also like the paper @ksil91 shared, and that the pipelines I've used don't have these parameters, except for minimum mapping quality per base, for which I've used 20. Hope you're feeling better about it a couple of weeks out @yaaminiv!
-
TL;DR: Is there a good resource for understanding parameters used in SNP calling? BS-Snper has minimal documentation.
Context
Currently running BS-Snper in this Jupyter notebook with minimal arguments specified! On the off-chance I get the code to execute properly the first time, I want to understand some of the other BS-Snper options I can use when calling SNPs.
The full list of options is in their README. Below are parameters I'm unfamiliar with + their explanations from the README:
--minhetfreq: Threshold of frequency for calling heterozygous SNP
--minhomfreq: Threshold of frequency for calling homozygous SNP
--minquali: Threshold of base quality
--minread2: Minimum mutation reads number
--errorate: Minimum mutation rate
--mapvalue: Minimum read mapping value
Are there certain thresholds that are standard practice for calling SNPs (i.e., a minimal mutation reads or mutation rate that's used in the literature)? I can do my own search but figured I'd ask first.
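To make the frequency and read-count thresholds a bit more concrete, here is a toy sketch of how cutoffs like `--minhetfreq`, `--minhomfreq`, and `--minread2` could interact at a single site. This is a simplified illustration and not BS-Snper's actual algorithm: it ignores bisulfite conversion, base quality, mapping quality, and the error rate entirely. The function name and the default values are assumptions chosen for the example, not recommendations, so check the README for the real defaults.

```python
def classify_site(depth, alt_reads,
                  min_het_freq=0.1,   # illustrative stand-in for --minhetfreq
                  min_hom_freq=0.85,  # illustrative stand-in for --minhomfreq
                  min_alt_reads=2):   # illustrative stand-in for --minread2
    """Toy genotype call from read counts at one site.

    depth     -- total reads covering the site
    alt_reads -- reads supporting the non-reference ("mutation") base
    """
    # Too few supporting reads: treat as noise, whatever the frequency is.
    if depth <= 0 or alt_reads < min_alt_reads:
        return "no call"
    freq = alt_reads / depth
    if freq >= min_hom_freq:
        return "homozygous"
    if freq >= min_het_freq:
        return "heterozygous"
    return "no call"
```

Under these made-up cutoffs, a site with 19 of 20 reads supporting the variant is called homozygous, 8 of 20 is called heterozygous, and both 1 of 20 (too few supporting reads) and 5 of 100 (frequency below the heterozygous cutoff) are not called. Raising the cutoffs trades missed SNPs for fewer false positives, which is why the right values depend on coverage and on what the SNPs are being used for.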