Skip to content

Commit

Permalink
Merge pull request #32 from zjnolen/domajorminor
Browse files Browse the repository at this point in the history
Enable changing -doMajorMinor and -doMaf in Beagle (and maf file) calculation
  • Loading branch information
zjnolen authored Feb 14, 2024
2 parents 7180ede + c7d00ae commit 31e67a4
Show file tree
Hide file tree
Showing 5 changed files with 61 additions and 22 deletions.
7 changes: 4 additions & 3 deletions .test/config/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -116,8 +116,7 @@ params:
filt-on-depth-classes: true
fastp:
extra: "-p -g" # don't put --merge or --overlap_len_require here, they're implicit
min_overlap_mod: 30
min_overlap_hist: 15
min_overlap_hist: 20
bwa_aln:
extra: "-l 16500 -n 0.01 -o 2"
picard:
Expand All @@ -128,6 +127,8 @@ params:
extra: ""
extra_saf: ""
extra_beagle: ""
domajorminor: 1 # Currently only compatible with values 1, 2, 4, 5
domaf: 1
snp_pval: "1e-6"
min_maf: 0.05
mindepthind_heterozygosity: 3
Expand All @@ -141,7 +142,7 @@ params:
fit_LDdecay_n_correction: true
pruning_min-weight: 0.1
realsfs:
fold: 1 # Should only be set to 1 for now
fold: 1 # Should only be set to 1, unless ancestral reference is given above
sfsboot: 100
fst:
whichFst: 1
Expand Down
47 changes: 32 additions & 15 deletions config/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -417,22 +417,39 @@ or a pull request and I'll gladly put it in.
than this will be binned to this value. Should be fine for most to leave
at `1000`. (integer, [docs](http://www.popgen.dk/angsd/index.php/Depth))
- `extra:` Additional options to pass to ANGSD during genotype likelihood
calculation. This is primarily useful for adding BAM input filters. Note
that `--remove_bads` and `-only_proper_pairs` are enabled by default, so
they only need to be included if you want to turn them off. I've also
found that for some datasets, `-C 50` and `-baq 1` can create a strong
relationship between sample depth and detected diversity, effectively
removing the benefits of ANGSD for low/variable depth data. I recommend
that these aren't included unless you know you need them, and even then,
I'd recommend plotting `heterozygosity ~ sample depth` to ensure there is
not any relationship. Since the workflow uses bwa to map, `-uniqueOnly 1`
doesn't do anything if your minimum mapping quality is > 0. Don't put
mapping and base quality thresholds here either, it will use the ones
defined above automatically. Although historical samples will have DNA
damaged assessed and to some extent, corrected, it may be useful to put
`-noTrans 1` or `-trim INT` here if you're interested in stricter filters
for degraded DNA. (string, [docs](http://www.popgen.dk/angsd/index.php/Input#BAM.2FCRAM))
calculation at all times. This is primarily useful for adding BAM input
filters. Note that `--remove_bads` and `-only_proper_pairs` are enabled
by default, so they only need to be included if you want to turn them
off or explicitly ensure they are enabled. I've also found that for some
datasets, `-C 50` and `-baq 1` can create a strong relationship between
sample depth and detected diversity, effectively removing the benefits of
ANGSD for low/variable depth data. I recommend that these aren't included
unless you know you need them. Since the workflow uses bwa to map,
`-uniqueOnly 1` doesn't do anything if your minimum mapping quality is
\> 0. Don't put mapping and base quality thresholds here either, it will
use the ones defined above automatically. Although historical samples
will have DNA damaged assessed and to some extent, corrected, it may be
useful to put `-noTrans 1` or `-trim INT` here if you're interested in
stricter filters for degraded DNA. (string, [docs](http://www.popgen.dk/angsd/index.php/Input#BAM.2FCRAM))
- `extra_saf:` Same as `extra`, but only used when making SAF files (used
for SFS, thetas, Fst, IBSrelate, heterozygosity includes invariable
sites).
- `extra_beagle:` Same as `extra`, but only used when making Beagle and Maf
files (used for PCA, Admix, ngsF-HMM, doIBS, ngsrelate, includes only
variable sites).
- `snp_pval:` The p-value to use for calling SNPs (float, [docs](http://www.popgen.dk/angsd/index.php/SNP_calling))
- `domajorminor:` Method for inferring the major and minor alleles. Set to
1 to infer from the genotype likelihoods, see [documentation](https://www.popgen.dk/angsd/index.php/Major_Minor)
for other options. `1`, `2`, and `4` can be set without any additional
configuration. `5` must also have an ancestral reference provided in the
config, otherwise it will be the same as `4`. `3` is currently not
possible, but please open an issue if you have a use case, I'd like to
add it, but would need some input on how it is used.
- `domaf:` Method for inferring minor allele frequencies. Set to `1` to
infer from genotype likelihoods using a known major and minor from the
`domajorminor` setting above. See [docs](http://www.popgen.dk/angsd/index.php/Allele_Frequencies)
for other options. I have not tested much beyond `1` and `8`, please open
an issue if you have problems.
- `min_maf:` The minimum minor allele frequency required to call a SNP.
(float, [docs](http://www.popgen.dk/angsd/index.php/Allele_Frequencies))
- `ngsld:` Settings for ngsLD ([docs](https://github.com/fgvieira/ngsLD))
Expand Down
4 changes: 3 additions & 1 deletion config/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,7 @@ params:
filt-on-depth-classes: true
fastp:
extra: "-p -g" # don't put --merge or --overlap_len_require here, they're implicit
min_overlap_hist: 15
min_overlap_hist: 20
bwa_aln:
extra: "-l 16500 -n 0.01 -o 2"
picard:
Expand All @@ -126,6 +126,8 @@ params:
extra: ""
extra_saf: ""
extra_beagle: ""
domajorminor: 1 # Currently only compatible with values 1, 2, 4, 5
domaf: 1
snp_pval: "1e-6"
min_maf: 0.05
mindepthind_heterozygosity: 3
Expand Down
12 changes: 9 additions & 3 deletions workflow/rules/3.2_beagles.smk
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,12 @@ rule angsd_doGlf2:
"""
input:
unpack(filt_depth),
unpack(get_anc_ref),
bam="results/datasets/{dataset}/bamlists/{dataset}.{ref}_{population}{dp}.bamlist",
bams=get_bamlist_bams,
bais=get_bamlist_bais,
ref="results/ref/{ref}/{ref}.fa",
reffai="results/ref/{ref}/{ref}.fa.fai",
regions="results/datasets/{dataset}/filters/chunks/{ref}_chunk{chunk}.rf",
output:
beagle=temp(
Expand All @@ -36,7 +38,10 @@ rule angsd_doGlf2:
mapQ=config["mapQ"],
baseQ=config["baseQ"],
snp_pval=config["params"]["angsd"]["snp_pval"],
maf=config["params"]["angsd"]["domaf"],
minmaf=config["params"]["angsd"]["min_maf"],
majmin=config["params"]["angsd"]["domajorminor"],
counts=get_docounts,
nind=lambda w: len(get_samples_from_pop(w.population)),
out=lambda w, output: os.path.splitext(output.arg)[0],
threads: lambda wildcards, attempt: attempt
Expand All @@ -45,10 +50,11 @@ rule angsd_doGlf2:
shell:
"""
angsd -doGlf 2 -bam {input.bam} -GL {params.gl_model} -ref {input.ref} \
-doMajorMinor 1 -doMaf 1 -SNP_pval {params.snp_pval} \
-minMaf {params.minmaf} -nThreads {threads} {params.extra} \
-doMajorMinor {params.majmin} -doMaf {params.maf} -minMaf {params.minmaf} \
-SNP_pval {params.snp_pval} -nThreads {threads} {params.extra} \
-minMapQ {params.mapQ} -minQ {params.baseQ} -sites {input.sites} \
{params.extra_beagle} -rf {input.regions} -out {params.out} &> {log}
-anc {input.anc} {params.extra_beagle} -rf {input.regions} \
{params.counts} -out {params.out} &> {log}
"""


Expand Down
13 changes: 13 additions & 0 deletions workflow/rules/common.smk
Original file line number Diff line number Diff line change
Expand Up @@ -480,6 +480,19 @@ def get_anc_ref(wildcards):
}


# Determine if docounts is needed for beagle/maf calculation to keep it from
# slowing things down when it is not. It is only needed if the major and minor
# alleles are being inferred from counts (-doMajorMinor 2) or the minor allele
# frequency is being inferred by counts (-doMaf 8, >8 possible if count
# inference is combined with other inferences)
def get_docounts(wildcard):
if (int(config["params"]["angsd"]["domajorminor"]) == 2) or (
int(config["params"]["angsd"]["domaf"]) >= 8
):
return "-doCounts 1"
return ""


## Get random sampling proportion depending on if LD decay is being calculated
## or if LD pruning is being done
def get_ngsld_sampling(wildcards):
Expand Down

0 comments on commit 31e67a4

Please sign in to comment.