Merge pull request #26 from zjnolen/subsample
Enable subsampling to lower depth
zjnolen authored Feb 12, 2024
2 parents b2b439d + ef0cc6f commit 4b65094
Showing 11 changed files with 324 additions and 60 deletions.
28 changes: 24 additions & 4 deletions .test/config/config.yaml
@@ -24,10 +24,6 @@ exclude_ind: []

excl_pca-admix: []

#==================== Downsampling Configuration ======================#

downsample_cov:

#====================== Analysis Selection ============================#

populations: []
@@ -68,6 +64,29 @@ analyses:
inbreeding_ngsf-hmm: true
ibs_matrix: true

#==================== Downsampling Configuration ======================#

subsample_dp: 2

subsample_redo_filts: true

subsample_analyses:
  estimate_ld: true
  ld_decay: true
  pca_pcangsd: true
  admix_ngsadmix: true
  relatedness:
    ngsrelate: true
    ibsrelate_ibs: true
    ibsrelate_sfs: true
  thetas_angsd: true
  heterozygosity_angsd: true
  fst_angsd:
    populations: true
    individuals: true
  inbreeding_ngsf-hmm: true
  ibs_matrix: true

#=========================== Filter Sets ==============================#

filter_beds:
@@ -109,6 +128,7 @@ params:
extra_beagle: ""
snp_pval: "1e-6"
min_maf: 0.05
mindepthind_heterozygosity: 3
ngsld:
max_kb_dist_est-ld: 200
max_kb_dist_decay: 100
6 changes: 6 additions & 0 deletions README.md
@@ -69,6 +69,12 @@ Additionally, several data filtering options are available:
- Removal of regions with low mappability for fragments of a specified size
- Removal of regions with extreme high or low depth
- Removal of regions with a certain amount of missing data
- Multiple filter sets from user-provided BED files that can be intersected
with other enabled filters (for instance, to perform analyses on neutral
sites and genic regions separately)

All the above analyses can also be performed with sample depth subsampled to
a uniform level to account for differences in depth between samples.

## Getting Started

24 changes: 24 additions & 0 deletions config/README.md
@@ -282,6 +282,30 @@ settings for each analysis are set in the next section.
- `ibs_matrix:` Estimate pairwise identity by state distance between all
samples using ANGSD. (`true`/`false`)

#### Downsampling Section

As this workflow is aimed at low-coverage samples, there is likely to be
considerable variance in depth between samples. It can therefore be useful to
subsample all samples to a similar depth to check whether variation in depth
is influencing results. To do this, set an integer depth here to subsample all
samples down to, then enable the specific analyses to run on the subsampled
data.

- `subsample_dp:` The mean depth to subsample reads to. Subsampling is done
per sample, drawing from all of that sample's reads. A sample whose depth is
already at or below this value is used as is in the analysis. (INT)

- `subsample_redo_filts:` Make a separate filtered sites file, using the
subsampled BAMs to calculate the depth-based filters. If left disabled, the
depth filters will be determined from the full-coverage files.
(`true`/`false`)

- `subsample_analyses:` Individually enable analyses to be performed with the
subsampled data. These are the same as the ones above in the analyses
section. Enabling an analysis here only runs it for the subsampled data; to
run it for the full data too, also enable it in the analyses section.
(`true`/`false`)
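
The behaviour described above can be illustrated with a minimal Python sketch.
This is not the workflow's own code, which is not shown in this diff. It
assumes that subsampling keeps a fixed fraction of each sample's reads and that
subsampled analyses only run when `subsample_dp` is set. The function names and
the example depths are hypothetical, and only top-level analysis keys are
handled.

```python
def subsample_fraction(mean_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so that mean depth is roughly target_depth.

    Assumption: a sample at or below the target depth is used as is (1.0).
    """
    if target_depth <= 0:
        raise ValueError("target_depth must be positive")
    if mean_depth <= target_depth:
        return 1.0
    return target_depth / mean_depth


def datasets_for(analysis, analyses, subsample_analyses, subsample_dp):
    """List which datasets an analysis would run on under this sketch."""
    datasets = []
    # The full-depth run is controlled only by the `analyses` section.
    if analyses.get(analysis):
        datasets.append("full")
    # Assumption: the subsampled run needs both a depth and the analysis flag.
    if subsample_dp and subsample_analyses.get(analysis):
        datasets.append("subsampled")
    return datasets


# Hypothetical mean depths per sample, subsampled to `subsample_dp: 2`.
depths = {"sampleA": 8.0, "sampleB": 2.0, "sampleC": 1.2}
fractions = {name: subsample_fraction(dp, 2) for name, dp in depths.items()}

analyses = {"pca_pcangsd": True, "thetas_angsd": False}
subsample_analyses = {"pca_pcangsd": True, "thetas_angsd": True}
pca_runs = datasets_for("pca_pcangsd", analyses, subsample_analyses, 2)
theta_runs = datasets_for("thetas_angsd", analyses, subsample_analyses, 2)
```

In this sketch, `sampleA` keeps a quarter of its reads while `sampleB` and
`sampleC` are left untouched. PCA runs on both datasets, and thetas run on the
subsampled data only.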

#### Filter Sets

By default, this workflow will perform all analyses requested in the above
30 changes: 24 additions & 6 deletions config/config.yaml
@@ -24,12 +24,6 @@ exclude_ind: []

excl_pca-admix: []

#==================== Downsampling Configuration ======================#

# untested, not recommended for now

downsample_cov:

#====================== Analysis Selection ============================#

populations: []
@@ -70,6 +64,29 @@ analyses:
inbreeding_ngsf-hmm: false
ibs_matrix: false

#==================== Downsampling Configuration ======================#

subsample_dp:

subsample_redo_filts:

subsample_analyses:
  estimate_ld: false
  ld_decay: false
  pca_pcangsd: false
  admix_ngsadmix: false
  relatedness:
    ngsrelate: false
    ibsrelate_ibs: false
    ibsrelate_sfs: false
  thetas_angsd: false
  heterozygosity_angsd: false
  fst_angsd:
    populations: false
    individuals: false
  inbreeding_ngsf-hmm: false
  ibs_matrix: false

#=========================== Filter Sets ==============================#

filter_beds:
@@ -109,6 +126,7 @@ params:
extra_beagle: ""
snp_pval: "1e-6"
min_maf: 0.05
mindepthind_heterozygosity: 3
ngsld:
max_kb_dist_est-ld: 4000
max_kb_dist_decay: 100