Prepare version 0.3.0 #22

Merged
merged 147 commits into from
Sep 18, 2024

Conversation

zjnolen (Owner) commented Jan 12, 2024

No description provided.

zjnolen and others added 30 commits November 14, 2023 17:01
Move from fgbio to bamutil to clip overlap,
as this tool clips the lower quality bases.
Also merge overlapping reads even for
modern samples.
Merge overlapping reads in fastp, and map merged reads for both modern and historical. Unmerged reads are mapped for modern as well, and optionally for historical. Downstream overlap clipping switched from fgbio to bamutil (as this accounts for the quality when deciding which read to clip).
Allow for filtering on depth within depth classes, whole dataset depth,
or both. Add ability to choose between a median or percentile based
depth cutoff.
Update depth filters

Depth filters now produce histograms with limits. Additionally, the depth filter can now be set using multiples of the median (as in 0.2) or alternatively with percentiles.
Convert mappability filter to pileup mappability
This still allows for using mem, either can be selected.
Runtime for bwa aln seems to scale linearly with thread count, so give it the maximum number of threads.

bwa samse runs quickly, doesn't use much memory, and is single-threaded.
Also mention some of the reasons why
Set up so workflow can start with bam files and only perform popgen analyses
Switch default aligner for historical samples from bwa mem to aln. bwa mem is still default for modern samples and can be switched to for historical if desired.
Avoids having to run repeatmodeler/masker if it has already been run elsewhere for a genome and a bed/gff file can be provided.
If desired, damageprofiler on user-provided bams can be added sometime, but for now it's only used when starting from fastq, on the assumption that user-provided bams have already been processed and had damage assessed.
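
The depth filter commits above describe two ways to set cutoffs: multiples of the median or percentiles. A minimal sketch of that idea follows; it is not the workflow's actual implementation. The function names, the `method` argument, and the example bounds are all hypothetical and chosen only for illustration.

```python
from statistics import median


def percentile(values, q):
    """Percentile with linear interpolation; q is in the range 0 to 100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no depth values given")
    pos = (len(ordered) - 1) * q / 100
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


def depth_cutoffs(depths, method="median", lower=0.5, upper=1.5):
    """Return (min_depth, max_depth) for a list of per-site depths.

    method="median":     lower/upper are multipliers of the median depth.
    method="percentile": lower/upper are percentiles of the distribution.
    The default bounds are illustrative, not values taken from the workflow.
    """
    if method == "median":
        mid = median(depths)
        return lower * mid, upper * mid
    if method == "percentile":
        return percentile(depths, lower), percentile(depths, upper)
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    site_depths = [10, 12, 14, 16, 18]
    print(depth_cutoffs(site_depths, "median", 0.5, 1.5))
    print(depth_cutoffs(site_depths, "percentile", 2.5, 97.5))
```

Sites whose depth falls outside the returned pair would be excluded; the same function could be applied once per depth class, once to the whole dataset, or both.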
zjnolen and others added 29 commits March 29, 2024 18:06
Allow filtering with -minInd in ANGSD
Add in ability to remove transitions easily from config
This makes it easy to get a bam subsampled to a depth other than the one listed in the config. It is mainly meant for extensions to the main workflow, for example VCF analyses where you want a higher subsampled depth than the one used in ANGSD.
Now infers target depth from dp wildcard, allowing mixing of multiple depths in an extended workflow
By default $RANDOM is suggested to choose a random seed each time, but '0' is also commonly used.
Allow removal of individuals from dataset when subsampling
Allow multiple target depths for depth subsampling
Seems pandas was a dependency for numpy in old versions, but no longer. Needs to be explicitly included now.
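
The subsampling commits above mention inferring a target depth from the dp wildcard and choosing a seed with `$RANDOM` or `0`. As a rough sketch of the arithmetic involved (not the workflow's own code), the fraction of reads to keep is the target depth divided by the sample's mean depth. The function names and the assumption that the wildcard is a plain number string are hypothetical.

```python
import random


def subsample_fraction(mean_depth, dp_wildcard):
    """Fraction of reads to keep so that mean depth is about the target.

    dp_wildcard is assumed to be a plain number string such as "2";
    the real wildcard format in the workflow may differ.
    """
    target = float(dp_wildcard)
    if mean_depth <= 0 or target <= 0:
        raise ValueError("depths must be positive")
    # A sample already at or below the target keeps all of its reads.
    return min(1.0, target / mean_depth)


def pick_seed(seed=None):
    """Use a fixed seed if given, else a random one in bash's $RANDOM range."""
    if seed is not None:
        return seed
    return random.randrange(32768)  # $RANDOM yields 0..32767


if __name__ == "__main__":
    for dp in ("2", "4"):  # several target depths for the same sample
        print(dp, subsample_fraction(8.0, dp), pick_seed(0))
```

A fixed seed such as `0` makes the subsample reproducible between runs, while a random seed gives a different subsample each time.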
@zjnolen zjnolen merged commit 52a27be into master Sep 18, 2024
5 checks passed