Hg19 and hg38 from same workflow version #19

Open
wants to merge 49 commits into
base: master
dd083fd
Adapted indelCallingWorkflow to work for GRCh38. Main changes: Adapta…
Dec 19, 2019
5198ab6
Added hg38 REFERENCE_GENOME and INDEX_PREFIX to config file
Jan 15, 2020
cb34845
hg38 reference directory typo
NagaComBio Jan 20, 2020
bff626e
Updating hg38DatabasesDirectory
NagaComBio Jan 20, 2020
7815c9a
TiNDA updated for hg38; Added clustering with raw filters
NagaComBio Jan 27, 2020
597ae75
reading raw bgzip file
NagaComBio Jan 27, 2020
0427748
Indel calling from CRAM files
NagaComBio Jan 7, 2021
db4044c
Moving files to ngs_share
NagaComBio Feb 11, 2021
686a65f
Merging main branch
NagaComBio Mar 1, 2021
2a1523b
Moving GRCh38 config to a separate XML
NagaComBio Mar 1, 2021
5ac9468
Changing chr column
NagaComBio Mar 3, 2021
f5e9f8b
Removing annovar and bias analysis in sample swap
NagaComBio Mar 3, 2021
b32f561
Removing unused variables
NagaComBio Mar 3, 2021
517f2f7
Analysis bam header for hg19
NagaComBio Mar 3, 2021
05d0c36
Blacklist/selfchain only for hg19
NagaComBio Mar 3, 2021
627df27
Max AF to 5% for local control filtering in nocontrol
NagaComBio Mar 3, 2021
2deb979
Adding a WES local control for hg38
NagaComBio Mar 3, 2021
6b6cf39
Adjusting gnomad variable
NagaComBio Mar 3, 2021
e2ee87e
Updating GRCh38 XML
NagaComBio Mar 3, 2021
0471b16
Analysing bam header for hg19
NagaComBio Mar 3, 2021
08dec0d
Fixing typo
NagaComBio Mar 3, 2021
ef7c1a7
Removing bias analysis scripts from the repo
NagaComBio Mar 3, 2021
6247724
Passing the max MAF values to confidence annotation
NagaComBio Mar 5, 2021
064d592
Sorting chrs alphanumerically
NagaComBio Mar 19, 2021
7024ae7
Nocontrol: 1000genomes - Using AF for filtering instead of EUR_AF
NagaComBio Apr 22, 2021
7067082
Nocontrol filtering: removed commented lines
NagaComBio Apr 22, 2021
8a92cd9
Reworking the gnomAD and localcontrol variable names
NagaComBio Apr 22, 2021
c32000b
Merge branch 'master' into hg38
NagaComBio Apr 30, 2021
b06fdbe
Increasing the mem requirement
NagaComBio Oct 1, 2021
849d77d
Adding chr_prefix for hg38 chrs
NagaComBio Oct 18, 2021
6220c3b
Fix column swap bug for nocontrol workflow
NagaComBio Apr 29, 2022
41f1d64
Update ngs_share path
NagaComBio Apr 29, 2022
e2b1e59
Resource update
NagaComBio May 24, 2022
3c8effa
Update confidence scoring based on SeqC2, runlowmaf
NagaComBio May 24, 2022
8be0cd0
Merge branch 'master' into hg38
NagaComBio Jul 1, 2022
2394ac0
Removing tVAF from penalities & alleleBias to -1
NagaComBio Sep 16, 2022
b282b66
Upgrade to gencodev39 for hg38
NagaComBio Oct 6, 2022
219ada5
GnomAD and local control based confidence annotation
NagaComBio Nov 7, 2022
c7bb74f
Update Readme
NagaComBio Nov 7, 2022
9f57365
Bug fix
NagaComBio Nov 7, 2022
05776cc
Reverting SNP based confidence scoring
NagaComBio Mar 20, 2023
9a2ee42
Update reference
NagaComBio Mar 20, 2023
6b206e2
Update PLATYPUS_PARAMS
NagaComBio Mar 20, 2023
7bfc634
Merge branch 'master' into hg38
NagaComBio Jan 24, 2024
61d12f4
Update README.md
NagaComBio Jan 24, 2024
331e123
Creating helper functions
NagaComBio Jan 24, 2024
e4f9c9b
Moving backticks to system cmd
NagaComBio Jan 24, 2024
cdd1988
Moving FREQ from filter to MAFCommon flag in info
NagaComBio Jan 26, 2024
4a2b713
Add more annotations options for reference genome
NagaComBio Jan 26, 2024
Empty file modified .gitignore
100644 → 100755
Empty file.
Empty file modified CONTRIBUTORS
100644 → 100755
Empty file.
Empty file modified IndelCallingWorkflow.iml
100644 → 100755
Empty file.
Empty file modified IndelCallingWorkflow.jar
100644 → 100755
Empty file.
Empty file modified LICENSE
100644 → 100755
Empty file.
16 changes: 16 additions & 0 deletions README.md
100644 → 100755
@@ -93,6 +93,22 @@ Note that only on `master` new features are implemented, so the other branches a

## Changelist

* Version update to 4.0.0

- Major: Support for hg38/GRCh38 reference.
- Major: Update the confidence scoring script.
- hg38: Variants with a higher MAF in gnomAD and local-control than the threshold are annotated with `FREQ` in the filter column; they are not filtered out.
- hg38: Remove `HISEQDEPTH, DUKE_EXCLUDED, DAC_BLACKLIST, SELFCHAIN, REPEAT & MAPABILITY` from hg38 annotations.
- hg19: The `--skipREMAP` option applies the same behavior as for hg38.
- Remove ExAC and EVS from annotation and no-control workflow filtering.
- Add `runlowmaf` option to include variants with a VAF of 5-10% among the high-confidence somatic variants.
- The `HapScore` filter tag from Platypus is now penalized (-2).
- Minor: TiNDA related updates
- Remove bias filtering from TiNDA downstream analysis.
- Update to TiNDA plots.
- Environment script for the `checkSampleSwap` job.
- Patch: Update `COWorkflowsBasePlugin` to 1.4.2.

* Version update to 3.1.1

- Patch (Bugfix): The nocontrol workflow is exempted from the tumor & control column swap introduced in 3.1.0.
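
As a rough illustration of the hg38 `FREQ` behaviour listed under 4.0.0, the R sketch below tags variants whose population MAF exceeds a threshold without dropping them. The column names (`gnomad_maf`, `local_control_maf`), the 1% threshold, the toy values, and the choice to tag when either source exceeds the threshold are all assumptions made for illustration; the workflow's own confidence annotation script defines the actual rule.

```r
# Toy variant table; column names and values are hypothetical.
variants <- data.frame(
  id                = c("v1", "v2", "v3"),
  filter            = c("PASS", "PASS", "HapScore"),
  gnomad_maf        = c(0.0001, 0.0500, 0.0200),
  local_control_maf = c(0.0000, 0.0300, 0.0005),
  stringsAsFactors  = FALSE
)

max_maf <- 0.01  # assumed threshold, not taken from the workflow config

# Append a FREQ tag to the filter column instead of removing the row.
add_freq_tag <- function(df, threshold) {
  common <- df$gnomad_maf > threshold | df$local_control_maf > threshold
  df$filter[common] <- ifelse(
    df$filter[common] == "PASS",
    "FREQ",
    paste(df$filter[common], "FREQ", sep = ";")
  )
  df
}

annotated <- add_freq_tag(variants, max_maf)
print(annotated[, c("id", "filter")])
```

Every input row is still present in `annotated`; only the `filter` value changes, which is the point of annotating rather than filtering.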
2 changes: 1 addition & 1 deletion buildinfo.txt
100644 → 100755
@@ -1,2 +1,2 @@
-dependson=COWorkflowsBasePlugin:1.4.1
+dependson=COWorkflowsBasePlugin:1.4.2
 RoddyAPIVersion=3.5
4 changes: 2 additions & 2 deletions buildversion.txt
100644 → 100755
@@ -1,2 +1,2 @@
-3.1
-1
+4.0
+0
Empty file modified docs/images/denbi.png
100644 → 100755
@@ -61,7 +61,8 @@ source(opt$cFunction)
##### Data Analysis
# read the Rare.txt file and chromosome left file
dat<-read.delim(opt$file, header=T, sep="\t")
chr.length <- read.table(opt$chrLength, header=T)
chr.length <- read_tsv(opt$chrLength, col_names=c("CHR", "Length"))
chr.length$shiftLength <- c(0, chr.length$Length[1:23])

## Testing seqtype options
opt$seqType <- toupper(opt$seqType)
@@ -122,6 +123,10 @@ canopy.clust <- tryCatch(

dat$canopyCluster<-canopy.clust$sna_cluster

### Raw filtering
dat %>%
mutate(rawCluster = ifelse(Tumor_AF > Control_AF + 0.1 & Control_AF < 0.25, 'Somatic_Rescue', 'Germline')) -> dat

## Select the TiN cluster
somaticClass <- dat %>%
mutate(squareRescue = Control_AF < centroid$maxControl &
@@ -182,29 +187,22 @@ dat %>% filter(grepl("Somatic_Rescue", TiN_Class)) %>%
dat %>% filter(grepl("Germline", TiN_Class)) %>%
rbind(somRes) -> dat

#dat %>% group_by(Rareness, TiN_Class) %>% summarise(count=n())

# Plot 1 with canopy cluster
poly.df <- data.frame(x=c(0, 0, centroid$maxControl, centroid$maxControl),
y=c(centroid$minTumor, 1, 1, centroid$maxControl))

p1 <- ggplot() + geom_point(aes(Control_AF, Tumor_AF, color=factor(canopyCluster)), alpha=0.5, data=dat) +
# Plot 1 with threshold based cluster
p1 <- ggplot() + geom_point(aes(Control_AF, Tumor_AF, color=factor(rawCluster)), alpha=0.5, data=dat) +
theme_bw() + theme(text = element_text(size=15), legend.position="bottom") +
xlab("Control VAF") + ylab("Tumor VAF") +
xlim(0,1) + ylim(0,1) +
guides(color=guide_legend("Canopy clusters")) +
ggtitle(paste0("Clusters from Canopy")) +
geom_polygon(data=poly.df, aes(x, y), alpha=0.2, fill="gold")
guides(color=guide_legend("Raw clusters")) +
ggtitle(paste0("Clusters from raw filters"))

# Plot 2 with TiN cluster
p2 <- ggplot() + geom_polygon(data=poly.df, aes(x, y), alpha=0.2, fill="#d8161688") +
p2 <- ggplot() +
geom_point(aes(Control_AF, Tumor_AF, color=TiN_Class), alpha=0.3, data=dat) +
theme_bw() + theme(text = element_text(size=15), legend.position="bottom") +
xlab("Control VAF") + ylab("Tumor VAF") +
xlim(0,1) + ylim(0,1) +
guides(color=guide_legend("TiN clusters")) +
ggtitle(paste0("TiN clusters")) +
geom_polygon(data=poly.df, aes(x, y), alpha=0.2, fill="gold")
guides(color=guide_legend("TiNDA clusters")) +
ggtitle(paste0("TiNDA clusters"))

## function to plot linear chromosome
plotGenome_ggplot <- function(data, Y, chr.length, colorCol) {
@@ -228,18 +226,21 @@ plotGenome_ggplot <- function(data, Y, chr.length, colorCol) {
p3<-plotGenome_ggplot(dat, 'Control_AF', chr.length, 'TiN_Class')
p4<-plotGenome_ggplot(dat, 'Tumor_AF', chr.length, 'TiN_Class')

#### multi plotting
# Blank region
#blank <- grid.rect(gp=gpar(col="white"))

# Rescue info table
#rescueInfo<-as.data.frame(table(dat$TiN_Class))
#colnames(rescueInfo)<-c("Reclassification", "Counts")
rescueInfo <- dat %>%
group_by(TiN_Class) %>%
summarise(Count =n(), Median_Control_VAF = formatC(median(Control_AF), digits=5, format="f"),
Median_Tumor_VAF = formatC(median(Tumor_AF), digits=5, format="f")) %>%
mutate(Clustering = "Canopy") %>%
rename(TiNDA_Class = TiN_Class)

rescueInfo <- dat %>%
group_by(TiN_Class) %>%
rescueInfo <- dat %>%
group_by(rawCluster) %>%
summarise(Count =n(), Median_Control_VAF = formatC(median(Control_AF), digits=5, format="f"),
Median_Tumor_VAF = formatC(median(Tumor_AF), digits=5, format="f"))
Median_Tumor_VAF = formatC(median(Tumor_AF), digits=5, format="f")) %>%
mutate(Clustering = "Raw") %>%
rename(TiNDA_Class = rawCluster) %>%
bind_rows(rescueInfo) %>%
select(Clustering, TiNDA_Class, Count, Median_Control_VAF, Median_Tumor_VAF)

rescueInfo.toFile <- rescueInfo
if("Somatic_Rescue" %in% rescueInfo.toFile$TiN_Class) {
@@ -249,6 +250,7 @@ if("Somatic_Rescue" %in% rescueInfo.toFile$TiN_Class) {
rescueInfo.toFile$Pid<-opt$pid
}


TableTheme <- gridExtra::ttheme_default(
core = list(fg_params=list(cex = 1, hjust=1, x=0.95)),
colhead = list(fg_params=list(cex = 1, hjust=1, x=0.95)),
@@ -262,8 +264,6 @@ PlotLayout <-rbind(c(1,2,3),
c(5,5,5))

### Writing as png file


png(file = opt$oPlot, width=1500, height=800)
grid.arrange(p1, p2, TableAnn, p3, p4,
layout_matrix = PlotLayout,
@@ -274,20 +274,21 @@ dev.off()
# Saving the rescue table file
write.table(dat, file=opt$oFile, sep="\t", row.names = F, quote = F)

##
#reg.finalizer(environment(), cleanup, onexit = FALSE)

###############################################################################
## TiN classification to the vcf file
library(vcfR)
vcf <- read.vcfR(opt$vcf)
vcf <- as.data.frame(cbind(vcf@fix, vcf@gt))
vcf$POS <- as.integer(as.character(vcf$POS))

vcf$CHR <- as.character(vcf$CHR)
vcf$CHROM <- as.character(vcf$CHROM)
dat$CHR <- as.character(dat$CHR)

vcf %>% left_join(dat %>% select(CHR:ALT, TiN_Class) %>% rename("CHROM"="CHR")) %>%
vcf %>% left_join(
dat %>%
select(CHR:ALT, rawCluster, TiN_Class) %>%
rename("CHROM"="CHR")
) %>%
rename("#CHROM"="CHROM") %>%
write_tsv(opt$oVcf, na=".")
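
The two TiNDA additions in this file, the raw threshold clustering and the combined summary table, can be tried on toy data. The sketch below uses base R rather than the dplyr pipeline in the diff, invented VAF values, and a hypothetical helper `summarise_class`; only the threshold rule (`Tumor_AF > Control_AF + 0.1 & Control_AF < 0.25`) and the output column names come from the diff. It also leaves the medians numeric instead of formatting them with `formatC`.

```r
# Invented allele frequencies; TiN_Class stands in for the Canopy-based result.
dat <- data.frame(
  Control_AF = c(0.02, 0.20, 0.30, 0.45),
  Tumor_AF   = c(0.35, 0.25, 0.60, 0.50),
  TiN_Class  = c("Somatic_Rescue", "Somatic_Rescue", "Germline", "Germline"),
  stringsAsFactors = FALSE
)

# Raw rule from the diff: tumor VAF more than 0.1 above control VAF,
# and control VAF below 0.25.
dat$rawCluster <- ifelse(
  dat$Tumor_AF > dat$Control_AF + 0.1 & dat$Control_AF < 0.25,
  "Somatic_Rescue",
  "Germline"
)

# Hypothetical helper: count and median VAFs per class for one
# classification column, labelled with the clustering it came from.
summarise_class <- function(df, class_col, label) {
  groups <- split(df, df[[class_col]])
  data.frame(
    Clustering         = label,
    TiNDA_Class        = names(groups),
    Count              = unname(vapply(groups, nrow, integer(1))),
    Median_Control_VAF = unname(vapply(groups, function(g) median(g$Control_AF), numeric(1))),
    Median_Tumor_VAF   = unname(vapply(groups, function(g) median(g$Tumor_AF), numeric(1))),
    stringsAsFactors   = FALSE
  )
}

# Stack the raw and Canopy summaries into one table, raw rows first.
rescue_info <- rbind(
  summarise_class(dat, "rawCluster", "Raw"),
  summarise_class(dat, "TiN_Class", "Canopy")
)
print(rescue_info)
```

Keeping a `Clustering` column lets one table carry both classifications side by side, so the two can be compared per class in the plot annotation.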

Empty file modified resources/analysisTools/indelCallingWorkflow/__init__.py
100644 → 100755
Empty file.
175 changes: 0 additions & 175 deletions resources/analysisTools/indelCallingWorkflow/biasFilter.py

This file was deleted.
