-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathscript3_visualizeSV.sh
91 lines (69 loc) · 5.27 KB
/
script3_visualizeSV.sh
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
cd ~/HALL_2014/results
#----------------------------------------------------------------------------------------------------------
# 1) GENERATE BREAKPOINT CALLS IN BED12 FORMAT FOR EASY IGV VISUALIZATION
#----------------------------------------------------------------------------------------------------------
~/HALL_2014/bin/bedpeToBed12 -i breakpoints.naive.bedpe -n breakpoints.naive > ../tracks/ourTracks/breakpoints.naive.bed
~/HALL_2014/bin/bedpeToBed12 -i breakpoints.strict.bedpe -n breakpoints.strict > ../tracks/ourTracks/breakpoints.strict.bed
# COLOR KEY:
# Deletion = DEL = red;
# Tandem Duplication = DUP = green;
# Inversion = INV = blue
# Inter-chromosomal = INT = gray (this also includes intra-chromosomal calls > 1 Mb, to enable easy viewing)
# In BED12 format, the predicted breakpoint-containing intervals are shown as solid blocks.
# The thinner line just connects the blocks.
# Note that when mappings are very far apart (e.g., >1 Mb or different chromosomes),
# we draw the two read mapping clusters as distinct features (to avoid too many lines taking over IGV screen space)
# BED12 was designed to visualize genes, with exons as blocks and introns as the thin line.
# But, since IGV and UCSC do not accommodate paired variant call formats, we've adapted BED12
# Later on we'll visualize these in VCF format as well.
#----------------------------------------------------------------------------------------------------------
# 2) LOAD TRACKS
#----------------------------------------------------------------------------------------------------------
# Start the Integrative Genomics Viewer (IGV)
# FIRST, make sure you choose the "Human (b37)" genome before doing anything else.
# SECOND, go to the settings menu at the top: view > preferences > alignments:
# Change "Visibility range threshold" to 20
# Change "Max read count" to 100
# Change "per window size" to 50
# THIRD, change "insert size options" at the bottom; set min to 0 and max to 687 (this is 5 standard deviations)
# FOURTH, load the bamfile: ~/HALL_2014/results/NA12878.20.bam
# FIFTH, load all the .bed files from the following two directories:
# (the .idx files are index files to make igv faster)
# ~/HALL_2014/tracks/ourTracks
# ~/HALL_2014/tracks/publicTracks
# SIXTH, customize your IGV session.
# I prefer to put all of my custom BED tracks on the top,
# the bam files in the middle, and all the public tracks on the bottom.
# But, you can arrange things however you like.
# You can move tracks up and down by dragging them with the mouse.
# SEVENTH, save the IGV session (under the "file" menu bar)
# for easy reload if (when?) IGV crashes due to memory limitations
#----------------------------------------------------------------------------------------------------------
# 3) SPEND SOME TIME LOOKING AT THE DATA
#----------------------------------------------------------------------------------------------------------
# Do the LUMPY calls make sense based on the raw alignments?
# When you mouse over the LUMPY call it shows a score; this is the number of alignments that were clustered.
# Try the following features in IGV:
# 1) Click on a BED track to select it, zoom into a ~20kb window, then press "control-f" to flip to the next feature.
# This is good way to view a lot of calls in a short period of time without IGV freezing up.
# 2) When you find an variant that is between different chromosomes, right click on discordant read
# and select "View mate region in split screen". Right-click near the top to switch back to "standard view".
# Can you find any obvious false positives?
# Can you find any obvious false negatives?
# Do you see any large clusters of LUMPY calls in the "naive" callset that are not present in the "strict" callset?
# Do you any deletion calls that are actually tranposon insertions in the reference genome?
# Do you see any complex variants composed of multiple LUMPY calls that are right next to each other, or overlapping?
#----------------------------------------------------------------------------------------------------------
# 4) LOAD SOME MORE INFORMATIVE TRACKS
#----------------------------------------------------------------------------------------------------------
# ~/HALL_2014/tracks/supplementalTracks contains some additional data including the following:
# a) NA12878.lumpy.bed: these are the LUMPY calls for the entire genome, in BED12 format
# b) NA12878.lumpy.annotated.vcf: these are the LUMPY calls for the entire genome, in VCF format
# c) NA12878.cnvnator.cnvs.bed: these are CNV calls obtained by read-depth analysis using CNVNator.
# (These are unfiltered so there are likely to be many false positives; e.g., the smallest ones are lowest confidence)
# d) NA12878.cnvnator.readDepth.tdf: this is normalized read-depth data in 100 bp windows.
# This is very useful for visualizing the raw read-depth data. Your eyes are better than most algorithms.
# when loading this last file, you'll need to change the view options:
# right click on the track name at the very left; change windowing function to "none";
# right click on the track name and change data range min to 0, Mid to 2, and Max to 10;
# right click on the track name and change Track Height to 100 for better visualization