Copyright Etienne Muller (2016)
muller.etienne@hotmail.fr
outLyzer is a computer program whose purpose is to detect variations,
specifically low allele frequency variations, in next generation
sequencing data (tumor samples, mosaic mutations).
This software highlights candidate mutations; its results must be used
with caution. We cannot guarantee the accuracy of the information and
predictions provided.
This software is governed by the CeCILL license under French law and
abiding by the rules of distribution of free software. You can use,
modify and/ or redistribute the software under the terms of the CeCILL
license as circulated by CEA, CNRS and INRIA at the following URL
"http://www.cecill.info".
As a counterpart to the access to the source code and rights to copy,
modify and redistribute granted by the license, users are provided only
with a limited warranty and the software's author, the holder of the
economic rights, and the successive licensors have only limited
liability.
In this respect, the user's attention is drawn to the risks associated
with loading, using, modifying and/or developing or reproducing the
software by the user in light of its specific status of free software,
that may mean that it is complicated to manipulate, and that also
therefore means that it is reserved for developers and experienced
professionals having in-depth computer knowledge. Users are therefore
encouraged to load and test the software's suitability as regards their
requirements in conditions enabling the security of their systems and/or
data to be ensured and, more generally, to use and operate it in the
same conditions as regards security.
The fact that you are presently reading this means that you have had
knowledge of the CeCILL license and that you accept its terms.
outLyzer version 3.0 16/10/2020
Author: Etienne Muller
E-mail: muller.etienne@hotmail.fr
Sources: http://github.com/EtieM/outLyzer
# _ __
# ___ _ _| |_ / / _ _ _______ _ __
# / _ \| | | | __|/ / | | | |_ / _ \ '__|
# | (_) | |_| | |_/ /__| |_| |/ / __/ |
# \___/ \__,_|\__\____/\__, /___\___|_|
# |___/
CONTENTS OF THIS FILE
---------------------
* Introduction
* Requirements
* Installation
* Utilisation
INTRODUCTION
------------
OutLyzer is a variant caller designed to detect low allele-ratio mutations,
based on an evaluation of the sequencing background noise. It evaluates whether
a candidate mutation is significantly different from the background noise, using
the modified Thompson tau technique.
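The modified Thompson tau technique is a standard iterative outlier test. The sketch below shows the generic textbook procedure applied to per-position alternative read counts; it is not outLyzer's own code. It assumes that the -t value (default 0.001) plays the role of the significance level alpha, and the function name thompson_tau_outliers is hypothetical. It relies on numpy and scipy, both listed under REQUIREMENTS.

```python
import numpy as np
from scipy import stats


def thompson_tau_outliers(values, alpha=0.001):
    """Return the indices of values flagged as outliers by the modified Thompson tau test.

    Hypothetical helper: alpha is assumed to correspond to outLyzer's -t option.
    """
    data = np.asarray(values, dtype=float)
    remaining = list(range(len(data)))  # indices still considered background
    outliers = []
    while len(remaining) > 2:  # the critical t value needs n - 2 >= 1 degrees of freedom
        sample = data[remaining]
        n = len(sample)
        mean = sample.mean()
        s = sample.std(ddof=1)  # sample standard deviation
        if s == 0:
            break  # all remaining values are identical: nothing can stand out
        t = stats.t.ppf(1 - alpha / 2, n - 2)  # two-sided critical Student t value
        tau = t * (n - 1) / (np.sqrt(n) * np.sqrt(n - 2 + t ** 2))
        deviations = np.abs(sample - mean)
        worst = int(np.argmax(deviations))  # most extreme remaining value
        if deviations[worst] > tau * s:
            outliers.append(remaining.pop(worst))  # reject it and test the rest again
        else:
            break  # the most extreme value is consistent with background noise
    return outliers
```

For example, with the short window of counts [1, 0, 0, 2, 0, 1, 0, 0, 1, 1, 0, 964, 1, 0, 2, 1, 0, 0, 1, 0], only index 11 (the 964 count) is flagged; the remaining counts are treated as background noise.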
This program was developed in the Department of Cancer Biology and Genetics,
Francois Baclesse Cancer Center, Caen, France.
The program can be downloaded at: http://github.com/EtieM/outLyzer
REQUIREMENTS
------------
- Linux OS
- Python v3
- Python libraries: subprocess / numpy / scipy / argparse / multiprocessing
  (subprocess, argparse and multiprocessing ship with Python; numpy and scipy must be installed)
- Samtools v1.2 to v1.9
INSTALLATION
------------
Before launching outLyzer, it is preferable to set the Samtools path as an environment variable:
- For one session: $ export samtools=/path/to/samtools
- Permanently: add the previous command line to your .bashrc file
Otherwise, the complete Samtools path must be given on the outLyzer command line (-samtools option).
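How a program might pick up the Samtools path can be sketched as follows. This is an assumption, not a description of outLyzer's code: the helper resolve_samtools is hypothetical, and the priority order (explicit -samtools value, then the samtools environment variable, then a samtools executable found on PATH) is chosen for illustration. On Linux, environment variable names are case-sensitive, so the variable must be exported as samtools, in lower case.

```python
import os
import shutil


def resolve_samtools(cli_path=None):
    """Return the Samtools executable to use (hypothetical helper).

    Assumed priority: explicit -samtools argument, then the `samtools`
    environment variable, then the first `samtools` found on PATH.
    """
    if cli_path:
        return cli_path
    env_path = os.environ.get("samtools")  # set with: export samtools=/path/to/samtools
    if env_path:
        return env_path
    found = shutil.which("samtools")  # fallback: search the directories in PATH
    if found is None:
        raise FileNotFoundError("Samtools not found: export samtools=... or pass -samtools")
    return found
```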
UTILISATION
-----------
outLyzer has two main operating modes:
- calling mode: $ python outLyzer.py calling -bed regionFile.bed -ref referenceFile.fa -bam fileToAnalyse.bam -output /path/to/resultDir/ [-arguments]
This mode works as a standard variant caller: it analyses a whole BAM file and reports the detected mutations in a VCF file.
optional arguments:
  -h, --help            show this help message and exit
  -samtools SAMTOOLS    Complete Samtools path if not specified in environment
                        variable
  -pythonPath PYTHONPATH
                        Complete python path if different from default python
                        version
  -core CORE            defines the number of cores used to process the analysis [1]
  -cut CUT              defines into how many parts the bed file is divided [3]
  -bed BED              bed file required for analysis [REQUIRED]
  -bam BAM              bam File to analyze [REQUIRED]
  -ref REF              faidx indexed reference sequence file (fasta)
                        [REQUIRED]
  -output OUTPUT        output Path To write results [REQUIRED]
  -t T                  Student t value used in modified Thompson tau
                        technique [0.001]
  -bal BAL              minimum Forward / Reverse read proportion [0.3]
  -Q Q                  minimum average Phred Score to be considered as a real
                        mutation (only relevant for SNP) [20]
  -SDQ SDQ              maximum Standard deviation authorized for average
                        Phred Score [7]
  -WS WS                Window Size: region (number of bp) around the mutation
                        on which background noise has to be determined [200]
  -WSmin WSMIN          Window Size Minimum Size: minimum region size (number
                        of bp) required for analysis [10]
  -x X                  Multiplicative factor that specifies how many times
                        the mutation must be above background noise [2]
  -AS                   Analysis sensitivity: Returns an additional file
                        containing analysis average sensitivity for each line
                        of bed file
  -FRcor                Forward Reverse Correction: take into account any
                        imbalance in the Forward-Reverse reads distribution in
                        the Forward / Reverse alternative Read Proportion
                        (-bal option)
  -force FORCE          Force noise background determination to go below a
                        fixed proportion of the Depth [default = disabled; 0
                        -> 1 ex: "-force 0.2"]
  -HSM HSM              HotSpot Metrics: Produce sensitivity Threshold for
                        HotSpot positions, in an additional file. Requires a
                        formatted HotSpot File in argument (see documentation
                        for more details).
  -verbose VERBOSE      If verbose mode is set to 1, details the analysis
                        process steps [0]
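When calling mode is launched from another Python script, for example in a pipeline, the command can be assembled as an argument list and run with subprocess. The helpers build_calling_command and run_calling below are hypothetical; the flags come from the option list above, with -AS and -FRcor passed as bare switches because they take no value.

```python
import subprocess


def build_calling_command(bed, ref, bam, output, extra=None,
                          script="outLyzer.py", python="python"):
    """Assemble the argument list for outLyzer calling mode (hypothetical helper)."""
    cmd = [python, script, "calling",
           "-bed", bed, "-ref", ref, "-bam", bam, "-output", output]
    # Optional arguments, e.g. {"-core": 4, "-AS": True}: True adds a bare switch,
    # None or False skips the option, any other value is passed as a string.
    for flag, value in (extra or {}).items():
        if value is True:
            cmd.append(flag)
        elif value is not None and value is not False:
            cmd.extend([flag, str(value)])
    return cmd


def run_calling(**kwargs):
    """Run calling mode; raises subprocess.CalledProcessError if outLyzer fails."""
    subprocess.run(build_calling_command(**kwargs), check=True)
```

Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces, and check=True turns a failed run into an exception instead of a silently ignored exit code.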
Details on the HSM option:
/!\ The file required for HotSpot Metrics must be formatted as follows:
chrN startPosition Annotation
Columns must be separated by tabs, and the annotation column must be present.
ex: chr12 25398284 KRAS_codon12
outLyzer then returns a tab-separated file giving, for each position, a local estimate of the sensitivity, expressed as a percentage.
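A HotSpot file in this format can be written and checked with a few lines of Python. The helper names below are hypothetical, and the check assumes exactly three tab-separated columns per line, as in the example.

```python
def format_hotspot_line(chrom, start, annotation):
    """Build one HSM line: chromosome, start position and annotation, tab-separated."""
    if not annotation:
        raise ValueError(f"annotation column is required for {chrom}:{start}")
    return f"{chrom}\t{int(start)}\t{annotation}"


def parse_hotspot_line(line):
    """Split one HSM line and reject lines that are not three tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3 or not fields[2]:
        raise ValueError(f"expected 'chrN<TAB>startPosition<TAB>Annotation', got {line!r}")
    chrom, start, annotation = fields
    return chrom, int(start), annotation  # int() also rejects a non-numeric position


def write_hotspot_file(path, hotspots):
    """Write an iterable of (chrom, start, annotation) tuples to an HSM file."""
    with open(path, "w") as handle:
        for chrom, start, annotation in hotspots:
            handle.write(format_hotspot_line(chrom, start, annotation) + "\n")
```

For example, write_hotspot_file("hotspots.bed", [("chr12", 25398284, "KRAS_codon12")]) writes the KRAS line from the example, with real tab separators (the file name is only an illustration).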
- positionAnalysis mode: $ python outLyzer.py positionAnalysis -bam fileToAnalyse.bam -ref referenceFile.fa -position chr12:123456 [-arguments]
This mode evaluates the sequencing data and the local background noise at a single chromosomal position.
optional arguments:
  -h, --help            show this help message and exit
  -samtools SAMTOOLS    Complete Samtools path if not specified in environment
                        variable
  -bam BAM              bam File to analyze [REQUIRED]
  -position POSITION    chromosomal position to analyze (ex: chr3:123456789)
                        [REQUIRED]
  -ref REF              faidx indexed reference sequence file (fasta) [REQUIRED]
  -t T                  Student t value used in modified Thompson tau technique
                        [0.001]
  -bal BAL              minimum Forward / Reverse read proportion [0.3]
  -Q Q                  minimum average Phred Score to be considered as a real
                        mutation (only relevant for SNP) [20]
  -SDQ SDQ              maximum Standard deviation authorized for average Phred
                        Score [7]
  -WS WS                Window Size: region (number of bp) around the mutation
                        on which background noise has to be determined [200]
  -force FORCE          Force noise background determination to go below a
                        fixed proportion of the Depth [default = disabled; 0
                        -> 1 ex: "-force 0.2"]
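The -position value follows the chrN:position pattern (ex: chr3:123456789). The sketch below, with hypothetical helper names, checks that pattern before assembling a positionAnalysis command; it does not check that the chromosome name exists in the reference file.

```python
import re

# A chromosome name (no colon or whitespace), a colon, then a base position.
POSITION_PATTERN = re.compile(r"([^:\s]+):(\d+)")


def parse_position(position):
    """Split a 'chrN:position' string into (chromosome, position)."""
    match = POSITION_PATTERN.fullmatch(position)
    if match is None:
        raise ValueError(f"expected chrN:position (ex: chr3:123456789), got {position!r}")
    return match.group(1), int(match.group(2))


def build_position_command(bam, ref, position, script="outLyzer.py", python="python"):
    """Assemble the argument list for positionAnalysis mode after validating -position."""
    chrom, pos = parse_position(position)
    return [python, script, "positionAnalysis",
            "-bam", bam, "-ref", ref, "-position", f"{chrom}:{pos}"]
```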
It prints the results directly to standard output, as follows:
['1', '0', '0', '0', '2', '0', '1', '0', '0', '0', '1', '1', '0', '1',
'1', '0', '0', '1', '0', '0', '0', '1', '0', '2', '1', '1', '0', '1',
'0', '0', '2', '2', '0', '0', '0', '1', '0', '1', '0', '0', '0', '0',
'0', '0', '0', '0', '1', '2', '0', '0', '1', '1', '0', '0', '1', '0',
'2', '0', '0', '1', '0', '1', '0', '0', '0', '1', '0', '3', '0', '0',
'0', '1', '0', '2', '0', '0', '0', '2', '0', '0', '0', '1', '2', '0',
'0', '0', '0', '2', '1', '2', '1', '0', '0', '3', '0', '0', '0', '0',
'0', '1', '964', '1', '0', '0', '0', '0', '0', '0', '0', '1', '0', '1',
'1', '0', '0', '0', '1', '2', '0', '0', '0', '0', '0', '1', '5', '1', '0',
'1', '0', '0', '0', '0', '1', '0', '1', '0', '1', '1', '0', '0', '0', '2',
'0', '0', '0', '0', '2', '1', '1', '5', '0', '1', '2', '2', '0', '0', '1',
'0', '1', '0', '0', '1', '0', '1', '1', '4', '0', '0', '0', '2', '0', '0',
'1', '0', '0', '2', '2', '1', '2', '2', '1', '3', '0', '5', '4', '1', '0',
'1', '2', '0', '0', '2', '0', '1', '0', '2', '0', '0', '0', '4', '1']
Mutation Position: chr12:25378562
Reference Allele: C
Alternative Allele: T
Depth: 1835
Allele Frequency (%): 52.48
Phred Quality: 26.5
Phred Standard Deviation: 1.73
Forward / Reverse alt: 494/469 (51.3% / 48.7%)
overAll Balance: 50.5% / 49.5%
Corrected alt F/R: 51.0% / 49.0%
Raw background Noise: 3
Stretch nearby: None
Motif nearby: None
The list of numbers gives, for each genomic position in the analyzed window,
the number of alternative reads.
Alternative Allele = mutated base (WT = Wild Type: no mutation at this position)
Depth = total number of reads aligned to this position
Allele Frequency = proportion of alternative reads, in %
Phred Quality = average Phred quality of all alternative reads
Phred Standard Deviation = standard deviation of the Phred quality scores averaged in Phred Quality
Forward / Reverse alt = number (and proportion) of alternative reads sequenced in forward / reverse
overAll Balance = proportion of all reads (wild-type AND mutated) sequenced in forward / reverse
Corrected alt F/R = alternative Forward / Reverse balance weighted by the overall balance
Raw background Noise = sequencing background noise around the mutation, expressed in number of reads
Stretch nearby = indicates whether there is a stretch near the mutation
Motif nearby = indicates whether there is a repetitive DNA sequence motif near the mutation
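The example report can be read back and checked programmatically. The sketch below, with hypothetical helper names, collects the "Key: value" lines into a dictionary and recomputes Allele Frequency and the Forward / Reverse percentages. It assumes, consistently with the field definitions, that the number of alternative reads is the sum of forward and reverse alternative reads; with the example values (494 + 469 = 963 alternative reads for a depth of 1835) this gives 52.48 % and 51.3 % / 48.7 %, as in the report. Corrected alt F/R is not recomputed, because its exact weighting is not specified here.

```python
import re


def parse_position_report(text):
    """Collect the 'Key: value' lines of a positionAnalysis report into a dict of strings."""
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")  # split on the first ': ' only
        if sep:  # lines of the read-count list contain no ': ' and are skipped
            report[key.strip()] = value.strip()
    return report


def alt_counts(report):
    """Return (forward, reverse) alternative read counts from 'Forward / Reverse alt'."""
    match = re.match(r"(\d+)/(\d+)", report["Forward / Reverse alt"])
    if match is None:
        raise ValueError("unexpected 'Forward / Reverse alt' format")
    return int(match.group(1)), int(match.group(2))


def allele_frequency(forward, reverse, depth):
    """Percentage of alternative reads at the position, rounded to 2 decimals."""
    return round(100 * (forward + reverse) / depth, 2)


def strand_percentages(forward, reverse):
    """Forward / reverse shares of the alternative reads, in %, rounded to 1 decimal."""
    total = forward + reverse
    return round(100 * forward / total, 1), round(100 * reverse / total, 1)
```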
- Utilisation test:
$ python outLyzer_v3.0.py calling -ref reference.fasta -bed test_dataSet.bed -bam test_dataSet.bam -HSM HotSpot_positions_HSM.bed -AS -FRcor -output outputPath
The results should match the resultsExample files.
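To check a test run against resultsExample, the variant lines of two VCF files can be compared. The sketch below is an assumption about how one might do this, not a tool shipped with outLyzer: it compares only the CHROM, POS, REF and ALT columns (the 1st, 2nd, 4th and 5th VCF columns), ignores header lines starting with #, and takes both file paths as arguments, since the exact result file names are not given here.

```python
def variant_keys(lines):
    """Return the set of (CHROM, POS, REF, ALT) tuples found in VCF lines."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information (##) and the #CHROM header
        fields = line.rstrip("\n").split("\t")
        keys.add((fields[0], int(fields[1]), fields[3], fields[4]))
    return keys


def compare_vcf(result_path, expected_path):
    """Return (missing, unexpected): expected variants not found, and extra variants found."""
    with open(result_path) as result_file, open(expected_path) as expected_file:
        result = variant_keys(result_file)
        expected = variant_keys(expected_file)
    return expected - result, result - expected
```

Two empty sets mean that the same variants were called; quality, FILTER and INFO fields may still differ and are not compared by this sketch.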