Exome istats Documentation
First, make sure the genie scripts are installed and working:
genie exome istats -h
You should see a message like the one below; otherwise, follow the instructions on this page (link).
usage:
exome istats [options] --filter=FILE
options:
--exome=FILE dbs or clinseq [default: clinseq]
--filter=FILE file containing filter rules
--geneset=FILE geneset file
--out=PREFIX outname prefix [default: out]
--nojob to run frontend
--dry-run just show the codes
--njobs=NUMBER number of parallel jobs; applicable only when running in front end
- The `--exome` argument selects the exome dataset: either dbs or clinseq. If you omit it, the default dataset is clinseq.
- The `--filter` argument takes a file that contains the filter rules to apply. If you have multiple filter files, list all of them in one file, give that file a `.list` extension, and provide that file instead.
- The `--geneset` argument takes a file that contains gene symbols (one per line). If you have multiple geneset files, list all of them in one file, give that file a `.list` extension, and provide that file instead.
- The other arguments are self-explanatory.
- The only required argument is `--filter`; all others are optional.
If I have just a single filter file and need to analyse the dbs exomes:
genie exome istats --exome=dbs --filter=myfilter --out=some_name
If I have multiple filter files, I create a list file (myfilter.list; the `.list` extension is very important) containing the absolute path to each filter file, one per line:
genie exome istats --exome=dbs --filter=myfilter.list --out=somename
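One way to build such a list file is sketched below. The helper name `make_list` and the file names in the usage comment are illustrative assumptions, not part of genie; the sketch only resolves each given path to an absolute path and writes it on its own line.

```shell
# Hypothetical helper (not part of genie): write the absolute path of each
# given file to a list file, one path per line.
# Usage: make_list OUTPUT.list FILE [FILE...]
make_list() {
    out=$1
    shift
    : > "$out"                      # create or truncate the list file
    for f in "$@"; do
        # Resolve the containing directory, then re-attach the file name.
        dir=$(cd "$(dirname "$f")" && pwd)
        printf '%s/%s\n' "$dir" "$(basename "$f")" >> "$out"
    done
}

# Example with placeholder file names:
# make_list myfilter.list filters/rare.txt filters/damaging.txt
# genie exome istats --exome=dbs --filter=myfilter.list --out=somename
```

The same helper could be used for geneset files, since the `.list` convention described above is the same for both.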
If I have one filter file and one geneset file and want to analyse the clinseq exomes:
genie exome istats --exome=clinseq --filter=myfilter --geneset=mygeneset --out=some_name
#I can also skip the `--exome` argument, since the default is clinseq
genie exome istats --filter=myfilter --geneset=mygeneset --out=some_name
If I have multiple filter files and multiple geneset files, each group listed in its own list file, say myfilter.list and mygeneset.list:
genie exome istats --filter=myfilter.list --geneset=mygeneset.list --out=some_name
To write a filter file you need two pieces of information: the exact names of the tables and the exact names of the columns within each table. You can get this information from this page (link). A filter file is a plain text file with two columns and no column names. Each row has a table name in the first column and the expression (filter rule) to apply to that table in the second column.
An example filter file looks like this:
exacnonpsych is.na(ExAC_AF)
polysift SIFT=="deleterious"&PolyPhen=="damaging"
basic is.na(MAX_AF)
It is very important that the expressions contain no blank spaces. For example, `ExAC_AF>0.001 & ExAC_AF_EAS>0.001` will not work, but `ExAC_AF>0.001&ExAC_AF_EAS>0.001` will.
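A filter file can be checked for stray spaces before it is passed to genie. The sketch below assumes the two columns are separated by whitespace, so a valid row has exactly two fields; the helper name `check_filter_file` is hypothetical and not part of genie.

```shell
# Hypothetical helper (not part of genie): report rows of a filter file that
# do not have exactly two whitespace-separated fields (table name, expression).
# Assumption: columns are whitespace-separated, so a space inside an
# expression shows up as an extra field. Blank lines are ignored.
# Returns non-zero if any offending row is found.
check_filter_file() {
    awk 'NF > 0 && NF != 2 { printf "line %d: %s\n", NR, $0; bad = 1 }
         END { exit (bad ? 1 : 0) }' "$1"
}

# Example with a placeholder file name:
# check_filter_file myfilter && genie exome istats --filter=myfilter --out=some_name
```

Chaining the check with `&&` means the genie command only runs when no offending rows were printed.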
No. | Operation | Operator | Data type | Example | Notes |
---|---|---|---|---|---|
1. | greater than | > | numeric | ExAC_AF>0.001 | |
2. | less than | < | numeric | ExAC_AF<0.001 | |
3. | greater than or equal to | >= | numeric | ExAC_AF>=0.001 | |
4. | equal to | == | numeric and character | SIFT=="deleterious" | for character values it matches exactly, which can be a problem; try to avoid it (see below for why) |
5. | pattern match | %like% | character | Consequence%like%"%stop-gain%" | matches a pattern; note the % symbols around the query word |
6. | is not available | is.na(..) | NA | is.na(ExAC_AF) | |
Please use the `%like%` operator instead of the `==` operator. Because `==` matches exactly, you will miss variants that contain additional annotation values in the same cell. For example, see the table below:
Variant | C1 | C1=="A" | C1%like%"%A%" |
---|---|---|---|
snp1 | A | TRUE | TRUE |
snp2 | A,B | FALSE | TRUE |
You will miss snp2 if you use C1=="A" as your query.
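The difference can be reproduced on toy data outside genie. The sketch below is only an analogy: it uses awk, not genie's own expression evaluation, and the helper names `exact_match` and `contains_match` are made up for illustration. It reads rows of "variant, C1 value" and compares an exact comparison with a substring test.

```shell
# Analogy only: awk stand-ins for the two operators, not genie's evaluator.
# Input rows: variant name, then the C1 cell, separated by whitespace.

# Print variants whose C1 cell equals VALUE exactly (comparable to C1=="A").
exact_match() { awk -v v="$1" '$2 == v { print $1 }'; }

# Print variants whose C1 cell contains VALUE (comparable to C1%like%"%A%").
contains_match() { awk -v v="$1" 'index($2, v) > 0 { print $1 }'; }

printf 'snp1 A\nsnp2 A,B\n' | exact_match A      # prints snp1
printf 'snp1 A\nsnp2 A,B\n' | contains_match A   # prints snp1, then snp2
```

As in the table, the exact comparison drops snp2 because its cell holds `A,B` rather than `A` alone.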
A geneset file is simply a file with a list of gene symbols, one per line. For example:
DRD4
DUSP3
UBTF
STXBP5
You can use multiple filter files and multiple geneset files in a single run.