
Exome istats Documentation

Veera edited this page Sep 18, 2017 · 3 revisions

Framework

First, make sure the genie scripts are installed and working:

genie exome istats -h

You should see a message like the one below. If not, follow the installation instructions on this page (link).

usage:
 exome istats [options] --filter=FILE 

options:
 --exome=FILE   dbs or clinseq [default: clinseq]
 --filter=FILE  file containing filter rules
 --geneset=FILE geneset file
 --out=PREFIX   outname prefix [default: out]
 --nojob        to run frontend
 --dry-run      just show the codes
 --njobs=NUMBER number of parallel jobs; applicable only when running in front end
  • --exome selects the exome dataset: either dbs or clinseq. If you omit it, clinseq is used.
  • --filter takes a file containing the filter rules to apply. If you have multiple filter files, list all of them (one path per line) in a file with a .list extension and pass that file instead.
  • --geneset takes a file containing gene symbols, one per line. If you have multiple geneset files, list them in the same way in a file with a .list extension and pass that file instead.
  • The remaining options are self-explanatory.
  • --filter is the only required argument; all others are optional.

Examples

To analyse dbs exomes with a single filter file:

genie exome istats --exome=dbs --filter=myfilter --out=some_name

With multiple filter files, first create a list file (e.g. myfilter.list; the .list extension is essential) containing the absolute path to each filter file, one per line:

genie exome istats --exome=dbs --filter=myfilter.list --out=some_name
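Writing the list file by hand is error-prone when paths are long. The sketch below is one way to generate it in Python; write_list_file is a hypothetical helper (not part of genie), and the file names in the usage note are placeholders. It resolves each input to an absolute path, writes one path per line, and refuses output names that lack the .list extension.

```python
from pathlib import Path


def write_list_file(paths, list_path):
    """Hypothetical helper: write the absolute path of each input file, one per line."""
    list_path = Path(list_path)
    if list_path.suffix != ".list":
        # The documentation stresses that list files must use the .list extension.
        raise ValueError(f"{list_path} must end in .list")
    missing = [str(p) for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"not found: {', '.join(missing)}")
    # resolve() turns relative paths into absolute ones.
    lines = [str(Path(p).resolve()) for p in paths]
    list_path.write_text("\n".join(lines) + "\n")
    return lines
```

For example, write_list_file(["filters/rare_variants", "filters/lof"], "myfilter.list") writes both resolved paths to myfilter.list, ready for --filter=myfilter.list. The same helper can build a geneset list file.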

To analyse clinseq exomes with one filter file and one geneset file:

genie exome istats --exome=clinseq --filter=myfilter --geneset=mygeneset --out=some_name

Because clinseq is the default, the --exome argument can be omitted:

genie exome istats --filter=myfilter --geneset=mygeneset --out=some_name

With multiple filter files and multiple geneset files, each group listed in its own list file (say, myfilter.list and mygeneset.list):

genie exome istats --filter=myfilter.list --geneset=mygeneset.list --out=some_name
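If you want to script several runs (for example, one per exome dataset), a small Python helper can assemble the argument list. This is a minimal sketch: build_istats_cmd is a hypothetical helper, and it uses only options that appear in the usage message above. It builds the command but does not launch genie.

```python
def build_istats_cmd(filter_file, exome=None, geneset=None, out=None, dry_run=False):
    """Hypothetical helper: assemble a `genie exome istats` argument list.

    Options left as None are omitted, so genie falls back to its own
    defaults (clinseq for --exome, "out" for --out).
    """
    cmd = ["genie", "exome", "istats"]
    if exome is not None:
        cmd.append(f"--exome={exome}")
    cmd.append(f"--filter={filter_file}")
    if geneset is not None:
        cmd.append(f"--geneset={geneset}")
    if out is not None:
        cmd.append(f"--out={out}")
    if dry_run:
        cmd.append("--dry-run")  # per the usage message: just show the code
    return cmd


if __name__ == "__main__":
    for dataset in ("dbs", "clinseq"):
        cmd = build_istats_cmd("myfilter.list", exome=dataset,
                               out=f"istats_{dataset}", dry_run=True)
        print(" ".join(cmd))
        # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list (rather than one shell string) to subprocess.run avoids quoting problems with the filter file names.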

Output

The output file contains per-sample summary statistics, as produced by `plinkseq istats` (see this link). It can then be used for downstream analysis in R together with your phenotype and covariate files.

How to create a filter file?

You need two pieces of information: the exact names of the tables and the exact names of the columns within each table. Both are listed on this page (link). A filter file is a plain-text file with two columns and no header row. On each line, the first column holds a table name and the second column holds the expression (filter rule) to apply to that table.

An example filter file looks like this:

exacnonpsych is.na(ExAC_AF)
polysift     SIFT=="deleterious"&PolyPhen=="damaging"
basic        is.na(MAX_AF)

Expressions must not contain any spaces. For example, ExAC_AF>0.001 & ExAC_AF_EAS>0.001 will not work, but ExAC_AF>0.001&ExAC_AF_EAS>0.001 will.
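Because the expression column may not contain spaces, a non-blank line that splits on whitespace into anything other than exactly two fields is malformed. The sketch below is a hypothetical pre-flight check (check_filter_lines is not part of genie). It does not verify table or column names, which you still need to look up, and skipping blank lines is an assumption about what genie tolerates.

```python
from pathlib import Path


def check_filter_lines(lines):
    """Return (line_number, message) for each line that is not 'table expression'."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) != 2:
            problems.append((number, f"expected 2 columns, found {len(fields)}"))
    return problems


def check_filter_file(path):
    """Run check_filter_lines on a filter file on disk."""
    return check_filter_lines(Path(path).read_text().splitlines())
```

For example, check_filter_lines(["basic ExAC_AF>0.001 & ExAC_AF_EAS>0.001"]) reports line 1 as having 4 columns, because the spaces around & split the expression.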

About expressions

No.  Operation                 Operator   Data type           Example                          Notes
1.   greater than              >          numeric             ExAC_AF>0.001
2.   less than                 <          numeric
3.   greater than or equal to  >=         numeric             ExAC_AF>=0.001
4.   equal to                  ==         numeric, character                                   exact match for character values; prefer %like% (see below)
5.   pattern match             %like%     character           Consequence%like%"%stop-gain%"   matches a pattern; note the % wildcards around the query word
6.   is missing (NA)           is.na(..)  any                 is.na(ExAC_AF)

For character columns, use the %like% operator instead of ==. Because == requires an exact match, you will miss variants whose cell contains additional annotation values. For example:

Variant  C1   C1=="A"  C1%like%"%A%"
snp1     A    TRUE     TRUE
snp2     A,B  FALSE    TRUE

You will miss snp2 if you use C1=="A" as your query.
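The difference between the two operators can be reproduced outside genie. The sketch below assumes %like% follows SQL LIKE conventions, where % matches any run of characters and _ matches a single character (the %...% wildcards in the examples suggest this). The like function is a hypothetical stand-in, not genie's implementation, and it is case-sensitive, whereas some database backends compare case-insensitively.

```python
import re


def like(value, pattern):
    """Emulate SQL-style LIKE: % matches any run of characters, _ matches one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# The C1 column from the table above.
c1 = {"snp1": "A", "snp2": "A,B"}
for variant, value in c1.items():
    print(variant, value == "A", like(value, "%A%"))
# snp1 True True
# snp2 False True
```

Pattern matching is also broader in the other direction: "%A%" would match a value such as "AB" too, so choose patterns specific enough for the annotation column you are filtering.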

How to create a geneset file?

A geneset file is a plain-text file listing gene symbols, one per line. For example:

DRD4
DUSP3
UBTF
STXBP5
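If your gene list comes from another source (a spreadsheet column or a pasted list), a short script can normalise it into the one-symbol-per-line format. This is a hypothetical helper, not part of genie. It strips whitespace and drops blanks and duplicates, but does not change letter case or check symbols against any nomenclature database, so make sure the symbols match those used in the exome annotation.

```python
from pathlib import Path


def write_geneset(symbols, path):
    """Hypothetical helper: write gene symbols one per line, without blanks or duplicates."""
    cleaned = (s.strip() for s in symbols)
    # dict.fromkeys removes duplicates while keeping the first occurrence's order.
    unique = list(dict.fromkeys(s for s in cleaned if s))
    Path(path).write_text("\n".join(unique) + "\n")
    return unique
```

For example, write_geneset(["DRD4", " DUSP3 ", "UBTF", "DRD4", "STXBP5"], "mygeneset") writes the four distinct symbols shown above.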

You can use multiple filter files and multiple geneset files in a single run.
