update docs to include GLIMPSE Hail Batch example
LindoNkambule committed Feb 5, 2025
1 parent c16f68e commit 76fce6c
Showing 11 changed files with 127 additions and 20 deletions.
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/imputation.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/tutorial.doctree
Binary file not shown.
4 changes: 3 additions & 1 deletion docs/_build/html/_sources/imputation.rst.txt
@@ -38,9 +38,11 @@ Arguments and options
* - Argument
- Description
* - :code:`--input-file`
- Path to where the VCF for target genotypes paths is
- Path to the target VCF, or to a TSV listing the target VCF/BAM files
* - :code:`--vcf-ref`
- Reference panel file to use for imputation
* - :code:`--chromosomes`
- Chromosome(s) to run imputation for. Default is :code:`all`
* - :code:`--local`
- Type of service. Default is Service backend where jobs are executed on a multi-tenant compute cluster in Google Cloud
* - :code:`--billing-project`
37 changes: 35 additions & 2 deletions docs/_build/html/_sources/tutorial.rst.txt
@@ -9,13 +9,21 @@ This is a short tutorial on how to use the different modules of GWASpy.
1. Datasets
###########

We will be using simulated test data (on GRCh37) from RICOPILI. Below is how it can be downloaded and copied to a Google bucket
We will be using simulated test data (on GRCh37) from RICOPILI for most of the examples. It can be downloaded and copied to a Google bucket as follows:

.. code-block:: sh
wget https://personal.broadinstitute.org/sawasthi/share_links/UzoZK7Yfd7nTzIxHamCh1rSOiIOSdj_gwas-qcerrors.py/sim_sim1a_eur_sa_merge.miss.{bed,bim,fam} .
gsutil cp sim_sim1a_eur_sa_merge.miss.{bed,bim,fam} gs://my-gcs/bucket/test_data
For low-coverage genotype imputation using GLIMPSE, we will be using the 1X downsampled NA12878 BAM file from the GLIMPSE
tutorial. It can be downloaded and copied to a Google bucket as follows:

.. code-block:: sh
wget https://github.com/odelaneau/GLIMPSE/raw/refs/heads/master/tutorial/NA12878_1x_bam/NA12878.{bam,bam.bai} .
gsutil cp NA12878.{bam,bam.bai} gs://my-gcs/bucket/test_data
2. Start a dataproc cluster with GWASpy installed
#################################################

@@ -164,7 +172,32 @@ Now you can easily run both phasing and imputation using the following command
./nextflow run main.nf -c nextflow.config -profile gbatch -params-file params.json
5. Low-coverage WGS imputation using GLIMPSE
6. Low-coverage WGS imputation using GLIMPSE
############################################
**6.1 Hail Batch** (should cost ~$0.50 and take <10 minutes)
Unlike phasing using IMPUTE5, GLIMPSE takes BAM files as input. Since there is usually one BAM file per sample, the
input to the imputation module when using GLIMPSE is a headerless TSV file with two columns: the sample ID in the first
column and the path to the BAM file in the second. Only one sample/BAM per row is allowed in the TSV.
Below is an example of a file saved as :code:`gs://my-gcs/bucket/test_data/na12878_test.tsv`:
.. list-table::
:widths: 15 50
:header-rows: 0
* - NA12878
- gs://my-gcs/bucket/test_data/NA12878.bam
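For more than one sample, the TSV can be generated rather than written by hand. The sketch below is one possible way to do this in shell, not part of GWASpy itself. It assumes the sample ID is the BAM file name without its :code:`.bam` extension, and the second path (:code:`SAMPLE2.bam`) and the output name :code:`samples.tsv` are hypothetical placeholders. It writes one sample per row with no header, then checks that every row has exactly two tab-separated columns.

```shell
# Hypothetical list of BAM paths; replace with your own.
bams="gs://my-gcs/bucket/test_data/NA12878.bam
gs://my-gcs/bucket/test_data/SAMPLE2.bam"

tsv=samples.tsv
: > "$tsv"   # start from an empty file (no header row)

for bam in $bams; do
    # Assumption: the sample ID is the file name without the .bam suffix
    sample=$(basename "$bam" .bam)
    printf '%s\t%s\n' "$sample" "$bam" >> "$tsv"
done

# Exit non-zero if any row does not have exactly two tab-separated columns
awk -F'\t' 'NF != 2 { print "bad row " NR; bad = 1 } END { exit bad }' "$tsv"

# Upload once the file looks right (requires gsutil and a real bucket):
# gsutil cp "$tsv" gs://my-gcs/bucket/test_data/samples.tsv
```

The upload line is left commented out so the check can be run locally before anything is copied to the bucket.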
Once you have saved the TSV to a bucket, you can run GLIMPSE phasing and imputation using the following command:
.. code-block:: sh
imputation --input-file gs://my-gcs/bucket/test_data/na12878_test.tsv --vcf-ref hgdp1kgp \
--output-filename sim_sim1a_eur_sa_merge.miss_qced.phased.imputed \
--out-dir gs://my-gcs/bucket/test_data/GWASpy/lowcov_imputation --n-samples 1 --n-ref-samples 4091 \
--billing-project my-billing-project --chromosomes 22 --software glimpse2
**6.2 Nextflow**
**COMING VERY SOON**
19 changes: 11 additions & 8 deletions docs/_build/html/imputation.html
@@ -122,30 +122,33 @@ <h2>Arguments and options<a class="headerlink" href="#arguments-and-options" tit
</thead>
<tbody>
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--input-file</span></code></p></td>
<td><p>Path to where the VCF for target genotypes paths is</p></td>
<td><p>Path to the target VCF, or to a TSV listing the target VCF/BAM files</p></td>
</tr>
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--vcf-ref</span></code></p></td>
<td><p>Reference panel file to use for imputation</p></td>
</tr>
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--local</span></code></p></td>
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--chromosomes</span></code></p></td>
<td><p>Chromosome(s) to run imputation for. Default is <code class="code docutils literal notranslate"><span class="pre">all</span></code></p></td>
</tr>
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--local</span></code></p></td>
<td><p>Type of service. Default is Service backend where jobs are executed on a multi-tenant compute cluster in Google Cloud</p></td>
</tr>
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--billing-project</span></code></p></td>
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--billing-project</span></code></p></td>
<td><p>Billing project to be used for the jobs</p></td>
</tr>
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--n-samples</span></code></p></td>
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--n-samples</span></code></p></td>
<td><p>Number of target samples to be imputed. We use this to estimate resources for some of the jobs</p></td>
</tr>
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--n-ref-samples</span></code></p></td>
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--n-ref-samples</span></code></p></td>
<td><p>Number of reference samples. We use this to estimate resources for some of the jobs</p></td>
</tr>
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--software</span></code></p></td>
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--software</span></code></p></td>
<td><p>Software to use for phasing. Options: [<code class="code docutils literal notranslate"><span class="pre">beagle5</span></code>, <code class="code docutils literal notranslate"><span class="pre">impute5</span></code>]. Default is <code class="code docutils literal notranslate"><span class="pre">impute5</span></code></p></td>
</tr>
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--output-filename</span></code></p></td>
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--output-filename</span></code></p></td>
<td><p>Output filename without file extension</p></td>
</tr>
<tr class="row-even"><td><p><code class="code docutils literal notranslate"><span class="pre">--out-dir</span></code></p></td>
<tr class="row-odd"><td><p><code class="code docutils literal notranslate"><span class="pre">--out-dir</span></code></p></td>
<td><p>Path to where output files will be saved</p></td>
</tr>
</tbody>
2 changes: 1 addition & 1 deletion docs/_build/html/index.html
@@ -124,7 +124,7 @@ <h1>Contents<a class="headerlink" href="#contents" title="Link to this heading">
<li class="toctree-l2"><a class="reference internal" href="tutorial.html#pre-imputation-qc">3. Pre-imputation QC</a></li>
<li class="toctree-l2"><a class="reference internal" href="tutorial.html#pca">4. PCA</a></li>
<li class="toctree-l2"><a class="reference internal" href="tutorial.html#phasing-and-imputation">5. Phasing and Imputation</a></li>
<li class="toctree-l2"><a class="reference internal" href="tutorial.html#low-coverage-wgs-imputation-using-glimpse">5. Low-coverage WGS imputation using GLIMPSE</a></li>
<li class="toctree-l2"><a class="reference internal" href="tutorial.html#low-coverage-wgs-imputation-using-glimpse">6. Low-coverage WGS imputation using GLIMPSE</a></li>
</ul>
</li>
</ul>