Commit

Merge pull request #122 from pepkit/dev
Release v0.12.4
khoroshevskyi authored Aug 1, 2023
2 parents 7824861 + 73946b7 commit 109b10e
Showing 12 changed files with 764 additions and 676 deletions.
4 changes: 4 additions & 0 deletions docs/changelog.md
@@ -1,5 +1,9 @@
# Changelog

## [0.12.4] -- 2023-08-01
- Fixed SRA convert
- Added how to convert SRA

## [0.12.3] -- 2023-06-21
- Fixed preserving order of project keys (#119)

2 changes: 1 addition & 1 deletion docs/sra_convert.md
@@ -15,4 +15,4 @@ This effectively makes it easier to interact with *project-level* management of

## Tutorial

See the [tutorial](raw-data-downloading.md) for an example of how to use `sraconvert`.
See the [how-to](how_to_convert_fastq_from_sra.md) for an example of how to use `sraconvert`.
5 changes: 5 additions & 0 deletions docs_jupyter/build/processed-data-downloading.md
@@ -24,6 +24,11 @@ Calling geofetch will do 4 tasks:

Complete details about geofetch outputs are cataloged in the [metadata outputs reference](metadata_output.md).

```python
from IPython.core.display import SVG
SVG(filename='logo.svg')
```

![arguments_outputs.svg](attachment:arguments_outputs.svg)

## Download the data

First, create the metadata for processed data (by adding --processed and --just-metadata):
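
A minimal sketch of such a call, assuming the standard `geofetch` CLI (the accession below is a placeholder, not the one used in this tutorial):

```bash
# Sketch only: fetch metadata for processed data without downloading the data files.
# Replace GSEXXXXXX with a real GEO accession.
geofetch -i GSEXXXXXX --processed --just-metadata
```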
307 changes: 0 additions & 307 deletions docs_jupyter/build/raw-data-downloading.md
@@ -382,313 +382,6 @@ Writing: /home/bnt4me/Virginia/repos/geof2/geofetch/docs_jupyter/red_algae/GSE67
```

## Convert to fastq format

Now the `.sra` files have been downloaded. The project that geofetch created automatically includes an amendment for SRA file conversion. This project expects an environment variable called `SRARAW` that points to the location where `prefetch` stores your `.sra` files. We should also define an `SRAFQ` variable that points to where we want the fastq files stored. In the command below, we set these on the fly, but you can also just use globals.
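
If you'd rather set these as globals, a minimal sketch (the paths are examples; point them at your own `prefetch` cache and desired fastq folder):

```bash
# Example only: define the variables once (e.g., in ~/.bashrc) instead of per command.
export SRARAW=${HOME}/ncbi/public/sra/   # where prefetch stores downloaded .sra files
export SRAFQ=${HOME}/red_algae/fastq     # where the converted fastq files should go
```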

We'll use `-d` first to do a dry run:


```bash
SRARAW=${HOME}/ncbi/public/sra/ SRAFQ=red_algae/fastq \
looper run red_algae/red_algae_config.yaml -a sra_convert -p local -d
```

```.output
Looper version: 1.2.0-dev
Command: run
Using amendments: sra_convert
Activating compute package 'local'
## [1 of 4] sample: Cm_BlueLight_Rep1; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_BlueLight_Rep1.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_BlueLight_Rep1.sub
Dry run, not submitted
## [2 of 4] sample: Cm_BlueLight_Rep2; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_BlueLight_Rep2.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_BlueLight_Rep2.sub
Dry run, not submitted
## [3 of 4] sample: Cm_Darkness_Rep1; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_Darkness_Rep1.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_Darkness_Rep1.sub
Dry run, not submitted
## [4 of 4] sample: Cm_Darkness_Rep2; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_Darkness_Rep2.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_Darkness_Rep2.sub
Dry run, not submitted
Looper finished
Samples valid for job generation: 4 of 4
Commands submitted: 4 of 4
Jobs submitted: 4
Dry run. No jobs were actually submitted.
```

And now the real thing:


```bash
SRARAW=${HOME}/ncbi/public/sra/ SRAFQ=red_algae/fastq \
looper run red_algae/red_algae_config.yaml -a sra_convert -p local \
--command-extra=--keep-sra
```

```.output
Looper version: 1.2.0-dev
Command: run
Using amendments: sra_convert
Activating compute package 'local'
## [1 of 4] sample: Cm_BlueLight_Rep1; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_BlueLight_Rep1.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_BlueLight_Rep1.sub
Compute node: zither
Start time: 2020-05-21 17:40:56
Using outfolder: red_algae/results_pipeline/SRX969073
### Pipeline run code and environment:
* Command: `/home/nsheff/.local/bin/sraconvert --srr /home/nsheff/ncbi/public/sra//SRR1930183.sra --sample-name SRX969073 -O red_algae/results_pipeline --keep-sra`
* Compute host: zither
* Working dir: /home/nsheff/code/geofetch/docs_jupyter
* Outfolder: red_algae/results_pipeline/SRX969073/
* Pipeline started at: (05-21 17:40:57) elapsed: 0.0 _TIME_
### Version log:
* Python version: 3.7.5
* Pypiper dir: `/home/nsheff/.local/lib/python3.7/site-packages/pypiper`
* Pypiper version: 0.12.1
* Pipeline dir: `/home/nsheff/.local/bin`
* Pipeline version: None
### Arguments passed to pipeline:
* `bamfolder`: ``
* `config_file`: `sraconvert.yaml`
* `format`: `fastq`
* `fqfolder`: `red_algae/fastq`
* `keep_sra`: `True`
* `logdev`: `False`
* `mode`: `convert`
* `output_parent`: `red_algae/results_pipeline`
* `recover`: `False`
* `sample_name`: `['SRX969073']`
* `silent`: `False`
* `srafolder`: `/home/nsheff/ncbi/public/sra/`
* `srr`: `['/home/nsheff/ncbi/public/sra//SRR1930183.sra']`
* `verbosity`: `None`
----------------------------------------
Processing 1 of 1 files: SRR1930183
Target to produce: `red_algae/fastq/SRR1930183_1.fastq.gz`
> `fastq-dump /home/nsheff/ncbi/public/sra//SRR1930183.sra --split-files --gzip -O red_algae/fastq` (9436)
<pre>
Read 1068319 spots for /home/nsheff/ncbi/public/sra//SRR1930183.sra
Written 1068319 spots for /home/nsheff/ncbi/public/sra//SRR1930183.sra
</pre>
Command completed. Elapsed time: 0:00:38. Running peak memory: 0.067GB.
PID: 9436; Command: fastq-dump; Return code: 0; Memory used: 0.067GB
Already completed files: []
### Pipeline completed. Epilogue
* Elapsed time (this run): 0:00:38
* Total elapsed time (all runs): 0:00:38
* Peak memory (this run): 0.0666 GB
* Pipeline completed time: 2020-05-21 17:41:35
## [2 of 4] sample: Cm_BlueLight_Rep2; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_BlueLight_Rep2.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_BlueLight_Rep2.sub
Compute node: zither
Start time: 2020-05-21 17:41:36
Using outfolder: red_algae/results_pipeline/SRX969074
### Pipeline run code and environment:
* Command: `/home/nsheff/.local/bin/sraconvert --srr /home/nsheff/ncbi/public/sra//SRR1930184.sra --sample-name SRX969074 -O red_algae/results_pipeline --keep-sra`
* Compute host: zither
* Working dir: /home/nsheff/code/geofetch/docs_jupyter
* Outfolder: red_algae/results_pipeline/SRX969074/
* Pipeline started at: (05-21 17:41:36) elapsed: 0.0 _TIME_
### Version log:
* Python version: 3.7.5
* Pypiper dir: `/home/nsheff/.local/lib/python3.7/site-packages/pypiper`
* Pypiper version: 0.12.1
* Pipeline dir: `/home/nsheff/.local/bin`
* Pipeline version: None
### Arguments passed to pipeline:
* `bamfolder`: ``
* `config_file`: `sraconvert.yaml`
* `format`: `fastq`
* `fqfolder`: `red_algae/fastq`
* `keep_sra`: `True`
* `logdev`: `False`
* `mode`: `convert`
* `output_parent`: `red_algae/results_pipeline`
* `recover`: `False`
* `sample_name`: `['SRX969074']`
* `silent`: `False`
* `srafolder`: `/home/nsheff/ncbi/public/sra/`
* `srr`: `['/home/nsheff/ncbi/public/sra//SRR1930184.sra']`
* `verbosity`: `None`
----------------------------------------
Processing 1 of 1 files: SRR1930184
Target exists: `red_algae/fastq/SRR1930184_1.fastq.gz`
Already completed files: []
### Pipeline completed. Epilogue
* Elapsed time (this run): 0:00:00
* Total elapsed time (all runs): 0:00:00
* Peak memory (this run): 0 GB
* Pipeline completed time: 2020-05-21 17:41:36
## [3 of 4] sample: Cm_Darkness_Rep1; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_Darkness_Rep1.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_Darkness_Rep1.sub
Compute node: zither
Start time: 2020-05-21 17:41:36
Using outfolder: red_algae/results_pipeline/SRX969075
### Pipeline run code and environment:
* Command: `/home/nsheff/.local/bin/sraconvert --srr /home/nsheff/ncbi/public/sra//SRR1930185.sra --sample-name SRX969075 -O red_algae/results_pipeline --keep-sra`
* Compute host: zither
* Working dir: /home/nsheff/code/geofetch/docs_jupyter
* Outfolder: red_algae/results_pipeline/SRX969075/
* Pipeline started at: (05-21 17:41:36) elapsed: 0.0 _TIME_
### Version log:
* Python version: 3.7.5
* Pypiper dir: `/home/nsheff/.local/lib/python3.7/site-packages/pypiper`
* Pypiper version: 0.12.1
* Pipeline dir: `/home/nsheff/.local/bin`
* Pipeline version: None
### Arguments passed to pipeline:
* `bamfolder`: ``
* `config_file`: `sraconvert.yaml`
* `format`: `fastq`
* `fqfolder`: `red_algae/fastq`
* `keep_sra`: `True`
* `logdev`: `False`
* `mode`: `convert`
* `output_parent`: `red_algae/results_pipeline`
* `recover`: `False`
* `sample_name`: `['SRX969075']`
* `silent`: `False`
* `srafolder`: `/home/nsheff/ncbi/public/sra/`
* `srr`: `['/home/nsheff/ncbi/public/sra//SRR1930185.sra']`
* `verbosity`: `None`
----------------------------------------
Processing 1 of 1 files: SRR1930185
Target to produce: `red_algae/fastq/SRR1930185_1.fastq.gz`
> `fastq-dump /home/nsheff/ncbi/public/sra//SRR1930185.sra --split-files --gzip -O red_algae/fastq` (9607)
<pre>
Read 1707508 spots for /home/nsheff/ncbi/public/sra//SRR1930185.sra
Written 1707508 spots for /home/nsheff/ncbi/public/sra//SRR1930185.sra
</pre>
Command completed. Elapsed time: 0:01:01. Running peak memory: 0.066GB.
PID: 9607; Command: fastq-dump; Return code: 0; Memory used: 0.066GB
Already completed files: []
### Pipeline completed. Epilogue
* Elapsed time (this run): 0:01:01
* Total elapsed time (all runs): 0:01:01
* Peak memory (this run): 0.0656 GB
* Pipeline completed time: 2020-05-21 17:42:37
## [4 of 4] sample: Cm_Darkness_Rep2; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_Darkness_Rep2.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_Darkness_Rep2.sub
Compute node: zither
Start time: 2020-05-21 17:42:38
Using outfolder: red_algae/results_pipeline/SRX969076
### Pipeline run code and environment:
* Command: `/home/nsheff/.local/bin/sraconvert --srr /home/nsheff/ncbi/public/sra//SRR1930186.sra --sample-name SRX969076 -O red_algae/results_pipeline --keep-sra`
* Compute host: zither
* Working dir: /home/nsheff/code/geofetch/docs_jupyter
* Outfolder: red_algae/results_pipeline/SRX969076/
* Pipeline started at: (05-21 17:42:38) elapsed: 0.0 _TIME_
### Version log:
* Python version: 3.7.5
* Pypiper dir: `/home/nsheff/.local/lib/python3.7/site-packages/pypiper`
* Pypiper version: 0.12.1
* Pipeline dir: `/home/nsheff/.local/bin`
* Pipeline version: None
### Arguments passed to pipeline:
* `bamfolder`: ``
* `config_file`: `sraconvert.yaml`
* `format`: `fastq`
* `fqfolder`: `red_algae/fastq`
* `keep_sra`: `True`
* `logdev`: `False`
* `mode`: `convert`
* `output_parent`: `red_algae/results_pipeline`
* `recover`: `False`
* `sample_name`: `['SRX969076']`
* `silent`: `False`
* `srafolder`: `/home/nsheff/ncbi/public/sra/`
* `srr`: `['/home/nsheff/ncbi/public/sra//SRR1930186.sra']`
* `verbosity`: `None`
----------------------------------------
Processing 1 of 1 files: SRR1930186
Target to produce: `red_algae/fastq/SRR1930186_1.fastq.gz`
> `fastq-dump /home/nsheff/ncbi/public/sra//SRR1930186.sra --split-files --gzip -O red_algae/fastq` (9780)
<pre>
Read 1224029 spots for /home/nsheff/ncbi/public/sra//SRR1930186.sra
Written 1224029 spots for /home/nsheff/ncbi/public/sra//SRR1930186.sra
</pre>
Command completed. Elapsed time: 0:00:44. Running peak memory: 0.067GB.
PID: 9780; Command: fastq-dump; Return code: 0; Memory used: 0.067GB
Already completed files: []
### Pipeline completed. Epilogue
* Elapsed time (this run): 0:00:44
* Total elapsed time (all runs): 0:00:44
* Peak memory (this run): 0.0673 GB
* Pipeline completed time: 2020-05-21 17:43:22
Looper finished
Samples valid for job generation: 4 of 4
Commands submitted: 4 of 4
Jobs submitted: 4
```

Now that the conversion is done, let's take a look in the `red_algae/fastq` folder (where we pointed the `$SRAFQ` variable).


```bash
ls red_algae/fastq
```

```.output
SRR1930183_1.fastq.gz SRR1930184_2.fastq.gz SRR1930186_1.fastq.gz
SRR1930183_2.fastq.gz SRR1930185_1.fastq.gz SRR1930186_2.fastq.gz
SRR1930184_1.fastq.gz SRR1930185_2.fastq.gz
```
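
For a quick sanity check, you can peek at the first read record of one of the converted files (assuming `zcat` is available):

```bash
# Optional check: print the first read (4 lines) of one converted fastq file.
zcat red_algae/fastq/SRR1930183_1.fastq.gz | head -4
```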

By default, the SRA conversion script deletes the `.sra` files after they have been converted to fastq. To keep them, pass `--keep-sra` to the pipeline, which you can do by adding `--command-extra=--keep-sra` to your `looper run` command.
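
For comparison, a sketch of the same run with the default behavior (no `--command-extra`), which removes the `.sra` files once conversion succeeds:

```bash
# Default behavior: the .sra files are deleted after successful conversion.
SRARAW=${HOME}/ncbi/public/sra/ SRAFQ=red_algae/fastq \
looper run red_algae/red_algae_config.yaml -a sra_convert -p local
```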


## Finalize the project config and sample annotation

