Commit

Merge pull request #122 from pepkit/dev
Release v0.12.4
khoroshevskyi authored Aug 1, 2023
2 parents 7824861 + 73946b7 commit 109b10e
Showing 12 changed files with 764 additions and 676 deletions.
4 changes: 4 additions & 0 deletions docs/changelog.md
@@ -1,5 +1,9 @@
# Changelog

## [0.12.4] -- 2023-08-01
- Fixed SRA convert
- Added how to convert SRA

## [0.12.3] -- 2023-06-21
- Fixed preserving order of project keys (#119)

2 changes: 1 addition & 1 deletion docs/sra_convert.md
@@ -15,4 +15,4 @@ This effectively makes it easier to interact with *project-level* management of

## Tutorial

See the [tutorial](raw-data-downloading.md) for an example of how to use `sraconvert`.
See the [how-to](how_to_convert_fastq_from_sra.md) for an example of how to use `sraconvert`.
5 changes: 5 additions & 0 deletions docs_jupyter/build/processed-data-downloading.md
@@ -24,6 +24,11 @@ Calling geofetch will do 4 tasks:

Complete details about geofetch outputs are cataloged in the [metadata outputs reference](metadata_output.md).

```python
from IPython.core.display import SVG
SVG(filename='logo.svg')
```

![arguments_outputs.svg](attachment:arguments_outputs.svg)

## Download the data

First, create the metadata for processed data (by adding --processed and --just-metadata):
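
A minimal sketch of such a call, assuming the standard `geofetch` CLI (the accession below is a placeholder, not the one used in this tutorial):

```bash
# Sketch only: fetch metadata for processed data without downloading the data files.
# Replace GSEXXXXXX with a real GEO accession.
geofetch -i GSEXXXXXX --processed --just-metadata
```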
307 changes: 0 additions & 307 deletions docs_jupyter/build/raw-data-downloading.md
@@ -382,313 +382,6 @@ Writing: /home/bnt4me/Virginia/repos/geof2/geofetch/docs_jupyter/red_algae/GSE67
```

## Convert to fastq format

Now the `.sra` files have been downloaded. The project that geofetch created automatically includes an amendment for SRA file conversion. This project expects an environment variable called `SRARAW` that points to the location where `prefetch` stores your `.sra` files. We should also define an `SRAFQ` variable that points to where we want the fastq files stored. In the command below, we set these on the fly, but you can also just use globals.
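
If you'd rather set these as globals, a minimal sketch (the paths are examples; point them at your own `prefetch` cache and desired fastq folder):

```bash
# Example only: define the variables once (e.g., in ~/.bashrc) instead of per command.
export SRARAW=${HOME}/ncbi/public/sra/   # where prefetch stores downloaded .sra files
export SRAFQ=${HOME}/red_algae/fastq     # where the converted fastq files should go
```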

We'll use `-d` first to do a dry run:


```bash
SRARAW=${HOME}/ncbi/public/sra/ SRAFQ=red_algae/fastq \
looper run red_algae/red_algae_config.yaml -a sra_convert -p local -d
```

```.output
Looper version: 1.2.0-dev
Command: run
Using amendments: sra_convert
Activating compute package 'local'
## [1 of 4] sample: Cm_BlueLight_Rep1; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_BlueLight_Rep1.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_BlueLight_Rep1.sub
Dry run, not submitted
## [2 of 4] sample: Cm_BlueLight_Rep2; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_BlueLight_Rep2.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_BlueLight_Rep2.sub
Dry run, not submitted
## [3 of 4] sample: Cm_Darkness_Rep1; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_Darkness_Rep1.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_Darkness_Rep1.sub
Dry run, not submitted
## [4 of 4] sample: Cm_Darkness_Rep2; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_Darkness_Rep2.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_Darkness_Rep2.sub
Dry run, not submitted
Looper finished
Samples valid for job generation: 4 of 4
Commands submitted: 4 of 4
Jobs submitted: 4
Dry run. No jobs were actually submitted.
```

And now the real thing:


```bash
SRARAW=${HOME}/ncbi/public/sra/ SRAFQ=red_algae/fastq \
looper run red_algae/red_algae_config.yaml -a sra_convert -p local \
--command-extra=--keep-sra
```

```.output
Looper version: 1.2.0-dev
Command: run
Using amendments: sra_convert
Activating compute package 'local'
## [1 of 4] sample: Cm_BlueLight_Rep1; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_BlueLight_Rep1.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_BlueLight_Rep1.sub
Compute node: zither
Start time: 2020-05-21 17:40:56
Using outfolder: red_algae/results_pipeline/SRX969073
### Pipeline run code and environment:
* Command: `/home/nsheff/.local/bin/sraconvert --srr /home/nsheff/ncbi/public/sra//SRR1930183.sra --sample-name SRX969073 -O red_algae/results_pipeline --keep-sra`
* Compute host: zither
* Working dir: /home/nsheff/code/geofetch/docs_jupyter
* Outfolder: red_algae/results_pipeline/SRX969073/
* Pipeline started at: (05-21 17:40:57) elapsed: 0.0 _TIME_
### Version log:
* Python version: 3.7.5
* Pypiper dir: `/home/nsheff/.local/lib/python3.7/site-packages/pypiper`
* Pypiper version: 0.12.1
* Pipeline dir: `/home/nsheff/.local/bin`
* Pipeline version: None
### Arguments passed to pipeline:
* `bamfolder`: ``
* `config_file`: `sraconvert.yaml`
* `format`: `fastq`
* `fqfolder`: `red_algae/fastq`
* `keep_sra`: `True`
* `logdev`: `False`
* `mode`: `convert`
* `output_parent`: `red_algae/results_pipeline`
* `recover`: `False`
* `sample_name`: `['SRX969073']`
* `silent`: `False`
* `srafolder`: `/home/nsheff/ncbi/public/sra/`
* `srr`: `['/home/nsheff/ncbi/public/sra//SRR1930183.sra']`
* `verbosity`: `None`
----------------------------------------
Processing 1 of 1 files: SRR1930183
Target to produce: `red_algae/fastq/SRR1930183_1.fastq.gz`
> `fastq-dump /home/nsheff/ncbi/public/sra//SRR1930183.sra --split-files --gzip -O red_algae/fastq` (9436)
<pre>
Read 1068319 spots for /home/nsheff/ncbi/public/sra//SRR1930183.sra
Written 1068319 spots for /home/nsheff/ncbi/public/sra//SRR1930183.sra
</pre>
Command completed. Elapsed time: 0:00:38. Running peak memory: 0.067GB.
PID: 9436; Command: fastq-dump; Return code: 0; Memory used: 0.067GB
Already completed files: []
### Pipeline completed. Epilogue
* Elapsed time (this run): 0:00:38
* Total elapsed time (all runs): 0:00:38
* Peak memory (this run): 0.0666 GB
* Pipeline completed time: 2020-05-21 17:41:35
## [2 of 4] sample: Cm_BlueLight_Rep2; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_BlueLight_Rep2.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_BlueLight_Rep2.sub
Compute node: zither
Start time: 2020-05-21 17:41:36
Using outfolder: red_algae/results_pipeline/SRX969074
### Pipeline run code and environment:
* Command: `/home/nsheff/.local/bin/sraconvert --srr /home/nsheff/ncbi/public/sra//SRR1930184.sra --sample-name SRX969074 -O red_algae/results_pipeline --keep-sra`
* Compute host: zither
* Working dir: /home/nsheff/code/geofetch/docs_jupyter
* Outfolder: red_algae/results_pipeline/SRX969074/
* Pipeline started at: (05-21 17:41:36) elapsed: 0.0 _TIME_
### Version log:
* Python version: 3.7.5
* Pypiper dir: `/home/nsheff/.local/lib/python3.7/site-packages/pypiper`
* Pypiper version: 0.12.1
* Pipeline dir: `/home/nsheff/.local/bin`
* Pipeline version: None
### Arguments passed to pipeline:
* `bamfolder`: ``
* `config_file`: `sraconvert.yaml`
* `format`: `fastq`
* `fqfolder`: `red_algae/fastq`
* `keep_sra`: `True`
* `logdev`: `False`
* `mode`: `convert`
* `output_parent`: `red_algae/results_pipeline`
* `recover`: `False`
* `sample_name`: `['SRX969074']`
* `silent`: `False`
* `srafolder`: `/home/nsheff/ncbi/public/sra/`
* `srr`: `['/home/nsheff/ncbi/public/sra//SRR1930184.sra']`
* `verbosity`: `None`
----------------------------------------
Processing 1 of 1 files: SRR1930184
Target exists: `red_algae/fastq/SRR1930184_1.fastq.gz`
Already completed files: []
### Pipeline completed. Epilogue
* Elapsed time (this run): 0:00:00
* Total elapsed time (all runs): 0:00:00
* Peak memory (this run): 0 GB
* Pipeline completed time: 2020-05-21 17:41:36
## [3 of 4] sample: Cm_Darkness_Rep1; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_Darkness_Rep1.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_Darkness_Rep1.sub
Compute node: zither
Start time: 2020-05-21 17:41:36
Using outfolder: red_algae/results_pipeline/SRX969075
### Pipeline run code and environment:
* Command: `/home/nsheff/.local/bin/sraconvert --srr /home/nsheff/ncbi/public/sra//SRR1930185.sra --sample-name SRX969075 -O red_algae/results_pipeline --keep-sra`
* Compute host: zither
* Working dir: /home/nsheff/code/geofetch/docs_jupyter
* Outfolder: red_algae/results_pipeline/SRX969075/
* Pipeline started at: (05-21 17:41:36) elapsed: 0.0 _TIME_
### Version log:
* Python version: 3.7.5
* Pypiper dir: `/home/nsheff/.local/lib/python3.7/site-packages/pypiper`
* Pypiper version: 0.12.1
* Pipeline dir: `/home/nsheff/.local/bin`
* Pipeline version: None
### Arguments passed to pipeline:
* `bamfolder`: ``
* `config_file`: `sraconvert.yaml`
* `format`: `fastq`
* `fqfolder`: `red_algae/fastq`
* `keep_sra`: `True`
* `logdev`: `False`
* `mode`: `convert`
* `output_parent`: `red_algae/results_pipeline`
* `recover`: `False`
* `sample_name`: `['SRX969075']`
* `silent`: `False`
* `srafolder`: `/home/nsheff/ncbi/public/sra/`
* `srr`: `['/home/nsheff/ncbi/public/sra//SRR1930185.sra']`
* `verbosity`: `None`
----------------------------------------
Processing 1 of 1 files: SRR1930185
Target to produce: `red_algae/fastq/SRR1930185_1.fastq.gz`
> `fastq-dump /home/nsheff/ncbi/public/sra//SRR1930185.sra --split-files --gzip -O red_algae/fastq` (9607)
<pre>
Read 1707508 spots for /home/nsheff/ncbi/public/sra//SRR1930185.sra
Written 1707508 spots for /home/nsheff/ncbi/public/sra//SRR1930185.sra
</pre>
Command completed. Elapsed time: 0:01:01. Running peak memory: 0.066GB.
PID: 9607; Command: fastq-dump; Return code: 0; Memory used: 0.066GB
Already completed files: []
### Pipeline completed. Epilogue
* Elapsed time (this run): 0:01:01
* Total elapsed time (all runs): 0:01:01
* Peak memory (this run): 0.0656 GB
* Pipeline completed time: 2020-05-21 17:42:37
## [4 of 4] sample: Cm_Darkness_Rep2; pipeline: sra_convert
Writing script to /home/nsheff/code/geofetch/docs_jupyter/red_algae/submission/sra_convert_Cm_Darkness_Rep2.sub
Job script (n=1; 0.00Gb): red_algae/submission/sra_convert_Cm_Darkness_Rep2.sub
Compute node: zither
Start time: 2020-05-21 17:42:38
Using outfolder: red_algae/results_pipeline/SRX969076
### Pipeline run code and environment:
* Command: `/home/nsheff/.local/bin/sraconvert --srr /home/nsheff/ncbi/public/sra//SRR1930186.sra --sample-name SRX969076 -O red_algae/results_pipeline --keep-sra`
* Compute host: zither
* Working dir: /home/nsheff/code/geofetch/docs_jupyter
* Outfolder: red_algae/results_pipeline/SRX969076/
* Pipeline started at: (05-21 17:42:38) elapsed: 0.0 _TIME_
### Version log:
* Python version: 3.7.5
* Pypiper dir: `/home/nsheff/.local/lib/python3.7/site-packages/pypiper`
* Pypiper version: 0.12.1
* Pipeline dir: `/home/nsheff/.local/bin`
* Pipeline version: None
### Arguments passed to pipeline:
* `bamfolder`: ``
* `config_file`: `sraconvert.yaml`
* `format`: `fastq`
* `fqfolder`: `red_algae/fastq`
* `keep_sra`: `True`
* `logdev`: `False`
* `mode`: `convert`
* `output_parent`: `red_algae/results_pipeline`
* `recover`: `False`
* `sample_name`: `['SRX969076']`
* `silent`: `False`
* `srafolder`: `/home/nsheff/ncbi/public/sra/`
* `srr`: `['/home/nsheff/ncbi/public/sra//SRR1930186.sra']`
* `verbosity`: `None`
----------------------------------------
Processing 1 of 1 files: SRR1930186
Target to produce: `red_algae/fastq/SRR1930186_1.fastq.gz`
> `fastq-dump /home/nsheff/ncbi/public/sra//SRR1930186.sra --split-files --gzip -O red_algae/fastq` (9780)
<pre>
Read 1224029 spots for /home/nsheff/ncbi/public/sra//SRR1930186.sra
Written 1224029 spots for /home/nsheff/ncbi/public/sra//SRR1930186.sra
</pre>
Command completed. Elapsed time: 0:00:44. Running peak memory: 0.067GB.
PID: 9780; Command: fastq-dump; Return code: 0; Memory used: 0.067GB
Already completed files: []
### Pipeline completed. Epilogue
* Elapsed time (this run): 0:00:44
* Total elapsed time (all runs): 0:00:44
* Peak memory (this run): 0.0673 GB
* Pipeline completed time: 2020-05-21 17:43:22
Looper finished
Samples valid for job generation: 4 of 4
Commands submitted: 4 of 4
Jobs submitted: 4
```

Now that the conversion is done, let's take a look in the `red_algae/fastq` folder (where we pointed the `$SRAFQ` variable).


```bash
ls red_algae/fastq
```

```.output
SRR1930183_1.fastq.gz SRR1930184_2.fastq.gz SRR1930186_1.fastq.gz
SRR1930183_2.fastq.gz SRR1930185_1.fastq.gz SRR1930186_2.fastq.gz
SRR1930184_1.fastq.gz SRR1930185_2.fastq.gz
```
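
For a quick sanity check, you can peek at the first read record of one of the converted files (assuming `zcat` is available):

```bash
# Optional check: print the first read (4 lines) of one converted fastq file.
zcat red_algae/fastq/SRR1930183_1.fastq.gz | head -4
```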

By default, the SRA conversion script deletes the `.sra` files after they have been converted to fastq. To keep them, pass `--keep-sra` to the pipeline, which you can do by adding `--command-extra=--keep-sra` to your `looper run` command.
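
For comparison, a sketch of the same run with the default behavior (no `--command-extra`), which removes the `.sra` files once conversion succeeds:

```bash
# Default behavior: the .sra files are deleted after successful conversion.
SRARAW=${HOME}/ncbi/public/sra/ SRAFQ=red_algae/fastq \
looper run red_algae/red_algae_config.yaml -a sra_convert -p local
```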


## Finalize the project config and sample annotation

