From 8b214a442080eebb83a0983008f66721ff2812b1 Mon Sep 17 00:00:00 2001
From: "Ziyang \"Claude\" Hu" <33562602+ClaudeHu@users.noreply.github.com>
Date: Tue, 2 Apr 2024 16:24:03 -0400
Subject: [PATCH 1/7] Update text2bednn-search-interface.md
---
.../tutorials/text2bednn-search-interface.md | 51 ++++++++++++++-----
1 file changed, 37 insertions(+), 14 deletions(-)
diff --git a/docs/geniml/tutorials/text2bednn-search-interface.md b/docs/geniml/tutorials/text2bednn-search-interface.md
index 5c284ee..25913d6 100644
--- a/docs/geniml/tutorials/text2bednn-search-interface.md
+++ b/docs/geniml/tutorials/text2bednn-search-interface.md
@@ -10,7 +10,7 @@ file embedding vectors, and the BED files whose embedding vectors are closest to
## Store embedding vectors
It is recommended to use `geniml.search.backend.HNSWBackend` to store embedding vectors. In the `HNSWBackend` that stores each BED file embedding
vector, the `payload` should contain the name of BED file. In the `HNSWBackend` that stores the embedding vectors of each
-metadata string, the `payload` should contain the name of BED files that have that string in metadata.
+metadata string, the `payload` should contain the original text of the string and the names of the BED files whose metadata contains that string.
## Train the model
Training a `Vec2VecFNN` needs x-y pairs of vectors (x: metadata embedding vector; y: BED embedding vector). A pair of a metadata embedding
@@ -39,24 +39,45 @@ v2v_torch_contrast.train(
```
-## text2bednn search interface
-The `TextToBedNNSearchInterface` includes model that encode natural language to vectors (default: `FlagEmbedding`), a
-model that encode natural language embedding vectors to BED file embedding vectors (`Embed2EmbedNN`), and a `search` backend.
+## Search interface
+A search interface consists of a storage backend where vectors are stored, and a module (`geniml.search.query2vec`) that embeds the query.
+`geniml.search` supports two types of queries: region set queries and text queries.
+
+### Region set query
+
+`BED2Vec` embeds the query region set with a `Region2VecExModel`, and the resulting embedding vector is used to perform a KNN search within the backend.
```python
-from geniml.text2bednn.text2bednn import Text2BEDSearchInterface
+from geniml.search import BED2BEDSearchInterface, BED2Vec
+
+# initialize BED2Vec with the Hugging Face repo of a Region2VecExModel
+bed2vec = BED2Vec("databio/r2v-ChIP-atlas-hg38-v2")
+
+# the search_backend can be QdrantBackend or HNSWBackend
+search_interface = BED2BEDSearchInterface(search_backend, bed2vec)
+
+# the query can be a RegionSet object (see geniml.io) or the path to a BED file on disk
+file_search_result = search_interface.query_search("path/to/a/bed/file.bed", 5)
+```
+
+### Text query
-# initiate the search interface
-file_interface = Text2BEDSearchInterface(nl_model, e2enn, hnsw_backend)
+`Text2Vec` first embeds the query string with a natural language embedding model (default: `FlagEmbedding`), and then maps the text embedding vector into the embedding space of region sets through a trained `Vec2VecFNN`.
-# natural language query string
-query_term = "human, kidney, blood"
-# perform KNN search with K = 5, the id of stored vectors and the distance / similarity score will be returned
-ids, scores = file_interface.nl_vec_search(query_term, 5)
+```python
+from geniml.search import Text2BEDSearchInterface, Text2Vec
+
+text2vec = Text2Vec(
+ "sentence-transformers/all-MiniLM-L6-v2", # either a hugging face repo or an object from geniml.text2bednn.embedder
+ "databio/v2v-geo-hg38" # either a hugging face repo or a Vec2VecFNN
+)
+
+search_interface = Text2BEDSearchInterface(search_backend, text2vec)
+text_search_result = search_interface.query_search("cancer cells", 5)
```
-### Evaluate search performance
With a dictionary that contains query strings and id of relevant query results in search backend in this format:
+
```
{
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 24312; Command: zcat; Return code: 0; Memory used: 0.003GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db1/33xf84g5 ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db1/33xf84g5) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. - PID: 24344; Command: zcat; Return code: 0; Memory used: 0.0GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db2/lypwq5fe ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db2/lypwq5fe) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE105977_ENCFF617QGK_optimal_idr_thresholded_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. - PID: 24374; Command: zcat; Return code: 0; Memory used: 0.0GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db3/_5zvvg7p ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db3/_5zvvg7p) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE105977_ENCFF793SZW_conservative_idr_thresholded_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 24404; Command: zcat; Return code: 0; Memory used: 0.003GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db4/gig106fd ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db4/gig106fd) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE105977_ENCFF937CGY_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 24435; Command: zcat; Return code: 0; Memory used: 0.003GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db5/ix1s2r3k ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db5/ix1s2r3k) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE91663_ENCFF316ASR_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 24466; Command: zcat; Return code: 0; Memory used: 0.003GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db6/jrhj1l5n ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db6/jrhj1l5n) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE91663_ENCFF319TPR_conservative_idr_thresholded_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 24496; Command: zcat; Return code: 0; Memory used: 0.003GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db7/9r0q9410 ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db7/9r0q9410) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE91663_ENCFF553KIK_optimal_idr_thresholded_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 24527; Command: zcat; Return code: 0; Memory used: 0.003GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db8/ny2pxb01 ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db8/ny2pxb01) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSM2423312_ENCFF155HVK_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 24559; Command: zcat; Return code: 0; Memory used: 0.003GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedhost_demo_db9/h6i4w9_0 ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedhost_demo_db9/h6i4w9_0) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSM2423313_ENCFF722AOG_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 24590; Command: zcat; Return code: 0; Memory used: 0.003GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db10/l3b3cyqx ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db10/l3b3cyqx) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSM2827349_ENCFF196DNQ_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 24621; Command: zcat; Return code: 0; Memory used: 0.003GB - - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db11/2pfkxwx0 ` -File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db11/2pfkxwx0) has passed Quality Control! -Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSM2827350_ENCFF928JXU_peaks_GRCh38.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml. 
-Initialize DBBackend -/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring "_pipeline_name" - return create_model( -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py", line 689, in _engine -Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml - return self.db_engine_key -AttributeError: 'DBBackend' object has no attribute 'db_engine_key' - -During handling of the above exception, another exception occurred: - -Traceback (most recent call last): - File "/home/bnt4me/virginia/venv/jupyter/bin/bedboss", line 8, in
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. - PID: 2477650; Command: cp; Return code: 0; Memory used: 0.0GB - - -> `gzip ./bed/hg19_example1.bed ` (2477652) -
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. - PID: 2477652; Command: gzip; Return code: 0; Memory used: 0.0GB - -Running bedqc... -Target to produce: `./bed/bedmaker_logs/test_bed/xl67fcgi` - -> `zcat ./bed/hg19_example1.bed.gz > ./bed/bedmaker_logs/test_bed/xl67fcgi` (2477654) -
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. - PID: 2477654; Command: zcat; Return code: 0; Memory used: 0.0GB - -Targetless command, running... - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh ./bed/bedmaker_logs/test_bed/xl67fcgi ` (2477656) -
-1000-Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. - PID: 2477656; Command: bash; Return code: 0; Memory used: 0.0GB - -Starting cleanup: 1 files; 0 conditional files for cleanup - -Cleaning up flagged intermediate files. . . - -### Pipeline completed. Epilogue -* Elapsed time (this run): 0:00:00 -* Total elapsed time (all runs): 0:00:00 -* Peak memory (this run): 0 GB -* Pipeline completed time: 2023-02-08 15:39:09 -Generating bigBed files for: ../test/data/bed/hg19/correct/hg19_example1.bed -Determining path to chrom.sizes asset via Refgenie. -Creating refgenie genome config file... -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/alias/hg19/fasta/default/hg19.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/alias/hg19/fasta/default/hg19.chrom.sizes -Target to produce: `./bigbed/jckj3p1d` - -> `zcat ./bed/hg19_example1.bed.gz | sort -k1,1 -k2,2n > ./bigbed/jckj3p1d` (2477666,2477667) -
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. - PID: 2477666; Command: zcat; Return code: 0; Memory used: 0.0GB - PID: 2477667; Command: sort; Return code: 0; Memory used: 0.0GB - -Running: bedToBigBed -type=bed6+3 ./bigbed/jckj3p1d /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/alias/hg19/fasta/default/hg19.chrom.sizes ./bigbed/hg19_example1.bigBed -Target to produce: `./bigbed/hg19_example1.bigBed` - -> `bedToBigBed -type=bed6+3 ./bigbed/jckj3p1d /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/alias/hg19/fasta/default/hg19.chrom.sizes ./bigbed/hg19_example1.bigBed` (2477669) -
-pass1 - making usageList (1 chroms): 1 millis -pass2 - checking and writing primary data (175 records, 9 fields): 0 millis --Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. - PID: 2477669; Command: bedToBigBed; Return code: 0; Memory used: 0.0GB - -Starting cleanup: 2 files; 0 conditional files for cleanup - -Cleaning up flagged intermediate files. . . - -### Pipeline completed. Epilogue -* Elapsed time (this run): 0:00:00 -* Total elapsed time (all runs): 0:00:00 -* Peak memory (this run): 0 GB -* Pipeline completed time: 2023-02-08 15:39:09 - -``` - -### Let's check if bed file was created (or copied) - - -```bash -ls bed -``` - -```.output -bedmaker_logs hg19_example1.bed.gz - -``` - -### Let's check if bigbed file was created - - -```bash -ls bigbed -``` - -```.output -hg19_example1.bigBed - -``` - -### everything was finished successfuly and files are ready for further analysis! diff --git a/docs/bedboss/code/bedqc-tutorial.md b/docs/bedboss/code/bedqc-tutorial.md deleted file mode 100644 index e21dd71..0000000 --- a/docs/bedboss/code/bedqc-tutorial.md +++ /dev/null @@ -1,71 +0,0 @@ -jupyter:True -# bedqc tutorial - -To check Quality of bed file use this command: `badboss qc` - - -```bash -bedboss qc --help -``` - -```.output -usage: bedboss qc [-h] --bedfile BEDFILE --outfolder OUTFOLDER - -options: - -h, --help show this help message and exit - --bedfile BEDFILE a full path to bed file to process - --outfolder OUTFOLDER - a full path to output log folder. - -``` - -bedqc example: - - -```bash -bedboss qc --bedfile ../test/data/bed/hg19/correct/hg19_example1.bed --outfolder . -``` - -```.output -Running bedqc... 
-### Pipeline run code and environment: - -* Command: `/home/bnt4me/virginia/venv/jupyter/bin/bedboss qc --bedfile ../test/data/bed/hg19/correct/hg19_example1.bed --outfolder .` -* Compute host: bnt4me-Precision-5560 -* Working dir: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter -* Outfolder: ./ -* Pipeline started at: (02-08 15:44:57) elapsed: 0.0 _TIME_ - -### Version log: - -* Python version: 3.10.6 -* Pypiper dir: `/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pypiper` -* Pypiper version: 0.12.3 -* Pipeline dir: `/home/bnt4me/virginia/venv/jupyter/bin` -* Pipeline version: None - -### Arguments passed to pipeline: - - ----------------------------------------- - -Target exists: `../test/data/bed/hg19/correct/hg19_example1.bed` -Targetless command, running... - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh ../test/data/bed/hg19/correct/hg19_example1.bed ` (2478311) -
-1000-Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. - PID: 2478311; Command: bash; Return code: 0; Memory used: 0.0GB - -Starting cleanup: 1 files; 0 conditional files for cleanup - -Cleaning up flagged intermediate files. . . - -### Pipeline completed. Epilogue -* Elapsed time (this run): 0:00:00 -* Total elapsed time (all runs): 0:00:00 -* Peak memory (this run): 0 GB -* Pipeline completed time: 2023-02-08 15:44:57 - -``` diff --git a/docs/bedboss/code/bedstat-tutorial.md b/docs/bedboss/code/bedstat-tutorial.md deleted file mode 100644 index d506a7e..0000000 --- a/docs/bedboss/code/bedstat-tutorial.md +++ /dev/null @@ -1,371 +0,0 @@ -jupyter:True -# bedboss stat - -This tutorial is intended to introduce you to bedstat, pipeline that produces statistics and plots based on bed and bigbed files - -### 1. Install all dependencies and initialize database for it - -- Install dependecies: [How to install R dependencies](./how_to_install_r_dep/) -- Initialize database: [How to initialize database](./how_to_create_database/) -- Create config file: [How to create config file](./how_to_bedbase_config/) - -### 2. Create working repository - - -```bash -mkdir stat_tutorial ; cd stat_tutorial -``` - -Create config file by downloading it and configuring it - - -```bash -cat bedbase_config_test.yaml -``` - -```.output -path: - pipeline_output_path: $BEDBOSS_OUTPUT_PATH # do not change it - bedstat_dir: bedstat_output - remote_url_base: null - bedbuncher_dir: bedbucher_output -database: - host: localhost - port: 5432 - password: docker - user: postgres - name: pep-db - dialect: postgresql - driver: psycopg2 -server: - host: 0.0.0.0 - port: 8000 -remotes: - http: - prefix: https://data.bedbase.org/ - description: HTTP compatible path - s3: - prefix: s3://data.bedbase.org/ - description: S3 compatible path - -``` - -### 3. 
Download bed and bigbed files - -Bed file - - -```bash -wget -O sample1.bed.gz https://github.com/bedbase/bedboss/raw/dev/test/data/bed/hg19/correct/sample1.bed.gz - -``` - -```.output ---2023-02-28 15:32:57-- https://github.com/bedbase/bedboss/raw/dev/test/data/bed/hg19/correct/sample1.bed.gz -Resolving github.com (github.com)... 140.82.113.3 -Connecting to github.com (github.com)|140.82.113.3|:443... connected. -HTTP request sent, awaiting response... 302 Found -Location: https://raw.githubusercontent.com/bedbase/bedboss/dev/test/data/bed/hg19/correct/sample1.bed.gz [following] ---2023-02-28 15:32:57-- https://raw.githubusercontent.com/bedbase/bedboss/dev/test/data/bed/hg19/correct/sample1.bed.gz -Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ... -Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected. -HTTP request sent, awaiting response... 200 OK -Length: 7087126 (6.8M) [application/octet-stream] -Saving to: ‘sample1.bed.gz’ - -sample1.bed.gz 100%[===================>] 6.76M --.-KB/s in 0.07s - -2023-02-28 15:32:58 (95.8 MB/s) - ‘sample1.bed.gz’ saved [7087126/7087126] - - -``` - -BigBed file - - -```bash -wget -O sample1.bigBed https://github.com/bedbase/bedboss/raw/dev/test/data/bigbed/hg19/correct/sample1.bigBed - -``` - -```.output ---2023-02-28 15:33:00-- https://github.com/bedbase/bedboss/raw/dev/test/data/bigbed/hg19/correct/sample1.bigBed -Resolving github.com (github.com)... 140.82.113.3 -Connecting to github.com (github.com)|140.82.113.3|:443... connected. -HTTP request sent, awaiting response... 302 Found -Location: https://raw.githubusercontent.com/bedbase/bedboss/dev/test/data/bigbed/hg19/correct/sample1.bigBed [following] ---2023-02-28 15:33:00-- https://raw.githubusercontent.com/bedbase/bedboss/dev/test/data/bigbed/hg19/correct/sample1.bigBed -Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 
185.199.110.133, 185.199.111.133, 185.199.109.133, ... -Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected. -HTTP request sent, awaiting response... 200 OK -Length: 13092350 (12M) [application/octet-stream] -Saving to: ‘sample1.bigBed’ - -sample1.bigBed 100%[===================>] 12.49M --.-KB/s in 0.1s - -2023-02-28 15:33:00 (101 MB/s) - ‘sample1.bigBed’ saved [13092350/13092350] - - -``` - - -```bash -ls -``` - -```.output -bedbase_config_test.yaml sample1.bed.gz sample1.bigBed - -``` - -### 4. Run statistics: - -Additionally we need some metadata about files. 1) genome assembly, config file and know output folder. - - -```bash -bedboss stat --help -``` - -```.output -usage: bedboss stat [-h] --bedfile BEDFILE --outfolder OUTFOLDER - [--open-signal-matrix OPEN_SIGNAL_MATRIX] [--ensdb ENSDB] - [--bigbed BIGBED] --bedbase-config BEDBASE_CONFIG - [-y SAMPLE_YAML] --genome GENOME_ASSEMBLY [--no-db-commit] - [--just-db-commit] - -options: - -h, --help show this help message and exit - --bedfile BEDFILE a full path to bed file to process [Required] - --outfolder OUTFOLDER - Pipeline output folder [Required] - --open-signal-matrix OPEN_SIGNAL_MATRIX - a full path to the openSignalMatrix required for the - tissue specificity plots - --ensdb ENSDB a full path to the ensdb gtf file required for genomes - not in GDdata - --bigbed BIGBED a full path to the bigbed files - --bedbase-config BEDBASE_CONFIG - a path to the bedbase configuration file [Required] - -y SAMPLE_YAML, --sample-yaml SAMPLE_YAML - a yaml config file with sample attributes to pass on - more metadata into the database - --genome GENOME_ASSEMBLY - genome assembly of the sample [Required] - --no-db-commit whether the JSON commit to the database should be - skipped - --just-db-commit whether just to commit the JSON to the database - -``` - - -```bash -bedboss stat \ ---bedfile ./sample1.bed.gz \ ---bigbed ./sample1.bigBed \ ---outfolder ./test_output \ 
---genome hg19 \ ---bedbase-config ./bedbase_config_test.yaml - -``` - -```.output -Warning: You're running an interactive python session. This works, but pypiper cannot tee the output, so results are only logged to screen. -### Pipeline run code and environment: - -* Command: `/home/bnt4me/virginia/venv/jupyter/bin/bedboss stat --bedfile ./sample1.bed.gz --bigbed ./sample1.bigBed --outfolder ./test_output --genome hg19 --bedbase-config ./bedbase_config_test.yaml` -* Compute host: bnt4me-Precision-5560 -* Working dir: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial -* Outfolder: ./test_output/ -* Pipeline started at: (02-28 15:46:52) elapsed: 0.0 _TIME_ - -### Version log: - -* Python version: 3.10.6 -* Pypiper dir: `/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pypiper` -* Pypiper version: 0.12.3 -* Pipeline dir: `/home/bnt4me/virginia/venv/jupyter/bin` -* Pipeline version: 0.1.0-dev1 - -### Arguments passed to pipeline: - - ----------------------------------------- - -Target to produce: `/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1.json` - -> `Rscript /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedstat/tools/regionstat.R --bedfilePath=./sample1.bed.gz --fileId=sample1 --openSignalMatrix=None --outputFolder=/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c --genome=hg19 --ensdb=None --digest=c557c915a9901ce377ef724806ff7a2c` (530529) -
-Loading required package: IRanges -Loading required package: BiocGenerics - -Attaching package: ‘BiocGenerics’ - -The following objects are masked from ‘package:stats’: - - IQR, mad, sd, var, xtabs - -The following objects are masked from ‘package:base’: - - anyDuplicated, append, as.data.frame, basename, cbind, colnames, - dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, - grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, - order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, - rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, - union, unique, unsplit, which.max, which.min - -Loading required package: S4Vectors -Loading required package: stats4 - -Attaching package: ‘S4Vectors’ - -The following objects are masked from ‘package:base’: - - expand.grid, I, unname - -Loading required package: GenomicRanges -Loading required package: GenomeInfoDb -[?25hsnapshotDate(): 2021-10-19 -[?25h[?25hLoading required package: GenomicFeatures -Loading required package: AnnotationDbi -Loading required package: Biobase -Welcome to Bioconductor - - Vignettes contain introductory material; view with - 'browseVignettes()'. To cite Bioconductor, see - 'citation("Biobase")', and for packages 'citation("pkgname")'. - -Loading required package: AnnotationFilter - -Attaching package: 'ensembldb' - -The following object is masked from 'package:stats': - - filter - -[?25h[?25h[?25hLoading required package: R.oo -Loading required package: R.methodsS3 -R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. See ?R.methodsS3 for help. -R.oo v1.25.0 (2022-06-12 02:20:02 UTC) successfully loaded. See ?R.oo for help. 
- -Attaching package: 'R.oo' - -The following object is masked from 'package:R.methodsS3': - - throw - -The following object is masked from 'package:GenomicRanges': - - trim - -The following object is masked from 'package:IRanges': - - trim - -The following objects are masked from 'package:methods': - - getClasses, getMethods - -The following objects are masked from 'package:base': - - attach, detach, load, save - -R.utils v2.12.2 (2022-11-11 22:00:03 UTC) successfully loaded. See ?R.utils for help. - -Attaching package: 'R.utils' - -The following object is masked from 'package:utils': - - timestamp - -The following objects are masked from 'package:base': - - cat, commandArgs, getOption, isOpen, nullfile, parse, warnings - -[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_tssdist" -Scale for x is already present. -Adding another scale for x, which will replace the existing scale. -[1] "Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_tssdist" -Successfully calculated and plot TSS distance. -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_chrombins" -[1] "Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_chrombins" -Successfully calculated and plot chromosomes region distribution. -Calculating overlaps... -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_paritions" -[1] "Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_paritions" -Successfully calculated and plot regions distribution over genomic partitions. 
-[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_expected_partitions" -[1] "Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_expected_partitions" -Successfully calculated and plot expected distribution over genomic partitions. -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_cumulative_partitions" -[1] "Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_cumulative_partitions" -Successfully calculated and plot cumulative distribution over genomic partitions. -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_widths_histogram" -[1] "Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_widths_histogram" -Successfully calculated and plot quantile-trimmed histogram of widths. -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_neighbor_distances" -[1] "Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_neighbor_distances" -Successfully calculated and plot distance between neighbor regions. -open signal matrix not provided. Skipping tissue specificity plot ... -[?25h[?25h-Command completed. Elapsed time: 0:00:20. Running peak memory: 1.358GB. - PID: 530529; Command: Rscript; Return code: 0; Memory used: 1.358GB - -These results exist for 'c557c915a9901ce377ef724806ff7a2c': bedfile, genome - -### Pipeline completed. 
Epilogue -* Elapsed time (this run): 0:00:20 -* Total elapsed time (all runs): 0:00:20 -* Peak memory (this run): 1.3577 GB -* Pipeline completed time: 2023-02-28 15:47:12 - -``` - -After plots and statistics were produced, we can look at them - - -```bash -ls test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c -``` - -```.output -sample1_chrombins.pdf sample1_neighbor_distances.png -sample1_chrombins.png sample1_paritions.pdf -sample1_cumulative_partitions.pdf sample1_paritions.png -sample1_cumulative_partitions.png sample1_plots.json -sample1_expected_partitions.pdf sample1_tssdist.pdf -sample1_expected_partitions.png sample1_tssdist.png -sample1.json sample1_widths_histogram.pdf -sample1_neighbor_distances.pdf sample1_widths_histogram.png - -``` - - -```bash -cat test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1.json -``` - -```.output -{ - "name": ["sample1"], - "regions_no": [300000], - "mean_region_width": [663.9], - "md5sum": ["c557c915a9901ce377ef724806ff7a2c"], - "median_TSS_dist": [48580], - "exon_frequency": [14871], - "exon_percentage": [0.0496], - "fiveUTR_frequency": [8981], - "fiveUTR_percentage": [0.0299], - "intergenic_frequency": [141763], - "intergenic_percentage": [0.4725], - "intron_frequency": [106638], - "intron_percentage": [0.3555], - "promoterCore_frequency": [10150], - "promoterCore_percentage": [0.0338], - "promoterProx_frequency": [6851], - "promoterProx_percentage": [0.0228], - "threeUTR_frequency": [10746], - "threeUTR_percentage": [0.0358] -} - -``` diff --git a/docs/bedboss/code/tutorial-all.md b/docs/bedboss/code/tutorial-all.md deleted file mode 100644 index 450b9f6..0000000 --- a/docs/bedboss/code/tutorial-all.md +++ /dev/null @@ -1,499 +0,0 @@ -jupyter:True -# Bedboss-all tutorial - -This tutorial is attended to show base exaple of using bedboss all function that inclueds all 3 pipelines: bedmake, bedqc and bedstat - -### 1. 
First let's create new working repository - - -```bash -mkdir all_tutorial ; cd all_tutorial -``` - -### 2. To run our pipelines we need to check if we have installed all dependencies. To do so we can run dependencies check script that can be found in docs. - - -```bash -wget -O req_test.sh https://raw.githubusercontent.com/bedbase/bedboss/68910f5142a95d92c27ef53eafb9c35599af2fbd/test/bash_requirements_test.sh -``` - -```.output ---2023-08-11 06:58:27-- https://raw.githubusercontent.com/bedbase/bedboss/68910f5142a95d92c27ef53eafb9c35599af2fbd/test/bash_requirements_test.sh -Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ... -Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected. -HTTP request sent, awaiting response... 200 OK -Length: 3927 (3.8K) [text/plain] -Saving to: ‘req_test.sh’ - -req_test.sh 100%[===================>] 3.83K --.-KB/s in 0.006s - -2023-08-11 06:58:28 (608 KB/s) - ‘req_test.sh’ saved [3927/3927] - - -``` - - -```bash -chmod u+x ./req_test.sh -``` - - -```bash -./req_test.sh -``` - -```.output ------------------------------------------------------------ - - bedboss installation check - ------------------------------------------------------------ -Checking native installation... -Language compilers... ------------------------------------------------------------ -✔ python is installed correctly -✔ R is installed correctly ------------------------------------------------------------ -Checking bedmaker dependencies... ------------------------------------------------------------ -✔ package bedboss @ file:///home/bnt4me/virginia/repos/bedbase_all/bedboss -✔ package refgenconf==0.12.2 -✔ bedToBigBed is installed correctly -⚠ WARNING: 'bigBedToBed' is not installed. To install 'bigBedToBed' check bedboss documentation: https://bedboss.databio.org/ -⚠ WARNING: 'bigWigToBedGraph' is not installed. 
To install 'bigWigToBedGraph' check bedboss documentation: https://bedboss.databio.org/ -⚠ WARNING: 'wigToBigWig' is not installed. To install 'wigToBigWig' check bedboss documentation: https://bedboss.databio.org/ ------------------------------------------------------------ -Checking required R packages for bedstat... ------------------------------------------------------------ -✔ SUCCESS: R package: optparse -✔ SUCCESS: R package: ensembldb -✔ SUCCESS: R package: ExperimentHub -✔ SUCCESS: R package: AnnotationHub -✔ SUCCESS: R package: AnnotationFilter -✔ SUCCESS: R package: BSgenome -✔ SUCCESS: R package: GenomicFeatures -✔ SUCCESS: R package: GenomicDistributions -✔ SUCCESS: R package: GenomicDistributionsData -✔ SUCCESS: R package: GenomeInfoDb -✔ SUCCESS: R package: ensembldb -✔ SUCCESS: R package: tools -✔ SUCCESS: R package: R.utils -✔ SUCCESS: R package: LOLA -Number of WARNINGS: 3 - -``` - -### 3. All requirements are installed, now lets run our pipeline - -To run pipeline, we need to provide few required arguments: -1. sample_name -2. input_file -3. input_type -4. outfolder -5. genome -6. 
bedbase_config - -If you don't have bedbase config file, or initialized bedbase db you can check documnetation how to do it: https://bedboss.databio.org/ - - -```bash -pip install bedboss==0.1.0a2 -``` - -```.output -Requirement already satisfied: bedboss==0.1.0a2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (0.1.0a2) -Requirement already satisfied: piper>=0.13.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.13.2) -Requirement already satisfied: pandas>=1.5.3 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (2.0.3) -Requirement already satisfied: peppy>=0.35.7 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.35.7) -Requirement already satisfied: requests>=2.28.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (2.28.2) -Requirement already satisfied: logmuse>=0.2.7 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.2.7) -Requirement already satisfied: yacman>=0.8.4 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.9.1) -Requirement already satisfied: refgenconf>=0.12.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.12.2) -Requirement already satisfied: bbconf==0.4.0a1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.4.0a1) -Requirement already satisfied: ubiquerg>=0.6.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.6.2) -Requirement already satisfied: pipestat>=0.4.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bbconf==0.4.0a1->bedboss==0.1.0a2) (0.4.1) -Requirement already satisfied: sqlalchemy<2.0.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bbconf==0.4.0a1->bedboss==0.1.0a2) (1.4.41) 
-Requirement already satisfied: tzdata>=2022.1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pandas>=1.5.3->bedboss==0.1.0a2) (2023.3) -Requirement already satisfied: pytz>=2020.1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pandas>=1.5.3->bedboss==0.1.0a2) (2022.7.1) -Requirement already satisfied: python-dateutil>=2.8.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pandas>=1.5.3->bedboss==0.1.0a2) (2.8.2) -Requirement already satisfied: numpy>=1.21.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pandas>=1.5.3->bedboss==0.1.0a2) (1.24.1) -Requirement already satisfied: attmap>=0.13.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from peppy>=0.35.7->bedboss==0.1.0a2) (0.13.2) -Requirement already satisfied: pyyaml in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from peppy>=0.35.7->bedboss==0.1.0a2) (6.0) -Requirement already satisfied: rich>=10.3.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from peppy>=0.35.7->bedboss==0.1.0a2) (13.3.0) -Requirement already satisfied: psutil in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from piper>=0.13.2->bedboss==0.1.0a2) (5.9.4) -Requirement already satisfied: tqdm in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from refgenconf>=0.12.2->bedboss==0.1.0a2) (4.64.1) -Requirement already satisfied: pyfaidx in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from refgenconf>=0.12.2->bedboss==0.1.0a2) (0.7.1) -Requirement already satisfied: future in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from refgenconf>=0.12.2->bedboss==0.1.0a2) (0.18.3) -Requirement already satisfied: jsonschema>=3.0.1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from refgenconf>=0.12.2->bedboss==0.1.0a2) (4.17.3) -Requirement already satisfied: charset-normalizer<4,>=2 in 
/home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from requests>=2.28.2->bedboss==0.1.0a2) (3.0.1) -Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from requests>=2.28.2->bedboss==0.1.0a2) (1.26.14) -Requirement already satisfied: idna<4,>=2.5 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from requests>=2.28.2->bedboss==0.1.0a2) (3.4) -Requirement already satisfied: certifi>=2017.4.17 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from requests>=2.28.2->bedboss==0.1.0a2) (2022.12.7) -Requirement already satisfied: oyaml in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from yacman>=0.8.4->bedboss==0.1.0a2) (1.0) -Requirement already satisfied: attrs>=17.4.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from jsonschema>=3.0.1->refgenconf>=0.12.2->bedboss==0.1.0a2) (22.2.0) -Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from jsonschema>=3.0.1->refgenconf>=0.12.2->bedboss==0.1.0a2) (0.19.3) -Requirement already satisfied: psycopg2-binary in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (2.9.5) -Requirement already satisfied: eido in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (0.2.1) -Requirement already satisfied: sqlmodel>=0.0.8 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (0.0.8) -Requirement already satisfied: pydantic<2.0.0,>=1.10.7 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (1.10.12) -Requirement already satisfied: six>=1.5 in 
/home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas>=1.5.3->bedboss==0.1.0a2) (1.16.0) -Requirement already satisfied: markdown-it-py<3.0.0,>=2.1.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from rich>=10.3.0->peppy>=0.35.7->bedboss==0.1.0a2) (2.1.0) -Requirement already satisfied: pygments<3.0.0,>=2.14.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from rich>=10.3.0->peppy>=0.35.7->bedboss==0.1.0a2) (2.14.0) -Requirement already satisfied: greenlet!=0.4.17 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from sqlalchemy<2.0.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (2.0.1) -Requirement already satisfied: setuptools>=0.7 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pyfaidx->refgenconf>=0.12.2->bedboss==0.1.0a2) (65.5.1) -Requirement already satisfied: mdurl~=0.1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from markdown-it-py<3.0.0,>=2.1.0->rich>=10.3.0->peppy>=0.35.7->bedboss==0.1.0a2) (0.1.2) -Requirement already satisfied: typing-extensions>=4.2.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pydantic<2.0.0,>=1.10.7->pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (4.4.0) -Requirement already satisfied: sqlalchemy2-stubs in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from sqlmodel>=0.0.8->pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (0.0.2a35) - -[notice] A new release of pip available: 22.3.1 -> 23.2.1 -[notice] To update, run: pip install --upgrade pip - -``` - - -```bash -bedboss all -``` - -```.output -usage: bedboss all [-h] --outfolder OUTFOLDER -s SAMPLE_NAME -f INPUT_FILE -t - INPUT_TYPE -g GENOME [-r RFG_CONFIG] - [--chrom-sizes CHROM_SIZES] [-n] [--standard-chrom] - [--check-qc] [--open-signal-matrix OPEN_SIGNAL_MATRIX] - [--ensdb ENSDB] --bedbase-config BEDBASE_CONFIG - [-y SAMPLE_YAML] [--no-db-commit] [--just-db-commit] -bedboss 
all: error: the following arguments are required: --outfolder, -s/--sample-name, -f/--input-file, -t/--input-type, -g/--genome, --bedbase-config - -``` - - - -Let's download sample file. Information about this file you can find here: https://pephub.databio.org/bedbase/GSE177859?tag=default - - -```bash -wget -O sample1.bed.gz ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5379nnn/GSM5379062/suppl/GSM5379062_ENCFF834LRN_peaks_GRCh38.bed.gz -``` - -```.output ---2023-08-11 07:12:28-- ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5379nnn/GSM5379062/suppl/GSM5379062_ENCFF834LRN_peaks_GRCh38.bed.gz - => ‘sample1.bed.gz’ -Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.12, 130.14.250.10, 2607:f220:41f:250::229, ... -Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.12|:21... connected. -Logging in as anonymous ... Logged in! -==> SYST ... done. ==> PWD ... done. -==> TYPE I ... done. ==> CWD (1) /geo/samples/GSM5379nnn/GSM5379062/suppl ... done. -==> SIZE GSM5379062_ENCFF834LRN_peaks_GRCh38.bed.gz ... 5470278 -==> PASV ... done. ==> RETR GSM5379062_ENCFF834LRN_peaks_GRCh38.bed.gz ... done. -Length: 5470278 (5.2M) (unauthoritative) - -GSM5379062_ENCFF834 100%[===================>] 9.76M 1008KB/s in 24s - -2023-08-11 07:12:55 (419 KB/s) - ‘sample1.bed.gz’ saved [10231006] - - -``` - - -```bash - -``` - -let's create bedbase config file: - - -```bash -cat bedbase_config_test.yaml -``` - -```.output -cat: bedbase_config_test.yaml: No such file or directory - -``` - - - -Now let's run bedboss: - - -```bash -bedboss all --sample-name tutorial_f1 \ ---input-file sample1.bed.gz \ ---input-type bed \ ---outfolder ./tutorial \ ---genome GRCh38 \ ---bedbase-config bedbase_config_test.yaml -``` - -```.output -Warning: You're running an interactive python session. This works, but pypiper cannot tee the output, so results are only logged to screen. 
-### Pipeline run code and environment: - -* Command: `/home/bnt4me/virginia/venv/jupyter/bin/bedboss all --sample-name tutorial_f1 --input-file sample1.bed.gz --input-type bed --outfolder ./tutorial --genome GRCh38 --bedbase-config bedbase_config_test.yaml` -* Compute host: bnt4me-Precision-5560 -* Working dir: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial -* Outfolder: ./tutorial/ -* Pipeline started at: (02-27 12:47:26) elapsed: 0.0 _TIME_ - -### Version log: - -* Python version: 3.10.6 -* Pypiper dir: `/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pypiper` -* Pypiper version: 0.12.3 -* Pipeline dir: `/home/bnt4me/virginia/venv/jupyter/bin` -* Pipeline version: None - -### Arguments passed to pipeline: - - ----------------------------------------- - -Unused arguments: {'command': 'all'} -Getting Open Signal Matrix file path... -output_bed = ./tutorial/bed_files/sample1.bed.gz -output_bigbed = ./tutorial/bigbed_files -Output directory does not exist. Creating: ./tutorial/bed_files -BigBed directory does not exist. Creating: ./tutorial/bigbed_files -bedmaker logs directory doesn't exist. Creating one... -Got input type: bed -Converting sample1.bed.gz to BED format. -Target to produce: `./tutorial/bed_files/sample1.bed.gz` - -> `cp sample1.bed.gz ./tutorial/bed_files/sample1.bed.gz` (434320) -
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. - PID: 434320; Command: cp; Return code: 0; Memory used: 0.0GB - -Running bedqc... -Unused arguments: {} -Target to produce: `./tutorial/bed_files/bedmaker_logs/tutorial_f1/rigumni8` - -> `zcat ./tutorial/bed_files/sample1.bed.gz > ./tutorial/bed_files/bedmaker_logs/tutorial_f1/rigumni8` (434322) -
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 434322; Command: zcat; Return code: 0; Memory used: 0.003GB - -Targetless command, running... - -> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh ./tutorial/bed_files/bedmaker_logs/tutorial_f1/rigumni8 ` (434324) -
-236000-Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. - PID: 434324; Command: bash; Return code: 0; Memory used: 0.0GB - -File (./tutorial/bed_files/bedmaker_logs/tutorial_f1/rigumni8) has passed Quality Control! -Generating bigBed files for: sample1.bed.gz -Determining path to chrom.sizes asset via Refgenie. -Creating refgenie genome config file... -Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/genome_config.yaml -/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes -Target to produce: `./tutorial/bigbed_files/vzxyqexz` - -> `zcat ./tutorial/bed_files/sample1.bed.gz | sort -k1,1 -k2,2n > ./tutorial/bigbed_files/vzxyqexz` (434335,434336) -
--Command completed. Elapsed time: 0:00:00. Running peak memory: 0.007GB. - PID: 434335; Command: zcat; Return code: 0; Memory used: 0.002GB - PID: 434336; Command: sort; Return code: 0; Memory used: 0.007GB - -Running: /home/bnt4me/virginia/repos/bedbase_all/bedboss/bedToBigBed -type=bed6+4 ./tutorial/bigbed_files/vzxyqexz /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes ./tutorial/bigbed_files/sample1.bigBed -Target to produce: `./tutorial/bigbed_files/sample1.bigBed` - -> `/home/bnt4me/virginia/repos/bedbase_all/bedboss/bedToBigBed -type=bed6+4 ./tutorial/bigbed_files/vzxyqexz /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes ./tutorial/bigbed_files/sample1.bigBed` (434338) -
-pass1 - making usageList (25 chroms): 27 millis -pass2 - checking and writing primary data (222016 records, 10 fields): 413 millis --Command completed. Elapsed time: 0:00:01. Running peak memory: 0.007GB. - PID: 434338; Command: /home/bnt4me/virginia/repos/bedbase_all/bedboss/bedToBigBed; Return code: 0; Memory used: 0.004GB - -Target to produce: `/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1.json` - -> `Rscript /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedstat/tools/regionstat.R --bedfilePath=./tutorial/bed_files/sample1.bed.gz --fileId=sample1 --openSignalMatrix=./openSignalMatrix/openSignalMatrix_hg38_percentile99_01_quantNormalized_round4d.txt.gz --outputFolder=/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5 --genome=hg38 --ensdb=None --digest=eb617f28e129c401be94069e0fdedbb5` (434343) -
-Loading required package: IRanges -Loading required package: BiocGenerics - -Attaching package: ‘BiocGenerics’ - -The following objects are masked from ‘package:stats’: - - IQR, mad, sd, var, xtabs - -The following objects are masked from ‘package:base’: - - anyDuplicated, append, as.data.frame, basename, cbind, colnames, - dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, - grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, - order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, - rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, - union, unique, unsplit, which.max, which.min - -Loading required package: S4Vectors -Loading required package: stats4 - -Attaching package: ‘S4Vectors’ - -The following objects are masked from ‘package:base’: - - expand.grid, I, unname - -Loading required package: GenomicRanges -Loading required package: GenomeInfoDb -[?25hsnapshotDate(): 2021-10-19 -[?25h[?25hLoading required package: GenomicFeatures -Loading required package: AnnotationDbi -Loading required package: Biobase -Welcome to Bioconductor - - Vignettes contain introductory material; view with - 'browseVignettes()'. To cite Bioconductor, see - 'citation("Biobase")', and for packages 'citation("pkgname")'. - -Loading required package: AnnotationFilter - -Attaching package: 'ensembldb' - -The following object is masked from 'package:stats': - - filter - -[?25h[?25h[?25hLoading required package: R.oo -Loading required package: R.methodsS3 -R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. See ?R.methodsS3 for help. -R.oo v1.25.0 (2022-06-12 02:20:02 UTC) successfully loaded. See ?R.oo for help. 
- -Attaching package: 'R.oo' - -The following object is masked from 'package:R.methodsS3': - - throw - -The following object is masked from 'package:GenomicRanges': - - trim - -The following object is masked from 'package:IRanges': - - trim - -The following objects are masked from 'package:methods': - - getClasses, getMethods - -The following objects are masked from 'package:base': - - attach, detach, load, save - -R.utils v2.12.2 (2022-11-11 22:00:03 UTC) successfully loaded. See ?R.utils for help. - -Attaching package: 'R.utils' - -The following object is masked from 'package:utils': - - timestamp - -The following objects are masked from 'package:base': - - cat, commandArgs, getOption, isOpen, nullfile, parse, warnings - -[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25h[?25hsee ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation -loading from cache -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_tssdist" -Scale for x is already present. -Adding another scale for x, which will replace the existing scale. -[1] "Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_tssdist" -Successfully calculated and plot TSS distance. -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_chrombins" -see ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation -loading from cache -[1] "Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_chrombins" -Successfully calculated and plot chromosomes region distribution. -see ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation -loading from cache -Calculating overlaps... 
-[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_paritions" -[1] "Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_paritions" -Successfully calculated and plot regions distribution over genomic partitions. -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_expected_partitions" -see ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation -loading from cache -see ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation -loading from cache -[1] "Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_expected_partitions" -Successfully calculated and plot expected distribution over genomic partitions. -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_cumulative_partitions" -see ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation -loading from cache -[1] "Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_cumulative_partitions" -Successfully calculated and plot cumulative distribution over genomic partitions. -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_widths_histogram" -[1] "Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_widths_histogram" -Successfully calculated and plot quantile-trimmed histogram of widths. 
-[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_neighbor_distances" -[1] "Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_neighbor_distances" -Successfully calculated and plot distance between neighbor regions. -[1] "Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_open_chromatin" -[1] "Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_open_chromatin" -Successfully calculated and plot cell specific enrichment for open chromatin. -[?25h[?25h-Command completed. Elapsed time: 0:00:49. Running peak memory: 3.843GB. - PID: 434343; Command: Rscript; Return code: 0; Memory used: 3.843GB - -These results exist for 'eb617f28e129c401be94069e0fdedbb5': name, regions_no, mean_region_width, md5sum, bedfile, genome, bigbedfile, widths_histogram, neighbor_distances -Starting cleanup: 2 files; 0 conditional files for cleanup - -Cleaning up flagged intermediate files. . . - -### Pipeline completed. 
Epilogue -* Elapsed time (this run): 0:00:50 -* Total elapsed time (all runs): 0:00:50 -* Peak memory (this run): 3.8432 GB -* Pipeline completed time: 2023-02-27 12:48:16 - -``` - -Now let's check if all files where saved - - -```bash -ls tutorial/bed_files -``` - -```.output -bedmaker_logs sample1.bed.gz - -``` - - -```bash -ls tutorial/bigbed_files -``` - -```.output -sample1.bigBed - -``` - - -```bash -ls tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/ -``` - -```.output -sample1_chrombins.pdf sample1_open_chromatin.pdf -sample1_chrombins.png sample1_open_chromatin.png -sample1_cumulative_partitions.pdf sample1_paritions.pdf -sample1_cumulative_partitions.png sample1_paritions.png -sample1_expected_partitions.pdf sample1_plots.json -sample1_expected_partitions.png sample1_tssdist.pdf -sample1.json sample1_tssdist.png -sample1_neighbor_distances.pdf sample1_widths_histogram.pdf -sample1_neighbor_distances.png sample1_widths_histogram.png - -``` - -Everything was ran correctly:) diff --git a/docs/bedboss/how-to-install-requirements.md b/docs/bedboss/how-to-install-requirements.md index 24ac979..f6a5ce5 100644 --- a/docs/bedboss/how-to-install-requirements.md +++ b/docs/bedboss/how-to-install-requirements.md @@ -4,14 +4,12 @@ 1. Install R: https://cran.r-project.org/bin/linux/ubuntu/fullREADME.html 2. Download this script: [installRdeps.R](https://github.com/databio/bedboss/blob/dev/scripts/installRdeps.R) 3. Install dependencies by running this command in your terminal: ```Rscript installRdeps.R``` -4. Run `bedboss requirements-check` to check if everything was installed correctly. +4. Run `bedboss check-requirements` to check if everything was installed correctly. 
# How to install regionset conversion tools: -- bedToBigBed: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed -- bigBedToBed: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigBedToBed -- bigWigToBedGraph: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigWigToBedGraph -- wigToBigWig: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/wigToBigWig - - +- **bedToBigBed**: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed +- **bigBedToBed**: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigBedToBed +- **bigWigToBedGraph**: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigWigToBedGraph +- **wigToBigWig**: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/wigToBigWig diff --git a/docs/bedboss/notebooks/bedbase-tutorial.ipynb b/docs/bedboss/notebooks/bedbase-tutorial.ipynb deleted file mode 100644 index ac7cf8d..0000000 --- a/docs/bedboss/notebooks/bedbase-tutorial.ipynb +++ /dev/null @@ -1,3224 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# BEDBASE workflow tutorial\n", - "\n", - "This demo demonstrates how to process, analyze, visualize, and serve BED files. The process has 5 steps: First, the [bedmaker](https://github.com/databio/bedmaker) pipeline converts different region data files (bed, bedGraph, bigBed, bigWig, and wig) into BED format and generates bigBed format for each file for visualization in Genome Browser. An optional step, the [bedqc](https://github.com/databio/bedqc) pipline, flags the BED files that you might not want to include in the downstream analysis. Second, individual BED files are analyzed using the [bedstat](https://github.com/databio/bedstat) pipeline. Third, BED files are grouped and then analyzed as groups using the [bedbuncher](https://github.com/databio/bedbuncher) pipeline. 
Fourth, [bedembed](https://github.com/databio/bedembed) uses the StarSpace method to embed the bed files and the meta data, and the distances between the file labels and trained search terms will be calculated with cosine distance. Finally, the BED files, along with statistics, plots, and grouping information, is served via a web interface and RESTful API using the [bedhost](https://github.com/databio/bedhost) package.\n", - "\n", - "**Glossary of terms:**\n", - "\n", - "- *bedfile*: a tab-delimited file with one genomic region per line. Each genomic region is decribed by 3 required columns: chrom, start and end.\n", - "- *bedset*: a collection of BED files grouped by with a shared biological, experimental, or logical criterion.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "toc": true - }, - "source": [ - "
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 24312;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.003GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db1/33xf84g5 `\n", - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db1/33xf84g5) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n", - " return create_model(\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - "Using default schema: 
/home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n", - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. \n", - " PID: 24344;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db2/lypwq5fe `\n", - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db2/lypwq5fe) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE105977_ENCFF617QGK_optimal_idr_thresholded_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " return create_model(\n", - "Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - "Traceback (most recent call last):\n", - " File 
\"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n", - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. \n", - " PID: 24374;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db3/_5zvvg7p `\n", - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db3/_5zvvg7p) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE105977_ENCFF793SZW_conservative_idr_thresholded_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n", - " return create_model(\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - "Using default schema: 
/home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n", - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 24404;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.003GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db4/gig106fd `\n", - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db4/gig106fd) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE105977_ENCFF937CGY_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n", - " return create_model(\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - "Using default schema: 
/home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n", - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 24435;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.003GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db5/ix1s2r3k `\n", - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db5/ix1s2r3k) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE91663_ENCFF316ASR_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n", - " return create_model(\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - "Using default schema: 
/home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n", - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 24466;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.003GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db6/jrhj1l5n `\n", - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db6/jrhj1l5n) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE91663_ENCFF319TPR_conservative_idr_thresholded_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n", - " return create_model(\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - "Using default schema: 
/home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n", - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 24496;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.003GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db7/9r0q9410 `\n", - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db7/9r0q9410) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSE91663_ENCFF553KIK_optimal_idr_thresholded_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n", - " return create_model(\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - "Using default schema: 
/home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 24527;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.003GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db8/ny2pxb01 `\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db8/ny2pxb01) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSM2423312_ENCFF155HVK_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n", - " return create_model(\n", - "Using default schema: /home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - "Traceback (most recent call last):\n", - " File 
\"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n", - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 24559;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.003GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedhost_demo_db9/h6i4w9_0 `\n", - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedhost_demo_db9/h6i4w9_0) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSM2423313_ENCFF722AOG_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n", - " return create_model(\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - "Using default schema: 
/home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n", - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 24590;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.003GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db10/l3b3cyqx `\n", - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db10/l3b3cyqx) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSM2827349_ENCFF196DNQ_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n", - " return create_model(\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - "Using default schema: 
/home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n", - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 24621;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.003GB\n", - "\n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db11/2pfkxwx0 `\n", - "File (/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/outputs/outputs/bedstat_output/bedstat_pipeline_logs/bed_files/bedmaker_logs/bedbase_demo_db11/2pfkxwx0) has passed Quality Control!\n", - "Generating bigBed files for: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/files/GSM2827350_ENCFF928JXU_peaks_GRCh38.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Config: /home/bnt4me/virginia/repos/bedbase_all/bedbase/docs_jupyter/bedbase_tutorial/bedbase/tutorial_files/bedboss/config_db_local.yaml.\n", - "Initialize DBBackend\n", - "/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/parsed_schema.py:284: RuntimeWarning: fields may not start with an underscore, ignoring \"_pipeline_name\"\n", - " return create_model(\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pipestat/backends/dbbackend.py\", line 689, in _engine\n", - "Using default schema: 
/home/bnt4me/virginia/venv/jupyter/bin/pipestat_output_schema.yaml\n", - " return self.db_engine_key\n", - "AttributeError: 'DBBackend' object has no attribute 'db_engine_key'\n", - "\n", - "During handling of the above exception, another exception occurred:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/home/bnt4me/virginia/venv/jupyter/bin/bedboss\", line 8, in
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. \n", - " PID: 2477650;\tCommand: cp;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "\n", - "> `gzip ./bed/hg19_example1.bed ` (2477652)\n", - "
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. \n", - " PID: 2477652;\tCommand: gzip;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "Running bedqc...\n", - "Target to produce: `./bed/bedmaker_logs/test_bed/xl67fcgi` \n", - "\n", - "> `zcat ./bed/hg19_example1.bed.gz > ./bed/bedmaker_logs/test_bed/xl67fcgi` (2477654)\n", - "
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. \n", - " PID: 2477654;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "Targetless command, running... \n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh ./bed/bedmaker_logs/test_bed/xl67fcgi ` (2477656)\n", - "
\n", - "1000\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. \n", - " PID: 2477656;\tCommand: bash;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "Starting cleanup: 1 files; 0 conditional files for cleanup\n", - "\n", - "Cleaning up flagged intermediate files. . .\n", - "\n", - "### Pipeline completed. Epilogue\n", - "* Elapsed time (this run): 0:00:00\n", - "* Total elapsed time (all runs): 0:00:00\n", - "* Peak memory (this run): 0 GB\n", - "* Pipeline completed time: 2023-02-08 15:39:09\n", - "Generating bigBed files for: ../test/data/bed/hg19/correct/hg19_example1.bed\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Creating refgenie genome config file...\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/alias/hg19/fasta/default/hg19.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/alias/hg19/fasta/default/hg19.chrom.sizes\n", - "Target to produce: `./bigbed/jckj3p1d` \n", - "\n", - "> `zcat ./bed/hg19_example1.bed.gz | sort -k1,1 -k2,2n > ./bigbed/jckj3p1d` (2477666,2477667)\n", - "
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. \n", - " PID: 2477666;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.0GB \n", - " PID: 2477667;\tCommand: sort;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "Running: bedToBigBed -type=bed6+3 ./bigbed/jckj3p1d /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/alias/hg19/fasta/default/hg19.chrom.sizes ./bigbed/hg19_example1.bigBed\n", - "Target to produce: `./bigbed/hg19_example1.bigBed` \n", - "\n", - "> `bedToBigBed -type=bed6+3 ./bigbed/jckj3p1d /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/alias/hg19/fasta/default/hg19.chrom.sizes ./bigbed/hg19_example1.bigBed` (2477669)\n", - "
\n", - "pass1 - making usageList (1 chroms): 1 millis\n", - "pass2 - checking and writing primary data (175 records, 9 fields): 0 millis\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. \n", - " PID: 2477669;\tCommand: bedToBigBed;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "Starting cleanup: 2 files; 0 conditional files for cleanup\n", - "\n", - "Cleaning up flagged intermediate files. . .\n", - "\n", - "### Pipeline completed. Epilogue\n", - "* Elapsed time (this run): 0:00:00\n", - "* Total elapsed time (all runs): 0:00:00\n", - "* Peak memory (this run): 0 GB\n", - "* Pipeline completed time: 2023-02-08 15:39:09\n" - ] - } - ], - "source": [ - " bedboss make --sample-name test_bed \\\n", - " --input-file ../test/data/bed/hg19/correct/hg19_example1.bed \\\n", - " --input-type bed \\\n", - " --genome hg19 \\\n", - " --output-bed ./bed \\\n", - " --output-bigbed ./bigbed \n" - ] - }, - { - "cell_type": "markdown", - "id": "6b175141", - "metadata": {}, - "source": [ - "### Let's check if bed file was created (or copied)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "70ee37f5", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001B[0m\u001B[01;34mbedmaker_logs\u001B[0m \u001B[01;31mhg19_example1.bed.gz\u001B[0m\n" - ] - } - ], - "source": [ - "ls bed" - ] - }, - { - "cell_type": "markdown", - "id": "49f19d08", - "metadata": {}, - "source": [ - "### Let's check if bigbed file was created" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "cfd3c9f7", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "hg19_example1.bigBed\n" - ] - } - ], - "source": [ - "ls bigbed" - ] - }, - { - "cell_type": "markdown", - "id": "5c4837b0", - "metadata": {}, - "source": [ - "### everything was finished successfuly and files are ready for further analysis!" 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Bash", - "language": "bash", - "name": "bash" - }, - "language_info": { - "codemirror_mode": "shell", - "file_extension": ".sh", - "mimetype": "text/x-sh", - "name": "bash" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/bedboss/notebooks/bedqc-tutorial.ipynb b/docs/bedboss/notebooks/bedqc-tutorial.ipynb deleted file mode 100644 index 3935965..0000000 --- a/docs/bedboss/notebooks/bedqc-tutorial.ipynb +++ /dev/null @@ -1,124 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "c35a64ab", - "metadata": {}, - "source": [ - "# bedqc tutorial" - ] - }, - { - "cell_type": "markdown", - "id": "2b642ffb", - "metadata": {}, - "source": [ - "To check Quality of bed file use this command: `badboss qc`" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "b67214fe", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "usage: bedboss qc [-h] --bedfile BEDFILE --outfolder OUTFOLDER\n", - "\n", - "options:\n", - " -h, --help show this help message and exit\n", - " --bedfile BEDFILE a full path to bed file to process\n", - " --outfolder OUTFOLDER\n", - " a full path to output log folder.\n" - ] - } - ], - "source": [ - "bedboss qc --help" - ] - }, - { - "cell_type": "markdown", - "id": "eab75d79", - "metadata": {}, - "source": [ - "bedqc example:" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "1488b255", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Running bedqc...\n", - "### Pipeline run code and environment:\n", - "\n", - "* Command: `/home/bnt4me/virginia/venv/jupyter/bin/bedboss qc --bedfile ../test/data/bed/hg19/correct/hg19_example1.bed --outfolder .`\n", - "* Compute host: bnt4me-Precision-5560\n", - "* Working dir: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter\n", - "* Outfolder: ./\n", - "* Pipeline started at: (02-08 
15:44:57) elapsed: 0.0 _TIME_\n", - "\n", - "### Version log:\n", - "\n", - "* Python version: 3.10.6\n", - "* Pypiper dir: `/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pypiper`\n", - "* Pypiper version: 0.12.3\n", - "* Pipeline dir: `/home/bnt4me/virginia/venv/jupyter/bin`\n", - "* Pipeline version: None\n", - "\n", - "### Arguments passed to pipeline:\n", - "\n", - "\n", - "----------------------------------------\n", - "\n", - "Target exists: `../test/data/bed/hg19/correct/hg19_example1.bed` \n", - "Targetless command, running... \n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh ../test/data/bed/hg19/correct/hg19_example1.bed ` (2478311)\n", - "
\n", - "1000\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. \n", - " PID: 2478311;\tCommand: bash;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "Starting cleanup: 1 files; 0 conditional files for cleanup\n", - "\n", - "Cleaning up flagged intermediate files. . .\n", - "\n", - "### Pipeline completed. Epilogue\n", - "* Elapsed time (this run): 0:00:00\n", - "* Total elapsed time (all runs): 0:00:00\n", - "* Peak memory (this run): 0 GB\n", - "* Pipeline completed time: 2023-02-08 15:44:57\n" - ] - } - ], - "source": [ - "bedboss qc --bedfile ../test/data/bed/hg19/correct/hg19_example1.bed --outfolder ." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Bash", - "language": "bash", - "name": "bash" - }, - "language_info": { - "codemirror_mode": "shell", - "file_extension": ".sh", - "mimetype": "text/x-sh", - "name": "bash" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/bedboss/notebooks/bedstat-tutorial.ipynb b/docs/bedboss/notebooks/bedstat-tutorial.ipynb deleted file mode 100644 index 60d448b..0000000 --- a/docs/bedboss/notebooks/bedstat-tutorial.ipynb +++ /dev/null @@ -1,528 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "c35a64ab", - "metadata": {}, - "source": [ - "# bedboss stat" - ] - }, - { - "cell_type": "markdown", - "id": "2b642ffb", - "metadata": {}, - "source": [ - "This tutorial is intended to introduce you to bedstat, pipeline that produces statistics and plots based on bed and bigbed files" - ] - }, - { - "cell_type": "markdown", - "id": "a5f49a8c", - "metadata": {}, - "source": [ - "### 1. 
Install all dependencies and initialize database for it" - ] - }, - { - "cell_type": "markdown", - "id": "7392c92e", - "metadata": {}, - "source": [ - "- Install dependecies: [How to install R dependencies](./how_to_install_r_dep/)\n", - "- Initialize database: [How to initialize database](./how_to_create_database/)\n", - "- Create config file: [How to create config file](./how_to_bedbase_config/)" - ] - }, - { - "cell_type": "markdown", - "id": "668c260f", - "metadata": {}, - "source": [ - "### 2. Create working repository" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "95ff14bf", - "metadata": {}, - "outputs": [], - "source": [ - "mkdir stat_tutorial ; cd stat_tutorial " - ] - }, - { - "cell_type": "markdown", - "id": "edbecd02", - "metadata": {}, - "source": [ - "Create config file by downloading it and configuring it" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "id": "1daff328", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "path:\n", - " pipeline_output_path: $BEDBOSS_OUTPUT_PATH # do not change it\n", - " bedstat_dir: bedstat_output\n", - " remote_url_base: null\n", - " bedbuncher_dir: bedbucher_output\n", - "database:\n", - " host: localhost\n", - " port: 5432\n", - " password: docker\n", - " user: postgres\n", - " name: pep-db\n", - " dialect: postgresql\n", - " driver: psycopg2\n", - "server:\n", - " host: 0.0.0.0\n", - " port: 8000\n", - "remotes:\n", - " http:\n", - " prefix: https://data.bedbase.org/\n", - " description: HTTP compatible path\n", - " s3:\n", - " prefix: s3://data.bedbase.org/\n", - " description: S3 compatible path\n" - ] - } - ], - "source": [ - "cat bedbase_config_test.yaml" - ] - }, - { - "cell_type": "markdown", - "id": "0ee154a8", - "metadata": {}, - "source": [ - "### 3. 
Download bed and bigbed files" - ] - }, - { - "cell_type": "markdown", - "id": "6010e161", - "metadata": {}, - "source": [ - "Bed file" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "53346258", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--2023-02-28 15:32:57-- https://github.com/bedbase/bedboss/raw/dev/test/data/bed/hg19/correct/sample1.bed.gz\n", - "Resolving github.com (github.com)... 140.82.113.3\n", - "Connecting to github.com (github.com)|140.82.113.3|:443... connected.\n", - "HTTP request sent, awaiting response... 302 Found\n", - "Location: https://raw.githubusercontent.com/bedbase/bedboss/dev/test/data/bed/hg19/correct/sample1.bed.gz [following]\n", - "--2023-02-28 15:32:57-- https://raw.githubusercontent.com/bedbase/bedboss/dev/test/data/bed/hg19/correct/sample1.bed.gz\n", - "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...\n", - "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n", - "HTTP request sent, awaiting response... 
200 OK\n", - "Length: 7087126 (6.8M) [application/octet-stream]\n", - "Saving to: ‘sample1.bed.gz’\n", - "\n", - "sample1.bed.gz 100%[===================>] 6.76M --.-KB/s in 0.07s \n", - "\n", - "2023-02-28 15:32:58 (95.8 MB/s) - ‘sample1.bed.gz’ saved [7087126/7087126]\n", - "\n" - ] - } - ], - "source": [ - "wget -O sample1.bed.gz https://github.com/bedbase/bedboss/raw/dev/test/data/bed/hg19/correct/sample1.bed.gz\n" - ] - }, - { - "cell_type": "markdown", - "id": "6e933bd6", - "metadata": {}, - "source": [ - "BigBed file" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "8df43a61", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--2023-02-28 15:33:00-- https://github.com/bedbase/bedboss/raw/dev/test/data/bigbed/hg19/correct/sample1.bigBed\n", - "Resolving github.com (github.com)... 140.82.113.3\n", - "Connecting to github.com (github.com)|140.82.113.3|:443... connected.\n", - "HTTP request sent, awaiting response... 302 Found\n", - "Location: https://raw.githubusercontent.com/bedbase/bedboss/dev/test/data/bigbed/hg19/correct/sample1.bigBed [following]\n", - "--2023-02-28 15:33:00-- https://raw.githubusercontent.com/bedbase/bedboss/dev/test/data/bigbed/hg19/correct/sample1.bigBed\n", - "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...\n", - "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n", - "HTTP request sent, awaiting response... 
200 OK\n", - "Length: 13092350 (12M) [application/octet-stream]\n", - "Saving to: ‘sample1.bigBed’\n", - "\n", - "sample1.bigBed 100%[===================>] 12.49M --.-KB/s in 0.1s \n", - "\n", - "2023-02-28 15:33:00 (101 MB/s) - ‘sample1.bigBed’ saved [13092350/13092350]\n", - "\n" - ] - } - ], - "source": [ - "wget -O sample1.bigBed https://github.com/bedbase/bedboss/raw/dev/test/data/bigbed/hg19/correct/sample1.bigBed\n" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "540122c5", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "bedbase_config_test.yaml \u001B[0m\u001B[01;31msample1.bed.gz\u001B[0m sample1.bigBed\n" - ] - } - ], - "source": [ - "ls" - ] - }, - { - "cell_type": "markdown", - "id": "7e8e007a", - "metadata": {}, - "source": [ - "### 4. Run statistics:" - ] - }, - { - "cell_type": "markdown", - "id": "9a69ec14", - "metadata": {}, - "source": [ - "Additionally we need some metadata about files. 1) genome assembly, config file and know output folder." 
- ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "628234aa", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "usage: bedboss stat [-h] --bedfile BEDFILE --outfolder OUTFOLDER\n", - " [--open-signal-matrix OPEN_SIGNAL_MATRIX] [--ensdb ENSDB]\n", - " [--bigbed BIGBED] --bedbase-config BEDBASE_CONFIG\n", - " [-y SAMPLE_YAML] --genome GENOME_ASSEMBLY [--no-db-commit]\n", - " [--just-db-commit]\n", - "\n", - "options:\n", - " -h, --help show this help message and exit\n", - " --bedfile BEDFILE a full path to bed file to process [Required]\n", - " --outfolder OUTFOLDER\n", - " Pipeline output folder [Required]\n", - " --open-signal-matrix OPEN_SIGNAL_MATRIX\n", - " a full path to the openSignalMatrix required for the\n", - " tissue specificity plots\n", - " --ensdb ENSDB a full path to the ensdb gtf file required for genomes\n", - " not in GDdata\n", - " --bigbed BIGBED a full path to the bigbed files\n", - " --bedbase-config BEDBASE_CONFIG\n", - " a path to the bedbase configuration file [Required]\n", - " -y SAMPLE_YAML, --sample-yaml SAMPLE_YAML\n", - " a yaml config file with sample attributes to pass on\n", - " more metadata into the database\n", - " --genome GENOME_ASSEMBLY\n", - " genome assembly of the sample [Required]\n", - " --no-db-commit whether the JSON commit to the database should be\n", - " skipped\n", - " --just-db-commit whether just to commit the JSON to the database\n" - ] - } - ], - "source": [ - "bedboss stat --help" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "id": "468f5508", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Warning: You're running an interactive python session. 
This works, but pypiper cannot tee the output, so results are only logged to screen.\n", - "### Pipeline run code and environment:\n", - "\n", - "* Command: `/home/bnt4me/virginia/venv/jupyter/bin/bedboss stat --bedfile ./sample1.bed.gz --bigbed ./sample1.bigBed --outfolder ./test_output --genome hg19 --bedbase-config ./bedbase_config_test.yaml`\n", - "* Compute host: bnt4me-Precision-5560\n", - "* Working dir: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial\n", - "* Outfolder: ./test_output/\n", - "* Pipeline started at: (02-28 15:46:52) elapsed: 0.0 _TIME_\n", - "\n", - "### Version log:\n", - "\n", - "* Python version: 3.10.6\n", - "* Pypiper dir: `/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pypiper`\n", - "* Pypiper version: 0.12.3\n", - "* Pipeline dir: `/home/bnt4me/virginia/venv/jupyter/bin`\n", - "* Pipeline version: 0.1.0-dev1\n", - "\n", - "### Arguments passed to pipeline:\n", - "\n", - "\n", - "----------------------------------------\n", - "\n", - "Target to produce: `/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1.json` \n", - "\n", - "> `Rscript /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedstat/tools/regionstat.R --bedfilePath=./sample1.bed.gz --fileId=sample1 --openSignalMatrix=None --outputFolder=/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c --genome=hg19 --ensdb=None --digest=c557c915a9901ce377ef724806ff7a2c` (530529)\n", - "
\n", - "Loading required package: IRanges\n", - "Loading required package: BiocGenerics\n", - "\n", - "Attaching package: ‘BiocGenerics’\n", - "\n", - "The following objects are masked from ‘package:stats’:\n", - "\n", - " IQR, mad, sd, var, xtabs\n", - "\n", - "The following objects are masked from ‘package:base’:\n", - "\n", - " anyDuplicated, append, as.data.frame, basename, cbind, colnames,\n", - " dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,\n", - " grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,\n", - " order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,\n", - " rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,\n", - " union, unique, unsplit, which.max, which.min\n", - "\n", - "Loading required package: S4Vectors\n", - "Loading required package: stats4\n", - "\n", - "Attaching package: ‘S4Vectors’\n", - "\n", - "The following objects are masked from ‘package:base’:\n", - "\n", - " expand.grid, I, unname\n", - "\n", - "Loading required package: GenomicRanges\n", - "Loading required package: GenomeInfoDb\n", - "\u001B[?25hsnapshotDate(): 2021-10-19\n", - "\u001B[?25h\u001B[?25hLoading required package: GenomicFeatures\n", - "Loading required package: AnnotationDbi\n", - "Loading required package: Biobase\n", - "Welcome to Bioconductor\n", - "\n", - " Vignettes contain introductory material; view with\n", - " 'browseVignettes()'. To cite Bioconductor, see\n", - " 'citation(\"Biobase\")', and for packages 'citation(\"pkgname\")'.\n", - "\n", - "Loading required package: AnnotationFilter\n", - "\n", - "Attaching package: 'ensembldb'\n", - "\n", - "The following object is masked from 'package:stats':\n", - "\n", - " filter\n", - "\n", - "\u001B[?25h\u001B[?25h\u001B[?25hLoading required package: R.oo\n", - "Loading required package: R.methodsS3\n", - "R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. 
See ?R.methodsS3 for help.\n", - "R.oo v1.25.0 (2022-06-12 02:20:02 UTC) successfully loaded. See ?R.oo for help.\n", - "\n", - "Attaching package: 'R.oo'\n", - "\n", - "The following object is masked from 'package:R.methodsS3':\n", - "\n", - " throw\n", - "\n", - "The following object is masked from 'package:GenomicRanges':\n", - "\n", - " trim\n", - "\n", - "The following object is masked from 'package:IRanges':\n", - "\n", - " trim\n", - "\n", - "The following objects are masked from 'package:methods':\n", - "\n", - " getClasses, getMethods\n", - "\n", - "The following objects are masked from 'package:base':\n", - "\n", - " attach, detach, load, save\n", - "\n", - "R.utils v2.12.2 (2022-11-11 22:00:03 UTC) successfully loaded. See ?R.utils for help.\n", - "\n", - "Attaching package: 'R.utils'\n", - "\n", - "The following object is masked from 'package:utils':\n", - "\n", - " timestamp\n", - "\n", - "The following objects are masked from 'package:base':\n", - "\n", - " cat, commandArgs, getOption, isOpen, nullfile, parse, warnings\n", - "\n", - "\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h\u001B[?25h[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_tssdist\"\n", - "\u001B[1m\u001B[22mScale for \u001B[32mx\u001B[39m is already present.\n", - "Adding another scale for \u001B[32mx\u001B[39m, which will replace the existing scale.\n", - "[1] \"Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_tssdist\"\n", - "Successfully calculated and plot TSS distance.\n", - "[1] \"Plotting: 
/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_chrombins\"\n", - "[1] \"Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_chrombins\"\n", - "Successfully calculated and plot chromosomes region distribution.\n", - "Calculating overlaps...\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_paritions\"\n", - "[1] \"Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_paritions\"\n", - "Successfully calculated and plot regions distribution over genomic partitions.\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_expected_partitions\"\n", - "[1] \"Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_expected_partitions\"\n", - "Successfully calculated and plot expected distribution over genomic partitions.\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_cumulative_partitions\"\n", - "[1] \"Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_cumulative_partitions\"\n", - "Successfully calculated and plot cumulative distribution over genomic partitions.\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_widths_histogram\"\n", - "[1] \"Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_widths_histogram\"\n", - "Successfully calculated and plot quantile-trimmed histogram of widths.\n", - "[1] \"Plotting: 
/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/stat_tutorial/test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_neighbor_distances\"\n", - "[1] \"Writing plot json: output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1_neighbor_distances\"\n", - "Successfully calculated and plot distance between neighbor regions.\n", - "open signal matrix not provided. Skipping tissue specificity plot ... \n", - "\u001B[?25h\u001B[?25h\n", - "Command completed. Elapsed time: 0:00:20. Running peak memory: 1.358GB. \n", - " PID: 530529;\tCommand: Rscript;\tReturn code: 0;\tMemory used: 1.358GB\n", - "\n", - "These results exist for 'c557c915a9901ce377ef724806ff7a2c': bedfile, genome\n", - "\n", - "### Pipeline completed. Epilogue\n", - "* Elapsed time (this run): 0:00:20\n", - "* Total elapsed time (all runs): 0:00:20\n", - "* Peak memory (this run): 1.3577 GB\n", - "* Pipeline completed time: 2023-02-28 15:47:12\n" - ] - } - ], - "source": [ - "bedboss stat \\\n", - "--bedfile ./sample1.bed.gz \\\n", - "--bigbed ./sample1.bigBed \\\n", - "--outfolder ./test_output \\\n", - "--genome hg19 \\\n", - "--bedbase-config ./bedbase_config_test.yaml \n" - ] - }, - { - "cell_type": "markdown", - "id": "c745d9b1", - "metadata": {}, - "source": [ - "After plots and statistics were produced, we can look at them" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "id": "208bfa9b", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "sample1_chrombins.pdf \u001B[0m\u001B[01;35msample1_neighbor_distances.png\u001B[0m\n", - "\u001B[01;35msample1_chrombins.png\u001B[0m sample1_paritions.pdf\n", - "sample1_cumulative_partitions.pdf \u001B[01;35msample1_paritions.png\u001B[0m\n", - "\u001B[01;35msample1_cumulative_partitions.png\u001B[0m sample1_plots.json\n", - "sample1_expected_partitions.pdf sample1_tssdist.pdf\n", - "\u001B[01;35msample1_expected_partitions.png\u001B[0m 
\u001B[01;35msample1_tssdist.png\u001B[0m\n", - "sample1.json sample1_widths_histogram.pdf\n", - "sample1_neighbor_distances.pdf \u001B[01;35msample1_widths_histogram.png\u001B[0m\n" - ] - } - ], - "source": [ - "ls test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "id": "fe670243", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\n", - " \"name\": [\"sample1\"],\n", - " \"regions_no\": [300000],\n", - " \"mean_region_width\": [663.9],\n", - " \"md5sum\": [\"c557c915a9901ce377ef724806ff7a2c\"],\n", - " \"median_TSS_dist\": [48580],\n", - " \"exon_frequency\": [14871],\n", - " \"exon_percentage\": [0.0496],\n", - " \"fiveUTR_frequency\": [8981],\n", - " \"fiveUTR_percentage\": [0.0299],\n", - " \"intergenic_frequency\": [141763],\n", - " \"intergenic_percentage\": [0.4725],\n", - " \"intron_frequency\": [106638],\n", - " \"intron_percentage\": [0.3555],\n", - " \"promoterCore_frequency\": [10150],\n", - " \"promoterCore_percentage\": [0.0338],\n", - " \"promoterProx_frequency\": [6851],\n", - " \"promoterProx_percentage\": [0.0228],\n", - " \"threeUTR_frequency\": [10746],\n", - " \"threeUTR_percentage\": [0.0358]\n", - "}\n" - ] - } - ], - "source": [ - "cat test_output/output/bedstat_output/c557c915a9901ce377ef724806ff7a2c/sample1.json" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Bash", - "language": "bash", - "name": "bash" - }, - "language_info": { - "codemirror_mode": "shell", - "file_extension": ".sh", - "mimetype": "text/x-sh", - "name": "bash" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/bedboss/notebooks/tutorial-all.ipynb b/docs/bedboss/notebooks/tutorial-all.ipynb deleted file mode 100644 index 823e9fc..0000000 --- a/docs/bedboss/notebooks/tutorial-all.ipynb +++ /dev/null @@ -1,691 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "5ed57409", - "metadata": {}, - 
"source": [ - "# Bedboss-all tutorial" - ] - }, - { - "cell_type": "markdown", - "id": "e9e494b7", - "metadata": {}, - "source": [ - "This tutorial is attended to show base exaple of using bedboss all function that inclueds all 3 pipelines: bedmake, bedqc and bedstat" - ] - }, - { - "cell_type": "markdown", - "id": "3169c5cf", - "metadata": {}, - "source": [ - "### 1. First let's create new working repository" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "b089c6f1", - "metadata": {}, - "outputs": [], - "source": [ - "mkdir all_tutorial ; cd all_tutorial " - ] - }, - { - "cell_type": "markdown", - "id": "ecf10dee", - "metadata": {}, - "source": [ - "### 2. To run our pipelines we need to check if we have installed all dependencies. To do so we can run dependencies check script that can be found in docs." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "221c24cb", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--2023-08-11 06:58:27-- https://raw.githubusercontent.com/bedbase/bedboss/68910f5142a95d92c27ef53eafb9c35599af2fbd/test/bash_requirements_test.sh\n", - "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n", - "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n", - "HTTP request sent, awaiting response... 
200 OK\n", - "Length: 3927 (3.8K) [text/plain]\n", - "Saving to: ‘req_test.sh’\n", - "\n", - "req_test.sh 100%[===================>] 3.83K --.-KB/s in 0.006s \n", - "\n", - "2023-08-11 06:58:28 (608 KB/s) - ‘req_test.sh’ saved [3927/3927]\n", - "\n" - ] - } - ], - "source": [ - "wget -O req_test.sh https://raw.githubusercontent.com/bedbase/bedboss/68910f5142a95d92c27ef53eafb9c35599af2fbd/test/bash_requirements_test.sh" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "32c7757a", - "metadata": {}, - "outputs": [], - "source": [ - "chmod u+x ./req_test.sh" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "c4df6265", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "-----------------------------------------------------------\n", - " \n", - " bedboss installation check \n", - " \n", - "-----------------------------------------------------------\n", - "Checking native installation... \n", - "Language compilers... \n", - "-----------------------------------------------------------\n", - "\u001b[0;32m✔ python is installed correctly\u001b[0m\n", - "\u001b[0;32m✔ R is installed correctly\u001b[0m\n", - "-----------------------------------------------------------\n", - "Checking bedmaker dependencies... \n", - "-----------------------------------------------------------\n", - "\u001b[0;32m✔ package bedboss @ file:///home/bnt4me/virginia/repos/bedbase_all/bedboss\u001b[0m\n", - "\u001b[0;32m✔ package refgenconf==0.12.2\u001b[0m\n", - "\u001b[0;32m✔ bedToBigBed is installed correctly\u001b[0m\n", - "\u001b[0;33m⚠ WARNING: 'bigBedToBed' is not installed. To install 'bigBedToBed' check bedboss documentation: https://bedboss.databio.org/\u001b[0m\n", - "\u001b[0;33m⚠ WARNING: 'bigWigToBedGraph' is not installed. To install 'bigWigToBedGraph' check bedboss documentation: https://bedboss.databio.org/\u001b[0m\n", - "\u001b[0;33m⚠ WARNING: 'wigToBigWig' is not installed. 
To install 'wigToBigWig' check bedboss documentation: https://bedboss.databio.org/\u001b[0m\n", - "-----------------------------------------------------------\n", - "Checking required R packages for bedstat... \n", - "-----------------------------------------------------------\n", - "\u001b[0;32m✔ SUCCESS: R package: optparse\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: ensembldb\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: ExperimentHub\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: AnnotationHub\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: AnnotationFilter\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: BSgenome\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: GenomicFeatures\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: GenomicDistributions\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: GenomicDistributionsData\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: GenomeInfoDb\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: ensembldb\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: tools\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: R.utils\u001b[0m\n", - "\u001b[0;32m✔ SUCCESS: R package: LOLA\u001b[0m\n", - "Number of WARNINGS: 3\n" - ] - } - ], - "source": [ - "./req_test.sh" - ] - }, - { - "cell_type": "markdown", - "id": "44aa2dcd", - "metadata": {}, - "source": [ - "### 3. All requirements are installed, now lets run our pipeline" - ] - }, - { - "cell_type": "markdown", - "id": "50549ec4", - "metadata": {}, - "source": [ - "To run pipeline, we need to provide few required arguments:\n", - "1. sample_name\n", - "2. input_file\n", - "3. input_type\n", - "4. outfolder\n", - "5. genome\n", - "6. 
bedbase_config\n", - "\n", - "If you don't have bedbase config file, or initialized bedbase db you can check documnetation how to do it: https://bedboss.databio.org/" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "id": "b71f7610", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: bedboss==0.1.0a2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (0.1.0a2)\n", - "Requirement already satisfied: piper>=0.13.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.13.2)\n", - "Requirement already satisfied: pandas>=1.5.3 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (2.0.3)\n", - "Requirement already satisfied: peppy>=0.35.7 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.35.7)\n", - "Requirement already satisfied: requests>=2.28.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (2.28.2)\n", - "Requirement already satisfied: logmuse>=0.2.7 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.2.7)\n", - "Requirement already satisfied: yacman>=0.8.4 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.9.1)\n", - "Requirement already satisfied: refgenconf>=0.12.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.12.2)\n", - "Requirement already satisfied: bbconf==0.4.0a1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.4.0a1)\n", - "Requirement already satisfied: ubiquerg>=0.6.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bedboss==0.1.0a2) (0.6.2)\n", - "Requirement already satisfied: pipestat>=0.4.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from 
bbconf==0.4.0a1->bedboss==0.1.0a2) (0.4.1)\n", - "Requirement already satisfied: sqlalchemy<2.0.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from bbconf==0.4.0a1->bedboss==0.1.0a2) (1.4.41)\n", - "Requirement already satisfied: tzdata>=2022.1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pandas>=1.5.3->bedboss==0.1.0a2) (2023.3)\n", - "Requirement already satisfied: pytz>=2020.1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pandas>=1.5.3->bedboss==0.1.0a2) (2022.7.1)\n", - "Requirement already satisfied: python-dateutil>=2.8.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pandas>=1.5.3->bedboss==0.1.0a2) (2.8.2)\n", - "Requirement already satisfied: numpy>=1.21.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pandas>=1.5.3->bedboss==0.1.0a2) (1.24.1)\n", - "Requirement already satisfied: attmap>=0.13.2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from peppy>=0.35.7->bedboss==0.1.0a2) (0.13.2)\n", - "Requirement already satisfied: pyyaml in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from peppy>=0.35.7->bedboss==0.1.0a2) (6.0)\n", - "Requirement already satisfied: rich>=10.3.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from peppy>=0.35.7->bedboss==0.1.0a2) (13.3.0)\n", - "Requirement already satisfied: psutil in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from piper>=0.13.2->bedboss==0.1.0a2) (5.9.4)\n", - "Requirement already satisfied: tqdm in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from refgenconf>=0.12.2->bedboss==0.1.0a2) (4.64.1)\n", - "Requirement already satisfied: pyfaidx in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from refgenconf>=0.12.2->bedboss==0.1.0a2) (0.7.1)\n", - "Requirement already satisfied: future in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from 
refgenconf>=0.12.2->bedboss==0.1.0a2) (0.18.3)\n", - "Requirement already satisfied: jsonschema>=3.0.1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from refgenconf>=0.12.2->bedboss==0.1.0a2) (4.17.3)\n", - "Requirement already satisfied: charset-normalizer<4,>=2 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from requests>=2.28.2->bedboss==0.1.0a2) (3.0.1)\n", - "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from requests>=2.28.2->bedboss==0.1.0a2) (1.26.14)\n", - "Requirement already satisfied: idna<4,>=2.5 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from requests>=2.28.2->bedboss==0.1.0a2) (3.4)\n", - "Requirement already satisfied: certifi>=2017.4.17 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from requests>=2.28.2->bedboss==0.1.0a2) (2022.12.7)\n", - "Requirement already satisfied: oyaml in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from yacman>=0.8.4->bedboss==0.1.0a2) (1.0)\n", - "Requirement already satisfied: attrs>=17.4.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from jsonschema>=3.0.1->refgenconf>=0.12.2->bedboss==0.1.0a2) (22.2.0)\n", - "Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from jsonschema>=3.0.1->refgenconf>=0.12.2->bedboss==0.1.0a2) (0.19.3)\n", - "Requirement already satisfied: psycopg2-binary in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (2.9.5)\n", - "Requirement already satisfied: eido in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (0.2.1)\n", - "Requirement already satisfied: sqlmodel>=0.0.8 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from 
pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (0.0.8)\n", - "Requirement already satisfied: pydantic<2.0.0,>=1.10.7 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (1.10.12)\n", - "Requirement already satisfied: six>=1.5 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas>=1.5.3->bedboss==0.1.0a2) (1.16.0)\n", - "Requirement already satisfied: markdown-it-py<3.0.0,>=2.1.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from rich>=10.3.0->peppy>=0.35.7->bedboss==0.1.0a2) (2.1.0)\n", - "Requirement already satisfied: pygments<3.0.0,>=2.14.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from rich>=10.3.0->peppy>=0.35.7->bedboss==0.1.0a2) (2.14.0)\n", - "Requirement already satisfied: greenlet!=0.4.17 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from sqlalchemy<2.0.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (2.0.1)\n", - "Requirement already satisfied: setuptools>=0.7 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pyfaidx->refgenconf>=0.12.2->bedboss==0.1.0a2) (65.5.1)\n", - "Requirement already satisfied: mdurl~=0.1 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from markdown-it-py<3.0.0,>=2.1.0->rich>=10.3.0->peppy>=0.35.7->bedboss==0.1.0a2) (0.1.2)\n", - "Requirement already satisfied: typing-extensions>=4.2.0 in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from pydantic<2.0.0,>=1.10.7->pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (4.4.0)\n", - "Requirement already satisfied: sqlalchemy2-stubs in /home/bnt4me/virginia/venv/bedboss/lib/python3.10/site-packages (from sqlmodel>=0.0.8->pipestat>=0.4.0->bbconf==0.4.0a1->bedboss==0.1.0a2) (0.0.2a35)\n", - "\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip available: 
\u001b[0m\u001b[31;49m22.3.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.2.1\u001b[0m\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" - ] - } - ], - "source": [ - "pip install bedboss==0.1.0a2" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "id": "627ee6a3", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "usage: bedboss all [-h] --outfolder OUTFOLDER -s SAMPLE_NAME -f INPUT_FILE -t\n", - " INPUT_TYPE -g GENOME [-r RFG_CONFIG]\n", - " [--chrom-sizes CHROM_SIZES] [-n] [--standard-chrom]\n", - " [--check-qc] [--open-signal-matrix OPEN_SIGNAL_MATRIX]\n", - " [--ensdb ENSDB] --bedbase-config BEDBASE_CONFIG\n", - " [-y SAMPLE_YAML] [--no-db-commit] [--just-db-commit]\n", - "bedboss all: error: the following arguments are required: --outfolder, -s/--sample-name, -f/--input-file, -t/--input-type, -g/--genome, --bedbase-config\n" - ] - }, - { - "ename": "", - "evalue": "2", - "output_type": "error", - "traceback": [] - } - ], - "source": [ - "bedboss all" - ] - }, - { - "cell_type": "markdown", - "id": "e9a7acf1", - "metadata": {}, - "source": [ - "Let's download sample file. Information about this file you can find here: https://pephub.databio.org/bedbase/GSE177859?tag=default" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "id": "107b36af", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--2023-08-11 07:12:28-- ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5379nnn/GSM5379062/suppl/GSM5379062_ENCFF834LRN_peaks_GRCh38.bed.gz\n", - " => ‘sample1.bed.gz’\n", - "Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.12, 130.14.250.10, 2607:f220:41f:250::229, ...\n", - "Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.12|:21... connected.\n", - "Logging in as anonymous ... 
Logged in!\n", - "==> SYST ... done. ==> PWD ... done.\n", - "==> TYPE I ... done. ==> CWD (1) /geo/samples/GSM5379nnn/GSM5379062/suppl ... done.\n", - "==> SIZE GSM5379062_ENCFF834LRN_peaks_GRCh38.bed.gz ... 5470278\n", - "==> PASV ... done. ==> RETR GSM5379062_ENCFF834LRN_peaks_GRCh38.bed.gz ... done.\n", - "Length: 5470278 (5.2M) (unauthoritative)\n", - "\n", - "GSM5379062_ENCFF834 100%[===================>] 9.76M 1008KB/s in 24s \n", - "\n", - "2023-08-11 07:12:55 (419 KB/s) - ‘sample1.bed.gz’ saved [10231006]\n", - "\n" - ] - } - ], - "source": [ - "wget -O sample1.bed.gz ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5379nnn/GSM5379062/suppl/GSM5379062_ENCFF834LRN_peaks_GRCh38.bed.gz" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "6d961bcd", - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "id": "c873a853", - "metadata": {}, - "source": [ - "let's create bedbase config file:" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "id": "127df991", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cat: bedbase_config_test.yaml: No such file or directory\n" - ] - }, - { - "ename": "", - "evalue": "1", - "output_type": "error", - "traceback": [] - } - ], - "source": [ - "cat bedbase_config_test.yaml" - ] - }, - { - "cell_type": "markdown", - "id": "45d79641", - "metadata": {}, - "source": [ - "Now let's run bedboss:" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "0daa1402", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Warning: You're running an interactive python session. 
This works, but pypiper cannot tee the output, so results are only logged to screen.\n", - "### Pipeline run code and environment:\n", - "\n", - "* Command: `/home/bnt4me/virginia/venv/jupyter/bin/bedboss all --sample-name tutorial_f1 --input-file sample1.bed.gz --input-type bed --outfolder ./tutorial --genome GRCh38 --bedbase-config bedbase_config_test.yaml`\n", - "* Compute host: bnt4me-Precision-5560\n", - "* Working dir: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial\n", - "* Outfolder: ./tutorial/\n", - "* Pipeline started at: (02-27 12:47:26) elapsed: 0.0 _TIME_\n", - "\n", - "### Version log:\n", - "\n", - "* Python version: 3.10.6\n", - "* Pypiper dir: `/home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/pypiper`\n", - "* Pypiper version: 0.12.3\n", - "* Pipeline dir: `/home/bnt4me/virginia/venv/jupyter/bin`\n", - "* Pipeline version: None\n", - "\n", - "### Arguments passed to pipeline:\n", - "\n", - "\n", - "----------------------------------------\n", - "\n", - "Unused arguments: {'command': 'all'}\n", - "Getting Open Signal Matrix file path...\n", - "output_bed = ./tutorial/bed_files/sample1.bed.gz\n", - "output_bigbed = ./tutorial/bigbed_files\n", - "Output directory does not exist. Creating: ./tutorial/bed_files\n", - "BigBed directory does not exist. Creating: ./tutorial/bigbed_files\n", - "bedmaker logs directory doesn't exist. Creating one...\n", - "Got input type: bed\n", - "Converting sample1.bed.gz to BED format.\n", - "Target to produce: `./tutorial/bed_files/sample1.bed.gz` \n", - "\n", - "> `cp sample1.bed.gz ./tutorial/bed_files/sample1.bed.gz` (434320)\n", - "
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB. \n", - " PID: 434320;\tCommand: cp;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "Running bedqc...\n", - "Unused arguments: {}\n", - "Target to produce: `./tutorial/bed_files/bedmaker_logs/tutorial_f1/rigumni8` \n", - "\n", - "> `zcat ./tutorial/bed_files/sample1.bed.gz > ./tutorial/bed_files/bedmaker_logs/tutorial_f1/rigumni8` (434322)\n", - "
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 434322;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.003GB\n", - "\n", - "Targetless command, running... \n", - "\n", - "> `bash /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedqc/est_line.sh ./tutorial/bed_files/bedmaker_logs/tutorial_f1/rigumni8 ` (434324)\n", - "
\n", - "236000\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB. \n", - " PID: 434324;\tCommand: bash;\tReturn code: 0;\tMemory used: 0.0GB\n", - "\n", - "File (./tutorial/bed_files/bedmaker_logs/tutorial_f1/rigumni8) has passed Quality Control!\n", - "Generating bigBed files for: sample1.bed.gz\n", - "Determining path to chrom.sizes asset via Refgenie.\n", - "Creating refgenie genome config file...\n", - "Reading refgenie genome configuration file from file: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/genome_config.yaml\n", - "/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Determined path to chrom.sizes asset: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes\n", - "Target to produce: `./tutorial/bigbed_files/vzxyqexz` \n", - "\n", - "> `zcat ./tutorial/bed_files/sample1.bed.gz | sort -k1,1 -k2,2n > ./tutorial/bigbed_files/vzxyqexz` (434335,434336)\n", - "
\n", - "\n", - "Command completed. Elapsed time: 0:00:00. Running peak memory: 0.007GB. \n", - " PID: 434335;\tCommand: zcat;\tReturn code: 0;\tMemory used: 0.002GB \n", - " PID: 434336;\tCommand: sort;\tReturn code: 0;\tMemory used: 0.007GB\n", - "\n", - "Running: /home/bnt4me/virginia/repos/bedbase_all/bedboss/bedToBigBed -type=bed6+4 ./tutorial/bigbed_files/vzxyqexz /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes ./tutorial/bigbed_files/sample1.bigBed\n", - "Target to produce: `./tutorial/bigbed_files/sample1.bigBed` \n", - "\n", - "> `/home/bnt4me/virginia/repos/bedbase_all/bedboss/bedToBigBed -type=bed6+4 ./tutorial/bigbed_files/vzxyqexz /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/alias/hg38/fasta/default/hg38.chrom.sizes ./tutorial/bigbed_files/sample1.bigBed` (434338)\n", - "
\n", - "pass1 - making usageList (25 chroms): 27 millis\n", - "pass2 - checking and writing primary data (222016 records, 10 fields): 413 millis\n", - "\n", - "Command completed. Elapsed time: 0:00:01. Running peak memory: 0.007GB. \n", - " PID: 434338;\tCommand: /home/bnt4me/virginia/repos/bedbase_all/bedboss/bedToBigBed;\tReturn code: 0;\tMemory used: 0.004GB\n", - "\n", - "Target to produce: `/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1.json` \n", - "\n", - "> `Rscript /home/bnt4me/virginia/venv/jupyter/lib/python3.10/site-packages/bedboss/bedstat/tools/regionstat.R --bedfilePath=./tutorial/bed_files/sample1.bed.gz --fileId=sample1 --openSignalMatrix=./openSignalMatrix/openSignalMatrix_hg38_percentile99_01_quantNormalized_round4d.txt.gz --outputFolder=/home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5 --genome=hg38 --ensdb=None --digest=eb617f28e129c401be94069e0fdedbb5` (434343)\n", - "
\n", - "Loading required package: IRanges\n", - "Loading required package: BiocGenerics\n", - "\n", - "Attaching package: ‘BiocGenerics’\n", - "\n", - "The following objects are masked from ‘package:stats’:\n", - "\n", - " IQR, mad, sd, var, xtabs\n", - "\n", - "The following objects are masked from ‘package:base’:\n", - "\n", - " anyDuplicated, append, as.data.frame, basename, cbind, colnames,\n", - " dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,\n", - " grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,\n", - " order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,\n", - " rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,\n", - " union, unique, unsplit, which.max, which.min\n", - "\n", - "Loading required package: S4Vectors\n", - "Loading required package: stats4\n", - "\n", - "Attaching package: ‘S4Vectors’\n", - "\n", - "The following objects are masked from ‘package:base’:\n", - "\n", - " expand.grid, I, unname\n", - "\n", - "Loading required package: GenomicRanges\n", - "Loading required package: GenomeInfoDb\n", - "\u001b[?25hsnapshotDate(): 2021-10-19\n", - "\u001b[?25h\u001b[?25hLoading required package: GenomicFeatures\n", - "Loading required package: AnnotationDbi\n", - "Loading required package: Biobase\n", - "Welcome to Bioconductor\n", - "\n", - " Vignettes contain introductory material; view with\n", - " 'browseVignettes()'. To cite Bioconductor, see\n", - " 'citation(\"Biobase\")', and for packages 'citation(\"pkgname\")'.\n", - "\n", - "Loading required package: AnnotationFilter\n", - "\n", - "Attaching package: 'ensembldb'\n", - "\n", - "The following object is masked from 'package:stats':\n", - "\n", - " filter\n", - "\n", - "\u001b[?25h\u001b[?25h\u001b[?25hLoading required package: R.oo\n", - "Loading required package: R.methodsS3\n", - "R.methodsS3 v1.8.2 (2022-06-13 22:00:14 UTC) successfully loaded. 
See ?R.methodsS3 for help.\n", - "R.oo v1.25.0 (2022-06-12 02:20:02 UTC) successfully loaded. See ?R.oo for help.\n", - "\n", - "Attaching package: 'R.oo'\n", - "\n", - "The following object is masked from 'package:R.methodsS3':\n", - "\n", - " throw\n", - "\n", - "The following object is masked from 'package:GenomicRanges':\n", - "\n", - " trim\n", - "\n", - "The following object is masked from 'package:IRanges':\n", - "\n", - " trim\n", - "\n", - "The following objects are masked from 'package:methods':\n", - "\n", - " getClasses, getMethods\n", - "\n", - "The following objects are masked from 'package:base':\n", - "\n", - " attach, detach, load, save\n", - "\n", - "R.utils v2.12.2 (2022-11-11 22:00:03 UTC) successfully loaded. See ?R.utils for help.\n", - "\n", - "Attaching package: 'R.utils'\n", - "\n", - "The following object is masked from 'package:utils':\n", - "\n", - " timestamp\n", - "\n", - "The following objects are masked from 'package:base':\n", - "\n", - " cat, commandArgs, getOption, isOpen, nullfile, parse, warnings\n", - "\n", - "\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25h\u001b[?25hsee ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation\n", - "loading from cache\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_tssdist\"\n", - "\u001b[1m\u001b[22mScale for \u001b[32mx\u001b[39m is already present.\n", - "Adding another scale for \u001b[32mx\u001b[39m, which will replace the existing scale.\n", - "[1] \"Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_tssdist\"\n", - "Successfully calculated and plot TSS distance.\n" - ] - }, - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_chrombins\"\n", - "see ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation\n", - "loading from cache\n", - "[1] \"Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_chrombins\"\n", - "Successfully calculated and plot chromosomes region distribution.\n", - "see ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation\n", - "loading from cache\n", - "Calculating overlaps...\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_paritions\"\n", - "[1] \"Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_paritions\"\n", - "Successfully calculated and plot regions distribution over genomic partitions.\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_expected_partitions\"\n", - "see ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation\n", - "loading from cache\n", - "see ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for documentation\n", - "loading from cache\n", - "[1] \"Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_expected_partitions\"\n", - "Successfully calculated and plot expected distribution over genomic partitions.\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_cumulative_partitions\"\n", - "see ?GenomicDistributionsData and browseVignettes('GenomicDistributionsData') for 
documentation\n", - "loading from cache\n", - "[1] \"Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_cumulative_partitions\"\n", - "Successfully calculated and plot cumulative distribution over genomic partitions.\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_widths_histogram\"\n", - "[1] \"Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_widths_histogram\"\n", - "Successfully calculated and plot quantile-trimmed histogram of widths.\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_neighbor_distances\"\n", - "[1] \"Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_neighbor_distances\"\n", - "Successfully calculated and plot distance between neighbor regions.\n", - "[1] \"Plotting: /home/bnt4me/virginia/repos/bedbase_all/bedboss/docs_jupyter/all_tutorial/tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_open_chromatin\"\n", - "[1] \"Writing plot json: output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/sample1_open_chromatin\"\n", - "Successfully calculated and plot cell specific enrichment for open chromatin.\n", - "\u001b[?25h\u001b[?25h\n", - "Command completed. Elapsed time: 0:00:49. Running peak memory: 3.843GB. \n", - " PID: 434343;\tCommand: Rscript;\tReturn code: 0;\tMemory used: 3.843GB\n", - "\n", - "These results exist for 'eb617f28e129c401be94069e0fdedbb5': name, regions_no, mean_region_width, md5sum, bedfile, genome, bigbedfile, widths_histogram, neighbor_distances\n", - "Starting cleanup: 2 files; 0 conditional files for cleanup\n", - "\n", - "Cleaning up flagged intermediate files. . .\n", - "\n", - "### Pipeline completed. 
Epilogue\n", - "* Elapsed time (this run): 0:00:50\n", - "* Total elapsed time (all runs): 0:00:50\n", - "* Peak memory (this run): 3.8432 GB\n", - "* Pipeline completed time: 2023-02-27 12:48:16\n" - ] - } - ], - "source": [ - "bedboss all --sample-name tutorial_f1 \\\n", - "--input-file sample1.bed.gz \\\n", - "--input-type bed \\\n", - "--outfolder ./tutorial \\\n", - "--genome GRCh38 \\\n", - "--bedbase-config bedbase_config_test.yaml" - ] - }, - { - "cell_type": "markdown", - "id": "63d83f3c", - "metadata": {}, - "source": [ - "Now let's check if all files where saved" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "id": "7a50535d", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[0m\u001b[01;34mbedmaker_logs\u001b[0m \u001b[01;31msample1.bed.gz\u001b[0m\n" - ] - } - ], - "source": [ - "ls tutorial/bed_files" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "id": "9a826059", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "sample1.bigBed\n" - ] - } - ], - "source": [ - "ls tutorial/bigbed_files" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "id": "aa8609fb", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "sample1_chrombins.pdf sample1_open_chromatin.pdf\n", - "\u001b[0m\u001b[01;35msample1_chrombins.png\u001b[0m \u001b[01;35msample1_open_chromatin.png\u001b[0m\n", - "sample1_cumulative_partitions.pdf sample1_paritions.pdf\n", - "\u001b[01;35msample1_cumulative_partitions.png\u001b[0m \u001b[01;35msample1_paritions.png\u001b[0m\n", - "sample1_expected_partitions.pdf sample1_plots.json\n", - "\u001b[01;35msample1_expected_partitions.png\u001b[0m sample1_tssdist.pdf\n", - "sample1.json \u001b[01;35msample1_tssdist.png\u001b[0m\n", - "sample1_neighbor_distances.pdf sample1_widths_histogram.pdf\n", - "\u001b[01;35msample1_neighbor_distances.png\u001b[0m 
\u001b[01;35msample1_widths_histogram.png\u001b[0m\n" - ] - } - ], - "source": [ - "ls tutorial/output/bedstat_output/eb617f28e129c401be94069e0fdedbb5/" - ] - }, - { - "cell_type": "markdown", - "id": "2208d244", - "metadata": {}, - "source": [ - "Everything was ran correctly:)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Bash", - "language": "bash", - "name": "bash" - }, - "language_info": { - "codemirror_mode": "shell", - "file_extension": ".sh", - "mimetype": "text/x-sh", - "name": "bash" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/bedboss/tutorials/bedbuncher_tutorial.md b/docs/bedboss/tutorials/bedbuncher_tutorial.md index 761cf98..66b164d 100644 --- a/docs/bedboss/tutorials/bedbuncher_tutorial.md +++ b/docs/bedboss/tutorials/bedbuncher_tutorial.md @@ -3,6 +3,10 @@ Bedbuncher is used to create bedset of bed files in the bedbase database. ### 1) Create bedbase config file + +How to create config file: [configuration section](../how-to-configure.md). + + ### 2) Create pep with bed file record identifiers. To do so, you need to create a PEP with the following fields: sample_name (where sample_name is record_identifier), or `sample_name` + `record_identifier` e.g. sample_table: diff --git a/docs/bedboss/tutorials/bedindex_tutorial.md b/docs/bedboss/tutorials/bedindex_tutorial.md index 1902796..8f93c5e 100644 --- a/docs/bedboss/tutorials/bedindex_tutorial.md +++ b/docs/bedboss/tutorials/bedindex_tutorial.md @@ -1,6 +1,10 @@ ### Indexing to qdrant database ### 1. Create bedbase config file + +How to create a BEDbase configuration file is described in the [configuration section](../how-to-configure.md). + + ### 2. 
Run bedboss index #### From command line diff --git a/docs/bedboss/tutorials/tutorial_all.md b/docs/bedboss/tutorials/tutorial_all.md index e642a75..b14bd55 100644 --- a/docs/bedboss/tutorials/tutorial_all.md +++ b/docs/bedboss/tutorials/tutorial_all.md @@ -1,6 +1,6 @@ ## Bedboss run-all -Bedboss run-all is intended to run on ONE sample (bed file) and run all bedboss pipelines: +Bedboss run-all is intended to run on **ONE** sample (bed file) and run all bedboss pipelines: bedmaker (+ bedclassifier + bedqc) -> bedstat. After that optionally it can run bedbuncher, qdrant indexing and upload metadata to PEPhub. ### Step 1: Install all dependencies @@ -14,7 +14,7 @@ If requirements are not satisfied, you will see the list of missing packages. ### Step 2: Create bedconf.yaml file To run bedboss, you need to create a bedconf.yaml file with configuration. -Detail instructions are in the configuration section. +Detail instructions are in the [configuration section](../how-to-configure.md). ### Step 3: Run bedboss To run bedboss, you need to run the next command: @@ -32,6 +32,7 @@ Above command will run bedboss on the bed file and create a bedstat file in the It contains only required parameters. For more details, please check the usage section. By default, results will be uploaded only to the PostgreSQL database. + - To upload results to PEPhub, you need to make the `databio` org available on GitHub, then login to PEPhub, and add the `--upload-pephub` flag to the command. - To upload results to Qdrant, you need to add the `--upload-qdrant` flag to the command. - To upload actual files to S3, you need to add the `--upload-s3` flag to the command, and before uploading, you have to set up all necessary environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_ENDPOINT_URL. 
diff --git a/docs/bedboss/tutorials/tutorial_run_pep.md b/docs/bedboss/tutorials/tutorial_run_pep.md index 71df030..f497084 100644 --- a/docs/bedboss/tutorials/tutorial_run_pep.md +++ b/docs/bedboss/tutorials/tutorial_run_pep.md @@ -15,7 +15,7 @@ If requirements are not satisfied, you will see the list of missing packages. ### Step 2: Create bedconf.yaml file To run bedboss run-pep, you need to create a bedconf.yaml file with configuration. -Detailed instructions are in the configuration section. +Detailed instructions are in the [configuration section](../how-to-configure.md). ### Step 3: Create PEP with bed files. BEDboss PEP should contain next fields: sample_name, input_file, input_type, genome. @@ -36,6 +36,7 @@ Above command will run bedboss on the bed file and create a file with statistics It contains only required parameters. For more details, please check the usage section. By default, results will be uploaded only to the PostgreSQL database. + - To upload results to PEPhub, you need to make the `databio` org available on GitHub, then login to PEPhub, and add the `--upload-pephub` flag to the command. - To upload results to Qdrant, you need to add the `--upload-qdrant` flag to the command. - To upload actual files to S3, you need to add the `--upload-s3` flag to the command, and before uploading, you have to set up all necessary environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_ENDPOINT_URL. 
diff --git a/mkdocs.yml b/mkdocs.yml index d33825a..326fbf8 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -57,13 +57,13 @@ nav: - Changelog: bbconf/changelog.md - Reference: - How to cite: citations.md - - Usage: bedboss_usage.md + - Usage: bedboss/usage.md - Support: https://github.com/bedbase/bedboss/issues - Contributing: contributing.md - Changelog: changelog.md - BEDboss: - BEDboss: bedboss/README.md - - Tutorial: + - Tutorials: - BEDboss run-pep: bedboss/tutorials/tutorial_run_pep.md - BEDboss-all pipeline: bedboss/tutorials/tutorial_all.md - BEDmaker tutorial: bedboss/tutorials/bedmaker_tutorial.md From 87289d7144236969fdc89b4316b0eaf21ee416fe Mon Sep 17 00:00:00 2001 From: Khoroshevskyi
In order to start working with the BedBaseConf
object, it has to be initialized first. The constructor requires one argument: the path to the configuration file (in YAML format).
The minimal configuration must define the path
section with 3 keys:
pipeline_output_path
: path to the desired output directory for the pipelines
bedstat_dir
: name of the bedstat pipeline output directory
bedbuncher_dir
: name of the bedbuncher pipeline output directory
Here's an example of a minimal bedbase configuration file:
- -!cat ../tests/data/config_min.yaml
-
# min config example. Refer to bbconf/const.py for key names and default values

path:
  pipeline_output_path: $HOME/bedbase
  bedstat_dir: bedstat_output
  bedbuncher_dir: bedbuncher_output
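The requirement above (a `path` section with three keys) can be checked before a configuration is handed to any pipeline. The following is a minimal sketch in plain Python, not bbconf's own validation: `REQUIRED_PATH_KEYS` and `missing_path_keys` are hypothetical names introduced here, and the configuration is assumed to be already parsed into a dictionary.

```python
# Hypothetical helper, not part of bbconf: reports which of the three
# required keys are absent from the "path" section of a parsed config.
REQUIRED_PATH_KEYS = ("pipeline_output_path", "bedstat_dir", "bedbuncher_dir")


def missing_path_keys(config):
    """Return the required "path" keys that the config does not define."""
    path_section = config.get("path") or {}
    return [key for key in REQUIRED_PATH_KEYS if key not in path_section]


# Same content as the minimal configuration file shown above.
minimal_config = {
    "path": {
        "pipeline_output_path": "$HOME/bedbase",
        "bedstat_dir": "bedstat_output",
        "bedbuncher_dir": "bedbuncher_output",
    }
}

print(missing_path_keys(minimal_config))  # -> []
print(missing_path_keys({"path": {"bedstat_dir": "bedstat_output"}}))
```

An empty list means the minimal requirements are met; anything else names the keys that still have to be added.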
Apart from the required path
section, there are 2 other sections that can be used to configure the PostgreSQL database, used to store the metadata about the bedfiles and bedsets (database
section) and to configure the bedhost server that displays the pipeline results and provides an API to query them (server
section).
Here's an example of a complete bedbase configuration file:
- -!cat ../tests/data/config.yaml
-
database:
  name: pipestat-test
  user: postgres
  password: pipestat-password
  host: localhost
# port: 5432; intentionally commented out to test the defaults setting system
path:
  pipeline_output_path: $BEDBASE_DATA_PATH/outputs
  bedstat_dir: bedstat_output
  bedbuncher_dir: bedbuncher_output
  remote_url_base: null
server:
  host: 0.0.0.0
  port: 8000
If any of the values shown below is not provided in the configuration file, it is set to a default value:
- -from bbconf.const import DEFAULT_SECTION_VALUES
-from attmap import AttMap
-AttMap(DEFAULT_SECTION_VALUES)
-
AttMap
path:
  remote_url_base: null
database:
  user: postgres
  password: bedbasepassword
  name: postgres
  port: 5432
  host: localhost
server:
  host: 0.0.0.0
  port: 80
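How defaults fill the gaps can be illustrated with a small dictionary merge. This is a sketch of the general technique, assuming a section-by-section fill; it is not taken from bbconf's source. `DEFAULTS` copies the values printed above, and `fill_defaults` is a hypothetical name.

```python
import copy

# Default values as printed above (copied by hand for this sketch).
DEFAULTS = {
    "path": {"remote_url_base": None},
    "database": {
        "user": "postgres",
        "password": "bedbasepassword",
        "name": "postgres",
        "port": 5432,
        "host": "localhost",
    },
    "server": {"host": "0.0.0.0", "port": 80},
}


def fill_defaults(config, defaults=DEFAULTS):
    """Return a copy of config where missing keys take their default value."""
    merged = copy.deepcopy(config)
    for section, values in defaults.items():
        target = merged.setdefault(section, {})
        for key, value in values.items():
            # setdefault keeps any value the user already provided.
            target.setdefault(key, value)
    return merged


# The complete example config, with the database port left out.
user_config = {
    "database": {
        "name": "pipestat-test",
        "user": "postgres",
        "password": "pipestat-password",
        "host": "localhost",
    },
    "server": {"host": "0.0.0.0", "port": 8000},
}

merged = fill_defaults(user_config)
print(merged["database"]["port"])  # -> 5432 (taken from the defaults)
print(merged["server"]["port"])    # -> 8000 (user value kept)
```

Values present in the file always win; defaults only apply to keys that are absent.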
BedBaseConf
object usage demonstration
bbconf
standardizes reporting of bedstat and bedbuncher results. It formalizes a way for these pipelines and downstream tools to communicate: the produced results can easily and reliably become an
-input for the server (bedhost). The object exposes an API for interacting with the results and is backed by a PostgreSQL database.
bbconf
provides a way to easily determine a path to the required configuration file. The file can be pointed to by the $BEDBASE
environment variable. get_bedbase_cfg
function returns a path that is either explicitly provided as an argument or read from the environment variable.
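The lookup order described above (explicit argument first, then the `$BEDBASE` environment variable) can be sketched as follows. `resolve_cfg_path` is a hypothetical stand-in written for illustration; it is not the `get_bedbase_cfg` implementation, whose exact signature and error handling are not shown in this tutorial.

```python
import os


def resolve_cfg_path(explicit_path=None, environ=os.environ, env_var="BEDBASE"):
    """Hypothetical sketch: prefer the explicit path, else read the env var."""
    if explicit_path is not None:
        return explicit_path
    # Returns None when the variable is not set; a real tool would
    # probably raise an informative error at this point instead.
    return environ.get(env_var)


# A plain dict stands in for the process environment in this example.
fake_env = {"BEDBASE": "/data/bedbase/config.yaml"}

print(resolve_cfg_path("../tests/data/config.yaml", environ=fake_env))
print(resolve_cfg_path(environ=fake_env))
print(resolve_cfg_path(environ={}))
```

Passing the environment in as a parameter keeps the function easy to test without modifying the real process environment.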
import logmuse
-logmuse.init_logger("bbconf", "DEBUG")
-from bbconf import *
-
-bbc = BedBaseConf(config_path="../tests/data/config.yaml")
-
DEBU 10:09:08 | bbconf:est:266 > Configured logger 'bbconf' using logmuse v0.2.6 --
As you can see above, missing entries are populated with default values.
-BedBaseConf
objects consist of two PipestatManager
instances. These objects are responsible for bedfiles and bedsets metadata management. Additionally, BedBaseConf
maintains a "relationship table" that stores the information regarding the bedfile-bedset relationships, i.e. which bedfile is a part of which bedset.
The PipestatManager
instances for bedfiles and bedsets can be accessed via the object properties: BedBaseConf.bed
and BedBaseConf.bedset
, respectively:
BedBaseConf.bed
:
print(bbc.bed)
-
PipestatManager (bedfiles) -Backend: PostgreSQL -Results schema source: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/bbconf/schemas/bedfiles_schema.yaml -Status schema source: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pipestat/schemas/status_schema.yaml -Records count: 11 --
BedBaseConf.bedset
:
print(bbc.bedset)
-
PipestatManager (bedsets) -Backend: PostgreSQL -Results schema source: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/bbconf/schemas/bedsets_schema.yaml -Status schema source: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pipestat/schemas/status_schema.yaml -Records count: 3 --
BedBaseConf.config
:
Additionally, there's a BedBaseConf.config
property that can be used to retrieve the bedbase project configuration values, including both those declared in the configuration file and the defaults:
print(bbc.config)
-
database:
  name: pipestat-test
  user: postgres
  password: pipestat-password
  host: localhost
  port: 5432
path:
  pipeline_output_path: $BEDBASE_DATA_PATH/outputs
  bedstat_dir: bedstat_output
  bedbuncher_dir: bedbuncher_output
  remote_url_base: null
server:
  host: 0.0.0.0
  port: 8000
Before we start interacting with the database, we need to establish the connection. The required database information is sourced from the object itself. The PostgreSQL database instance has to be launched beforehand and be running in the background. For example, to run the database in a Docker container, execute these two lines:
-docker volume create postgres-data
-docker run -d --name bedbase-postgres -p 5432:5432 -e POSTGRES_PASSWORD=bedbasepassword -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres -v postgres-data:/var/lib/postgresql/data postgres
-
-The environment variables passed to the container need to match the settings in BedBaseConf
object.
bbconf
package comes with predefined schemas that describe the required bed and bedset metadata, including the identifiers and types. For example, the name of the bedfile, which is stored in the column "name"
has to be a string, whereas the column "widths_histogram"
expects an image:
print(bbc.bed.schema["name"])
-print(bbc.bed.schema["widths_histogram"])
-
{'type': 'string', 'description': 'BED file name'} -{'type': 'image', 'description': 'Quantile-trimmed histogram of widths'} --
A result of type image
is in fact a mapping with three required elements: path
, thumbnail_path
and title
. The actual jsonschema schemas can be accessed as result_schemas
property for both tables:
bbc.bed.result_schemas["widths_histogram"]
-
{'type': 'object', - 'description': 'Quantile-trimmed histogram of widths', - 'properties': {'path': {'type': 'string'}, - 'thumbnail_path': {'type': 'string'}, - 'title': {'type': 'string'}}, - 'required': ['path', 'thumbnail_path', 'title']}-
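To make the schema above concrete, here is a small hand-written check of an image result against it. This is only a sketch: it covers just the `required` and string-type rules visible in this schema rather than full jsonschema validation, and `validate_image_result` and the example file paths are hypothetical.

```python
# The schema printed above, copied as a Python dictionary.
IMAGE_SCHEMA = {
    "type": "object",
    "description": "Quantile-trimmed histogram of widths",
    "properties": {
        "path": {"type": "string"},
        "thumbnail_path": {"type": "string"},
        "title": {"type": "string"},
    },
    "required": ["path", "thumbnail_path", "title"],
}


def validate_image_result(result, schema=IMAGE_SCHEMA):
    """Return a list of problems; an empty list means the result is valid."""
    errors = []
    for key in schema["required"]:
        if key not in result:
            errors.append(f"missing required key: {key}")
    for key, spec in schema["properties"].items():
        if key in result and spec["type"] == "string" and not isinstance(result[key], str):
            errors.append(f"{key} must be a string")
    return errors


# Hypothetical example values; the file names are made up for illustration.
good = {
    "path": "sample1_widths_histogram.pdf",
    "thumbnail_path": "sample1_widths_histogram.png",
    "title": "Quantile-trimmed histogram of widths",
}
bad = {"path": "sample1_widths_histogram.pdf", "thumbnail_path": 42}

print(validate_image_result(good))  # -> []
print(validate_image_result(bad))
```

In practice a dedicated validator library would be the more robust choice; the point here is only to show what the three required elements mean for a reported value.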
bbc.bed.record_count
-
11-
bbc.bed.report(record_identifier="78c0e4753d04b238fc07e4ebe5a02984", values={"name": "some_name"})
-
These results exist for '78c0e4753d04b238fc07e4ebe5a02984': ['name'] --
False-
Oops, name
for this bedfile has been reported already. BedBaseConf
does not allow overwriting reported results unless it is explicitly forced with force_overwrite=True
.
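The overwrite protection described above can be modelled with a few lines of plain Python. This is an in-memory sketch of the behavior seen in the example (a `False` return when the result already exists), not pipestat's implementation; `ResultStore` is a hypothetical class.

```python
class ResultStore:
    """Hypothetical in-memory model of report-with-overwrite-protection."""

    def __init__(self):
        self._records = {}

    def report(self, record_identifier, values, force_overwrite=False):
        record = self._records.get(record_identifier, {})
        existing = [key for key in values if key in record]
        if existing and not force_overwrite:
            # Refuse to touch results that were already reported.
            print(f"These results exist for '{record_identifier}': {existing}")
            return False
        self._records.setdefault(record_identifier, {}).update(values)
        return True

    def retrieve(self, record_identifier):
        return dict(self._records.get(record_identifier, {}))


store = ResultStore()
rid = "78c0e4753d04b238fc07e4ebe5a02984"

first = store.report(rid, {"name": "some_name"})
second = store.report(rid, {"name": "other_name"})
forced = store.report(rid, {"name": "other_name"}, force_overwrite=True)
print(first, second, forced)  # -> True False True
```

Returning `False` instead of raising lets a pipeline decide for itself whether an already-reported result is an error or simply a sign that the work was done earlier.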
Let's try reporting a different value:
- -bbc.bed.report(record_identifier="78c0e4753d04b238fc07e4ebe5a02984", values={"test": "some_value"})
-
---------------------------------------------------------------------------- -AssertionError Traceback (most recent call last) -<ipython-input-9-932799b3747a> in <module> -----> 1 bbc.bed.report(record_identifier="78c0e4753d04b238fc07e4ebe5a02984", values={"test": "some_value"}) - -/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pipestat/pipestat.py in report(self, values, record_identifier, force_overwrite, strict_type, return_id) - 764 raise SchemaNotFoundError("report results") - 765 result_identifiers = list(values.keys()) ---> 766 self.assert_results_defined(results=result_identifiers) - 767 existing = self._check_which_results_exist( - 768 rid=record_identifier, results=result_identifiers) - -/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pipestat/pipestat.py in assert_results_defined(self, results) - 1029 for r in results: - 1030 assert r in known_results, SchemaError( --> 1031 f"'{r}' is not a known result. Results defined in the " - 1032 f"schema are: {list(known_results)}.") - 1033 - -AssertionError: 'test' is not a known result. Results defined in the schema are: ['name', 'md5sum', 'bedfile', 'bigbedfile', 'regions_no', 'gc_content', 'mean_absolute_tss_dist', 'mean_region_width', 'exon_frequency', 'intron_frequency', 'promoterprox_frequency', 'intergenic_frequency', 'promotercore_frequency', 'fiveutr_frequency', 'threeutr_frequency', 'fiveutr_percentage', 'threeutr_percentage', 'promoterprox_percentage', 'exon_percentage', 'intron_percentage', 'intergenic_percentage', 'promotercore_percentage', 'tssdist', 'chrombins', 'gccontent', 'paritions', 'expected_partitions', 'cumulative_partitions', 'widths_histogram', 'neighbor_distances', 'open_chromatin', 'other'].-
Oops, the result test
is not allowed, since it hasn't been specified in the schema. The allowed results are printed in the error message above.
Let's try reporting a new bedfile then:
- -bbc.bed.report(record_identifier="78c1e4111d04b238fc11e4ebe5a02984", values={"name": "some_name"})
-
Reported records for '78c1e4111d04b238fc11e4ebe5a02984' in 'bedfiles' namespace: - - name: some_name --
True-
Success, the name for the bedfile identified by 78c1e4111d04b238fc11e4ebe5a02984
has been reported.
Therefore, we can retrieve this result:
- -bbc.bed.retrieve(record_identifier="78c1e4111d04b238fc11e4ebe5a02984", result_identifier="name")
-
'some_name'-
Or all the reported results:
- -bbc.bed.retrieve(record_identifier="78c1e4111d04b238fc11e4ebe5a02984")
-
{'name': 'some_name'}-
Naturally, a record can be removed:
- -bbc.bed.remove(record_identifier="78c1e4111d04b238fc11e4ebe5a02984")
-
Removing '78c1e4111d04b238fc11e4ebe5a02984' record --
True-
Another useful feature of BedBaseConf
is convenient handling of many-to-many bedfile-bedset relationships. To report one, use the BedBaseConf.report_relationship
method:
bbc.report_relationship(bedfile_id=3, bedset_id=2)
-
Now we can select bedfiles that are part of a bedset with the name "bedsetOver1kRegions". They therefore need to match the following query: name='bedsetOver1kRegions'
. With bedfile_col
argument we select the bedfile table columns we're interested in:
bbc.select_bedfiles_for_bedset(condition="name=%s", condition_val=["bedsetOver1kRegions"], bedfile_col=["id", "name"])
-
[[1, 'GSE105587_ENCFF018NNF_conservative_idr_thresholded_peaks_GRCh38'], - [2, 'GSE105977_ENCFF617QGK_optimal_idr_thresholded_peaks_GRCh38'], - [3, 'GSE105977_ENCFF793SZW_conservative_idr_thresholded_peaks_GRCh38'], - [4, 'GSE105977_ENCFF937CGY_peaks_GRCh38'], - [5, 'GSE91663_ENCFF316ASR_peaks_GRCh38'], - [6, 'GSE91663_ENCFF319TPR_conservative_idr_thresholded_peaks_GRCh38'], - [7, 'GSE91663_ENCFF553KIK_optimal_idr_thresholded_peaks_GRCh38'], - [8, 'GSM2423312_ENCFF155HVK_peaks_GRCh38'], - [9, 'GSM2423313_ENCFF722AOG_peaks_GRCh38'], - [10, 'GSM2827349_ENCFF196DNQ_peaks_GRCh38'], - [11, 'GSM2827350_ENCFF928JXU_peaks_GRCh38']]-
The unwanted relationships can be removed with BedBaseConf.remove_relationship
method:
bbc.remove_relationship(bedfile_ids=[3], bedset_id=2)
-
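The many-to-many handling shown above can be pictured as a junction table between bedfiles and bedsets. The sketch below uses the standard-library `sqlite3` module instead of PostgreSQL so that it runs anywhere. The table names, column names and the three functions are assumptions made for illustration and do not reflect bbconf's actual database layout. Note that `sqlite3` uses `?` placeholders where the example above uses `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bedfiles (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bedsets (id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per bedfile-bedset membership.
    CREATE TABLE bedset_bedfiles (
        bedfile_id INTEGER REFERENCES bedfiles(id),
        bedset_id INTEGER REFERENCES bedsets(id),
        PRIMARY KEY (bedfile_id, bedset_id)
    );
    """
)
conn.executemany(
    "INSERT INTO bedfiles VALUES (?, ?)",
    [(1, "bed_a"), (2, "bed_b"), (3, "bed_c")],
)
conn.execute("INSERT INTO bedsets VALUES (?, ?)", (2, "bedsetOver1kRegions"))


def report_relationship(bedfile_id, bedset_id):
    # OR IGNORE makes reporting the same pair twice harmless.
    conn.execute(
        "INSERT OR IGNORE INTO bedset_bedfiles VALUES (?, ?)",
        (bedfile_id, bedset_id),
    )


def remove_relationship(bedfile_ids, bedset_id):
    conn.executemany(
        "DELETE FROM bedset_bedfiles WHERE bedfile_id = ? AND bedset_id = ?",
        [(bedfile_id, bedset_id) for bedfile_id in bedfile_ids],
    )


def bedfiles_for_bedset(bedset_name):
    rows = conn.execute(
        "SELECT f.id, f.name FROM bedfiles f "
        "JOIN bedset_bedfiles r ON r.bedfile_id = f.id "
        "JOIN bedsets s ON s.id = r.bedset_id "
        "WHERE s.name = ? ORDER BY f.id",
        (bedset_name,),  # the value is passed separately, never interpolated
    )
    return [list(row) for row in rows]


report_relationship(bedfile_id=1, bedset_id=2)
report_relationship(bedfile_id=3, bedset_id=2)
print(bedfiles_for_bedset("bedsetOver1kRegions"))

remove_relationship(bedfile_ids=[3], bedset_id=2)
print(bedfiles_for_bedset("bedsetOver1kRegions"))
```

Keeping the condition value out of the SQL string, as the `condition` and `condition_val` arguments above do, is what protects such queries from injection.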