Merge pull request #34 from j-andrews7/v0.4.0

V0.4.0
j-andrews7 · Sep 30, 2024 · 3fcd145
2 parents 6b69fa2 + 811fddb
commit 3fcd145

Showing 12 changed files with 82 additions and 129 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,18 @@
 # Changelog
 
+## v0.4.0
+
+**Release date: 09/30/2024**
+
+ - Restructured app to use subcommands:
+   - `strprofiler` is now `strprofiler compare`
+   - `strprofiler-app` is now `strprofiler app`
+   - `clastr` is now `strprofiler clastr`
+   - Parameters remain the same for each.
+ - Tooltips added to inputs in Shiny application.
+ - More graceful handling of edge cases that returned unhelpful feedback.
+ - Better display of batch results in the app, which no longer require a download for viewing.
+
 ## v0.3.1
 
 **Release date: 07/29/2024**
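
The renames listed under v0.4.0 amount to a fixed mapping from the old entry points to the new subcommands, with parameters passed through unchanged. As a rough illustration only, the sketch below rewrites a pre-0.4.0 argument list into the new form. `migrate_argv`, `SUBCOMMANDS`, and `LEGACY_TO_SUBCOMMAND` are hypothetical names and not part of STRprofiler; the mapping follows the README examples in this PR, where `clastr` becomes `strprofiler clastr`.

```python
# Hypothetical migration helper; not part of the STRprofiler package.
SUBCOMMANDS = {"compare", "app", "clastr"}

# Old console script name -> new subcommand (per the v0.4.0 changelog/README).
LEGACY_TO_SUBCOMMAND = {
    "strprofiler": "compare",
    "strprofiler-app": "app",
    "clastr": "clastr",
}


def migrate_argv(argv):
    """Return argv rewritten to the v0.4.0 `strprofiler <subcommand>` form."""
    if not argv:
        return []
    head, *rest = argv
    # Already in the new form: leave untouched.
    if head == "strprofiler" and rest and rest[0] in SUBCOMMANDS:
        return list(argv)
    subcommand = LEGACY_TO_SUBCOMMAND.get(head)
    if subcommand is None:
        return list(argv)  # Unknown program name: nothing to migrate.
    # Parameters are unchanged in v0.4.0, so they are passed through as-is.
    return ["strprofiler", subcommand, *rest]


print(migrate_argv(["clastr", "-o", "./strprofiler_output", "STR1.xlsx"]))
```

A helper like this could be useful when updating wrapper scripts or pipeline definitions that still call the old entry points.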

diff --git a/README.md b/README.md
@@ -40,7 +40,7 @@ pip install strprofiler
 
 **STRprofiler** can be run directly from the command line. 
 
-`strprofiler -sm "SampleMap_exp.csv" -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`
+`strprofiler compare -sm "SampleMap_exp.csv" -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`
 
 Full usage information can be found by running `strprofiler --help`.
 
@@ -86,7 +86,7 @@ Full usage information can be found by running `strprofiler --help`.
 
 Additionally, the [Cellosaurus](https://www.cellosaurus.org/description.html) (Bairoch, 2018) cell line database can be queried via the [CLASTR](https://www.cellosaurus.org/str-search/) (Robin, Capes-Davis, and Bairoch, 2019) [REST API](https://www.cellosaurus.org/str-search/help.html#5).  
 
-`clastr -sm "SampleMap_exp.csv" -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`
+`strprofiler clastr -sm "SampleMap_exp.csv" -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`
 
 Full usage information can be found by running `strprofiler clastr --help`.
 
@@ -204,13 +204,14 @@ In addition to the marker columns, this output contains the following columns:
 
 **clastr**
 
-Output for `clastr` is provided in XLSX format. Results follow the CLASTR format, documented here: https://www.cellosaurus.org/str-search/help.html#4
+Output for `strprofiler clastr` is provided in XLSX format. 
+Results follow the CLASTR format, documented here: https://www.cellosaurus.org/str-search/help.html#4
 
 ## Database Comparison
 
 **STRprofiler** can be also used to compare batches of samples against a larger database of samples. 
 
-`strprofiler -db ExampleSTR_database.csv -o ./strprofiler_output STR1.xlsx`
+`strprofiler compare -db ExampleSTR_database.csv -o ./strprofiler_output STR1.xlsx`
 
 In this mode, inputs are compared against the database samples only, and not among themselves. Outputs will be as described above for sample input(s).
 
@@ -227,7 +228,7 @@ Optionally, one may provide two metadata columns - "Center" and "Passage", which
 
 ## The STRprofiler App
 
-New in v0.2.0 is `strprofiler-app`, a command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.
+New in v0.2.0 is `strprofiler app`, a command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.
 
 This application can provide a convenient portal to a group's STR database and can be hosted on standard Shiny servers, Posit Connect instances, or ShinyApps.io. 
 
@@ -252,7 +253,7 @@ The database file uses same format as for the standard `strprofiler` command.
 Then create a requirements.txt file in the same directory with `strprofiler` listed:
 
 ```
-strprofiler>=0.3.0
+strprofiler>=0.4.0
 ```
 
 This app can then be deployed to any of the above endpoints as [one would with any other Shiny app](https://shiny.posit.co/py/docs/deploy.html), e.g.:
@@ -278,7 +279,7 @@ It is released with zero warranty for any purpose and the authors retain no liab
 
 If you use **STRprofiler** in your research, please cite the DOI:
 
-Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.3.0 (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
+Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.4.0 (v0.4.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
 
 If you use the `clastr` command or functionality from the Shiny application, please cite the Cellosaurus and CLASTR publications:
 

diff --git a/docs/conf.py b/docs/conf.py
@@ -13,13 +13,14 @@
 project = 'strprofiler'
 copyright = '2024, Jared Andrews'
 author = 'Jared Andrews'
-release = '0.2.0'
+release = '0.4.0'
 
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
 
 extensions = ['sphinx.ext.autodoc',
-              'myst_parser']
+              'myst_parser',
+              'sphinx_click']
 
 templates_path = ['_templates']
 exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

diff --git a/docs/index.rst b/docs/index.rst
@@ -44,29 +44,31 @@ Amelogenin is not included in the score computation by default but can be includ
 Installation
 ============
 
-**STRprofiler** is available on PyPI and can be installed with ``pip``::
+**STRprofiler** is available on PyPi and can be installed with ``pip``::
    
       pip install strprofiler
 
 Usage
 =====
 
-.. autofunction:: strprofiler.strprofiler.strprofiler
-
+.. click:: strprofiler.cli:cli
+  :prog: strprofiler
+  :nested: full
 
 Querying CLASTR
 ===============
 
 **STRprofiler** can also be used to directly query CLASTR via their API. 
-This can be done from within the Shiny application or from the command line via the ``clastr`` command or using the ``clastr_query`` function directly:
-
-.. autofunction:: strprofiler.clastr.clastr_query
+This can be done from within the Shiny application or from the command line via the ``strprofiler clastr`` subcommand or using the ``clastr_query`` function directly.
 
 Input Files(s)
 ~~~~~~~~~~~~~~
 
+An example `input file <https://raw.githubusercontent.com/j-andrews7/STRprofiler/refs/heads/main/tests/ExampleSTR_long.csv>`__ and `reference database <https://raw.githubusercontent.com/j-andrews7/STRprofiler/refs/heads/main/tests/ExampleSTR_database.csv>`__ are available on GitHub.
+
 **STRprofiler** can take either a single STR file or multiple STR files as input. 
-These files can be csv, tsv, tab-separated text, or xlsx (first sheet used) files. The STR file(s) should be in either 'wide' or 'long' format. The long format expects all columns to map to the markers except for the designated sample name column with each row reflecting a different profile, e.g.:
+These files can be csv, tsv, tab-separated text, or xlsx (first sheet used) files. The STR file(s) should be in either 'wide' or 'long' format. 
+The long format expects all columns to map to the markers except for the designated sample name column with each row reflecting a different profile, e.g.:
 
 +--------+---------+---------+---------+--------+---------+--------+
 | Sample | D1S1656 |  DYS391 | D3S1358 | D2S441 | D16S539 | D5S818 | 
@@ -148,12 +150,18 @@ The wide format expects a line for each marker for each sample, e.g.:
 | Sample2      |  FGA      | 21          | 294.67  | 11941       |             |         |             |             |
 +--------------+-----------+-------------+---------+-------------+-------------+---------+-------------+-------------+
 
-In this format, the ``marker_col`` must be specified. Only columns beginning with "Allele" will be used to parse the alleles for each sample/marker. Any other size or height columns will be ignored.
+In this format, the ``marker_col`` must be specified. 
+Only columns beginning with "Allele" will be used to parse the alleles for each sample/marker. 
+Any other size or height columns will be ignored.
+
 
 Output Files
 ~~~~~~~~~~~~
 
-**STRprofiler** generates two types of output files. The first is a summary file, which contains the top hits for each sample above the specified scoring thresholds. This file provides a useful overview in addition to a flag to identify samples with potential mixing for closer inspection. In the output directory, this file will be named `full_summary.strprofiler.YYYYMMDD.HH_MM_SS.csv` where the date and time are the time the program was run.
+**STRprofiler** generates two types of output files.
+The first is a summary file, which contains the top hits for each sample above the specified scoring thresholds.
+This file provides a useful overview in addition to a flag to identify samples with potential mixing for closer inspection.
+In the output directory, this file will be named `full_summary.strprofiler.YYYYMMDD.HH_MM_SS.csv` where the date and time are the time the program was run.
 
 In addition to the marker columns, the summary file contains the following columns:
 
@@ -173,7 +181,9 @@ In addition to the marker columns, the summary file contains the following colum
 | **masters_ref_matches**   | Name and Masters (vs. reference) score of matches above scoring threshold. |
 +---------------------------+----------------------------------------------------------------------------+
 
-The second is a sample-specific comparison file, which contains the results of the comparison between the query sample and all other provided samples. These files are generated for each STR profile provided in the input file(s) and named after the query sample in question. For example, if the input file contains a sample named `Sample1`, the output file will be named `Sample1.strprofiler.YYYYMMDD.HH_MM_SS.csv`.
+The second is a sample-specific comparison file, which contains the results of the comparison between the query sample and all other provided samples. 
+These files are generated for each STR profile provided in the input file(s) and named after the query sample in question. 
+For example, if the input file contains a sample named `Sample1`, the output file will be named `Sample1.strprofiler.YYYYMMDD.HH_MM_SS.csv`.
 
 In addition to the marker columns, this output contains the following columns:
 
@@ -206,9 +216,10 @@ Database Comparison
 
 .. code:: bash
 
-   strprofiler -db ExampleSTR_database.csv -o ./strprofiler_output STR1.xlsx
+   strprofiler compare -db ExampleSTR_database.csv -o ./strprofiler_output STR1.xlsx
 
-In this mode, inputs are compared against the database samples only, and not among themselves. Outputs will be as described above for sample input(s).
+In this mode, inputs are compared against the database samples only, and not among themselves. 
+Outputs will be as described above for sample input(s).
 
 Database Format
 ^^^^^^^^^^^^^^^
@@ -228,7 +239,7 @@ Optionally, one may provide two metadata columns - "Center" and "Passage", which
 The ``STRprofiler`` App
 =======================
 
-New in v0.2.0 is ``strprofiler-app``, a CLI command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.
+New in v0.2.0 is ``strprofiler app``, a subcommand that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.
 
 This application can provide a convenient portal to a group's STR database and can be hosted on standard Shiny servers, Posit Connect instances, or ShinyApps.io. 
 
@@ -254,7 +265,7 @@ The database file should be a csv file with the same format as described above.
 Then create a requirements.txt file in the same directory with `strprofiler` listed:
 
 .. code:: bash
-   strprofiler>=0.3.0
+   strprofiler>=0.4.0
 
 This app can then be deployed to any of the above endpoints as `one would with any other Shiny app <https://shiny.posit.co/py/docs/deploy.html>`__.
 
@@ -286,7 +297,7 @@ Reference
 =========
 
 If you use **STRprofiler** in your research, please cite the following:
-Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.3.0 (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
+Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.4.0 (v0.4.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
 
 Indices and tables
 ==================

diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -8,4 +8,5 @@ faicons
 requests
 flatten-json
 json
-requests
+requests
+sphinx-click
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "strprofiler"
-version = "0.3.1"
+version = "0.4.0"
 description = "A python package, CLI tool, and Shiny application to compare short tandem repeat (STR) profiles."
 authors = ["Jared Andrews <jared.andrews07@gmail.com>",
            "Mike Lloyd <mike.lloyd@jax.org>"]
@@ -24,9 +24,7 @@ flatten-json = "^0.1.14"
 [tool.poetry.dev-dependencies]
 
 [tool.poetry.scripts]
-strprofiler = 'strprofiler.strprofiler:strprofiler'
-clastr = 'strprofiler.clastr:clastr_query'
-strprofiler-app = 'strprofiler.strprofiler:local_shiny_app'
+strprofiler = 'strprofiler.cli:cli'
 
 [build-system]
 requires = ["poetry-core>=1.0.0"]

diff --git a/strprofiler/clastr.py b/strprofiler/clastr.py
@@ -8,7 +8,7 @@
 import strprofiler.utils as utils
 
 
-@click.command()
+@click.command(name="clastr")
 @click.option(
     "-sa",
     "--search_algorithm",
@@ -116,53 +116,7 @@ def clastr_query(
     penta_fix=True,
     score_amel=False,
 ):
-    """clastr_query compares STR profiles to the human Cellosaurus knowledge base using the CLASTR REST API.
-
-    :param input_files: List of input STR files in csv, xlsx, tsv, or txt format.
-    :type input_files: click.Path
-
-    :param sample_map: Path to sample map in csv format for renaming.
-        First column should be sample names as given in STR file(s),
-        second should be new names to assign. No header. Defaults to None
-    :type sample_map: str, optional
-
-    :param output_dir: Path to output directory, defaults to "./STRprofiler"
-    :type output_dir: str, optional
-
-    :param search_algorithm: Search algorithm to use in the Clastr query, Options: 1 - Tanabe, 2 - Masters (vs. query); 3 - Masters (vs. reference)
-        Defaults to 1 (tanabe).
-    :type search_algorithm: int
-
-    :param scoring_mode: Search mode to account for missing alleles in query or reference.
-        Options: 1 - Non-empty markers, 2 - Query markers, 3 - Reference markers.
-        Defaults to 1 ( Non-empty markers).
-    :type search_algorithm: int
-
-    :param score_filter: Minimum score to report as potential matches in summary table, defaults to 80
-    :type score_filter: int, optional
-
-    :param max_results: Filter defining the maximum number of results to be returned.
-        Note that in the case of conflicted cell lines, the Best and Worst versions are processed as pairs and only the best
-        score is affected by the threshold. Consequently, some Worst cases with a score below the threshold can still be present in the results.
-        Defaults to 200
-    :type mix_threshold: int, optional
-
-    :param min_markers: Filter defining the minimum number of markers for matches to be reported, defaults to 8.
-    :type mix_threshold: int, optional
-
-    :param sample_col: Name of sample column in STR file(s), defaults to "Sample Name"
-    :type sample_col: str, optional
-
-    :param marker_col: Name of marker column in STR file(s).
-        Only used if format is 'wide', defaults to "Marker"
-    :type marker_col: str, optional
-
-    :param penta_fix: Whether to try to harmonize PentaE/D allele spelling, defaults to True
-    :type penta_fix: bool, optional
-
-    :param score_amel: Use Amelogenin for similarity scoring, defaults to False
-    :type score_amel: bool, optional
-    """
+    """clastr compares STR profiles to the human Cellosaurus knowledge base via the CLASTR REST API."""
 
     # Make output directory and open file for logging.
     Path(output_dir).mkdir(parents=True, exist_ok=True)
@@ -172,7 +126,7 @@ def clastr_query(
 
     print("Search algorithm: " + str(search_algorithm), file=log_file)
     print("Scoring mode: " + str(scoring_mode), file=log_file)
-    print("Score filter: " + str(marker_col), file=log_file)
+    print("Score filter: " + str(score_filter), file=log_file)
     print("Max results: " + str(max_results), file=log_file)
     print("Min markers: " + str(min_markers), file=log_file)
     print("Sample map: " + str(sample_map), file=log_file)

diff --git a/strprofiler/cli.py b/strprofiler/cli.py
@@ -0,0 +1,12 @@
+import rich_click as click
+from strprofiler.strprofiler import strprofiler, app
+from strprofiler.clastr import clastr_query
+
+@click.group()
+@click.version_option()
+def cli():
+    pass
+
+cli.add_command(strprofiler)
+cli.add_command(app)
+cli.add_command(clastr_query)
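
For readers less familiar with click groups, the pattern in `cli.py` can be tried in isolation. The following is a minimal sketch, not the project's code: it uses plain `click` rather than `rich_click`, the command bodies and the single `-o` option are stand-ins, and `name="compare"` on the comparison command is an assumption inferred from the changelog (that part of `strprofiler.py` is not shown here). The version string is passed explicitly only so the sketch runs without the package installed.

```python
import click
from click.testing import CliRunner


@click.group()
@click.version_option(version="0.4.0", prog_name="strprofiler")
def cli():
    """Top-level entry point; subcommands are attached below."""


# name= sets the subcommand name independently of the function name,
# as with @click.command(name="clastr") on clastr_query in this PR.
@click.command(name="clastr")
@click.option("-o", "--output_dir", default="./STRprofiler", show_default=True)
def clastr_query(output_dir):
    """Stand-in for the CLASTR query command."""
    click.echo(f"clastr -> {output_dir}")


# Assumed name; the real decorator is outside the hunks shown above.
@click.command(name="compare")
@click.option("-o", "--output_dir", default="./STRprofiler", show_default=True)
def strprofiler(output_dir):
    """Stand-in for the profile comparison command."""
    click.echo(f"compare -> {output_dir}")


cli.add_command(clastr_query)
cli.add_command(strprofiler)

# Invoke the group in-process, as `strprofiler clastr -o out` would from a shell.
runner = CliRunner()
result = runner.invoke(cli, ["clastr", "-o", "out"])
print(result.output, end="")
```

With a single group registered as the console script, `strprofiler --help` lists every subcommand, and each subcommand keeps its own `--help`.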
diff --git a/strprofiler/shiny_app/shiny_app.py b/strprofiler/shiny_app/shiny_app.py
@@ -196,7 +196,7 @@ def create_app(db=None):
                                         value=80,
                                         width="100%",
                                     ),
-                                    "Score threshold used to filter results"
+                                    "Score threshold that must be met for result to be displayed"
                                 ),
                             ),
                             position="right",
@@ -245,7 +245,7 @@ def create_app(db=None):
                                             class_="btn-success",
                                             width="45%",
                                         ),
-                                        "Query STRprofiler Database",
+                                        "Submit query",
                                         id="tt_selected_search",
                                         placement="left",
                                     ),
@@ -319,7 +319,7 @@ def create_app(db=None):
                                                         value=80,
                                                         width="100%",
                                                     ),
-                                                    "Masters (vs. query) score threshold used to filter results"
+                                                    "Masters (vs. query) score that must be met for result to be displayed"
                                                 ),
                                             ),
                                             ui.column(
@@ -331,7 +331,7 @@ def create_app(db=None):
                                                         value=80,
                                                         width="100%",
                                                     ),
-                                                    "Tanabe score threshold used to filter results"
+                                                    "Tanabe score that must be met for result to be displayed"
                                                 ),
                                                 ui.tooltip(
                                                     ui.input_numeric(
@@ -340,7 +340,7 @@ def create_app(db=None):
                                                         value=80,
                                                         width="100%",
                                                     ),
-                                                    "Masters (vs. reference) score threshold used to filter results"
+                                                    "Masters (vs. reference) score that must be met for result to be displayed"
                                                 ),
                                             )
                                         )
@@ -373,7 +373,7 @@ def create_app(db=None):
                                                         value=80,
                                                         width="100%"
                                                     ),
-                                                    "Score threshold used to filter results"
+                                                    "Score threshold that must be met for result to be displayed"
                                                 )
                                             )
                                         )
@@ -444,7 +444,7 @@ def create_app(db=None):
                 ),
             ),
             ui.nav_panel(
-                "About",
+                "Usage Guide",
                 ui.panel_main(
                     ui.tags.iframe(
                         src="help.html",