Merge pull request #34 from j-andrews7/v0.4.0
V0.4.0
j-andrews7 authored Sep 30, 2024
2 parents 6b69fa2 + 811fddb commit 3fcd145
Showing 12 changed files with 82 additions and 129 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,18 @@
# Changelog

## v0.4.0

**Release date: 09/30/2024**

- Restructured app to use subcommands:
  - `strprofiler` is now `strprofiler compare`
  - `strprofiler-app` is now `strprofiler app`
  - `clastr` is now `strprofiler clastr`
  - Parameters remain the same for each.
- Tooltips added to inputs in Shiny application.
- More graceful handling of edge cases that returned unhelpful feedback.
- Better display of batch results in the app, which no longer require a download for viewing.
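
For scripts or pipelines that still call the pre-0.4.0 entry points, the renames listed above amount to swapping the executable name for `strprofiler <subcommand>` and leaving the remaining arguments untouched. A minimal sketch of that rewrite follows. The `migrate_argv` helper and the `OLD_TO_NEW` table are hypothetical names used for illustration and are not part of the package:

```python
# Old standalone entry points mapped to their v0.4.0 subcommand form.
# Subcommand names follow the README usage examples for this release.
OLD_TO_NEW = {
    "strprofiler": ["strprofiler", "compare"],
    "strprofiler-app": ["strprofiler", "app"],
    "clastr": ["strprofiler", "clastr"],
}

SUBCOMMANDS = {"compare", "app", "clastr"}


def migrate_argv(argv):
    """Rewrite a pre-0.4.0 command line (a list of tokens) to the subcommand form.

    Heuristic only: a call that already names a subcommand is returned unchanged,
    and top-level flags such as --help are not treated specially.
    """
    if not argv or argv[0] not in OLD_TO_NEW:
        return list(argv)
    if argv[0] == "strprofiler" and len(argv) > 1 and argv[1] in SUBCOMMANDS:
        return list(argv)  # already in the new form
    return OLD_TO_NEW[argv[0]] + list(argv[1:])


print(migrate_argv(["clastr", "-o", "./strprofiler_output", "STR1.xlsx"]))
```

Because parameters are unchanged, only the first token needs rewriting; everything after it is passed through as-is.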

## v0.3.1

**Release date: 07/29/2024**
15 changes: 8 additions & 7 deletions README.md
@@ -40,7 +40,7 @@ pip install strprofiler

**STRprofiler** can be run directly from the command line.

`strprofiler -sm "SampleMap_exp.csv" -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`
`strprofiler compare -sm "SampleMap_exp.csv" -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`

Full usage information can be found by running `strprofiler --help`.

@@ -86,7 +86,7 @@ Full usage information can be found by running `strprofiler --help`.
Additionally, the [Cellosaurus](https://www.cellosaurus.org/description.html) (Bairoch, 2018) cell line database can be queried via the [CLASTR](https://www.cellosaurus.org/str-search/) (Robin, Capes-Davis, and Bairoch, 2019) [REST API](https://www.cellosaurus.org/str-search/help.html#5).
`clastr -sm "SampleMap_exp.csv" -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`
`strprofiler clastr -sm "SampleMap_exp.csv" -scol "Sample Name" -o ./strprofiler_output STR1.xlsx STR2.csv STR3.txt`
Full usage information can be found by running `strprofiler clastr --help`.
@@ -204,13 +204,14 @@ In addition to the marker columns, this output contains the following columns:
**clastr**
Output for `clastr` is provided in XLSX format. Results follow the CLASTR format, documented here: https://www.cellosaurus.org/str-search/help.html#4
Output for `strprofiler clastr` is provided in XLSX format.
Results follow the CLASTR format, documented here: https://www.cellosaurus.org/str-search/help.html#4
## Database Comparison
**STRprofiler** can also be used to compare batches of samples against a larger database of samples.
`strprofiler -db ExampleSTR_database.csv -o ./strprofiler_output STR1.xlsx`
`strprofiler compare -db ExampleSTR_database.csv -o ./strprofiler_output STR1.xlsx`
In this mode, inputs are compared against the database samples only, and not among themselves. Outputs will be as described above for sample input(s).
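
The database-only comparison described above can be sketched as a loop that scores each input profile against every database profile and never against the other inputs. The sketch uses the Tanabe score, 2 × shared alleles / (query alleles + reference alleles), expressed as a percentage. It is an illustration under stated assumptions and not the package's implementation: the function names are hypothetical, alleles are pooled over all markers present in either profile, and the default threshold of 80 mirrors the score filter default.

```python
def allele_pairs(profile):
    """Flatten {marker: [alleles]} into a set of (marker, allele) pairs."""
    return {(marker, allele) for marker, alleles in profile.items() for allele in alleles}


def tanabe(query, reference):
    """Tanabe score as a percentage; 0.0 when both profiles are empty."""
    q, r = allele_pairs(query), allele_pairs(reference)
    total = len(q) + len(r)
    return 200.0 * len(q & r) / total if total else 0.0


def compare_to_database(inputs, database, threshold=80.0):
    """Score every input against every database entry (inputs are never compared to each other)."""
    hits = {}
    for name, profile in inputs.items():
        scored = ((ref, tanabe(profile, ref_profile)) for ref, ref_profile in database.items())
        kept = [hit for hit in scored if hit[1] >= threshold]
        hits[name] = sorted(kept, key=lambda hit: hit[1], reverse=True)
    return hits


# Toy profiles; marker names are borrowed from the example tables.
database = {
    "LineA": {"D3S1358": ["15", "16"], "D5S818": ["11", "12"]},
    "LineB": {"D3S1358": ["15", "17"], "D5S818": ["11", "12"]},
}
inputs = {"Sample1": {"D3S1358": ["15", "16"], "D5S818": ["11", "12"]}}

print(compare_to_database(inputs, database))
```

With these toy profiles, `Sample1` shares all four alleles with `LineA` and three with `LineB`, so only `LineA` clears the threshold of 80.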
@@ -227,7 +228,7 @@ Optionally, one may provide two metadata columns - "Center" and "Passage", which
## The STRprofiler App
New in v0.2.0 is `strprofiler-app`, a command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.
New in v0.2.0 is `strprofiler app`, a command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.
This application can provide a convenient portal to a group's STR database and can be hosted on standard Shiny servers, Posit Connect instances, or ShinyApps.io.
@@ -252,7 +253,7 @@ The database file uses same format as for the standard `strprofiler` command.
Then create a requirements.txt file in the same directory with `strprofiler` listed:
```
strprofiler>=0.3.0
strprofiler>=0.4.0
```
This app can then be deployed to any of the above endpoints as [one would with any other Shiny app](https://shiny.posit.co/py/docs/deploy.html), e.g.:
@@ -278,7 +279,7 @@ It is released with zero warranty for any purpose and the authors retain no liab
If you use **STRprofiler** in your research, please cite the DOI:
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.3.0 (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.4.0 (v0.4.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
If you use the `clastr` command or functionality from the Shiny application, please cite the Cellosaurus and CLASTR publications:
5 changes: 3 additions & 2 deletions docs/conf.py
@@ -13,13 +13,14 @@
project = 'strprofiler'
copyright = '2024, Jared Andrews'
author = 'Jared Andrews'
release = '0.2.0'
release = '0.4.0'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ['sphinx.ext.autodoc',
'myst_parser']
'myst_parser',
'sphinx_click']

templates_path = ['_templates']
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
41 changes: 26 additions & 15 deletions docs/index.rst
@@ -44,29 +44,31 @@ Amelogenin is not included in the score computation by default but can be includ
Installation
============

**STRprofiler** is available on PyPI and can be installed with ``pip``::
**STRprofiler** is available on PyPi and can be installed with ``pip``::
    pip install strprofiler

Usage
=====

.. autofunction:: strprofiler.strprofiler.strprofiler

.. click:: strprofiler.cli:cli
   :prog: strprofiler
   :nested: full

Querying CLASTR
===============

**STRprofiler** can also be used to directly query CLASTR via their API.
This can be done from within the Shiny application or from the command line via the ``clastr`` command or using the ``clastr_query`` function directly:

.. autofunction:: strprofiler.clastr.clastr_query
This can be done from within the Shiny application or from the command line via the ``strprofiler clastr`` subcommand or using the ``clastr_query`` function directly.

Input File(s)
~~~~~~~~~~~~~~

An example `input file <https://raw.githubusercontent.com/j-andrews7/STRprofiler/refs/heads/main/tests/ExampleSTR_long.csv>`__ and `reference database <https://raw.githubusercontent.com/j-andrews7/STRprofiler/refs/heads/main/tests/ExampleSTR_database.csv>`__ are available on GitHub.

**STRprofiler** can take either a single STR file or multiple STR files as input.
These files can be csv, tsv, tab-separated text, or xlsx (first sheet used) files. The STR file(s) should be in either 'wide' or 'long' format. The long format expects all columns to map to the markers except for the designated sample name column with each row reflecting a different profile, e.g.:
These files can be csv, tsv, tab-separated text, or xlsx (first sheet used) files. The STR file(s) should be in either 'wide' or 'long' format.
The long format expects all columns to map to the markers except for the designated sample name column with each row reflecting a different profile, e.g.:

+--------+---------+---------+---------+--------+---------+--------+
| Sample | D1S1656 | DYS391 | D3S1358 | D2S441 | D16S539 | D5S818 |
@@ -148,12 +150,18 @@ The wide format expects a line for each marker for each sample, e.g.:
| Sample2 | FGA | 21 | 294.67 | 11941 | | | | |
+--------------+-----------+-------------+---------+-------------+-------------+---------+-------------+-------------+

In this format, the ``marker_col`` must be specified. Only columns beginning with "Allele" will be used to parse the alleles for each sample/marker. Any other size or height columns will be ignored.
In this format, the ``marker_col`` must be specified.
Only columns beginning with "Allele" will be used to parse the alleles for each sample/marker.
Any other size or height columns will be ignored.


Output Files
~~~~~~~~~~~~

**STRprofiler** generates two types of output files. The first is a summary file, which contains the top hits for each sample above the specified scoring thresholds. This file provides a useful overview in addition to a flag to identify samples with potential mixing for closer inspection. In the output directory, this file will be named `full_summary.strprofiler.YYYYMMDD.HH_MM_SS.csv` where the date and time are the time the program was run.
**STRprofiler** generates two types of output files.
The first is a summary file, which contains the top hits for each sample above the specified scoring thresholds.
This file provides a useful overview in addition to a flag to identify samples with potential mixing for closer inspection.
In the output directory, this file will be named `full_summary.strprofiler.YYYYMMDD.HH_MM_SS.csv` where the date and time are the time the program was run.
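
The naming pattern above can be reproduced with `datetime.strftime`. This is a small sketch for anyone who needs to locate or generate matching names in their own scripts; the `output_name` helper is a hypothetical name and is not taken from the package:

```python
from datetime import datetime


def output_name(prefix, when=None):
    """Build '<prefix>.strprofiler.YYYYMMDD.HH_MM_SS.csv' for the given run time."""
    when = when or datetime.now()
    return f"{prefix}.strprofiler.{when.strftime('%Y%m%d.%H_%M_%S')}.csv"


# A fixed timestamp keeps the example reproducible.
print(output_name("full_summary", datetime(2024, 9, 30, 14, 5, 9)))
# full_summary.strprofiler.20240930.14_05_09.csv
```

Passing a sample name such as `Sample1` as the prefix yields the corresponding per-sample file name pattern.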

In addition to the marker columns, the summary file contains the following columns:

@@ -173,7 +181,9 @@ In addition to the marker columns, the summary file contains the following colum
| **masters_ref_matches** | Name and Masters (vs. reference) score of matches above scoring threshold. |
+---------------------------+----------------------------------------------------------------------------+

The second is a sample-specific comparison file, which contains the results of the comparison between the query sample and all other provided samples. These files are generated for each STR profile provided in the input file(s) and named after the query sample in question. For example, if the input file contains a sample named `Sample1`, the output file will be named `Sample1.strprofiler.YYYYMMDD.HH_MM_SS.csv`.
The second is a sample-specific comparison file, which contains the results of the comparison between the query sample and all other provided samples.
These files are generated for each STR profile provided in the input file(s) and named after the query sample in question.
For example, if the input file contains a sample named `Sample1`, the output file will be named `Sample1.strprofiler.YYYYMMDD.HH_MM_SS.csv`.

In addition to the marker columns, this output contains the following columns:

@@ -206,9 +216,10 @@ Database Comparison

.. code:: bash
strprofiler -db ExampleSTR_database.csv -o ./strprofiler_output STR1.xlsx
strprofiler compare -db ExampleSTR_database.csv -o ./strprofiler_output STR1.xlsx
In this mode, inputs are compared against the database samples only, and not among themselves. Outputs will be as described above for sample input(s).
In this mode, inputs are compared against the database samples only, and not among themselves.
Outputs will be as described above for sample input(s).

Database Format
^^^^^^^^^^^^^^^
@@ -228,7 +239,7 @@ Optionally, one may provide two metadata columns - "Center" and "Passage", which
The ``STRprofiler`` App
=======================

New in v0.2.0 is ``strprofiler-app``, a CLI command that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.
New in v0.2.0 is ``strprofiler app``, a subcommand that launches a Shiny application that allows for user queries against an uploaded or pre-defined database (provided with the `-db` parameter) of STR profiles.

This application can provide a convenient portal to a group's STR database and can be hosted on standard Shiny servers, Posit Connect instances, or ShinyApps.io.

@@ -254,7 +265,7 @@ The database file should be a csv file with the same format as described above.
Then create a requirements.txt file in the same directory with `strprofiler` listed:

.. code:: bash
strprofiler>=0.3.0
strprofiler>=0.4.0
This app can then be deployed to any of the above endpoints as `one would with any other Shiny app <https://shiny.posit.co/py/docs/deploy.html>`__.

@@ -286,7 +297,7 @@ Reference
=========

If you use **STRprofiler** in your research, please cite the following:
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.3.0 (v0.3.0). Zenodo. https://doi.org/10.5281/zenodo.7348386
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.4.0 (v0.4.0). Zenodo. https://doi.org/10.5281/zenodo.7348386

Indices and tables
==================
3 changes: 2 additions & 1 deletion docs/requirements.txt
@@ -8,4 +8,5 @@ faicons
requests
flatten-json
json
requests
requests
sphinx-click
6 changes: 2 additions & 4 deletions pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "strprofiler"
version = "0.3.1"
version = "0.4.0"
description = "A python package, CLI tool, and Shiny application to compare short tandem repeat (STR) profiles."
authors = ["Jared Andrews <jared.andrews07@gmail.com>",
"Mike Lloyd <mike.lloyd@jax.org>"]
@@ -24,9 +24,7 @@ flatten-json = "^0.1.14"
[tool.poetry.dev-dependencies]

[tool.poetry.scripts]
strprofiler = 'strprofiler.strprofiler:strprofiler'
clastr = 'strprofiler.clastr:clastr_query'
strprofiler-app = 'strprofiler.strprofiler:local_shiny_app'
strprofiler = 'strprofiler.cli:cli'

[build-system]
requires = ["poetry-core>=1.0.0"]
52 changes: 3 additions & 49 deletions strprofiler/clastr.py
@@ -8,7 +8,7 @@
import strprofiler.utils as utils


@click.command()
@click.command(name="clastr")
@click.option(
"-sa",
"--search_algorithm",
@@ -116,53 +116,7 @@ def clastr_query(
penta_fix=True,
score_amel=False,
):
"""clastr_query compares STR profiles to the human Cellosaurus knowledge base using the CLASTR REST API.
:param input_files: List of input STR files in csv, xlsx, tsv, or txt format.
:type input_files: click.Path
:param sample_map: Path to sample map in csv format for renaming.
First column should be sample names as given in STR file(s),
second should be new names to assign. No header. Defaults to None
:type sample_map: str, optional
:param output_dir: Path to output directory, defaults to "./STRprofiler"
:type output_dir: str, optional
:param search_algorithm: Search algorithm to use in the Clastr query, Options: 1 - Tanabe, 2 - Masters (vs. query); 3 - Masters (vs. reference)
Defaults to 1 (tanabe).
:type search_algorithm: int
:param scoring_mode: Search mode to account for missing alleles in query or reference.
Options: 1 - Non-empty markers, 2 - Query markers, 3 - Reference markers.
Defaults to 1 ( Non-empty markers).
:type search_algorithm: int
:param score_filter: Minimum score to report as potential matches in summary table, defaults to 80
:type score_filter: int, optional
:param max_results: Filter defining the maximum number of results to be returned.
Note that in the case of conflicted cell lines, the Best and Worst versions are processed as pairs and only the best
score is affected by the threshold. Consequently, some Worst cases with a score below the threshold can still be present in the results.
Defaults to 200
:type mix_threshold: int, optional
:param min_markers: Filter defining the minimum number of markers for matches to be reported, defaults to 8.
:type mix_threshold: int, optional
:param sample_col: Name of sample column in STR file(s), defaults to "Sample Name"
:type sample_col: str, optional
:param marker_col: Name of marker column in STR file(s).
Only used if format is 'wide', defaults to "Marker"
:type marker_col: str, optional
:param penta_fix: Whether to try to harmonize PentaE/D allele spelling, defaults to True
:type penta_fix: bool, optional
:param score_amel: Use Amelogenin for similarity scoring, defaults to False
:type score_amel: bool, optional
"""
"""clastr compares STR profiles to the human Cellosaurus knowledge base via the CLASTR REST API."""

# Make output directory and open file for logging.
Path(output_dir).mkdir(parents=True, exist_ok=True)
@@ -172,7 +126,7 @@ def clastr_query(

print("Search algorithm: " + str(search_algorithm), file=log_file)
print("Scoring mode: " + str(scoring_mode), file=log_file)
print("Score filter: " + str(marker_col), file=log_file)
print("Score filter: " + str(score_filter), file=log_file)
print("Max results: " + str(max_results), file=log_file)
print("Min markers: " + str(min_markers), file=log_file)
print("Sample map: " + str(sample_map), file=log_file)
12 changes: 12 additions & 0 deletions strprofiler/cli.py
@@ -0,0 +1,12 @@
import rich_click as click
from strprofiler.strprofiler import strprofiler, app
from strprofiler.clastr import clastr_query

@click.group()
@click.version_option()
def cli():
    pass

cli.add_command(strprofiler)
cli.add_command(app)
cli.add_command(clastr_query)
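
The registration pattern in `cli.py` relies on each command carrying its own `name`, which is why `clastr_query` is exposed as `strprofiler clastr`. The sketch below is a self-contained approximation, not the package's code: it uses plain `click` rather than `rich_click`, and the command body is a toy stand-in. Only the `-sa`/`--search_algorithm` option is reproduced, with the default of 1 given in the removed docstring.

```python
import click
from click.testing import CliRunner


@click.group()
def cli():
    """Stand-in for the top-level `strprofiler` group."""


# `name=` sets the subcommand name independently of the function name.
@click.command(name="clastr")
@click.option("-sa", "--search_algorithm", type=int, default=1, show_default=True)
def clastr_query(search_algorithm):
    """Toy body; the real command queries the CLASTR REST API."""
    click.echo(f"search_algorithm={search_algorithm}")


# Attach the command to the group, as cli.py does for each subcommand.
cli.add_command(clastr_query)


def run(args):
    """Invoke the group in-process and return (exit_code, output)."""
    result = CliRunner().invoke(cli, args)
    return result.exit_code, result.output


print(run(["clastr", "--search_algorithm", "2"]))
```

Invoking the group with the function name (`clastr_query`) instead of the registered name fails with a usage error, which is the behavior the `name="clastr"` change in `clastr.py` is presumably meant to control.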
14 changes: 7 additions & 7 deletions strprofiler/shiny_app/shiny_app.py
@@ -196,7 +196,7 @@ def create_app(db=None):
value=80,
width="100%",
),
"Score threshold used to filter results"
"Score threshold that must be met for result to be displayed"
),
),
position="right",
@@ -245,7 +245,7 @@ def create_app(db=None):
class_="btn-success",
width="45%",
),
"Query STRprofiler Database",
"Submit query",
id="tt_selected_search",
placement="left",
),
@@ -319,7 +319,7 @@ def create_app(db=None):
value=80,
width="100%",
),
"Masters (vs. query) score threshold used to filter results"
"Masters (vs. query) score that must be met for result to be displayed"
),
),
ui.column(
@@ -331,7 +331,7 @@ def create_app(db=None):
value=80,
width="100%",
),
"Tanabe score threshold used to filter results"
"Tanabe score that must be met for result to be displayed"
),
ui.tooltip(
ui.input_numeric(
@@ -340,7 +340,7 @@ def create_app(db=None):
value=80,
width="100%",
),
"Masters (vs. reference) score threshold used to filter results"
"Masters (vs. reference) score that must be met for result to be displayed"
),
)
)
@@ -373,7 +373,7 @@ def create_app(db=None):
value=80,
width="100%"
),
"Score threshold used to filter results"
"Score threshold that must be met for result to be displayed"
)
)
)
@@ -444,7 +444,7 @@ def create_app(db=None):
),
),
ui.nav_panel(
"About",
"Usage Guide",
ui.panel_main(
ui.tags.iframe(
src="help.html",