From d4ff0e691fe508f1b7e857506e3fefe5e2e2ca40 Mon Sep 17 00:00:00 2001
From: Yuta Nagano <52748151+yutanagano@users.noreply.github.com>
Date: Sun, 26 Jan 2025 01:24:39 +0000
Subject: [PATCH 1/7] mention calc_residue_representations on home page

---
 docs/index.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/index.rst b/docs/index.rst
index ce82324..916b792 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -8,7 +8,7 @@ SCEPTR is a BERT-like transformer-based neural network implemented in `Pytorch <
 With the default model providing best-in-class performance with only 153,108 parameters (typical protein language models have tens or hundreds of millions), SCEPTR runs fast- even on a CPU!
 And if your computer does have a `CUDA- `_ or `MPS-enabled `_ GPU, the sceptr package will automatically detect and use it, giving you blazingly fast performance without the hassle.

-sceptr's :ref:`API ` exposes three intuitive functions: :py:func:`~sceptr.calc_vector_representations`, :py:func:`~sceptr.calc_cdist_matrix`, and :py:func:`~sceptr.calc_pdist_vector`-- and it's all you need to make full use of the SCEPTR models.
+sceptr's :ref:`API ` exposes four intuitive functions: :py:func:`~sceptr.calc_cdist_matrix`, :py:func:`~sceptr.calc_pdist_vector`, :py:func:`~sceptr.calc_vector_representations`, and :py:func:`~sceptr.calc_residue_representations` -- and it's all you need to make full use of the SCEPTR models.
 What's even better is that they are fully compliant with `pyrepseq `_'s `tcr_metric `_ API, so sceptr will fit snugly into the rest of your repertoire analysis toolkit.

 .. figure:: graphical_abstract.png

From 267c7559af905055b2487dced24304387b75ac1c Mon Sep 17 00:00:00 2001
From: Yuta Nagano <52748151+yutanagano@users.noreply.github.com>
Date: Sun, 26 Jan 2025 01:30:12 +0000
Subject: [PATCH 2/7] add graphical abstract to readme

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 70aeeaf..5704295 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,8 @@

 ---

+![Graphical abstract.](docs/graphical_abstract.png)
+
 **SCEPTR** (**S**imple **C**ontrastive **E**mbedding of the **P**rimary sequence of **T** cell **R**eceptors) is a small, fast, and accurate TCR representation model that can be used for alignment-free TCR analysis, including for TCR-pMHC interaction prediction and TCR clustering (metaclonotype discovery).
 Our [manuscript](https://www.cell.com/cell-systems/fulltext/S2405-4712(24)00369-7) demonstrates that SCEPTR can be used for few-shot TCR specificity prediction with improved accuracy over previous methods.


From 5ac446906e67aba44fa615a1517483d91f5c086f Mon Sep 17 00:00:00 2001
From: Yuta Nagano <52748151+yutanagano@users.noreply.github.com>
Date: Sun, 26 Jan 2025 01:34:58 +0000
Subject: [PATCH 3/7] reference url for graphical abstract on readme

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5704295..484ea50 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,9 @@

 ---

-![Graphical abstract.](docs/graphical_abstract.png)
+| |
+|---|
+| Graphical abstract. Traditional protein language models that are trained purely on masked-language modelling underperform sequence alignment models on TCR specificity prediction. In contrast, our model SCEPTR is jointly trained on masked-language modelling and contrastive learning, allowing it to outperform other language models as well as the best sequence alignment models to achieve state-of-the-art performance. |

 **SCEPTR** (**S**imple **C**ontrastive **E**mbedding of the **P**rimary sequence of **T** cell **R**eceptors) is a small, fast, and accurate TCR representation model that can be used for alignment-free TCR analysis, including for TCR-pMHC interaction prediction and TCR clustering (metaclonotype discovery).
 Our [manuscript](https://www.cell.com/cell-systems/fulltext/S2405-4712(24)00369-7) demonstrates that SCEPTR can be used for few-shot TCR specificity prediction with improved accuracy over previous methods.

From 6c3b72d916f1741d8a33649706748ca137b5ce57 Mon Sep 17 00:00:00 2001
From: Yuta Nagano <52748151+yutanagano@users.noreply.github.com>
Date: Sun, 26 Jan 2025 01:41:12 +0000
Subject: [PATCH 4/7] wrap graphical abstract in figure

---
 README.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 484ea50..0e7bc6c 100644
--- a/README.md
+++ b/README.md
@@ -14,9 +14,14 @@

 ---

-| |
-|---|
-| Graphical abstract. Traditional protein language models that are trained purely on masked-language modelling underperform sequence alignment models on TCR specificity prediction. In contrast, our model SCEPTR is jointly trained on masked-language modelling and contrastive learning, allowing it to outperform other language models as well as the best sequence alignment models to achieve state-of-the-art performance. |
+
+
+
+ Graphical abstract.
+ Traditional protein language models that are trained purely on masked-language modelling underperform sequence alignment models on TCR specificity prediction.
+ In contrast, our model SCEPTR is jointly trained on masked-language modelling and contrastive learning, allowing it to outperform other language models as well as the best sequence alignment models to achieve state-of-the-art performance.
+
+

 **SCEPTR** (**S**imple **C**ontrastive **E**mbedding of the **P**rimary sequence of **T** cell **R**eceptors) is a small, fast, and accurate TCR representation model that can be used for alignment-free TCR analysis, including for TCR-pMHC interaction prediction and TCR clustering (metaclonotype discovery).
 Our [manuscript](https://www.cell.com/cell-systems/fulltext/S2405-4712(24)00369-7) demonstrates that SCEPTR can be used for few-shot TCR specificity prediction with improved accuracy over previous methods.

From 80af54ac65cc4c83e78e07b14462c4d487f04f5c Mon Sep 17 00:00:00 2001
From: Yuta Nagano <52748151+yutanagano@users.noreply.github.com>
Date: Sun, 26 Jan 2025 01:44:21 +0000
Subject: [PATCH 5/7] revert readme to using markdown table for graphical abstract

---
 README.md | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 0e7bc6c..484ea50 100644
--- a/README.md
+++ b/README.md
@@ -14,14 +14,9 @@

 ---

-
-
-
- Graphical abstract.
- Traditional protein language models that are trained purely on masked-language modelling underperform sequence alignment models on TCR specificity prediction.
- In contrast, our model SCEPTR is jointly trained on masked-language modelling and contrastive learning, allowing it to outperform other language models as well as the best sequence alignment models to achieve state-of-the-art performance.
-
-
+| |
+|---|
+| Graphical abstract. Traditional protein language models that are trained purely on masked-language modelling underperform sequence alignment models on TCR specificity prediction. In contrast, our model SCEPTR is jointly trained on masked-language modelling and contrastive learning, allowing it to outperform other language models as well as the best sequence alignment models to achieve state-of-the-art performance. |

 **SCEPTR** (**S**imple **C**ontrastive **E**mbedding of the **P**rimary sequence of **T** cell **R**eceptors) is a small, fast, and accurate TCR representation model that can be used for alignment-free TCR analysis, including for TCR-pMHC interaction prediction and TCR clustering (metaclonotype discovery).
 Our [manuscript](https://www.cell.com/cell-systems/fulltext/S2405-4712(24)00369-7) demonstrates that SCEPTR can be used for few-shot TCR specificity prediction with improved accuracy over previous methods.

From b15ac97bf40cef61f9f6ffc294566c2acef8cf97 Mon Sep 17 00:00:00 2001
From: Yuta Nagano <52748151+yutanagano@users.noreply.github.com>
Date: Sun, 26 Jan 2025 01:50:38 +0000
Subject: [PATCH 6/7] make graphical abstract larger

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 484ea50..1444466 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@

 ---

-| |
+| |
 |---|
 | Graphical abstract. Traditional protein language models that are trained purely on masked-language modelling underperform sequence alignment models on TCR specificity prediction. In contrast, our model SCEPTR is jointly trained on masked-language modelling and contrastive learning, allowing it to outperform other language models as well as the best sequence alignment models to achieve state-of-the-art performance. |


From c9d817a5e82633b0f6e894baf2c3101622aa300a Mon Sep 17 00:00:00 2001
From: Yuta Nagano <52748151+yutanagano@users.noreply.github.com>
Date: Sun, 26 Jan 2025 02:09:48 +0000
Subject: [PATCH 7/7] update readme details to match current software spec

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 1444466..40ef39a 100644
--- a/README.md
+++ b/README.md
@@ -18,14 +18,14 @@
 |---|
 | Graphical abstract. Traditional protein language models that are trained purely on masked-language modelling underperform sequence alignment models on TCR specificity prediction. In contrast, our model SCEPTR is jointly trained on masked-language modelling and contrastive learning, allowing it to outperform other language models as well as the best sequence alignment models to achieve state-of-the-art performance. |

-**SCEPTR** (**S**imple **C**ontrastive **E**mbedding of the **P**rimary sequence of **T** cell **R**eceptors) is a small, fast, and accurate TCR representation model that can be used for alignment-free TCR analysis, including for TCR-pMHC interaction prediction and TCR clustering (metaclonotype discovery).
+**SCEPTR** (**S**imple **C**ontrastive **E**mbedding of the **P**rimary sequence of **T** cell **R**eceptors) is a small, fast, and informative TCR representation model that can be used for alignment-free TCR analysis, including for TCR-pMHC interaction prediction and TCR clustering (metaclonotype discovery).
 Our [manuscript](https://www.cell.com/cell-systems/fulltext/S2405-4712(24)00369-7) demonstrates that SCEPTR can be used for few-shot TCR specificity prediction with improved accuracy over previous methods.

 SCEPTR is a BERT-like transformer-based neural network implemented in [Pytorch](https://pytorch.org).
 With the default model providing best-in-class performance with only 153,108 parameters (typical protein language models have tens or hundreds of millions), SCEPTR runs fast- even on a CPU!
 And if your computer does have a [CUDA-enabled GPU](https://en.wikipedia.org/wiki/CUDA), the sceptr package will automatically detect and use it, giving you blazingly fast performance without the hassle.

-sceptr's API exposes three intuitive functions: `calc_vector_representations`, `calc_cdist_matrix`, and `calc_pdist_vector`- and it's all you need to make full use of the SCEPTR models.
+sceptr's API exposes four intuitive functions: `calc_cdist_matrix`, `calc_pdist_vector`, `calc_vector_representations`, and `calc_residue_representations` -- and it's all you need to make full use of the SCEPTR models.
 What's even better is that they are fully compliant with [pyrepseq](https://pyrepseq.readthedocs.io)'s [tcr_metric](https://pyrepseq.readthedocs.io/en/latest/api.html#pyrepseq.metric.tcr_metric.TcrMetric) API, so sceptr will fit snugly into the rest of your repertoire analysis workflow.

 ## Installation