From 355bfc14eabe2e2fb0498cfd67fa437afe09c0d1 Mon Sep 17 00:00:00 2001
From: DominikRafacz <dominikrafacz@gmail.com>
Date: Thu, 26 Sep 2024 17:25:05 +0200
Subject: [PATCH 1/3] fix typos

---
 NEWS.md                   |   4 +-
 vignettes/quick-start.Rmd | 300 +++++++++++++++++++-------------------
 2 files changed, 152 insertions(+), 152 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index 13af3c1..237b5cb 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -9,8 +9,8 @@
 * `write_fasta()` and `find_motifs()` accept `data.frame` arguments now; sequences and their names are taken from specified two columns
 * more descriptive error messages for non-existing generics that print out classes of the first parameter
 
-## Fixed-ish:
-* return to autoexported `Rcpp` catch declaration
+## Fixed:
+* return to automatically exported `Rcpp` catch declaration
 
 ## Quality of code stuff:
 * added tests and adjusted vignettes for the changes
diff --git a/vignettes/quick-start.Rmd b/vignettes/quick-start.Rmd
index e5a768c..6225947 100644
--- a/vignettes/quick-start.Rmd
+++ b/vignettes/quick-start.Rmd
@@ -1,150 +1,150 @@
----
-title: "Quick Start"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{Quick Start}
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
----
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>"
-)
-```
-
-`tidysq` package is meant to store and conduct operations on biological sequences. This vignette provides a guide to basic usage of `tidysq`, i.e. reading, manipulating and writing sequences to file.
-
-The most recent version of `tidysq` can be installed with `install_github()` function from `devtools`.
-
-```{r setup}
-# devtools::install_github("BioGenies/tidysq")
-library(tidysq)
-```
-
-## Sequence creation
-
-Biological sequences can be and often are represented as strings -- sequences of letters. For example, a DNA sequence can take the form of `"TAGGCCCTAGACCTG"`, where `A` means adenine, `C` -- cytosine, `G` -- guanine and `T` -- thymine. Exact IUPAC recommendations for one-letter codes can be found [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC341218/).
-
-Within `tidysq` package sequence data is stored in `sq` objects, that is, vectors of biological sequences. They can be created from string vectors as above:
-
-```{r sq_from_string}
-sq_dna <- sq(c("TAGGCCCTAGACCTG", "TAGGCCCTGGGCATG"))
-sq_dna
-```
-
-There are several thing to note. First, each sequence is an element of `sq` object. Many operations are vectorized --- they are applied to all sequences of a vector --- and `sq` objects are no different in this regard. Second, the first line of output says: `basic DNA sequences list`. This means that all sequences of this object are of DNA type and do not use ambiguous letters (more about that in "Advanced alphabet techniques" vignette).
-
-## Subsetting sequences
-
-Manipulating sequence objects is an integral part of `tidysq`. `sq` objects can be easily subsetted using usual R syntax:
-
-```{r sq_subset}
-sq_dna[1]
-```
-
-Extracting subsequences is a bit more complicated than that --- because it uses designated function `bite()`. Its syntax, however, closely resembles that of base R --- indexing starts with one and negative indices are interpreted as "anything except that". It returns an `sq` object with all sequences subsetted:
-
-```{r sq_bite}
-bite(sq_dna, 5:10)
-bite(sq_dna, c(-9, -11, -13))
-```
-
-It's possible to reverse sequences using this function:
-
-```{r sq_bite_reversing}
-# Don't do it like that!
-bite(sq_dna, 15:1)
-```
-
-However, this usage is strongly discouraged, because it's both ineffective and works badly with sequences of different lengths. Instead, there is a designated function `reverse()`:
-
-```{r sq_reverse}
-reverse(sq_dna)
-```
-
-Note that it is very different to base `rev()`, which reverses only the order of sequences, not letters:
-
-```{r sq_rev}
-rev(sq_dna)
-```
-
-We can combine two or more `sq` objects using base `c()` function:
-
-```{r sq_c}
-sq_dna <- c(sq_dna, reverse(sq_dna))
-sq_dna
-```
-
-## Biological interpretation
-
-`tidysq` offers two functions specific to DNA/RNA sequences, namely `complement()` and `translate()`. The former creates sequences with complementary bases, that is, replaces `A` with `T`, `C` with `G` and *vice versa*. The latter translates input to amino acid sequences using [the translation table with three-letter codons](https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables).
-
-These functions can be called as shown below:
-
-```{r sq_complement_translate}
-complement(sq_dna)
-translate(sq_dna)
-```
-
-One noteworthy feature here is that translation can be done with any genetic code table of those listed [on this Wikipedia page](https://en.wikipedia.org/wiki/List_of_genetic_codes):
-
-```{r sq_translate_other_table}
-translate(sq_dna, table = 6)
-```
-
-## Finding motifs
-
-Motifs are short subsequences. These are often searched for in biological sequences. `tidysq` has two distinct functions that allow the user to perform such search.
-
-One of them is a `%has%` operator that takes `sq` object and character vector as parameters respectively. It returns a logical vector of the same length as `sq` object, where each element says whether all motifs passed as strings were found in given sequence:
-
-```{r sq_has}
-sq_dna %has% "ATC"
-# It can be used to subset sq
-sq_dna[sq_dna %has% c("AG", "CC")]
-```
-
-It says nothing about motif placement within sequence nor it exact form, however. In this case, there is `find_motifs()` function that returns a whole `tibble` (from `tibble` package; basically improved version of `data.frame`) with various info about found motifs. Important thing to note here is that the second argument is a character vector of sequence names to avoid embedding potentially long sequences in resulting `tibble` potentially many times:
-
-```{r sq_find_motifs}
-find_motifs(sq_dna, c("seq1", "seq2", "rev1", "rev2"), c("ATC", "TAG"))
-```
-
-You can also provide this function with a `data.frame` (or, what we recommend, `tibble`) containing one column called `sq`, containing the sequences and the other colum `name` containing the names.
-
-```{r sqibble_find_motifs}
-sqibble <- tibble::tibble(sq = sq_dna, 
-                          name = c("seq1", "seq2", "rev1", "rev2"))
-
-# does the same as the call from previous chunk of code
-find_motifs(sqibble, c("ATC", "TAG"))
-```
-
-There are ambiguous DNA bases in IUPAC codes and these can be used in motifs. One of them is `"N"` --- its meaning is "any of `A`, `C`, `G` or `T`:
-
-```{r sq_find_motifs_amb}
-find_motifs(sqibble, "GNCC")
-```
-
-This example displays the difference between `"sought"` and `"found"` columns. The former contains the string representation of motif that the user was looking for, while the latter contains a `tidysq`-encoded sequence with an "instance" of motif.
-
-Two additional characters are reserved because of their special meaning in motifs. `"^"` means that this motif must be found at the start of a sequence, while `"$"` means the same, but with the end instead. They can be mixed with ambiguous letters, of course:
-
-```{r sq_find_motifs_start_end}
-find_motifs(sqibble, c("^TAG", "ATN$"))
-```
-
-## Exporting sq objects
-
-After doing computations the user might wish to save their sequences for future use. One of the most popular formats for storing biological sequences is FASTA. `tidysq` allows the user to write sequences to FASTA file with `write_fasta()` function. Important thing to remember here that the arguments for the function are analogous to those used in `find_motifs()` -- either `sq` object and a vector of names or a `tibble` with columns of sequences and names:
-
-```{r write_fasta, eval=FALSE}
-write_fasta(sq_dna,
-            c("seq1", "seq2", "rev1", "rev2"),
-            "just_your_ordinary_fasta_file.fasta")
-# or
-write_fasta(sqibble,
-            "just_your_ordinary_fasta_file.fasta")
-```
+---
+title: "Quick Start"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Quick Start}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+`tidysq` package is meant to store and conduct operations on biological sequences. This vignette provides a guide to basic usage of `tidysq`, i.e. reading, manipulating and writing sequences to file.
+
+The most recent version of `tidysq` can be installed with `install_github()` function from `devtools`.
+
+```{r setup}
+# devtools::install_github("BioGenies/tidysq")
+library(tidysq)
+```
+
+## Sequence creation
+
+Biological sequences can be and often are represented as strings -- sequences of letters. For example, a DNA sequence can take the form of `"TAGGCCCTAGACCTG"`, where `A` means adenine, `C` -- cytosine, `G` -- guanine and `T` -- thymine. Exact IUPAC recommendations for one-letter codes can be found [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC341218/).
+
+Within `tidysq` package sequence data is stored in `sq` objects, that is, vectors of biological sequences. They can be created from string vectors as above:
+
+```{r sq_from_string}
+sq_dna <- sq(c("TAGGCCCTAGACCTG", "TAGGCCCTGGGCATG"))
+sq_dna
+```
+
+There are several things to note. First, each sequence is an element of the `sq` object. Many operations are vectorized --- they are applied to all sequences of a vector --- and `sq` objects are no different in this regard. Second, the first line of output says: `basic DNA sequences list`. This means that all sequences of this object are of DNA type and do not use ambiguous letters (more about that in the "Advanced alphabet techniques" vignette).
+
+## Subsetting sequences
+
+Manipulating sequence objects is an integral part of `tidysq`. `sq` objects can be easily subsetted using usual R syntax:
+
+```{r sq_subset}
+sq_dna[1]
+```
+
+Extracting subsequences is a bit more complicated than that --- because it uses designated function `bite()`. Its syntax, however, closely resembles that of base R --- indexing starts with one and negative indices are interpreted as "anything except that". It returns an `sq` object with all sequences subsetted:
+
+```{r sq_bite}
+bite(sq_dna, 5:10)
+bite(sq_dna, c(-9, -11, -13))
+```
+
+It's possible to reverse sequences using this function:
+
+```{r sq_bite_reversing}
+# Don't do it like that!
+bite(sq_dna, 15:1)
+```
+
+However, this usage is strongly discouraged, because it's both inefficient and works badly with sequences of different lengths. Instead, there is a designated function `reverse()`:
+
+```{r sq_reverse}
+reverse(sq_dna)
+```
+
+Note that it is very different to base `rev()`, which reverses only the order of sequences, not letters:
+
+```{r sq_rev}
+rev(sq_dna)
+```
+
+We can combine two or more `sq` objects using base `c()` function:
+
+```{r sq_c}
+sq_dna <- c(sq_dna, reverse(sq_dna))
+sq_dna
+```
+
+## Biological interpretation
+
+`tidysq` offers two functions specific to DNA/RNA sequences, namely `complement()` and `translate()`. The former creates sequences with complementary bases, that is, replaces `A` with `T`, `C` with `G` and *vice versa*. The latter translates input to amino acid sequences using [the translation table with three-letter codons](https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables).
+
+These functions can be called as shown below:
+
+```{r sq_complement_translate}
+complement(sq_dna)
+translate(sq_dna)
+```
+
+One noteworthy feature here is that translation can be done with any genetic code table of those listed [on this Wikipedia page](https://en.wikipedia.org/wiki/List_of_genetic_codes):
+
+```{r sq_translate_other_table}
+translate(sq_dna, table = 6)
+```
+
+## Finding motifs
+
+Motifs are short subsequences. These are often searched for in biological sequences. `tidysq` has two distinct functions that allow the user to perform such search.
+
+One of them is the `%has%` operator, which takes an `sq` object and a character vector as parameters, respectively. It returns a logical vector of the same length as the `sq` object, where each element says whether all motifs passed as strings were found in the given sequence:
+
+```{r sq_has}
+sq_dna %has% "ATC"
+# It can be used to subset sq
+sq_dna[sq_dna %has% c("AG", "CC")]
+```
+
+It says nothing about motif placement within the sequence nor its exact form, however. For that, there is the `find_motifs()` function, which returns a whole `tibble` (from the `tibble` package; basically an improved version of `data.frame`) with various info about found motifs. An important thing to note here is that the second argument is a character vector of sequence names, which avoids embedding potentially long sequences in the resulting `tibble` many times:
+
+```{r sq_find_motifs}
+find_motifs(sq_dna, c("seq1", "seq2", "rev1", "rev2"), c("ATC", "TAG"))
+```
+
+You can also provide this function with a `data.frame` (or, as we recommend, a `tibble`) with one column called `sq` containing the sequences and another column called `name` containing the names.
+
+```{r sqibble_find_motifs}
+sqibble <- tibble::tibble(sq = sq_dna, 
+                          name = c("seq1", "seq2", "rev1", "rev2"))
+
+# does the same as the call from previous chunk of code
+find_motifs(sqibble, c("ATC", "TAG"))
+```
+
+There are ambiguous DNA bases in IUPAC codes and these can be used in motifs. One of them is `"N"` --- its meaning is "any of `A`, `C`, `G` or `T`":
+
+```{r sq_find_motifs_amb}
+find_motifs(sqibble, "GNCC")
+```
+
+This example displays the difference between `"sought"` and `"found"` columns. The former contains the string representation of motif that the user was looking for, while the latter contains a `tidysq`-encoded sequence with an "instance" of motif.
+
+Two additional characters are reserved because of their special meaning in motifs. `"^"` means that this motif must be found at the start of a sequence, while `"$"` means the same, but with the end instead. They can be mixed with ambiguous letters, of course:
+
+```{r sq_find_motifs_start_end}
+find_motifs(sqibble, c("^TAG", "ATN$"))
+```
+
+## Exporting sq objects
+
+After doing computations the user might wish to save their sequences for future use. One of the most popular formats for storing biological sequences is FASTA. `tidysq` allows the user to write sequences to a FASTA file with the `write_fasta()` function. An important thing to remember here is that the arguments for the function are analogous to those used in `find_motifs()` -- either an `sq` object and a vector of names or a `tibble` with columns of sequences and names:
+
+```{r write_fasta, eval=FALSE}
+write_fasta(sq_dna,
+            c("seq1", "seq2", "rev1", "rev2"),
+            "just_your_ordinary_fasta_file.fasta")
+# or
+write_fasta(sqibble,
+            "just_your_ordinary_fasta_file.fasta")
+```

From e2fac88f61e857b8b06f24bbd370de2d0ed7853b Mon Sep 17 00:00:00 2001
From: DominikRafacz <dominikrafacz@gmail.com>
Date: Thu, 26 Sep 2024 17:34:12 +0200
Subject: [PATCH 2/3] add cran comments

---
 cran-comments.md | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/cran-comments.md b/cran-comments.md
index 4363b7f..72f33d2 100644
--- a/cran-comments.md
+++ b/cran-comments.md
@@ -1,11 +1,6 @@
-## Test environments
-* local R installation, R 4.1.0
-* ubuntu 16.04 (on travis-ci), R 4.1.0
-* win-builder (devel)
-
 ## R CMD check results
 
-0 errors | 0 warnings | 1 note
+0 errors | 0 warnings | 0 notes
 
 * This is a resubmission.
-* Fixed the problem with deprecated usage of iterator
+* Fixed issues related to new implementations of set operations on R-devel

From d8a368ddf8f2bf173893c2b02aa4a1f8aad760ea Mon Sep 17 00:00:00 2001
From: DominikRafacz <dominikrafacz@gmail.com>
Date: Sun, 29 Sep 2024 17:28:42 +0200
Subject: [PATCH 3/3] fix unavailable URLs

---
 README.Rmd                |  1 -
 README.md                 | 13 ++++++-------
 docs/index.html           | 11 +++++------
 vignettes/quick-start.Rmd |  2 +-
 4 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/README.Rmd b/README.Rmd
index 36fdb08..c013d84 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -17,7 +17,6 @@ knitr::opts_chunk$set(
 <!-- badges: start -->
 [![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidysq)](https://cran.r-project.org/package=tidysq)
   [![Github Actions Build Status](https://github.com/BioGenies/tidysq/workflows/R-CMD-check-bioc/badge.svg)](https://github.com/BioGenies/tidysq/actions)
-  [![codecov.io](https://codecov.io/github/BioGenies/tidysq/coverage.svg?branch=master)](https://codecov.io/github/BioGenies/tidysq?branch=master) 
   [![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
 <!-- badges: end -->
 
diff --git a/README.md b/README.md
index e2aa744..dc733f7 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,6 @@
 [![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidysq)](https://cran.r-project.org/package=tidysq)
 [![Github Actions Build
 Status](https://github.com/BioGenies/tidysq/workflows/R-CMD-check-bioc/badge.svg)](https://github.com/BioGenies/tidysq/actions)
-[![codecov.io](https://codecov.io/github/BioGenies/tidysq/coverage.svg?branch=master)](https://codecov.io/github/BioGenies/tidysq?branch=master)
 [![Lifecycle:
 experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
 <!-- badges: end -->
@@ -17,11 +16,11 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](h
 sequences (including amino acid and nucleic acid – e.g. RNA, DNA –
 sequences). Two major features of this package are:
 
--   effective compression of sequence data, allowing to fit larger
-    datasets in **R**,
+- effective compression of sequence data, allowing to fit larger
+  datasets in **R**,
 
--   compatibility with most of `tidyverse` universe, especially `dplyr`
-    and `vctrs` packages, making analyses *tidier*.
+- compatibility with most of `tidyverse` universe, especially `dplyr`
+  and `vctrs` packages, making analyses *tidier*.
 
 ## Getting started
 
@@ -70,7 +69,7 @@ sqibble
 #>  8 VHPQKLVFF <15> AMY24|HABP2|Amyloid beta A4 peptide
 #>  9 VHHPKLVFF <15> AMY25|HABP3|Amyloid beta A4 peptide
 #> 10 VHHQPLVFF <15> AMY26|HABP4|Amyloid beta A4 peptide
-#> # … with 411 more rows
+#> # ℹ 411 more rows
 
 sq_ami <- sqibble$sq
 sq_ami
@@ -156,7 +155,7 @@ sqibble %>%
 #>  8 VHHQEKLVF <16> AMY35|HABP13|Amyloid beta A4 peptide     16
 #>  9 VHHQEKLVF <16> AMY36|HABP14|Amyloid beta A4 peptide     16
 #> 10 KKLVFFAED  <9> AMY37|HABP15|Amyloid beta A4 peptide      9
-#> # … with 14 more rows
+#> # ℹ 14 more rows
 ```
 
 ## Citation
diff --git a/docs/index.html b/docs/index.html
index b531d47..f808ade 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -47,7 +47,7 @@
 <li>
   <a href="index.html">
     <span class="fas fa-home fa-lg"></span>
-     
+
   </a>
 </li>
 <li>
@@ -56,7 +56,7 @@
 <li class="dropdown">
   <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">
     Articles
-     
+
     <span class="caret"></span>
   </a>
   <ul class="dropdown-menu" role="menu">
@@ -76,7 +76,7 @@
 <li>
   <a href="https://github.com/BioGenies/tidysq/">
     <span class="fab fa-github fa-lg"></span>
-     
+
   </a>
 </li>
       </ul>
@@ -87,7 +87,7 @@
 </div>
 <!--/.navbar -->
 
-      
+
 
       </header><div class="row">
   <div class="contents col-md-9">
@@ -274,7 +274,6 @@ <h2>Dev status</h2>
 <ul class="list-unstyled">
 <li><a href="https://cran.r-project.org/package=tidysq"><img src="http://www.r-pkg.org/badges/version/tidysq" alt="CRAN_Status_Badge"></a></li>
 <li><a href="https://github.com/BioGenies/tidysq/actions"><img src="https://github.com/BioGenies/tidysq/workflows/R-CMD-check-bioc/badge.svg" alt="Github Actions Build Status"></a></li>
-<li><a href="https://codecov.io/github/BioGenies/tidysq?branch=master"><img src="https://codecov.io/github/BioGenies/tidysq/coverage.svg?branch=master" alt="codecov.io"></a></li>
 <li><a href="https://lifecycle.r-lib.org/articles/stages.html#experimental"><img src="https://img.shields.io/badge/lifecycle-experimental-orange.svg" alt="Lifecycle: experimental"></a></li>
 </ul>
 </div>
@@ -293,7 +292,7 @@ <h2>Dev status</h2>
       </footer>
 </div>
 
-  
+
 
 
   </body>
diff --git a/vignettes/quick-start.Rmd b/vignettes/quick-start.Rmd
index 6225947..1442093 100644
--- a/vignettes/quick-start.Rmd
+++ b/vignettes/quick-start.Rmd
@@ -25,7 +25,7 @@ library(tidysq)
 
 ## Sequence creation
 
-Biological sequences can be and often are represented as strings -- sequences of letters. For example, a DNA sequence can take the form of `"TAGGCCCTAGACCTG"`, where `A` means adenine, `C` -- cytosine, `G` -- guanine and `T` -- thymine. Exact IUPAC recommendations for one-letter codes can be found [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC341218/).
+Biological sequences can be and often are represented as strings -- sequences of letters. For example, a DNA sequence can take the form of `"TAGGCCCTAGACCTG"`, where `A` means adenine, `C` -- cytosine, `G` -- guanine and `T` -- thymine. Exact IUPAC recommendations for one-letter codes can be found in *Cornish-Bowden A. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. 1985 May 10;13(9):3021-30. doi: 10.1093/nar/13.9.3021. PMID: 2582368; PMCID: PMC341218*.
 
 Within `tidysq` package sequence data is stored in `sq` objects, that is, vectors of biological sequences. They can be created from string vectors as above: