Update documentation (#85)

MrHedmad · Mar 19, 2024 · 7be0281
2 parents 32a2355 + 1044421
commit 7be0281
Showing 11 changed files with 96 additions and 82 deletions.
diff --git a/docs/src/about.md b/docs/src/about.md
@@ -1,15 +1,15 @@
 # About
 
-This package aggregates a series of meta information about Kerblam!.
+This page aggregates a series of meta information about Kerblam!.
 
 ### License
 The project is licensed with the [MIT License](https://github.com/MrHedmad/kerblam/blob/main/LICENSE).
 Read [here](https://choosealicense.com/licenses/mit/) for the [choose a license](https://choosealicense.com)
 entry of the license.
 
 ### Citing
-If you want to cite Kerblam!, provide a link to the Github repository or use
-the following Zenodo DOI: [doi.org/10.5281/zenodo.10664806](https://zenodo.org/doi/10.5281/zenodo.10664806)
+If you want or need to cite Kerblam!, provide a link to the Github repository or use
+the following Zenodo DOI: [doi.org/10.5281/zenodo.10664806](https://zenodo.org/doi/10.5281/zenodo.10664806).
 
 ### Naming
 This project is named after the fictitious online shop/delivery company in
@@ -22,14 +22,14 @@ The Kerblam! logo is written in the [Kwark Font](https://www.1001fonts.com/kwark
 
 This book is rendered by [`mdbook`](https://github.com/rust-lang/mdBook), and
 is written as a series of markdown files.
-Its source code is available in [the Kerblam! repo](https://github.com/MrHedmad/kerblam).
+Its source code is available in [the Kerblam! repo](https://github.com/MrHedmad/kerblam)
+under the `./docs/` folder.
 
 The book hosted online always refers to the
 [latest Kerblam! release](https://github.com/MrHedmad/kerblam/releases).
-
 If you are looking for older or newer versions of this book, you should
 read the markdown files directly [on Github](https://github.com/MrHedmad/kerblam/tree/main/docs),
 where you can select which tag to view from the top bar, or clone the repository
-locally, checkout to the commit you like, and rebuiding from source.
+locally, check out the commit you like, and rebuild from source.
 If you're interested, read [the development guide](dev/contributing.html) to
 learn more.
diff --git a/docs/src/install.md b/docs/src/install.md
@@ -4,14 +4,15 @@ You have a few options when installing Kerblam!.
 ### Requirements
 Currently, Kerblam! only supports macOS (both Intel and Apple chips) and GNU/Linux.
 Other Unix/Linux versions *may* work, but are untested.
-It also uses binaries that it assumes are already installed:
+It also uses binaries that it assumes are already installed and visible from your `$PATH`:
 - GNU `make`: [gnu.org/software/make](https://gnu.org/software/make);
 - `git`: [git-scm.com](https://git-scm.com/);
 - Docker (as `docker`) and/or Podman (as `podman`):
   [docker.com](https://docker.com/) and/or [podman.io](https://podman.io);
 - `tar`: [gnu.org/software/tar](https://www.gnu.org/software/tar/);
+- `bash`: [gnu.org/software/bash](https://www.gnu.org/software/bash/).
 
-If you can use `git`, `make`, `tar` and `docker` or `podman` from your CLI,
+If you can use `git`, `make`, `tar`, `bash` and `docker` or `podman` from your CLI,
 you're good to go!
 
 Most if not all of these tools come pre-packaged in most Linux distros.
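
The requirements listed in this hunk can be checked from a shell before installing. The following is a minimal sketch, not part of Kerblam! itself: `missing_tools` is a hypothetical helper, and treating either `docker` or `podman` as sufficient is an assumption based on the "and/or" wording above.

```bash
# Print every tool from the argument list that is not visible on $PATH.
# (Hypothetical helper, not shipped with Kerblam!.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the requirements list.
missing="$(missing_tools git make tar bash)"

# Assumption: one container engine is enough, so only report them
# when both docker and podman are absent.
if [ -n "$(missing_tools docker)" ] && [ -n "$(missing_tools podman)" ]; then
    missing="$missing docker-or-podman"
fi

if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on a single line.
    echo "Missing from PATH:" $missing
else
    echo "All required tools were found."
fi
```

The check only confirms that the binaries are reachable from `$PATH`; it does not verify that they are the GNU versions or that they actually work.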
@@ -27,8 +28,10 @@ You can always install or update to the latest version with:
 ```bash
 curl --proto '=https' --tlsv1.2 -LsSf https://github.com/MrHedmad/kerblam/releases/latest/download/kerblam-installer.sh | sh
 ```
+Be warned that the above command executes a script downloaded from the internet.
 You can [click here](https://github.com/MrHedmad/kerblam/releases/latest/download/kerblam-installer.sh)
-to download the same installer script and inspect it before you run it, if you'd like.
+or manually follow the fetched URL above to download the same installer script
+and inspect it before you run it, if you'd like.
 
 ### Install from source
 If you want to install the latest version from source, install Rust and `cargo`, then run:

diff --git a/docs/src/landing.md b/docs/src/landing.md
@@ -1,25 +1,16 @@
+![If you want it, kerblam it!](https://gist.github.com/MrHedmad/a5c719bbc22d982425fcd23f9e1d448c/raw/53833514b7701501c1c6b696156504b39b668f81/kerblam.dev_fig.png)
 
-![If you want it, Kerblam it!](https://raw.githubusercontent.com/MrHedmad/kerblam/main/docs/images/logo.png)
-
-**Kerblam! is a Rust command line tool to manage the execution of scientific data
+Kerblam! is a Rust command line tool to manage the execution of scientific data
 analysis, where having reproducible results and sharing the executed pipelines
 is important. It makes it easy to write multiple analysis pipelines and select
-what data is analysed.**
+what data is analysed.
 
 With Kerblam! your analyses will be less bloated, more organized, and more
 reproducible.
 
-##### Click on the images to see the same videos on asciinema.org!
-
-[![If you see this, open an issue. The GIF is dead.](https://s9.gifyu.com/images/SFNkp.gif)](https://asciinema.org/a/641448)
-
-After you execute your pipelines, you can export them to others for reproduction:
-
-[![If you see this, open an issue. The GIF is dead.](https://s9.gifyu.com/images/SF6tA.gif)](https://asciinema.org/a/643038)
-
 Kerblam! is a Free and Open Source Software, hosted on Github at 
 [MrHedmad/kerblam](https://github.com/MrHedmad/kerblam).
-The code is licensed with the [MIT License](https://github.com/MrHedmad/kerblam/blob/main/LICENSE).
+The code is licensed under the [MIT License](https://github.com/MrHedmad/kerblam/blob/main/LICENSE).
 
 Use the sidebar to jump to a specific section.
 If you have never used Kerblam! before, you can read the documentation from start

diff --git a/docs/src/quickstart.md b/docs/src/quickstart.md
@@ -8,29 +8,27 @@ Kerblam! is a *project manager*. It helps you write clean, concise data analysis
 pipelines, and takes care of chores for you.
 
 Every Kerblam! project has a `kerblam.toml` file in its root.
-Kerblam! looks for files in different folders relative to the `kerblam.toml`
-file to manage your project.
+When Kerblam! looks for files, it does it relative to the position of the
+`kerblam.toml` file and in specific, pre-determined folders.
 This helps you keep everything in its place, so that others that are unfamiliar
-with your project can understand it if they ever need to review it.
+with your project can understand it if they ever need to look at it.
 
-These folders are as follows:
-- `kerblam.toml`: This file contains the options for Kerblam!.
-  It is often empty for simple projects.
-- `data/`: Where all the project's data is saved.
+These folders, relative to where the `kerblam.toml` file is, are:
+- `./data/`: Where all the project's data is saved.
   Intermediate data files are specifically saved here.
-- `data/in/`: Input data files are saved and should be looked for in here.
-- `data/out/`: Output data files are saved and should be looked for in here.
-- `src/`: Code you want to be executed should be saved here.
-- `src/pipes/`: Makefiles and bash build scripts should be saved here.
+- `./data/in/`: Input data files are saved and should be looked for in here.
+- `./data/out/`: Output data files are saved and should be looked for in here.
+- `./src/`: Code you want to be executed should be saved here.
+- `./src/pipes/`: Makefiles and bash build scripts should be saved here.
   They have to be written as if they were saved in `./`.
-- `src/dockerfiles/`: Container build scripts should be saved here.
+- `./src/dockerfiles/`: Container build scripts should be saved here.
 
 > Any sub-folder of one of these specific folders (with the exception of
 > `src/pipes` and `src/dockerfiles`) contains the same type of files as the
 > parent directory. For instance, `data/in/fastq` is treated as if it contains
 > input data by Kerblam! just as the `data/in` directory is.
 
-You can configure almost all of these paths in the `kerblam.toml`, if you so desire.
+You can configure almost all of these paths in the `kerblam.toml` file, if you so desire.
 This is mostly done for compatibility reasons with non-kerblam! projects.
 New projects that wish to use Kerblam! are strongly encouraged to follow the
 standard folder structure, however.
@@ -40,21 +38,21 @@ standard folder structure, however.
 > your choices in the `kerblam.toml` file.
 
 If you want to convert an existing project to use Kerblam!, you can take a look
-at [the `kerblam.toml` section of the documentation](kerblam.toml.html).
+at [the `kerblam.toml` section of the documentation](kerblam.toml.html) to
+learn how to configure these paths.
 
-If you follow this standard (or you write proper configuration), Kerblam! gives
-you a bunch of benefits:
+If you follow this standard (or you write proper configuration), you can use
+Kerblam! to do a bunch of things:
 - You can run pipelines written in `make` or arbitrary shell files in `src/pipes/`
   as if you ran them from the root directory of your project by simply using
-  `kerblam run <pipe>`.
+  `kerblam run <pipe>`;
 - You can wrap your pipelines in docker containers by just writing new
   dockerfiles in `src/dockerfiles`, with essentially just the installation
-  of the dependencies.
+  of the dependencies, letting Kerblam! take care of the rest;
 - If you have wrapped up pipelines, you can export them for later execution
   (or to send them to a reviewer) with `kerblam package <pipe>` without needing
-  to edit your dockerfiles.
-  - If you have a package from someone else, you can run it with
-    `kerblam replay`.
+  to edit your dockerfiles;
+- If you have a package from someone else, you can run it with `kerblam replay`.
 - You can fetch remote data from the internet with `kerblam data fetch`, see
   how much disk space your project's data is using with `kerblam data` and
   safely clean up all the files that are not needed to re-run your project with
@@ -67,3 +65,6 @@ The rest of this tutorial walks you through every feature.
 
 I hope you enjoy Kerblam! and that it makes your projects easier to understand,
 run and reproduce!
+
+> If you like Kerblam!, please consider [leaving a star on Github](https://github.com/MrHedmad/kerblam/stargazers).
+> Thank you for supporting Kerblam!
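
The default folder layout described in the quickstart hunks above can be sketched with a few shell commands. This is only an illustration under stated assumptions: `make_layout` is a hypothetical helper, and the `kerblam.toml` is left empty because the removed lines above note that it is often empty for simple projects.

```bash
# Create the default Kerblam! folder layout under a project root.
# (Hypothetical helper for illustration only.)
make_layout() {
    root="$1"
    mkdir -p "$root/data/in" "$root/data/out" \
             "$root/src/pipes" "$root/src/dockerfiles"
    # Assumption: an empty kerblam.toml is enough for a simple project.
    touch "$root/kerblam.toml"
}

# Build the layout in a throwaway directory and list what was created.
project="$(mktemp -d)"
make_layout "$project"
find "$project" -type d | sort
```

Intermediate files would then live directly in `data/`, with inputs in `data/in/` and outputs in `data/out/`.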
diff --git a/docs/src/tutorial/dockerfiles.md b/docs/src/tutorial/dockerfiles.md
@@ -21,20 +21,28 @@ using `COPY . .`.
 ### The `data` directory is excluded from packages
 If you have a `COPY . .` directive in the dockerfile, it will behave differently
 when you `kerblam run` versus when you `kerblam package`.
-In a run, **the current, local directory is used as-is as a build context**.
+
+When you run `kerblam package`, Kerblam! will create a temporary build context
+with no input data.
+This is what you want: Kerblam! needs to separately package your (precious)
+input data on the side, and copy in the container only code and other execution-specific
+files.
+
+In a run, the current local project directory is used as-is as a build context.
 This means that the `data` directory will be copied over.
 At the same time, Kerblam! will also *mount* the same directory to the running
 container, so the copied files will be "overwritten" by the live mountpoint
 while the container is running.
 
-This generally means that copying the whole data directory is useless. 
+This generally means that copying the whole data directory is useless in a run,
+and that it cannot be done during packaging.
 
 Therefore, a best practice is to ignore the contents of the data folders in the
 `.dockerignore` file.
 This makes no difference while packaging containers but a big difference when
 running them, as docker skips copying the useless data files.
 
-To do this in a standard Kerblam! project, add this to your `.dockerignore`:
+To do this in a standard Kerblam! project, simply add this to your `.dockerignore`:
 ```
 # Ignore the intermediate/output directory
 data
@@ -68,4 +76,4 @@ you place the `COPY . .` directive near the bottom of the dockerfile.
 This way, you can essentially work exclusively in docker and never install
 anything locally.
 
-Kerblam! will name the pipelines as `<pipeline name>_kerblam_runtime`.
+Kerblam! will name the containers for the pipelines as `<pipeline name>_kerblam_runtime`.
diff --git a/docs/src/tutorial/intro_data.md b/docs/src/tutorial/intro_data.md
@@ -5,7 +5,7 @@ project.
 If you follow open science guidelines, chances are that a lot of your data is
 FAIR, and you can fetch it remotely.
 
-Kerblam! is perfect to work with such data. The next sections outline what
+Kerblam! is perfect to work with such data. The next tutorial sections outline what
 Kerblam! can do to help you work with data.
 
 Remember that Kerblam! recognizes what data is what by the location where you 
@@ -31,6 +31,3 @@ The total size of all the files in the `./data/` folder is then broken down
 between categories: the `Total` data size, how much data can be removed with
 `kerblam data clean` or `kerblam data pack`, and how many files are specified
 to be downloaded but are not yet present locally.
-
-You can manipulate your data with `kerblam data` in several ways.
-In the following sections we explain every one of these ways.
diff --git a/docs/src/tutorial/package_data.md b/docs/src/tutorial/package_data.md
@@ -12,7 +12,3 @@ non-remotely-available `.data/in` files and the files in `./data/out`.
 You can also pass the `--cleanup` flag to also delete them after packing.
 
 You can then share the data pack with others.
-
-This is pretty useful if you have [packaged a pipeline](package_pipes.html) and 
-would like to send just the precious input data to whomever needs to reproduce
-your work.
diff --git a/docs/src/tutorial/package_pipes.md b/docs/src/tutorial/package_pipes.md
@@ -5,23 +5,23 @@ It allows you to package everything needed to execute a pipeline in a docker
 container and export it for execution later.
 
 You must have a matching dockerfile for every pipeline that you want to package,
-or Kerblam! wont know what to package your pipeline into.
+or Kerblam! won't know what to package your pipeline into.
 
 For example, say that you have a `process` pipe that uses `make` to run, and 
 requires both a remotely-downloaded `remote.txt` file and a local-only
 `precious.txt` file.
 
-If you execute
+If you execute:
 ```bash
 kerblam package process --tag my_process_package
 ```
 Kerblam! will:
-- Create a temporary context;
+- Create a temporary build context;
 - Copy all non-data files to the temporary context;
 - Build the specified dockerfile as normal, but using this temporary context;
 - Create a new `Dockerfile` that:
   - Inherits from the image built before;
-  - Copies the Kerblam! executable to the root of the dockerfile;
+  - Copies the Kerblam! executable to the root of the container;
   - Configures the default execution command to something suitable for execution
     (just like `kerblam run` does, but "baked in").
 - Build the docker container and tag it with `my_process_package`;
@@ -54,7 +54,7 @@ The responsibility of having the resulting docker work in the long-term is
 up to you, not Kerblam!
 For most cases, just having `kerblam run` work is enough for the resulting
 package made by `kerblam package` to work, but depending on your docker
-files this might not be the case.\
+files this might not be the case.
 Kerblam! does not test the resulting package - it's up to you to do that.
 It's best to try your packaged pipeline once before shipping it off.
 

diff --git a/docs/src/tutorial/pipe_docstrings.md b/docs/src/tutorial/pipe_docstrings.md
@@ -15,7 +15,7 @@ in the makefile/shellfile itself. Using the same example as above:
 #? Calculate the sums of the input metrics
 #?
 #? The script takes the input metrics, then calculates the row-wise sums.
-#? These are important since the metrics refer to the calculation.
+#? These are useful since we can refer to this calculation later.
 
 ./data/out/output.csv: ./data/in/input.csv ./src/calc_sum.py
     cat $< | ./src/calc_sum.py > $@

diff --git a/docs/src/tutorial/run.md b/docs/src/tutorial/run.md
@@ -12,7 +12,6 @@ installed on your system this way, e.g. `snakemake` or `nextflow`.
 
 Make has a special execution policy to allow it to work with as little boilerplate
 as possible.
-
 You can read more on Make [in the GNU Make book](https://www.gnu.org/software/make/manual/make.pdf).
 
 `kerblam run` supports the following flags:
@@ -22,11 +21,19 @@ You can read more on Make [in the GNU Make book](https://www.gnu.org/software/ma
 - `--local` (`-l`): Skip [running in a container](run_containers.html), if a
   container is available, preferring a local run.
 
+In short, `kerblam run` does something similar to this:
+- Move your `pipe.sh` or `pipe.makefile` file to the root of the project,
+  under the name `executor`;
+- Launch `make -f executor` or `bash executor` for you.
+
+This is why pipelines are written as if they are executed in the root of the
+project, because they are.
+
 ## Data Profiles - Running the same pipelines on different data
 
 You can run your same pipelines, *as-is*, on different data thanks to data profiles.
 
-By default, Kerblam! will use your `./data/in/` folder as-is when executing pipes.
+By default, Kerblam! will use your untouched `./data/in/` folder when executing pipes.
 If you want the same pipes to run on different sets of input data, Kerblam! can
 temporarily swap out your real data with this 'substitute' data during execution.
 
@@ -37,8 +44,7 @@ to this alternative one.
 However, you then have to maintain two essentially identical
 pipelines, and you are prone to adding errors while you modify it (what if you
 forget to change one reference to the original file?).
-You can use `kerblam` to do the same, but in a declarative, less-error prone and
-easy way.
+You can use `kerblam` to do the same, but in an easy, declarative and less-error-prone way.
 
 Define in your `kerblam.toml` file a new section under `data.profiles`:
 ```toml
@@ -51,15 +57,19 @@ You can then run the same makefile with the new data with:
 ```
 kerblam run process_csv --profile alternate
 ```
+
+> Paths under every profile section are relative to the input data directory,
+> by default `data/in`.
+
 Under the hood, Kerblam! will:
 - Rename `input.csv` to `input.csv.original`;
 - Move `different_input.csv` to `input.csv`;
 - Run the analysis as normal;
-- When the run ends (or the analysis crashes), Kerblam! will undo the move
-  and rename `input.csv.original` back to `input.csv`.
+- When the run ends (it finishes, it crashes or you kill it), Kerblam! will undo both actions:
+  it moves `different_input.csv` back to its original place and
+  renames `input.csv.original` back to `input.csv`.
 
-This effectively causes the makefile run with different input data in this
-alternate run.
+This effectively causes the makefile to run with different input data.
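
The swap-and-restore steps listed above can be imitated in plain shell. This is a rough sketch and not Kerblam!'s actual implementation: `run_with_swap` is a hypothetical function, the file names mirror the example above, and `cat` stands in for the real pipeline.

```bash
# Temporarily replace ORIGINAL with SUBSTITUTE, run a command, then undo.
# (Hypothetical stand-in for the steps described above.)
run_with_swap() {
    original="$1"
    substitute="$2"
    shift 2
    status=0

    mv "$original" "$original.original"   # input.csv -> input.csv.original
    mv "$substitute" "$original"          # different_input.csv -> input.csv

    "$@" || status=$?                     # run the "pipeline", remember failures

    mv "$original" "$substitute"          # put the substitute back
    mv "$original.original" "$original"   # restore the real input
    return "$status"
}

# Demonstration in a throwaway directory.
workdir="$(mktemp -d)"
printf 'real\n' > "$workdir/input.csv"
printf 'alt\n' > "$workdir/different_input.csv"

seen="$(run_with_swap "$workdir/input.csv" "$workdir/different_input.csv" \
        cat "$workdir/input.csv")"
echo "pipeline saw: $seen"
echo "afterwards:   $(cat "$workdir/input.csv")"
```

The sketch restores the files when the command fails, but unlike the behaviour described above it makes no attempt to restore them if the shell itself is killed.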
 
 > Careful that the *output* data will (most likely) be saved as the
 > same file names as a "normal" run!
@@ -69,7 +79,7 @@ alternate run.
 > If you really want to, use the `KERBLAM_PROFILE` environment variable
 > described below and change the output paths accordingly.
 
-This is most commonly useful to run the pipelines on test data that is faster to
+Profiles are most commonly useful to run the pipelines on test data that is faster to
 process or that produces pre-defined outputs. For example, you could define
 something similar to:
 ```toml
@@ -82,17 +92,22 @@ And execute your test run with `kerblam run pipe --profile test`.
 The profiles feature is used so commonly for test data that Kerblam! will
 automatically make a `test` profile for you, swapping all input files in the
 `./data/in` folder that start with `test_xxx` with their "regular" counterparts `xxx`.
-For example, the profile above is redundant!\
+For example, the profile above is redundant!
+
 If you write a `[data.profiles.test]` profile yourself, Kerblam! will not
 modify it in any way, effectively disabling the automatic test profile feature.
 
-All file paths specified under the `profiles` tab must be relative to the `./data/in/`
-folder.
-
 Kerblam! tries its best to clean up after itself (undoing profiles,
 deleting temporary files, etc.) when you use `kerblam run`, even if the pipe
 fails, and even if you kill your pipe with `CTRL-C`.
 
+> If your pipeline is unresponsive to a `CTRL-C`, pressing it twice (two
+> `SIGINT` signals in a row) will kill Kerblam! instead, leaving the child
+> process to be cleaned up by the OS and any active profile not undone.
+>
+> This is to allow you to stop whatever Kerblam! or the pipe is doing in
+> case of emergency.
+
 Kerblam! will run the pipelines with the environment variable `KERBLAM_PROFILE`
 set to whatever the name of the profile is.
 In this way, you can detect from inside the pipeline if you are in a profile or not.
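
As an illustration of this detection, a shell pipe could pick a different output folder per profile. This is a minimal sketch under stated assumptions: `output_dir` is a hypothetical helper, the per-profile sub-folder naming is invented for the example, and `KERBLAM_PROFILE` is assumed to be unset or empty when no profile is in use.

```bash
# Choose an output directory depending on the active profile.
# Assumption: KERBLAM_PROFILE is unset or empty outside of a profile run.
output_dir() {
    if [ -n "${KERBLAM_PROFILE:-}" ]; then
        # Keep profile outputs apart from the "normal" ones.
        printf './data/out/%s\n' "$KERBLAM_PROFILE"
    else
        printf './data/out\n'
    fi
}

echo "Writing results to $(output_dir)"
```

Writing to a sub-folder of `data/out` keeps the files inside the output data directory, while avoiding the overwriting of "normal" results mentioned earlier.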