A Rust CLI to manage your data analysis projects in a streamlined, easy and simple way, getting reproducibility for free.

MrHedmad/kerblam

If you want it, Kerblam it!

Warning

kerblam run and kerblam package are feature-complete but have not yet been significantly tested. Always have a backup of your data and code! Report any problems in the issues.

Kerblam! is a tool that can help you manage data analysis projects.

A Kerblam! project has a kerblam.toml file in its root. Kerblam! allows you to:

  • Access remote data quickly, by just specifying URLs to fetch from;
  • Package and export data in order to share the project with colleagues;
  • Manage and run multiple makefiles for different tasks;
  • Leverage git to isolate, rollback and run the project at a different point in time;
  • Clean up intermediate and output files quickly;
  • Manage Docker environments and run code in them for you;
  • Manage the content of your .gitignore for you, letting you add files, directories and even whole languages in one command;
  • Make it easy to use pre-commit by managing .pre-commit-hooks;
  • Specify test data to run and quickly use it instead of real data.

To turn an existing project into a Kerblam! project, just create the kerblam.toml file in its root yourself. To learn how, look at the section below.

Overview

  • kerblam new creates a new Kerblam! project. Kerblam! asks you if you want to use some common programming languages and sets up a proper .gitignore and pre-commit hooks for you.
  • kerblam data fetches remote data and saves it locally, manages local data and can clean it up, preserving only files that must be preserved. It also shows you how much local data is on the disk, how much data is remote and how much disk space you can free without losing anything important.
  • kerblam package packages your pipeline and exports a docker image for later execution. It's useful for reproducibility purposes, as the docker image is primed for execution: it bundles the Kerblam! executable, execution files and non-remote data in the blob itself.
  • kerblam run executes the analysis for you, choosing your makefiles and dockerfiles appropriately and building docker containers as needed. It can optionally use test data or alternative data instead of real data, so that you can test your pipelines.
  • kerblam ignore can edit your .gitignore file by adding files, folders and GitHub's recommended ignores for specific languages in just one command.

Kerblam! is not and does not want to be:

  • A pipeline manager like snakemake and nextflow: It supports and helps you execute make, but it does not interfere from then on;
  • A replacement for any of the tools it leverages (e.g. git, docker, pre-commit);
  • Something that insulates you from the nuances of writing good, correct pipelines and Dockerfiles.
    Specifically, Kerblam! will never:
    • Parse your .gitignore, .dockerignore, pipes or Dockerfiles to check for errors or potential issues;
    • Edit code for you (with the exception of a tiny bit of wrapping to allow kerblam package to work);
    • Handle any errors produced by the pipelines or containers.
  • A tool that covers every edge case. Implementing more features for popular and widespread tasks is perfectly fine, but Kerblam! will never have a wall of options for you to choose from. If you need more advanced control over what is done, you should directly use the tools that Kerblam! leverages.

Tip

Kerblam! works with you, not for you!

Opinions

Tip

If you wish to learn more about why these design choices were made, please take a look at the Kerblam! philosophy.

Kerblam! projects are opinionated:

  • The folder structure of your project adheres to the Kerblam! standard, although you may configure it in kerblam.toml. Read about it below.
  • You use make or bash scripts as your pipeline manager.
  • You use docker as your virtualisation service.
  • You use git as your version control system. Additionally, you create tags with git to record important previous versions of your project.
  • You execute your pipelines in a Docker container, and not in your development environment.
  • Most of your input data is remotely downloadable, especially for large and bulky files.

If you don't like this setup, Kerblam! is not for you.

Folder structure

Kerblam!, by default, requires the following folder structure (relative to the root of the project, ./):

  • ./kerblam.toml: This file contains the options for Kerblam!. It is usually empty.
  • ./data/: This is a directory for the data. Intermediate data files are held here.
  • ./data/in/: Input data files are saved, and should be looked for, in here.
  • ./data/out/: Output data files are saved, and should be looked for, in here.
  • ./src/: Code you want to be executed should be saved here.
  • ./src/pipes/: Makefiles and bash build scripts should be saved here. They have to be written as if they were saved in ./.
  • ./src/dockerfiles/: Dockerfiles should be saved here.

You can configure almost all of these paths in kerblam.toml, if you so desire. This is mostly meant for compatibility with non-Kerblam! projects. New projects that wish to use Kerblam! are strongly encouraged to follow the standard folder structure.
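The default layout listed above can be created by hand, for example when converting an existing project. The following is a minimal sketch using only standard POSIX shell tools; the function name `scaffold_kerblam_layout` and the throwaway demo directory are hypothetical, and this is not necessarily what kerblam new does internally.

```shell
#!/bin/sh
# Sketch: create the default Kerblam! folder layout by hand.
# The function name and demo directory are illustrative assumptions.
set -eu

scaffold_kerblam_layout() {
    root="$1"
    # Directories from the default structure described above.
    mkdir -p \
        "$root/data/in" \
        "$root/data/out" \
        "$root/src/pipes" \
        "$root/src/dockerfiles"
    # kerblam.toml marks the project root and is usually empty.
    # touch creates it if missing and leaves existing contents alone.
    touch "$root/kerblam.toml"
}

# Demo in a temporary directory, then list what was created.
demo_root="$(mktemp -d)"
scaffold_kerblam_layout "$demo_root"
find "$demo_root" | sort
```

To apply it to a real project, call the function with the project root instead, for example `scaffold_kerblam_layout .` from inside the project.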

Contributing

To contribute, please take a look at the contributing guide.

Code is not the only thing that you can contribute. Written a guide? Considered a new feature? Written some docstrings? Found a bug? All of these are meaningful and important contributions. For this reason, all contributors are listed in the contributing guide.

If you use Kerblam! or want to weigh in on the direction it is taking, take a look at the issues labelled with RFC. They are requests for comments where you can share your opinion on new features.

Thank you for taking an interest in Kerblam! Any help is really appreciated.

Licensing and citation

Kerblam! is licensed under the MIT License. If you wish to cite Kerblam!, please provide a link to this repository.

Naming

This project is named after the fictitious online shop/delivery company in S11E07 of Doctor Who. Kerblam! might be referred to as Kerblam!, Kerblam or Kerb!am, interchangeably, although Kerblam! is preferred. The Kerblam! logo is written in the Kwark Font by tup wanders.

Installation

You can find and download a Kerblam! binary for your operating system in the releases tab.

Currently, Kerblam! only supports macOS (both Intel and Apple chips) and GNU Linux. Other Linux versions may work, but you will have to install Kerblam! from source (see Install from source below).

Pre-compiled binary (recommended)

There are also helpful scripts that automatically download the correct version for your specific operating system thanks to cargo-dist. You can install the latest version with:

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/MrHedmad/kerblam/releases/latest/download/kerblam-installer.sh | sh

You can click here to download the same installer script and inspect it before you run it, if you'd like.

Install from source

If you want to install from source, install Rust and cargo, then run:

cargo install --git https://github.com/MrHedmad/kerblam.git

The main branch should always compile on supported platforms with the above command. If it does not, please open an issue.

Requirements

Kerblam! requires a Linux (or generally Unix-like) OS. It also relies on binaries that it assumes are already installed.

If you can use git, make, tar and docker from your CLI, you should be good.
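One way to check this up front is to ask the shell whether each tool is on your PATH. The sketch below uses the POSIX `command -v` builtin; the helper name `missing_tools` is hypothetical and not part of Kerblam!.

```shell
#!/bin/sh
# Sketch: report which of the tools named above are not on PATH.
# missing_tools is an illustrative helper, not a Kerblam! command.

missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing="$(missing_tools git make tar docker)"
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
else
    echo "All required tools found."
fi
```

Note that this only checks that the binaries exist; it does not verify, for instance, that the docker daemon is running or that your user may talk to it.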

Documentation

The Kerblam! documentation is in the /docs folder. Please take a look there for more information on what Kerblam! can do. For example, you might find the tutorial interesting.

Adding the Kerblam! badge

You can add a Kerblam! badge in the README of your project to show that you use Kerblam! Just copy the following code and add it to the README:

![Kerblam!](https://img.shields.io/badge/Kerblam!-v0.3.0-blue?logo=&link=https%3A%2F%2Fgithub.com%2FMrHedmad%2Fkerblam)

Warning

The code is very long - the kerblam! logo is baked in as a base64 image.


And remember! If you want it...

Kerblam it!
