CWL on Kubernetes
Calrissian is a CWL implementation designed to run inside a Kubernetes cluster. Its goal is to be highly efficient and scalable, taking advantage of high capacity clusters to run many steps in parallel.
Calrissian requires a Kubernetes or OpenShift/OKD cluster, configured to provision PersistentVolumes with the ReadWriteMany access mode. Kubernetes installers and cloud providers don't usually include this type of storage, so it may require additional configuration.
Calrissian has been tested with NFS using the nfs-client-provisioner and with GlusterFS using OKD Containerized GlusterFS. Many cloud providers have an NFS offering, which integrates easily using the nfs-client-provisioner.
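As a rough illustration of the storage requirement, the sketch below builds a PersistentVolumeClaim manifest that requests the ReadWriteMany access mode. The claim name, storage class name, and size are hypothetical placeholders, not values taken from Calrissian; use whatever your provisioner actually exposes.

```python
import json


def make_rwx_claim(name, storage_class, size="1Gi"):
    """Build a PersistentVolumeClaim manifest that requests ReadWriteMany access."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets pods on different nodes mount the volume read-write.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # "calrissian-wdir" and "nfs-client" are hypothetical names for illustration.
    print(json.dumps(make_rwx_claim("calrissian-wdir", "nfs-client"), indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, assuming a storage class with that name exists in the cluster.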
Calrissian is designed to issue tasks in parallel if they are independent, and thanks to Kubernetes, should be able to run very large parallel workloads.
When running calrissian, you must provide a limit on the number of CPU cores (--max-cores) and RAM megabytes (--max-ram) to use concurrently. Calrissian will use CWL ResourceRequirements to track usage and stay within the limits provided. We highly recommend using accurate ResourceRequirements in your workloads, so that they can be scheduled efficiently and are less likely to be terminated or refused by the cluster.
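The idea of tracking usage against fixed limits can be sketched as a simple budget object. This is only an illustration of the concept and not Calrissian's actual implementation; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResourceBudget:
    """Hypothetical tracker for concurrent CPU and RAM usage under fixed limits."""

    max_cores: float
    max_ram_mb: float
    used_cores: float = 0.0
    used_ram_mb: float = 0.0

    def can_fit(self, cores, ram_mb):
        """Return True if a step with these requirements fits right now."""
        return (
            self.used_cores + cores <= self.max_cores
            and self.used_ram_mb + ram_mb <= self.max_ram_mb
        )

    def reserve(self, cores, ram_mb):
        """Reserve resources for a step; return False if it has to wait."""
        if cores > self.max_cores or ram_mb > self.max_ram_mb:
            # The step asks for more than the total limit and could never run.
            raise ValueError("step requirements exceed the configured limits")
        if not self.can_fit(cores, ram_mb):
            return False
        self.used_cores += cores
        self.used_ram_mb += ram_mb
        return True

    def release(self, cores, ram_mb):
        """Give resources back once a step has finished."""
        self.used_cores -= cores
        self.used_ram_mb -= ram_mb


if __name__ == "__main__":
    budget = ResourceBudget(max_cores=4, max_ram_mb=8192)
    print(budget.reserve(2, 4096))  # first step fits
    print(budget.reserve(2, 4096))  # second step fills the budget
    print(budget.reserve(1, 1024))  # third step must wait
```

In this sketch, a step whose declared requirements are too low would be admitted while actually needing more, which is one reason accurate ResourceRequirements matter.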
calrissian parameters can be provided via a JSON configuration file either stored under ~/.calrissian/default.json or provided via the --conf option.
Below is an example of such a file:
{
  "max_ram": "16G",
  "max_cores": "10",
  "outdir": "/calrissian",
  "tmpdir_prefix": "/calrissian/tmp"
}
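To make the two configuration sources concrete, here is a minimal sketch of how such a file might be located and read. The function name and the precedence rule (an explicitly given path wins over the default file) are assumptions for illustration, not a description of Calrissian's internals.

```python
import json
import os
import tempfile
from pathlib import Path

# Default location named in the text above.
DEFAULT_CONF = os.path.expanduser("~/.calrissian/default.json")


def load_conf(conf_path=None, default_path=DEFAULT_CONF):
    """Load settings from conf_path if given, else from default_path if present.

    Assumption: an explicit path takes precedence over the default file.
    Returns an empty dict when neither source is available.
    """
    if conf_path is not None:
        with Path(conf_path).open() as handle:
            return json.load(handle)
    default = Path(default_path)
    if default.is_file():
        with default.open() as handle:
            return json.load(handle)
    return {}


if __name__ == "__main__":
    # Write a throwaway config file and read it back.
    with tempfile.TemporaryDirectory() as workdir:
        conf = Path(workdir) / "conf.json"
        conf.write_text(json.dumps({"max_ram": "16G", "max_cores": "10"}))
        print(load_conf(conf))
```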
Calrissian leverages cwltool heavily and passes most conformance tests for CWL v1.0. Please see conformance for further details and processes.
To view open issues related to conformance, see the conformance label on the issue tracker.
Please see examples for installation and setup instructions.
Calrissian's behaviors can be customized by setting the following environment variables in the container specification.
By default, pods for a job step will be deleted after termination.
CALRISSIAN_DELETE_PODS: Default true. If false, job step pods will not be deleted.
When encountering a Kubernetes API exception, Calrissian uses a library to retry API calls with an exponential backoff. See the tenacity documentation for details.
RETRY_MULTIPLIER: Default 5. Unit for multiplying the exponent interval.
RETRY_MIN: Default 5. Minimum interval between retries.
RETRY_MAX: Default 1200. Maximum interval between retries.
RETRY_ATTEMPTS: Default 10. Max number of retries before giving up.
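To get a feel for what these defaults imply, the sketch below computes an exponential backoff schedule from the four variables. It uses only the standard library and assumes the wait before retry n is RETRY_MULTIPLIER * 2^(n-1), clamped between RETRY_MIN and RETRY_MAX; the exact formula depends on the tenacity version in use, so treat the numbers as an approximation. The function names are hypothetical.

```python
import os


def retry_settings(env):
    """Read the retry variables from a mapping, falling back to the documented defaults."""
    return {
        "multiplier": float(env.get("RETRY_MULTIPLIER", 5)),
        "min": float(env.get("RETRY_MIN", 5)),
        "max": float(env.get("RETRY_MAX", 1200)),
        "attempts": int(env.get("RETRY_ATTEMPTS", 10)),
    }


def wait_schedule(settings):
    """Return the waits between consecutive attempts.

    Assumed formula: multiplier * 2 ** (attempt - 1), clamped to [min, max].
    """
    waits = []
    # With N attempts there are N - 1 waits in between.
    for attempt in range(1, settings["attempts"]):
        raw = settings["multiplier"] * 2 ** (attempt - 1)
        waits.append(max(settings["min"], min(raw, settings["max"])))
    return waits


if __name__ == "__main__":
    schedule = wait_schedule(retry_settings(os.environ))
    print(schedule)
    print("total wait:", sum(schedule))
```

Under this assumed formula and the default values, the waits double from 5 up to 640 and the last one is capped at 1200 by RETRY_MAX.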
Note that for development you can just use [Hatch] directly as described below.
The main tool that is used for development is [Hatch]. It manages dependencies (in a virtualenv that is created on the fly) and is also the command runner.
So first, [install it][install Hatch]. Ideally in an isolated way with pipx install hatch (after [installing pipx]), or just pip install hatch as a more well-known way.
hatch run test:test
Verbose:
hatch run test:testv
hatch run test:cov
hatch run calrissian
hatch run docs:serve