VDI Stack Environment Configuration

Environment Variables

The following is an index of all environment variables that may be used to configure the VDI service stack.

Examples

Minimal Example

This example is the minimal configuration necessary to spin up the service.

Important

This configuration does not include connection details for any application databases, nor any plugin handler service configuration.

DATASET_DIRECTORY_SOURCE_PATH=/var/www/Common/userDatasets/vdi_datasets_feat_s/
DATASET_DIRECTORY_TARGET_PATH=/datasets

AUTH_SECRET_KEY=
ADMIN_AUTH_TOKEN=
LDAP_SERVER=
ORACLE_BASE_DN=ou=applications,dc=apidb,dc=org

USER_DB_TNS_NAME=apicommn
USER_DB_USER=
USER_DB_PASS=
USER_DB_POOL_SIZE=5

GLOBAL_RABBIT_USERNAME=someUser
GLOBAL_RABBIT_PASSWORD=somePassword
GLOBAL_RABBIT_HOST=rabbit-external
GLOBAL_RABBIT_VDI_EXCHANGE_NAME=vdi-bucket-notifications
GLOBAL_RABBIT_VDI_QUEUE_NAME=vdi-bucket-notifications
GLOBAL_RABBIT_VDI_ROUTING_KEY=vdi-bucket-notifications

KAFKA_SERVERS=kafka:9092
KAFKA_PRODUCER_CLIENT_ID=vdi-event-router
KAFKA_CONSUMER_GROUP_ID=vdi-kafka-consumers
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092

S3_HOST=minio-external
S3_PORT=9000
S3_USE_HTTPS=true
S3_ACCESS_TOKEN=someToken
S3_SECRET_KEY=someSecretKey
S3_BUCKET_NAME=some-other-bucket

CACHE_DB_USERNAME=someUser
CACHE_DB_PASSWORD=somePassword
CACHE_DB_NAME=vdi
CACHE_DB_HOST=cache-db

SITE_BUILD=build-65

PLUGIN_HANDLER_NOOP_NAME=noop
PLUGIN_HANDLER_NOOP_DISPLAY_NAME="Example Plugin"
PLUGIN_HANDLER_NOOP_VERSION=1.0
PLUGIN_HANDLER_NOOP_ADDRESS=plugin-example:80

DB_CONNECTION_ENABLED_{SOME_PROJECT}=true
DB_CONNECTION_NAME_{SOME_PROJECT}=ProjectDB
DB_CONNECTION_LDAP_{SOME_PROJECT}=dbTnsName
DB_CONNECTION_PASS_{SOME_PROJECT}=someDBPass
DB_CONNECTION_DATA_SCHEMA_{SOME_PROJECT}=vdi_datasets_dev_n
DB_CONNECTION_CONTROL_SCHEMA_{SOME_PROJECT}=vdi_control_dev_n

Modules

REST Service

Name / Type / Description

SERVER_PORT

uint16

Port exposed and used by the VDI REST API service.

LDAP_SERVER

List<HostAddress>

LDAP server(s) to query when resolving database connection details by TNS name.

ORACLE_BASE_DN

String

Base distinguished name under which database TNS entries are looked up in the LDAP server(s).

AUTH_SECRET_KEY

String

Secret key value used to decode and validate WDK user tokens for user authentication.

ADMIN_AUTH_TOKEN

String

Auth token value used to authenticate requests to administration endpoints.

ENABLE_CORS

boolean

Enables cross-origin requests (used for development).

MAX_UPLOAD_FILE_SIZE

uint64

Max file size allowed for a single upload in bytes.

USER_UPLOAD_QUOTA

uint64

Quota cap for an individual user’s total uploads in bytes.
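
For example, to cap single uploads at 1 GiB and total per-user uploads at 10 GiB (illustrative values, not defaults):

# 1 GiB per file, 10 GiB total per user
MAX_UPLOAD_FILE_SIZE=1073741824
USER_UPLOAD_QUOTA=10737418240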

Hard Delete Trigger Handler

Name / Type / Description

HARD_DELETE_HANDLER_WORKER_POOL_SIZE

uint8

Number of workers to use while processing hard-delete events.

HARD_DELETE_HANDLER_WORK_QUEUE_SIZE

uint16

Maximum size the worker pool's job queue may reach before job submission blocks.

HARD_DELETE_HANDLER_KAFKA_CONSUMER_CLIENT_ID

String

Kafka client ID for the KafkaConsumer that will be used to receive messages from the VDI Kafka instance.

THIS VALUE MUST BE UNIQUE ACROSS ALL KAFKA CLIENT IDS

Import Trigger Handler

Name / Type / Description

IMPORT_HANDLER_WORKER_POOL_SIZE

uint8

Number of workers to use while processing import events.

IMPORT_HANDLER_WORK_QUEUE_SIZE

uint16

Maximum size the worker pool's job queue may reach before job submission blocks.

IMPORT_HANDLER_KAFKA_CONSUMER_CLIENT_ID

String

Kafka client ID for the KafkaConsumer that will be used to receive messages from the VDI Kafka instance.

THIS VALUE MUST BE UNIQUE ACROSS ALL KAFKA CLIENT IDS

Install Data Trigger Handler

Name / Type / Description

INSTALL_DATA_HANDLER_WORKER_POOL_SIZE

uint8

Number of workers to use while processing install-data events.

INSTALL_DATA_HANDLER_WORK_QUEUE_SIZE

uint16

Maximum size the worker pool's job queue may reach before job submission blocks.

INSTALL_DATA_HANDLER_KAFKA_CONSUMER_CLIENT_ID

String

Kafka client ID for the KafkaConsumer that will be used to receive messages from the VDI Kafka instance.

THIS VALUE MUST BE UNIQUE ACROSS ALL KAFKA CLIENT IDS

Pruner

Name / Type / Description

DATASET_PRUNING_DELETION_THRESHOLD

Duration

Age at which a soft-deleted dataset becomes a candidate for pruning from the VDI system.

DATASET_PRUNING_INTERVAL

Duration

Frequency at which the pruner will run automatically.

DATASET_PRUNING_WAKEUP_INTERVAL

Duration

Frequency at which the pruner module will wake up and check for a service shutdown signal.
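
A sketch of a pruner configuration; the values below are illustrative, not shipped defaults:

DATASET_PRUNING_DELETION_THRESHOLD=30d
DATASET_PRUNING_INTERVAL=6h
DATASET_PRUNING_WAKEUP_INTERVAL=5s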

Dataset Reconciler

Name / Type / Description

RECONCILER_FULL_ENABLED

boolean

Whether the full dataset reconciliation process is enabled.

RECONCILER_FULL_RUN_INTERVAL

Duration

Interval at which the full reconciliation process will run.

RECONCILER_SLIM_RUN_INTERVAL

Duration

Interval at which the slim reconciliation process will run.

RECONCILER_DELETES_ENABLED

boolean

Whether the reconciler should perform delete operations.
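
A sketch of a reconciler configuration; the values below are illustrative, not shipped defaults:

RECONCILER_FULL_ENABLED=true
RECONCILER_FULL_RUN_INTERVAL=6h
RECONCILER_SLIM_RUN_INTERVAL=5m
RECONCILER_DELETES_ENABLED=true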

Reconciliation Event Handler

Name / Type / Description

RECONCILIATION_HANDLER_WORKER_POOL_SIZE

uint8

Number of workers to use while processing reconciliation events.

RECONCILIATION_HANDLER_WORK_QUEUE_SIZE

uint16

Maximum size the worker pool's job queue may reach before job submission blocks.

RECONCILIATION_HANDLER_KAFKA_CONSUMER_CLIENT_ID

String

Kafka client ID for the KafkaConsumer that will be used to receive messages from the VDI Kafka instance.

THIS VALUE MUST BE UNIQUE ACROSS ALL KAFKA CLIENT IDS

Share Trigger Handler

Name / Type / Description

SHARE_HANDLER_WORKER_POOL_SIZE

uint8

Number of workers to use while processing share events.

SHARE_HANDLER_WORK_QUEUE_SIZE

uint16

Maximum size the worker pool's job queue may reach before job submission blocks.

SHARE_HANDLER_KAFKA_CONSUMER_CLIENT_ID

String

Kafka client ID for the KafkaConsumer that will be used to receive messages from the VDI Kafka instance.

THIS VALUE MUST BE UNIQUE ACROSS ALL KAFKA CLIENT IDS

Soft Delete Trigger Handler

Name / Type / Description

SOFT_DELETE_HANDLER_WORKER_POOL_SIZE

uint8

Number of workers to use while processing soft-delete events.

SOFT_DELETE_HANDLER_WORK_QUEUE_SIZE

uint16

Maximum size the worker pool's job queue may reach before job submission blocks.

SOFT_DELETE_HANDLER_KAFKA_CONSUMER_CLIENT_ID

String

Kafka client ID for the KafkaConsumer that will be used to receive messages from the VDI Kafka instance.

THIS VALUE MUST BE UNIQUE ACROSS ALL KAFKA CLIENT IDS

Update Meta Trigger Handler

Name / Type / Description

UPDATE_META_HANDLER_WORKER_POOL_SIZE

uint8

Number of workers to use while processing update-meta events.

UPDATE_META_HANDLER_WORK_QUEUE_SIZE

uint16

Maximum size the worker pool's job queue may reach before job submission blocks.

UPDATE_META_HANDLER_KAFKA_CONSUMER_CLIENT_ID

String

Kafka client ID for the KafkaConsumer that will be used to receive messages from the VDI Kafka instance.

THIS VALUE MUST BE UNIQUE ACROSS ALL KAFKA CLIENT IDS

Components

Cache DB

Name / Type / Description

CACHE_DB_HOST

String

Hostname of the cache db instance.

CACHE_DB_PORT

uint16

Port number for the cache db instance.

CACHE_DB_NAME

String

Name of the postgres database in the cache db instance to use.

CACHE_DB_USERNAME

String

Database credentials username.

CACHE_DB_PASSWORD

String

Database credentials password.

CACHE_DB_POOL_SIZE

uint8

Database connection pool size.

Kafka

Name / Type / Description

KAFKA_SERVERS

List<HostAddress>

Kafka server(s) to connect to for publishing and consuming message topics.

Consumer Client

Kafka consumer client tuning and configuration.

Name / Type / Description

KAFKA_CONSUMER_AUTO_COMMIT_INTERVAL

Duration

The frequency at which consumer offsets are auto-committed to Kafka when KAFKA_CONSUMER_ENABLE_AUTO_COMMIT is set to true.

KAFKA_CONSUMER_AUTO_OFFSET_RESET

"earliest"
"latest"
"none"

What to do when there is no initial offset in Kafka, or if the current offset does not exist anymore on the server.

  • earliest = Automatically reset the offset to the earliest offset.

  • latest = Automatically reset the offset to the latest offset.

  • none = Throw an exception if no previous offset is found for the consumer’s group.

KAFKA_CONSUMER_CONNECTIONS_MAX_IDLE

Duration

Close idle connections after this duration.

KAFKA_CONSUMER_DEFAULT_API_TIMEOUT

Duration

Specifies the timeout for client APIs. This configuration is used as the default timeout for all client operations that do not specify a timeout parameter.

KAFKA_CONSUMER_ENABLE_AUTO_COMMIT

boolean

If true, the consumer’s offset will be periodically committed in the background.
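
As the commit interval only applies when auto-commit is enabled, the two settings are typically configured together (illustrative values):

KAFKA_CONSUMER_ENABLE_AUTO_COMMIT=true
KAFKA_CONSUMER_AUTO_COMMIT_INTERVAL=5s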

KAFKA_CONSUMER_FETCH_MAX_BYTES

uint32

The maximum amount of data the server should return for a fetch request. Records are fetched in batches by the consumer, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that the consumer can make progress. As such, this is not an absolute maximum. Note that the consumer performs multiple fetches in parallel.

KAFKA_CONSUMER_FETCH_MIN_BYTES

uint32

The minimum amount of data the server should return for a fetch request. If insufficient data is available the request will wait for that much data to accumulate before answering the request. The default setting of 1 byte means that fetch requests are answered as soon as a single byte of data is available or the fetch request times out waiting for data to arrive. Setting this to something greater than 1 will cause the server to wait for larger amounts of data to accumulate which can improve server throughput a bit at the cost of some additional latency.

KAFKA_CONSUMER_GROUP_ID

String

A unique string that identifies the consumer group this consumer belongs to.

KAFKA_CONSUMER_GROUP_INSTANCE_ID

String

A unique identifier of the consumer instance provided by the end user. Only non-empty strings are permitted. If set, the consumer is treated as a static member, which means that only one instance with this ID is allowed in the consumer group at any time. This can be used in combination with a larger session timeout to avoid group rebalances caused by transient unavailability (e.g. process restarts). If not set, the consumer will join the group as a dynamic member, which is the traditional behavior.

KAFKA_CONSUMER_HEARTBEAT_INTERVAL

Duration

The expected time between heartbeats to the consumer coordinator when using Kafka’s group management facilities. Heartbeats are used to ensure that the consumer’s session stays active and to facilitate rebalancing when new consumers join or leave the group. The value must be set lower than KAFKA_CONSUMER_SESSION_TIMEOUT, but typically should be set no higher than 1/3 of that value. It can be adjusted even lower to control the expected time for normal rebalances.
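
For example, keeping the heartbeat interval at one third of the session timeout (illustrative values):

KAFKA_CONSUMER_SESSION_TIMEOUT=45s
KAFKA_CONSUMER_HEARTBEAT_INTERVAL=15s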

KAFKA_CONSUMER_MAX_POLL_INTERVAL

Duration

The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member. For consumers using a non-null KAFKA_CONSUMER_GROUP_INSTANCE_ID which reach this timeout, partitions will not be immediately reassigned. Instead, the consumer will stop sending heartbeats and partitions will be reassigned after expiration of KAFKA_CONSUMER_SESSION_TIMEOUT. This mirrors the behavior of a static consumer which has shutdown.

KAFKA_CONSUMER_MAX_POLL_RECORDS

uint32

The maximum number of records returned in a single call to poll(). Note that this value does not impact the underlying fetching behavior. The consumer will cache the records from each fetch request and return them incrementally from each poll.

KAFKA_CONSUMER_POLL_DURATION

Duration

The amount of time to block waiting for input.

KAFKA_CONSUMER_RECEIVE_BUFFER_SIZE_BYTES

uint32

The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is -1, the OS default will be used.

KAFKA_CONSUMER_RECONNECT_BACKOFF_MAX_TIME

Duration

The maximum amount of time to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms.

KAFKA_CONSUMER_RECONNECT_BACKOFF_TIME

Duration

The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker.

KAFKA_CONSUMER_REQUEST_TIMEOUT

Duration

The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted.

KAFKA_CONSUMER_RETRY_BACKOFF_TIME

Duration

The amount of time to wait before attempting to retry a failed request to a given topic partition. This avoids repeatedly sending requests in a tight loop under some failure scenarios.

KAFKA_CONSUMER_SEND_BUFFER_SIZE_BYTES

uint32

The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value is -1, the OS default will be used.

KAFKA_CONSUMER_SESSION_TIMEOUT

Duration

The timeout used to detect worker failures. The worker sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker before the expiration of this session timeout, then the broker will remove the worker from the group and initiate a rebalance. Note that the value must be in the allowable range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms.

Producer Client

Kafka message producer client tuning and configuration.

Name / Type / Description

KAFKA_PRODUCER_BATCH_SIZE

uint32

The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes.

No attempt will be made to batch records larger than this size.

Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent.

A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable batching entirely). A very large batch size may use memory a bit more wastefully as we will always allocate a buffer of the specified batch size in anticipation of additional records.

Note: This setting gives the upper bound of the batch size to be sent. If we have fewer than this many bytes accumulated for a partition, we will 'linger' for the KAFKA_PRODUCER_LINGER_TIME duration waiting for more records to show up. KAFKA_PRODUCER_LINGER_TIME defaults to 0, which means we'll immediately send out a record even if the accumulated batch size is under this KAFKA_PRODUCER_BATCH_SIZE setting.
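
For example, a modest batch size paired with a short linger window (illustrative values):

KAFKA_PRODUCER_BATCH_SIZE=16384
KAFKA_PRODUCER_LINGER_TIME=5ms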

KAFKA_PRODUCER_BUFFER_MEMORY_BYTES

uint32

The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server the producer will block for KAFKA_PRODUCER_MAX_BLOCKING_TIMEOUT after which it will throw an exception.

This setting should correspond roughly to the total memory the producer will use, but is not a hard bound since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.

KAFKA_PRODUCER_CLIENT_ID

String

An id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included in server-side request logging.

KAFKA_PRODUCER_COMPRESSION_TYPE

none
gzip
snappy
lz4
zstd

The compression type for all data generated by the producer. The default is none (i.e. no compression). Valid values are none, gzip, snappy, lz4, or zstd. Compression is of full batches of data, so the efficacy of batching will also impact the compression ratio (more batching means better compression).

KAFKA_PRODUCER_CONNECTIONS_MAX_IDLE

Duration

Close idle connections after this duration.

KAFKA_PRODUCER_DELIVERY_TIMEOUT

Duration

An upper bound on the time to report success or failure after a call to send() returns. This limits the total time that a record will be delayed prior to sending, the time to await acknowledgement from the broker (if expected), and the time allowed for retriable send failures. The producer may report failure to send a record earlier than this config if either an unrecoverable error is encountered, the retries have been exhausted, or the record is added to a batch which reached an earlier delivery expiration deadline. The value of this config should be greater than or equal to the sum of KAFKA_PRODUCER_REQUEST_TIMEOUT and KAFKA_PRODUCER_LINGER_TIME.
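
For example, a configuration satisfying the constraint above (illustrative values; 2m is comfortably greater than 30s + 5ms):

KAFKA_PRODUCER_REQUEST_TIMEOUT=30s
KAFKA_PRODUCER_LINGER_TIME=5ms
KAFKA_PRODUCER_DELIVERY_TIMEOUT=2m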

KAFKA_PRODUCER_LINGER_TIME

Duration

The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load when records arrive faster than they can be sent out. However, in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay; that is, rather than immediately sending out a record, the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together. This can be thought of as analogous to Nagle's algorithm in TCP. This setting gives the upper bound on the delay for batching: once we get KAFKA_PRODUCER_BATCH_SIZE worth of records for a partition it will be sent immediately regardless of this setting, however if we have fewer than this many bytes accumulated for this partition we will 'linger' for the specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay). Setting KAFKA_PRODUCER_LINGER_TIME=5ms, for example, would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absence of load.

KAFKA_PRODUCER_MAX_BLOCKING_TIMEOUT

Duration

The configuration controls how long the KafkaProducer's send(), partitionsFor(), initTransactions(), sendOffsetsToTransaction(), commitTransaction() and abortTransaction() methods will block. For send() this timeout bounds the total time waiting for both metadata fetch and buffer allocation (blocking in the user-supplied serializers or partitioner is not counted against this timeout). For partitionsFor() this timeout bounds the time spent waiting for metadata if it is unavailable. The transaction-related methods always block, but may time out if the transaction coordinator could not be discovered or did not respond within the timeout.

KAFKA_PRODUCER_MAX_REQUEST_SIZE_BYTES

uint32

The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum uncompressed record batch size. Note that the server has its own cap on the record batch size (after compression if compression is enabled) which may be different from this.

KAFKA_PRODUCER_RECEIVE_BUFFER_SIZE_BYTES

uint32

The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is -1, the OS default will be used.

KAFKA_PRODUCER_RECONNECT_BACKOFF_MAX_TIME

Duration

The maximum amount of time to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms.

KAFKA_PRODUCER_RECONNECT_BACKOFF_TIME

Duration

The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker.

KAFKA_PRODUCER_REQUEST_TIMEOUT

Duration

The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted. This should be larger than replica.lag.time.max.ms (a broker configuration) to reduce the possibility of message duplication due to unnecessary producer retries.

KAFKA_PRODUCER_RETRY_BACKOFF_TIME

Duration

The amount of time to wait before attempting to retry a failed request to a given topic partition. This avoids repeatedly sending requests in a tight loop under some failure scenarios.

KAFKA_PRODUCER_SEND_BUFFER_SIZE_BYTES

uint32

The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value is -1, the OS default will be used.

KAFKA_PRODUCER_SEND_RETRIES

uint32

Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error. Produce requests will be failed before the number of retries has been exhausted if the timeout configured by KAFKA_PRODUCER_DELIVERY_TIMEOUT expires first, before successful acknowledgement. Users should generally prefer to leave this config unset and instead use KAFKA_PRODUCER_DELIVERY_TIMEOUT to control retry behavior.

Enabling idempotence requires this config value to be greater than 0. If conflicting configurations are set and idempotence is not explicitly enabled, idempotence is disabled.

Trigger Topics

Names of the topics that various trigger events will be published to.

Name / Type / Description

KAFKA_TOPIC_HARD_DELETE_TRIGGERS

String

Name of the hard-delete trigger topic that messages will be routed to for object hard-delete events from MinIO.

A hard-delete event is the removal of a VDI dataset object in MinIO. Presently these events do not trigger any behavior in the VDI service.

KAFKA_TOPIC_IMPORT_TRIGGERS

String

Name of the import trigger topic that messages will be routed to for import events from MinIO.

An import event is the creation or overwriting of a user upload object in MinIO. These events will trigger a call to the plugin handler server to process the user upload to prepare it for installation.

KAFKA_TOPIC_INSTALL_TRIGGERS

String

Name of the install-data trigger topic that messages will be routed to for data installation triggers from MinIO.

An install-data event is the creation or overwriting of a VDI dataset data object in MinIO. These events will trigger a call to the plugin handler server to install the data that has just landed in MinIO.

KAFKA_TOPIC_SHARE_TRIGGERS

String

Name of the share trigger topic that messages will be routed to for share events from MinIO.

A share event is the creation or overwriting of a "share" object in MinIO. These events will trigger an update to the share/visibility configuration for the target dataset.

KAFKA_TOPIC_SOFT_DELETE_TRIGGERS

String

Name of the soft-delete trigger topic that messages will be routed to for soft-delete events from MinIO.

A soft-delete event is the creation or overwriting of a soft-delete flag object in MinIO. These events will trigger a call to the plugin handler server to uninstall the data from the target application databases.

KAFKA_TOPIC_UPDATE_META_TRIGGERS

String

Name of the update-meta trigger topic that messages will be routed to for metadata update events from MinIO.

An update-meta event is the creation or overwriting of the dataset metadata object in MinIO. These events will trigger a call to the plugin handler server to install or update the metadata for the dataset in the target application databases.

KAFKA_TOPIC_RECONCILIATION_TRIGGERS

String

Name of the reconciliation trigger topic that messages will be routed to for events fired by the dataset reconciler.
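
A sketch of a trigger topic configuration; the topic names below are hypothetical and need only be used consistently across the stack:

KAFKA_TOPIC_IMPORT_TRIGGERS=import-triggers
KAFKA_TOPIC_INSTALL_TRIGGERS=install-triggers
KAFKA_TOPIC_UPDATE_META_TRIGGERS=update-meta-triggers
KAFKA_TOPIC_SHARE_TRIGGERS=share-triggers
KAFKA_TOPIC_SOFT_DELETE_TRIGGERS=soft-delete-triggers
KAFKA_TOPIC_HARD_DELETE_TRIGGERS=hard-delete-triggers
KAFKA_TOPIC_RECONCILIATION_TRIGGERS=reconciliation-triggers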

Message Keys

Names of the message key values that events will be keyed on when published to the various Kafka topics. Event messages that are not keyed on the appropriate value will be ignored by the VDI service.

Name / Type / Description

KAFKA_MESSAGE_KEY_HARD_DELETE_TRIGGERS

String

Message key for hard-delete trigger events.

KAFKA_MESSAGE_KEY_IMPORT_TRIGGERS

String

Message key for import trigger events.

KAFKA_MESSAGE_KEY_INSTALL_TRIGGERS

String

Message key for install-data trigger events.

KAFKA_MESSAGE_KEY_SHARE_TRIGGERS

String

Message key for share trigger events.

KAFKA_MESSAGE_KEY_SOFT_DELETE_TRIGGERS

String

Message key for soft-delete trigger events.

KAFKA_MESSAGE_KEY_UPDATE_META_TRIGGERS

String

Message key for update-meta trigger events.

KAFKA_MESSAGE_KEY_RECONCILIATION_TRIGGERS

String

Message key for reconciliation trigger events.

Rabbit

Name / Type / Description

GLOBAL_RABBIT_CONNECTION_NAME

String

Optional name of the connection to the RabbitMQ service. This value will show in the RabbitMQ logs and in the management console to identify the VDI service’s connection.

GLOBAL_RABBIT_HOST

String

Hostname of the global RabbitMQ instance that the VDI service will connect to.

GLOBAL_RABBIT_PORT

uint16

Port to use when connecting to the global RabbitMQ instance.

GLOBAL_RABBIT_USERNAME

String

Credentials username used to authenticate with the global RabbitMQ instance.

GLOBAL_RABBIT_PASSWORD

String

Credentials password used to authenticate with the global RabbitMQ instance.

GLOBAL_RABBIT_VDI_POLLING_INTERVAL

Duration

Frequency at which the global RabbitMQ instance will be polled for new messages from MinIO.

GLOBAL_RABBIT_USE_TLS

boolean

Whether the connection to the target RabbitMQ instance should use TLS.

Defaults to false.

GLOBAL_RABBIT_CONNECTION_TIMEOUT

Duration

TCP connection timeout.

Exchange Config

Name / Type / Description

GLOBAL_RABBIT_VDI_EXCHANGE_NAME

String

Name of the target RabbitMQ exchange that will be declared by both the MinIO instance and the VDI service.

GLOBAL_RABBIT_VDI_EXCHANGE_TYPE

direct
fanout
topic
match

Exchange type as declared by the MinIO connection to the global RabbitMQ instance.

GLOBAL_RABBIT_VDI_EXCHANGE_AUTO_DELETE

boolean

Whether the exchange should be auto deleted when the connections from MinIO and the VDI service are closed.

GLOBAL_RABBIT_VDI_EXCHANGE_DURABLE

boolean

Whether the exchange should be durable (persisted to disk).

This value must align with the exchange configuration as set by MinIO.

GLOBAL_RABBIT_VDI_EXCHANGE_ARGUMENTS

Map<String, String>

Additional arguments to pass to the exchange declaration.

Queue Config

Name / Type / Description

GLOBAL_RABBIT_VDI_QUEUE_NAME

String

Name of the RabbitMQ queue to declare.

This value must align with the queue name as configured in MinIO.

GLOBAL_RABBIT_VDI_QUEUE_AUTO_DELETE

boolean

Whether the queue should be auto deleted when the connections from MinIO and the VDI service are closed.

GLOBAL_RABBIT_VDI_QUEUE_EXCLUSIVE

boolean

Whether the queue should be exclusive to the VDI service.

See: Exclusive Queues

GLOBAL_RABBIT_VDI_QUEUE_DURABLE

boolean

Whether the queue should be durable (persisted to disk).

This value must align with the queue configuration as set by MinIO.

GLOBAL_RABBIT_VDI_QUEUE_ARGUMENTS

Map<String, String>

Additional arguments to pass to the queue declaration.

Routing

Name / Type / Description

GLOBAL_RABBIT_VDI_ROUTING_KEY

String

Routing key used to bind the VDI queue to the exchange. This value must align with the routing key as configured in MinIO.

GLOBAL_RABBIT_VDI_ROUTING_ARGUMENTS

Map<String, String>

Additional arguments to pass to the queue binding declaration.

S3 (MinIO)

Name / Type / Description

S3_HOST

String

MinIO hostname.

S3_PORT

uint16

MinIO connection port.

S3_USE_HTTPS

boolean

Whether HTTPS should be used when connecting to the MinIO instance.

S3_BUCKET_NAME

String

Name of the MinIO bucket that will be used by the VDI service.

S3_ACCESS_TOKEN

String

MinIO username/access token to use when authenticating with the MinIO instance.

S3_SECRET_KEY

String

MinIO password/secret key to use when authenticating with the MinIO instance.

General Plugin Variables

Environment variables used by all plugins.

Name / Type / Description

SITE_BUILD

String

Site build number string (e.g. "build-65")

DATASET_INSTALL_ROOT

String

Mount path in the plugin containers for the dataset install directory tree.

Wildcard Environment Variables

Plugin Handler Environment Key Components

These variables register a VDI plugin with the service.

PLUGIN_HANDLER_<NAME>_NAME
PLUGIN_HANDLER_<NAME>_DISPLAY_NAME
PLUGIN_HANDLER_<NAME>_VERSION
PLUGIN_HANDLER_<NAME>_ADDRESS
PLUGIN_HANDLER_<NAME>_PROJECT_IDS
PLUGIN_HANDLER_<NAME>_CUSTOM_PATH
PLUGIN_HANDLER_<NAME>_SERVER_PORT
PLUGIN_HANDLER_<NAME>_SERVER_HOST
PLUGIN_HANDLER_<NAME>_IMPORT_SCRIPT_PATH
PLUGIN_HANDLER_<NAME>_IMPORT_SCRIPT_MAX_DURATION
PLUGIN_HANDLER_<NAME>_CHECK_COMPAT_SCRIPT_PATH
PLUGIN_HANDLER_<NAME>_CHECK_COMPAT_SCRIPT_MAX_DURATION
PLUGIN_HANDLER_<NAME>_INSTALL_DATA_SCRIPT_PATH
PLUGIN_HANDLER_<NAME>_INSTALL_DATA_SCRIPT_MAX_DURATION
PLUGIN_HANDLER_<NAME>_INSTALL_META_SCRIPT_PATH
PLUGIN_HANDLER_<NAME>_INSTALL_META_SCRIPT_MAX_DURATION
PLUGIN_HANDLER_<NAME>_UNINSTALL_SCRIPT_PATH
PLUGIN_HANDLER_<NAME>_UNINSTALL_SCRIPT_MAX_DURATION

Unlike most of the other environment keys defined here, these are wildcard keys: they may be specified with any arbitrary <NAME> value between the defined prefix and one of the defined suffixes.

The environment variables set using the prefix and suffixes defined below must appear in groups that contain, at minimum, the required suffixes. For example, given the <NAME> value "RNASEQ", the following environment variables must be present:

PLUGIN_HANDLER_RNASEQ_NAME
PLUGIN_HANDLER_RNASEQ_DISPLAY_NAME
PLUGIN_HANDLER_RNASEQ_VERSION
PLUGIN_HANDLER_RNASEQ_ADDRESS
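
Expanding such a group with some of the optional suffixes might look like the following; all values here are hypothetical:

PLUGIN_HANDLER_RNASEQ_NAME=rnaseq
PLUGIN_HANDLER_RNASEQ_DISPLAY_NAME="RNA-Seq"
PLUGIN_HANDLER_RNASEQ_VERSION=1.0
PLUGIN_HANDLER_RNASEQ_ADDRESS=plugin-rnaseq:80
PLUGIN_HANDLER_RNASEQ_PROJECT_IDS=PlasmoDB,ToxoDB
PLUGIN_HANDLER_RNASEQ_IMPORT_SCRIPT_MAX_DURATION=1h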

Name / Type / Description

PLUGIN_HANDLER_<NAME>_NAME

String

Name of the plugin handler. This will typically be the type name of the dataset type that the plugin handles.

PLUGIN_HANDLER_<NAME>_DISPLAY_NAME

String

Display name for the plugin handler. This will be shown to the end users as the type of their datasets.

PLUGIN_HANDLER_<NAME>_VERSION

String

Version for the plugin handler.

PLUGIN_HANDLER_<NAME>_ADDRESS

HostAddress

Address and port of the plugin handler service.

PLUGIN_HANDLER_<NAME>_PROJECT_IDS

List<String>

List of project IDs for which the plugin is relevant. If this value is omitted or set to a blank value, the plugin will be considered relevant to all projects.

PLUGIN_HANDLER_<NAME>_CUSTOM_PATH

String

Custom $PATH variable additions to pass to plugin scripts.

PLUGIN_HANDLER_<NAME>_SERVER_PORT

uint16

Port the plugin handler HTTP server will bind to.

PLUGIN_HANDLER_<NAME>_SERVER_HOST

String

Address the plugin handler HTTP server will bind to.

PLUGIN_HANDLER_<NAME>_IMPORT_SCRIPT_PATH

String

Path to the import script or binary in the plugin container.

PLUGIN_HANDLER_<NAME>_IMPORT_SCRIPT_MAX_DURATION

Duration

Max duration the import script will be permitted to run before being killed.

PLUGIN_HANDLER_<NAME>_CHECK_COMPAT_SCRIPT_PATH

String

Path to the compatibility check script or binary in the plugin container.

PLUGIN_HANDLER_<NAME>_CHECK_COMPAT_SCRIPT_MAX_DURATION

Duration

Max duration the compatibility check script will be permitted to run before being killed.

PLUGIN_HANDLER_<NAME>_INSTALL_DATA_SCRIPT_PATH

String

Path to the data install script or binary in the plugin container.

PLUGIN_HANDLER_<NAME>_INSTALL_DATA_SCRIPT_MAX_DURATION

Duration

Max duration the data install script will be permitted to run before being killed.

PLUGIN_HANDLER_<NAME>_INSTALL_META_SCRIPT_PATH

String

Path to the metadata install script or binary in the plugin container.

PLUGIN_HANDLER_<NAME>_INSTALL_META_SCRIPT_MAX_DURATION

Duration

Max duration the metadata install script will be permitted to run before being killed.

PLUGIN_HANDLER_<NAME>_UNINSTALL_SCRIPT_PATH

String

Path to the uninstall script or binary in the plugin container.

PLUGIN_HANDLER_<NAME>_UNINSTALL_SCRIPT_MAX_DURATION

Duration

Max duration the uninstall script will be permitted to run before being killed.

Application Database Key Components

DB_CONNECTION_ENABLED_<NAME>
DB_CONNECTION_NAME_<NAME>
DB_CONNECTION_USER_<NAME>
DB_CONNECTION_PASS_<NAME>
DB_CONNECTION_DATA_SCHEMA_<NAME>
DB_CONNECTION_CONTROL_SCHEMA_<NAME>
DB_CONNECTION_POOL_SIZE_<NAME>

# optional: restrict this connection to specific dataset (plugin) types
DB_CONNECTION_DATA_TYPES_<NAME>

# for LDAP
DB_CONNECTION_LDAP_<NAME>
# else, for manual connection
DB_CONNECTION_HOST_<NAME>
DB_CONNECTION_PORT_<NAME>
DB_CONNECTION_PLATFORM_<NAME>

Unlike most of the other environment keys defined here, these are wildcard keys: they may be specified with any arbitrary <NAME> value following one of the defined prefixes.

The environment variables set using the prefixes defined below must appear in groups that contain all of the required prefixes. For example, given the <NAME> value "PLASMO", the following environment variables must all be present:

DB_CONNECTION_ENABLED_PLASMO
DB_CONNECTION_NAME_PLASMO
DB_CONNECTION_LDAP_PLASMO
DB_CONNECTION_USER_PLASMO
DB_CONNECTION_PASS_PLASMO
DB_CONNECTION_DATA_SCHEMA_PLASMO
DB_CONNECTION_CONTROL_SCHEMA_PLASMO
DB_CONNECTION_POOL_SIZE_PLASMO

Database connection detail sets MUST provide either an LDAP lookup name for the database OR a host and port, as sketched below.
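
For example, the connection details for a hypothetical PLASMO detail set could be provided either way (all values, including the platform identifier, are hypothetical):

# Option 1: LDAP lookup
DB_CONNECTION_LDAP_PLASMO=plasmoDbTnsName

# Option 2: manual connection details
DB_CONNECTION_HOST_PLASMO=db.example.org
DB_CONNECTION_PORT_PLASMO=1521
DB_CONNECTION_PLATFORM_PLASMO=oracle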

Variable Definitions

Name / Type / Description

DB_CONNECTION_ENABLED_<NAME>

boolean

Whether the database connection should be enabled for use by VDI.

DB_CONNECTION_NAME_<NAME>

String

Name for the connection, typically the project ID or identifier for the application database.

DB_CONNECTION_LDAP_<NAME>

String

LDAP distinguished name for the database connection OrclNetDesc entry containing the connection details for the target database.

DB_CONNECTION_HOST_<NAME>

String

Host URI to use when providing manual database connection details.

WARNING: This variable will be ignored if DB_CONNECTION_LDAP is provided.

WARNING: This variable is required if DB_CONNECTION_LDAP is absent.

DB_CONNECTION_PORT_<NAME>

uint16

Port to use when providing manual database connection details.

WARNING: This variable will be ignored if DB_CONNECTION_LDAP is provided.

WARNING: This variable is required if DB_CONNECTION_LDAP is absent.

DB_CONNECTION_USER_<NAME>

String

Database credentials username.

DB_CONNECTION_PASS_<NAME>

String

Database credentials password.

DB_CONNECTION_DATA_SCHEMA_<NAME>

String

Database schema into which user dataset data is installed.

DB_CONNECTION_CONTROL_SCHEMA_<NAME>

String

Database schema into which the VDI control tables are installed.

DB_CONNECTION_POOL_SIZE_<NAME>

uint8

Connection pool size for the JDBC DataSource.

DB_CONNECTION_DATA_TYPES_<NAME>

List<String>

Dataset type names that align with plugins registered in the VDI environment configuration.

If provided, VDI will only use this connection for datasets whose type name matches an item in the given list.

If omitted, VDI will use this connection for all datasets whose type name does not match another DB connection with declared types.

One-to-Many Project-to-DB Configs

A single project may have multiple target databases registered provided that each connection has a unique (name, dataset type) pairing.

By default, if a database connection detail set does not contain a dataset type list via DB_CONNECTION_DATA_TYPES_<NAME>, it will be used as a fallback for all dataset types that do not match a configured connection.

If multiple database connection detail sets for the same project omit the data type restriction variable, or if multiple detail sets specify the same data type, VDI will refuse to start.

Example
# This DB connection is a fallback because it provides no data type list.
# It is NOT used for datasets of type foo and bar.
DB_CONNECTION_ENABLED_PLASMO_1=true
DB_CONNECTION_NAME_PLASMO_1=PlasmoDB
#DB_CONNECTION_DATA_TYPES_PLASMO_1='*'  # Wildcard is implied by omission

# This DB connection is only used for datasets of type foo and bar.
DB_CONNECTION_ENABLED_PLASMO_2=true
DB_CONNECTION_NAME_PLASMO_2=PlasmoDB
DB_CONNECTION_DATA_TYPES_PLASMO_2=foo,bar

Environment Variable Types

Duration

Durations are string representations of a time interval, written as one or more numeric values each followed by a shorthand notation of the time unit.

Time Unit Notations:

  ns = nanoseconds (e.g. 5ns)
  us = microseconds (e.g. 5us)
  ms = milliseconds (e.g. 5ms)
  s = seconds (e.g. 5s)
  m = minutes (e.g. 5m)
  h = hours (e.g. 5h)
  d = days (e.g. 5d)

Durations may also combine multiple values, such as 1d 12h or 1h 0m 30.340s.

Important

Only the last segment of a duration may have a fractional part.
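
Expressed as environment variables (illustrative values; depending on how the environment is provided, multi-segment values containing spaces may need quoting):

RECONCILER_SLIM_RUN_INTERVAL=5m
GLOBAL_RABBIT_CONNECTION_TIMEOUT="1m 30s"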

HostAddress

A HostAddress is a hostname and port pair in the form {host}:{port}, for example google.com:443.
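
For example, the plugin handler address from the minimal example above is a HostAddress, and KAFKA_SERVERS takes a list of them:

PLUGIN_HANDLER_NOOP_ADDRESS=plugin-example:80
KAFKA_SERVERS=kafka:9092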

List<T>

A list is a comma-separated set of values of any type that does not itself contain a comma; for example, a list may contain Durations or HostAddresses.

Example:

SOME_VARIABLE=item1,item2,item3

Map<K, V>

A map is a list of key/value pairs with the keys separated from values by a colon and the pairs separated by commas. Keys may only be simple types, and values may be of any type that does not contain a comma.

Example:

SOME_VARIABLE=key1:value,key2:value,key3:value