Update module readmes
jefflester committed Nov 15, 2024
1 parent 1c70279 commit 5608718
Showing 20 changed files with 255 additions and 168 deletions.
18 changes: 18 additions & 0 deletions src/lib/modules/admin/cache-service/readme.md
@@ -18,6 +18,10 @@
For troubleshooting, the bootstrap script enables debug logging for
with the cache service. The JMX dump tables can be queried in the `jmx.history`
schema.
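
For example, to list the JMX dump tables collected so far (a quick check from
the Trino CLI, assuming the `jmx` catalog is exposed as described above):

docker exec -it trino bash
trino-cli
trino> show tables from jmx.history;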

## Usage

minitrino --env STARBURST_VER=<ver> provision --module cache-service

## Table Scan Redirections (TSRs)

The `rules.json` file configures two tables for TSRs: `postgres.public.customer`
@@ -29,3 +33,17 @@
cache service operations as they occur.

An example MV is created in `hive_mv_tsr.mvs.example`. Any number of MVs can be
added to this catalog, and MVs can pull data from any data source.
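
As a rough sketch of what an additional MV could look like (the view name
`customer_mv` is made up here, and depending on the cache service configuration
additional `WITH` properties may be required):

create materialized view hive_mv_tsr.mvs.customer_mv as
    select * from postgres.public.customer;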

## Editing the `hive_mv_tsr.properties` File

This module mounts the `hive_mv_tsr.properties` file in a way that allows edits
to be made to the file inside the Trino container without modifying the source
file on the host. To edit the file, exec into the Trino container, make the
desired changes, and then restart the container for the changes to take effect:

docker exec -it trino bash
vi /etc/starburst/catalog/hive_mv_tsr.properties
exit

docker restart trino
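
To just view the file's current contents without opening an interactive shell:

docker exec trino cat /etc/starburst/catalog/hive_mv_tsr.properties
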
7 changes: 6 additions & 1 deletion src/lib/modules/admin/data-products/readme.md
@@ -1,6 +1,6 @@
# Data Products Module

A module which deploys the Starburst [data
A module which configures the [data
products](https://docs.starburst.io/latest/data-products.html) feature.

The `hive` and `insights` modules are dependencies of this module.
@@ -11,3 +11,8 @@
The `hive` and `insights` modules are dependencies of this module.
docker exec -it trino bash
trino-cli
trino> show schemas from backend_svc;

When configuring data product domains, use the following `s3a` path, which
points to a bucket auto-provisioned in the related MinIO container:

s3a://sample-bucket/<domain>
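
As an illustrative reachability check (the schema name and path below are made
up), a schema can be created in the dependent `hive` catalog under the same
bucket:

trino> create schema hive.data_products_test with (location = 's3a://sample-bucket/data-products-test/');
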
6 changes: 5 additions & 1 deletion src/lib/modules/admin/file-group-provider/readme.md
@@ -5,13 +5,17 @@
provider](https://docs.starburst.io/latest/security/group-file.html).

## Usage

# View group definitions
docker exec trino sh -c 'cat /etc/starburst/groups.txt'

# Get into the container and connect as a user tied to a group
minitrino --env STARBURST_VER=<ver> provision --module file-group-provider
docker exec -it trino bash
trino-cli --user admin
trino> show schemas from tpch;

You will need to supply a username to the Trino CLI in order to map to a group
(see `lib/modules/security/file-access-control/resources/trino/group.txt` for
(see `lib/modules/security/file-access-control/resources/trino/groups.txt` for
which users belong to which groups). Example:

trino-cli --user admin # maps to group sepadmins
18 changes: 7 additions & 11 deletions src/lib/modules/admin/insights/readme.md
@@ -4,24 +4,20 @@
This module configures and deploys the necessary components for
[Insights](https://docs.starburst.io/latest/insights/configuration.html)
features in the SEP web UI, including the required [backend
service](https://docs.starburst.io/latest/admin/backend-service.html) database
which persists the data to provide information needed for Insights features.

This module is a prerequisite for Built-in access control (BIAC).
which persists the data to provide information needed for the Insights UI.

## Usage

The backend service database can be queried directly, as it is exposed as a
catalog. For example:

minitrino --env STARBURST_VER=<ver> provision --module insights
docker exec -it trino bash
trino-cli
trino> show schemas from backend_svc;

## Accessing Insights Web UI

Open a web browser and go to [https://localhost:8080](https://localhost:8080)
and log in with a user that is authorized to access insights.

Note: `insights.authorized-*` properties cannot be used in conjunction with
SEP's built-in access control properties (`starburst.access-control`). If you
need to access Insights features in the UI without enabling BIAC, you will need
to uncomment the `insights.authorized-users=.*` property in the coordinator's
`/etc/starburst/config.properties` file.
Open a web browser, navigate to
[https://localhost:8080](https://localhost:8080), and log in with any user. The
Insights UI should be enabled.
5 changes: 4 additions & 1 deletion src/lib/modules/catalog/db2/readme.md
@@ -1,4 +1,7 @@
# Db2 Connector Module
# Db2 Catalog Module

**Note**: this module does not currently work on newer Macs with Apple silicon
(M-series) chips. I will look into fixing this.

This module provisions a standalone Db2 service. Note that the Db2 service can
take a long time to start (10-20+ minutes), so ensure you are viewing the Db2
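
To follow the startup progress from the host, tail the service container's logs
(the container name `db2` is an assumption here):

docker logs -f db2
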
52 changes: 40 additions & 12 deletions src/lib/modules/catalog/delta-lake/readme.md
@@ -1,14 +1,16 @@
# Delta-Lake Module
# Delta Lake Catalog Module

This module uses Minio as a local S3 service. You can write data to this service
and the files will be written to your machine. You can read more about Minio
[here](https://docs.min.io/docs/minio-docker-quickstart-guide.html). This module
also uses a Hive metastore (HMS) container along with a Postgres container for
the HMS's backend storage. The HMS image is based off of naushadh's repository
[here](https://github.com/naushadh/hive-metastore) (refer to his repository for
additional documentation on the HMS image and configuration options).
This module deploys the necessary components for a Delta Lake environment.

You can access the Minio UI at `http://localhost:9002` with `access-key` and
- **Object storage**: served via MinIO (`minio-delta-lake` container and
bootstrapped by `create-minio-delta-lake-buckets`)
- **Metastore**: served via a Hive metastore (`metastore-delta-lake` container
backed by `postgres-delta-lake` for storage)
- The HMS image is based off of naushadh's repository
[here](https://github.com/naushadh/hive-metastore) (refer to his repository
for additional documentation on the HMS image and configuration options)

The MinIO UI can be viewed at `http://localhost:9002` using `access-key` and
`secret-key` for credentials.

This module uses the Delta Lake connector. There is no Spark backend, so tables
@@ -21,7 +23,7 @@
need to be created via `CREATE TABLE AS ...` queries from Trino. Example:
AS SELECT * FROM tpch.tiny.customer;

This will create the table `delta.default.customer` and a corresponding
`_delta_log` directory in the backing MinIO object storage.
`_delta_log` directory in MinIO object storage.

## Usage

@@ -30,9 +32,9 @@
This will create the table `delta.default.customer` and a corresponding
trino-cli
trino> show schemas from delta;

## Cleanup
## Persistent Storage

This module uses named volumes to persist MinIO and HMS data:
This module uses named volumes to persist MinIO and metastore data:

volumes:
postgres-delta-lake-data:
@@ -44,6 +46,32 @@
This module uses named volumes to persist MinIO and HMS data:
- com.starburst.tests=minitrino
- com.starburst.tests.module.delta-lake=catalog-delta-lake

The user-facing implication is that the data in the Hive metastore and the data
files stored in MinIO are retained even after shutting down and/or removing the
environment's containers. Minitrino issues a warning about this whenever a
module with named volumes is deployed, so be sure to look out for these warnings:

[w] Module '<module>' has persistent volumes associated with it. To delete these volumes, remember to run `minitrino remove --volumes`.

To remove these volumes, run:

minitrino -v remove --volumes --label com.starburst.tests.module.delta-lake=catalog-delta-lake

Or, remove them directly using the Docker CLI:

docker volume rm minitrino_postgres-delta-lake-data \
minitrino_minio-delta-lake-data

## Editing the `delta.properties` File

This module mounts the `delta.properties` file in a way that allows edits to be
made to the file inside the Trino container without modifying the source file
on the host. To edit the file, exec into the Trino container, make the desired
changes, and then restart the container for the changes to take effect:

docker exec -it trino bash
vi /etc/starburst/catalog/delta.properties
exit

docker restart trino
15 changes: 11 additions & 4 deletions src/lib/modules/catalog/elasticsearch/readme.md
@@ -1,11 +1,13 @@
# Elasticsearch Connector Module
# Elasticsearch Catalog Module

This module contains an ES container with some preloaded data. It contains: a
schema (ES mapping), a table (ES doc mapping), and data (ES docs).
schema (ES mapping), a table (ES doc mapping), and 500 rows of fake data (ES
docs).

## Loading your own data
## Loading Data

Since port 9200 is exposed on localhost you can add your own data like this:
Elasticsearch is exposed on `localhost:9200`, so additional data can be loaded
as follows:

# Create user index
curl -XPUT http://localhost:9200/user?pretty=true;
@@ -41,6 +43,11 @@
Since port 9200 is exposed on localhost you can add your own data like this:
}
';

If scripting fake data is preferable, reference the bootstrap script leveraged
by this module, located at:

lib/modules/catalog/elasticsearch/resources/bootstrap/bootstrap-elasticsearch.sh
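
Once the module is provisioned, an index created this way should be queryable
from Trino (assuming the catalog is named `elasticsearch`; the connector maps
indexes to tables in its `default` schema):

trino> select * from elasticsearch.default.user;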

## Usage

minitrino --env STARBURST_VER=<ver> provision --module elasticsearch
53 changes: 40 additions & 13 deletions src/lib/modules/catalog/hive/readme.md
@@ -1,25 +1,27 @@
# Hive Module
# Hive Catalog Module

This module uses Minio as a local S3 service. You can write data to this service
and the files will be written to your machine. You can read more about Minio
[here](https://docs.min.io/docs/minio-docker-quickstart-guide.html). This module
also uses a Hive metastore (HMS) container along with a Postgres container for
the HMS's backend storage. The HMS image is based off of naushadh's repository
[here](https://github.com/naushadh/hive-metastore) (refer to his repository for
additional documentation on the HMS image and configuration options).
This module deploys the necessary components for a Hive environment.

You can access the Minio UI at `http://localhost:9001` with `access-key` and
- **Object storage**: served via MinIO (`minio` container and bootstrapped by
`create-minio-buckets`)
- **Metastore**: served via a Hive metastore (`metastore-hive` container backed
by `postgres-hive` for storage)
- The HMS image is based off of naushadh's repository
[here](https://github.com/naushadh/hive-metastore) (refer to his repository
for additional documentation on the HMS image and configuration options)

The MinIO UI can be viewed at `http://localhost:9001` using `access-key` and
`secret-key` for credentials.

You can create a table with ORC data with Trino very quickly:
Tables backed by ORC data files can be easily created:

trino> create schema hive.tiny with (location='s3a://sample-bucket/wh/tiny/');
CREATE SCHEMA

trino> create table hive.tiny.customer as select * from tpch.tiny.customer;
CREATE TABLE: 1500 rows

You will see the ORC data stored in your local Minio bucket.
The ORC data files can be viewed directly in the MinIO bucket via the MinIO UI.
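
The file locations can also be inspected without leaving Trino by querying the
Hive connector's hidden `$path` column:

trino> select "$path" from hive.tiny.customer limit 1;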

## Usage

@@ -28,9 +30,9 @@
You will see the ORC data stored in your local Minio bucket.
trino-cli
trino> show schemas from hive;

## Cleanup
## Persistent Storage

This module uses named volumes to persist MinIO and HMS data:
This module uses named volumes to persist MinIO and metastore data:

volumes:
postgres-hive-data:
@@ -42,6 +44,31 @@
This module uses named volumes to persist MinIO and HMS data:
- com.starburst.tests=minitrino
- com.starburst.tests.module.hive=catalog-hive

The user-facing implication is that the data in the Hive metastore and the data
files stored in MinIO are retained even after shutting down and/or removing the
environment's containers. Minitrino issues a warning about this whenever a
module with named volumes is deployed, so be sure to look out for these warnings:

[w] Module '<module>' has persistent volumes associated with it. To delete these volumes, remember to run `minitrino remove --volumes`.

To remove these volumes, run:

minitrino -v remove --volumes --label com.starburst.tests.module.hive=catalog-hive

Or, remove them directly using the Docker CLI:

docker volume rm minitrino_postgres-hive-data minitrino_minio-hive-data

## Editing the `hive.properties` File

This module mounts the `hive.properties` file in a way that allows edits to be
made to the file inside the Trino container without modifying the source file
on the host. To edit the file, exec into the Trino container, make the desired
changes, and then restart the container for the changes to take effect:

docker exec -it trino bash
vi /etc/starburst/catalog/hive.properties
exit

docker restart trino
41 changes: 37 additions & 4 deletions src/lib/modules/catalog/iceberg/readme.md
@@ -1,4 +1,4 @@
# Iceberg Module
# Iceberg Catalog Module

This module deploys infrastructure for an Iceberg catalog leveraging the Iceberg
REST catalog.
@@ -9,17 +9,50 @@
requests](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html)
## Usage

minitrino --env STARBURST_VER=<ver> provision --module iceberg
docker exec -it trino bash
trino-cli
trino> show schemas from iceberg;

## Cleanup
Create a schema and a table:

This module uses a named volume to persist MinIO data:
create schema iceberg.test with (location = 's3a://sample-bucket/wh/test');
create table iceberg.test.test_tbl as select * from tpch.tiny.customer;
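
Table metadata can then be inspected through the connector's hidden metadata
tables, for example:

select * from iceberg.test."test_tbl$snapshots";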

## Persistent Storage

This module uses named volumes to persist MinIO data:

volumes:
minio-iceberg-data:
labels:
- com.starburst.tests=minitrino
- com.starburst.tests.module.iceberg=catalog-iceberg

To remove this volume, run:
The user-facing implication is that the data files stored in MinIO are retained
even after shutting down and/or removing the environment's containers. Minitrino
issues a warning about this whenever a module with named volumes is deployed, so be
sure to look out for these warnings:

[w] Module '<module>' has persistent volumes associated with it. To delete these volumes, remember to run `minitrino remove --volumes`.

To remove these volumes, run:

minitrino -v remove --volumes --label com.starburst.tests.module.iceberg=catalog-iceberg

Or, remove them directly using the Docker CLI:

docker volume rm minitrino_minio-iceberg-data

## Editing the `iceberg.properties` File

This module mounts the `iceberg.properties` file in a way that allows edits to
be made to the file inside the Trino container without modifying the source
file on the host. To edit the file, exec into the Trino container, make the
desired changes, and then restart the container for the changes to take effect:

docker exec -it trino bash
vi /etc/starburst/catalog/iceberg.properties
exit

docker restart trino
27 changes: 11 additions & 16 deletions src/lib/modules/catalog/mariadb/readme.md
@@ -1,28 +1,23 @@
# MariaDB Connector Module
# MariaDB Catalog Module

This module provisions a standalone MariaDB service. By default, it is exposed
to the internal Docker network only via:

```yaml
ports:
- :3306
```
ports:
- :3306

To expose it at the host level, add a port to the left of the colon, e.g.:

```yaml
ports:
- 3307:3306
```
ports:
- 3307:3306

This will allow you to connect to the service from any SQL client that supports
MariaDB drivers on `localhost:3307`.
MariaDB drivers on `localhost:3307`. Note that a unique host port (`3307`) is
used here because the MySQL module already claims host port `3306`.
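
For example, a connection from the host might look like the following (the
client choice is illustrative, and the credentials are defined in the module's
Docker Compose file rather than shown here):

mysql --host 127.0.0.1 --port 3307 --user <user> --password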

## Usage

```sh
minitrino provision -m mariadb
docker exec -it trino bash
trino-cli
trino> show schemas from mariadb;
```
minitrino --env STARBURST_VER=<ver> provision --module mariadb
docker exec -it trino bash
trino-cli
trino> show schemas from mariadb;