Apache Iceberg Lakehouse with MinIO, Trino and Nessie

This project sets up a local environment for building an Apache Iceberg Lakehouse with MinIO, Trino and Nessie. It relies solely on open-source components and is intended for on-premise and development use.

Enjoy building your Apache Iceberg Lakehouse! 🚀


Table of Contents

  1. Tech Stack
  2. Prerequisites
  3. Setup and Installation
  4. Configuration
  5. Usage
  6. Acknowledgments

Tech Stack

  • Apache Iceberg: Open table format for large datasets.
  • MinIO: S3-compatible object storage.
  • Trino: Distributed SQL query engine.
  • Nessie: Git-like versioning for data lakes.
  • Docker: Containerization for easy setup and deployment.
  • Docker Compose: Orchestration of multi-container Docker applications.

Prerequisites

Before you begin, ensure you have Docker and Docker Compose installed on your system.


Setup and Installation

  1. Clone the Repository:

    git clone https://github.com/Jsarde/iceberg-lakehouse.git
    cd iceberg-lakehouse
  2. Start the Services:

    docker-compose up -d
  3. Verify the Services:

    Ensure all services are running correctly, for example by listing the containers with docker-compose ps.
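
A quick HTTP probe of each service is one way to confirm this. The following is a minimal sketch and not part of the repository: check_endpoint is a hypothetical helper, and the URLs in the usage comment assume that the ports listed under Configuration are published on localhost. /minio/health/live is MinIO's liveness probe and /config is a route of the Nessie v1 API.

```shell
#!/bin/sh
# check_endpoint LABEL URL
# Hypothetical helper: prints "OK <label>" when the URL answers with a
# successful HTTP status within 5 seconds, otherwise "FAIL <label>".
check_endpoint() {
    if curl -fs -o /dev/null --max-time 5 "$2" 2>/dev/null; then
        echo "OK $1"
    else
        echo "FAIL $1"
        return 1
    fi
}

# Usage, assuming the default ports from this README are published on localhost:
#   check_endpoint MinIO  http://localhost:9000/minio/health/live
#   check_endpoint Nessie http://localhost:19120/api/v1/config
```

The helper returns a non-zero status on failure, so it can also gate later steps in a script.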


Configuration

Apache Iceberg

  • File format: Parquet
  • Catalog type: nessie

MinIO

  • Access key: minioadmin
  • Secret key: minioadmin
  • Endpoint: http://localhost:9000
  • Bucket: warehouse (used as the default warehouse directory for Iceberg)

Trino

  • Catalog file: catalog/iceberg.properties
  • Nessie catalog URI: http://nessie:19120/api/v1
  • Default warehouse directory: s3a://warehouse/

Nessie

  • API endpoint: http://localhost:19120/api/v1
  • Default branch: main

Environment Variables

The following environment variables are used in the docker-compose.yml file:

| Variable | Description | Default Value |
| --- | --- | --- |
| MINIO_ROOT_USER | MinIO root username | minioadmin |
| MINIO_ROOT_PASSWORD | MinIO root password | minioadmin |
| NESSIE_URI | Nessie catalog URI | http://nessie:19120/api/v1 |
| S3_ENDPOINT | MinIO S3 endpoint | http://minio:9000 |
| S3_ACCESS_KEY | MinIO access key | minioadmin |
| S3_SECRET_KEY | MinIO secret key | minioadmin |
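
To illustrate how these values fit together, the sketch below renders a Trino catalog file from the variables in the table, falling back to the defaults listed there. It is an example under assumptions, not the repository's actual catalog/iceberg.properties: render_iceberg_catalog is a hypothetical helper, the iceberg.nessie-catalog.* and s3.* keys are the property names used by recent Trino releases with the native S3 file system (older releases use hive.s3.* keys instead), and the region value is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print a Trino Iceberg catalog configuration to stdout.
# Each ${VAR:-default} reads the environment and falls back to this README's
# default value. s3.region is an assumed placeholder, not taken from the README.
render_iceberg_catalog() {
    cat <<EOF
connector.name=iceberg
iceberg.catalog.type=nessie
iceberg.nessie-catalog.uri=${NESSIE_URI:-http://nessie:19120/api/v1}
iceberg.nessie-catalog.ref=main
iceberg.nessie-catalog.default-warehouse-dir=s3a://warehouse/
iceberg.file-format=PARQUET
fs.native-s3.enabled=true
s3.endpoint=${S3_ENDPOINT:-http://minio:9000}
s3.region=us-east-1
s3.path-style-access=true
s3.aws-access-key=${S3_ACCESS_KEY:-minioadmin}
s3.aws-secret-key=${S3_SECRET_KEY:-minioadmin}
EOF
}

# Usage: render_iceberg_catalog > catalog/iceberg.properties
```

Path-style access is enabled because MinIO is addressed by host name (http://minio:9000) rather than by per-bucket subdomains.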

Usage

Connect to the Trino container

docker exec -it <trino-container-id> trino

Show catalogs

SHOW CATALOGS;

Create a schema

CREATE SCHEMA iceberg.my_schema;

Show all the schemas in the catalog

SHOW SCHEMAS FROM iceberg;

Create a table

CREATE TABLE iceberg.my_schema.films (
    id bigint,
    name varchar,
    year integer,
    director varchar,
    rating double
) WITH (
    format = 'PARQUET'
);

Insert data into the table

INSERT INTO iceberg.my_schema.films (id, name, year, director, rating)
VALUES
(1, 'Schindler''s List', 1993, 'Steven Spielberg', 8.9),
(2, 'The Lord of the Rings: The Return of the King', 2003, 'Peter Jackson', 8.9),
(3, 'Pulp Fiction', 1994, 'Quentin Tarantino', 8.9),
(4, 'Fight Club', 1999, 'David Fincher', 8.8),
(5, 'Forrest Gump', 1994, 'Robert Zemeckis', 8.8),
(6, 'Inception', 2010, 'Christopher Nolan', 8.7),
(7, 'The Matrix', 1999, 'Lana Wachowski, Lilly Wachowski', 8.7),
(8, 'Star Wars: Episode V - The Empire Strikes Back', 1980, 'Irvin Kershner', 8.7),
(9, 'Interstellar', 2014, 'Christopher Nolan', 8.6);
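
Query the table

Once rows are inserted, the table can be queried from the host without opening an interactive session. The sketch below wraps the Trino CLI's --execute option in a hypothetical trino_query helper; the container identifier is the same placeholder used above, and the query is only an illustration.

```shell
#!/bin/sh
# trino_query CONTAINER SQL
# Hypothetical helper: run one SQL statement through the Trino CLI inside the
# given container and print the result to stdout.
trino_query() {
    docker exec "$1" trino --execute "$2"
}

# Usage (replace the placeholder with the actual container ID or name):
#   trino_query <trino-container-id> \
#     "SELECT name, year, rating FROM iceberg.my_schema.films WHERE rating >= 8.8 ORDER BY rating DESC"
```

Passing the statement as a single quoted argument keeps the SQL intact, including any spaces and commas.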

Acknowledgments

A big shoutout to DeepSeek for its awesome support and help in making this project happen!