diff --git a/website/docs/assets/fluss-quickstart-observability.zip b/website/docs/assets/fluss-quickstart-observability.zip
index 3ebcb67a3..8a1ecda7e 100644
Binary files a/website/docs/assets/fluss-quickstart-observability.zip and b/website/docs/assets/fluss-quickstart-observability.zip differ
diff --git a/website/docs/install-deploy/overview.md b/website/docs/install-deploy/overview.md
index 5bbf187db..6f03c63eb 100644
--- a/website/docs/install-deploy/overview.md
+++ b/website/docs/install-deploy/overview.md
@@ -124,8 +124,8 @@ We have listed them in the table below the figure.
CoordinatorServer/TabletServer report internal metrics and Fluss client (e.g., connector in Flink jobs) can report additional, client specific metrics as well.
- [JMX](/docs/maintenance/metric-reporters#jmx)
- [Prometheus](/docs/maintenance/metric-reporters#prometheus)
+ [JMX](/docs/maintenance/observability/metric-reporters#jmx)
+ [Prometheus](/docs/maintenance/observability/metric-reporters#prometheus)
|
diff --git a/website/docs/maintenance/observability/_category_.json b/website/docs/maintenance/observability/_category_.json
new file mode 100644
index 000000000..d6a5d582e
--- /dev/null
+++ b/website/docs/maintenance/observability/_category_.json
@@ -0,0 +1,4 @@
+{
+ "label": "Observability",
+ "position": 4
+}
diff --git a/website/docs/maintenance/logging.md b/website/docs/maintenance/observability/logging.md
similarity index 95%
rename from website/docs/maintenance/logging.md
rename to website/docs/maintenance/observability/logging.md
index 636075156..7676cc7dd 100644
--- a/website/docs/maintenance/logging.md
+++ b/website/docs/maintenance/observability/logging.md
@@ -1,6 +1,6 @@
---
sidebar_label: Logging
-sidebar_position: 6
+sidebar_position: 4
---
# Logging
@@ -21,7 +21,7 @@ Log4j periodically scans this file for changes and adjusts the logging behavior
### Log4j 2 configuration
-The following [logging-related configuration options](./configuration.md) are available:
+The following [logging-related configuration options](../configuration.md) are available:
| Configuration | Description | Default |
|---------------------------------|-------------------------------------------------------------------------|--------------------------------|
@@ -56,6 +56,10 @@ For Fluss distributions this means you have to:
* remove the `log4j-slf4j-impl` jar from the lib directory.
* add the `logback-core`, and `logback-classic` jars to the lib directory.
+:::info
+Fluss currently uses SLF4J 1.7.x, which is _incompatible_ with Logback 1.3.0 and higher.
+:::
+
The Fluss distribution ships with the following logback configuration files in the conf directory, which are used automatically if logback is enabled:
* `logback-console.xml`: used for CoordinatorServer/TabletServer if they are run in the foreground (e.g., Kubernetes).
* `logback.xml`: used for CoordinatorServer/TabletServer by default.
diff --git a/website/docs/maintenance/metric-reporters.md b/website/docs/maintenance/observability/metric-reporters.md
similarity index 99%
rename from website/docs/maintenance/metric-reporters.md
rename to website/docs/maintenance/observability/metric-reporters.md
index f4e796cbb..2e387adc2 100644
--- a/website/docs/maintenance/metric-reporters.md
+++ b/website/docs/maintenance/observability/metric-reporters.md
@@ -1,6 +1,6 @@
---
sidebar_label: Metric Reporters
-sidebar_position: 4
+sidebar_position: 2
---
# Metric Reporters
diff --git a/website/docs/maintenance/monitor-metrics.md b/website/docs/maintenance/observability/monitor-metrics.md
similarity index 81%
rename from website/docs/maintenance/monitor-metrics.md
rename to website/docs/maintenance/observability/monitor-metrics.md
index 795d02b2d..8087692bd 100644
--- a/website/docs/maintenance/monitor-metrics.md
+++ b/website/docs/maintenance/observability/monitor-metrics.md
@@ -1,6 +1,6 @@
---
sidebar_label: Monitor Metrics
-sidebar_position: 5
+sidebar_position: 3
---
# Monitor Metrics
@@ -710,175 +710,4 @@ How to use flink metrics, you can see [flink metrics](https://nightlies.apache.o
Meter |
-
-
-## Observability (Prometheus + Grafana)
-
-We provide a minimal quickstart configuration for application observability with Prometheus and
-Grafana [here](../assets/fluss-quickstart-observability.zip). The quickstart configuration comes with 2 dashboards.
-
-- `Fluss – overview`: Selected metrics to observe the overall cluster status
-- `Fluss – detail`: Majority of metrics listed in [metrics list](#metrics-list)
-
-
-### Quickstart
-
-Based on the [Flink quickstart guide](/docs/quickstart/flink), you can add observability capabilities as follows.
-
-1. Download the [observability quickstart configuration](../assets/fluss-quickstart-observability.zip) and extract the ZIP archive in your working directory.
-After extracting the archive, the contents of the working directory should be as follows.
-
-```
-├── docker-compose.yml # docker compose manifest from quickstart guide
-└── fluss-quickstart-observability # downloaded and extracted ZIP archive
- ├── grafana
- │ ├── grafana.ini
- │ └── provisioning
- │ ├── dashboards
- │ │ ├── default.yml
- │ │ └── fluss
- │ │ └── ...
- │ └── datatsources
- │ └── default.yml
- └── prometheus
- └── prometheus.yml
-```
-
-
-2. Next, you need to adapt the `docker-compose.yml` manifest and
-
-- add containers for Prometheus and Grafana and mount the corresponding configuration directories, and
-- configure Fluss to expose metrics via Prometheus
-```
-metrics.reporters: prometheus
-metrics.reporter.prometheus.port: 9250
-```
-- configure Flink to expose metrics via Prometheus
-```
-metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
-metrics.reporter.prom.port: 9250
-```
-
-You can simply copy the manifest below into your `docker-compose.yml`
-
-
-```yaml
-services:
- #begin Flink cluster
- coordinator-server:
- image: fluss/fluss:0.5.0
- command: coordinatorServer
- depends_on:
- - zookeeper
- environment:
- - |
- FLUSS_PROPERTIES=
- zookeeper.address: zookeeper:2181
- coordinator.host: coordinator-server
- remote.data.dir: /tmp/fluss/remote-data
- lakehouse.storage: paimon
- paimon.catalog.metastore: filesystem
- paimon.catalog.warehouse: /tmp/paimon
- metrics.reporters: prometheus
- metrics.reporter.prometheus.port: 9250
- tablet-server:
- image: fluss/fluss:0.5.0
- command: tabletServer
- depends_on:
- - coordinator-server
- environment:
- - |
- FLUSS_PROPERTIES=
- zookeeper.address: zookeeper:2181
- tablet-server.host: tablet-server
- data.dir: /tmp/fluss/data
- remote.data.dir: /tmp/fluss/remote-data
- kv.snapshot.interval: 0s
- lakehouse.storage: paimon
- paimon.catalog.metastore: filesystem
- paimon.catalog.warehouse: /tmp/paimon
- metrics.reporters: prometheus
- metrics.reporter.prometheus.port: 9250
- zookeeper:
- restart: always
- image: zookeeper:3.9.2
- #end
- #begin Flink cluster
- jobmanager:
- image: fluss/quickstart-flink:1.20-0.5
- ports:
- - "8083:8081"
- command: jobmanager
- environment:
- - |
- FLINK_PROPERTIES=
- jobmanager.rpc.address: jobmanager
- metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
- metrics.reporter.prom.port: 9250
- volumes:
- - shared-tmpfs:/tmp/paimon
- taskmanager:
- image: fluss/quickstart-flink:1.20-0.5
- depends_on:
- - jobmanager
- command: taskmanager
- environment:
- - |
- FLINK_PROPERTIES=
- jobmanager.rpc.address: jobmanager
- taskmanager.numberOfTaskSlots: 10
- taskmanager.memory.process.size: 2048m
- taskmanager.memory.framework.off-heap.size: 256m
- metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
- metrics.reporter.prom.port: 9250
- volumes:
- - shared-tmpfs:/tmp/paimon
- #end
- #begin observability
- prometheus:
- image: bitnami/prometheus:2.55.1-debian-12-r0
- ports:
- - 9092:9090
- volumes:
- - ./fluss-quickstart-observability/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- grafana:
- image:
- grafana/grafana:11.4.0
- ports:
- - 3002:3000
- depends_on:
- - prometheus
- volumes:
- - ./fluss-quickstart-observability/grafana:/etc/grafana:ro
- #end
-
-volumes:
- shared-tmpfs:
- driver: local
- driver_opts:
- type: "tmpfs"
- device: "tmpfs"
-```
-
-and run
-
-```shell
-docker compose up -d
-```
-
-to apply the changes.
-
-:::warning
-This recreates `shared-tmpfs` and all data is lost (created tables, running jobs, etc.)
-:::
-
-Make sure that the Prometheus and Grafana container are up and running using
-
-```shell
-docker ps
-```
-
-3. Now you are all set! You can visit
-
-- [Grafana](http://localhost:3002/dashboards) to observe the cluster status of the Fluss and Flink cluster with the provided dashboards, or
-- the [Prometheus Web UI](http://localhost:9092) to directly query Prometheus with [PromQL](https://prometheus.io/docs/prometheus/2.55/getting_started/).
\ No newline at end of file
+
\ No newline at end of file
diff --git a/website/docs/maintenance/observability/quickstart.md b/website/docs/maintenance/observability/quickstart.md
new file mode 100644
index 000000000..0952becb4
--- /dev/null
+++ b/website/docs/maintenance/observability/quickstart.md
@@ -0,0 +1,220 @@
+---
+sidebar_label: Quickstart Guides
+sidebar_position: 1
+---
+
+# Observability Quickstart Guides
+
+On this page, you can find the following guides to set up an observability stack based on the instructions in the [Flink quickstart guide](/docs/quickstart/flink):
+
+- [Observability with Prometheus, Loki and Grafana](#observability-with-prometheus-loki-and-grafana)
+
+## Observability with Prometheus, Loki and Grafana
+
+We provide a minimal quickstart configuration for application observability with Prometheus (metric aggregation system), Loki (log aggregation system), and Grafana (dashboard system)
+ [here](../../assets/fluss-quickstart-observability.zip).
+
+The quickstart configuration comes with 2 metric dashboards.
+
+- `Fluss – overview`: Selected metrics to observe the overall cluster status
+- `Fluss – detail`: Majority of metrics listed in [metrics list](./monitor-metrics.md#metrics-list)
+
+Follow the instructions below to add observability capabilities to your setup.
+
+1. Download the [observability quickstart configuration](../../assets/fluss-quickstart-observability.zip) and extract the ZIP archive in your working directory.
+After extracting the archive, the contents of the working directory should be as follows.
+
+```
+├── docker-compose.yml # docker compose manifest from quickstart guide
+└── fluss-quickstart-observability # downloaded and extracted ZIP archive
+ ├── grafana
+ │ ├── grafana.ini
+ │ └── provisioning
+ │ ├── dashboards
+ │ │ ├── default.yml
+ │ │ └── fluss
+ │ │ └── ...
+ │ └── datasources
+ │ └── default.yml
+ ├── prometheus
+ │ └── prometheus.yml
+ └── slf4j
+ └── ...
+```
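+
+The datasource provisioning file under `grafana/provisioning` tells Grafana where to find Prometheus and Loki. The file shipped in the archive may differ in its details; as a minimal sketch, assuming the service names and container ports used in the `docker-compose.yml` manifest of step 3, a Grafana datasource provisioning file for this setup could look like this. Note that Grafana reaches the services on their container ports (`9090`, `3100`), not on the host ports (`9092`, `3102`).
+
+```yaml
+# hypothetical sketch of a Grafana datasource provisioning file
+apiVersion: 1
+datasources:
+  - name: Prometheus
+    type: prometheus
+    access: proxy
+    # container port of the prometheus service
+    url: http://prometheus:9090
+    isDefault: true
+  - name: Loki
+    type: loki
+    access: proxy
+    # container port of the loki service
+    url: http://loki:3100
+```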
+
+2. Next, you need to configure Fluss to ship its logs to Loki. We will use [Loki4j](https://loki4j.github.io/loki-logback-appender/), a Logback appender, so Fluss has to use Logback as its logging backend.
+The Dockerfile below configures Fluss to use Logback and Loki4j. Save it to a file named `fluss-slf4j-logback.Dockerfile` in your working directory.
+
+```dockerfile
+# should be the same Fluss image as in the Flink quickstart guide
+FROM fluss/fluss:0.5.0
+
+ENV ENV_BASE_DIR=/opt/fluss
+
+# remove default logging backend from classpath and add logback to classpath
+RUN rm -rf ${ENV_BASE_DIR}/lib/log4j-slf4j-impl-*.jar && \
+ wget https://repo1.maven.org/maven2/ch/qos/logback/logback-classic/1.2.13/logback-classic-1.2.13.jar -P ${ENV_BASE_DIR}/lib/ && \
+ wget https://repo1.maven.org/maven2/ch/qos/logback/logback-core/1.2.13/logback-core-1.2.13.jar -P ${ENV_BASE_DIR}/lib/
+
+# add loki4j logback appender to classpath
+RUN wget https://repo1.maven.org/maven2/com/github/loki4j/loki-logback-appender/1.4.2/loki-logback-appender-1.4.2.jar -P ${ENV_BASE_DIR}/lib/
+
+# Logback configuration that ships logs to Loki
+COPY fluss-quickstart-observability/slf4j/logback-loki-console.xml ${ENV_BASE_DIR}/conf/logback-console.xml
+```
+
+:::note
+Detailed configuration instructions for Fluss and Logback can be found [here](./logging.md#configuring-logback).
+:::
+
+3. Additionally, you need to adapt the `docker-compose.yml` manifest to
+
+- add containers for Prometheus, Loki and Grafana and mount the corresponding configuration directories, and
+- build and use the new Fluss image defined in `fluss-slf4j-logback.Dockerfile`, configure Fluss to expose metrics via Prometheus
+```
+metrics.reporters: prometheus
+metrics.reporter.prometheus.port: 9250
+```
+and set the application name under which the logs of each server are displayed in Grafana via the `APP_NAME` environment variable.
+- configure Flink to expose metrics via Prometheus
+```
+metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
+metrics.reporter.prom.port: 9250
+```
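+
+Prometheus then scrapes the reporter port `9250` of each Fluss and Flink container. The `prometheus.yml` shipped in the archive already contains a scrape configuration, and the provided dashboards may rely on its job names, so keep using the shipped file. As a minimal sketch only, assuming the service names from the manifest below and hypothetical job names, such a scrape configuration could look like this:
+
+```yaml
+# hypothetical sketch of prometheus/prometheus.yml; job names are illustrative
+scrape_configs:
+  - job_name: fluss
+    static_configs:
+      # metrics.reporter.prometheus.port of the Fluss servers
+      - targets: ['coordinator-server:9250', 'tablet-server:9250']
+  - job_name: flink
+    static_configs:
+      # metrics.reporter.prom.port of the Flink processes
+      - targets: ['jobmanager:9250', 'taskmanager:9250']
+```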
+
+You can simply copy the manifest below into your `docker-compose.yml`
+
+
+```yaml
+services:
+ #begin Fluss cluster
+ coordinator-server:
+ image: fluss-slf4j-logback:0.5.0
+ build:
+ dockerfile: fluss-slf4j-logback.Dockerfile
+ command: coordinatorServer
+ depends_on:
+ - zookeeper
+ environment:
+ - |
+ FLUSS_PROPERTIES=
+ zookeeper.address: zookeeper:2181
+ coordinator.host: coordinator-server
+ remote.data.dir: /tmp/fluss/remote-data
+ lakehouse.storage: paimon
+ paimon.catalog.metastore: filesystem
+ paimon.catalog.warehouse: /tmp/paimon
+ metrics.reporters: prometheus
+ metrics.reporter.prometheus.port: 9250
+ - APP_NAME=coordinator-server
+ tablet-server:
+ image: fluss-slf4j-logback:0.5.0
+ build:
+ dockerfile: fluss-slf4j-logback.Dockerfile
+ command: tabletServer
+ depends_on:
+ - coordinator-server
+ environment:
+ - |
+ FLUSS_PROPERTIES=
+ zookeeper.address: zookeeper:2181
+ tablet-server.host: tablet-server
+ data.dir: /tmp/fluss/data
+ remote.data.dir: /tmp/fluss/remote-data
+ kv.snapshot.interval: 0s
+ lakehouse.storage: paimon
+ paimon.catalog.metastore: filesystem
+ paimon.catalog.warehouse: /tmp/paimon
+ metrics.reporters: prometheus
+ metrics.reporter.prometheus.port: 9250
+ logback.configurationFile: logback-loki-console.xml
+ - APP_NAME=tablet-server
+ zookeeper:
+ restart: always
+ image: zookeeper:3.9.2
+ #end
+ #begin Flink cluster
+ jobmanager:
+ image: fluss/quickstart-flink:1.20-0.5
+ ports:
+ - "8083:8081"
+ command: jobmanager
+ environment:
+ - |
+ FLINK_PROPERTIES=
+ jobmanager.rpc.address: jobmanager
+ metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
+ metrics.reporter.prom.port: 9250
+ volumes:
+ - shared-tmpfs:/tmp/paimon
+ taskmanager:
+ image: fluss/quickstart-flink:1.20-0.5
+ depends_on:
+ - jobmanager
+ command: taskmanager
+ environment:
+ - |
+ FLINK_PROPERTIES=
+ jobmanager.rpc.address: jobmanager
+ taskmanager.numberOfTaskSlots: 10
+ taskmanager.memory.process.size: 2048m
+ taskmanager.memory.framework.off-heap.size: 256m
+ metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
+ metrics.reporter.prom.port: 9250
+ volumes:
+ - shared-tmpfs:/tmp/paimon
+ #end
+ #begin observability
+ prometheus:
+ image: bitnami/prometheus:2.55.1-debian-12-r0
+ ports:
+ - "9092:9090"
+ volumes:
+ - ./fluss-quickstart-observability/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
+ loki:
+ image: grafana/loki:3.3.2
+ ports:
+ - "3102:3100"
+ grafana:
+ image:
+ grafana/grafana:11.4.0
+ ports:
+ - "3002:3000"
+ depends_on:
+ - prometheus
+ - loki
+ volumes:
+ - ./fluss-quickstart-observability/grafana:/etc/grafana:ro
+ #end
+
+volumes:
+ shared-tmpfs:
+ driver: local
+ driver_opts:
+ type: "tmpfs"
+ device: "tmpfs"
+```
+
+and run
+
+```shell
+# note the --build flag!
+docker compose up -d --build
+```
+
+to apply the changes.
+
+:::warning
+This recreates `shared-tmpfs`, so all data is lost (created tables, running jobs, etc.).
+:::
+
+Make sure that the modified and added containers are up and running using
+
+```shell
+docker ps
+```
+
+4. Now you are all set! You can visit
+
+- Grafana to view Fluss logs with the [log explorer](http://localhost:3002/a/grafana-lokiexplore-app/) and to observe metrics of the Fluss and Flink clusters with the [provided dashboards](http://localhost:3002/dashboards), or
+- the [Prometheus Web UI](http://localhost:9092) to directly query Prometheus with [PromQL](https://prometheus.io/docs/prometheus/2.55/getting_started/).
\ No newline at end of file
diff --git a/website/docs/quickstart/flink.md b/website/docs/quickstart/flink.md
index 94f62d0fc..711fc7be0 100644
--- a/website/docs/quickstart/flink.md
+++ b/website/docs/quickstart/flink.md
@@ -37,11 +37,11 @@ cd fluss-quickstart-flink
```yaml
services:
- #begin Flink cluster
+ #begin Fluss cluster
coordinator-server:
image: fluss/fluss:0.5.0
command: coordinatorServer
@@ -135,6 +135,7 @@ to check whether all containers are running properly.
You can also visit http://localhost:8083/ to see if Flink is running normally.
:::note
+- If you also want to use an observability stack, first follow one of the [observability quickstart guides](../maintenance/observability/quickstart.md) and then continue with this guide.
- If you want to run with your own Flink environment, remember to download the [fluss-connector-flink](/downloads), [flink-connector-faker](https://github.com/knaufk/flink-faker/releases), [paimon-flink](https://paimon.apache.org/docs/0.8/flink/quick-start/) connector jars and then put them to `FLINK_HOME/lib/`.
- All the following commands involving `docker compose` should be executed in the created working directory that contains the `docker-compose.yml` file.
:::
@@ -493,6 +494,4 @@ docker compose down -v
to stop all containers.
## Learn more
-Now that you're up an running with Fluss and Flink, check out
-- the [Apache Flink Engine](engine-flink/getting-started.md) docs to learn more features with Flink
-- [this guide](/docs/maintenance/monitor-metrics/#observability-prometheus--grafana) to learn how to set up an observability stack for Fluss and Flink.
\ No newline at end of file
+Now that you're up and running with Fluss and Flink, check out the [Apache Flink Engine](engine-flink/getting-started.md) docs to learn about more Flink features.
\ No newline at end of file