diff --git a/website/docs/assets/fluss-quickstart-observability.zip b/website/docs/assets/fluss-quickstart-observability.zip index 3ebcb67a3..8a1ecda7e 100644 Binary files a/website/docs/assets/fluss-quickstart-observability.zip and b/website/docs/assets/fluss-quickstart-observability.zip differ diff --git a/website/docs/install-deploy/overview.md b/website/docs/install-deploy/overview.md index 5bbf187db..6f03c63eb 100644 --- a/website/docs/install-deploy/overview.md +++ b/website/docs/install-deploy/overview.md @@ -124,8 +124,8 @@ We have listed them in the table below the figure. CoordinatorServer/TabletServer report internal metrics and Fluss client (e.g., connector in Flink jobs) can report additional, client specific metrics as well. -
  • [JMX](/docs/maintenance/metric-reporters#jmx)
  • -
  • [Prometheus](/docs/maintenance/metric-reporters#prometheus)
  • +
  • [JMX](/docs/maintenance/observability/metric-reporters#jmx)
  • +
  • [Prometheus](/docs/maintenance/observability/metric-reporters#prometheus)
  • diff --git a/website/docs/maintenance/observability/_category_.json b/website/docs/maintenance/observability/_category_.json new file mode 100644 index 000000000..d6a5d582e --- /dev/null +++ b/website/docs/maintenance/observability/_category_.json @@ -0,0 +1,4 @@ +{ + "label": "Observability", + "position": 4 +} diff --git a/website/docs/maintenance/logging.md b/website/docs/maintenance/observability/logging.md similarity index 95% rename from website/docs/maintenance/logging.md rename to website/docs/maintenance/observability/logging.md index 636075156..7676cc7dd 100644 --- a/website/docs/maintenance/logging.md +++ b/website/docs/maintenance/observability/logging.md @@ -1,6 +1,6 @@ --- sidebar_label: Logging -sidebar_position: 6 +sidebar_position: 4 --- # Logging @@ -21,7 +21,7 @@ Log4j periodically scans this file for changes and adjusts the logging behavior ### Log4j 2 configuration -The following [logging-related configuration options](./configuration.md) are available: +The following [logging-related configuration options](../configuration.md) are available: | Configuration | Description | Default | |---------------------------------|-------------------------------------------------------------------------|--------------------------------| @@ -56,6 +56,10 @@ For Fluss distributions this means you have to: * remove the `log4j-slf4j-impl` jar from the lib directory. * add the `logback-core`, and `logback-classic` jars to the lib directory. +:::info +Fluss currently uses SLF4J 1.7.x, which is _incompatible_ with logback 1.3.0 and higher. +::: + The Fluss distribution ships with the following logback configuration files in the conf directory, which are used automatically if logback is enabled: * `logback-console.xml`: used for CoordinatorServer/TabletServer if they are run in the foreground (e.g., Kubernetes). * `logback.xml`: used for CoordinatorServer/TabletServer by default. 
diff --git a/website/docs/maintenance/metric-reporters.md b/website/docs/maintenance/observability/metric-reporters.md similarity index 99% rename from website/docs/maintenance/metric-reporters.md rename to website/docs/maintenance/observability/metric-reporters.md index f4e796cbb..2e387adc2 100644 --- a/website/docs/maintenance/metric-reporters.md +++ b/website/docs/maintenance/observability/metric-reporters.md @@ -1,6 +1,6 @@ --- sidebar_label: Metric Reporters -sidebar_position: 4 +sidebar_position: 2 --- # Metric Reporters diff --git a/website/docs/maintenance/monitor-metrics.md b/website/docs/maintenance/observability/monitor-metrics.md similarity index 81% rename from website/docs/maintenance/monitor-metrics.md rename to website/docs/maintenance/observability/monitor-metrics.md index 795d02b2d..8087692bd 100644 --- a/website/docs/maintenance/monitor-metrics.md +++ b/website/docs/maintenance/observability/monitor-metrics.md @@ -1,6 +1,6 @@ --- sidebar_label: Monitor Metrics -sidebar_position: 5 +sidebar_position: 3 --- # Monitor Metrics @@ -710,175 +710,4 @@ How to use flink metrics, you can see [flink metrics](https://nightlies.apache.o Meter - - -## Observability (Prometheus + Grafana) - -We provide a minimal quickstart configuration for application observability with Prometheus and -Grafana [here](../assets/fluss-quickstart-observability.zip). The quickstart configuration comes with 2 dashboards. - -- `Fluss – overview`: Selected metrics to observe the overall cluster status -- `Fluss – detail`: Majority of metrics listed in [metrics list](#metrics-list) - - -### Quickstart - -Based on the [Flink quickstart guide](/docs/quickstart/flink), you can add observability capabilities as follows. - -1. Download the [observability quickstart configuration](../assets/fluss-quickstart-observability.zip) and extract the ZIP archive in your working directory. -After extracting the archive, the contents of the working directory should be as follows. 
- -``` -├── docker-compose.yml # docker compose manifest from quickstart guide -└── fluss-quickstart-observability # downloaded and extracted ZIP archive - ├── grafana - │ ├── grafana.ini - │ └── provisioning - │ ├── dashboards - │ │ ├── default.yml - │ │ └── fluss - │ │ └── ... - │ └── datatsources - │ └── default.yml - └── prometheus - └── prometheus.yml -``` - - -2. Next, you need to adapt the `docker-compose.yml` manifest and - -- add containers for Prometheus and Grafana and mount the corresponding configuration directories, and -- configure Fluss to expose metrics via Prometheus -``` -metrics.reporters: prometheus -metrics.reporter.prometheus.port: 9250 -``` -- configure Flink to expose metrics via Prometheus -``` -metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory -metrics.reporter.prom.port: 9250 -``` - -You can simply copy the manifest below into your `docker-compose.yml` - - -```yaml -services: - #begin Flink cluster - coordinator-server: - image: fluss/fluss:0.5.0 - command: coordinatorServer - depends_on: - - zookeeper - environment: - - | - FLUSS_PROPERTIES= - zookeeper.address: zookeeper:2181 - coordinator.host: coordinator-server - remote.data.dir: /tmp/fluss/remote-data - lakehouse.storage: paimon - paimon.catalog.metastore: filesystem - paimon.catalog.warehouse: /tmp/paimon - metrics.reporters: prometheus - metrics.reporter.prometheus.port: 9250 - tablet-server: - image: fluss/fluss:0.5.0 - command: tabletServer - depends_on: - - coordinator-server - environment: - - | - FLUSS_PROPERTIES= - zookeeper.address: zookeeper:2181 - tablet-server.host: tablet-server - data.dir: /tmp/fluss/data - remote.data.dir: /tmp/fluss/remote-data - kv.snapshot.interval: 0s - lakehouse.storage: paimon - paimon.catalog.metastore: filesystem - paimon.catalog.warehouse: /tmp/paimon - metrics.reporters: prometheus - metrics.reporter.prometheus.port: 9250 - zookeeper: - restart: always - image: zookeeper:3.9.2 - #end - #begin 
Flink cluster - jobmanager: - image: fluss/quickstart-flink:1.20-0.5 - ports: - - "8083:8081" - command: jobmanager - environment: - - | - FLINK_PROPERTIES= - jobmanager.rpc.address: jobmanager - metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory - metrics.reporter.prom.port: 9250 - volumes: - - shared-tmpfs:/tmp/paimon - taskmanager: - image: fluss/quickstart-flink:1.20-0.5 - depends_on: - - jobmanager - command: taskmanager - environment: - - | - FLINK_PROPERTIES= - jobmanager.rpc.address: jobmanager - taskmanager.numberOfTaskSlots: 10 - taskmanager.memory.process.size: 2048m - taskmanager.memory.framework.off-heap.size: 256m - metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory - metrics.reporter.prom.port: 9250 - volumes: - - shared-tmpfs:/tmp/paimon - #end - #begin observability - prometheus: - image: bitnami/prometheus:2.55.1-debian-12-r0 - ports: - - 9092:9090 - volumes: - - ./fluss-quickstart-observability/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro - grafana: - image: - grafana/grafana:11.4.0 - ports: - - 3002:3000 - depends_on: - - prometheus - volumes: - - ./fluss-quickstart-observability/grafana:/etc/grafana:ro - #end - -volumes: - shared-tmpfs: - driver: local - driver_opts: - type: "tmpfs" - device: "tmpfs" -``` - -and run - -```shell -docker compose up -d -``` - -to apply the changes. - -:::warning -This recreates `shared-tmpfs` and all data is lost (created tables, running jobs, etc.) -::: - -Make sure that the Prometheus and Grafana container are up and running using - -```shell -docker ps -``` - -3. Now you are all set! You can visit - -- [Grafana](http://localhost:3002/dashboards) to observe the cluster status of the Fluss and Flink cluster with the provided dashboards, or -- the [Prometheus Web UI](http://localhost:9092) to directly query Prometheus with [PromQL](https://prometheus.io/docs/prometheus/2.55/getting_started/). 
\ No newline at end of file + \ No newline at end of file diff --git a/website/docs/maintenance/observability/quickstart.md b/website/docs/maintenance/observability/quickstart.md new file mode 100644 index 000000000..0952becb4 --- /dev/null +++ b/website/docs/maintenance/observability/quickstart.md @@ -0,0 +1,220 @@ +--- +sidebar_label: Quickstart Guides +sidebar_position: 1 +--- + +# Observability Quickstart Guides + +On this page, you can find the following guides to set up an observability stack based on the instructions in the [Flink quickstart guide](/docs/quickstart/flink): + +- [Observability with Prometheus, Loki and Grafana](#observability-with-prometheus-loki-and-grafana) + +## Observability with Prometheus, Loki and Grafana + +We provide a minimal quickstart configuration for application observability with Prometheus (metric aggregation system), Loki (log aggregation system) and Grafana (dashboard system) + [here](../../assets/fluss-quickstart-observability.zip). + +The quickstart configuration comes with 2 metric dashboards. + +- `Fluss – overview`: Selected metrics to observe the overall cluster status +- `Fluss – detail`: Majority of metrics listed in [metrics list](./monitor-metrics.md#metrics-list) + +Follow the instructions below to add observability capabilities to your setup. + +1. Download the [observability quickstart configuration](../../assets/fluss-quickstart-observability.zip) and extract the ZIP archive in your working directory. +After extracting the archive, the contents of the working directory should be as follows. + +``` +├── docker-compose.yml # docker compose manifest from quickstart guide +└── fluss-quickstart-observability # downloaded and extracted ZIP archive + ├── grafana + │ ├── grafana.ini + │ └── provisioning + │ ├── dashboards + │ │ ├── default.yml + │ │ └── fluss + │ │ └── ... + │ └── datasources + │ └── default.yml + ├── prometheus + │ └── prometheus.yml + └── slf4j + └── ... +``` + +2. 
Next, you need to configure Fluss to expose logs to Loki. We will use [Loki4j](https://loki4j.github.io/loki-logback-appender/), which uses Logback as the logging backend. +The container manifest below configures Fluss to use Logback and Loki4j. Save it to a file named `fluss-slf4j-logback.Dockerfile` in your working directory. + +```dockerfile +# should be the same Fluss image as in the Flink quickstart guide +FROM fluss/fluss:0.5.0 + +ENV ENV_BASE_DIR /opt/fluss + +# remove default logging backend from classpath and add logback to classpath +RUN rm -rf ${ENV_BASE_DIR}/lib/log4j-slf4j-impl-*.jar && \ + wget https://repo1.maven.org/maven2/ch/qos/logback/logback-classic/1.2.13/logback-classic-1.2.13.jar -P ${ENV_BASE_DIR}/lib/ && \ + wget https://repo1.maven.org/maven2/ch/qos/logback/logback-core/1.2.13/logback-core-1.2.13.jar -P ${ENV_BASE_DIR}/lib/ + +# add loki4j logback appender to classpath +RUN wget https://repo1.maven.org/maven2/com/github/loki4j/loki-logback-appender/1.4.2/loki-logback-appender-1.4.2.jar -P ${ENV_BASE_DIR}/lib/ + +# logback configuration that exposes logs to loki +COPY fluss-quickstart-observability/slf4j/logback-loki-console.xml ${ENV_BASE_DIR}/conf/logback-console.xml +``` + +:::note +Detailed configuration instructions for Fluss and Logback can be found [here](./logging.md#configuring-logback). +::: + +3. Additionally, you need to adapt the `docker-compose.yml` manifest and + +- add containers for Prometheus, Loki and Grafana and mount the corresponding configuration directories, and +- build and use the new Fluss image manifest (`fluss-slf4j-logback.Dockerfile`), configure Fluss to expose metrics via Prometheus +``` +metrics.reporters: prometheus +metrics.reporter.prometheus.port: 9250 +``` +and add the desired application name that should be used when displaying logs in Grafana as an environment variable (`APP_NAME`). 
+- configure Flink to expose metrics via Prometheus +``` +metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory +metrics.reporter.prom.port: 9250 +``` + +You can simply copy the manifest below into your `docker-compose.yml` + + +```yaml +services: + #begin Fluss cluster + coordinator-server: + image: fluss-slf4j-logback:0.5.0 + build: + dockerfile: fluss-slf4j-logback.Dockerfile + command: coordinatorServer + depends_on: + - zookeeper + environment: + - | + FLUSS_PROPERTIES= + zookeeper.address: zookeeper:2181 + coordinator.host: coordinator-server + remote.data.dir: /tmp/fluss/remote-data + lakehouse.storage: paimon + paimon.catalog.metastore: filesystem + paimon.catalog.warehouse: /tmp/paimon + metrics.reporters: prometheus + metrics.reporter.prometheus.port: 9250 + - APP_NAME=coordinator-server + tablet-server: + image: fluss-slf4j-logback:0.5.0 + build: + dockerfile: fluss-slf4j-logback.Dockerfile + command: tabletServer + depends_on: + - coordinator-server + environment: + - | + FLUSS_PROPERTIES= + zookeeper.address: zookeeper:2181 + tablet-server.host: tablet-server + data.dir: /tmp/fluss/data + remote.data.dir: /tmp/fluss/remote-data + kv.snapshot.interval: 0s + lakehouse.storage: paimon + paimon.catalog.metastore: filesystem + paimon.catalog.warehouse: /tmp/paimon + metrics.reporters: prometheus + metrics.reporter.prometheus.port: 9250 + logback.configurationFile=logback-loki-console.xml + - APP_NAME=tablet-server + zookeeper: + restart: always + image: zookeeper:3.9.2 + #end + #begin Flink cluster + jobmanager: + image: fluss/quickstart-flink:1.20-0.5 + ports: + - "8083:8081" + command: jobmanager + environment: + - | + FLINK_PROPERTIES= + jobmanager.rpc.address: jobmanager + metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory + metrics.reporter.prom.port: 9250 + volumes: + - shared-tmpfs:/tmp/paimon + taskmanager: + image: fluss/quickstart-flink:1.20-0.5 + 
depends_on: + - jobmanager + command: taskmanager + environment: + - | + FLINK_PROPERTIES= + jobmanager.rpc.address: jobmanager + taskmanager.numberOfTaskSlots: 10 + taskmanager.memory.process.size: 2048m + taskmanager.memory.framework.off-heap.size: 256m + metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory + metrics.reporter.prom.port: 9250 + volumes: + - shared-tmpfs:/tmp/paimon + #end + #begin observability + prometheus: + image: bitnami/prometheus:2.55.1-debian-12-r0 + ports: + - "9092:9090" + volumes: + - ./fluss-quickstart-observability/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro + loki: + image: grafana/loki:3.3.2 + ports: + - "3102:3100" + grafana: + image: + grafana/grafana:11.4.0 + ports: + - "3002:3000" + depends_on: + - prometheus + - loki + volumes: + - ./fluss-quickstart-observability/grafana:/etc/grafana:ro + #end + +volumes: + shared-tmpfs: + driver: local + driver_opts: + type: "tmpfs" + device: "tmpfs" +``` + +and run + +```shell +# note the --build flag! +docker compose up -d --build +``` + +to apply the changes. + +:::warning +This recreates `shared-tmpfs` and all data is lost (created tables, running jobs, etc.) +::: + +Make sure that the modified and added containers are up and running using + +```shell +docker ps +``` + +4. Now you are all set! You can visit + +- Grafana to view Fluss logs with the [log explorer](http://localhost:3002/a/grafana-lokiexplore-app/) and observe metrics of the Fluss and Flink cluster with the [provided dashboards](http://localhost:3002/dashboards) +- the [Prometheus Web UI](http://localhost:9092) to directly query Prometheus with [PromQL](https://prometheus.io/docs/prometheus/2.55/getting_started/). 
\ No newline at end of file diff --git a/website/docs/quickstart/flink.md b/website/docs/quickstart/flink.md index 94f62d0fc..711fc7be0 100644 --- a/website/docs/quickstart/flink.md +++ b/website/docs/quickstart/flink.md @@ -37,11 +37,11 @@ cd fluss-quickstart-flink ```yaml services: - #begin Flink cluster + #begin Fluss cluster coordinator-server: image: fluss/fluss:0.5.0 command: coordinatorServer @@ -135,6 +135,7 @@ to check whether all containers are running properly. You can also visit http://localhost:8083/ to see if Flink is running normally. :::note +- If you want to additionally use an observability stack, follow one of the provided quickstart guides [here](../maintenance/observability/quickstart.md) and then continue with this guide. - If you want to run with your own Flink environment, remember to download the [fluss-connector-flink](/downloads), [flink-connector-faker](https://github.com/knaufk/flink-faker/releases), [paimon-flink](https://paimon.apache.org/docs/0.8/flink/quick-start/) connector jars and then put them to `FLINK_HOME/lib/`. - All the following commands involving `docker compose` should be executed in the created working directory that contains the `docker-compose.yml` file. ::: @@ -493,6 +494,4 @@ docker compose down -v to stop all containers. ## Learn more -Now that you're up an running with Fluss and Flink, check out -- the [Apache Flink Engine](engine-flink/getting-started.md) docs to learn more features with Flink -- [this guide](/docs/maintenance/monitor-metrics/#observability-prometheus--grafana) to learn how to set up an observability stack for Fluss and Flink. \ No newline at end of file +Now that you're up and running with Fluss and Flink, check out the [Apache Flink Engine](engine-flink/getting-started.md) docs to learn more features with Flink. \ No newline at end of file
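For orientation: the `prometheus.yml` mounted into the Prometheus container in the new quickstart has to scrape the reporter port `9250` that the compose manifest configures on each Fluss and Flink container. The following is only a minimal sketch of such a scrape configuration. The job names and the scrape interval are assumptions made for illustration, and the file shipped in the ZIP archive may differ.

```yaml
# Hypothetical prometheus.yml sketch; not taken from the quickstart archive.
global:
  scrape_interval: 15s          # assumed value

scrape_configs:
  - job_name: fluss             # hypothetical job name
    static_configs:
      - targets:
          # compose service names resolve inside the compose network
          - coordinator-server:9250
          - tablet-server:9250
  - job_name: flink             # hypothetical job name
    static_configs:
      - targets:
          - jobmanager:9250
          - taskmanager:9250
```

The targets use the compose service names and the port from `metrics.reporter.prometheus.port` and `metrics.reporter.prom.port`, so they only resolve from within the compose network.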