From 56ef4a99b93924580656a3f4fd970f8c88fb64a0 Mon Sep 17 00:00:00 2001
From: TianKai Ma
Date: Wed, 15 Jan 2025 15:33:12 +0800
Subject: [PATCH] feat: add tutorial documentation for monitoring setup and update navigation

---
 docs/lab/{ => admin}/checklist.md |  0
 docs/lab/{ => admin}/network.md   |  0
 docs/lab/{srv => }/img/1.png      | Bin
 docs/lab/srv/8x4090.md            | 32 +-----------------------------
 docs/lab/tutorial.md              | 31 +++++++++++++++++++++++++++++
 mkdocs.yml                        |  6 ++++--
 6 files changed, 36 insertions(+), 33 deletions(-)
 rename docs/lab/{ => admin}/checklist.md (100%)
 rename docs/lab/{ => admin}/network.md (100%)
 rename docs/lab/{srv => }/img/1.png (100%)
 create mode 100644 docs/lab/tutorial.md

diff --git a/docs/lab/checklist.md b/docs/lab/admin/checklist.md
similarity index 100%
rename from docs/lab/checklist.md
rename to docs/lab/admin/checklist.md
diff --git a/docs/lab/network.md b/docs/lab/admin/network.md
similarity index 100%
rename from docs/lab/network.md
rename to docs/lab/admin/network.md
diff --git a/docs/lab/srv/img/1.png b/docs/lab/img/1.png
similarity index 100%
rename from docs/lab/srv/img/1.png
rename to docs/lab/img/1.png
diff --git a/docs/lab/srv/8x4090.md b/docs/lab/srv/8x4090.md
index e253c2e..6a2bb69 100644
--- a/docs/lab/srv/8x4090.md
+++ b/docs/lab/srv/8x4090.md
@@ -4,34 +4,4 @@
 
 I have managed this server since it was first racked, so the notes here are more complete.
 
-Racking checklist -> [docs/lab/checklist.md](/lab/checklist)
-
-## Monitoring
-
-
-
-Monitoring is set up with Prometheus + Grafana:
-
-- CPU, memory, disk, and network traffic: `node-exporter`
-- GPU monitoring: `dcgm-exporter`
-- Monitoring backend: `prometheus`
-- Visualization: `grafana`
-
-Configuration file reference:
-
-!!! note ""
-
-    + Grafana has anonymous access enabled, so the monitoring data can be viewed directly without logging in; access is read-only.
-    + Another machine in the same server room, `8xa6000`, uses a similar deployment but relies on this machine's Grafana for visualization; the data source can be switched in the setting shown below.
-
-    ![Switching the Prometheus data source](./img/1.png){width=400}
-
-## Network notes
-
-+ Set the proxy with the following commands:
-
-    ```bash
-    export http_proxy="http://192.168.50.1:7890";
-    export https_proxy=$http_proxy;
-    export no_proxy="localhost, 127.0.0.1, ::1"
-    ```
\ No newline at end of file
+Racking checklist -> [docs/lab/checklist.md](/lab/checklist)
\ No newline at end of file
diff --git a/docs/lab/tutorial.md b/docs/lab/tutorial.md
new file mode 100644
index 0000000..5c297aa
--- /dev/null
+++ b/docs/lab/tutorial.md
@@ -0,0 +1,31 @@
+# User guide
+
+## Monitoring
+
+
+
+Monitoring is set up with Prometheus + Grafana:
+
+- CPU, memory, disk, and network traffic: `node-exporter`
+- GPU monitoring: `dcgm-exporter`
+- Monitoring backend: `prometheus`
+- Visualization: `grafana`
+
+Configuration file reference:
+
+!!! note ""
+
+    + Grafana has anonymous access enabled, so the monitoring data can be viewed directly without logging in; access is read-only.
+    + Another machine in the same server room, `8xa6000`, uses a similar deployment but relies on this machine's Grafana for visualization; the data source can be switched in the setting shown below.
+
+    ![Switching the Prometheus data source](./img/1.png){width=400}
+
+## Network notes
+
++ Set the proxy with the following commands:
+
+    ```bash
+    export http_proxy="http://192.168.50.1:7890";
+    export https_proxy=$http_proxy;
+    export no_proxy="localhost, 127.0.0.1, ::1"
+    ```
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 2a9ff33..e4e6ceb 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -71,8 +71,10 @@ nav:
       - 3D Gaussian Splatting: ml/dl/gs/3d-gs.md
   - Lab:
     - lab/index.md
-    - lab/checklist.md
-    - lab/network.md
+    - lab/tutorial.md
+    - Admin:
+      - lab/admin/checklist.md
+      - lab/admin/network.md
     - Servers:
       - lab/srv/8x4090.md
       - lab/srv/8xa6000.md
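
Reviewer note on the proxy snippet in `docs/lab/tutorial.md`: the three exports could be wrapped in a pair of shell helpers so the proxy is easy to turn on and off within a session. This is only a sketch. The function names `proxy_on` and `proxy_off` are hypothetical and not part of this patch; the proxy address and the `no_proxy` value are copied from the tutorial as-is.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the proxy exports from docs/lab/tutorial.md in two helpers.
# proxy_on / proxy_off are hypothetical names, not part of the repository.

proxy_on() {
    # Default to the proxy address documented in the tutorial;
    # an optional first argument overrides it.
    local url="${1:-http://192.168.50.1:7890}"
    export http_proxy="$url"
    export https_proxy="$http_proxy"
    export no_proxy="localhost, 127.0.0.1, ::1"
}

proxy_off() {
    # Remove the proxy variables from the current shell session.
    unset http_proxy https_proxy no_proxy
}
```

After sourcing the file, `proxy_on` applies the documented settings and `proxy_off` removes them again; neither affects other shells or persists across logins.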