(WIP) improve README
jhoblitt committed May 1, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
1 parent bf75271 commit 687cb41
Showing 1 changed file with 88 additions and 21 deletions.
109 changes: 88 additions & 21 deletions README.md
@@ -1,25 +1,73 @@
# gnocpush

## Development

```bash
virtualenv venv
. venv/bin/activate
pip install --editable .
A simple service to accept webhook payloads from [Prometheus Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) and to push those alerts on to [GlobalNOC's Alertmon](https://alertmon-stage.grnoc.iu.edu/alertmon2/).

## Alert format

`gnocpush` expects alerts to carry labels that match GlobalNOC's required parameter names.

### Required labels

* `node_name` - The name of the node that the alert is associated with.
* `service_name` - The name of the service that the alert is associated with.
* `severity` - The severity of the alert. One of: `Critical`, `Major`, `Minor`, `Unknown`, `OK`
* `description` - A description of the alert.

### Optional labels

* `device` - The subcomponent of the node that is alarming.
* `start_time` - The time that the alert started.

### Example PrometheusRule

```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    lsst.io/rule: "true"
  name: net
spec:
  groups:
    - name: net.rules
      rules:
        - alert: lhn_interface_up
          annotations:
            description: '{{ $labels.instance }} - {{ $labels.ifName }}|{{ $labels.ifAlias }} is down'
          expr: ifOperStatus{ifAlias=~".*LHN.*"} != 1
          for: 30s
          labels:
            severity: critical
            node_name: '{{ $labels.instance }}'
            device: '{{ $labels.ifName }}'
            service_name: ifInErrors-{{ $labels.ifName }}
            gnoc: "true"
```
```bash
docker build -t lsstit/gnocpush .
docker push lsstit/gnocpush
## Alertmanager Configuration
Note that `gnocpush` does not impose any alert grouping constraints.

```yaml
config:
  routes:
    - receiver: gnocpush
      continue: true
      repeat_interval: 30s
      group_interval: 30s
      group_wait: 30s
      group_by:
        - gnoc
      matchers:
        - gnoc = "true"
  receivers:
    - name: gnocpush
      webhook_configs:
        - url: http://gnocpush.gnocpush:8080/alerts
```
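
Alerts matched by this route still need the labels listed under "Alert format". The following is a minimal Python sketch of such a check, not gnocpush's actual implementation. The function name `label_problems` and the sample values (`router1`, `eth0`) are hypothetical. It also makes two assumptions: `description` may arrive as an annotation (as in the example PrometheusRule), and severity is compared case-insensitively (the example rule uses `critical` while the list above uses `Critical`).

```python
REQUIRED = ("node_name", "service_name", "severity", "description")
SEVERITIES = {"critical", "major", "minor", "unknown", "ok"}


def label_problems(alert):
    """Return a list of human-readable problems for one Alertmanager alert dict."""
    # Assumption: look in annotations as well as labels, because the example
    # rule supplies `description` as an annotation. Labels win on conflict.
    fields = {**alert.get("annotations", {}), **alert.get("labels", {})}
    problems = [f"missing {name}" for name in REQUIRED if not fields.get(name)]
    severity = fields.get("severity")
    # Assumption: severity values are compared case-insensitively.
    if severity and severity.lower() not in SEVERITIES:
        problems.append(f"unexpected severity {severity!r}")
    return problems


# Hypothetical alert, shaped like one entry of an Alertmanager webhook payload.
alert = {
    "labels": {
        "node_name": "router1",
        "service_name": "ifInErrors-eth0",
        "severity": "critical",
    },
    "annotations": {"description": "router1 - eth0 is down"},
}
print(label_problems(alert))
```

An empty list means every required field was found; the optional `device` and `start_time` labels are deliberately not checked.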

## Testing with OCI image

```bash
docker run -e GNOC_USERNAME=$GNOC_USERNAME -e GNOC_PASSWORD=$GNOC_PASSWORD -e GNOC_SERVER=$GNOC_SERVER -e GNOC_REALM=$GNOC_REALM --network=host lsstit/gnocpush
```

## Testing on k8s
## Deployment on Kubernetes

```bash
helm upgrade --install \
@@ -28,28 +76,47 @@ helm upgrade --install \
-f ./values.yaml
```

### Debugging a Kubernetes Deployment

```bash
k logs alertmanager-kube-prometheus-stack-alertmanager-0 --tail=100 -f
k logs -l app.kubernetes.io/instance=gnocpush -f
```

### Prometheus metrics

```bash
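# In one terminal, forward the pod's port (the pod name will differ per deployment):
k -n gnocpush port-forward gnocpush-dc4d94d8-mqvqq 8080
# In a second terminal, fetch the metrics endpoint:
curl localhost:8080/metrics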
```

## Testing gnocpush with curl
## Development

### Local Development

```bash
virtualenv venv
. venv/bin/activate
pip install --editable .
```

### Testing with the OCI image

```bash
docker run \
-e GNOC_USERNAME=$GNOC_USERNAME \
-e GNOC_PASSWORD=$GNOC_PASSWORD \
-e GNOC_SERVER=$GNOC_SERVER \
-e GNOC_REALM=$GNOC_REALM \
--network=host ghcr.io/lsst-it/gnocpush
```

### without auth
### Testing gnocpush with curl

```bash
curl http://localhost:8080/alerts -v --json @- < alerts.json
```
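
The README does not show the contents of `alerts.json`. As a rough sketch, a test payload can be generated in the shape of an Alertmanager webhook POST (version 4). The function name `build_payload`, the sample values, the `example` URLs, and the placeholder `groupKey` and `fingerprint` are all hypothetical, and which of these fields gnocpush actually reads is an assumption.

```python
import json
from datetime import datetime, timezone


def build_payload(node_name, service_name, severity, description):
    """Build a single-alert dict shaped like an Alertmanager webhook payload."""
    labels = {
        "alertname": "test_alert",  # hypothetical alert name
        "node_name": node_name,
        "service_name": service_name,
        "severity": severity,
        "gnoc": "true",
    }
    # Assumption: description is sent as an annotation, as in the example rule.
    annotations = {"description": description}
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "version": "4",
        "groupKey": '{}:{gnoc="true"}',  # placeholder value
        "truncatedAlerts": 0,
        "status": "firing",
        "receiver": "gnocpush",
        "groupLabels": {"gnoc": "true"},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://alertmanager.example:9093",  # placeholder
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": annotations,
                "startsAt": now,
                "endsAt": "0001-01-01T00:00:00Z",  # zero time: alert not yet resolved
                "generatorURL": "http://prometheus.example:9090/graph",  # placeholder
                "fingerprint": "0000000000000000",  # placeholder
            }
        ],
    }


payload = build_payload(
    "router1", "ifInErrors-eth0", "critical", "router1 - eth0 is down"
)
print(json.dumps(payload, indent=2))
```

Redirecting the script's output to `alerts.json` produces a file that the `curl` command above can post.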

## URLs
## Useful GlobalNOC URLs

### Stage
