diff --git a/.cloudignore b/.cloudignore
new file mode 100644
index 0000000..bc4dd39
--- /dev/null
+++ b/.cloudignore
@@ -0,0 +1,342 @@
+# Created by https://www.toptal.com/developers/gitignore/api/visualstudiocode,go,macos,terraform,node,angular,react
+# Edit at https://www.toptal.com/developers/gitignore?templates=visualstudiocode,go,macos,terraform,node,angular,react
+
+secrets/
+
+### Angular ###
+## Angular ##
+# compiled output
+dist/
+tmp/
+app/**/*.js
+app/**/*.js.map
+
+# dependencies
+node_modules/
+bower_components/
+
+# IDEs and editors
+.idea/
+
+# misc
+.sass-cache/
+connect.lock/
+coverage/
+libpeerconnection.log/
+npm-debug.log
+testem.log
+typings/
+.angular/
+
+# e2e
+e2e/*.js
+e2e/*.map
+
+# System Files
+.DS_Store/
+
+### Go ###
+# If you prefer the allow list template instead of the deny list, see community template:
+# https://github.com/github/gitignore/blob/main/community/Golang/Go.AllowList.gitignore
+#
+# Binaries for programs and plugins
+*.exe
+*.exe~
+*.dll
+*.so
+*.dylib
+
+# Test binary, built with `go test -c`
+*.test
+
+# Output of the go coverage tool, specifically when used with LiteIDE
+*.out
+
+# Dependency directories (remove the comment below to include it)
+# vendor/
+
+# Go workspace file
+go.work
+
+### Go Patch ###
+/vendor/
+/Godeps/
+
+### macOS ###
+# General
+.DS_Store
+.AppleDouble
+.LSOverride
+
+# Icon must end with two \r
+Icon
+
+
+# Thumbnails
+._*
+
+# Files that might appear in the root of a volume
+.DocumentRevisions-V100
+.fseventsd
+.Spotlight-V100
+.TemporaryItems
+.Trashes
+.VolumeIcon.icns
+.com.apple.timemachine.donotpresent
+
+# Directories potentially created on remote AFP share
+.AppleDB
+.AppleDesktop
+Network Trash Folder
+Temporary Items
+.apdisk
+
+### macOS Patch ###
+# iCloud generated files
+*.icloud
+
+### Node ###
+# Logs
+logs
+*.log
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+lerna-debug.log*
+.pnpm-debug.log*
+
+# Diagnostic reports (https://nodejs.org/api/report.html)
+report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
+
+# Runtime data
+pids
+*.pid
+*.seed
+*.pid.lock
+
+# Directory for instrumented libs generated by jscoverage/JSCover
+lib-cov
+
+# Coverage directory used by tools like istanbul
+coverage
+*.lcov
+
+# nyc test coverage
+.nyc_output
+
+# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
+.grunt
+
+# Bower dependency directory (https://bower.io/)
+bower_components
+
+# node-waf configuration
+.lock-wscript
+
+# Compiled binary addons (https://nodejs.org/api/addons.html)
+build/Release
+
+# Dependency directories
+jspm_packages/
+
+# Snowpack dependency directory (https://snowpack.dev/)
+web_modules/
+
+# TypeScript cache
+*.tsbuildinfo
+
+# Optional npm cache directory
+.npm
+
+# Optional eslint cache
+.eslintcache
+
+# Optional stylelint cache
+.stylelintcache
+
+# Microbundle cache
+.rpt2_cache/
+.rts2_cache_cjs/
+.rts2_cache_es/
+.rts2_cache_umd/
+
+# Optional REPL history
+.node_repl_history
+
+# Output of 'npm pack'
+*.tgz
+
+# Yarn Integrity file
+.yarn-integrity
+
+# dotenv environment variable files
+.env
+.env.development.local
+.env.test.local
+.env.production.local
+.env.local
+.env.fake
+
+# parcel-bundler cache (https://parceljs.org/)
+.cache
+.parcel-cache
+
+# Next.js build output
+.next
+out
+
+# Nuxt.js build / generate output
+.nuxt
+dist
+
+# Gatsby files
+.cache/
+# Comment in the public line in if your project uses Gatsby and not Next.js
+# https://nextjs.org/blog/next-9-1#public-directory-support
+# public
+
+# vuepress build output
+.vuepress/dist
+
+# vuepress v2.x temp and cache directory
+.temp
+
+# Docusaurus cache and generated files
+.docusaurus
+
+# Serverless directories
+.serverless/
+
+# FuseBox cache
+.fusebox/
+
+# DynamoDB Local files
+.dynamodb/
+
+# TernJS port file
+.tern-port
+
+# Stores VSCode versions used for testing VSCode extensions
+.vscode-test
+
+# yarn v2
+.yarn/cache
+.yarn/unplugged
+.yarn/build-state.yml
+.yarn/install-state.gz
+.pnp.*
+
+### Node Patch ###
+# Serverless Webpack directories
+.webpack/
+
+# Optional stylelint cache
+
+# SvelteKit build / generate output
+.svelte-kit
+
+### react ###
+.DS_*
+**/*.backup.*
+**/*.back.*
+
+node_modules
+
+*.sublime*
+
+psd
+thumb
+sketch
+
+### Terraform ###
+# Local .terraform directories
+**/.terraform/*
+
+# .tfstate files
+*.tfstate
+*.tfstate.*
+
+# Crash log files
+crash.log
+crash.*.log
+
+# Ignore override files as they are usually used to override resources locally and so
+# are not checked in
+override.tf
+override.tf.json
+*_override.tf
+*_override.tf.json
+
+# Include override files you do wish to add to version control using negated pattern
+# !example_override.tf
+
+# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
+# example: *tfplan*
+
+# Ignore CLI configuration files
+.terraformrc
+terraform.rc
+
+### VisualStudioCode ###
+.vscode/*
+!.vscode/settings.json
+!.vscode/tasks.json
+!.vscode/launch.json
+!.vscode/extensions.json
+!.vscode/*.code-snippets
+
+# Local History for Visual Studio Code
+.history/
+
+# Built Visual Studio Code Extensions
+*.vsix
+
+### VisualStudioCode Patch ###
+# Ignore all local history of files
+.history
+.ionide
+
+# Support for Project snippet scope
+.vscode/*.code-snippets
+
+# Ignore code-workspaces
+*.code-workspace
+
+# End of https://www.toptal.com/developers/gitignore/api/visualstudiocode,go,macos,terraform,node,angular,react
+
+# readd removed values
+# Snyk cache.
+.dccache
+# terraform plan
+tf.plan
+.vscode/launch.json
+src/storage/main.go
+gitlab_sa.key
+
+# Cypress
+cypress/screenshots
+cypress/videos
+
+# Compiled output
+/dist
+/tmp
+/out-tsc
+/bazel-out
+
+# IDEs and editors
+.idea/
+.project
+.classpath
+.c9/
+*.launch
+.settings/
+*.sublime-workspace
+
+# Test certificates
+src/test/certs/*
+
+# Nix Shell
+/shell.nix
+
+# Linter output
+modron-lint.xml
diff --git a/.dockerignore b/.dockerignore
new file mode 100644
index 0000000..ca809ac
--- /dev/null
+++ b/.dockerignore
@@ -0,0 +1,4 @@
+**/node_modules
+**/.angular
+/.git
+/terraform
\ No newline at end of file
diff --git a/.git-cc.yaml b/.git-cc.yaml
new file mode 100644
index 0000000..4c48cde
--- /dev/null
+++ b/.git-cc.yaml
@@ -0,0 +1,11 @@
+scopes:
+ collector: Collector changes
+ docker: Docker related changes
+ nagatha: Nagatha related changes
+ otel: OpenTelemetry related changes
+ rules: Changes to the rules
+ scc: Security Command Center related changes
+ server: Changes to the Modron backend
+ storage: Changes to the storage layer
+ terraform: Terraform related changes
+ ui: UI changes
diff --git a/.gitignore b/.gitignore
index 68188c5..bc4dd39 100644
--- a/.gitignore
+++ b/.gitignore
@@ -175,6 +175,7 @@ web_modules/
.env.test.local
.env.production.local
.env.local
+.env.fake
# parcel-bundler cache (https://parceljs.org/)
.cache
@@ -308,7 +309,6 @@ terraform.rc
.dccache
# terraform plan
tf.plan
-src/storage/bigquerystorage/bqsa_key.json
.vscode/launch.json
src/storage/main.go
gitlab_sa.key
@@ -331,3 +331,12 @@ cypress/videos
*.launch
.settings/
*.sublime-workspace
+
+# Test certificates
+src/test/certs/*
+
+# Nix Shell
+/shell.nix
+
+# Linter output
+modron-lint.xml
diff --git a/.golangci.yml b/.golangci.yml
index 1c67b1c..2ea66a9 100644
--- a/.golangci.yml
+++ b/.golangci.yml
@@ -1,18 +1,34 @@
linters:
disable-all: true
enable:
- - deadcode
+ - contextcheck
- errcheck
+ - errorlint
+ - gocritic
+ - mnd
+ - gosec
- gosimple
- govet
- ineffassign
+ - misspell
+ - nestif
+ - nilerr
+ - nilnil
+ - revive
- staticcheck
- - structcheck
- typecheck
- unused
- - varcheck
+ - wastedassign
+issues:
+ exclude-dirs:
+ - "terraform"
run:
timeout: 5m
- skip-dirs:
- - "terraform"
+
+output:
+ formats:
+ - format: colored-line-number
+ path: stdout
+ - format: junit-xml
+ path: modron-lint.xml
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..66916f6
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,65 @@
+# Changelog
+
+## v1.0.0
+
+### Structure
+
+- Support Resource Group Hierarchy
+
+### Observations
+
+- Add [Risk Score](docs/RISK_SCORE.md) to observations, calculated from the severity of the observation (as defined in the rule) and the impact of the observation (detected from the environment)
+- Collectors can now collect observations
+
+### Stats
+
+- Improved stats view
+- Improved export to CSV
+
+### GCP
+
+- Add support for [Security Command Center (SCC)](https://cloud.google.com/security-command-center/docs/concepts-security-command-center-overview)
+- Start collecting Kubernetes resources
+
+### Storage
+
+- Use [GORM](https://gorm.io/) for both the PSQL and SQLite storage backends
+- Use SQLite for the in-memory database for testing
+
+### Performance
+
+- Increase performance overall by optimizing the DB queries, parallelizing the scans, and reducing the number of external calls
+- Introduce rate limiting for the collectors
+
+### Observability
+
+- Use [logrus](https://github.com/sirupsen/logrus) with structured logging for GCP Logging (Stackdriver)
+- Add support for OpenTelemetry
+ - Add an otel-collector to receive traces and metrics
+ - Send traces to [Google Cloud Trace](https://cloud.google.com/trace)
+ - Send metrics to [Google Cloud Monitoring](https://cloud.google.com/monitoring)
+
+### UI
+
+- Completely rework the UI with an improved design
+- Show observations as a table, sorted by Risk Score by default
+- Add a detailed view dialog for the observations
+
+### Misc
+
+- Use [`go-arg`](https://github.com/alexflint/go-arg) for the CLI arguments / environment variables
+- Switch to [buf](https://buf.build/) for the protobuf generation
+- Bug fixes
+- Upgrade to Go 1.23
+- Rules now support external configuration
+
+## v0.2
+
+- Moved to Go 1.19
+- Added automated runs for scans
+- Fixed an issue where the last reported observation would still appear even if newer scans reported no observations
+- Fixed group membership resolution when checking for access to GCP projects
+
+## v0.1
+
+- Initial public release
diff --git a/README.md b/README.md
index 2155a0d..069e592 100644
--- a/README.md
+++ b/README.md
@@ -1,47 +1,143 @@
# Modron - Cloud security compliance
-
+
+
+
-```
-”
-We are the ultimate law. All other law is tainted when compared to us.
-We are order. All other order disappears when held to our light.
-We are structure. All other structure crumbles when brought against us.
-We are perfect law.
-”
-— A spokesmodron
-```
+> _We are the ultimate law. All other law is tainted when compared to us.
+> We are order. All other order disappears when held to our light.
+> We are structure. All other structure crumbles when brought against us.
+> We are perfect law._
+>
+> — A spokesmodron
+
+Monte Cook, Colin McComb (1997-10-28). The Great Modron March. Edited by Michele Carter. (TSR, Inc.), p. 26. ISBN
+0-7869-0648-0.
-Monte Cook, Colin McComb (1997-10-28). The Great Modron March. Edited by Michele Carter. (TSR, Inc.), p. 26. ISBN 0-7869-0648-0.
-The rise of cloud computing has sharply increased the number of resources that need to be managed in a production environment. This has increased the load on security teams. At the same time, vulnerability and compliance scanning on the cloud have made little progress. The process of inventory, data collection, analysis and remediation have scaled up, but did not evolve to manage the scale and diversity of cloud computing assets. Numerous security tools still assume that maintaining inventory, collecting data, looking at results and fixing issues is performed by the same person. This leads to increased pressure on teams already overwhelmed by the size of their infrastructure.
+## Introduction
-Maintaining a secure cloud infrastructure is surprisingly hard. Cloud computing came with the promise of automation and ease of use, yet there is a lot of progress to be made on both of these fronts. Infrastructure security also suffered from the explosion of assets under management and lack of security controls on new and existing assets.
+Modron is a cloud security compliance tool. It is designed to help organizations manage their cloud infrastructure
+and ensure that it is compliant with their security policies.
-Modron addresses the inventory and ownership issues raising with large cloud infrastructure, as well as the scalability of the remediation process by resolving ownership of assets and handling communication with different asset owners.
-Modron still has the security practitioners and leadership teams in mind and provides organization wide statistics about the reported issues.
+Users can navigate the Modron UI and view their resource groups, together with the respective observations.
+Resource Groups that require attention are immediately visible and users can dig deeper to assess the observations.
-Designed with multi cloud and scalability in mind, Modron is based on GCP today. The model allows for writing detection rules once and apply them across multiple platforms.
+
-## Taxonomy
+A detailed explanation of why Modron was created can be found in [the original blog post](https://nianticlabs.com/news/modron).
+
+## Problem Statement
+
+The rise of cloud computing has sharply increased the number of resources that need to be managed in a production
+environment. This has increased the load on security teams. At the same time, vulnerability and compliance scanning on
+the cloud have made little progress. The processes of inventory, data collection, analysis and remediation have scaled
+up, but have not evolved to manage the scale and diversity of cloud computing assets. Numerous security tools still
+assume that maintaining inventory, collecting data, looking at results and fixing issues is performed by the same
+person. This leads to increased pressure on teams already overwhelmed by the size of their infrastructure.
+
+Maintaining a secure cloud infrastructure is surprisingly hard. Cloud computing came with the promise of automation and
+ease of use, yet there is a lot of progress to be made on both fronts. Infrastructure security has also suffered
+from the explosion of assets under management and the lack of security controls on new and existing assets.
+
+Modron addresses the inventory and ownership issues arising with large cloud infrastructure, as well as the scalability
+of the remediation process, by resolving ownership of assets and handling communication with different asset owners.
+Modron also keeps security practitioners and leadership teams in mind and provides organization-wide statistics about
+the reported issues.
+
+Designed with multi-cloud support and scalability in mind, Modron is based on GCP today. The model allows for writing
+detection rules once and applying them across multiple platforms.
+
+## The Modron solution
+
+With the help of Modron, organizations can:
+- Automatically collect data from their cloud infrastructure
+- Run security rules against the collected data
+- Notify the owners of the resources that are not compliant with the security rules
+- Provide a personalized dashboard to visualize the compliance status of the organization
+- Provide engineers information on how to remediate the issues
+
+### Analyzing the Resource Group observations
+
+Through the Modron UI, users can view a list of their resource groups. By clicking on a resource group, they can
+see a list of observations that have been made against that resource group:
+
+
+
+This view provides a list of observations for the resource group. Each observation has an associated "Risk Score" that
+is computed taking into consideration the severity of the rule that generated the observation and the environment in
+which the resource group is running. This allows users to prioritize their remediation efforts.
+
+### Expanding a single observation
+
+By expanding a single observation, users can see more details about it - including remediation steps.
+
+
+
+When available, a command is provided to enable the user to quickly remediate the issue.
+
+### Risk Score
+
+Each observation has an associated Risk Score. This score is computed based on the severity of the rule that generated the
+observation (Severity) and the environment in which the resource group is running (Impact).
+
+Risk Scores range from *INFO* to *CRITICAL* (slightly adapted from the [CVSS v3.x Ratings](https://nvd.nist.gov/vuln-metrics/cvss)).
+By also analyzing the impact of the observation, the risk score can be used to prioritize remediation efforts: an
+observation in an environment containing customer data (e.g. production) will be considered more critical than the same
+finding in an environment containing only test data (e.g. dev).
+
+
-A *Resource* is an entity existing in the cloud platform. A resource can be a VM instance, a service account, a kubernetes clusters, etc.
+The details on how we compute and define the Risk Score are available in the [Risk Score documentation](docs/RISK_SCORE.md).
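+
+As a rough sketch of the idea, severity and impact could be combined like this. The level names and the clamping formula below are illustrative assumptions for this README, not Modron's actual algorithm:
+
```go
package main

import "fmt"

// Hypothetical severity / impact levels, loosely adapted from the
// CVSS v3.x ratings mentioned above. Illustration only: the real
// computation is described in docs/RISK_SCORE.md.
const (
	Info = iota
	Low
	Medium
	High
	Critical
)

var names = []string{"INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"}

// riskScore shifts the rule's severity up or down according to the
// impact of the environment, clamping to the [INFO, CRITICAL] range.
func riskScore(severity, impact int) int {
	score := severity + (impact - Medium)
	if score < Info {
		return Info
	}
	if score > Critical {
		return Critical
	}
	return score
}

func main() {
	// A HIGH severity observation in a high-impact environment
	// (customer data, e.g. production) escalates to CRITICAL...
	fmt.Println(names[riskScore(High, High)])
	// ...while the same observation in a low-impact environment
	// (test data only, e.g. dev) is downgraded to MEDIUM.
	fmt.Println(names[riskScore(High, Low)])
}
```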
-A *Resource group* is the smallest administrative grouping of resources, usually administered by the same individual or group of individuals. On GCP this corresponds to a Project, on Azure to a Resource Group.
+### Statistics
-A *Rule* is the implementation of a desired state for a given resource or set of resources. A rule applies only to a predefined set of resources, compares the state of the resource with the expected state. If these states differ, the rule generates one or more *observation*.
+Via the statistics page, users can view a list of all the rules that have been run against their resource groups,
+together with the results. Each rule can be exported to a CSV file for further analysis.
-An *Observation* is an instance of a difference between the state of a resource and its expected state at a given timestamp.
+
-A *Collection* defines the action of fetching all data from the cloud platforms. This data is then stored in the database, ready to run the scan.
+By expanding the "List of the observations", users can see all the matching observations for that rule, regardless of
+the resource group they belong to.
+Security teams can use this view to understand the impact of a rule across the organization and to decide which
+issues to tackle first.
-A *Scan* defines the action of running a set of *rules* against a set of *resource groups*. *Observations* resulting of that scan are added to the database. There is no guarantee that all observations of the same scan will have the same timestamp.
+### Notifications
-*Nagatha* is the notification system associated with Modron. It aggregates notifications going to the same recipient over a given time frame and sends a notification to that user.
+A remediation cannot be effective if the right people are not informed.
+Modron sends notifications to the owners of the resource groups that have observations via [Nagatha](https://github.com/nianticlabs/nagatha).
-A *Notification* is an instance of a message sent to an owner of a *resource group* for a given *observation*.
+Users will periodically receive a notification with the list of observations they need to address. Nagatha
+will take care of delivering the notification via Slack and email:
-An *Exception* is one owner of a *resource group* opting out of *notifications* for a specific *rule*. Exceptions *must* have an expiration date and cannot be set forever. This limitation can be bypassed by accessing the nagatha service directly.
+
+
+
+
+
+
+
+
+
+
+
Email Notification
+
Slack Notification
+
+
+
+## Taxonomy
+
+| Term | Definition |
+|:---------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Resource       | An entity existing in the cloud platform. A resource can be a VM instance, a service account, a Kubernetes cluster, etc.                                                                                                                                  |
+| Resource group | The smallest administrative grouping of resources, usually administered by the same individual or group of individuals. On GCP this corresponds to a Project, on Azure to a Resource Group. |
+| Rule | The implementation of a desired state for a given resource or set of resources. A rule applies only to a predefined set of resources, compares the state of the resource with the expected state. If these states differ, the rule generates one or more *observation*s. |
+| Observation | Instance of a difference between the state of a resource and its expected state at a given timestamp. |
+| Collection | The action of fetching all data from the cloud platforms. This data is then stored in the database, ready to run the scan. |
+| Scan           | The action of running a set of *rules* against a set of *resource groups*. *Observations* resulting from that scan are added to the database. There is no guarantee that all observations of the same scan will have the same timestamp.                  |
+| Nagatha | The notification system associated with Modron. It aggregates notifications going to the same recipient over a given time frame and sends a notification to that user. |
+| Notification | An instance of a message sent to an owner of a *resource group* for a given *observation*. |
+| Exception      | An owner of a *resource group* opting out of *notifications* for a specific *rule*. Exceptions *must* have an expiration date and cannot be set forever. This limitation can be bypassed by accessing the Nagatha service directly.                       |
## Process
@@ -50,90 +146,178 @@ Modron follows the process of any security scanning engine:

Except that in most scanning engines, the inventory and remediation parts are left as an exercise for the user.
-In Modron, inventory is taken care of by identifying automatically the owners of a resource group based on the people that have the permission to act on it, as the remediation is largely facilitated by running the communication with the different resource group owners.
-
-* *Collector*: The collector fetches the data from the cloud platforms. This code must be implemented for each supported code platform separately. It takes care of the inventory and data collection parts of the process.
-* *Rule engine*: The rule engine runs the rules against all collected resources and generates observations. Notifications are sent to Nagatha for each observation.
-* *Nagatha* receives all the notifications for all observations, aggregates, deduplicates and limits the rate of notification. It also applies the exceptions provided by the user.
+In Modron, inventory is taken care of by automatically identifying the owners of a resource group based on the people
+that have permission to act on it, while remediation is largely facilitated by Modron handling the communication with
+the different resource group owners.
+
+* *Collector*: The collector fetches the data from the cloud platforms. This code must be implemented for each
+  supported cloud platform separately. It takes care of the inventory and data collection parts of the process.
+* *Rule engine*: The rule engine runs the rules against all collected resources and generates observations.
+ Notifications are sent to Nagatha for each observation.
+* *Nagatha* receives all the notifications for all observations, aggregates, deduplicates and limits the rate of
+ notification. It also applies the exceptions provided by the user.
+
+## Architecture
+
+### GCP
+
+
+
+
+
+### Modron
+
+```mermaid
+flowchart LR
+ User --> ui
+ ui[Modron UI]
+ ui --> Modron
+ subgraph Modron
+ coll[Collector]
+ acl[ACL Fetcher]
+ re[Rule Engine]
+ re -->|Check rule| re
+ end
+
+ re -->|Create Observation| psql
+ coll --->|fetch| gcp
+ acl --->|fetch| gcp
+ coll --->|store| psql
+ gcp[GCP APIs]
+ psql[(PSQL)]
+ re --->|Fetch resources| psql
+ re --->|Create Notification| Nagatha
+ slack(Slack)
+ email(Email)
+ Nagatha --> slack
+ Nagatha --> email
+```
## Getting started
In order to install Modron & Nagatha, you'll need to:
-1. Build the modron images:
- * in [src/](src): `gcloud builds builds submit . --tag gcr.io/your-project/modron:prod --timeout 900`
- * in [src/ui/](src/ui): `gcloud builds builds submit . --tag gcr.io/your-project/modron-ui:prod --timeout 900`
- * in [nagatha/](nagatha): `gcloud builds submit . --tag gcr.io/your-project/nagatha:dev --timeout=900`
+1. Build the Modron images following [Building the images](#building-the-images) below.
1. Create a copy of [main.tf.example](terraform/dev/main.tf.example) and edit it with your own configuration
1. Run `tf plan --out tf.plan` in the [dev folder](terraform/dev/)
    * This may need to be run multiple times, as setting up resources on GCP takes time.
-1. Create a copy of [tf.tfvars.json.example](nagatha/terraform/tf.tfvars.json.example) and edit it with your own configuration
+1. Create a copy of [tf.tfvars.json.example](nagatha/terraform/tf.tfvars.json.example) and edit it with your own
+ configuration
1. Run `tf plan --out tf.plan` in the [nagatha folder](nagatha/terraform/)
1. Assign the permissions to the Modron runner as mentioned in [permissions](#permissions)
-## Logo
+### Building the images
-Generated with Dall-E with "logo art of a victorian cubical robot in a tuxedo with a top hat and holding binoculars"
+This assumes you have an Artifact Registry repository in your GCP project following this format:
+```
+us-central1-docker.pkg.dev/$PROJECT_ID/modron
+```
-## Start developing on Modron
+#### Modron Backend
-### Infrastructure
+The Modron backend image can be built using [Cloud Build](https://cloud.google.com/build/docs):
+```bash
+gcloud \
+ --project "$PROJECT_ID" \
+ builds submit \
+ --config cloudbuild.yaml \
+ --substitutions=_TAG_REF_1=dev-$(date +%s),_TAG_REF_2=dev
+```
-Here is an overview of Modron's infrastructure:
+#### Modron UI
-
+```bash
+gcloud \
+ --project "$PROJECT_ID" \
+ builds submit \
+ --config cloudbuild-ui.yaml \
+ --substitutions=_TAG_REF_1=dev-$(date +%s),_TAG_REF_2=dev
+```
-Both Modron Cloud Run run in the same [modron project](https://console.cloud.google.com/home/dashboard?project=modron).
-Nagatha runs in a [separate project](https://console.cloud.google.com/home/dashboard?project=nagatha).
-There is a dev container. This project is meant to be opened with [VSCode](https://code.visualstudio.com/).
+## Development of Modron
+
+### Requirements
To run this project you'll need:
* Docker
* Go
* The Google SDK
-* A protobuf compiler
* npm
+* terraform
-The dev container provides these tools. Upon starting, vscode will ask if you want to reopen the project in the dev container, accept.
+### Getting started
-If you have problems with your git configuration inside the container, set `remote.containers.copyGitConfig` to true.
-https://github.com/microsoft/vscode-remote-release/issues/6124
+#### Generate the protobuf files
-## Permissions
+We'll use a Docker image that contains [`buf`](https://buf.build/) and some protoc plugins to generate the protobuf
+files.
+We'll call this image `bufbuild` - it needs to be built only once:
-The Modron service is meant to work at the organization level on GCP. In order to access the data it needs to run the analysis, the Modron runner service account will need the following permissions at the organization level:
+```bash
+docker build -t bufbuild -f docker/Dockerfile.buf .
+```
+
+Now, this image can be used to generate the protobuf files:
+```bash
+docker run -v "$PWD:/app" -w "/app" bufbuild generate
```
- "apikeys.keys.list",
- "cloudasset.assets.searchAllIamPolicies",
- "compute.backendServices.list",
- "compute.instances.list",
- "compute.regions.list",
- "compute.sslCertificates.list",
- "compute.sslPolicies.list",
- "compute.subnetworks.list",
- "compute.targetHttpsProxies.list",
- "compute.targetHttpsProxies.list",
- "compute.targetSslProxies.list",
- "compute.urlMaps.list",
- "compute.zones.list",
- "container.clusters.list",
- "iam.serviceAccounts.list",
- "iam.serviceAccountKeys.list",
- "monitoring.metricDescriptors.get",
- "monitoring.metricDescriptors.list",
- "monitoring.timeSeries.list",
- "resourcemanager.projects.getIamPolicy",
- "serviceusage.services.get",
- "storage.buckets.list",
- "storage.buckets.getIamPolicy",
+
+
+  We don't use the remotely hosted buf plugins because we might run into
+  rate limits.
+
+
+### Formatting
+
+You can format your code using:
+- `gofmt -w ./`
+- `terraform fmt -recursive`
+- `eslint`
+
+#### IDE
+
+You can configure your IDE to format Terraform code by following these guides:
+- [Use Terraform formatter on IDEA-based IDEs](https://www.jetbrains.com/help/idea/terraform.html#use-terraform-formatter)
+- [Terraform extension for VSCode - Formatting](https://marketplace.visualstudio.com/items?itemName=hashicorp.terraform#formatting)
+
+## Permissions
+
+The Modron service is meant to work at the organization level on GCP. In order to access the data it needs to run the
+analysis, the Modron runner service account will need the following permissions at the organization level:
+
+```plain
+apikeys.keys.list
+cloudasset.assets.searchAllIamPolicies
+compute.backendServices.list
+compute.instances.list
+compute.regions.list
+compute.sslCertificates.list
+compute.sslPolicies.list
+compute.subnetworks.list
+compute.targetHttpsProxies.list
+compute.targetSslProxies.list
+compute.urlMaps.list
+compute.zones.list
+container.clusters.list
+iam.serviceAccounts.list
+iam.serviceAccountKeys.list
+iam.serviceAccounts.getIamPolicy
+monitoring.metricDescriptors.get
+monitoring.metricDescriptors.list
+monitoring.timeSeries.list
+resourcemanager.projects.getIamPolicy
+serviceusage.services.get
+storage.buckets.list
+storage.buckets.getIamPolicy
```
It is recommended to create a custom role with these permissions. For that you can use this terraform stanza:
-```
+```hcl
resource "google_organization_iam_custom_role" "modron_lister" {
org_id = var.org_id
role_id = "ModronSecurityLister"
@@ -169,6 +353,8 @@ resource "google_organization_iam_custom_role" "modron_lister" {
## Debug
+### GoSec
+
Run gosec as run by gitlab:
```
@@ -192,36 +378,82 @@ To run the integration test, you'll need a self signed certificate for the notif
```
openssl req -x509 -newkey rsa:4096 -keyout key.pem -nodes -out cert.pem -sha256 -days 365 -subj '/CN=modron_test' -addext "subjectAltName = DNS:modron_test"
-docker-compose up --build --exit-code-from "modron_test" --abort-on-container-exit
+docker compose up --build --exit-code-from "modron_test" --abort-on-container-exit
```
### UI Integration test
```
-docker-compose -f docker-compose.ui.yaml up --build --exit-code-from "modron_test" --abort-on-container-exit
+docker compose -f docker-compose.ui.yaml up --build --exit-code-from "modron_test" --abort-on-container-exit
```
### Running locally
+#### Log in to GCP
+
+In order to use the Google Cloud APIs, you need to log in to GCP as if you were using a service account:
+
+```bash
+gcloud auth application-default login
+```
+
+Check out the [`gcloud` docs](https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login) for more
+information.
+If you don't log in using the above command, the collector might fail with an error similar to:
+
+```plain
+"invalid_grant" "reauth related error (invalid_rapt)" "https://support.google.com/a/answer/9368756"
+```
+
+#### Start the docker-compose stack
+
Use this docker command to spin up a local deployment via docker-compose (will rebuild on every run):
+
```
-docker-compose -f docker-compose.ui.yaml up --build
+docker compose -f docker-compose.ui.yaml up --build
```
-In case you want to clean up all the created images, services and volumes (e.g. if you suspect a caching issue or if a service does not properly shut down):
+
+In case you want to clean up all the created images, services and volumes (e.g. if you suspect a caching issue or if a
+service does not properly shut down):
+
```
-docker-compose rm -fsv # remove all images, services and volumes if needed
+docker compose rm -fsv # remove all images, services and volumes if needed
```
+#### Use Docker by itself
-Alternative: Use the docker command to run modron locally (against a dev project):
+As an alternative, you can use the following Docker commands to run Modron locally (against a dev project):
```
chmod 644 ~/.config/gcloud/application_default_credentials.json
-docker run -e POSTGRES_PASSWORD="docker-test-password" -e POSTGRES_USER="modron" -e POSTGRES_DB="modron" -e PG_DATA="tmp_data/" -t modron-db:latest -p 5432
-GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/application_default_credentials.json PORT="8080" GCP_PROJECT_ID=modron-dev OPERATION_TABLE_ID="operations" OBSERVATION_TABLE_ID="observations" RESOURCE_TABLE_ID="resources" RUN_AUTOMATED_SCANS="false" ORG_SUFFIX="@example.com" STORAGE="SQL" DB_MAX_CONNECTIONS="1" SQL_BACKEND_DRIVER="postgres" SQL_CONNECT_STRING="host=localhost port=5432 user=modron password=docker-test-password database=modron sslmode=disable" go run . --logtostderr
+docker run -p 5432:5432 -e POSTGRES_PASSWORD="docker-test-password" -e POSTGRES_USER="modron" -e POSTGRES_DB="modron" -e PGDATA="/tmp/" postgres:latest
+GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/application_default_credentials.json PORT="8080" RUN_AUTOMATED_SCANS="false" ORG_SUFFIX="@nianticlabs.com" STORAGE="SQL" DB_MAX_CONNECTIONS="1" SQL_BACKEND_DRIVER="postgres" SQL_CONNECT_STRING="host=localhost port=5432 user=modron password=docker-test-password database=modron sslmode=disable" go run .
```
+## Telemetry
+
+Modron supports [OpenTelemetry](https://opentelemetry.io/docs/) and expects an OTLP gRPC collector to be running
+alongside the deployment. We currently export traces and metrics through this collector.
+
+The collector (`otel-collector`) can be configured to forward the telemetry data to other exporters - by default
+these are Google Cloud Monitoring for the production environment and Prometheus / Jaeger for the local deployment.
+
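For reference, a minimal collector configuration along these lines would match the local setup (the exporter names and endpoints here are assumptions, check the actual file under `otel/config`):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  # Jaeger ingests traces natively over OTLP.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  # Prometheus is started with --web.enable-remote-write-receiver.
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
```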
+### Checking the telemetry data locally
+
+When running Modron locally, we suggest starting the auxiliary services by running:
+
+```bash
+docker compose -f docker-compose.dev.yaml up -d
+```
+
+This starts everything you need to develop Modron locally:
+
+- [Jaeger](http://127.0.0.1:16686/)
+- [Prometheus](http://127.0.0.1:9090/)
+- `otel-collector` running on `127.0.0.1:4317` (GRPC)
+- PostgreSQL running on `127.0.0.1:5432`
+
## Future developments
* Provide a historical view of the reported issues.
@@ -231,4 +463,4 @@ GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/application_default_credentials.
## Security
-Report any security issue to [security@example.com](mailto:security@example.com).
+Report any security issue to [security@nianticlabs.com](mailto:security@nianticlabs.com).
diff --git a/buf.gen.yaml b/buf.gen.yaml
new file mode 100644
index 0000000..7f7b406
--- /dev/null
+++ b/buf.gen.yaml
@@ -0,0 +1,30 @@
+version: v2
+managed:
+ enabled: true
+
+plugins:
+ - local: protoc-gen-go
+ out: src/proto/generated
+ - local: protoc-gen-go-grpc
+ out: src/proto/generated
+ - local: protoc-gen-js
+ out: src/ui/client/src/proto/
+ opt: import_style=commonjs,binary
+ - local: protoc-gen-grpc-web
+ out: src/ui/client/src/proto/
+ opt:
+ - import_style=typescript
+ - mode=grpcweb
+
+inputs:
+ - directory: ./src/proto
+ - directory: ./src/nagatha/proto
+ - module: buf.build/googleapis/googleapis:8bc2c51e08c447cd8886cdea48a73e14
+ paths:
+ - google/api
+ - google/rpc
+ - google/longrunning
+ - module: buf.build/k8s/api:8f68e41b943c4de8a5e9c9a921c889a7
+ paths:
+ - k8s.io/api/core
+ - k8s.io/apimachinery/
\ No newline at end of file
diff --git a/buf.lock b/buf.lock
new file mode 100644
index 0000000..5373d39
--- /dev/null
+++ b/buf.lock
@@ -0,0 +1,9 @@
+# Generated by buf. DO NOT EDIT.
+version: v2
+deps:
+ - name: buf.build/googleapis/googleapis
+ commit: 8bc2c51e08c447cd8886cdea48a73e14
+ digest: b5:b7e0ac9d192bd0eae88160101269550281448c51f25121cd0d51957661a350aab07001bc145fe9029a8da10b99ff000ae5b284ecaca9c75f2a99604a04d9b4ab
+ - name: buf.build/k8s/api
+ commit: 8f68e41b943c4de8a5e9c9a921c889a7
+ digest: b5:0c188e351df7b094d6a5412f4cd5f097fbf1a32d4a2d4c42b83774e168961447e6e706e4ebf241a13b94493aa6cbe08dc8abd03e2a1f8207ac7620bf186030c8
diff --git a/buf.yaml b/buf.yaml
new file mode 100644
index 0000000..f0a19ec
--- /dev/null
+++ b/buf.yaml
@@ -0,0 +1,10 @@
+version: v2
+modules:
+ - path: src/proto
+ - path: src/nagatha/proto
+deps:
+ - buf.build/googleapis/googleapis
+ - buf.build/k8s/api
+lint:
+ use:
+ - DEFAULT
\ No newline at end of file
diff --git a/cloudbuild-ui.yaml b/cloudbuild-ui.yaml
new file mode 100644
index 0000000..3683f49
--- /dev/null
+++ b/cloudbuild-ui.yaml
@@ -0,0 +1,12 @@
+steps:
+ - name: 'gcr.io/cloud-builders/docker'
+ args:
+ - build
+ - --tag=us-central1-docker.pkg.dev/$PROJECT_ID/modron/modron-ui:$_TAG_REF_1
+ - --tag=us-central1-docker.pkg.dev/$PROJECT_ID/modron/modron-ui:$_TAG_REF_2
+ - -f
+ - ./src/ui/Dockerfile
+ - .
+images:
+ - "us-central1-docker.pkg.dev/$PROJECT_ID/modron/modron-ui:$_TAG_REF_1"
+ - "us-central1-docker.pkg.dev/$PROJECT_ID/modron/modron-ui:$_TAG_REF_2"
diff --git a/cloudbuild.yaml b/cloudbuild.yaml
new file mode 100644
index 0000000..e30b208
--- /dev/null
+++ b/cloudbuild.yaml
@@ -0,0 +1,12 @@
+steps:
+ - name: 'gcr.io/cloud-builders/docker'
+ args:
+ - build
+ - --tag=us-central1-docker.pkg.dev/$PROJECT_ID/modron/modron:$_TAG_REF_1
+ - --tag=us-central1-docker.pkg.dev/$PROJECT_ID/modron/modron:$_TAG_REF_2
+ - -f
+ - ./src/Dockerfile
+ - .
+images:
+ - "us-central1-docker.pkg.dev/$PROJECT_ID/modron/modron:$_TAG_REF_1"
+ - "us-central1-docker.pkg.dev/$PROJECT_ID/modron/modron:$_TAG_REF_2"
\ No newline at end of file
diff --git a/docker-compose.dev.yaml b/docker-compose.dev.yaml
new file mode 100644
index 0000000..8a2aa9e
--- /dev/null
+++ b/docker-compose.dev.yaml
@@ -0,0 +1,53 @@
+services:
+ postgres_db:
+ container_name: postgres_db
+ image: postgres:16
+ restart: always
+ environment:
+ POSTGRES_USER: "modron"
+ POSTGRES_PASSWORD: "modron"
+ POSTGRES_DB: "modron"
+ PGDATA: "/tmp/"
+ ports:
+ - "5432:5432"
+ healthcheck:
+ test: ["CMD-SHELL", "pg_isready -U modron"]
+ interval: 1s
+ timeout: 2s
+ retries: 5
+ tmpfs:
+ - /tmp
+
+ jaeger:
+ image: jaegertracing/all-in-one:1.59
+ ports:
+ - "16686:16686"
+ environment:
+      COLLECTOR_OTLP_ENABLED: "true"
+ COLLECTOR_OTLP_GRPC_HOST_PORT: 0.0.0.0:4317
+ networks:
+ - otel
+
+ prometheus:
+ image: prom/prometheus:latest
+ command:
+ - --web.enable-remote-write-receiver
+ ports:
+ - "9090:9090"
+ networks:
+ - otel
+
+ otel-collector:
+ image: otel/opentelemetry-collector:0.108.0
+ command:
+ - --config=/etc/otel/config.yaml
+ ports:
+ - "4317:4317"
+ volumes:
+ - ./otel/config:/etc/otel
+ networks:
+ - otel
+networks:
+ otel: {}
diff --git a/docker-compose.ui.yaml b/docker-compose.ui.yaml
index 2149ee9..aa5b14f 100644
--- a/docker-compose.ui.yaml
+++ b/docker-compose.ui.yaml
@@ -1,5 +1,3 @@
-version: '3'
-
services:
modron_proxy:
container_name: modron_proxy
@@ -16,30 +14,35 @@ services:
modron_fake:
container_name: modron_fake
- build: src/
+ build:
+ context: .
+ dockerfile: src/Dockerfile
environment:
RUN_AUTOMATED_SCANS: "false"
COLLECTOR: "FAKE"
DB_MAX_CONNECTIONS: "1"
GRPC_TRACE: "all"
GRPC_VERBOSITY: "DEBUG"
- OBSERVATION_TABLE_ID: "observations"
- OPERATION_TABLE_ID: "operations"
- ORG_ID: "0123456789"
+ LISTEN_ADDR: "0.0.0.0"
+ ORG_ID: "111111111111"
ORG_SUFFIX: "@example.com"
PORT: 8080
- RESOURCE_TABLE_ID: "resources"
SQL_BACKEND_DRIVER: "postgres"
SQL_CONNECT_STRING: "host=postgres_db port=5432 user=modron password=docker-test-password database=modron sslmode=disable"
STORAGE: "SQL"
+ TAG_CUSTOMER_DATA: 111111111111/customer_data
+ TAG_EMPLOYEE_DATA: 111111111111/employee_data
+ TAG_ENVIRONMENT: 111111111111/environment
networks:
- modron
depends_on:
- - postgres_db
+ postgres_db:
+ condition: service_healthy
modron_ui:
container_name: modron_ui
- build: ./src/ui
+ build:
+ dockerfile: src/ui/Dockerfile
environment:
ENVIRONMENT: "E2E_TESTING"
DIST_PATH: "./ui"
@@ -50,8 +53,8 @@ services:
modron_test:
container_name: modron_test
build:
- context: ./src/ui/client
- dockerfile: Dockerfile.e2e
+ context: .
+ dockerfile: ./src/ui/client/Dockerfile.e2e
depends_on:
- modron_proxy
environment:
@@ -68,15 +71,18 @@ services:
postgres_db:
container_name: postgres_db
- build:
- context: src/
- dockerfile: Dockerfile.db
+ image: postgres:14-bookworm
restart: always
environment:
POSTGRES_USER: "modron"
POSTGRES_PASSWORD: "docker-test-password"
POSTGRES_DB: "modron"
PGDATA: "/tmp/"
+ healthcheck:
+ test: ["CMD-SHELL", "pg_isready -U modron"]
+ interval: 1s
+ timeout: 2s
+ retries: 5
tmpfs:
- /tmp
networks:
diff --git a/docker-compose.yaml b/docker-compose.yaml
index 594ddb0..c4f7258 100644
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@@ -1,17 +1,20 @@
-version: '3'
-
services:
postgres_db:
container_name: postgres_db
- build:
- context: src/
- dockerfile: Dockerfile.db
+ image: postgres:14-bookworm
restart: always
+ ports:
+ - "5432:5432"
environment:
POSTGRES_USER: "modron"
POSTGRES_PASSWORD: "docker-test-password"
POSTGRES_DB: "modron"
PGDATA: "/tmp/"
+ healthcheck:
+ test: ["CMD-SHELL", "pg_isready -U modron"]
+ interval: 1s
+ timeout: 2s
+ retries: 5
tmpfs:
- /tmp
networks:
@@ -19,36 +22,39 @@ services:
modron_fake:
container_name: modron_fake
- build: src/
+ build:
+ context: .
+ dockerfile: src/Dockerfile
environment:
COLLECTOR: "FAKE"
DB_BATCH_SIZE: "1"
DB_MAX_CONNECTIONS: "1"
- ENVIRONMENT: "E2E_GRPC_TESTING"
- GLOG_v: "10"
+ IS_E2E_GRPC_TEST: "true"
+ LISTEN_ADDR: "0.0.0.0"
NOTIFICATION_SERVICE: "modron_test:8082"
- OBSERVATION_TABLE_ID: "observations"
- OPERATION_TABLE_ID: "operations"
- ORG_ID: "0123456789"
+ ORG_ID: "111111111111"
ORG_SUFFIX: "@example.com"
PORT: 8081
- RESOURCE_TABLE_ID: "resources"
RUN_AUTOMATED_SCANS: "false"
SQL_BACKEND_DRIVER: "postgres"
SQL_CONNECT_STRING: "host=postgres_db port=5432 user=modron password=docker-test-password database=modron sslmode=disable"
STORAGE: "SQL"
+ TAG_CUSTOMER_DATA: 111111111111/customer_data
+ TAG_EMPLOYEE_DATA: 111111111111/employee_data
+ TAG_ENVIRONMENT: 111111111111/environment
ports:
- "8081:8081"
networks:
- modron
depends_on:
- - postgres_db
+ postgres_db:
+ condition: service_healthy
modron_test:
container_name: e2e_test
build:
- context: src/
- dockerfile: Dockerfile.e2e
+ context: .
+ dockerfile: src/Dockerfile.e2e
environment:
BACKEND_ADDRESS: "modron:8080"
FAKE_BACKEND_ADDRESS: "modron_fake:8081"
diff --git a/docker/Dockerfile.buf b/docker/Dockerfile.buf
new file mode 100644
index 0000000..626f41f
--- /dev/null
+++ b/docker/Dockerfile.buf
@@ -0,0 +1,51 @@
+FROM ubuntu:24.04
+ARG BUF_VERSION="1.46.0"
+ARG BUF_MINISIG_PUBKEY="RWQ/i9xseZwBVE7pEniCNjlNOeeyp4BQgdZDLQcAohxEAH5Uj5DEKjv6"
+ARG PROTOBUF_JS_VERSION="3.21.4"
+ARG GRPC_WEB_VERSION="1.5.0"
+ARG GRPC_GATEWAY_VERSION="2.23.0"
+
+RUN apt-get update && \
+ apt-get install -y \
+ protoc-gen-go \
+ protoc-gen-go-grpc \
+ curl \
+ wget \
+ minisign \
+ perl
+WORKDIR /build
+
+# buf
+RUN wget -q "https://github.com/bufbuild/buf/releases/download/v${BUF_VERSION}/buf-$(uname -s)-$(uname -m)" && \
+ wget -q "https://github.com/bufbuild/buf/releases/download/v${BUF_VERSION}/sha256.txt" && \
+ wget -q "https://github.com/bufbuild/buf/releases/download/v${BUF_VERSION}/sha256.txt.minisig" && \
+ minisign -Vm sha256.txt -P "$BUF_MINISIG_PUBKEY" && \
+ shasum -a 256 -c sha256.txt --ignore-missing && \
+ mv "buf-$(uname -s)-$(uname -m)" /usr/local/bin/buf && \
+ chmod +x /usr/local/bin/buf && \
+ rm *
+
+RUN bash -c "ARCH=$(dpkg --print-architecture); if [ \"\$ARCH\" = \"arm64\" ]; then ARCH=\"aarch_64\"; fi; echo -n \$ARCH > /tmp/arch"
+
+# protobuf-javascript
+RUN wget -q -O /tmp/protobuf-javascript.tar.gz "https://github.com/protocolbuffers/protobuf-javascript/releases/download/v${PROTOBUF_JS_VERSION}/protobuf-javascript-${PROTOBUF_JS_VERSION}-$(uname -s | tr "[:upper:]" "[:lower:]")-$(cat /tmp/arch).tar.gz" && \
+ mkdir /tmp/protobuf-javascript && \
+ tar -xzvf /tmp/protobuf-javascript.tar.gz -C /tmp/protobuf-javascript && \
+ mv /tmp/protobuf-javascript/bin/protoc-gen-js /usr/local/bin/protoc-gen-js && \
+ rm -rf /tmp/protobuf-javascript
+
+# protoc-gen-grpc-web
+RUN wget -q -O /usr/local/bin/protoc-gen-grpc-web "https://github.com/grpc/grpc-web/releases/download/${GRPC_WEB_VERSION}/protoc-gen-grpc-web-${GRPC_WEB_VERSION}-$(uname -s | tr "[:upper:]" "[:lower:]")-$(uname -m)" && \
+ chmod +x /usr/local/bin/protoc-gen-grpc-web
+
+# protoc-gen-grpc-gateway
+RUN wget -q -O /usr/local/bin/protoc-gen-grpc-gateway \
+ "https://github.com/grpc-ecosystem/grpc-gateway/releases/download/v${GRPC_GATEWAY_VERSION}/protoc-gen-grpc-gateway-v${GRPC_GATEWAY_VERSION}-$(uname -s | tr "[:upper:]" "[:lower:]")-$(dpkg --print-architecture)" && \
+ chmod a+x /usr/local/bin/protoc-gen-grpc-gateway
+
+# protoc-gen-openapiv2
+RUN wget -q -O /usr/local/bin/protoc-gen-openapiv2 \
+ "https://github.com/grpc-ecosystem/grpc-gateway/releases/download/v${GRPC_GATEWAY_VERSION}/protoc-gen-openapiv2-v${GRPC_GATEWAY_VERSION}-$(uname -s | tr "[:upper:]" "[:lower:]")-$(dpkg --print-architecture)" && \
+ chmod a+x /usr/local/bin/protoc-gen-openapiv2
+
+ENTRYPOINT [ "buf" ]
diff --git a/docs/FINDINGS.md b/docs/FINDINGS.md
new file mode 100644
index 0000000..178b55d
--- /dev/null
+++ b/docs/FINDINGS.md
@@ -0,0 +1,165 @@
+# Findings
+
+## API_KEY_WITH_OVERBROAD_SCOPE
+The API key grants access to too many different scopes, or is not limited at all in the actions it allows.
+A malicious actor in possession of this key could cause significant damage to your infrastructure.
+
+### Recommendation
+Limit the scope of the API key to the smallest set of actions required by its user.
+A list of scopes is available in the [Google documentation](https://developers.google.com/identity/protocols/oauth2/scopes).
+
+## BUCKET_IS_PUBLIC
+A public bucket means that the content of this bucket is accessible to anybody on the internet.
+Make sure that the content of this bucket is actually intended to be public.
+
+> [!WARNING]
+> Do not assume that files with a cryptic name will never be found: they eventually will. If files should stay private, they should be hosted in a private bucket.
+
+### Recommendation
+Make this bucket private
+
+OR
+
+Make sure that the content of this bucket is intended to be public
+
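If the bucket should be private, the public grants can usually be removed with `gcloud` along these lines (the bucket name is a placeholder, and the role actually bound to `allUsers` may differ):

```shell
# Inspect the current IAM policy for public members (placeholder bucket name).
gcloud storage buckets get-iam-policy gs://my-bucket

# Remove a public grant; repeat for every role bound to allUsers or allAuthenticatedUsers.
gcloud storage buckets remove-iam-policy-binding gs://my-bucket \
    --member=allUsers --role=roles/storage.objectViewer

# Optionally enforce public access prevention going forward.
gcloud storage buckets update gs://my-bucket --public-access-prevention
```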
+## CLUSTER_NODES_HAVE_PUBLIC_IPS
+On GCP, this means that you have a public cluster. There is no reason to have a public cluster today. Services that should be publicly accessible should be exposed using a load balancer.
+
+For Airflow and Dataflow clusters, there is an option you can set when starting the flows to use a private cluster.
+
+### Recommendation
+
+> [!NOTE]
+> There is no way to transform a public cluster into a private one.
+
+1. Create a new private cluster matching the specifications of the existing one
+2. Migrate your workloads to the new cluster
+3. Delete the old public cluster.
+
+## CROSS_PROJECT_PERMISSIONS
+The resource is controlled by an account defined in another project.
+This circumvents the isolation provided by a project.
+
+### Recommendation
+Use only accounts defined in the project to grant write and admin access to a resource.
+
+## DATABASE_ALLOWS_UNENCRYPTED_CONNECTIONS
+All connections to a database should use an encrypted connection. No clear text communication between a workload and a database should be allowed.
+
+### Recommendation
+Configure your database to allow only encrypted connections
+
+## DATABASE_AUTHORIZED_NETWORKS_NOT_SET
+Anyone can connect to this database without limitations.
+
+### Recommendation
+Add a list of IPs or IP ranges from which you expect connections, and allow connections only from these networks.
+This also applies if your database is only reachable from internal IPs.
+
+## EXPORTED_KEY_EXPIRY_TOO_LONG
+An exported key has been around for too long.
+
+Exported keys are immutable credentials that grant whoever holds them time-unbounded access to our infrastructure. As people come and go, it is recommended to rotate credentials regularly to reduce the risk associated with leaks and malicious activity.
+
+### Recommendation
+
+- Rotate these credentials by deleting the existing one and creating a new one
+- Create a process, possibly automated, to rotate credentials in the future and run this process regularly (every 3-6 months)
+
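A rotation sketch with `gcloud` (the service account, project and key names are placeholders):

```shell
# List the user-managed keys of the service account (placeholder account name).
gcloud iam service-accounts keys list \
    --iam-account=my-sa@my-project.iam.gserviceaccount.com --managed-by=user

# Create a replacement key, migrate its consumers, then delete the old key.
gcloud iam service-accounts keys create new-key.json \
    --iam-account=my-sa@my-project.iam.gserviceaccount.com
gcloud iam service-accounts keys delete OLD_KEY_ID \
    --iam-account=my-sa@my-project.iam.gserviceaccount.com
```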
+## EXPORTED_KEY_WITH_ADMIN_PRIVILEGES
+An exported key in that resource group grants administrative privileges. See EXPORTED_KEY_EXPIRY_TOO_LONG, with the added risk that these credentials grant access to deployments, databases and possibly user data.
+
+### Recommendation
+
+1. Remove the admin privileges of that service account or create a new service account with limited privileges
+2. Rotate the key after this has been done
+
+## HUMAN_WITH_OVERPRIVILEGED_BASIC_ROLE
+A human user or a group has one of the following roles at the project level:
+
+- Owner
+- Editor
+- Viewer
+- Security Admin
+
+### Recommendation
+
+Use less privileged roles for humans. The principle of least privilege should be applied to all users.
+
+## LOAD_BALANCER_MIN_TLS_VERSION_TOO_OLD
+The load balancer supports a deprecated TLS version.
+The TLS version was deprecated because it supports broken cryptographic primitives.
+
+### Recommendation
+Define an SSL policy, at the project level or for the load balancer specifically, with a minimum TLS version
+of 1.2 and a MODERN or RESTRICTED profile.
+See [defining an SSL policy](https://cloud.google.com/load-balancing/docs/ssl-policies-concepts#defining_an_ssl_policy) for more information.
+
+## LOAD_BALANCER_USER_MANAGED_CERTIFICATE
+A load balancer has been found with a user-generated certificate.
+
+User-generated certificates carry multiple risks:
+- The cryptographic material associated with that load balancer must be managed manually, leaving the door open to:
+  - Certificate expiry if the certificates are not renewed in time
+  - Credential leakage if anyone with access to the cryptographic material is compromised or ill-intentioned
+
+- The private key has usually been generated on a private machine where
+  - the pseudo-random number generator has not been verified
+  - the entropy might have been too low
+
+This weakens the encryption of the communication between the clients and the load balancer.
+
+### Recommendation
+Migrate to a Google Managed certificate.
+
+## KUBERNETES_VULNERABILITY_SCANNING_DISABLED
+
+The GKE cluster is not using the [workload vulnerability scanning feature](https://cloud.google.com/kubernetes-engine/docs/concepts/about-workload-vulnerability-scanning).
+This means that container image vulnerabilities aren’t surfaced, or are only partially surfaced.
+
+### Recommendation
+
+Follow the steps in [Automatically scan workloads for known vulnerabilities](https://cloud.google.com/kubernetes-engine/docs/how-to/security-posture-vulnerability-scanning)
+to enable it. Modron also suggests a command to run based on the name of your cluster and the project it is in.
+
+## MASTER_AUTHORIZED_NETWORKS_NOT_SET
+The administration interface of your Kubernetes cluster is available to anyone who has access to a Google IP, which effectively means anyone on the internet, since anybody can create a VM on GCP.
+
+### Recommendation
+
+Restrict access to the Kubernetes API to a list of IPs or IP ranges from which you expect connections.
+
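As a sketch, with placeholder cluster, zone and CIDR values:

```shell
# Allow the Kubernetes API to be reached only from a known network.
gcloud container clusters update my-cluster --zone=us-central1-a \
    --enable-master-authorized-networks \
    --master-authorized-networks=203.0.113.0/24
```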
+## OUTDATED_KUBERNETES_VERSION
+
+The version of Kubernetes running in that cluster is no longer supported.
+Running outdated software is a leading source of compromise, and
+running up-to-date software is the first barrier of defence against known vulnerabilities.
+
+### Recommendation
+
+- Update your Kubernetes cluster to a supported version
+- Onboard onto a release channel to benefit from automated updates in the future.
+
+## PRIVATE_GOOGLE_ACCESS_DISABLED
+The mentioned network contains subnets that could reach the Google APIs through private routing, without going through the Internet. It is recommended to use this routing for security and latency reasons.
+
+### Recommendation
+Enable Private Google Access on all your subnetworks.
+
+## SERVICE_ACCOUNT_TOO_HIGH_PRIVILEGES
+This service account has overly broad privileges. In general, we avoid granting service accounts more privileges than they need.
+Often, permissions are granted at the project level when they should be granted at a more granular level.
+For instance, Service Account Token Creator at the project level allows for privilege escalation as it allows the
+service account that has this permission to get a token for any other service account in that project.
+
+Granting this permission at the service account level only allows that service account to get a token
+for another specific service account.
+
+### Recommendation
+Limit the permissions of that service account to the strict minimum required to run its tasks.
+
+## UNUSED_EXPORTED_CREDENTIALS
+An exported credential is still valid but has not been used in a while.
+
+### Recommendation
+It is recommended to delete unused credentials and regenerate a new set when needed, to prevent credential leaks
+and unauthorised access to our infrastructure.
diff --git a/docs/RISK_SCORE.md b/docs/RISK_SCORE.md
new file mode 100644
index 0000000..b9fda96
--- /dev/null
+++ b/docs/RISK_SCORE.md
@@ -0,0 +1,120 @@
+# Risk Score
+
+## Definition
+
+The Risk Score is an indicator that is used to calculate the priority to assign to observations.
+We compute this score as an indicative measure of the risk given the conditions in which the observation was created.
+
+The Risk Score ranges from INFO to CRITICAL.
+
+## Severity
+
+### CVSS v3.x
+
+CVSS v3.x defines a qualitative measure of severity of a vulnerability.
+[As they define in their own documentation](https://nvd.nist.gov/vuln-metrics/cvss), **CVSS is not a measure of risk**.
+
+CVSS v3.x defines the severity scoring as follows:
+
+| Severity | Score Range |
+|----------|:-----------:|
+| NONE | 0.0 |
+| LOW | 0.1 - 3.9 |
+| MEDIUM | 4.0 - 6.9 |
+| HIGH | 7.0 - 8.9 |
+| CRITICAL | 9.0 - 10.0 |
+
+### Modron
+
+In Modron, the Severity matches the CVSS v3.x severity levels - with the exception of the "NONE" severity that
+is called "INFO". Rules can define the severity of the observations they create:
+```go
+ob := &pb.Observation{
+ Name: "BUCKET_IS_PUBLIC",
+ ExpectedValue: "private",
+ ObservedValue: "public",
+ Remediation: &pb.Remediation{
+ Description: "Bucket foo is publicly accessible",
+ Recommendation: "Unless strictly needed, restrict the IAM policy of the bucket",
+ },
+ Severity: pb.Severity_SEVERITY_MEDIUM, // <-- We define the severity here
+}
+```
+
+| Severity |
+|----------|
+| INFO |
+| LOW |
+| MEDIUM |
+| HIGH |
+| CRITICAL |
+
+#### UI representation
+
+In the UI we represent the severities (together with the risk score and the impact) using the following icons:
+
+The screenshot includes a "?" severity - a special case used when the severity is unknown; if it ever
+appears in the UI, it indicates a bug in the code.
+
+
+#### Re-using the CVSS v3.x score
+
+When available, we map the CVSS v3.x score of external sources to the Modron severity - internally, Modron rules
+use the scale directly (e.g. a rule defines its observations to have LOW severity).
+
+## Impact
+
+Since the CVSS score (and the Modron Severity) do not measure risk, we use "facts" to calculate the _Impact_ of
+an observation. The impact, when combined with the severity, is used to determine the Risk Score of an observation.
+
+### Definition
+
+An _Impact_ can only assume three values: LOW, MEDIUM or HIGH.
+
+To define the impact, we collect context, called "facts", about the environment in which the observation was generated.
+We always take the highest impact defined by the list of facts to determine the final impact.
+
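The "take the highest impact" rule can be sketched in Go (the names are hypothetical, and the MEDIUM default for "no facts" is an assumption, not something specified here):

```go
package main

import "fmt"

// Impact mirrors the three impact levels described above.
type Impact int

const (
	ImpactLow Impact = iota
	ImpactMedium
	ImpactHigh
)

// highestImpact returns the most severe impact among the collected facts.
// Returning MEDIUM when no facts apply is an assumed neutral default.
func highestImpact(facts []Impact) Impact {
	if len(facts) == 0 {
		return ImpactMedium
	}
	highest := facts[0]
	for _, f := range facts[1:] {
		if f > highest {
			highest = f
		}
	}
	return highest
}

func main() {
	// Facts from the real-world example below: HIGH, LOW, LOW.
	fmt.Println(highestImpact([]Impact{ImpactHigh, ImpactLow, ImpactLow}) == ImpactHigh) // true
}
```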
+### Facts
+
+An example of a fact is "the workload is in the production environment": such a fact can make a misconfiguration more impactful.
+By leveraging this information we can increase or decrease the risk score.
+
+#### Real-world example
+
+| Kind | Example | Decision |
+|-------------|--------------------------------------------------------------|----------------|
+| Observation | A misconfigured SQL database is accessible from the internet | Severity: HIGH |
+| Fact | The database contains sensitive information | Impact: HIGH |
+| Fact | The database is not used | Impact: LOW |
+| Fact | The database is not used in production | Impact: LOW |
+
+The rule "SQL database is accessible from the internet" defines the severity of the observations as HIGH.
+One of the facts sets the impact to be HIGH, while the other two facts set the impact to be LOW: we always take
+the worst case scenario when defining the impact, so the final impact is HIGH.
+
+We now have the Severity (HIGH) and the Impact (HIGH). We can now calculate the Risk Score.
+
+## Risk Score
+
+The Risk Score calculation is straightforward:
+
+- If the impact is **MEDIUM**, the risk score is equal to the severity
+- If the impact is **HIGH**, the risk score is one category higher than the severity (e.g. **MEDIUM** -> **HIGH**)
+- If the impact is **LOW**, the risk score is one category lower than the severity (e.g. **MEDIUM** -> **LOW**)
+
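The three rules above can be sketched as a small Go function (the names are hypothetical, and clamping at the INFO and CRITICAL boundaries is an assumption for the edge cases not spelled out above):

```go
package main

import "fmt"

// Severity levels, from least to most severe, matching the table above.
type Severity int

const (
	Info Severity = iota
	Low
	Medium
	High
	Critical
)

// Impact levels as defined in the Impact section.
type Impact int

const (
	ImpactLow Impact = iota
	ImpactMedium
	ImpactHigh
)

// riskScore shifts the severity one category up for a HIGH impact and one
// category down for a LOW impact, clamped to the [INFO, CRITICAL] range.
func riskScore(sev Severity, imp Impact) Severity {
	score := sev
	switch imp {
	case ImpactHigh:
		score++
	case ImpactLow:
		score--
	}
	if score < Info {
		score = Info
	}
	if score > Critical {
		score = Critical
	}
	return score
}

func main() {
	fmt.Println(riskScore(High, ImpactHigh) == Critical)   // true: the real-world example above
	fmt.Println(riskScore(Medium, ImpactLow) == Low)       // true
	fmt.Println(riskScore(Medium, ImpactMedium) == Medium) // true
}
```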
+We can therefore define a "Risk Score matrix" as follows:
+
+