More documentation (#280)
* docs: more documentation

* infra: different resources depending on environment + docs

* docs: more documentation about infrastructure and TODOs
guifry authored Feb 7, 2025
1 parent b582ea0 commit 7db04d5
Showing 10 changed files with 168 additions and 158 deletions.
155 changes: 11 additions & 144 deletions README.md
@@ -39,6 +39,12 @@ remembering a user's preferences without repeatedly asking for consent.
clicking a link (eg via a bookmark, or typing in the URL to the address bar
in your browser), your consent preferences will be remembered.

9. **Audit Logging**: Following the CQRS (Command Query Responsibility Segregation) pattern,
whenever consent data is written to the PostgreSQL database, an event is also pushed
to a BigQuery dataset. This provides a complete audit trail of all consent changes,
enabling future analysis and compliance verification if needed.
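
This write path can be sketched as follows (a minimal illustration of the CQRS dual write, with the PostgreSQL table and BigQuery dataset stubbed as in-memory stores; all names here are illustrative, not the service's actual code):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentStore:
    """Stand-ins for the real PostgreSQL table and BigQuery dataset."""
    postgres: dict = field(default_factory=dict)          # current application state
    bigquery_events: list = field(default_factory=list)   # append-only audit log


def set_consent(store: ConsentStore, uid: str, preferences: dict) -> None:
    # Command side: update application state (PostgreSQL in the real service)...
    store.postgres[uid] = preferences
    # ...and push an immutable audit event (BigQuery in the real service).
    store.bigquery_events.append({
        "uid": uid,
        "preferences": preferences,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    })


store = ConsentStore()
set_consent(store, "abc123", {"analytics": True, "settings": False})
set_consent(store, "abc123", {"analytics": False, "settings": False})
```

The key property: state holds only the latest value per user, while the audit log keeps one event for every change.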


## System Architecture

![System Architecture Diagram](docs/diagram.png)
@@ -58,165 +64,26 @@ docker compose up

## Installation

To make use of the Single Consent service on your website, please see the
[Single Consent client Quick Start documentation](client/README.md).

#### Environment variables

- `DATABASE_URL` (default is `postgresql+asyncpg://localhost:5432/consent_api`)
- `ENV` (default is `development`)
- `PORT` (default is `8000`)
- `SECRET_KEY` (default is randomly generated)
- `WEB_CONCURRENCY`: the number of web server worker processes (default is `1`)
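
A sketch of how these settings might be loaded in one place (the variable names and defaults are from the list above; the `load_settings` helper itself is illustrative, not the service's actual code):

```python
import os
import secrets


def load_settings() -> dict:
    """Collect configuration from the environment, using the documented defaults."""
    return {
        "DATABASE_URL": os.environ.get(
            "DATABASE_URL", "postgresql+asyncpg://localhost:5432/consent_api"
        ),
        "ENV": os.environ.get("ENV", "development"),
        "PORT": int(os.environ.get("PORT", "8000")),
        # SECRET_KEY is randomly generated when not provided
        "SECRET_KEY": os.environ.get("SECRET_KEY") or secrets.token_hex(32),
        "WEB_CONCURRENCY": int(os.environ.get("WEB_CONCURRENCY", "1")),
    }


settings = load_settings()
```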

## Development

You can run all the services with no setup required:

```shell
make docker-build
docker compose up
```

### Loading the environment with direnv

When running docker commands, you will need a few extra environment variables.

It's easiest to use [Direnv](https://direnv.net/) to load the environment.

Copy `.envrc.template` to `.envrc` and load it with direnv:

```shell
direnv allow
```

Those variables will be used by both docker-compose and the Makefile.

Additionally, we recommend [hooking direnv with your shell](https://direnv.net/docs/hook.html), for automatic environment loading.

### Run Locally

To run the API locally:

```shell
make install
make run
```

This installs Poetry, our Python dependency manager, along with the project dependencies.

### Testing

Each time a file in the application is modified, the running container restarts
automatically, so you can leave the services running while developing.

#### Unit tests

Run unit tests with the following command:

```sh
make test
```

#### End-to-end tests

##### Running in Docker Compose

You will need to build a Docker image to run the tests against, using the
following command:

```sh
make docker-build
```

You also need the Chrome Docker image on your system, which you can pull with the
following command:

```sh
docker pull selenoid/chrome:110.0
```

> **Note**
> Currently, Selenoid does not provide a Chrome image that works on Apple M1 hosts. As a
> workaround, you can use a third-party Chromium image:
>
> ```sh
> docker pull sskorol/selenoid_chromium_vnc:100.0
> ```
>
> Then set the following environment variable:
>
> ```sh
> export SPLINTER_REMOTE_BROWSER_VERSION=sskorol/selenoid_chromium_vnc:100.0
> ```

The easiest way to run the end-to-end tests is in Docker Compose, using the following
command:

```sh
make test-end-to-end-docker
```

Alternatively, run the integration tests directly against a locally running instance:

```sh
cd apps/consent-api/tests
BASE_URL=http://localhost:8000 poetry run pytest .
```

##### Running locally

To run end-to-end tests you will need Chrome or Firefox installed. Specify which you
want to use for running tests by setting the `SELENIUM_DRIVER` environment variable
(defaults to `chrome`), eg:

```sh
export SELENIUM_DRIVER=firefox
```
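
The driver selection can be sketched as follows (a hypothetical helper mirroring the documented behaviour, not the actual test code):

```python
import os

SUPPORTED = {"chrome", "firefox"}


def choose_driver() -> str:
    """Pick the browser for end-to-end tests from SELENIUM_DRIVER (default: chrome)."""
    driver = os.environ.get("SELENIUM_DRIVER", "chrome").lower()
    if driver not in SUPPORTED:
        raise ValueError(f"unsupported SELENIUM_DRIVER: {driver!r}")
    return driver
```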

You also need a running instance of the Consent API and two instances of webapps
which have the Single Consent client installed.

> **Note**
> For convenience, a dummy service is included in the API.
> You can run two more instances of the Consent API on different port numbers to
> act as dummy services:
>
> ```sh
> CONSENT_API_ORIGIN=http://localho.st:8000 OTHER_SERVICE_ORIGIN=http://localho.st:8082 PORT=8081 make run
> ```
>
> and
>
> ```sh
> CONSENT_API_ORIGIN=http://localho.st:8000 PORT=8082 make run
> ```

The tests expect to find these available at the following URLs:

| Name            | Env var                      | Default                |
| --------------- | ---------------------------- | ---------------------- |
| Consent API     | E2E_TEST_CONSENT_API_URL     | http://localho.st:8000 |
| Dummy service 1 | E2E_TEST_DUMMY_SERVICE_1_URL | http://localho.st:8080 |
| Dummy service 2 | E2E_TEST_DUMMY_SERVICE_2_URL | http://localho.st:8081 |

Due to CORS restrictions, the tests will fail if the URL domain is `localhost` or
`127.0.0.1`, so a workaround is to use `localho.st`, which resolves to `127.0.0.1`.

Run the tests with the following command:

```sh
make test-end-to-end
```
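
The URL resolution described above can be sketched as follows (the environment variable names and defaults come from the table; the `test_urls` helper itself is illustrative):

```python
import os

# Defaults as documented; each can be overridden by its environment variable.
DEFAULTS = {
    "E2E_TEST_CONSENT_API_URL": "http://localho.st:8000",
    "E2E_TEST_DUMMY_SERVICE_1_URL": "http://localho.st:8080",
    "E2E_TEST_DUMMY_SERVICE_2_URL": "http://localho.st:8081",
}


def test_urls() -> dict:
    """Resolve each test URL from the environment, falling back to the default."""
    return {name: os.environ.get(name, default) for name, default in DEFAULTS.items()}
```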

### Branching

This project uses [GitHub Flow](https://githubflow.github.io/):

- The `main` branch is always deployable.
- To work on something new, create a descriptively named branch off `main`.
- Commit to that branch locally and regularly push to the same-named branch on the
  server (GitHub).
- When you need feedback or help, or you think the branch is ready to merge, rebase off
  `main` and open a pull request.
- After the pull request has been reviewed and automated checks have passed, you can
  merge to `main`.
- Commits to `main` are automatically built, deployed and tested in the Integration
  environment. You can also point the integration tests at the cloud instances by
  specifying the URL.

New features are developed on feature branches, which must be rebased on the main branch
and squashed before merging to main.

## Documentation

62 changes: 62 additions & 0 deletions TODO.md
@@ -0,0 +1,62 @@
# Project TODOs and Production Readiness Checklist

## Infrastructure Improvements

### Cloud Run Configuration
- [ ] Fix default image deployment issue
- Current: Terraform deploys hello-world image during updates
- Need: Use latest tag or specified image variable
- Fallback: Use hello-world only if GCR image doesn't exist

### Performance Optimization
- [ ] Implement aggressive scaling strategy
- [ ] Set lower CPU utilization threshold (around 50%) for production
- [ ] Goal: Maintain one spare instance to prevent startup delays
- Note: Only apply to production, not staging/development

- [ ] Optimize instance resources
- Current: 1 vCPU, 1GB RAM per instance
- Proposed: 4 vCPU, 4GB RAM per instance
- Benefits:
- Reduced need for frequent scaling
- Better request latency handling
    - More efficient Uvicorn worker distribution

### Server Optimization
- [ ] Investigate Uvicorn optimization opportunities
- Current: Basic configuration
- Goal: Improve load distribution and reduce latency
- Areas to explore:
- Worker process configuration
- Connection pooling
- Request timeout settings
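
As a starting point for the worker-process investigation, a common rule of thumb (an assumption here, not the service's current configuration) sizes the worker count, and hence `WEB_CONCURRENCY`, from the CPU count:

```python
def suggested_workers(cpu_cores: int) -> int:
    """Common (2 * cores) + 1 heuristic for sizing ASGI/WSGI worker processes.

    This is a rule of thumb only; the right WEB_CONCURRENCY value should be
    confirmed by load testing against real request latency.
    """
    return 2 * cpu_cores + 1


# e.g. for the proposed 4 vCPU instances:
workers = suggested_workers(4)  # 9
```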

## Cost-Performance Balance
- [ ] Evaluate resource allocation strategy
  - Consider the trade-off: fewer, more powerful instances vs. many smaller instances
  - Focus on optimizing Uvicorn configuration for better resource utilization
- Balance between scaling speed and resource efficiency

## Notes for Future Development
- Service not yet in production with departments
- All scaling and performance configurations should be thoroughly tested before production deployment
- Monitor startup times and request latency during peak loads


## CI/CD and Testing Pipeline
- [ ] Migrate deployment scripts to GitHub Actions
- [ ] Set up deployment workflows for each environment
- [ ] Implement proper environment variable handling
- [ ] Add deployment approval gates for production

- [ ] Implement automated testing in CI
- [ ] Run integration tests in GitHub Actions
- [ ] Configure Playwright end-to-end tests
- [ ] Set up test reporting and notifications

## Security and Monitoring
- [ ] Enhance Cloud Armor configuration
- [ ] Test and monitor WAF rules
- [ ] Verify alert configurations
- [ ] Document incident response procedures
- [ ] Set up alert notifications for security events
35 changes: 35 additions & 0 deletions docs/architecture.md
@@ -37,3 +37,38 @@ The Consent API follows Domain-Driven Design (DDD) and Hexagonal Architecture principles.
3. API processes requests through its layered architecture
4. Data is stored in PostgreSQL for application state
5. Consent events are logged to BigQuery for audit purposes

## Infrastructure Scaling Strategy

### Resource Allocation by Environment

The Single Consent service uses environment-specific resource allocation to ensure optimal performance while maintaining cost efficiency. Here's how resources are provisioned across environments:

#### Production Environment
- **Cloud SQL**: 8 vCPU, 16GB RAM
- **Cloud Run**: 3-20 instances, 4 CPU cores and 2GB RAM per container
- **Rationale**:
- High-traffic public service serving millions of UK users
- Critical for maintaining low latency across multiple government domains
- Aggressive scaling strategy (min 3 instances) to handle traffic spikes without cold starts
- Higher resource allocation per instance reduces request latency and improves user experience

#### Staging Environment
- **Cloud SQL**: 2 vCPU, 4GB RAM
- **Cloud Run**: 1-2 instances, 1 CPU core and 512MB RAM per container
- **Purpose**: Testing environment that mirrors production configuration but with reduced resources

#### Development Environment
- **Cloud SQL**: 2 vCPU, 4GB RAM
- **Cloud Run**: 1-2 instances, 1 CPU core and 512MB RAM per container
- **Purpose**: Local development and testing with minimal resource allocation
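
The allocation above can be summarised in code (the values come from the environment tfvars files; the lookup helper itself is illustrative):

```python
# Per-environment Cloud Run allocation, as documented above.
CLOUD_RUN_RESOURCES = {
    "production":  {"min_instances": 3, "max_instances": 20, "cpu": "4000m", "memory": "2048Mi"},
    "staging":     {"min_instances": 1, "max_instances": 2,  "cpu": "1000m", "memory": "512Mi"},
    "development": {"min_instances": 1, "max_instances": 2,  "cpu": "1000m", "memory": "512Mi"},
}


def resources_for(environment: str) -> dict:
    """Look up the Cloud Run resource allocation for an environment."""
    return CLOUD_RUN_RESOURCES[environment]
```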

### Scaling Strategy

The production environment employs an aggressive scaling strategy with a lower CPU utilization threshold (50%) for scaling up. This ensures:
1. Minimal cold starts by maintaining warm instances
2. Faster response to traffic spikes
3. Consistent performance across all government domains
4. Reduced latency for consent checks and updates

This strategy is particularly important for the Single Consent service as it acts as a central point for cookie consent across multiple government domains, where any performance degradation could impact user experience across the entire gov.uk estate.
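
The effect of the 50% utilization target can be sketched as simple capacity arithmetic (illustrative only; Cloud Run's real autoscaler considers more signals than this):

```python
import math


def desired_instances(in_flight: int, concurrency: int = 80,
                      target_utilization: float = 0.5,
                      min_instances: int = 3, max_instances: int = 20) -> int:
    """Instances needed so each one runs at or below the target utilization."""
    needed = math.ceil(in_flight / (concurrency * target_utilization))
    return max(min_instances, min(max_instances, needed))


# 200 concurrent requests at concurrency 80 and a 50% target need 5 instances,
# leaving headroom so a spike does not immediately cause cold starts.
```

Lowering the target from 100% to 50% roughly doubles the instance count for a given load, which is the intended trade of cost for latency headroom.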
6 changes: 6 additions & 0 deletions infra/README.md
@@ -53,6 +53,12 @@ terraform apply
- variables.tf
- terraform.tfvars (update values)

## Resource Management

Infrastructure resources (Cloud Run instances and Cloud SQL databases) are managed through environment-specific variables, allowing flexible resource allocation based on environment needs. This enables appropriate scaling from development to production workloads.

For detailed information about environment-specific hardware specifications and the rationale behind our resource allocation strategy, see the [Infrastructure Scaling Strategy](../docs/architecture.md#infrastructure-scaling-strategy) section in our architecture documentation.

## Module Updates

When making changes to the shared module:
16 changes: 10 additions & 6 deletions infra/environments/development/terraform.tfvars
@@ -1,16 +1,20 @@
environment = "development"
project_id = "sde-consent-api"
region = "europe-west2"
domain_name = "dev.gds-single-consent.app"
db_name = "consent-api"

# Development settings (commented for production testing)
db_tier = "db-custom-2-4096" # 2 vCPU, 4GB RAM for development
db_version = "POSTGRES_14"
db_deletion_protected = false # Allow deletion in development

# Cloud Run configuration for development
min_instances = 1 # Minimum instances for development
max_instances = 2 # Maximum 2 instances for development
container_cpu = "1000m" # 1 CPU core per container
container_memory = "512Mi" # 512MB RAM per container
container_concurrency = 80 # Same concurrency settings

# Production settings for load testing on the development instance
# db_tier = "db-custom-8-16384" # 8 vCPU, 16GB RAM as in production
Expand All @@ -21,4 +25,4 @@ container_concurrency = 80
# container_concurrency = 80 # Production concurrency

# Load testing configuration
load_test_ip = "35.246.19.18" # IP for load testing in development
6 changes: 4 additions & 2 deletions infra/environments/production/terraform.tfvars
@@ -8,8 +8,10 @@ db_version = "POSTGRES_14"
db_deletion_protected = true

# Cloud Run configuration for high throughput
min_instances = 3 # Minimum instances for production load
max_instances = 20 # Scale up to 20 instances for high load
container_cpu = "4000m" # 4 CPU cores per container
container_memory = "2048Mi" # 2GB RAM per container
container_concurrency = 80 # Optimize for throughput

# Production requires no load test IP
Expand Down
14 changes: 8 additions & 6 deletions infra/environments/staging/terraform.tfvars
@@ -1,16 +1,18 @@
environment = "staging"
project_id = "sde-consent-api"
region = "europe-west2"
domain_name = "staging.gds-single-consent.app"
db_name = "consent-api"
db_tier = "db-custom-2-4096" # 2 vCPU, 4GB RAM for staging
db_version = "POSTGRES_14"
db_deletion_protected = true

# Cloud Run configuration for staging
min_instances = 1 # Minimum instances for staging
max_instances = 2 # Maximum 2 instances for staging
container_cpu = "1000m" # 1 CPU core per container
container_memory = "512Mi" # 512MB RAM per container
container_concurrency = 80 # Same concurrency settings

# Load testing configuration
load_test_ip = "35.246.19.18"
7 changes: 7 additions & 0 deletions infra/modules/consent-api/cloud_run.tf
@@ -29,6 +29,13 @@ resource "google_cloud_run_service" "this" {
containers {
image = local.container_image

resources {
limits = {
cpu = var.container_cpu
memory = var.container_memory
}
}

# Mount secrets as environment variables
env {
name = "DB_USER"