feat: support the vfps pseudonym service (#59)
* feat: support for the vfps pseudonym service

* test: improved code coverage
chgl authored Oct 11, 2022
1 parent 91a8b6e commit 7c224cc
Showing 40 changed files with 2,110 additions and 630 deletions.
19 changes: 19 additions & 0 deletions .checkov.yml
@@ -0,0 +1,19 @@
skip-check:
- CKV_DOCKER_3
- CKV_DOCKER_2
# CKV_K8S_21: "The default namespace should not be used" - used for simple testing inside a KinD cluster
- CKV_K8S_21
# CKV_K8S_10: "CPU requests should be set" - ignored for iter8 job pod
- CKV_K8S_10
# CKV_K8S_11: "CPU limits should be set" - ignored for iter8 job pod
- CKV_K8S_11
# CKV_K8S_12: "Memory requests should be set"
- CKV_K8S_12
# CKV_K8S_13: "Memory limits should be set" - ignored for iter8 job pod
- CKV_K8S_13
# CKV_K8S_15: "Image Pull Policy should be Always" - ignored for digest-pinned iter8
- CKV_K8S_15
# CKV_K8S_12: "Memory requests should be set" - ignored for iter8
- CKV_K8S_12
# CKV_K8S_38: "Ensure that Service Account Tokens are only mounted where necessary" - necessary for iter8
- CKV_K8S_38
133 changes: 129 additions & 4 deletions .github/workflows/ci.yaml
@@ -18,6 +18,7 @@ jobs:
pull-requests: write
outputs:
image-tags: ${{ steps.container_meta.outputs.tags }}
image-version: ${{ steps.container_meta.outputs.version }}
steps:
- name: Docker meta
id: container_meta
@@ -112,14 +113,15 @@ jobs:
env:
IS_PULL_REQUEST: ${{ github.event_name == 'pull_request' }}

- name: Build and push
uses: docker/build-push-action@v3
- name: Build and push image
uses: docker/build-push-action@c84f38281176d4c9cdb1626ffafcd6b3911b5d94 # tag=v3
with:
cache-from: type=gha
cache-to: type=gha,mode=max
load: ${{ github.event_name == 'pull_request' }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.container_meta.outputs.tags }}
labels: ${{ steps.container_meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: ${{ steps.platforms.outputs.platforms }}

- name: Add Coverage PR Comment
@@ -129,6 +131,129 @@ jobs:
recreate: true
path: code-coverage-results.md

- name: Save container image as tar archives
if: ${{ github.event_name == 'pull_request' }}
env:
IMAGE: ${{ fromJson(steps.container_meta.outputs.json).tags[0] }}
run: |
docker save "$IMAGE" -o /tmp/image.tar
- name: Upload container image
if: ${{ github.event_name == 'pull_request' }}
uses: actions/upload-artifact@3cea5372237819ed00197afe530f5a7ea3e805c8 # tag=v3.1.0
with:
name: container-image
path: |
/tmp/image.tar
run-iter8-tests:
name: run iter8 tests
runs-on: ubuntu-22.04
if: ${{ github.event_name == 'pull_request' }}
needs:
- build
permissions:
contents: read
pull-requests: write
steps:
- name: Checkout
uses: actions/checkout@2541b1294d2704b0964813337f33b291d3f8596b # tag=v3

- uses: iter8-tools/iter8@v0.11

- name: Create KinD cluster
uses: helm/kind-action@9e8295d178de23cbfbd8fa16cf844eec1d773a07 # tag=v1.4.0
with:
cluster_name: kind

- name: Download container images
uses: actions/download-artifact@fb598a63ae348fa914e94cd0ff38f362e927b741 # tag=v3.0.0
with:
name: container-image
path: /tmp

- name: Load image into KinD
run: |
kind load image-archive /tmp/image.tar
- name: List images in cluster
run: docker exec kind-control-plane crictl images

- name: Install the latest version of vfps as a pseudonymization service
run: |
helm repo add chgl https://chgl.github.io/charts
helm install \
--wait \
--timeout=10m \
vfps chgl/vfps
- name: Install "fhir-pseudonymizer"
env:
IMAGE_TAG: ${{ needs.build.outputs.image-version }}
run: |
helm repo add miracum https://miracum.github.io/charts
helm install \
--set="image.tag=${IMAGE_TAG}" \
-f tests/iter8/values.yaml \
--wait \
--timeout=10m \
fhir-pseudonymizer miracum/fhir-pseudonymizer
- name: Launch iter8 experiment
run: kubectl apply -f tests/iter8/experiment.yaml

- name: Wait for experiment completion
run: iter8 k assert -c completed --timeout 10m

- name: Assert no failures and SLOs are satisfied
run: iter8 k assert -c nofailure,slos

- name: Create iter8 reports
if: always()
run: |
iter8 k report | tee iter8-report.txt
iter8 k report -o html > iter8-report.html
- name: Enhance iter8 report output for use as a PR comment
run: |
ITER8_REPORT_TXT=$(cat iter8-report.txt)
{
echo -e '---';
echo -e '## iter8 report';
echo -e '```console';
echo -e "${ITER8_REPORT_TXT}";
echo -e '```'
} >> iter8-output.md
- name: Append sticky comment with iter8 report
uses: marocchino/sticky-pull-request-comment@39c5b5dc7717447d0cba270cd115037d32d28443 # tag=v2.2.0
if: ${{ github.event_name == 'pull_request' }}
with:
append: true
path: iter8-output.md

- name: Upload report
if: always()
uses: actions/upload-artifact@3cea5372237819ed00197afe530f5a7ea3e805c8 # tag=v3.1.0
with:
name: iter8-report.html
path: |
iter8-report.html
- name: Print cluster and iter8 logs
if: always()
run: |
kubectl cluster-info dump -o yaml | tee kind-cluster-dump.txt
iter8 k log -l trace
- name: Upload cluster dump
if: always()
uses: actions/upload-artifact@3cea5372237819ed00197afe530f5a7ea3e805c8 # tag=v3.1.0
with:
name: kind-cluster-dump.txt
path: |
kind-cluster-dump.txt
release:
needs: build
name: Release
2 changes: 1 addition & 1 deletion .github/workflows/mega-linter.yml
@@ -46,7 +46,7 @@ jobs:
# Upload MegaLinter artifacts
- name: Archive production artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
uses: actions/upload-artifact@3cea5372237819ed00197afe530f5a7ea3e805c8 # tag=v3.1.0
with:
name: MegaLinter reports
path: |
3 changes: 3 additions & 0 deletions .mega-linter.yml
@@ -18,3 +18,6 @@ FILEIO_REPORTER: false

BASH_SHFMT_ARGUMENTS:
- "--indent=2"

REPOSITORY_TRIVY_ARGUMENTS:
- "--severity='MEDIUM,HIGH,CRITICAL'"
6 changes: 6 additions & 0 deletions .protolintrc.yml
@@ -0,0 +1,6 @@
lint:
rules_option:
max_line_length:
max_chars: 120
indent:
not_insert_newline: true
3 changes: 3 additions & 0 deletions .trivyignore
@@ -0,0 +1,3 @@
# iter8 requires access to secrets
AVD-KSV-0041
KSV041
3 changes: 2 additions & 1 deletion Dockerfile
@@ -27,7 +27,8 @@ RUN dotnet test \
--configuration=Release \
--collect:"XPlat Code Coverage" \
--results-directory=./coverage \
-l "console;verbosity=detailed"
-l "console;verbosity=detailed" \
--settings=runsettings.xml

FROM runtime
COPY --from=build /build/publish/*anonymization.yaml /etc
82 changes: 71 additions & 11 deletions README.md
@@ -4,7 +4,7 @@

> Send a FHIR® resource to `/fhir/$de-identify` get it back anonymized and/or pseudonymized.
Based on the brilliant [FHIR Tools for Anonymization](https://github.com/microsoft/FHIR-Tools-for-Anonymization/).
Based on the brilliant [Tools for Health Data Anonymization](https://github.com/microsoft/Tools-for-Health-Data-Anonymization).

## Usage

@@ -19,7 +19,7 @@ Container images are pushed to the following registries:
- `quay.io/miracum/fhir-pseudonymizer:latest`
- `harbor.miracum.org/miracum-etl/fhir-pseudonymizer:latest`

For deployment in Kubernetes see <https://github.com/miracum/charts/tree/master/charts/fhir-gateway> for a Helm Chart using the FHIR Pseudonymizer as one of its components.
For deployment in Kubernetes see <https://github.com/miracum/charts/tree/master/charts/fhir-pseudonymizer> for a Helm Chart deploying the FHIR Pseudonymizer.

### API Endpoints

@@ -29,11 +29,11 @@ An OpenAPI definition for the FHIR operation endpoints is available at `/swagger

#### `$de-identify`

The server provides a `/fhir/$de-identify` operation to de-identify received FHIR resources according to the configuration in the [anonymization.yaml](src/FhirPseudonymizer/anonymization.yaml) rules. See <https://github.com/microsoft/FHIR-Tools-for-Anonymization/> for more details on the anonymization rule configuration.
The server provides a `/fhir/$de-identify` operation to de-identify received FHIR resources according to the configuration in the [anonymization.yaml](src/FhirPseudonymizer/anonymization.yaml) rules. See [Tools for Health Data Anonymization](https://github.com/microsoft/Tools-for-Health-Data-Anonymization) for more details on the anonymization rule configuration.

The service comes with a sample configuration file to help meet the requirements of HIPAA Safe Harbor Method (2)(i): [hipaa-anonymization.yaml](src/FhirPseudonymizer/hipaa-anonymization.yaml). This configuration can be used by setting `ANONYMIZATIONENGINECONFIGPATH=/etc/hipaa-anonymization.yaml`.

A new `pseudonymize` method was added to the default list of anonymization methods linked above. It uses [gPAS](https://www.ths-greifswald.de/en/researchers-general-public/gpas/) to create pseudonyms and replace the values in the resource with them.
A new `pseudonymize` method was added to the default list of anonymization methods linked above. It uses either [gPAS](https://www.ths-greifswald.de/en/researchers-general-public/gpas/) or [Vfps](https://github.com/chgl/vfps) to create pseudonyms and replace the values in the resource with them.
For example, the following rule replaces all identifiers of type `http://terminology.hl7.org/CodeSystem/v2-0203|MR` with a pseudonym generated in the `PATIENT` domain.

```yaml
@@ -45,7 +45,10 @@ fhirPathRules:
Note that if the `domain` setting is omitted, and an ID or reference is pseudonymized, then the resource name is used as the pseudonym domain. For example, pseudonymizing `"reference": "Patient/123"` will try to create a pseudonym for `123` in the `Patient` domain.

Note that all methods defined in [FHIR-Tools-for-Anonymization](https://github.com/microsoft/FHIR-Tools-for-Anonymization/) are supported. For example, to clamp a patient's birthdate if they were born before January 1st 1931 to 01/01/1930, use:
When using [Vfps](https://github.com/chgl/vfps), the `domain` setting can instead also be set as `namespace`.

Note that all methods defined in [Tools for Health Data Anonymization](https://github.com/microsoft/Tools-for-Health-Data-Anonymization) are supported.
For example, to clamp a patient's birthdate if they were born before January 1st 1931 to 01/01/1930, use:

```yaml
fhirPathRules:
@@ -77,16 +80,35 @@ Additionally, there are some optional configuration values that can be set as en

| Environment Variable | Description | Default |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------- |
| `gPAS__Url` | The gPAS TTP FHIR Gateway URL. Only required if any of the anonymization.yaml rules use the `pseudonymize` method. | `""` |
| `gPAS__Auth__Basic__Username` | The HTTP basic auth username to connect to gPAS | `""` |
| `gPAS__Auth__Basic__Password` | The HTTP basic auth password to connect to gPAS | `""` |
| `AnonymizationEngineConfigPath` | Path to the `anonymization.yaml` that contains the rules to transform the resources. | `"/etc/anonymization.yaml"` |
| `ApiKey` | Key that must be set in the `X-Api-Key` header to allow requests to protected endpoints. | `""` |
| `gPAS__Version` | Version of gPAS to support. There were breaking changes to the FHIR API in 1.10.2 and 1.10.3, so explicitly set this value if you are using a later version than 1.10.1. | `"1.10.1"` |
| `UseSystemTextJsonFhirSerializer` | Enable the new `System.Text.Json`-based FHIR serializer to significantly [improve throughput and latencies](#usesystemtextjsonfhirserializer). See <https://github.com/FirelyTeam/firely-net-sdk/releases/tag/v4.0.0-r4> | `false` |
| `PseudonymizationService` | The type of pseudonymization service to use. Can be one of `gPAS`, `Vfps`, `None` | `"gPAS"` |

See [appsettings.json](src/FhirPseudonymizer/appsettings.json) for additional options.
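
As a rough sketch of how a client might use the `ApiKey` setting, the helper below posts a FHIR resource file to `/fhir/$de-identify` and sends the key in the `X-Api-Key` header. The function names, the `BASE_URL` and `API_KEY` variables, and the `localhost:5000` address are illustrative assumptions, not part of the project; whether a particular endpoint requires the key depends on your deployment.

```sh
# Placeholder values (assumptions): point these at your own deployment.
BASE_URL="${BASE_URL:-http://localhost:5000}"
API_KEY="${API_KEY:-}"

# Build the operation URL. The `$` is part of the FHIR operation name,
# so it is kept literal via single quotes.
de_identify_url() {
  printf '%s/fhir/$de-identify' "$BASE_URL"
}

# Hypothetical helper: POST a FHIR resource file and print the response.
de_identify() {
  resource_file="$1"
  curl --fail --silent --show-error \
    -X POST \
    -H "Content-Type: application/fhir+json" \
    -H "X-Api-Key: ${API_KEY}" \
    --data-binary "@${resource_file}" \
    "$(de_identify_url)"
}

# Usage (requires a running server): de_identify bundle.json
```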

The application supports pseudonymization using either [gPAS](https://www.ths-greifswald.de/forscher/gpas/) or [Vfps](https://github.com/chgl/vfps) which can be configured via the `PseudonymizationService` setting.
Service-specific configuration settings are listed below.

### gPAS

| Environment Variable | Description | Default |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
| `gPAS__Url` | The gPAS TTP FHIR Gateway URL. Only required if any of the anonymization.yaml rules use the `pseudonymize` method. | `""` |
| `gPAS__Auth__Basic__Username` | The HTTP basic auth username to connect to gPAS | `""` |
| `gPAS__Auth__Basic__Password` | The HTTP basic auth password to connect to gPAS | `""` |
| `gPAS__Version` | Version of gPAS to support. There were breaking changes to the FHIR API in 1.10.2 and 1.10.3, so explicitly set this value if you are using a later version than 1.10.1. | `"1.10.1"` |

### Vfps

| Environment Variable | Description | Default |
| ----------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
| `Vfps__Address` | The Vfps service address. Use `dns:///` scheme for client-side load-balancing. | `""` |
| `Vfps__UnsafeUseInsecureChannelCallCredentials` | If set to `true`, `CallCredentials` are applied to gRPC calls made by an insecure channel. Sending authentication headers over an insecure connection has security implications and shouldn't be done in production environments. | `true` |
| `Vfps__UseTls` | If set to `true`, creates client-side SSL credentials loaded from disk file pointed to by the `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH` environment variable. If that fails, gets the roots certificates from a well known place on disk. | `false` |
| `Vfps__Auth__Basic__Username` | The HTTP basic auth username to connect to the Vfps service. Used in the `Authorization: Basic` metadata header value for the gRPC calls. | `""` |
| `Vfps__Auth__Basic__Password` | The HTTP basic auth password to connect to the Vfps service. | `""` |
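
For illustration, selecting Vfps through environment variables could look roughly like the following. The host name and port in `Vfps__Address` are made-up placeholders; only the variable names and the `dns:///` scheme come from the tables above.

```sh
# Select Vfps as the pseudonymization backend.
export PseudonymizationService="Vfps"

# Placeholder address (assumption): replace with your Vfps gRPC endpoint.
export Vfps__Address="dns:///vfps.example.internal:8081"

# Keep TLS off only for local testing; enable it for real deployments.
export Vfps__UseTls="false"

# Print the resulting settings for a quick sanity check.
env | grep -E '^(PseudonymizationService|Vfps__)' | sort
```

The same variables can be passed to a container runtime or set through Helm values instead of being exported in a shell.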

## Dynamic rule settings

Anonymization and pseudonymization rules in the `anonymization.yaml` config file can be overridden and/or extended on a per request basis.
@@ -217,6 +239,44 @@ pre-commit install
pre-commit install --hook-type commit-msg
```

### Run iter8 SLO experiments locally

```sh
kind create cluster
export IMAGE_TAG="iter8-test"
docker build -t ghcr.io/miracum/fhir-pseudonymizer:${IMAGE_TAG} .
kind load docker-image ghcr.io/miracum/fhir-pseudonymizer:${IMAGE_TAG}
helm repo add chgl https://chgl.github.io/charts
helm repo add miracum https://miracum.github.io/charts
helm repo update
helm install \
--wait \
--timeout=10m \
vfps chgl/vfps
helm upgrade --install \
--set="image.tag=${IMAGE_TAG}" \
-f tests/iter8/values.yaml \
--wait \
--timeout=10m \
fhir-pseudonymizer miracum/fhir-pseudonymizer
kubectl apply -f tests/iter8/experiment.yaml
iter8 k assert -c completed --timeout 15m
iter8 k assert -c nofailure,slos
iter8 k report
# to restart:
kubectl delete job default-1-job
kubectl apply -f tests/iter8/experiment.yaml
```

## Benchmark

> **Note**
@@ -300,8 +360,8 @@ cosign verify --key https://miracum.github.io/cosign.pub ghcr.io/miracum/fhir-ps
## Semantic versioning exclusion policies

The project's versioning follows the [SemVer](https://semver.org/) convention.
However, we exclude metrics (i.e. anything under the `/metrics` endpoint), traces,
and the contents of the container image from this. Always be prepared to double-check the release notes before updating.
However, we exclude metrics (i.e. anything under the `/metrics` endpoint), traces, and the contents of the container image from this.
Always be prepared to double-check the release notes before updating.

## Attribution

11 changes: 6 additions & 5 deletions benchmark/bombardier.sh
@@ -3,8 +3,9 @@
RESOURCE_PATH=${RESOURCE_PATH:-bundle.json}

bombardier -f "${RESOURCE_PATH}" \
-H "Content-Type:application/fhir+json" \
-m POST \
-d 60s \
-l \
"http://localhost:5000/fhir/\$de-identify"
--timeout=10s \
-H "Content-Type:application/fhir+json" \
-m POST \
-d 60s \
-l \
"http://localhost:5000/fhir/\$de-identify"