This repository provides a Docker instance of Synthea, a tool for generating synthetic patient records.
0) Prerequisites:
- An up-to-date Docker instance (tested with v20.10.14).
- git installed.
- A stable internet connection.
- Only for Linux users: Please provide Docker and its corresponding user group(s) read AND write access to your file system (specifically: The folder from which you start your Docker containers and all its subdirectories).
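The prerequisites above can be checked before starting. The following is a minimal sketch, not part of this repository; the function names check_tool and check_dir_access are hypothetical helpers introduced here for illustration. Note that check_dir_access only tests access for the user running the script, which may differ from what the Docker daemon and its group(s) are allowed to do.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command-line tool is on the PATH.
# Returns 0 if the tool is found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Hypothetical helper: report whether a directory is readable and writable
# for the user running this script (not necessarily for the Docker daemon).
check_dir_access() {
    if [ -r "$1" ] && [ -w "$1" ]; then
        echo "read/write ok: $1"
    else
        echo "no read/write access: $1"
        return 1
    fi
}
```

After sourcing the functions, a possible check is: check_tool docker; check_tool git; check_dir_access . (run from the folder in which you plan to start the container).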
1) Clone this repository:
Use either of the following two commands to clone this repository to your computer:
git clone https://github.com/hpi-dhc/synthea-v270.git
git clone git@github.com:hpi-dhc/synthea-v270.git
On your machine, cd to the cloned folder and create an empty folder called output.
mkdir output
This folder is needed to store the output generated by the dockerized Synthea.
2) Build Docker image:
Initially, you need to build the Docker image (name: synthea-v270) from the Dockerfile provided within this repository. Make sure that your working directory is this folder.
docker build -t synthea-v270 .
3) Run image as a Docker container:
Once the image has been built, run it as a container (here named 'synthea'). The command below mounts the synthea.properties file into the container and binds the container's output folder to your host file system, so that you can access the data in a later project. The synthea.properties file shipped with this repository uses default settings, except for the parameter that produces additional .csv output for further processing in an ETL pipeline for an OMOP CDM-formatted database.
Select the appropriate populationSize (e.g., here: 123).
If you wish to see the console output of Synthea, remove the -d flag from the command below.
The container is automatically stopped and removed once the data has been generated.
docker run --rm -d --name synthea \
--mount type=bind,source=$(pwd)/synthea.properties,target=/app/synthea.properties,readonly \
--mount type=bind,source=$(pwd)/output,target=/app/output \
-e populationSize=123 \
synthea-v270
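If the population size changes between runs, the command above can be wrapped in a small function. This is only a sketch under the assumption that the image and file names are exactly those used in this README; run_synthea and is_positive_int are hypothetical names, not something shipped with the repository. The wrapper rejects values that are not positive integers before Docker is invoked.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the argument is a positive integer.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -gt 0 ]
}

# Hypothetical wrapper around the docker run command from this README.
# Usage: run_synthea <populationSize>
run_synthea() {
    size="$1"
    if ! is_positive_int "$size"; then
        echo "populationSize must be a positive integer, got: '$size'" >&2
        return 1
    fi
    # Same options as the documented command; only the size is variable.
    docker run --rm -d --name synthea \
        --mount type=bind,source="$(pwd)"/synthea.properties,target=/app/synthea.properties,readonly \
        --mount type=bind,source="$(pwd)"/output,target=/app/output \
        -e populationSize="$size" \
        synthea-v270
}
```

For example, run_synthea 500 starts a detached run for 500 patients. While the detached container is still running, docker logs -f synthea follows its console output.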
4) Access your files:
The synthetic patient data output (persisted as .csv files on the host machine) is available in the ./output/csv folder for inspection and further processing in other projects, e.g., the OMOP import container environment.
The corresponding JSON-formatted FHIR (v4) data is in the ./output/fhir folder.
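A quick sanity check of the generated data is to count the records in one of the .csv files. The sketch below assumes that each file starts with a single header line and that Synthea's CSV export contains a patients.csv file; count_records is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the number of data rows in a CSV file,
# i.e., all lines except the first (header) line.
count_records() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}
```

For example, count_records ./output/csv/patients.csv prints the number of exported patients. The count need not match populationSize exactly, for example if Synthea also exports deceased patients.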
- All settings for Synthea can be changed in the ./synthea.properties file, e.g., when you wish to have additional STU 3- or DSTU 2-formatted FHIR output. All FHIR JSON files will also be in the ./output folder.
- Every time you build a new cohort, you need to delete the content of the ./output folder on your host prior to executing the script.
- The only Synthea version currently (2022-04-05) compatible with the OHDSI ETL-Synthea scripts is v2.7.0, even though Synthea is already at v3.x. One of the notable differences is that the newer versions use a slightly different syntax for the synthea.properties file.
- If not otherwise specified, all commands are executed on the host machine with the working directory being the cloned repository.
- This repository is intended for local use only. Even though easily implementable best practices for creating Dockerfiles were followed, deployment in a production setting would require additional security mechanisms.
- This project intentionally refrains from using a copyleft license. Nevertheless, all users are kindly invited to contribute to the project, specifically to leave a note to the author if you find parts of the code to be broken or the explanations in this README ambiguous.
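The note above about emptying the ./output folder before building a new cohort could be scripted as follows. This is a sketch, not part of the repository: clear_output is a hypothetical helper, and it relies on the -mindepth and -delete options, which GNU and BSD find support but POSIX does not specify. The folder itself is kept so that it can still be mounted.

```shell
#!/bin/sh
# Hypothetical helper: delete everything inside a folder but keep the folder.
# Usage: clear_output [dir]   (defaults to ./output)
clear_output() {
    dir="${1:-./output}"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # -mindepth 1 excludes the folder itself; -delete removes its contents.
    find "$dir" -mindepth 1 -delete
}
```

Run clear_output from the cloned repository before starting the container again. On Linux, files written by the container may belong to a different user, in which case the deletion may need elevated permissions.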
Copyright 2022 Hasso Plattner Institute
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
(Written by Jan Philipp Sachs on May 17, 2022)