The Materials Data Facility Connect service provides the ETL flow that deeply indexes datasets into MDF Search. It is not intended to be run by end users. To submit data to the MDF, visit the Materials Data Facility.
The MDF Connect service is a serverless REST service deployed on AWS. It consists of an AWS API Gateway that uses a Lambda function to authenticate requests against Globus Auth. If the request is authorized, the endpoint triggers an AWS Lambda function. Each endpoint is implemented as a Lambda function contained in a Python file in the aws/ directory. The Lambda functions are deployed via GitHub Actions, as described in a later section.
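The authorizer pattern can be sketched roughly as below. This is an illustrative assumption about how the real authorizer works, not the service's actual code: the client credentials are placeholders, and the Globus SDK introspection call stands in for whatever the deployed Lambda does.

```python
# Sketch of an API Gateway Lambda authorizer checking a bearer token
# against Globus Auth. CLIENT_ID/CLIENT_SECRET are placeholders.
CLIENT_ID = "YOUR_GLOBUS_CLIENT_ID"      # placeholder
CLIENT_SECRET = "YOUR_GLOBUS_SECRET"     # placeholder

def build_policy(principal_id, effect, method_arn):
    """Return the IAM policy document API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": method_arn,
            }],
        },
    }

def lambda_handler(event, context):
    import globus_sdk  # imported lazily so build_policy stays dependency-free
    token = event["authorizationToken"].replace("Bearer ", "")
    client = globus_sdk.ConfidentialAppAuthClient(CLIENT_ID, CLIENT_SECRET)
    introspection = client.oauth2_token_introspect(token)
    if introspection.get("active"):
        return build_policy(introspection["sub"], "Allow", event["methodArn"])
    return build_policy("anonymous", "Deny", event["methodArn"])
```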
The API Endpoints are:
- POST /submit: Submits a dataset to the MDF Connect service. This triggers a Globus Automate flow
- GET /status: Returns the status of a dataset submission
- POST /submissions: Forms a query and returns a list of submissions
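A client-side sketch of calling these endpoints follows; the base URL and the shape of the status path are hypothetical assumptions, and the requests are built but not sent so the example stays self-contained.

```python
import json
import urllib.request

# Hypothetical base URL; the real service URL may differ.
MDF_CONNECT_URL = "https://api.materialsdatafacility.org"

def build_submit_request(token, submission):
    """Build an authenticated POST /submit request (not yet sent)."""
    return urllib.request.Request(
        url=f"{MDF_CONNECT_URL}/submit",
        data=json.dumps(submission).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def build_status_request(token, source_id):
    """Build a GET /status request for one submission (path shape is assumed)."""
    return urllib.request.Request(
        url=f"{MDF_CONNECT_URL}/status/{source_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

# Sending is then a one-liner:
# response = urllib.request.urlopen(build_status_request(token, "my_dataset"))
```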
The Globus Automate flow is a series of steps triggered by the POST /submit endpoint. The flow is defined using a Python DSL that can be found in automate/minimus_mdf_flow.py. At a high level, the flow:
- Notifies the admin that a dataset has been submitted
- Checks whether the data files have been updated or if this is a metadata-only submission
- If there is a dataset, it starts a Globus transfer
- Once the transfer is complete it may trigger a curation step if the organization is configured to do so
- A DOI is minted if the organization is configured to do so
- The dataset is indexed in MDF Search
- The user is notified of the completion of the submission
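The branching above can be sketched as plain Python. Every helper name here is a hypothetical stand-in for a real flow state; each one just records the step so the ordering and conditionals are visible.

```python
# Illustrative sketch of the Automate flow's control flow.
log = []

def notify_admin(sub): log.append("notify_admin")
def wait_for_transfer(task_id): log.append("wait")
def run_curation(sub): log.append("curation")
def index_in_search(sub): log.append("index")
def notify_user(sub): log.append("notify_user")

def start_globus_transfer(sub):
    log.append("transfer")
    return "transfer-task-id"

def mint_doi(sub):
    log.append("mint_doi")
    return "10.xxxx/example"  # placeholder DOI

def run_flow(submission, org):
    """Mirror the high-level branching described above."""
    notify_admin(submission)
    if not submission.get("metadata_only"):
        task = start_globus_transfer(submission)
        wait_for_transfer(task)
    if org.get("curation_enabled"):
        run_curation(submission)
    if org.get("mint_doi"):
        submission["doi"] = mint_doi(submission)
    index_in_search(submission)
    notify_user(submission)
    return log
```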
Changes should be made in a feature branch based off of the dev branch. Create a PR and have a colleague review your changes. Once the PR is approved, merge it into the dev branch, which is automatically deployed to the dev environment. Once the changes have been tested in the dev environment, create a PR from dev to main. Once that PR is approved, merge it into main; the main branch is automatically deployed to the prod environment.
The MDF Connect service is deployed on AWS into development and production environments. The automate flow is deployed into the Globus Automate service via a second GitHub action.
Changes to the automate flow are deployed via a GitHub action, triggered by the push of a new GitHub release. If the release is tagged as "pre-release" it will be deployed to the dev environment, otherwise it will be deployed to the prod environment.
The flow IDs for dev and prod are stored in automate/mdf_dev_flow_info.json and automate/mdf_prod_flow_info.json respectively, under the flow_id key.
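A minimal sketch of reading the deployed flow ID from these files (the helper name is hypothetical):

```python
import json

def load_flow_id(environment):
    """Read the deployed flow ID for 'dev' or 'prod' from its JSON info file."""
    path = f"automate/mdf_{environment}_flow_info.json"
    with open(path) as fh:
        return json.load(fh)["flow_id"]
```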
- Merge your changes into the dev branch.
- On the GitHub website, click the Releases link on the repo home page.
- Click the "Draft a new release" button.
- Fill in the tag version as X.Y.Z-alpha.1, where X.Y.Z is the version number. You can use subsequent alpha tags if you need to make further changes.
- Fill in the release title and description.
- Select dev as the target branch.
- Check the "Set as a pre-release" checkbox.
- Click the "Publish release" button.
- Merge your changes into the main branch.
- On the GitHub website, click the Releases link on the repo home page.
- Click the "Draft a new release" button.
- Fill in the tag version as X.Y.Z, where X.Y.Z is the version number.
- Fill in the release title and description.
- Select main as the target branch.
- Check the "Set as the latest release" checkbox.
- Click the "Publish release" button.
You can verify deployment of the flows in the Globus Automate Console.
The MDF Connect service is deployed via a GitHub action. The action is triggered by a push to the dev or main branch. The action will deploy the service to the dev or prod environment respectively.
Schemas and the MDF organization database are managed in the automate branch of the Data Schemas Repo.
The schemas are packaged into the Docker images used to serve the Lambda functions.
To run the tests, first make sure that you are running Python 3.7.10. Then install the dependencies:
$ cd aws/tests
$ pip3 install -r requirements-test.txt
Now you can run the tests using the command:
$ PYTHONPATH=.. python -m pytest --ignore schemas
This work was performed under financial assistance awards 70NANB14H012 and 70NANB19H005 from the U.S. Department of Commerce, National Institute of Standards and Technology, as part of the Center for Hierarchical Materials Design (CHiMaD). This work was also supported by the National Science Foundation as part of the Midwest Big Data Hub under NSF Award Number 1636950, "BD Spokes: SPOKE: MIDWEST: Collaborative: Integrative Materials Design (IMaD): Leverage, Innovate, and Disseminate".