This repository hosts general tutorials on the OPTIMADE specification and particular database implementations of the API. These open-ended exercises were initially provided to accompany the following workshops:
- NOMAD CoE Tutorial 6: OPTIMADE, 7-8 September 2021
- ICTP-EAIFR Training School: Working with Materials Databases and OPTIMADE, November-December 2021.
- CECAM Flagship Workshop Open Databases Integration for Materials Design, May 30, 2022 - June 3, 2022.
- Actively Learning Materials Science, Aalto University, February 27, 2023 - March 3, 2023.
This document is hosted on GitHub, and all feedback or suggestions for new exercises can be provided as an issue or pull request in that repository.
If you would like to get involved with the OPTIMADE consortium, you can find some more details on the OPTIMADE home page.
Contributors to these tutorials:
- Matthew Evans, UCLouvain (repository and general exercises)
- Matthew Horton, LBNL (pymatgen exercise)
- Evgeny Blokhin, Tilde Materials Informatics (typos and bug fixes)
- Cormac Toher, Duke University (AFLOW exercise)
- Abhijith Gopakumar, Northwestern U. (OQMD exercise)
- Johan Bergsma, CECAM (typos, testing and feedback)
The OPTIMADE specification defines a web-based JSON API that is
implemented by many different materials databases to allow users
to query the underlying data with the same syntax and response format.
There are several tools that can access these APIs, for example, any web
browser, any programming language that can make HTTP requests, or common
command-line tools such as curl or wget.
There are also specialist tools developed by members of the OPTIMADE community. You may have heard about several such tools in other tutorials and talks:
- The Materials Cloud web-based OPTIMADE client.
- The optimade.science web-based aggregator.
- pymatgen's built-in OPTIMADE client.
- optimade-python-tools' OptimadeClient.
Some of these clients can send requests to multiple OPTIMADE providers simultaneously, based on the programmatic providers list. You can explore this list at the human-readable providers dashboard, where you can see that, at the time of writing, the total number of structures served via OPTIMADE exceeds 26 million!
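Because the providers list is itself a JSON:API document, a script can read it directly. The sketch below assumes it is published at https://providers.optimade.org/providers.json and that each entry carries a base_url attribute (for many providers this points at an index meta-database that in turn links to the actual databases). The helper names are illustrative, and the example response is hand-written, not real data.

```python
import json
import urllib.request

# Assumption: location of the machine-readable providers list at the time of writing.
PROVIDERS_URL = "https://providers.optimade.org/providers.json"


def fetch_json(url, timeout=30):
    """Download and decode a JSON document (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def provider_base_urls(providers):
    """Map provider id -> base_url, skipping providers that have no base_url."""
    urls = {}
    for entry in providers.get("data", []):
        base_url = (entry.get("attributes") or {}).get("base_url")
        if base_url:  # some registered providers do not serve an API (yet)
            urls[entry["id"]] = base_url
    return urls


# Offline example: a hand-written, trimmed-down response (not real data).
example_providers = {
    "data": [
        {"id": "exmpl", "type": "links",
         "attributes": {"name": "Example provider",
                        "base_url": "https://example.org/optimade"}},
        {"id": "nourl", "type": "links",
         "attributes": {"name": "Provider without an API", "base_url": None}},
    ]
}
print(provider_base_urls(example_providers))
# With network access: provider_base_urls(fetch_json(PROVIDERS_URL))
```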
You may wish to familiarise yourselves with the OPTIMADE API by writing your own queries, scripts or code. Some possible options:
- Craft (or copy) your own URL queries to a particular OPTIMADE implementation. Some web browsers (e.g., Firefox) will automatically format the JSON response for you (see Exercise 1).
- Use command-line tools such as curl or wget to receive data in your terminal, or pipe it to a file. You could use the tool jq to format the JSON response.
- Make an appropriate HTTP request from your programming language of choice. For Python, you could use the standard library urllib.request or the more ergonomic external libraries requests and httpx. Some example code for Python is provided below the exercises. In JavaScript, you can just use fetch(...) or a more advanced OPTIMADE client such as that provided by Tilde Informatics' optimade-client.
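For the standard-library route, the main detail is URL-encoding the filter, which contains spaces, quotes and commas. A minimal sketch using only urllib, assuming the usual /v1/structures endpoint layout; the function names are illustrative, and fetch_structures needs network access.

```python
import json
import urllib.parse
import urllib.request


def structures_query_url(base_url, filter_string, page_limit=5):
    """Build a /v1/structures URL with the filter percent-encoded."""
    query = urllib.parse.urlencode(
        {"filter": filter_string, "page_limit": page_limit},
        quote_via=urllib.parse.quote,  # encode spaces as %20 rather than "+"
    )
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


def fetch_structures(base_url, filter_string, page_limit=5, timeout=30):
    """Send the query and return the decoded JSON response (needs network access)."""
    url = structures_query_url(base_url, filter_string, page_limit)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = structures_query_url(
    "https://optimade.materialsproject.org", 'elements HAS ALL "Ga", "N"'
)
print(url)
```

The printed URL can be pasted into a browser or passed to curl, which is a handy way to check that the encoding round-trips.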
If you are following these tutorials as part of a school or workshop, please do not hesitate to ask about how to get started with any of the above tools!
The aim of this exercise is to familiarise yourself with the OPTIMADE JSON API. In the recent OPTIMADE paper [1], we provided the number of results to a set of queries across all OPTIMADE implementations, obtained by applying the same filter to the structures endpoint of each database. The filters are:
- Query for structures containing a group IV element: elements HAS ANY "C", "Si", "Ge", "Sn", "Pb"
- As above, but return only binary phases: elements HAS ANY "C", "Si", "Ge", "Sn", "Pb" AND nelements=2
- This time, exclude lead and return ternary phases: elements HAS ANY "C", "Si", "Ge", "Sn" AND NOT elements HAS "Pb" AND elements LENGTH 3

- In your browser, try visiting the links in Table 1 of the OPTIMADE paper [1] (clickable links in the arXiv version [2]), which is reproduced below.
  - Familiarise yourself with the standard JSON:API output fields (data, meta and links).
  - You will find the crystal structures returned for the query as a list under the data key, with the OPTIMADE-defined fields listed under the attributes of each list entry.
  - The meta field provides useful information about your query, e.g. data_returned shows how many results there are in total, not just in the current page of the response (you can check whether the table still contains the correct number of entries, or whether it is now out of date).
  - The links field provides links to the next or previous pages of your response, in case you requested more structures than the page_limit for that implementation.
- Choose one particular entry to focus on: replace the ?filter=... URL parameter with /<structure_id>, where <structure_id> is the id of one particular structure (e.g. https://example.org/optimade/v1/structures/<structure_id>).
- Explore other endpoints provided by each of these providers. If they serve "extra" fields (i.e. those containing the provider prefix), try to find out what these fields mean by querying the /info/structures endpoint.
- Try performing the same queries with some of the tools listed above, or in scripts of your own design.
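As a starting point for your own scripts, the sketch below pulls out the pieces of a response discussed above (the ids under data, meta's data_returned and the next link) and builds a single-entry URL. It assumes a JSON:API response already decoded into a Python dict; the helper names and the example response are hypothetical.

```python
def summarise_response(response):
    """Collect the JSON:API pieces discussed above from one response page."""
    data = response.get("data") or []
    if isinstance(data, dict):  # single-entry endpoints return one object
        data = [data]
    next_link = (response.get("links") or {}).get("next")
    # JSON:API allows a link to be either a URL string or an object with "href"
    if isinstance(next_link, dict):
        next_link = next_link.get("href")
    return {
        "ids": [entry["id"] for entry in data],
        "data_returned": (response.get("meta") or {}).get("data_returned"),
        "next": next_link,
    }


def single_entry_url(structures_url, structure_id):
    """Turn .../v1/structures into .../v1/structures/<structure_id>.

    Assumes the id is URL-safe as-is; percent-encode it first if it is not.
    """
    return f"{structures_url.rstrip('/')}/{structure_id}"


# Hand-written example response (not real data)
example_response = {
    "data": [{"id": "exmpl-1", "type": "structures", "attributes": {"nelements": 2}}],
    "meta": {"data_returned": 42, "more_data_available": True},
    "links": {"next": {"href": "https://example.org/optimade/v1/structures?page_offset=1"}},
}
print(summarise_response(example_response))
print(single_entry_url("https://example.org/optimade/v1/structures", "exmpl-1"))
```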
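Table 1 (reproduced from [1]): number of structures returned by each provider for the first (N1), second (N2) and third (N3) filters above, at the time the paper was published.

Provider | N1 | N2 | N3 |
---|---|---|---|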
AFLOW | 700,192 | 62,293 | 382,554 |
Crystallography Open Database (COD) | 416,314 | 3,896 | 32,420 |
Theoretical Crystallography Open Database (TCOD) | 2,631 | 296 | 660 |
Materials Cloud | 886,518 | 801,382 | 103,075 |
Materials Project | 27,309 | 3,545 | 10,501 |
Novel Materials Discovery Laboratory (NOMAD) | 3,359,594 | 532,123 | 1,611,302 |
Open Database of Xtals (odbx) | 55 | 54 | 0 |
Open Materials Database (omdb) | 58,718 | 690 | 7,428 |
Open Quantum Materials Database (OQMD) | 153,113 | 11,011 | 70,252 |
[1] Andersen et al., "OPTIMADE, an API for exchanging materials data", Sci. Data 8, 217 (2021), doi:10.1038/s41597-021-00974-z.
[2] Andersen et al., "OPTIMADE, an API for exchanging materials data", arXiv:2103.02068 (2021).
The filters from Exercise 1 screened for compounds containing a group IV element, then refined the query to binary phases, and finally excluded lead and kept only ternary phases.
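Since the filters are built from a few reusable clauses joined with AND, it can help to generate them in code before adapting them. This is a hypothetical helper, not part of any OPTIMADE library; it reproduces the three Exercise 1 filters exactly and can be reused with other element sets.

```python
def quote_elements(elements):
    """Render ["C", "Si"] as '"C", "Si"' for use in an OPTIMADE filter."""
    return ", ".join(f'"{el}"' for el in elements)


def build_filter(has_any, exclude=(), nelements=None, length=None):
    """Combine the clauses used in Exercise 1 with AND."""
    clauses = [f"elements HAS ANY {quote_elements(has_any)}"]
    clauses += [f'NOT elements HAS "{el}"' for el in exclude]
    if nelements is not None:
        clauses.append(f"nelements={nelements}")
    if length is not None:
        clauses.append(f"elements LENGTH {length}")
    return " AND ".join(clauses)


group_iv = ["C", "Si", "Ge", "Sn", "Pb"]
filter_1 = build_filter(group_iv)
filter_2 = build_filter(group_iv, nelements=2)
filter_3 = build_filter(group_iv[:-1], exclude=["Pb"], length=3)
for f in (filter_1, filter_2, filter_3):
    print(f)
```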
- Choose a suitable database and modify the filters from Exercise 1 to
search for binary [III]-[V] semiconductors.
- A "suitable" database here is one that you think will have good coverage across this chemical space.
- Using the chemical_formula_anonymous field, investigate the most common stoichiometric ratios between the constituent elements, e.g. 1:1, 2:1, etc.
- You may need to follow pagination links (links->next in the response) to access all available data for your query, or you can try adding the page_limit=100 URL parameter to request more structures per response.
- Apply the same filter to another database and assess the similarity
between the results, thinking carefully about how the different
focuses of each database and different methods in their
construction/curation could lead to biases in this outcome.
- For example, an experimental database may have one crystal structure entry per experimental sample studied, in which case the most useful (or "fashionable") compositions will return many more entries, especially when compared to a database that curates crystal structures such that each ideal crystal has one canonical entry (e.g., a database of minerals).
- Try to use the query you have constructed in the multi-provider clients linked above to query all OPTIMADE providers simultaneously.
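One possible shape for this exercise is sketched below: a binary III-V filter, plus a tally of chemical_formula_anonymous over the entries you collect. Taking group III as B, Al, Ga, In, Tl and group V as N, P, As, Sb, Bi is an assumption about which elements to include, and the example entries are made up. Adding the standard response_fields=chemical_formula_anonymous URL parameter may keep responses small.

```python
from collections import Counter

GROUP_III = ["B", "Al", "Ga", "In", "Tl"]  # group 13
GROUP_V = ["N", "P", "As", "Sb", "Bi"]  # group 15


def has_any(elements):
    return "elements HAS ANY " + ", ".join(f'"{el}"' for el in elements)


# Exactly two elements, at least one from each group => one III and one V
III_V_FILTER = f"{has_any(GROUP_III)} AND {has_any(GROUP_V)} AND nelements=2"


def count_anonymous_formulae(entries):
    """Tally chemical_formula_anonymous over a list of structure entries."""
    formulae = (
        (entry.get("attributes") or {}).get("chemical_formula_anonymous")
        for entry in entries
    )
    return Counter(f for f in formulae if f)  # skip entries without the field


print(III_V_FILTER)
# Made-up entries standing in for the collected pages of a real response
example_entries = [
    {"id": "x1", "attributes": {"chemical_formula_anonymous": "AB"}},
    {"id": "x2", "attributes": {"chemical_formula_anonymous": "AB"}},
    {"id": "x3", "attributes": {"chemical_formula_anonymous": "A3B"}},
    {"id": "x4", "attributes": {}},
]
print(count_anonymous_formulae(example_entries).most_common())
```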
This interactive exercise will explore the use of the OPTIMADE client
implemented in the pymatgen Python library. This exercise can be found
in this repository under ./notebooks/demonstration-pymatgen.ipynb or
accessed online in Google Colab (or equivalent notebook runners, such as
Binder).
There are many useful properties that the OPTIMADE specification has not
standardized. This is typically because the use of the property requires
additional context, e.g., reporting a "band gap" without describing how
it was calculated or measured, or properties that are only meaningful in
the context of a database, e.g., relative energies that depend on other
reference calculations. For this reason, the OPTIMADE specification
allows implementations to serve their own fields with an appropriate
"provider prefix" to the field name, and a description at the
/info/structures endpoint.
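A quick way to see which extra fields a provider serves is to filter the field definitions returned by /info/structures for names starting with an underscore. The sketch assumes the definitions live under data -> properties, as in OPTIMADE v1 info responses; the function name and the "_exmpl_" prefix in the example are made up.

```python
def provider_specific_fields(info_response):
    """Return {field_name: description} for provider-prefixed fields.

    Expects the decoded JSON from a /info/structures endpoint, where field
    definitions are assumed to live under data -> properties.
    """
    properties = (info_response.get("data") or {}).get("properties") or {}
    return {
        name: (spec or {}).get("description", "")
        for name, spec in properties.items()
        if name.startswith("_")  # provider prefixes look like _<provider>_
    }


# Hand-written example; "_exmpl_" is a made-up provider prefix.
example_info = {
    "data": {
        "description": "a structure",
        "properties": {
            "id": {"description": "An entry's ID"},
            "_exmpl_band_gap": {"description": "Band gap from a made-up calculation", "unit": "eV"},
        },
    }
}
print(provider_specific_fields(example_info))
```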
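One computed property that is key to many high-throughput studies is the
chemical stability ($\delta$) of a crystal structure, i.e., its distance from the convex hull of thermodynamic stability (e.g., in meV/atom).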
- Interrogate the /info/structures endpoints of the OPTIMADE implementations that serve DFT data (e.g., Materials Project, AFLOW, OQMD, etc.) and identify those that serve a field that could correspond to hull distance, or other stability metrics.
- Construct a filter that allows you to screen a database for metastable materials (i.e., $0 < \delta < 25\ \text{meV/atom}$) according to this metric.
- Try to create a filter that can be applied to multiple databases simultaneously (e.g., apply ?filter=_databaseA_hull_distance < 25 OR _databaseB_stability < 25). What happens when you run this filter against a database that does not contain the field?
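Combining the metastability window for several provider fields is mostly string assembly. The sketch below reuses the placeholder field names from the example above (they are hypothetical, not real fields) and takes the upper bound per field, because different databases may report stability in different units (e.g. 25 in meV/atom versus 0.025 in eV/atom); check each field's unit at /info/structures.

```python
def metastable_filter(upper_bounds):
    """OR together '0 < field < bound' clauses, one per provider-specific field."""
    clauses = [
        f"({field} > 0 AND {field} < {bound})"
        for field, bound in upper_bounds.items()
    ]
    return " OR ".join(clauses)


# Hypothetical field names; database B is assumed to report eV/atom.
print(metastable_filter({
    "_databaseA_hull_distance": 25,
    "_databaseB_stability": 0.025,
}))
```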
As a final general exercise, consider your own research problems and how you might use OPTIMADE. If you have any suggestions or feedback about how OPTIMADE can be made more useful for you, please start a discussion on the OPTIMADE MatSci forum or raise an issue at the appropriate Materials-Consortia GitHub repository.
Some potential prompts:
- What additional fields or entry types should OPTIMADE standardize to be most useful to you?
- How could the existing tools be improved, or what new tools could be created to make OPTIMADE easier to use?
- What features from other APIs/databases that you use could be adopted within OPTIMADE?
The AFLOW database is primarily built by decorating crystallographic
prototypes, and a list of the most common prototypes can be found in the
Library of Crystallographic
Prototypes. The prototype
labels can also be used to search the database for entries with relaxed
structures matching a particular prototype, using the AFLOW keyword
aflow_prototype_label_relax; a full list of AFLOW keywords can be
found at AFLOW's /info/structures endpoint
(http://aflow.org/API/optimade/v1.0/info/structures). Searches can be
performed for prototype labels using OPTIMADE by adding the _aflow_
prefix to the keyword: _aflow_aflow_prototype_label_relax.
- Use OPTIMADE to search AFLOW for NaCl in the rock salt structure
(prototype label AB_cF8_225_a_b).
- Use OPTIMADE to search AFLOW for lead-free halide cubic perovskites
with a band gap greater than 3 eV (the cubic perovskite prototype label
is AB3C_cP5_221_a_c_b).
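A possible starting point for both searches is sketched below. The structures URL is assumed by analogy with the /info/structures URL quoted above. The band-gap condition is deliberately left out of the perovskite filter, because the corresponding AFLOW field name has to be looked up at /info/structures first.

```python
import urllib.parse

# Assumption: the structures endpoint sits next to the /info/structures URL above.
AFLOW_STRUCTURES_URL = "http://aflow.org/API/optimade/v1.0/structures"
PROTOTYPE_FIELD = "_aflow_aflow_prototype_label_relax"

rocksalt_nacl = (
    f'{PROTOTYPE_FIELD}="AB_cF8_225_a_b" '
    'AND elements HAS ALL "Na", "Cl" AND nelements=2'
)

# Lead-free halide cubic perovskites (band-gap clause still to be added)
halide_perovskites = (
    f'{PROTOTYPE_FIELD}="AB3C_cP5_221_a_c_b" '
    'AND elements HAS ANY "F", "Cl", "Br", "I" AND NOT elements HAS "Pb"'
)

for filter_string in (rocksalt_nacl, halide_perovskites):
    query = urllib.parse.urlencode({"filter": filter_string}, quote_via=urllib.parse.quote)
    print(f"{AFLOW_STRUCTURES_URL}?{query}")
```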
This interactive exercise explores the OQMD's OPTIMADE API and
demonstrates how you can train machine learning models on OPTIMADE data.
The notebook is available at
./notebooks/exercise7-oqmd-optimade-tutorial
and can also be accessed online with Colab or Binder.
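Before training a model, each OPTIMADE entry has to be turned into a fixed-length feature vector. The sketch below uses the standard elements and elements_ratios fields to build a simple composition-fraction vector. This is only an illustration of the idea; the notebook itself may use a different featurisation, and the function name and element order are assumptions.

```python
def composition_vector(attributes, element_order):
    """Fraction of each element in element_order, from OPTIMADE's
    `elements` and `elements_ratios` fields (absent elements -> 0.0)."""
    ratios = dict(zip(attributes["elements"], attributes["elements_ratios"]))
    return [ratios.get(el, 0.0) for el in element_order]


# Hand-written attributes for a 1:1 binary, featurised over three elements
example_attributes = {"elements": ["Ga", "N"], "elements_ratios": [0.5, 0.5]}
print(composition_vector(example_attributes, ["Al", "Ga", "N"]))
```

Stacking such vectors for many entries gives a feature matrix. A provider-specific property (found via /info/structures) could then serve as the training target.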
This example explores the use of optimade-python-tools for querying and
serving OPTIMADE data. The notebook is available at
./notebooks/exercise8-optimade-python-tools and can be accessed online
with Colab or Binder.
You may find the following Python code snippets useful in the above exercises. This document can be opened as a Jupyter notebook using Colab or Binder, or by downloading the notebook from the GitHub repository.
```python
# Construct a query URL.
#
# You should be able to use any valid OPTIMADE implementation's
# database URL with any valid query.
#
# Let's choose a random provider for now:
import random

some_optimade_base_urls = [
    "https://optimade.materialsproject.org",
    "http://crystallography.net/cod/optimade",
    "https://nomad-lab.eu/prod/rae/optimade/",
]
database_url = random.choice(some_optimade_base_urls)
query = 'elements HAS ANY "C", "Si", "Ge", "Sn", "Pb"'
params = {
    "filter": query,
    "page_limit": 3,
}
# Strip any trailing slash so that we do not build ".../optimade//v1/structures"
query_url = f"{database_url.rstrip('/')}/v1/structures"
```
```python
# Using the third-party requests library
# ("!pip install" is notebook syntax; in a terminal, run: pip install requests)
!pip install requests

# Import the requests library and make the query
import requests

response = requests.get(query_url, params=params, timeout=30)
print(response)
json_response = response.json()
```
```python
# Explore the first page of results
import pprint

print(json_response.keys())
structures = json_response["data"]
meta = json_response["meta"]
print(f"Query {query_url} returned {meta['data_returned']} structures")
print("First structure:")
pprint.pprint(structures[0])
```
```python
# Using pagination to loop over multiple requests:
# add page_limit and page_offset parameters to the query
offset = 0
page_limit = 10
while True:
    params = {
        "filter": query,
        "page_limit": page_limit,
        "page_offset": offset,
    }
    page = requests.get(query_url, params=params, timeout=30).json()
    # Print the IDs in the response
    for result in page["data"]:
        print(result["id"])
    offset += page_limit
    # Stop once every matching entry has been requested (or a page comes back empty)
    if offset >= page["meta"]["data_returned"] or not page["data"]:
        break
    # Safety cap for this demo, so that we do not download a whole database
    if offset > 100:
        break
```
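Not every implementation supports page_offset, so an alternative is to follow the links->next URL that the server returns, which already encodes the filter and whatever pagination scheme the server uses. A sketch under that assumption follows. The fetch argument is a hypothetical hook that lets the loop be exercised offline, and requests is only imported when real requests are made.

```python
def iterate_entries(first_url, params=None, max_pages=10, fetch=None):
    """Yield entries from successive pages by following links->next."""
    if fetch is None:
        import requests  # only needed for real network requests

        def fetch(url, query_params):
            response = requests.get(url, params=query_params, timeout=30)
            response.raise_for_status()
            return response.json()

    url = first_url
    for _ in range(max_pages):  # hard cap, as in the loop above
        page = fetch(url, params)
        yield from page.get("data") or []
        next_link = (page.get("links") or {}).get("next")
        if isinstance(next_link, dict):  # JSON:API link object
            next_link = next_link.get("href")
        if not next_link:
            break
        # The server-provided next link already encodes filter and pagination
        url, params = next_link, None


# Offline check with fake pages (the "URLs" are placeholders)
fake_pages = {
    "page1": {"data": [{"id": "a"}, {"id": "b"}], "links": {"next": "page2"}},
    "page2": {"data": [{"id": "c"}], "links": {"next": None}},
}
ids = [entry["id"] for entry in iterate_entries("page1", fetch=lambda url, p: fake_pages[url])]
print(ids)  # ['a', 'b', 'c']
```

With network access, something like iterate_entries(query_url, params={"filter": query, "page_limit": 10}) should walk the same results as the loop above.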