DP-GEN (Deep Generator) is a software package written in Python, designed to generate deep-learning-based models of interatomic potential energy and force fields. DP-GEN depends on DeePMD-kit (https://github.com/deepmodeling/deepmd-kit/blob/master/README.md). With a highly scalable interface to common molecular simulation software, DP-GEN can automatically prepare scripts, maintain job queues on HPC (high-performance computing) machines, and analyze results.
If you use this software in any publication, please cite:
Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 107206.
- Accurate and efficient: DP-GEN can sample more than tens of millions of structures and select only a few for first-principles calculation. DP-GEN finally obtains a uniformly accurate model.
- User-friendly and automatic: DP-GEN is easy to install and run. Once it is running successfully, DP-GEN dispatches and handles all jobs on HPCs, so no manual effort is needed.
- Highly scalable: With modularized code structures, users and developers can easily extend DP-GEN for their most relevant needs. DP-GEN currently supports HPC systems (Slurm, PBS, LSF and cloud machines), a Deep Potential interface with DeePMD-kit, an MD interface with LAMMPS, and ab initio calculation interfaces with VASP, PWSCF, SIESTA and Gaussian. We sincerely welcome contributions from users that bring more possibilities and use cases to DP-GEN.
- dpgen:
  - data: source code for preparing initial data of bulk and surface systems.
  - generator: source code for the main process of the deep generator.
  - auto_test: source code for materials property analysis.
  - remote: source code for automatically submitting scripts, maintaining job queues and collecting results.
  - database: source code for collecting data generated by DP-GEN and interfacing with a database.
- examples: example JSON files.
- tests: unit test tools for developers.
DP-GEN is run with:
dpgen TASK PARAM MACHINE
where TASK is the keyword, and PARAM and MACHINE are both JSON files.
Options for TASK:
- init_bulk: Generating initial data for bulk systems.
- init_surf: Generating initial data for surface systems.
- run: Main process of Deep Generator.
- test: Auto-test for Deep Potential.
- db: Collecting data from DP-GEN.
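The invocation pattern above can be sketched in Python. This is only an illustration of the `dpgen TASK PARAM [MACHINE]` form; the helper name `build_dpgen_command` is hypothetical and not part of DP-GEN, and the allowed TASK values are taken from the list in this README.

```python
import shlex

# Sub-commands named in this README; treated here as the allowed TASK values.
TASKS = ("init_bulk", "init_surf", "run", "test", "db")


def build_dpgen_command(task, param, machine=None):
    """Return the argv list for `dpgen TASK PARAM [MACHINE]` (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"unknown TASK {task!r}; expected one of {TASKS}")
    argv = ["dpgen", task, param]
    if machine is not None:
        argv.append(machine)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_dpgen_command("run", "param.json", "machine.json")))
```

The returned list can be passed to `subprocess.run` if you prefer to launch DP-GEN from a script.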
One can download the source code of dpgen by
git clone https://github.com/deepmodeling/dpgen.git
then you may install DP-GEN easily by:
cd dpgen
pip install --user .
With this command, the dpgen executable is installed to $HOME/.local/bin/dpgen. You may want to add this directory to PATH with
export PATH=$HOME/.local/bin:$PATH
To test if the installation is successful, you may execute
dpgen -h
and if everything works, it gives
DeepModeling
------------
Version: 0.5.1.dev53+gddbeee7.d20191020
Date: Oct-07-2019
Path: /home/me/miniconda3/envs/py363/lib/python3.6/site-packages/dpgen-0.5.1.dev53+gddbeee7.d20191020-py3.6.egg/dpgen
Dependency
------------
numpy 1.17.2 /home/me/miniconda3/envs/py363/lib/python3.6/site-packages/numpy
dpdata 0.1.10 /home/me/miniconda3/envs/py363/lib/python3.6/site-packages/dpdata-0.1.10-py3.6.egg/dpdata
pymatgen 2019.7.2 /home/me/miniconda3/envs/py363/lib/python3.6/site-packages/pymatgen
monty 2.0.4 /home/me/miniconda3/envs/py363/lib/python3.6/site-packages/monty
ase 3.17.0 /home/me/miniconda3/envs/py363/lib/python3.6/site-packages/ase-3.17.0-py3.6.egg/ase
paramiko 2.6.0 /home/me/miniconda3/envs/py363/lib/python3.6/site-packages/paramiko
custodian 2019.2.10 /home/me/miniconda3/envs/py363/lib/python3.6/site-packages/custodian
Description
------------
usage: dpgen [-h] {init_surf,init_bulk,run,run/report,test,db} ...
dpgen is a convenient script that uses DeepGenerator to prepare initial data,
drive DeepMDkit and analyze results. This script works based on several sub-
commands with their own options. To see the options for the sub-commands, type
"dpgen sub-command -h".
positional arguments:
{init_surf,init_bulk,run,run/report,test,db}
init_surf Generating initial data for surface systems.
init_bulk Generating initial data for bulk systems.
run Main process of Deep Potential Generator.
run/report Report the systems and the thermodynamic conditions of
the labeled frames.
test Auto-test for Deep Potential.
db Collecting data from Deep Generator.
optional arguments:
-h, --help show this help message and exit
You may prepare initial data for bulk systems with VASP by:
dpgen init_bulk PARAM [MACHINE]
The MACHINE configuration file is optional. If it is given, the optimization tasks or MD tasks will be submitted automatically according to MACHINE.json.
Basically, init_bulk can be divided into four parts, denoted as stages in PARAM:
- Stage 1: Relax in folder 00.place_ele
- Stage 2: Perturb and scale in folder 01.scale_pert
- Stage 3: Run a short AIMD in folder 02.md
- Stage 4: Collect data in folder 02.md
All stages must be in order. You do not need to run all stages. For example, you may run stages 1 and 2 to generate supercells as the starting point of exploration in dpgen run.
If MACHINE is not given, stages should contain only one stage. The corresponding tasks will be generated, but you must run the scripts manually.
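A minimal sketch of the two rules above, assuming "in order" means strictly increasing stage numbers between 1 and 4. The function `check_init_bulk_stages` is hypothetical and only illustrates the constraints; it is not DP-GEN's own validation.

```python
def check_init_bulk_stages(stages, has_machine):
    """Raise ValueError if `stages` breaks the rules described in this README."""
    if not stages:
        raise ValueError("stages must not be empty")
    if any(s not in (1, 2, 3, 4) for s in stages):
        raise ValueError("init_bulk stages are numbered 1 to 4")
    # Assumption: "in order" is read as strictly increasing, without repeats.
    if list(stages) != sorted(set(stages)):
        raise ValueError("stages must be listed in increasing order")
    # Without a MACHINE file, only one stage may be requested at a time.
    if not has_machine and len(stages) != 1:
        raise ValueError("without MACHINE, stages must contain exactly one stage")
    return True
```

For example, `check_init_bulk_stages([1, 2], has_machine=True)` passes, while `check_init_bulk_stages([1, 2], has_machine=False)` raises.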
The following is an example PARAM, which generates data from a typical hcp structure.
{
"stages" : [1,2,3,4],
"cell_type": "hcp",
"latt": 4.479,
"super_cell": [2, 2, 2],
"elements": ["Mg"],
"potcars": ["....../POTCAR"],
"relax_incar": "....../INCAR_metal_rlx",
"md_incar" : "....../INCAR_metal_md",
"scale": [1.00],
"skip_relax": false,
"pert_numb": 2,
"md_nstep" : 5,
"pert_box": 0.03,
"pert_atom": 0.01,
"coll_ndata": 5000,
"type_map" : [ "Mg", "Al"],
"_comment": "that's all"
}
If you want to specify a structure as the starting point for init_bulk, you may set the following in PARAM.
"from_poscar": true,
"from_poscar_path": "....../C_mp-47_conventional.POSCAR",
The following table gives explicit descriptions of the keys in PARAM.
The bold notation of a key (such as Elements) means that it is a necessary key.
Key | Type | Example | Description |
---|---|---|---|
stages | List of Integer | [1,2,3,4] | Stages for init_bulk |
Elements | List of String | ["Mg"] | Atom types |
cell_type | String | "hcp" | Specifying which typical structure to be generated. Options include fcc, hcp, bcc, sc, diamond. |
latt | Float | 4.479 | Lattice constant for single cell. |
from_poscar | Boolean | True | Deciding whether to use a given POSCAR as the beginning of relaxation. If it's true, the keys cell_type and latt will be ignored. Otherwise, these two keys are necessary. |
from_poscar_path | String | "....../C_mp-47_conventional.POSCAR" | Path of POSCAR. Necessary if from_poscar is true. |
relax_incar | String | "....../INCAR" | Path of INCAR for relaxation in VASP. Necessary if stages include 1. |
md_incar | String | "....../INCAR" | Path of INCAR for MD in VASP. Necessary if stages include 3. |
scale | List of float | [0.980, 1.000, 1.020] | Scales for transforming cells. |
skip_relax | Boolean | False | If it's true, you may directly run stage 2 (perturb and scale) using an unrelaxed POSCAR. |
pert_numb | Integer | 30 | Number of perturbations for each POSCAR. |
pert_box | Float | 0.03 | Percentage of perturbation for cells. |
pert_atom | Float | 0.01 | Perturbation of each atom (Angstrom). |
md_nstep | Integer | 10 | Steps of AIMD in stage 3. If it is not equal to the NSW setting in md_incar , DP-GEN will follow NSW . |
coll_ndata | Integer | 5000 | Maximal number of collected data. |
type_map | List | [ "Mg", "Al"] | The indices of elements in deepmd formats will be set in this order. |
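The conditional requirements in the table can be checked before launching init_bulk. The sketch below is an assumption-laden illustration: `missing_init_bulk_keys` is a hypothetical helper, and treating `stages` and `elements` as always required is an assumption based on the table and the example above, not on DP-GEN's source.

```python
def missing_init_bulk_keys(param):
    """Return the keys that the table marks as needed but `param` lacks."""
    missing = []

    def need(key):
        if key not in param:
            missing.append(key)

    # Assumed to be always required (the README marks the element list as necessary).
    need("stages")
    need("elements")

    # Either a POSCAR is supplied, or the cell is built from cell_type and latt.
    if param.get("from_poscar", False):
        need("from_poscar_path")
    else:
        need("cell_type")
        need("latt")

    stages = param.get("stages", [])
    if 1 in stages:
        need("relax_incar")  # stage 1 relaxes with VASP
    if 3 in stages:
        need("md_incar")     # stage 3 runs AIMD with VASP
    return missing
```

An empty list means that none of the conditional rules listed in the table is violated.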
You may prepare initial data for surface systems with VASP by:
dpgen init_surf PARAM [MACHINE]
The MACHINE configuration file is optional. If it is given, the optimization tasks or MD tasks will be submitted automatically according to MACHINE.json.
Basically, init_surf can be divided into two parts, denoted as stages in PARAM:
- Stage 1: Build specific surface in folder 00.place_ele
- Stage 2: Perturb and scale in folder 01.scale_pert
All stages must be in order.
The following is an example PARAM, which generates data from a typical fcc structure.
{
"stages": [
1,
2
],
"cell_type": "fcc",
"latt": 4.034,
"super_cell": [
2,
2,
2
],
"layer_numb": 3,
"vacuum_max": 9,
"vacuum_resol": [
0.5,
1
],
"mid_point": 4.0,
"millers": [
[
1,
0,
0
],
[
1,
1,
0
],
[
1,
1,
1
]
],
"elements": [
"Al"
],
"potcars": [
"....../POTCAR"
],
"relax_incar": "....../INCAR_metal_rlx_low",
"scale": [
1.0
],
"skip_relax": true,
"pert_numb": 2,
"pert_box": 0.03,
"pert_atom": 0.01,
"_comment": "that's all"
}
Another example is the from_poscar method. Here you need to specify the POSCAR file.
{
"stages": [
1,
2
],
"cell_type": "fcc",
"from_poscar": true,
"from_poscar_path": "POSCAR",
"super_cell": [
1,
1,
1
],
"layer_numb": 3,
"vacuum_max": 5,
"vacuum_resol": [0.5,2],
"mid_point": 2.0,
"millers": [
[
1,
0,
0
]
],
"elements": [
"Al"
],
"potcars": [
"./POTCAR"
],
"relax_incar" : "INCAR_metal_rlx_low",
"scale": [
1.0
],
"skip_relax": true,
"pert_numb": 5,
"pert_box": 0.03,
"pert_atom": 0.01,
"coll_ndata": 5000,
"_comment": "that's all"
}
The following table gives explicit descriptions of the keys in PARAM.
The bold notation of a key (such as Elements) means that it is a necessary key.
Key | Type | Example | Description |
---|---|---|---|
stages | List of Integer | [1,2] | Stages for init_surf |
Elements | List of String | ["Mg"] | Atom types |
cell_type | String | "hcp" | Specifying which typical structure to be generated. Options include fcc, hcp, bcc, sc, diamond. |
latt | Float | 4.479 | Lattice constant for single cell. |
layer_numb | Integer | 3 | Number of equivalent layers of slab. |
z_min | Float | 9.0 | Thickness of slab without vacuum (Angstrom). If layer_numb and z_min are both set, the z_min value will be ignored. |
vacuum_max | Float | 9 | Maximal thickness of vacuum (Angstrom). |
vacuum_min | Float | 3.0 | Minimal thickness of vacuum (Angstrom). Default value is 2 times atomic radius. |
vacuum_resol | List of float | [0.5, 1 ] | Interval of vacuum thickness. If the size of vacuum_resol is 1, the interval is fixed to its value. If the size of vacuum_resol is 2, the interval is vacuum_resol[0] before mid_point and vacuum_resol[1] after mid_point . |
millers | List of list of Integer | [[1,0,0]] | Miller indices. |
relax_incar | String | "....../INCAR" | Path of INCAR for relaxation in VASP. Necessary if stages include 1. |
scale | List of float | [0.980, 1.000, 1.020] | Scales for transforming cells. |
skip_relax | Boolean | False | If it's true, you may directly run stage 2 (perturb and scale) using an unrelaxed POSCAR. |
pert_numb | Integer | 30 | Number of perturbations for each POSCAR. |
pert_box | Float | 0.03 | Percentage of perturbation for cells. |
pert_atom | Float | 0.01 | Perturbation of each atom (Angstrom). |
coll_ndata | Integer | 5000 | Maximal number of collected data. |
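To make the interplay of vacuum_min, vacuum_max, vacuum_resol and mid_point concrete, here is one plausible reading of the table as code. The function `vacuum_thicknesses` is hypothetical, and the exact grid DP-GEN generates may differ; the sketch only follows the description "vacuum_resol[0] before mid_point, vacuum_resol[1] after mid_point".

```python
def vacuum_thicknesses(vacuum_min, vacuum_max, vacuum_resol, mid_point=None):
    """List vacuum thicknesses from vacuum_min to vacuum_max (Angstrom)."""
    if len(vacuum_resol) == 1:
        # A single value means a fixed interval over the whole range.
        step_before = step_after = vacuum_resol[0]
        mid_point = vacuum_max
    elif len(vacuum_resol) == 2:
        if mid_point is None:
            raise ValueError("mid_point is needed when vacuum_resol has two values")
        step_before, step_after = vacuum_resol
    else:
        raise ValueError("vacuum_resol must have one or two values")

    values = []
    current = vacuum_min
    while current <= vacuum_max + 1e-9:  # small tolerance for float round-off
        values.append(round(current, 6))
        current += step_before if current < mid_point else step_after
    return values
```

With a minimum of 3.0, a maximum of 5.0, vacuum_resol [0.5, 1.0] and mid_point 4.0, the grid is fine below 4.0 and coarse above it.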
You may call the main process by:
dpgen run PARAM MACHINE
The whole generator process consists of a series of iterations, undertaken successively in order, such as heating the system to certain temperatures.
In each iteration, there are three stages of work: 00.train, 01.model_devi and 02.fp.
- 00.train: DP-GEN trains several (default 4) models based on initial and generated data. The only difference between these models is the random seed for neural network initialization.
- 01.model_devi: stands for model deviation. DP-GEN uses the models obtained from 00.train to run molecular dynamics (default LAMMPS). A larger deviation in structure properties (by default, the forces on atoms) means lower accuracy of the models. Using this criterion, a few structures are selected and passed to the next stage, 02.fp, for more accurate first-principles calculation.
- 02.fp: Selected structures are calculated by first-principles methods (default VASP). DP-GEN obtains new data and puts them together with the initial data and the data generated in previous iterations. After that, a new training is set up and DP-GEN enters the next iteration.
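The selection criterion in 01.model_devi can be illustrated with a small sketch. It assumes the deviation is the largest per-atom standard deviation of the force vectors predicted by the models, and that structures between the lower and upper trust levels are the ones sent to 02.fp. The function names and the labels "accurate", "candidate" and "failed" are illustrative, not taken from DP-GEN's code.

```python
import math


def max_force_deviation(forces):
    """forces[m][a] is the (fx, fy, fz) force on atom a predicted by model m."""
    n_models = len(forces)
    n_atoms = len(forces[0])
    worst = 0.0
    for a in range(n_atoms):
        # Mean force on this atom over all models.
        mean = [sum(forces[m][a][k] for m in range(n_models)) / n_models
                for k in range(3)]
        # Variance of the force vector across models.
        var = sum(
            sum((forces[m][a][k] - mean[k]) ** 2 for k in range(3))
            for m in range(n_models)
        ) / n_models
        worst = max(worst, math.sqrt(var))
    return worst


def classify(deviation, trust_lo=0.05, trust_hi=0.15):
    """Sort a structure by its deviation relative to the two trust levels."""
    if deviation < trust_lo:
        return "accurate"   # models agree; no labeling needed
    if deviation < trust_hi:
        return "candidate"  # worth a first-principles calculation
    return "failed"         # models disagree too strongly
```

The default thresholds mirror the model_devi_f_trust_lo and model_devi_f_trust_hi values used in the example PARAM in this README.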
DP-GEN identifies the current stage by a record file, record.dpgen, which is created and updated by the code. Each line contains two numbers: the first is the index of the iteration, and the second, ranging from 0 to 9, records which stage of that iteration is currently running.
0, 1 and 2 correspond to make_train, run_train and post_train. DP-GEN writes scripts in make_train, runs the task on the specified machine in run_train, and collects results in post_train. The records for the model_devi and fp stages follow similar rules.
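As a sketch, the last line of record.dpgen can be decoded as follows. Only the names for stages 0 to 2 come from this README; the names for stages 3 to 8 are assumed by analogy ("follow similar rules"), and the whitespace-separated line format is also an assumption.

```python
# Stage names: 0-2 are documented above, 3-8 are assumed by analogy.
STAGE_NAMES = [
    "make_train", "run_train", "post_train",
    "make_model_devi", "run_model_devi", "post_model_devi",
    "make_fp", "run_fp", "post_fp",
]


def last_record(text):
    """Return (iteration, stage, stage_name) for the last non-empty line."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    iteration, stage = (int(field) for field in lines[-1].split())
    name = STAGE_NAMES[stage] if 0 <= stage < len(STAGE_NAMES) else "unknown"
    return iteration, stage, name
```

Reading the file with `open("record.dpgen").read()` and passing the text to `last_record` shows where a run would resume.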
In PARAM, you can specify the task as you expect.
{
"type_map": [
"H",
"C"
],
"mass_map": [
1,
12
],
"init_data_prefix": "....../init/",
"init_data_sys": [
"CH4.POSCAR.01x01x01/02.md/sys-0004-0001/deepmd"
],
"init_batch_size": [
8
],
"sys_configs_prefix": "....../init/",
"sys_configs": [
[
"CH4.POSCAR.01x01x01/01.scale_pert/sys-0004-0001/scale*/00000*/POSCAR"
],
[
"CH4.POSCAR.01x01x01/01.scale_pert/sys-0004-0001/scale*/00001*/POSCAR"
]
],
"sys_batch_size": [
8,
8,
8,
8
],
"_comment": " that's all ",
"numb_models": 4,
"train_param": "input.json",
"default_training_param": {
"_comment": "that's all",
"use_smooth": true,
"sel_a": [
16,
4
],
"rcut_smth": 0.5,
"rcut": 5,
"filter_neuron": [
10,
20,
40
],
"filter_resnet_dt": false,
"n_axis_neuron": 12,
"n_neuron": [
100,
100,
100
],
"resnet_dt": true,
"coord_norm": true,
"type_fitting_net": false,
"systems": [],
"set_prefix": "set",
"stop_batch": 40000,
"batch_size": 1,
"start_lr": 0.001,
"decay_steps": 200,
"decay_rate": 0.95,
"seed": 0,
"start_pref_e": 0.02,
"limit_pref_e": 2,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0.0,
"limit_pref_v": 0.0,
"disp_file": "lcurve.out",
"disp_freq": 1000,
"numb_test": 4,
"save_freq": 1000,
"save_ckpt": "model.ckpt",
"load_ckpt": "model.ckpt",
"disp_training": true,
"time_training": true,
"profiling": false,
"profiling_file": "timeline.json"
},
"model_devi_dt": 0.002,
"model_devi_skip": 0,
"model_devi_f_trust_lo": 0.05,
"model_devi_f_trust_hi": 0.15,
"model_devi_clean_traj": true,
"model_devi_jobs": [
{
"sys_idx": [
0
],
"temps": [
100
],
"press": [
1.0
],
"trj_freq": 10,
"nsteps": 300,
"ensemble": "nvt",
"_idx": "00"
},
{
"sys_idx": [
1
],
"temps": [
100
],
"press": [
1.0
],
"trj_freq": 10,
"nsteps": 3000,
"ensemble": "nvt",
"_idx": "01"
}
],
"fp_style": "vasp",
"shuffle_poscar": false,
"fp_task_max": 20,
"fp_task_min": 1,
"fp_pp_path": "....../methane/",
"fp_pp_files": [
"POTCAR"
],
"fp_incar": "....../INCAR_methane"
}
The following table gives explicit descriptions of the keys in PARAM.
The bold notation of a key (such as type_map) means that it is a necessary key.
Key | Type | Example | Description |
---|---|---|---|
#Basics | |||
type_map | List of string | ["H", "C"] | Atom types |
mass_map | List of float | [1, 12] | Standard atom weights. |
use_ele_temp | int | 0 | Currently only supports fp_style vasp. 0 (default): no electron temperature. 1: electron temperature as frame parameter. 2: electron temperature as atom parameter. |
#Data | |||
init_data_prefix | String | "/sharedext4/.../data/" | Prefix of initial data directories |
init_data_sys | List of string | ["CH4.POSCAR.01x01x01/.../deepmd"] | Directories of initial data. You may use either absolute or relative path here. |
sys_format | String | "vasp/poscar" | Format of initial data. It will be vasp/poscar if not set. |
init_multi_systems | Boolean | false | If set to true , init_data_sys directories should contain sub-directories of various systems. DP-GEN will regard all of these sub-directories as initial data systems. |
init_batch_size | List of integer | [8] | Each number is the batch_size of the corresponding system in init_data_sys for training. One recommended rule for setting sys_batch_size and init_batch_size is that batch_size multiplied by the number of atoms of the structure should be larger than 32. If set to auto , the batch size will be 32 divided by the number of atoms. |
sys_configs_prefix | String | "/sharedext4/.../data/" | Prefix of sys_configs |
sys_configs | List of list of string | [ ["/sharedext4/.../POSCAR"], ["....../POSCAR"] ] | Containing directories of structures to be explored in iterations. Wildcard characters are supported here. |
sys_batch_size | List of integer | [8, 8] | Each number is the batch_size for training of corresponding system in sys_configs . If set to auto , batch size will be 32 divided by number of atoms. |
#Training | |||
numb_models | Integer | 4 (recommended) | Number of models to be trained in 00.train . |
training_iter0_model_path | list of string | ["/path/to/model0_ckpt/", ...] | The models used to initialize the training of the first iteration. The number of elements should be equal to numb_models . |
training_init_model | bool | False | For iteration > 0, the model parameters will be initialized from the model trained at the previous iteration. For iteration == 0, the model parameters will be initialized from training_iter0_model_path . |
default_training_param | Dict | { ... "use_smooth": true, "sel_a": [16, 4], "rcut_smth": 0.5, "rcut": 5, "filter_neuron": [10, 20, 40], ... } | Training parameters for deepmd-kit in 00.train . You can find instructions here: https://github.com/deepmodeling/deepmd-kit. We commonly let stop_batch = 200 * decay_steps . |
#Exploration | |||
model_devi_dt | Float | 0.002 (recommend) | Timestep for MD |
model_devi_skip | Integer | 0 | Number of structures skipped for fp in each MD |
model_devi_f_trust_lo | Float | 0.05 | Lower bound of forces for the selection. |
model_devi_f_trust_hi | Float | 0.15 | Upper bound of forces for the selection |
model_devi_e_trust_lo | Float | 1e10 | Lower bound of energies for the selection. We recommend setting the energy bounds to a high number, since forces provide more precise information. Special cases such as energy minimization may need this. |
model_devi_e_trust_hi | Float | 1e10 | Upper bound of energies for the selection. |
model_devi_clean_traj | Boolean | true | Deciding whether to clean traj folders in MD since they are too large. |
model_devi_nopbc | Boolean | False | Assume open boundary condition in MD simulations. |
model_devi_activation_func | List of String | ["tanh", "tanh", "tanh", "tanh"] | Set activation functions for models, length of the list should be the same as numb_models |
model_devi_jobs | List of dict | [ { "sys_idx": [0], "temps": [100], "press": [1], "trj_freq": 10, "nsteps": 1000, "ensemble": "nvt" }, ... ] | Settings for exploration in 01.model_devi . Each dict in the list corresponds to one iteration. The index in model_devi_jobs corresponds exactly to the index of the iteration. |
model_devi_jobs["sys_idx"] | List of integer | [0] | Systems to be selected as the initial structure of MD and be explored. The index corresponds exactly to the sys_configs . |
model_devi_jobs["temps"] | List of integer | [50, 300] | Temperature (K) in MD |
model_devi_jobs["press"] | List of integer | [1,10] | Pressure (Bar) in MD |
model_devi_jobs["trj_freq"] | Integer | 10 | Frequency of trajectory saved in MD. |
model_devi_jobs["nsteps"] | Integer | 3000 | Running steps of MD. |
model_devi_jobs["ensemble"] | String | "nvt" | Determining which ensemble is used in MD. Options include “npt” and “nvt”. |
model_devi_jobs["neidelay"] | Integer | "10" | delay building until this many steps since last build |
model_devi_jobs["taut"] | Float | "0.1" | Coupling time of thermostat (ps) |
model_devi_jobs["taup"] | Float | "0.5" | Coupling time of barostat (ps) |
#Labeling | |||
fp_style | string | "vasp" | Software for First Principles. Options include “vasp”, “pwscf”, “siesta” and “gaussian” up to now. |
fp_task_max | Integer | 20 | Maximum of structures to be calculated in 02.fp of each iteration. |
fp_task_min | Integer | 5 | Minimum of structures to calculate in 02.fp of each iteration. |
fp_accurate_threshold | Float | 0.9999 | If the accurate ratio is larger than this number, no fp calculation will be performed, i.e. fp_task_max = 0. |
fp_accurate_soft_threshold | Float | 0.9999 | If the accurate ratio is between this number and fp_accurate_threshold , the fp_task_max linearly decays to zero. |
fp_cluster_vacuum | Float | None | If the vacuum size is smaller than this value, this cluster will not be chosen for labeling. |
fp_style == VASP | |||
fp_pp_path | String | "/sharedext4/.../ch4/" | Directory containing the pseudopotential files to be used for 02.fp. |
fp_pp_files | List of string | ["POTCAR"] | Pseudopotential files to be used for 02.fp. Note that the order of elements should correspond to the order in type_map . |
fp_incar | String | "/sharedext4/../ch4/INCAR" | Input file for VASP. INCAR must specify KSPACING and KGAMMA. |
fp_aniso_kspacing | List of float | [1.0,1.0,1.0] | Set anisotropic kspacing. Usually useful for 1-D or 2-D materials. Only supports VASP. If it is set, the KSPACING key in INCAR will be ignored. |
cvasp | Boolean | true | If cvasp is true, DP-GEN will use Custodian to help control VASP calculation. |
fp_style == Gaussian | |||
use_clusters | Boolean | false | If set to true , clusters will be taken instead of the whole system. This option does not work with DeePMD-kit 0.x. |
cluster_cutoff | Float | 3.5 | The cutoff radius of clusters if use_clusters is set to true . |
fp_params | Dict | | Parameters for Gaussian calculation. |
fp_params["keywords"] | String or list | "mn15/6-31g** nosymm scf(maxcyc=512)" | Keywords for Gaussian input. |
fp_params["multiplicity"] | Integer or String | 1 | Spin multiplicity for Gaussian input. If set to auto , the spin multiplicity will be detected automatically. If set to frag , the "fragment=N" method will be used. |
fp_params["nproc"] | Integer | 4 | The number of processors for Gaussian input. |
fp_style == siesta | |||
use_clusters | Boolean | false | If set to true , clusters will be taken instead of the whole system. This option does not work with DeePMD-kit 0.x. |
cluster_cutoff | Float | 3.5 | The cutoff radius of clusters if use_clusters is set to true . |
fp_params | Dict | | Parameters for siesta calculation. |
fp_params["ecut"] | Integer | 300 | Define the plane wave cutoff for grid. |
fp_params["ediff"] | Float | 1e-4 | Tolerance of Density Matrix. |
fp_params["kspacing"] | Float | 0.4 | Sample factor in Brillouin zones. |
fp_params["mixingweight"] | Float | 0.05 | Proportion a of output Density Matrix to be used for the input Density Matrix of next SCF cycle (linear mixing). |
fp_params["NumberPulay"] | Integer | 5 | Controls the Pulay convergence accelerator. |
fp_style == cp2k | |||
fp_params | Dict | | Parameters for cp2k calculation. Details can be found at manual.cp2k.org. Only the KIND section must be set before use. We assume that you have basic knowledge of cp2k input. |
Converting cp2k input to the dictionary used in dpgen input is simple. You just need to follow a few rules:
- The KIND section parameters must be provided.
- Replace a keyword in cp2k with a keyword in the dict.
- Replace a keyword parameter in cp2k with the value in the dict.
- Replace a section name in cp2k with a keyword in the dict. The corresponding value is a dict.
- Replace a section parameter in cp2k with a value in the dict, under the keyword "_".
- A repeated section in cp2k only needs to be written once, with the repeated parameters given as a list.
If you want to use your own parameters, just write a corresponding dictionary. The COORD section will be filled in by dpgen automatically, so do not include it in the dictionary. The OT or Diagonalization section is required for semiconductor or metal systems. For a specific example, have a look at the examples directory.
Here are examples for setting:
# minimal information you should provide for input
# other parameters are already set in the code; if you want to
# use your own parameters, just write a corresponding dictionary
"user_fp_params": {
"FORCE_EVAL":{
"DFT":{
"BASIS_SET_FILE_NAME": "path",
"POTENTIAL_FILE_NAME": "path",
"SCF":{
"OT":{ "keyword":"keyword parameter", "keyword2":"keyword parameter" }
}
},
"SUBSYS":{
"KIND":{
"_": ["N","C","H"],
"POTENTIAL": ["GTH-PBE-q5","GTH-PBE-q4", "GTH-PBE-q1"],
"BASIS_SET": ["DZVP-MOLOPT-GTH","DZVP-MOLOPT-GTH","DZVP-MOLOPT-GTH"]
}
}
}
}
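The conversion rules can be read in reverse: given such a dictionary, the cp2k sections can be written out again. The following is a minimal sketch of that mapping under the rules stated above. It is not DP-GEN's actual writer, and the function names `render_body` and `render_section` are hypothetical.

```python
def render_body(body, indent=0):
    """Render the keywords and sub-sections of one section body as text lines."""
    pad = "  " * indent
    lines = []
    for key, value in body.items():
        if key == "_":
            continue  # the section parameter is written on the section header
        if isinstance(value, dict):
            lines.extend(render_section(key, value, indent))
        else:
            lines.append(f"{pad}{key} {value}")
    return lines


def render_section(name, body, indent=0):
    """Render one section; list values mean the section is repeated."""
    pad = "  " * indent
    lengths = {len(v) for v in body.values() if isinstance(v, list)}
    if len(lengths) > 1:
        raise ValueError(f"lists in repeated section {name} differ in length")
    if lengths:
        count = lengths.pop()
        # One copy of the section per list element.
        copies = [
            {k: (v[i] if isinstance(v, list) else v) for k, v in body.items()}
            for i in range(count)
        ]
    else:
        copies = [body]

    lines = []
    for copy in copies:
        param = copy.get("_")
        header = f"{pad}&{name}" if param is None else f"{pad}&{name} {param}"
        lines.append(header)
        lines.extend(render_body(copy, indent + 1))
        lines.append(f"{pad}&END {name}")
    return lines
```

Calling `"\n".join(render_body(params))` on a dictionary shaped like the example above yields one KIND block per element, each with its own POTENTIAL and BASIS_SET.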
At this step, we assume that you have prepared some graph files like graph.*.pb and the particular pseudopotential POTCAR.
The main command of this step is
dpgen test PARAM MACHINE
where PARAM and MACHINE are both JSON files. MACHINE is the same as above.
The whole program contains a series of tasks shown as follows. In each task, there are three stages of work: generate, run and compute.
- 00.equi: (default task) the equilibrium state
- 01.eos: the equation of state
- 02.elastic: the elasticity, such as Young's modulus
- 03.vacancy: the vacancy formation energy
- 04.interstitial: the interstitial formation energy
- 05.surf: the surface formation energy
dpgen auto_test automatically creates a directory for each task it tests; the directory name is the same as the task name. The test results are written to a plain text file named result. For example: cat ./01.eos/Al/std-fcc/deepmd/result
We take Al as an example to show the parameter settings of param.json.
The first part is the fundamental setting for a particular alloy system.
"_comment": "models",
"potcar_map" : {
"Al" : "/somewhere/POTCAR"
},
"conf_dir":"confs/Al/std-fcc",
"key_id":"API key of Material project",
"task_type":"deepmd",
"task":"eos",
You need to add the paths of the necessary POTCAR files in "potcar_map". Different POTCAR paths are separated by commas.
Then you also need to add the folder path of the particular configuration, which contains the POSCAR file.
"confs/[element or alloy]/[std-* or mp-**]"
std-*: standard structures, * can be fcc, bcc, hcp and so on.
mp-**: ** means the material ID from Materials Project.
Usually, if you give the relative path of POSCAR in the above format, dpgen test will check the existence of such a file and automatically download the standard and existing configurations of the given element or alloy from Materials Project and store them in the confs folder, which requires the API key of Materials Project.
task_type contains 3 optional types for testing, i.e. vasp, deepmd and meam.
task contains 7 options: equi, eos, elastic, vacancy, interstitial, surf and all. The option all runs all the tasks.
It is worth noting that the subsequent tasks rely on the calculation results of the equilibrium state, so the equilibrium state must be calculated first. For stability, we recommend that you test the equilibrium state with vasp before other tests.
The second part is the computational settings for vasp and lammps. According to your actual needs, you can choose to add the paths of specific INCAR files or use the simplified INCAR by setting vasp_params. A specified INCAR takes priority over vasp_params. The most important setting is to add the folder path model_dir of the deepmd model and supply the corresponding element type map. Besides, dpgen test is also able to call common lammps packages, such as meam.
"relax_incar":"somewhere/relax_incar",
"scf_incar":"somewhere/scf_incar",
"vasp_params": {
"ecut": 650,
"ediff": 1e-6,
"kspacing": 0.1,
"kgamma": false,
"npar": 1,
"kpar": 1,
"_comment": " that's all "
},
"lammps_params": {
"model_dir":"somewhere/example/Al_model",
"type_map":["Al"],
"model_name":false,
"model_param_type":false
},
The last part is the optional settings for the various tasks mentioned above. You can change the parameters according to actual needs.
The fields of the param.json dictionary:
Fields | Type | Example | Description |
---|---|---|---|
potcar_map | dict | {"Al": "example/POTCAR"} | a dict like { "element" : "position of POTCAR" } |
conf_dir | path like string | "confs/Al/std-fcc" | the dir which contains vasp's POSCAR |
key_id | string | "DZIwdXCXg1fiXXXXXX" | the API key of Materials Project |
task_type | string | "vasp" | task type, one of deepmd, vasp, meam |
task | string or list | "equi" | one or several tasks from { equi, eos, elastic, vacancy, interstitial, surf }, or all for all tasks |
vasp_params | dict | see below | params relating to vasp INCAR |
lammps_params | dict | see below | params relating to lammps |
The keys in param["vasp_params"] are shown below.
Fields | Type | Example | Description |
---|---|---|---|
ecut | real number | 650 | the plane wave cutoff for grid. |
ediff | real number | 1e-6 | Tolerance of Density Matrix |
kspacing | real number | 0.1 | Sample factor in Brillouin zones |
kgamma | boolean | false | whether to generate a Gamma-centered grid |
npar | positive integer | 1 | the number of bands that are treated in parallel |
kpar | positive integer | 1 | the number of k-points that are to be treated in parallel |
The keys in param["lammps_params"] are shown below.
Key | Type | Example | Description |
---|---|---|---|
model_dir | path like string | "example/Al_model" | the model dir which contains .pb file |
type_map | list of string | ["Al"] | a list contains the element, usually useful for multiple element situation |
model_name | boolean | false | |
model_param_type | boolean | false |
"_comment":"00.equi",
"store_stable":true,
store_stable: (boolean) whether to store the stable energy and volume
Key | Type | Example | Description |
---|---|---|---|
store_stable | boolean | true | whether to store the stable energy and volume |
test results
conf_dir: EpA(eV) VpA(A^3)
confs/Al/std-fcc -3.7468 16.511
Field | Type | Example | Description |
---|---|---|---|
EpA(eV) | real number | -3.7468 | the potential energy per atom |
VpA(A^3) | real number | 16.511 | the equilibrium volume per atom |
"_comment": "01.eos",
"vol_start": 12,
"vol_end": 22,
"vol_step": 0.5,
vol_start, vol_end and vol_step determine the volumetric range and accuracy of the eos.
test results
conf_dir:confs/Al/std-fcc
VpA(A^3) EpA(eV)
15.500 -3.7306
16.000 -3.7429
16.500 -3.7468
17.000 -3.7430
Field | Type | Example | Description |
---|---|---|---|
EpA(eV) | list of real number | [-3.7306, -3.7429, -3.746762, -3.7430] | the potential energy per atom at each sampled volume |
VpA(A^3) | list of real number | [15.5,16.0,16.5,17.0] | the sampled volumes per atom |
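As a worked example of reading such a result, the equilibrium volume can be estimated by fitting a parabola through the lowest-energy point and its two neighbours. This is a simple post-processing sketch, not what dpgen test does internally; `parabola_minimum` is a hypothetical helper and assumes equally spaced volumes.

```python
def parabola_minimum(points):
    """points: list of (volume, energy) pairs, equally spaced in volume."""
    # Index of the lowest sampled energy.
    i = min(range(len(points)), key=lambda j: points[j][1])
    if i == 0 or i == len(points) - 1:
        raise ValueError("minimum lies on the edge of the sampled range")
    (x0, y0), (x1, y1), (x2, y2) = points[i - 1:i + 2]
    h = x1 - x0
    # Vertex of the parabola through three equally spaced points.
    return x1 - h * (y2 - y0) / (2.0 * (y2 - 2.0 * y1 + y0))


eos = [(15.5, -3.7306), (16.0, -3.7429), (16.5, -3.7468), (17.0, -3.7430)]
print(f"estimated equilibrium volume per atom: {parabola_minimum(eos):.2f} A^3")
```

For the Al data shown above, the estimate lands close to the sampled minimum at 16.5 A^3, consistent with the equilibrium VpA reported by the 00.equi task.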
"_comment": "02.elastic",
"norm_deform": 2e-2,
"shear_deform": 5e-2,
norm_deform and shear_deform are the scales of material deformation. This task uses the stress-strain relationship to calculate the elastic constants.
Key | Type | Example | Description |
---|---|---|---|
norm_deform | real number | 0.02 | uniaxial deformation range |
shear_deform | real number | 0.05 | shear deformation range |
test results
conf_dir:confs/Al/std-fcc
130.50 57.45 54.45 4.24 0.00 0.00
57.61 130.31 54.45 -4.29 -0.00 -0.00
54.48 54.48 133.32 -0.00 -0.00 -0.00
4.49 -4.02 -0.89 33.78 0.00 -0.00
-0.00 -0.00 -0.00 -0.00 33.77 4.29
0.00 -0.00 -0.00 -0.00 4.62 36.86
# Bulk Modulus BV = 80.78 GPa
# Shear Modulus GV = 36.07 GPa
# Youngs Modulus EV = 94.19 GPa
# Poission Ratio uV = 0.31
Field | Type | Example | Description |
---|---|---|---|
elastic modulus(GPa) | 6*6 matrix of real number | [[130.50 57.45 54.45 4.24 0.00 0.00] [57.61 130.31 54.45 -4.29 -0.00 -0.00] [54.48 54.48 133.32 -0.00 -0.00 -0.00] [4.49 -4.02 -0.89 33.78 0.00 -0.00] [-0.00 -0.00 -0.00 -0.00 33.77 4.29] [0.00 -0.00 -0.00 -0.00 4.62 36.86]] | Voigt-notation elastic modulus; the sequence of rows and columns is (xx, yy, zz, yz, zx, xy) |
bulk modulus(GPa) | real number | 80.78 | bulk modulus |
shear modulus(GPa) | real number | 36.07 | shear modulus |
Youngs Modulus(GPa) | real number | 94.19 | Young's modulus |
Poission Ratio | real number | 0.31 | Poisson's ratio |
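The moduli in the result file appear consistent with the standard Voigt averages of the 6x6 matrix. The sketch below recomputes them from the matrix shown above; symmetrizing the matrix first is an assumption made here because the printed matrix is not exactly symmetric, and `voigt_moduli` is a hypothetical helper rather than DP-GEN code.

```python
def voigt_moduli(c):
    """Return (bulk, shear, Young's modulus, Poisson's ratio) Voigt averages."""
    # Average C[i][j] and C[j][i] to remove small numerical asymmetries.
    s = [[0.5 * (c[i][j] + c[j][i]) for j in range(6)] for i in range(6)]
    diag = s[0][0] + s[1][1] + s[2][2]
    off = s[0][1] + s[1][2] + s[0][2]
    shear = s[3][3] + s[4][4] + s[5][5]
    bv = (diag + 2.0 * off) / 9.0
    gv = (diag - off + 3.0 * shear) / 15.0
    ev = 9.0 * bv * gv / (3.0 * bv + gv)
    uv = (3.0 * bv - 2.0 * gv) / (2.0 * (3.0 * bv + gv))
    return bv, gv, ev, uv


# Elastic matrix (GPa) from the test result above.
C_AL = [
    [130.50, 57.45, 54.45, 4.24, 0.00, 0.00],
    [57.61, 130.31, 54.45, -4.29, 0.00, 0.00],
    [54.48, 54.48, 133.32, 0.00, 0.00, 0.00],
    [4.49, -4.02, -0.89, 33.78, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 33.77, 4.29],
    [0.00, 0.00, 0.00, 0.00, 4.62, 36.86],
]
print("BV, GV, EV, uV =", voigt_moduli(C_AL))
```

The values agree with the BV, GV, EV and uV lines of the result file to within the rounding of the printed matrix.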
"_comment":"03.vacancy",
"supercell":[3,3,3],
supercell: (list of integer) the supercell size used to generate vacancy defect and interstitial defect
Key | Type | Example | Description |
---|---|---|---|
supercell | list of integer | [3,3,3] | the supercell size used to generate vacancy defect and interstitial defect |
test result
conf_dir:confs/Al/std-fcc
Structure: Vac_E(eV) E(eV) equi_E(eV)
struct-3x3x3-000: 0.859 -96.557 -97.416
Field | Type | Example | Description |
---|---|---|---|
Structure | list of string | ['struct-3x3x3-000'] | structure name |
Vac_E(eV) | real number | 0.723 | the vacancy formation energy |
E(eV) | real number | -96.684 | potential energy of the vacancy configuration |
equi_E(eV) | real number | -97.407 | potential energy of the equilibrium state |
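The three energy columns in the result block are consistent with a simple difference. The sketch below assumes that `equi_E` is the equilibrium energy already scaled to the number of atoms in the defect cell, so that the formation energy is `E - equi_E`; the helper name `formation_energy` is hypothetical.

```python
def formation_energy(defect_energy, equilibrium_energy):
    """Formation energy in eV, assuming both energies refer to the same number of atoms."""
    return defect_energy - equilibrium_energy


# Values copied from the struct-3x3x3-000 line of the test result above.
vacancy = formation_energy(-96.557, -97.416)
print(f"Vac_E = {vacancy:.3f} eV")
```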
"_comment":"04.interstitial",
"insert_ele":["Al"],
"reprod-opt":false,
- `insert_ele`: (list of string) the elements used to generate point interstitial defects
- `reprod-opt`: (boolean) whether to reproduce trajectories of interstitial defects
Key | Type | Example | Description |
---|---|---|---|
insert_ele | list of string | ["Al"] | the elements used to generate point interstitial defect |
reprod-opt | boolean | false | whether to reproduce trajectories of interstitial defect |
test result
conf_dir:confs/Al/std-fcc
Insert_ele-Struct: Inter_E(eV) E(eV) equi_E(eV)
struct-Al-3x3x3-000: 3.919 -100.991 -104.909
struct-Al-3x3x3-001: 2.681 -102.229 -104.909
Field | Type | Example | Description |
---|---|---|---|
Structure | string | 'struct-Al-3x3x3-000' | structure name |
Inter_E(eV) | real number | 3.919 | the interstitial formation energy |
E(eV) | real number | -100.991 | potential energy of the interstitial configuration |
equi_E(eV) | real number | -104.909 | potential energy of the equilibrium state |
"_comment": "05.surface",
"min_slab_size": 10,
"min_vacuum_size": 11,
"_comment": "pert xz to work around vasp bug...",
"pert_xz": 0.01,
"max_miller": 2,
"static-opt":false,
"relax_box":false,
- `min_slab_size` and `min_vacuum_size` are the minimum slab thickness and the minimum vacuum width.
- `pert_xz` is the perturbation along the xz direction used to compute the surface energy.
- `max_miller`: (integer) the maximum Miller index.
- `static-opt`: (boolean) whether to compute the surface energy statically, without atomic relaxation. If false, the structure will be relaxed.
- `relax_box`: (boolean) set to true to relax the box as well; otherwise only the atom positions are relaxed.
Key | Type | Example | Description |
---|---|---|---|
min_slab_size | real number | 10 | the minimum slab thickness |
min_vacuum_size | real number | 11 | the minimum vacuum width |
pert_xz | real number | 0.01 | the perturbation along the xz direction used to compute the surface energy |
max_miller | integer | 2 | the maximum Miller index |
static-opt | boolean | false | whether to compute the surface energy statically, without atomic relaxation. If false, the structure will be relaxed. |
relax_box | boolean | false | set to true to relax the box as well; otherwise only the atom positions are relaxed |
test result
conf_dir:confs/Al/std-fcc
Miller_Indices: Surf_E(J/m^2) EpA(eV) equi_EpA(eV)
struct-000-m1.1.1m: 0.673 -3.628 -3.747
struct-001-m2.2.1m: 0.917 -3.592 -3.747
Field | Type | Example | Description |
---|---|---|---|
Miller_Indices | string | struct-000-m1.1.1m | Miller indices |
Surf_E(J/m^2) | real number | 0.673 | the surface formation energy |
EpA(eV) | real number | -3.628 | potential energy per atom of the surface configuration |
equi_EpA(eV) | real number | -3.747 | potential energy per atom of the equilibrium state |
For details on what dpgen autotest actually does, including the LAMMPS and VASP scripts, the input files and the atomic configuration files that auto_test generates, please refer to https://hackmd.io/@yeql5ephQLaGJGgFgpvIDw/rJY1FO92B
When your dataset contains a lot of repeated data, this step helps you simplify it. The workflow contains three stages: train, model_devi, and fp. The train and fp stages are the same as in the run step, while the model_devi stage calculates the model deviations of the remaining data that has not yet been confirmed accurate. Data with small model deviations is confirmed accurate, and the program picks data from those with large model deviations and adds it to the new dataset.
Use the following command to start the workflow:
dpgen simplify param.json machine.json
Here is an example of `param.json` for the QM7 dataset:
{
    "type_map": ["C", "H", "N", "O", "S"],
    "mass_map": [12.011, 1.008, 14.007, 15.999, 32.065],
    "pick_data": "/scratch/jz748/simplify/qm7",
    "init_data_prefix": "",
    "init_data_sys": [],
    "sys_batch_size": ["auto"],
    "numb_models": 4,
    "train_param": "input.json",
    "default_training_param": {
        "model": {
            "type_map": ["C", "H", "N", "O", "S"],
            "descriptor": {
                "type": "se_a",
                "sel": [7, 16, 3, 3, 1],
                "rcut_smth": 1.00,
                "rcut": 6.00,
                "neuron": [25, 50, 100],
                "resnet_dt": false,
                "axis_neuron": 12
            },
            "fitting_net": {
                "neuron": [240, 240, 240],
                "resnet_dt": true
            }
        },
        "learning_rate": {
            "type": "exp",
            "start_lr": 0.001,
            "decay_steps": 10,
            "decay_rate": 0.99
        },
        "loss": {
            "start_pref_e": 0.02,
            "limit_pref_e": 1,
            "start_pref_f": 1000,
            "limit_pref_f": 1,
            "start_pref_v": 0,
            "limit_pref_v": 0,
            "start_pref_pf": 0,
            "limit_pref_pf": 0
        },
        "training": {
            "set_prefix": "set",
            "stop_batch": 10000,
            "disp_file": "lcurve.out",
            "disp_freq": 1000,
            "numb_test": 1,
            "save_freq": 1000,
            "save_ckpt": "model.ckpt",
            "load_ckpt": "model.ckpt",
            "disp_training": true,
            "time_training": true,
            "profiling": false,
            "profiling_file": "timeline.json"
        },
        "_comment": "that's all"
    },
    "use_clusters": true,
    "fp_style": "gaussian",
    "shuffle_poscar": false,
    "fp_task_max": 1000,
    "fp_task_min": 10,
    "fp_pp_path": "/home/jzzeng/",
    "fp_pp_files": [],
    "fp_params": {
        "keywords": "mn15/6-31g** force nosymm scf(maxcyc=512)",
        "nproc": 28,
        "multiplicity": 1,
        "_comment": " that's all "
    },
    "init_pick_number": 100,
    "iter_pick_number": 100,
    "e_trust_lo": 1e10,
    "e_trust_hi": 1e10,
    "f_trust_lo": 0.25,
    "f_trust_hi": 0.45,
    "_comment": " that's all "
}
Here `pick_data` is the data to simplify; it currently only supports `MultiSystems` containing `System` with `deepmd/npy` format, and `use_clusters` should always be `true`. `init_pick_number` and `iter_pick_number` are the numbers of picked frames. `e_trust_lo` and `e_trust_hi` give the range of the deviation of the frame energy, and `f_trust_lo` and `f_trust_hi` give the range of the maximum deviation of atomic forces in a frame. `fp_style` can currently only be `gaussian`. Other parameters are the same as those of the generator.
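The role of the force trust levels and the pick numbers can be illustrated with a small sketch. It assumes that a frame whose maximum force deviation is below `f_trust_lo` counts as accurate, one between `f_trust_lo` and `f_trust_hi` is a candidate for picking, and one at or above `f_trust_hi` is treated separately; the function names and the random selection are hypothetical and do not reproduce dpgen's actual implementation.

```python
import random


def classify_frames(max_devi_f, f_trust_lo, f_trust_hi):
    """Split frame indices by their maximum force deviation (assumed thresholds)."""
    accurate, candidate, failed = [], [], []
    for idx, devi in enumerate(max_devi_f):
        if devi < f_trust_lo:
            accurate.append(idx)
        elif devi < f_trust_hi:
            candidate.append(idx)
        else:
            failed.append(idx)
    return accurate, candidate, failed


def pick_frames(candidate, pick_number, seed=0):
    """Pick at most pick_number candidate frames; the seed keeps the choice repeatable."""
    if len(candidate) <= pick_number:
        return list(candidate)
    rng = random.Random(seed)
    return sorted(rng.sample(candidate, pick_number))


# Made-up deviations for six frames, using the trust levels of the example above.
devis = [0.10, 0.30, 0.50, 0.26, 0.44, 0.20]
accurate, candidate, failed = classify_frames(devis, f_trust_lo=0.25, f_trust_hi=0.45)
picked = pick_frames(candidate, pick_number=2)
print(accurate, candidate, failed, picked)
```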
When switching to a new machine, you may modify the `MACHINE` according to the actual circumstances. Once you have finished, the `MACHINE` can be re-used for any DP-GEN task without extra effort.
An example of `MACHINE` is:
{
    "train": [
        {
            "machine": {
                "machine_type": "slurm",
                "hostname": "localhost",
                "port": 22,
                "username": "Angus",
                "work_path": "....../work"
            },
            "resources": {
                "numb_node": 1,
                "numb_gpu": 1,
                "task_per_node": 4,
                "partition": "AdminGPU",
                "exclude_list": [],
                "source_list": [
                    "....../train_tf112_float.env"
                ],
                "module_list": [],
                "time_limit": "23:0:0",
                "qos": "data"
            },
            "deepmd_path": "....../tf1120-lowprec"
        }
    ],
    "model_devi": [
        {
            "machine": {
                "machine_type": "slurm",
                "hostname": "localhost",
                "port": 22,
                "username": "Angus",
                "work_path": "....../work"
            },
            "resources": {
                "numb_node": 1,
                "numb_gpu": 1,
                "task_per_node": 2,
                "partition": "AdminGPU",
                "exclude_list": [],
                "source_list": [
                    "......./lmp_tf112_float.env"
                ],
                "module_list": [],
                "time_limit": "23:0:0",
                "qos": "data"
            },
            "command": "lmp_serial",
            "group_size": 1
        }
    ],
    "fp": [
        {
            "machine": {
                "machine_type": "slurm",
                "hostname": "localhost",
                "port": 22,
                "username": "Angus",
                "work_path": "....../work"
            },
            "resources": {
                "task_per_node": 4,
                "numb_gpu": 1,
                "exclude_list": [],
                "with_mpi": false,
                "source_list": [],
                "module_list": [
                    "mpich/3.2.1-intel-2017.1",
                    "vasp/5.4.4-intel-2017.1",
                    "cuda/10.1"
                ],
                "time_limit": "120:0:0",
                "partition": "AdminGPU",
                "_comment": "that's All"
            },
            "command": "vasp_gpu",
            "group_size": 1
        }
    ]
}
The following table shows which keys are needed for the three types of machine: `train`, `model_devi` and `fp`. Each of them is a list of dicts, and each dict can be considered an independent environment for calculation.
Key | train | model_devi | fp |
---|---|---|---|
machine | NEED | NEED | NEED |
resources | NEED | NEED | NEED |
deepmd_path | NEED | | |
command | | NEED | NEED |
group_size | | NEED | NEED |
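The requirements in this table can be checked before submitting anything. The following is a minimal sketch, assuming the `MACHINE` content has already been loaded into a Python dict; `REQUIRED_KEYS` and `missing_machine_keys` are hypothetical names, and the check follows the table literally, so it does not account for `python_path` being used in place of `deepmd_path`.

```python
# Required keys per machine type, transcribed from the table above.
REQUIRED_KEYS = {
    "train": {"machine", "resources", "deepmd_path"},
    "model_devi": {"machine", "resources", "command", "group_size"},
    "fp": {"machine", "resources", "command", "group_size"},
}


def missing_machine_keys(machine_conf):
    """Return one message per required key that is absent from an environment dict."""
    problems = []
    for task, required in REQUIRED_KEYS.items():
        for i, env in enumerate(machine_conf.get(task, [])):
            for key in sorted(required - set(env)):
                problems.append(f"{task}[{i}] is missing '{key}'")
    return problems


# A deliberately incomplete configuration for illustration.
example = {
    "train": [{"machine": {}, "resources": {}}],
    "fp": [{"machine": {}, "resources": {}, "command": "vasp_gpu", "group_size": 1}],
}
problems = missing_machine_keys(example)
print(problems)
```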
The following table describes the keys in `MACHINE` in detail.
Key | Type | Example | Description |
---|---|---|---|
deepmd_path | String | "......tf1120-lowprec" | Installation directory of DeePMD-kit 0.x, which should contain bin, lib and include. |
python_path | String | "....../python3.6/bin/python" | Path of the Python interpreter with which DeePMD-kit 1.x is installed. This option should not be used together with deepmd_path. |
machine | Dict | | Settings of the machine for TASK. |
resources | Dict | | Resources needed for calculation. |
# The following are keys in resources | | | |
numb_node | Integer | 1 | Node count required for the job |
task_per_node | Integer | 4 | Number of CPU cores required |
numb_gpu | Integer | 4 | Number of GPUs required |
manual_cuda_devices | Integer | 1 | Used together with "manual_cuda_multiplicity" to specify the number of GPUs. |
manual_cuda_multiplicity | Integer | 5 | Used in 01.model_devi together with "manual_cuda_devices" to specify how many MD programs run on one GPU at the same time; dpgen automatically allocates the MD jobs to different GPUs. This can improve GPU utilization on GPUs such as the V100. |
node_cpu | Integer | 4 | Only for LSF. The number of CPU cores on each node that should be allocated to the job. |
source_list | List of string | ["....../vasp.env"] | Environment needed for certain job. For example, if "env" is in the list, 'source env' will be written in the script. |
module_list | List of string | [ "Intel/2018", "Anaconda3"] | For example, if "Intel/2018" is in the list, "module load Intel/2018" will be written in the script. |
partition | String | "AdminGPU" | Partition / queue in which to run the job. |
time_limit | String (time format) | 23:00:00 | Maximal time permitted for the job |
mem_limit | Integer | 16 | Maximal memory that may be requested for the job. |
with_mpi | Boolean | true | Whether to use MPI for calculation. If it is true and the machine type is Slurm, "srun" will be prefixed to the command in the script. |
qos | String | "bigdata" | Job priority, dependent on the particular settings of your HPC. |
allow_failure | Boolean | false | Allow the command to return a non-zero exit code. |
# End of resources | | | |
command | String | "lmp_serial" | Executable path of the software, such as lmp_serial, lmp_mpi, vasp_gpu, vasp_std, etc. |
group_size | Integer | 5 | Number of jobs that DP-GEN puts together in one submission script. |
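To make the effect of `source_list`, `module_list` and `with_mpi` concrete, here is a minimal sketch of how such settings could translate into lines of a job script. The function `script_lines` is hypothetical, and the ordering of the `source` and `module load` lines is an assumption; the scripts that dpgen really writes contain more than this.

```python
def script_lines(resources, command, machine_type):
    """Build illustrative script lines from a resources dict (not dpgen's real output)."""
    lines = []
    # Each entry of source_list becomes a 'source <file>' line.
    for env_file in resources.get("source_list", []):
        lines.append(f"source {env_file}")
    # Each entry of module_list becomes a 'module load <name>' line.
    for module in resources.get("module_list", []):
        lines.append(f"module load {module}")
    # With MPI on Slurm, 'srun' is prefixed to the command.
    if resources.get("with_mpi", False) and machine_type.lower() == "slurm":
        command = f"srun {command}"
    lines.append(command)
    return lines


resources = {"source_list": ["env"], "module_list": ["Intel/2018"], "with_mpi": True}
lines = script_lines(resources, "vasp_std", "slurm")
print("\n".join(lines))
```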
- The most common problem is whether two settings correspond with each other, including:
  - The order of elements in `type_map`, `mass_map` and `fp_pp_files`.
  - Size of `init_data_sys` and `init_batch_size`.
  - Size of `sys_configs` and `sys_batch_size`.
  - Size of `sel_a` and the actual types of atoms in your system.
  - Index of `sys_configs` and `sys_idx`.
- Please verify the directories of `sys_configs`. If there isn't any POSCAR for `01.model_devi` in one iteration, you may have written a wrong path in `sys_configs`.
- Check that your JSON files are correctly formatted.
- In `02.fp`, the total number of cores you request through `task_per_node` should be divisible by `npar` times `kpar`.
- The number of frames of one system should be larger than `batch_size` and `numb_test` in `default_training_param`. It can happen that one iteration adds only a few structures, which causes an error in the next iteration's training. In this case, you may set `fp_task_min` larger than `numb_test`.
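Some of the checks above are easy to automate before starting a run. The sketch below is only an illustration under stated assumptions: `check_param` and `cores_fit` are hypothetical helpers, not dpgen functions, and only list lengths and the core count are examined. Parsing the text with `json.loads` also catches malformed JSON, since it raises an error on invalid input.

```python
import json


def check_param(param):
    """Report pairs of list-valued settings whose lengths disagree."""
    pairs = [
        ("type_map", "mass_map"),
        ("init_data_sys", "init_batch_size"),
        ("sys_configs", "sys_batch_size"),
    ]
    problems = []
    for first, second in pairs:
        # Only compare when both keys are present in the file.
        if first in param and second in param:
            if len(param[first]) != len(param[second]):
                problems.append(
                    f"{first} has {len(param[first])} entries "
                    f"but {second} has {len(param[second])}"
                )
    return problems


def cores_fit(total_cores, npar, kpar):
    """True if the requested cores are divisible by npar times kpar."""
    return total_cores % (npar * kpar) == 0


# A deliberately inconsistent fragment for illustration.
param_text = '{"type_map": ["C", "H"], "mass_map": [12.011]}'
problems = check_param(json.loads(param_text))
print(problems)
print(cores_fit(4, npar=2, kpar=2))
```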
The project dpgen is licensed under GNU LGPLv3.0.