lvl_f_lo and Outputs-model_devi show nan during dpgen2 run #276

Open
Andy6M opened this issue Nov 21, 2024 · 4 comments

@Andy6M

Andy6M commented Nov 21, 2024

Issue Description

Dear developers, I encountered an issue while running dpgen2: the output of dpgen2 status shows lvl_f_lo as nan, and the Outputs-model_devi file contains nan values in all force-deviation fields. Details are below:


1. dpgen2 status Output

100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 13.33it/s]
#   stage  id_stg.    iter.      accu.      cand.      fail.   lvl_f_lo lvl_f_hi
# Stage    0  --------------------
        0        0        0     0.7646     0.1323     0.1032     0.2118   0.5000
        0        1        1     0.8519     0.1481     0.0000        nan   0.5000
        0        2        2     0.8519     0.1481     0.0000        nan   0.5000
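For readers unfamiliar with the report: as I understand it, dpgen2 classifies each MD frame by its maximum force model deviation against the two thresholds, which is what produces the accu./cand./fail. columns above. A minimal sketch of that logic (illustrative names, not dpgen2's actual code):

```python
# Sketch (assumed semantics, not dpgen2's actual implementation): each MD
# frame is classified by its maximum force model deviation against the
# lvl_f_lo / lvl_f_hi thresholds from the status table.
def classify_frame(max_devi_f, lvl_f_lo, lvl_f_hi):
    if max_devi_f < lvl_f_lo:
        return "accurate"
    if max_devi_f < lvl_f_hi:
        return "candidate"
    return "failed"

def ratios(devis, lvl_f_lo, lvl_f_hi):
    """Return (accu, cand, fail) fractions, as in the status table."""
    counts = {"accurate": 0, "candidate": 0, "failed": 0}
    for d in devis:
        counts[classify_frame(d, lvl_f_lo, lvl_f_hi)] += 1
    n = len(devis)
    return tuple(counts[k] / n for k in ("accurate", "candidate", "failed"))

print(ratios([0.1, 0.3, 0.6, 0.15], 0.2118, 0.5))  # -> (0.5, 0.25, 0.25)
```

Note that a nan max_devi_f fails both comparisons, so such frames fall through to "failed", and a threshold derived from nan data stays nan, as in iterations 1 and 2 above.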

2. Checking Output Files

  • Outputs-traj:
    The file /iter-000001--run-lmp-group/Outputs-traj contains valid data. Sample content:
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
184
ITEM: BOX BOUNDS xy xz yz pp pp pp
...
  • Outputs-log:
    The log file also shows normal output:
LAMMPS (29 Aug 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
...

3. Outputs-model_devi Contains nan

However, in the file /iter-000001--run-lmp-group/Outputs-model_devi, the avg_devi_f, max_devi_f, and min_devi_f fields all contain nan:

#       step         max_devi_v         min_devi_v         avg_devi_v         max_devi_f         min_devi_f         avg_devi_f
           0       0.000000e+00      1.797693e+308               -nan               -nan               -nan               -nan
          50       0.000000e+00      1.797693e+308               -nan               -nan               -nan               -nan
         100       0.000000e+00      1.797693e+308               -nan               -nan               -nan               -nan
...
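To locate exactly which frames are affected across the many Outputs-model_devi files, a small scan like the following can help (find_bad_devi_rows is a hypothetical helper, not part of dpgen2; it flags both nan entries and the 1.797693e+308 sentinel, which is DBL_MAX):

```python
import math

# Hypothetical helper (not part of dpgen2): scan model_devi text for rows
# containing NaN entries or DBL_MAX-like sentinels (1.797693e+308), which
# suggest the deviation was never actually computed for that frame.
def find_bad_devi_rows(text):
    bad_steps = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the header and blank lines
        fields = line.split()
        step = int(fields[0])
        values = [float(x) for x in fields[1:]]
        if any(math.isnan(v) or v > 1e308 for v in values):
            bad_steps.append(step)
    return bad_steps

sample = """\
#  step  max_devi_v  min_devi_v     avg_devi_v  max_devi_f  min_devi_f  avg_devi_f
      0  0.0e+00     1.797693e+308  -nan        -nan        -nan        -nan
     50  2.8e-02     1.2e-03        1.5e-02     1.3e-01     1.5e-02     5.4e-02
"""
print(find_bad_devi_rows(sample))  # -> [0]
```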

Expected Behavior

The lvl_f_lo value and the fields in Outputs-model_devi should not contain nan.


Steps to Reproduce

  1. Run the dpgen2 workflow with the provided input files.
  2. Observe the dpgen2 status output and check the corresponding output files (Outputs-traj, Outputs-log, Outputs-model_devi).

Environment

  • dpgen2 version: 0.0.8.dev138+g2877e2f
  • DeepMD-kit version: 3.0.0b4
  • Platform: Bohrium
  • Hardware: Bohrium V100*1

Thank you!


@wanghan-iapcm

Could you please check whether the very first configuration of the trajectory is a valid configuration?
The model deviation of the first configuration is 1.797693e+308 (DBL_MAX, the largest double-precision value), which is unusual.
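A quick way to sanity-check that first frame is to look for nan coordinates or unphysically close atom pairs, either of which can blow up model deviations. A minimal sketch (check_frame is a hypothetical helper; a real check should also account for periodic images):

```python
import numpy as np

# Hypothetical sanity check for one trajectory frame: flag NaN coordinates
# and atom pairs closer than a physical cutoff (in the same length unit as
# the coordinates). Periodic images are ignored, so this is only a quick
# first-pass check.
def check_frame(coords, min_dist=0.5):
    coords = np.asarray(coords, dtype=float)
    if np.isnan(coords).any():
        return "NaN coordinates"
    # all pairwise distances; the diagonal (self-distance) is masked out
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)
    if dist.min() < min_dist:
        return "overlapping atoms"
    return "ok"

print(check_frame([[0, 0, 0], [2, 0, 0]]))    # -> ok
print(check_frame([[0, 0, 0], [0.1, 0, 0]]))  # -> overlapping atoms
```

The coordinates themselves could be read from the LAMMPS dump file, e.g. with the dpdata package commonly used in the DeepMD ecosystem.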

@Andy6M
Author

Andy6M commented Nov 22, 2024

Thank you for your response.

I checked the first configuration in iter-000000--run-lmp-000000 and examined the model_devi output. The min_devi_v values from all 17 LAMMPS runs are in the range 1e-3 to 1e-2, with no significantly unusual deviations. Below is the relevant data from the model_devi file:

#       step         max_devi_v         min_devi_v         avg_devi_v         max_devi_f         min_devi_f         avg_devi_f
           0       2.813274e-02       1.236430e-03       1.493753e-02       1.294953e-01       1.464195e-02       5.355157e-02
          50       2.965354e-02       1.368058e-03       1.590805e-02       1.759406e-01       1.495048e-02       5.780621e-02
         100       2.838238e-02       8.987198e-04       1.456293e-02       3.704968e-01       1.995545e-02       6.168902e-02
         150       2.898888e-02       5.551084e-04       1.451846e-02       1.926066e-01       1.942737e-02       6.380191e-02
         200       2.859164e-02       1.362606e-03       1.502724e-02       1.506470e-01       1.966401e-02       6.074587e-02
         250       2.638238e-02       1.067859e-03       1.498770e-02       1.662188e-01       1.530976e-02       6.100733e-02
         300       3.096493e-02       1.346518e-03       1.662259e-02       1.287707e-01       9.415163e-03       6.065727e-02
         350       3.007970e-02       1.036756e-03       1.538427e-02       1.285052e-01       1.564005e-02       5.893896e-02
         400       2.847960e-02       1.593137e-03       1.575716e-02       1.448052e-01       2.011331e-02       5.807221e-02
         450       2.936185e-02       9.937412e-04       1.520512e-02       2.149763e-01       1.711258e-02       5.693326e-02
         500       2.865296e-02       1.305567e-03       1.580815e-02       1.537033e-01       1.503539e-02       5.641226e-02
         550       2.928813e-02       1.338427e-03       1.549072e-02       1.700321e-01       1.741005e-02       5.824711e-02
         600       3.045114e-02       1.881260e-03       1.695844e-02       1.468976e-01       1.580279e-02       6.133078e-02
         650       3.099498e-02       1.979718e-03       1.680618e-02       1.421608e-01       2.160320e-02       6.181459e-02
         700       3.274744e-02       1.210968e-03       1.797129e-02       1.831628e-01       1.325974e-02       6.625343e-02
         750       3.213804e-02       9.621239e-04       1.657447e-02       1.631825e-01       1.223954e-02       6.317999e-02
         800       2.790166e-02       6.471979e-04       1.548984e-02       1.701019e-01       1.454535e-02       6.039472e-02
         850       3.102427e-02       6.628643e-04       1.640118e-02       1.452935e-01       1.031704e-02       5.756204e-02
         900       3.017256e-02       1.184238e-03       1.628572e-02       1.617844e-01       1.736325e-02       5.396886e-02
         950       2.824300e-02       2.645454e-03       1.514339e-02       1.435542e-01       1.841572e-02       5.954156e-02
        1000       3.003184e-02       2.018773e-03       1.537587e-02       1.661056e-01       1.105155e-02       5.869294e-02

These values seem consistent and do not show any unusual spikes or extreme outliers. Let me know if there’s anything else I should check or if you need additional information.

Thank you for your kindness, Dr Wang!

@wanghan-iapcm

Iteration 0 looks fine; the issue appears at iteration 1.
Please check the quality of the model trained at iteration 1 and the initial configuration used in the iteration-1 MD simulations.
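One simple metric for the quality of the iteration-1 model is the force RMSE against reference (DFT) forces, e.g. as reported by DeepMD-kit's dp test command. A minimal sketch with synthetic arrays (force_rmse is an illustrative helper, not part of dpgen2 or DeepMD-kit):

```python
import numpy as np

# Illustrative helper: root-mean-square error over all force components,
# computed from synthetic arrays. In practice the predicted forces would
# come from `dp test` or the DeepPot Python interface, and the reference
# forces from the DFT labels.
def force_rmse(f_pred, f_ref):
    f_pred = np.asarray(f_pred, dtype=float)
    f_ref = np.asarray(f_ref, dtype=float)
    return float(np.sqrt(np.mean((f_pred - f_ref) ** 2)))

# one atom, predicted force off by 0.1 in x
print(force_rmse([[0.1, 0.0, 0.0]], [[0.0, 0.0, 0.0]]))
```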

@Andy6M
Author

Andy6M commented Nov 26, 2024

Thank you for your response.

I checked the configuration in iter-000001--run-lmp-000000 and examined the model_devi output. Below is the relevant data from the model_devi file:

#       step         max_devi_v         min_devi_v         avg_devi_v         max_devi_f         min_devi_f         avg_devi_f
           0       1.767406e-02       1.923665e-03       9.226556e-03       1.744407e-01       1.733599e-02       5.988462e-02
          50       1.710528e-02       8.148907e-04       9.708625e-03       1.470154e-01       2.247298e-02       6.133791e-02
         100       1.648531e-02       7.205604e-04       8.514945e-03       2.283003e-01       1.731160e-02       6.284691e-02
         150       1.401277e-02       6.956906e-04       7.461165e-03       1.893883e-01       2.181478e-02       6.037625e-02
         200       1.122296e-02       8.039403e-04       6.388654e-03       1.278451e-01       2.261003e-02       6.208718e-02
         250       1.223884e-02       1.111902e-03       7.030710e-03       1.396129e-01       1.874105e-02       5.982687e-02
         300       1.274248e-02       4.881283e-04       7.156683e-03       1.538968e-01       1.926643e-02       5.871147e-02
         350       1.267785e-02       1.059288e-03       6.869723e-03       1.578075e-01       1.529583e-02       6.009399e-02
         400       1.606176e-02       9.227430e-04       8.678422e-03       1.700951e-01       1.348010e-02       6.259148e-02
         450       1.375797e-02       1.208132e-03       7.645918e-03       1.515705e-01       1.599934e-02       6.139468e-02
         500       1.452559e-02       1.445252e-03       8.073384e-03       2.422813e-01       2.254674e-02       6.374373e-02
         550       1.625274e-02       7.398480e-04       8.476174e-03       2.088750e-01       1.873348e-02       6.650365e-02
         600       1.597114e-02       1.153074e-03       9.110399e-03       2.131987e-01       2.254325e-02       6.565988e-02
         650       1.296594e-02       1.260483e-03       7.439792e-03       1.482121e-01       1.982959e-02       6.556800e-02
         700       1.314767e-02       1.855598e-03       7.715936e-03       1.393991e-01       2.082488e-02       6.037559e-02
         750       1.315291e-02       6.362476e-04       7.361491e-03       1.366818e-01       1.670856e-02       6.332969e-02
         800       1.396638e-02       8.883420e-04       7.119505e-03       1.791644e-01       1.593993e-02       6.285636e-02
         850       1.385305e-02       4.007663e-04       7.296780e-03       1.463808e-01       1.248193e-02       6.266973e-02
         900       1.382617e-02       1.950630e-03       7.863833e-03       1.453877e-01       2.297074e-02       6.257215e-02
         950       1.706155e-02       1.537314e-03       9.033908e-03       1.546805e-01       1.645491e-02       5.942675e-02
        1000       1.764165e-02       6.392657e-04       9.215594e-03       1.208138e-01       1.399952e-02       5.731977e-02

Thank you for your kindness, Dr Wang!
