
Faster structure matching via StructureMatcher (and by extension, group_structures) #2593

Open
sgbaird opened this issue Jul 29, 2022 · 6 comments


@sgbaird

sgbaird commented Jul 29, 2022

Is your feature request related to a problem? Please describe.

Using StructureMatcher repeatedly incurs a large overhead (in my use case, ~40 hrs to check the performance of a generative benchmark via matbench-genmetrics).

Describe the solution you'd like
Explore speeding up bottlenecks in the StructureMatcher algorithm, pre-screening, leveraging GPU for distributed calculation of matches, etc.

Describe alternatives you've considered
These are under active discussion at sparks-baird/matbench-genmetrics#9, e.g. AFLOW's XtalFinder and dscribe kernels.

Additional context
As a follow-up to @mkhorton's sparks-baird/matbench-genmetrics#9 (comment), here are some flame graphs of repeated evaluations of StructureMatcher().fit(...):

%pip install matbench-genmetrics
from mp_time_split.utils.gen import DummyGenerator
from matbench_genmetrics.core import MPTSMetrics
from tqdm import tqdm

mptm = MPTSMetrics(dummy=True, verbose=False)
for fold in tqdm(mptm.folds):
    train_val_inputs = mptm.get_train_and_val_data(fold)

    # DummyGenerator stands in for a real generative model
    dg = DummyGenerator()
    dg.fit(train_val_inputs)
    gen_structures = dg.gen(n=50)

    # this is the step that calls StructureMatcher().fit(...) repeatedly
    mptm.evaluate_and_record(fold, gen_structures)

print(mptm.recorded_metrics)

[flame graph: profile of the repeated StructureMatcher().fit(...) evaluations]

@mkhorton figured we could chat about bottlenecks and the potential for precomputing things and speedups. Here are some of my initial thoughts:

  • we should be able to run _preprocess on all structures once and short-circuit it during later match matrix calculations
  • precompute several common supercells for each structure and use a lookup instead of generating them on the fly. It's not exactly clear to me how to implement that in:

        def sc_generator(s1, s2):
            s2_fc = np.array(s2.frac_coords)
            if fu == 1:
                cc = np.array(s1.cart_coords)
                for l, sc_m in self._get_lattices(s2.lattice, s1, fu):
                    fc = l.get_fractional_coords(cc)
                    fc -= np.floor(fc)
                    yield fc, s2_fc, av_lat(l, s2.lattice), sc_m
            else:
                fc_init = np.array(s1.frac_coords)
                for l, sc_m in self._get_lattices(s2.lattice, s1, fu):
                    fc = np.dot(fc_init, np.linalg.inv(sc_m))
                    lp = lattice_points_in_supercell(sc_m)
                    fc = (fc[:, None, :] + lp[None, :, :]).reshape((-1, 3))
                    fc -= np.floor(fc)
                    yield fc, s2_fc, av_lat(l, s2.lattice), sc_m

        if s1_supercell:
            for x in sc_generator(struct1, struct2):
                yield x
        else:
            for x in sc_generator(struct2, struct1):
                # reorder generator output so s1 is still first
                yield x[1], x[0], x[2], x[3]

  • _cart_dists is probably the easiest to replace or numba-fy, e.g. by implementing a jit-ed version of the Kabsch algorithm (@kjappelbaum) -- see the sketch after this list
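On that last bullet, here is a minimal sketch of what a jit-compiled Kabsch RMSD could look like (assuming numba is available; the kabsch_rmsd name is mine, and this is not a drop-in replacement for _cart_dists, which also handles site masks and matching):

import numpy as np
from numba import njit

@njit(cache=True)
def kabsch_rmsd(P, Q):
    """Minimum RMSD between two (n, 3) point sets after an optimal rotation."""
    # center both point sets on their centroids
    P = P - P.sum(axis=0) / P.shape[0]
    Q = Q - Q.sum(axis=0) / Q.shape[0]
    # covariance matrix and its SVD
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    # correct for a possible reflection so R is a proper rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.eye(3)
    D[2, 2] = d
    R = Vt.T @ D @ U.T
    # rotate P onto Q and compute the RMSD
    diff = (R @ P.T).T - Q
    return np.sqrt((diff * diff).sum() / P.shape[0])

The first call pays numba's compilation cost; subsequent calls run as compiled code.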

It would be good to know how much memory each calculation takes and whether or not these could be computed in parallel on typical consumer GPUs.

Happy to move this to discussions if that's a better place.

@shyuep
Member

shyuep commented Jul 29, 2022

Speed ups are always welcome. But I want to note a few things:

  1. Hashing is already done. E.g., if the fractional compositions are not equal, two structures immediately evaluate as not equal without any matrix comparisons.
  2. Most of the code is already vectorized in numpy. Any GPU optimizations that work with numpy would work with StructureMatcher.

If any further optimizations are implemented, they should not come at the cost of code maintainability or of support for simple single-CPU machines.

I should add that, as implemented, StructureMatcher is meant for simple one-off comparisons. When we actually run matching across large structure sets (e.g., the entire ICSD), we use various tricks to speed it up: pre-grouping by composition, reducing to the primitive cell, and so on. All of these allow us to disable some of the more costly aspects of the general structure matcher, e.g., generating supercells. Of course, the only people who really care about performance in this regard are people working with large databases of materials. That's not a huge population to begin with.
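For context, a minimal sketch of that kind of workflow, under the assumption that pre-grouping by reduced formula and reducing to primitive cells up front is acceptable for the data (the group_large_structure_set helper and the particular matcher settings are illustrative, not the exact pipeline described above):

from collections import defaultdict

from pymatgen.analysis.structure_matcher import StructureMatcher

def group_large_structure_set(structures):
    """Group pymatgen Structures into equivalence classes, with cheap pre-screening."""
    # reduce to primitive cells once, up front
    structures = [s.get_primitive_structure() for s in structures]

    # bucket by reduced formula so only plausible pairs are ever compared
    buckets = defaultdict(list)
    for s in structures:
        buckets[s.composition.reduced_formula].append(s)

    # with the reduction already done, the matcher can skip its own
    # primitive-cell search and supercell generation
    sm = StructureMatcher(primitive_cell=False, attempt_supercell=False)

    groups = []
    for bucket in buckets.values():
        groups.extend(sm.group_structures(bucket))
    return groups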

@mkhorton
Member

Thanks for creating this issue @sgbaird.

I agree with @shyuep's general points, but I don't think this is the easiest code to maintain as currently written either (e.g., we use custom Cython linear assignment code that is quite difficult to understand), so I hope there's good scope for improvements!

While large-scale usage may be comparatively rare, I certainly think the number of people needing to do large numbers of comparisons is increasing, and it's often been a bottleneck.

@mkhorton
Member

For the specific suggestions:

we should be able to run _preprocess on all structures once and short-circuit it during later match matrix calculations

Yes, this seems like an obvious optimization if this is being repeated. It's probably not the most critical piece (since it only has to be done once per structure), but perhaps get_niggli_reduced_lattice can be accelerated too.
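A minimal sketch of that idea, reducing each structure once and reusing the result across later comparisons (the ReducedStructureCache helper is hypothetical and only uses the public get_reduced_structure API rather than the private _preprocess):

from pymatgen.core import Structure

class ReducedStructureCache:
    """Compute the Niggli-reduced copy of each structure once and reuse it."""

    def __init__(self):
        self._cache = {}

    def get(self, structure: Structure) -> Structure:
        # keyed by object identity, so the original objects must be kept alive
        key = id(structure)
        if key not in self._cache:
            self._cache[key] = structure.get_reduced_structure(reduction_algo="niggli")
        return self._cache[key]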

precompute several common supercells for each structure and use a lookup instead of generating it on the fly.

I think we'd have to try this out to see how useful it'd be in practice; it's not obvious to me how many supercells would need to be generated.

_cart_dists is probably the easiest to replace or numba-fy, e.g. by implementing a jit-ed version of the Kabsch algorithm

Agreed. I do want to point to this as an example of where else optimized code in pymatgen lives (that was for neighbor finding). I'll note that numba is both excellent and a really troublesome dependency, since it typically lags behind the latest numpy and Python versions, so I might err towards a Cython solution if feasible. Perhaps a simpler numpy-vectorized solution would also be a good option -- as @shyuep notes, this is done in several places already, but perhaps we've missed an opportunity.
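On the numpy-vectorized option, one direction would be to batch the same Kabsch computation as in the earlier sketch over many point-set pairs at once, since numpy's svd, det, and matmul all broadcast over a leading batch axis (batched_kabsch_rmsd and its (m, n, 3) convention are illustrative assumptions, not existing pymatgen code):

import numpy as np

def batched_kabsch_rmsd(P, Q):
    """RMSD after optimal rotation for m point-set pairs; P and Q have shape (m, n, 3)."""
    # center every point set on its own centroid
    P = P - P.mean(axis=1, keepdims=True)
    Q = Q - Q.mean(axis=1, keepdims=True)
    # per-pair covariance matrices, shape (m, 3, 3)
    H = np.einsum("mni,mnj->mij", P, Q)
    U, _, Vt = np.linalg.svd(H)
    V = np.transpose(Vt, (0, 2, 1))
    Ut = np.transpose(U, (0, 2, 1))
    # reflection correction per pair
    d = np.sign(np.linalg.det(V @ Ut))
    D = np.zeros_like(H)
    D[:, 0, 0] = 1.0
    D[:, 1, 1] = 1.0
    D[:, 2, 2] = d
    R = V @ D @ Ut
    # rotate each P onto its Q and reduce to one RMSD per pair
    diff = P @ np.transpose(R, (0, 2, 1)) - Q
    return np.sqrt((diff**2).sum(axis=(1, 2)) / P.shape[1])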

@lan496
Contributor

lan496 commented Aug 4, 2022

Let me mention that StructureMatcher.group_structures calls _preprocess for each structure only once (#2490).
So it is a small improvement, but using StructureMatcher.group_structures reduces the _preprocess part of the flame graph from O(n^2) to O(n), and is ~150% faster as a whole.
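As a usage note, for workloads where grouping (rather than a full pairwise match matrix) is what you need, the change is roughly the following (train_structures and gen_structures are illustrative variable names):

from pymatgen.analysis.structure_matcher import StructureMatcher

sm = StructureMatcher()

# instead of an O(n^2) loop of pairwise sm.fit(s1, s2) calls, group everything
# at once; _preprocess then runs only once per structure (#2490)
groups = sm.group_structures(train_structures + gen_structures)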

@mkhorton
Member

mkhorton commented Aug 8, 2022

That's fantastic @lan496! Sometimes the key to big speed improvements is lots of "small" optimizations; this is definitely appreciated :)

@kavanase
Contributor

kavanase commented Oct 9, 2024

Just to second the discussions here: any form of further optimisation for StructureMatcher (particularly the _cart_dists() function, which tends to be the bottleneck for large cells -- Cython?) would be really amazing.
