
Faster structure matching via StructureMatcher (and by extension, group_structures) #2593

Open
sgbaird opened this issue Jul 29, 2022 · 6 comments


@sgbaird

sgbaird commented Jul 29, 2022

Is your feature request related to a problem? Please describe.

Using StructureMatcher repeatedly incurs a large overhead (in my use case, ~40 hrs to check the performance of a generative benchmark via matbench-genmetrics).

Describe the solution you'd like
Explore speeding up bottlenecks in the StructureMatcher algorithm, pre-screening, leveraging GPU for distributed calculation of matches, etc.

Describe alternatives you've considered
These are under active discussion at sparks-baird/matbench-genmetrics#9, e.g. AFLOW's XtalFinder and dscribe kernels.

Additional context
As a follow-up to @mkhorton's sparks-baird/matbench-genmetrics#9 (comment), here are some flame graphs of repeated evaluations of StructureMatcher().fit(...):

%pip install matbench-genmetrics
from mp_time_split.utils.gen import DummyGenerator
from matbench_genmetrics.core import MPTSMetrics
from tqdm import tqdm

mptm = MPTSMetrics(dummy=True, verbose=False)
for fold in tqdm(mptm.folds):
    train_val_inputs = mptm.get_train_and_val_data(fold)

    # DummyGenerator stands in for a real generative model
    dg = DummyGenerator()
    dg.fit(train_val_inputs)
    gen_structures = dg.gen(n=50)

    # this is the step that calls StructureMatcher().fit(...) repeatedly
    mptm.evaluate_and_record(fold, gen_structures)

print(mptm.recorded_metrics)

[flame graph: profile of the repeated StructureMatcher().fit(...) evaluations]

@mkhorton figured we could chat about bottlenecks and the potential for precomputing things and speedups. Here are some of my initial thoughts:

  • we should be able to run _preprocess on all structures once and short-circuit it during later match matrix calculations
  • precompute several common supercells for each structure and use a lookup instead of generating them on the fly. It's not exactly clear to me how to implement that in:

        def sc_generator(s1, s2):
            s2_fc = np.array(s2.frac_coords)
            if fu == 1:
                cc = np.array(s1.cart_coords)
                for l, sc_m in self._get_lattices(s2.lattice, s1, fu):
                    fc = l.get_fractional_coords(cc)
                    fc -= np.floor(fc)
                    yield fc, s2_fc, av_lat(l, s2.lattice), sc_m
            else:
                fc_init = np.array(s1.frac_coords)
                for l, sc_m in self._get_lattices(s2.lattice, s1, fu):
                    fc = np.dot(fc_init, np.linalg.inv(sc_m))
                    lp = lattice_points_in_supercell(sc_m)
                    fc = (fc[:, None, :] + lp[None, :, :]).reshape((-1, 3))
                    fc -= np.floor(fc)
                    yield fc, s2_fc, av_lat(l, s2.lattice), sc_m

        if s1_supercell:
            for x in sc_generator(struct1, struct2):
                yield x
        else:
            for x in sc_generator(struct2, struct1):
                # reorder generator output so s1 is still first
                yield x[1], x[0], x[2], x[3]

  • _cart_dists is probably the easiest to replace or numba-fy, e.g. by implementing a jit-ed version of the Kabsch algorithm (@kjappelbaum) -- see the sketch after this list
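On that last bullet, here is a minimal sketch of what a jit-compiled Kabsch RMSD could look like (assuming numba is available; the kabsch_rmsd name is mine, and this is not a drop-in replacement for _cart_dists, which also handles site masks and matching):

import numpy as np
from numba import njit

@njit(cache=True)
def kabsch_rmsd(P, Q):
    """Minimum RMSD between two (n, 3) point sets after an optimal rotation."""
    # center both point sets on their centroids
    P = P - P.sum(axis=0) / P.shape[0]
    Q = Q - Q.sum(axis=0) / Q.shape[0]
    # covariance matrix and its SVD
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    # correct for a possible reflection so R is a proper rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.eye(3)
    D[2, 2] = d
    R = Vt.T @ D @ U.T
    # rotate P onto Q and compute the RMSD
    diff = (R @ P.T).T - Q
    return np.sqrt((diff * diff).sum() / P.shape[0])

The first call pays numba's compilation cost; subsequent calls run as compiled code.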

It would be good to know how much memory each calculation takes and whether or not these could be computed in parallel on typical consumer GPUs.

Happy to move this to discussions if that's a better place.

@shyuep
Member

shyuep commented Jul 29, 2022

Speed ups are always welcome. But I want to note a few things:

  1. Hashing is already done. E.g., if the fractional compositions are not equal, two structures immediately evaluate as not equal without any matrix comparisons.
  2. Most of the code is already vectorized in numpy. Any GPU optimizations that work with numpy would work with StructureMatcher.

If any further optimizations are implemented, they should not come at the cost of code maintainability or of support for simple single-CPU machines.

I should add that, as implemented, StructureMatcher is meant for simple one-off comparisons. When we actually run matching across large structure sets (e.g., the entire ICSD), we use various tricks to speed it up: pre-grouping by composition, reducing to the primitive cell, and so on. All of these allow us to disable some of the more costly aspects of the general structure matcher, e.g., generating supercells. Of course, the only people who really care about performance in this regard are people working with large databases of materials. That's not a huge population to begin with.
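For context, a minimal sketch of that kind of workflow, under the assumption that pre-grouping by reduced formula and reducing to primitive cells up front is acceptable for the data (the group_large_structure_set helper and the particular matcher settings are illustrative, not the exact pipeline described above):

from collections import defaultdict

from pymatgen.analysis.structure_matcher import StructureMatcher

def group_large_structure_set(structures):
    """Group pymatgen Structures into equivalence classes, with cheap pre-screening."""
    # reduce to primitive cells once, up front
    structures = [s.get_primitive_structure() for s in structures]

    # bucket by reduced formula so only plausible pairs are ever compared
    buckets = defaultdict(list)
    for s in structures:
        buckets[s.composition.reduced_formula].append(s)

    # with the reduction already done, the matcher can skip its own
    # primitive-cell search and supercell generation
    sm = StructureMatcher(primitive_cell=False, attempt_supercell=False)

    groups = []
    for bucket in buckets.values():
        groups.extend(sm.group_structures(bucket))
    return groups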

@mkhorton
Member

Thanks for creating this issue @sgbaird.

I agree with @shyuep's general points, but I don't think this is the easiest code to maintain as currently written either (e.g., we use custom Cython linear assignment code that is quite difficult to understand), so I hope there's good scope for improvements!

While large-scale usage may be comparatively rare, I certainly think the number of people needing to do large numbers of comparisons is increasing, and it's often been a bottleneck.

@mkhorton
Member

For the specific suggestions:

we should be able to run _preprocess on all structures once and short-circuit it during later match matrix calculations

Yes, this seems like an obvious optimization if this is being repeated. It's probably not the most critical piece (since it only has to be done once per structure), but perhaps get_niggli_reduced_lattice can be accelerated too.
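A minimal sketch of that idea, reducing each structure once and reusing the result across later comparisons (the ReducedStructureCache helper is hypothetical and only uses the public get_reduced_structure API rather than the private _preprocess):

from pymatgen.core import Structure

class ReducedStructureCache:
    """Compute the Niggli-reduced copy of each structure once and reuse it."""

    def __init__(self):
        self._cache = {}

    def get(self, structure: Structure) -> Structure:
        # keyed by object identity, so the original objects must be kept alive
        key = id(structure)
        if key not in self._cache:
            self._cache[key] = structure.get_reduced_structure(reduction_algo="niggli")
        return self._cache[key]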

precompute several common supercells for each structure and use a lookup instead of generating it on the fly.

I think we'd have to try this out to see how useful it'd be in practice; it's not obvious to me how many supercells would need to be generated.

_cart_dists is probably the easiest to replace or numba-fy, e.g. by implementing a jit-ed version of the Kabsch algorithm

Agreed. I do want to point to this as an example of where else optimized code in pymatgen lives (that was for neighbor finding). I'll note that numba is both excellent and a really troublesome dependency, since it typically lags behind the latest numpy and Python versions, so I might err towards a Cython solution if feasible. Perhaps a simpler numpy-vectorized solution would also be a good option -- as @shyuep notes, this is done in several places already, but perhaps we've missed an opportunity.
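On the numpy-vectorized option, one direction would be to batch the same Kabsch computation as in the earlier sketch over many point-set pairs at once, since numpy's svd, det, and matmul all broadcast over a leading batch axis (batched_kabsch_rmsd and its (m, n, 3) convention are illustrative assumptions, not existing pymatgen code):

import numpy as np

def batched_kabsch_rmsd(P, Q):
    """RMSD after optimal rotation for m point-set pairs; P and Q have shape (m, n, 3)."""
    # center every point set on its own centroid
    P = P - P.mean(axis=1, keepdims=True)
    Q = Q - Q.mean(axis=1, keepdims=True)
    # per-pair covariance matrices, shape (m, 3, 3)
    H = np.einsum("mni,mnj->mij", P, Q)
    U, _, Vt = np.linalg.svd(H)
    V = np.transpose(Vt, (0, 2, 1))
    Ut = np.transpose(U, (0, 2, 1))
    # reflection correction per pair
    d = np.sign(np.linalg.det(V @ Ut))
    D = np.zeros_like(H)
    D[:, 0, 0] = 1.0
    D[:, 1, 1] = 1.0
    D[:, 2, 2] = d
    R = V @ D @ Ut
    # rotate each P onto its Q and reduce to one RMSD per pair
    diff = P @ np.transpose(R, (0, 2, 1)) - Q
    return np.sqrt((diff**2).sum(axis=(1, 2)) / P.shape[1])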

@lan496
Contributor

lan496 commented Aug 4, 2022

Let me mention that StructureMatcher.group_structures calls _preprocess for each structure only once (#2490).
So it is a small improvement, but using StructureMatcher.group_structures reduces the _preprocess part of the flame graph from O(n^2) to O(n), and is ~150% faster as a whole.
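As a usage note, for workloads where grouping (rather than a full pairwise match matrix) is what you need, the change is roughly the following (train_structures and gen_structures are illustrative variable names):

from pymatgen.analysis.structure_matcher import StructureMatcher

sm = StructureMatcher()

# instead of an O(n^2) loop of pairwise sm.fit(s1, s2) calls, group everything
# at once; _preprocess then runs only once per structure (#2490)
groups = sm.group_structures(train_structures + gen_structures)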

@mkhorton
Member

mkhorton commented Aug 8, 2022

That's fantastic @lan496! Sometimes the key to big speed improvements is lots of "small" optimizations; this is definitely appreciated :)

@kavanase
Contributor

kavanase commented Oct 9, 2024

Just to second the discussions here: any form of further optimisation for StructureMatcher (particularly the _cart_dists() function, which tends to be the bottleneck for large cells -- Cython?) would be really amazing.
