Fix performance regression of mesh map generation #1847

klevzoff · 2022-03-25T06:45:03Z

This PR fixes performance regression of mesh map generation by:

removing dynamic allocations in cell-face loop
restoring some parallel loops that were made serial
removing huge over-allocations when constructing byLowestNode arrays (this affects memory footprint more than speed)

In addition, it restores strict total ordering for NodesAndElementOfFace entries, which is required to have uniquely defined (race-condition-free) maps in case of multi-threaded construction.

Timings (inclusive) for geosx::CellBlockManager::buildMaps, measured on an InternalMesh with 400k elements:

# threads	This branch	`develop` (`9f15819`)
1	0.539594 s	28.683256 s
2	0.328936 s	28.594731 s
4	0.213764 s	28.285016 s
8	0.171741 s	27.934983 s

Mesh with 5M elements:

# threads	This branch
1	6.532824 s
2	3.741959 s
4	2.423015 s
8	1.966910 s

(only current PR is measured, I was unable to run meshes larger than ~500k with current develop as it hangs indefinitely).

High watermark of mesh generation is about 2GB/1M cells (including intentional overallocations in the maps for fracture generation). I don't think we can get significantly lower than that without specialization for cell shapes.

LvArray PR: GEOS-DEV/LvArray#256
Rebaseline PR: https://github.com/GEOSX/integratedTests/pull/211

Rebaseline required for some of the tests (those using multiple cell blocks like "staircase" mesh, or any external meshes). No numeric diffs, but there are some changes in the order of mesh maps.

corbett5 · 2022-03-25T16:24:53Z

I haven't had a chance to look at this yet but the numbers speak for themselves. Is most of this other than the rewriting of the construction of byLowestNode just fixing regressions? And when did the regressions come in?

TotoGaz · 2022-03-25T17:28:25Z

And when did the regressions come in?

It's most likely my PR #1418 which in the very beginning was aiming at dealing with arbitrary cells
So I broke some optimization to make it more generic.
This was surely a wrong idea, my bad. 😞

klevzoff · 2022-03-26T02:14:06Z

I haven't had a chance to look at this yet but the numbers speak for themselves. Is most of this other than the rewriting of the construction of byLowestNode just fixing regressions?

95% of performance or more was lost in dynamic allocations occurring in faceByLowestNode construction - I restored the non-allocating API for CellBlock::getFaceNodes() and got rid of storing all face nodes in the struct. In a couple other places I replaced serial algorithms with parallel (although most of them weren't bottlenecks). In all cases when building ArrayOfArrays, I replaced resizeValues()+appendArray() sequence of calls with a single resizeFromCapacities(), and got rid of arbitrary overallocations by always performing exact capacity precomputation loop.

Everything else more or less is just cleanup/refactoring, extracting repeating patterns (such as forEqualRanges that comes up every time we iterate over byLowestNode data), etc. The diff is a bit noisier than it could be because of it.

klevzoff · 2022-03-26T02:22:30Z

It's most likely my PR #1418 which in the very beginning was aiming at dealing with arbitrary cells
So I broke some optimization to make it more generic.
This was surely a wrong idea, my bad.

To be fair, a few of us (myself included) have looked at that PR and didn't spot any problems. It's only due to @jeannepellerin's work that I decided to take another look at status quo. I hope that this will provide a more meaningful baseline when measuring the effects of cell shape specialization against the general case.

rrsettgast · 2022-03-25T16:50:03Z

src/coreComponents/codingUtilities/Utilities.hpp

+  {
+    func( *r_first, std::distance( r_first, r_last ) );
+  },
+                  std::forward< COMP >( comp ) );


Suggested change

std::forward< COMP >( comp ) );

std::forward< COMP >( comp ) );

I don't know what the best format for this is...but I like this way

This formatting is forced by uncrustify (both shifting lambda body to the left and function argument alignment). Not even adding /* *INDENT-OFF* */ directive seems to help. The only way to make it better would be to switch argument order and put lambda in last place, but then I won't be able to have the comp argument defaulted... which is kind of important.

On a second though, I could also just declare the lambda in a separate statement

I think we have a custom command for this specified in the uncrustify.cfg file. It is UNCRUSTIFY-OFF and UNCRUSTIFY-ON. Of course, this isn't the right way to handle this case.

rrsettgast · 2022-03-25T17:40:06Z

src/coreComponents/mesh/utilities/MeshMapUtilities.hpp

+ */
+template< typename POLICY, typename T, typename COMP = LvArray::sortedArrayManipulation::less< T > >
+ArrayOfSets< T >
+makeArrayOfSets( ArrayOfArrays< T > && input, bool valuesAreUnique, COMP comp = {} )


Should this be a function in LvArray?

The only reason this function exists is because LvArray::ArrayOfSets::assimilate() does sub-array sorting in serial, but there is a TODO comment saying it should take an execution policy argument. I was trying to avoid making an LvArray PR, but if it's the right thing to do, I will go ahead and just update assimilate().

Well...I am not sure I would say it is the "right way", but it does make assimilate() better, and removes this function...so maybe there are advantages to that.

src/coreComponents/mesh/utilities/MeshMapUtilities.hpp

rrsettgast · 2022-03-27T20:11:02Z

src/coreComponents/mesh/CellElementSubRegion.hpp

-                     array1d< localIndex > & nodeIndices ) const;
+  localIndex getFaceNodes( localIndex const elementIndex,
+                           localIndex const localFaceIndex,
+                           Span< localIndex > const nodeIndices ) const;


Wondering about the use of span vs arraySlice1d. Would it be fair to think of geosx::span as a stride 1 arraySlice1d?

Yes, they are mostly equivalent (span is a bit lighter on the stack, but it's not too important). However, span is a more general abstraction - it can be constructed from vector, arraySlice1d, stackArray1d, or a C-array (this latter case is actually how I use it here). Constructing an arraySlice1d from C-array requires too much setup (need to declare separate vars on stack to hold size and stride). Previously (before CellBlock refactoring) we used to pass a raw pointer here, span is basically just that but with miminal buffer overflow protection (at least in debug).

Did you add span in this PR?

(If so I guess I'll see it in just a sec...)

Nope, been added a while ago and I'm slowly extending its use when I see fit. I guess it adds to code/style inconsistency in the way we pass various memory views (pointer, slice, view, span, array-ref), but I can't think of a better solution in some cases.

src/coreComponents/mesh/EdgeManager.cpp

rrsettgast · 2022-03-27T20:21:31Z

src/coreComponents/mesh/generators/CellBlockManager.cpp

+    CellBlock const & cb = this->getCellBlocks().getGroup< CellBlock >( iCellBlock );
+    arrayView2d< localIndex const, cells::NODE_MAP_USD > const elemToNode = cb.getElemToNode();
+    forAll< parallelHostPolicy >( cb.numElements(), [elemsPerNode = elemsPerNode.toView(),
+                                                     elemToNode, &cb] ( localIndex const cellID )


I would think that 'cellIndex' would be better than cellID. I don't know...what is your preference in this case?

Yes, I think I agree. We had a mix and match of naming styles across this part of the code, with k, iCell, cellID, cellIndex, etc. I tried to consistify and pick what seemed most commonly used, but it wasn't my top preference either. I'll go ahead and replace across all of mesh component.

rrsettgast · 2022-03-27T20:29:59Z

src/coreComponents/mesh/generators/CellBlockManager.cpp

      for( localIndex a = 0; a < numNodesPerElement; ++a )
      {
-        localIndex const nodeIndex = elemToNode( k, a );
-        RAJA::atomicInc< parallelHostAtomic >( &elemsPerNode[ nodeIndex ] );
+        localIndex const nodeID = elemToNode( cellID, a );


same question about ID vs Index.

rrsettgast · 2022-03-27T20:31:53Z

src/coreComponents/mesh/utilities/MeshMapUtilities.hpp

 */
 template< typename T >
 GEOSX_HOST_DEVICE
-inline localIndex size0( ArrayOfArraysView< T const > const & map )
+inline localIndex size0( ArrayOfArraysView< T > const & map )


It feels like we shouldn't have these functions?

Yes, but they are currently necessary because ArrayOfArrays::size() and array2d::size() mean different things. I explained in a bit more detail and made a proposal here: GEOS-DEV/LvArray#253 but that change should be its own PR.

I had forgot about this little piece of ugliness.

TotoGaz

Question about some architecture implications.

TotoGaz · 2022-03-25T20:22:49Z

src/coreComponents/mesh/FaceManager.cpp

@@ -116,7 +116,7 @@ void populateRegions( ElementRegionManager const & elementRegionManager,
      for( localIndex iFaceLoc = 0; iFaceLoc < subRegion.numFacesPerElement(); ++iFaceLoc )
      {
        // iFaceLoc is the node index in the referential of each cell (0 to 5 for a cube, e.g.).
-        // While iFace is the global index of the node.
+        // While faceNumber is the global index of the node.


Looks like you did not rename iFace?

I think you've started the review prior to my latest commit, in which I did additional naming cleanup following @rrsettgast's suggestion. So now iFace -> faceIndex and iFaceLoc -> faceNum. Let me know if I still missed something though.

TotoGaz · 2022-03-28T23:40:35Z

src/coreComponents/codingUtilities/Utilities.hpp

+ *       but in most cases default value (std::equal_to<>) should be fine.
+ */
+template< typename ITER, typename FUNC, typename COMP = std::equal_to<> >
+void forEqualRanges( ITER first, ITER const last, FUNC && func, COMP && comp = {} )


So is this basically a zip iterator with a shift of 1 on the iterable that filters out the pairs that are made of identical first and second?

I quite do not understand the forEqualRanges though.

Not quite, the way you described it (if I read it correctly) the user function would receive pairs of not-equal elements. Whereas this function gives it sub-ranges of equal elements, so for input [1,1,2,2,2,3,3,1] the function would be called with [1,1], [2,2,2], [3,3], [1] (passed via pairs of iterators into the input). I prefer to think of it as similar to split_view, only instead of splitting on (and removing) a particular value, it splits on adjacent-pair predicate lhs != rhs and preserves all values.

But I definitely copy-pasted the documentation from forUniqueValues, which I will fix!

Maybe forUniformRanges makes more sense that forEqualRanges? Plus providing small pseudo code example like you did in the doxygen can be of great help.

TotoGaz · 2022-03-28T23:53:57Z

src/coreComponents/codingUtilities/Utilities.hpp

+ *       but in most cases default value (std::equal_to<>) should be fine.
+ */
+template< typename ITER, typename FUNC, typename COMP = std::equal_to<> >
+void forUniqueValues( ITER first, ITER const last, FUNC && func, COMP && comp = {} )


The forEqualRanges func acts on ITER while the forUniqueValues act on the "reference".
Is there a real reason for this discrepancy?

And the two descriptions are absolutely identical while the behavior a not the same.
I would update the doc and maybe write some pseudo code on whats really happening (the equivalent func calls for vector {1, 1, 3, 1, 4, 5, 5} for example).

I do not think that forUniqueValues really represent the behavior of this function. Maybe add some adjacent or consecutive in the name?

This function is a simple wrapper around forEqualRanges. It's existence is not strictly necessary, it just saves some typing. In some cases we're not interested in the repeated values inside each sub-range, just one "representative" from each group and/or the group size. In this case, a reference to the "representative" is a reasonable thing to pass. When the whole sub-range is of interest, passing iterators is the only way.

Naming is hard! I'm struggling to express succinctly the idea of passing back each "different-from-previous" value. forUniqueValues is definitely not telling the entire story, but it has certain similarity to std::unique which behaves in a comparable way when presented with unsorted input.

Great idea about adding an example in the doxygen block for both of these, will definitely do.

Based on the example you provided, I think that forUniqueValues makes sense. But you'll need example un pseudo code to make the understanding easier 😉

TotoGaz · 2022-03-29T00:26:52Z

src/coreComponents/mesh/FaceManager.hpp

   * @param[in] elemManager element manager allowing access to the cell elements
   */
-  void sortAllFaceNodes( NodeManager const & nodeManager,
+  void sortAllFaceNodes( NodeManager const & faceIndex,


Are you sure?

TotoGaz · 2022-03-29T00:36:10Z

src/coreComponents/mesh/utilities/MeshMapUtilities.hpp

-  for( localIndex i = 0; i < map.size(); ++i )
+  ArrayOfArrays< localIndex > & toElementRegionList = toElements.m_toElementRegion;
+  ArrayOfArrays< localIndex > & toElementSubRegionList = toElements.m_toElementSubRegion;
+  ArrayOfArrays< localIndex > & toElementList = toElements.m_toElementIndex;


This line brings some questions indeed.

Previously m_toElementIndex was as input. Built upon purely geometrical consideration (mesh info).
While toElementRegionList relies on simulation information (reservoir simulation, mechanics). And I'm not really sure where to put toElementSubRegionList.

Now you seem to be recalculating m_toElementIndex alongside the two others, somehow breaking the separation coming from the different responsibilities.

What was wrong with the previous separation that you now want the algorithm to be merged back?

Sure, it's up for debate, there's more than one way to approach building these maps.

My take is, m_toElementIndex was a questionable input. First of all, it's redundant w.r.t. element-to-node maps that already exist in subregions. Second, it contained some element indices which were neither global nor unique - they were element indices within some cell block, without specifying which one: for some node, you could have an element list like [5,5,5,5], which is just weird? Indeed, we later establish which cell block it was by traversing element-to-node maps, but if we're required to traverse them anyway, why not just build the whole toElement connectivity from it?

In other words, I think the previous version was doing more work than it needed to:

first, traverse element-to-node maps of each CellBlock in order to build toElementList but lose block indices

then, traverse element-to-node maps again (now of each SubRegion), but while visiting each node, also search through its toElementList to try and find current element in it and thus recover its (sub)region index. This is an extra inner loop, which doesn't change asymptotic complexity (it's bounded by the secondary size of node-to-element adjacency, and I admit constant factors in O-notation are hard to reason about without direct measurement), but still it's extra work that could be avoided.

As a second concern, I couldn't see a straightforward way to parallelize the existing algorithm. The inner loop and the tricks in it (like inserting a bunch of -1s) made it difficult to use atomics which are required since the outer parallel loop is over elements. While this function was not a bottleneck of map generation, I prefer not to leave any serial code when dealing with mesh things that may scale. The original code by @corbett5 was also not parallelized, but it contained a comment about doing it via tuples, which is what I implemented. It is now slower in serial because of the extra memory bandwidth involved (building a temporary structure with tuples), but I'm almost certain there is a (realistic) mesh size and number of threads at which it is faster - and that's when it may matter the most.

First of all, it's redundant w.r.t. element-to-node maps that already exist in subregions.

What do you mean by redundant? CellElementSubRegion provide their own index plus some element -> node mappings and this is the inverted relationship? Wouldn't this argument apply to all mappings that we invert?

What I'm OK with is to unite node to elem mapping and CellElementSubRegion indices because these can be considered as raw meshing concepts. What I'm far less comfortable with is doing the ElementRegion in the same time, because it's not geometrical.

Another issue is that your PR is changing the contract of the CellBlockManagerABC without doing it.
I'm OK to make changes but those changes need to be consistent.

Right now, we still have CellBlockManagerABC::getNodeToElements() which is used in NodeManager::setGeometricalRelations( const CellBlockManagerABC & ) to define m_toElements.m_toElementIndex = cellBlockManager.getNodeToElements().
But this action is now useless since m_toElements.m_toElementIndex is overwritten in NodeManager::buildRegionMaps( const ElementRegionManager & ).

So who's responsible for computing m_toElements.m_toElementIndex, m_toElements.m_toElementSubRegion and m_toElements.m_toElementRegion? The situation of the current PR does not settle things for sure and that makes me uncomfortable. I would advocate for

Pure geometry goes to CellBlockManangerABC side.

Simulation information is done in another place.

What do you mean by redundant? CellElementSubRegion provide their own index plus some element -> node mappings and this is the inverted relationship? Wouldn't this argument apply to all mappings that we invert?

Redundant as in, not containing any information that is not available from another data structure. In case of nodes, each CellElementSubRegions's elem-to-node map (which is built by cell block and copied over) acts as the single source of truth about geometrical relationship, and node-to-element map is built just as an alternative representation of the same data used for convenience and speed of access.

And indeed, the same applies to e.g. face-to-elem map, but because it's stored differently (array2d), I haven't yet found a satisfactory way to invert elem-to-face in parallel (perhaps building a temporary ArrayOfArrays first, using the same function buildElementMaps(), and then copying into array2d is the way to go).

Another issue is that your PR is changing the contract of the CellBlockManagerABC without doing it.
I'm OK to make changes but those changes need to be consistent.
So who's responsible for computing m_toElements.m_toElementIndex, m_toElements.m_toElementSubRegion and m_toElements.m_toElementRegion?

Ideally, I would have CellBlockManager classes build only downward-looking maps (cell-to-[face/edge/node], face-to-[edge/node], edge-to-node), and recover the up-looking maps by inverting them on the MeshLevel side of the interface, as needed. It's a pretty clean contract, in my opinion.

The downside of this approach is that we lose the ability to specialize the algorithms for the up-map construction, but if the generic algorithm is fast enough, maybe that's not too big of a deal.

Right now, we still have CellBlockManagerABC::getNodeToElements() which is used in NodeManager::setGeometricalRelations( const CellBlockManagerABC & ) to define m_toElements.m_toElementIndex = cellBlockManager.getNodeToElements().
But this action is now useless since m_toElements.m_toElementIndex is overwritten in NodeManager::buildRegionMaps( const ElementRegionManager & ).

It's an oversight for sure, I meant to delete the line

m_toElements.m_toElementIndex = cellBlockManager.getNodeToElements();

(which leaves getNodeToElements() unused except for a single unit-test assertion, so I would propose to remove it)

Ideally, I would have CellBlockManager classes build only downward-looking maps (cell-to-[face/edge/node], face-to-[edge/node], edge-to-node), and recover the up-looking maps by inverting them on the MeshLevel side of the interface, as needed. It's a pretty clean contract, in my opinion.

The downside of this approach is that we lose the ability to specialize the algorithms for the up-map construction, but if the generic algorithm is fast enough, maybe that's not too big of a deal.

Hi @jeannepellerin do you have any opinion about that?
Could this approach (in a nutshell, inverting the geometrical maps outside of the CellBlockManager zone) be compatible with your developments?
It could be fruitful to get from your expertise before we modify this architecture component.

@klevzoff Does it make sense to compute the m_toElements.m_toElementIndex, m_toElements.m_toElementSubRegion upstream while computing m_toElements.m_toElementRegion later on? I would expect this to be less a problem but I'm not sure?

Well, I don't insist we necessarily do this, and this PR does not touch the inversion any maps other than toElement ones - in fact, the way some of these upward maps are being constructed currently is more efficient than an explicit inversion. It was just my proposal for a possible contract for CellBlockManagerABC, which allows it to do most of the heavy lifting - such as identification of unique faces and edges - and return the processed connectivity information in non-redundant form.

The question is do we want the mappings to be independant from one another?

No, in my opinion. E.g. elementToNode and nodeToElement are not independent, they contain the exact same information (in different forms) and must be consistent with each other 100% of the time. Having a unique point of construction of nodeToElement (rather than delegating it to multiple CellBlockManagerABC implementations) aids in that goal. If there is an efficiency argument for the construction of nodeToElement to be handled by CellBlockManager rather than NodeManager, that's a different story. With the current develop version that is not the case, hence my changes.

@klevzoff Does it make sense to compute the m_toElements.m_toElementIndex, m_toElements.m_toElementSubRegion upstream while computing m_toElements.m_toElementRegion later on? I would expect this to be less a problem but I'm not sure?

CellBlockManager cannot construct m_toElementSubRegion, because it has no idea about subregion indices, only block indices (which are different!). It could potentially provide a

virtual std::pair< ArrayOfArrays< localIndex >, ArrayOfArrays< localIndex > > getNodeToCellAndBlock();

kind of API, but in order to efficiently turn it into the final form, we also need blockIndex -> (regionIndex, subRegionIndex) mapping that is currently missing (but could be constructed).

... elementToNode and nodeToElement are not independent, they contain the exact same information (in different forms) and must be consistent with each other 100% of the time. Having a unique point of construction of nodeToElement (rather than delegating it to multiple CellBlockManagerABC implementations) aids in that goal.

I do agree with you that consistency is required.
One way to get this is to impose a map transposition for everyone.
But

The less you see the better it is, so we could transpose all the mappings far away from the managers (which is basically what we are doing right now for most of the mappings, I presume)

It may not be the only way. For seismic propagation where there can be very large cartesian grids, a mapping and its inverse can surely be computed "analytically" (assuming that we need the two mappings, which I do not know about).

CellBlockManager cannot construct m_toElementSubRegion, because it has no idea about subregion indices, only block indices (which are different!).

When I did my #1418 refactoring (predestined id...) I think that I did at least the two following questionable choices:

The dynamic arrays in the computation to prepare Corner Point Grids. It was not the best idea and I'm glad we move backwards on this. There will be long since we support CPG so no need to burden the performances of everyone. I already told about it and it's not the core subject here.

The computation of the m_toElements triple mappings: nodesto element index, sub-region index, region index. I wanted to exclude the computation of the region index because I considered that regions are not a computational geometry concept but zones to define where to apply some physics. I decided not to compute the sub-region mappings behind the CellBlockManagerABC curtain, also because we do not really know about these sub-regions at that point. I've let this completely outside of the CellBlockManager zone and I did perform no intermediate computation regarding these mappings. But like for the first point, the current discussion makes me think that this cannot be a final answer (vide infra).

It could potentially provide a

virtual std::pair< ArrayOfArrays< localIndex >, ArrayOfArrays< localIndex > > getNodeToCellAndBlock();

kind of API

CellBlockMangerABC does (edit: not) know about SubRegions and this is fine.

It can be enriched and provide a getNodeToBlock() alongside the getNodesToElements() (probably what you mean with the std::pair).

I'd expect that those two mappings could be computed in the same efficient way as you currently do (except that we would not compute the node to region mapping).
Then it would be the duty of the ElementRegionManger which manipulates the SubRegions and the CellBlocks to transform the mappings consistently (maybe this can be done in-place?). If ElementRegionManager wants to keep the same indexing, great.
And I still don't know about the node to region mapping, I naively hope it would not be very complicated to build apart but I'n not sure.

@jeannepellerin Upon further discussion with @TotoGaz, we are leaving all maps and responsibilities where they are in develop (and your PR). The only change I will implement is this: in addition to noteToCell map, CellBlockManager will return nodeToBlock map to provide the complete information about this connectivity (same goes for faces and possibly edges). This map in each entry just contains the block index corresponding to the cell index in nodeToCell, but otherwise has the same shape and can be built simultaneously.

OK. Yes, this mapping to Blocks is needed. I do not know how the clients of the class were supposed to guess the CellBlock.
These mapping are all strongly interdependant, but we can make their computations independant from one another, not all of them for sure. And a very fast map inversion is definitely the best way to do some of them.

... elementToNode and nodeToElement are not independent, they contain the exact same information (in different forms) and must be consistent with each other 100% of the time. Having a unique point of construction of nodeToElement (rather than delegating it to multiple CellBlockManagerABC implementations) aids in that goal.

Oh, and I forgot to mention the most obvious use case 😵‍💫 (listed in #1762): "Implement a PrecomputedCellBlockManager that would read data from disk and could prevent from recomputing everything every time.". If everything is precomputed, and read from disk, no need to recompute the mappings! This could be a great feature for a lot of users or researchers: when you want to test numerical strategies, you do not want to recompute the meshing part too many times.

OK. Yes, this mapping to Blocks is needed. I do not know how the clients of the class were supposed to guess the CellBlock.

Funambulist algorithm there 🙄 🤣

TotoGaz · 2022-03-29T01:43:08Z

src/coreComponents/mesh/generators/CellBlockManager.cpp

+    n1( sortedNodes[1] ),
+    n2( sortedNodes[2] ),


Is it really a big improvement not to store n0 and retrieve all the informations for the struct?

No, it's not hugely important, I mean it saves about 15% of memory footprint by this struct which is not a big deal. But n0 would remain unused even if stored (due to how the outer ArrayOfArrays is organized), and we don't like unused variables/members?

It's just that I often prefer to regroup all the information in the same place so I'm sure it's consistent. But it's most likely irrelevant here.

TotoGaz · 2022-03-29T01:44:36Z

src/coreComponents/mesh/generators/CellBlockManager.cpp

-    result.appendArray( 0 );
-    result.setCapacityOfArray( nodeID, elemsPerNode[ nodeID ] + getElemMapOverAllocation() );
-  }
+  result.resizeFromCapacities< parallelHostPolicy >( m_numNodes, elemsPerNode.data() );


So you have removed all the over allocations here?

No that's done above in the line

elemsPerNode.setValues< serialPolicy >( elemMapExtraSpacePerNode() );

TotoGaz · 2022-03-29T01:48:20Z

src/coreComponents/mesh/generators/CellBlockManager.cpp

-    const CellBlockABC & cb = this->getCellBlock( iCellBlock );
-    array2d< localIndex, cells::NODE_MAP_PERMUTATION > const & elemToNode = cb.getElemToNodes();
-    forAll< parallelHostPolicy >( cb.numElements(), [&elemsPerNode, totalNodeElems, &elemToNode, &cb] ( localIndex const k )
+    CellBlock const & cb = this->getCellBlocks().getGroup< CellBlock >( blockIndex );


Maybe change const CellBlockABC & CellBlockManager::getCellBlock( localIndex const blockIndex ) const; such that it returns CellBlock?

Yes, good idea

TotoGaz · 2022-03-29T01:49:03Z

src/coreComponents/mesh/generators/CellBlockManager.cpp

-    const CellBlockABC & cb = this->getCellBlock( iCellBlock );
-    array2d< localIndex, cells::NODE_MAP_PERMUTATION > const elemToNode = cb.getElemToNodes();
-    for( localIndex iElem = 0; iElem < cb.numElements(); ++iElem )
+    const CellBlock & cb = this->getCellBlocks().getGroup< CellBlock >( blockIndex );


TotoGaz · 2022-03-29T02:04:23Z

src/coreComponents/mainInterface/ProblemManager.cpp

+      nodeManager.buildRegionMaps( elemManager );
+      faceManager.buildRegionMaps( elemManager );


Why do you have to move those 2 lines up?

For the nodeManager (and probably for the other managers in a similar way), I assume that the data (more precisely NodeManager::m_toElements.m_toElementIndex) that is defined in NodeManager::setGeometricalRelations are not valid anymore because they get overwritten in NodeManager::buildRegionMaps.

The problem with this PR is that it blurs the clear computational boundaries and responsibilities.

NodeManager::setGeometricalRelations is not alone responsible for setting up the geometrical relations anymore.

NodeManager::buildRegionMaps is not only building the region maps anymore since it also rewrites geometrical relations. This is why we end up inserting business (reservoir, mechanics) info in the middle of the geometrical process.

I think this is what concerns me the most about the PR.

I didn't move them above setGeometricalRelations() calls though, just above some other unrelated stuff like buildSets(), on which we have no dependency.

I do see your point, but I disagree that buildRegionMaps is rewriting geometrical relations, because it's taking its source data purely from element-to-(node/face) maps that were built from CellBlockManager, just not reusing the redundant (node/face)-to-element maps. In a way, this is better for ensuring consistency between direct/reverse mappings, rather than relying on all CellBlockManagerABC implementations to do that.

The only question I have is whether or not some CellBlockManagerX could build these mappings way faster than the general algorithm. I'm sure that's the case, but as I mentioned in #1847 (comment), we still have to do a variant of the general element-(node/face) loop in order to recover (sub)region indices, so even the speed advantage will be lost.

corbett5

Nice work!

corbett5 · 2022-03-29T03:53:54Z

src/coreComponents/mesh/CellElementSubRegion.hpp

-                     array1d< localIndex > & nodeIndices ) const;
+  localIndex getFaceNodes( localIndex const elementIndex,
+                           localIndex const localFaceIndex,
+                           Span< localIndex > const nodeIndices ) const;


Did you add span in this PR?

(If so I guess I'll see it in just a sec...)

corbett5 · 2022-03-29T03:56:27Z

src/coreComponents/mesh/EmbeddedSurfaceNodeManager.cpp

+  ArrayOfArrays< localIndex > nodeToEdges =
+    meshMapUtilities::transposeIndexMap< parallelHostPolicy >( edgeToNodeMap,
+                                                               size(),
+                                                               edgeMapOverallocation() );


I really like these two functions. Much cleaner.

corbett5 · 2022-03-29T04:04:46Z

src/coreComponents/mesh/generators/CellBlockUtilities.hpp

+ *         unique values from sub-array i in a global list of unique values, while the last value
+ *         contains the total number of unique edges (i.e. an exclusive scan + total reduction).
+ */
+template< typename T >


Do you think it would make sense to put add the execution policy as a template parameter. I imagine at some point down the line we may want to run as much of this initialization on the device as possible.

Sounds good, just added it (some changes will be needed to make forEqualRanges host-device, as it's currently calling std::find_if_not, but it's easy).

corbett5 · 2022-03-29T04:09:47Z

src/coreComponents/mesh/utilities/MeshMapUtilities.hpp

-inline T const & value( arrayView1d< arrayView1d< T const > const > const & map, localIndex const i0, localIndex const i1 )
+template< typename POLICY, typename VIEW_TYPE >
+ArrayOfArrays< std::remove_const_t< typename VIEW_TYPE::ValueType > >
+transposeIndexMap( VIEW_TYPE const & srcMap,


Nice work to put al these pieces of code in one place. Plus now it's way better documented too.

corbett5 · 2022-03-29T04:13:52Z

src/coreComponents/mesh/utilities/MeshMapUtilities.hpp

+
+  // Finally, split arrays-of-tuples into separate arrays
+  // TODO: this resizing ends up scanning elemCounts 3 times, need new ArrayOfArrays API?
+  toElementRegionList.resizeFromCapacities< parallelHostPolicy >( numObjects, elemCounts.data() );


Couldn't you do

toElementRegionList.resizeFromCapacities< parallelHostPolicy >( numObjects, elemCounts.data() ); toElementSubRegionList = toElementRegionList; toElementList = toElementRegionList;

I'm pretty sure that would be more efficient, but I doubt it really matters.

Edit:
Actually it will most likely be slower because operator= basically involves a memcpy for each of in inner arrays. If this pattern becomes prevalent we could easily add a function duplicateSparsity that would do what we want. As my naming suggests I think this would be particularly useful for the matrix classes.

corbett5 · 2022-03-29T04:23:36Z

src/coreComponents/mesh/generators/CellBlockManager.cpp

-    result.appendArray( 0 );
-    result.setCapacityOfArray( nodeID, elemsPerNode[ nodeID ] + getElemMapOverAllocation() );
-  }
+  result.resizeFromCapacities< parallelHostPolicy >( m_numNodes, elemsPerNode.data() );


No that's done above in the line

elemsPerNode.setValues< serialPolicy >( elemMapExtraSpacePerNode() );

corbett5 · 2022-03-29T04:25:04Z

src/coreComponents/mesh/generators/CellBlockManager.cpp

@@ -53,196 +54,111 @@ ArrayOfArrays< localIndex > CellBlockManager::getNodeToElements() const

  // First step: how many elements for each node, stored in the elemsPerNode array.
  array1d< localIndex > elemsPerNode( m_numNodes );
-  RAJA::ReduceSum< parallelHostReduce, localIndex > totalNodeElems = 0;
+  elemsPerNode.setValues< serialPolicy >( elemMapExtraSpacePerNode() );


This function getNodeToElements() seems pretty similar to the buildToElementMaps. Any way to combine the two?

I would like to, but they produce different types of values (one ArrayOfArrays vs a struct of three of them). With enough effort, the differences could be abstracted through user-provided functions, but I'm not sure the result would be pretty. The more important question though is whether we want to keep getNodeToElements() at all as in my opinion it produces some non-intuitive results (see #1847 (comment))

klevzoff added type: optimization flag: ready for review flag: requires rebaseline Requires rebaseline branch in integratedTests labels Mar 25, 2022

klevzoff requested review from rrsettgast and TotoGaz March 25, 2022 06:45

klevzoff self-assigned this Mar 25, 2022

rrsettgast requested a review from corbett5 March 25, 2022 16:07

klevzoff mentioned this pull request Mar 26, 2022

[Feature] Add ReadTheDocs to CI #1845

Closed

rrsettgast approved these changes Mar 27, 2022

View reviewed changes

TotoGaz reviewed Mar 29, 2022

View reviewed changes

corbett5 approved these changes Mar 29, 2022

View reviewed changes

klevzoff mentioned this pull request Mar 30, 2022

Enhancements for ArrayOf[Arrays|Sets] construction/assimilation GEOS-DEV/LvArray#256

Merged

klevzoff added the flag: requires updated submodule(s) label Mar 30, 2022

klevzoff force-pushed the bugfix/klevzoff/optimize-face-gen branch 2 times, most recently from bc0ba00 to f7d35c4 Compare March 30, 2022 08:08

klevzoff added 2 commits April 2, 2022 20:48

Fix performance regression of mesh map generation

537ca64

Rebaseline some integrated tests

85df35d

klevzoff force-pushed the bugfix/klevzoff/optimize-face-gen branch from ac9ae59 to 85df35d Compare April 3, 2022 05:53

klevzoff added 2 commits April 2, 2022 23:22

Update LvArray hash post-merge

cbb0e27

Bugfix

5504add

klevzoff merged commit fa2011a into develop Apr 3, 2022

klevzoff deleted the bugfix/klevzoff/optimize-face-gen branch April 3, 2022 21:43

klevzoff mentioned this pull request Apr 26, 2022

[Bug] triaxial Driver cannot be running with develop repo #1891

Closed

TotoGaz mentioned this pull request Jan 9, 2023

incorrect assumption (3-shared nodes) about element-face connectivity for prisms #2222

Closed

	std::forward< COMP >( comp ) );
	std::forward< COMP >( comp ) );

		nodeManager.buildRegionMaps( elemManager );
		faceManager.buildRegionMaps( elemManager );

Fix performance regression of mesh map generation #1847

Fix performance regression of mesh map generation #1847

Conversation

klevzoff commented Mar 25, 2022 • edited Loading

corbett5 commented Mar 25, 2022

TotoGaz commented Mar 25, 2022

klevzoff commented Mar 26, 2022

klevzoff commented Mar 26, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

klevzoff Mar 28, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TotoGaz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TotoGaz Mar 29, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

klevzoff Mar 29, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

klevzoff Mar 29, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TotoGaz Mar 31, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

klevzoff Mar 29, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

klevzoff Mar 29, 2022 • edited Loading

Choose a reason for hiding this comment

corbett5 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

klevzoff commented Mar 25, 2022 •

edited

Loading

klevzoff Mar 28, 2022 •

edited

Loading

TotoGaz Mar 29, 2022 •

edited

Loading

klevzoff Mar 29, 2022 •

edited

Loading

klevzoff Mar 29, 2022 •

edited

Loading

TotoGaz Mar 31, 2022 •

edited

Loading

klevzoff Mar 29, 2022 •

edited

Loading

klevzoff Mar 29, 2022 •

edited

Loading