-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix performance regression of mesh map generation #1847
Conversation
I haven't had a chance to look at this yet but the numbers speak for themselves. Is most of this other than the rewriting of the construction of |
It's most likely my PR #1418 which in the very beginning was aiming at dealing with arbitrary cells |
95% of performance or more was lost in dynamic allocations occurring in Everything else more or less is just cleanup/refactoring, extracting repeating patterns (such as |
To be fair, a few of us (myself included) have looked at that PR and didn't spot any problems. It's only due to @jeannepellerin's work that I decided to take another look at status quo. I hope that this will provide a more meaningful baseline when measuring the effects of cell shape specialization against the general case. |
{ | ||
func( *r_first, std::distance( r_first, r_last ) ); | ||
}, | ||
std::forward< COMP >( comp ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::forward< COMP >( comp ) ); | |
std::forward< COMP >( comp ) ); |
I don't know what the best format for this is...but I like this way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formatting is forced by uncrustify (both shifting lambda body to the left and function argument alignment). Not even adding /* *INDENT-OFF* */
directive seems to help. The only way to make it better would be to switch argument order and put lambda in last place, but then I won't be able to have the comp
argument defaulted... which is kind of important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a second though, I could also just declare the lambda in a separate statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have a custom command for this specified in the uncrustify.cfg
file. It is UNCRUSTIFY-OFF
and UNCRUSTIFY-ON
. Of course, this isn't the right way to handle this case.
*/ | ||
template< typename POLICY, typename T, typename COMP = LvArray::sortedArrayManipulation::less< T > > | ||
ArrayOfSets< T > | ||
makeArrayOfSets( ArrayOfArrays< T > && input, bool valuesAreUnique, COMP comp = {} ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a function in LvArray
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason this function exists is because LvArray::ArrayOfSets::assimilate()
does sub-array sorting in serial, but there is a TODO
comment saying it should take an execution policy argument. I was trying to avoid making an LvArray
PR, but if it's the right thing to do, I will go ahead and just update assimilate()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well...I am not sure I would say it is the "right way", but it does make assimilate()
better, and removes this function...so maybe there are advantages to that.
array1d< localIndex > & nodeIndices ) const; | ||
localIndex getFaceNodes( localIndex const elementIndex, | ||
localIndex const localFaceIndex, | ||
Span< localIndex > const nodeIndices ) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering about the use of span
vs arraySlice1d
. Would it be fair to think of geosx::span
as a stride 1 arraySlice1d
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are mostly equivalent (span
is a bit lighter on the stack, but it's not too important). However, span
is a more general abstraction - it can be constructed from vector
, arraySlice1d
, stackArray1d
, or a C-array (this latter case is actually how I use it here). Constructing an arraySlice1d
from C-array requires too much setup (need to declare separate vars on stack to hold size and stride). Previously (before CellBlock
refactoring) we used to pass a raw pointer here, span
is basically just that but with miminal buffer overflow protection (at least in debug).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you add span
in this PR?
(If so I guess I'll see it in just a sec...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, been added a while ago and I'm slowly extending its use when I see fit. I guess it adds to code/style inconsistency in the way we pass various memory views (pointer, slice, view, span, array-ref), but I can't think of a better solution in some cases.
CellBlock const & cb = this->getCellBlocks().getGroup< CellBlock >( iCellBlock ); | ||
arrayView2d< localIndex const, cells::NODE_MAP_USD > const elemToNode = cb.getElemToNode(); | ||
forAll< parallelHostPolicy >( cb.numElements(), [elemsPerNode = elemsPerNode.toView(), | ||
elemToNode, &cb] ( localIndex const cellID ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that 'cellIndex' would be better than cellID
. I don't know...what is your preference in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I agree. We had a mix and match of naming styles across this part of the code, with k
, iCell
, cellID
, cellIndex
, etc. I tried to consistify and pick what seemed most commonly used, but it wasn't my top preference either. I'll go ahead and replace across all of mesh
component.
for( localIndex a = 0; a < numNodesPerElement; ++a ) | ||
{ | ||
localIndex const nodeIndex = elemToNode( k, a ); | ||
RAJA::atomicInc< parallelHostAtomic >( &elemsPerNode[ nodeIndex ] ); | ||
localIndex const nodeID = elemToNode( cellID, a ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question about ID
vs Index
.
*/ | ||
template< typename T > | ||
GEOSX_HOST_DEVICE | ||
inline localIndex size0( ArrayOfArraysView< T const > const & map ) | ||
inline localIndex size0( ArrayOfArraysView< T > const & map ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we shouldn't have these functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but they are currently necessary because ArrayOfArrays::size()
and array2d::size()
mean different things. I explained in a bit more detail and made a proposal here: GEOS-DEV/LvArray#253 but that change should be its own PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had forgot about this little piece of ugliness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about some architecture implications.
@@ -116,7 +116,7 @@ void populateRegions( ElementRegionManager const & elementRegionManager, | |||
for( localIndex iFaceLoc = 0; iFaceLoc < subRegion.numFacesPerElement(); ++iFaceLoc ) | |||
{ | |||
// iFaceLoc is the node index in the referential of each cell (0 to 5 for a cube, e.g.). | |||
// While iFace is the global index of the node. | |||
// While faceNumber is the global index of the node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you did not rename iFace
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you've started the review prior to my latest commit, in which I did additional naming cleanup following @rrsettgast's suggestion. So now iFace
-> faceIndex
and iFaceLoc
-> faceNum
. Let me know if I still missed something though.
* but in most cases default value (std::equal_to<>) should be fine. | ||
*/ | ||
template< typename ITER, typename FUNC, typename COMP = std::equal_to<> > | ||
void forEqualRanges( ITER first, ITER const last, FUNC && func, COMP && comp = {} ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is this basically a zip
iterator with a shift of 1
on the iterable that filters out the pairs that are made of identical first
and second
?
I quite do not understand the forEqualRanges
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite, the way you described it (if I read it correctly) the user function would receive pairs of not-equal elements. Whereas this function gives it sub-ranges of equal elements, so for input [1,1,2,2,2,3,3,1]
the function would be called with [1,1]
, [2,2,2]
, [3,3]
, [1]
(passed via pairs of iterators into the input). I prefer to think of it as similar to split_view, only instead of splitting on (and removing) a particular value, it splits on adjacent-pair predicate lhs != rhs
and preserves all values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I definitely copy-pasted the documentation from forUniqueValues
, which I will fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe forUniformRanges
makes more sense that forEqualRanges
? Plus providing small pseudo code example like you did in the doxygen can be of great help.
* but in most cases default value (std::equal_to<>) should be fine. | ||
*/ | ||
template< typename ITER, typename FUNC, typename COMP = std::equal_to<> > | ||
void forUniqueValues( ITER first, ITER const last, FUNC && func, COMP && comp = {} ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The forEqualRanges
func
acts on ITER
while the forUniqueValues
act on the "reference
".
Is there a real reason for this discrepancy?
And the two descriptions are absolutely identical while the behavior a not the same.
I would update the doc and maybe write some pseudo code on whats really happening (the equivalent func
calls for vector {1, 1, 3, 1, 4, 5, 5}
for example).
I do not think that forUniqueValues
really represent the behavior of this function. Maybe add some adjacent
or consecutive
in the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is a simple wrapper around forEqualRanges
. It's existence is not strictly necessary, it just saves some typing. In some cases we're not interested in the repeated values inside each sub-range, just one "representative" from each group and/or the group size. In this case, a reference to the "representative" is a reasonable thing to pass. When the whole sub-range is of interest, passing iterators is the only way.
Naming is hard! I'm struggling to express succinctly the idea of passing back each "different-from-previous" value. forUniqueValues
is definitely not telling the entire story, but it has certain similarity to std::unique which behaves in a comparable way when presented with unsorted input.
Great idea about adding an example in the doxygen block for both of these, will definitely do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the example you provided, I think that forUniqueValues
makes sense. But you'll need example un pseudo code to make the understanding easier 😉
* @param[in] elemManager element manager allowing access to the cell elements | ||
*/ | ||
void sortAllFaceNodes( NodeManager const & nodeManager, | ||
void sortAllFaceNodes( NodeManager const & faceIndex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😮
for( localIndex i = 0; i < map.size(); ++i ) | ||
ArrayOfArrays< localIndex > & toElementRegionList = toElements.m_toElementRegion; | ||
ArrayOfArrays< localIndex > & toElementSubRegionList = toElements.m_toElementSubRegion; | ||
ArrayOfArrays< localIndex > & toElementList = toElements.m_toElementIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line brings some questions indeed.
Previously m_toElementIndex
was as input. Built upon purely geometrical consideration (mesh info).
While toElementRegionList
relies on simulation information (reservoir simulation, mechanics). And I'm not really sure where to put toElementSubRegionList
.
Now you seem to be recalculating m_toElementIndex
alongside the two others, somehow breaking the separation coming from the different responsibilities.
What was wrong with the previous separation that you now want the algorithm to be merged back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it's up for debate, there's more than one way to approach building these maps.
My take is, m_toElementIndex
was a questionable input. First of all, it's redundant w.r.t. element-to-node maps that already exist in subregions. Second, it contained some element indices which were neither global nor unique - they were element indices within some cell block, without specifying which one: for some node, you could have an element list like [5,5,5,5]
, which is just weird? Indeed, we later establish which cell block it was by traversing element-to-node maps, but if we're required to traverse them anyway, why not just build the whole toElement
connectivity from it?
In other words, I think the previous version was doing more work than it needed to:
- first, traverse element-to-node maps of each
CellBlock
in order to buildtoElementList
but lose block indices - then, traverse element-to-node maps again (now of each
SubRegion
), but while visiting each node, also search through itstoElementList
to try and find current element in it and thus recover its (sub)region index. This is an extra inner loop, which doesn't change asymptotic complexity (it's bounded by the secondary size of node-to-element adjacency, and I admit constant factors in O-notation are hard to reason about without direct measurement), but still it's extra work that could be avoided.
As a second concern, I couldn't see a straightforward way to parallelize the existing algorithm. The inner loop and the tricks in it (like inserting a bunch of -1
s) made it difficult to use atomics which are required since the outer parallel loop is over elements. While this function was not a bottleneck of map generation, I prefer not to leave any serial code when dealing with mesh things that may scale. The original code by @corbett5 was also not parallelized, but it contained a comment about doing it via tuples, which is what I implemented. It is now slower in serial because of the extra memory bandwidth involved (building a temporary structure with tuples), but I'm almost certain there is a (realistic) mesh size and number of threads at which it is faster - and that's when it may matter the most.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, it's redundant w.r.t. element-to-node maps that already exist in subregions.
What do you mean by redundant? CellElementSubRegion
provide their own index
plus some element -> node
mappings and this is the inverted relationship? Wouldn't this argument apply to all mappings that we invert?
What I'm OK with is to unite node
to elem
mapping and CellElementSubRegion
indices because these can be considered as raw meshing concepts. What I'm far less comfortable with is doing the ElementRegion
in the same time, because it's not geometrical.
Another issue is that your PR is changing the contract of the CellBlockManagerABC
without doing it.
I'm OK to make changes but those changes need to be consistent.
Right now, we still have CellBlockManagerABC::getNodeToElements()
which is used in NodeManager::setGeometricalRelations( const CellBlockManagerABC & )
to define m_toElements.m_toElementIndex = cellBlockManager.getNodeToElements()
.
But this action is now useless since m_toElements.m_toElementIndex
is overwritten in NodeManager::buildRegionMaps( const ElementRegionManager & )
.
So who's responsible for computing m_toElements.m_toElementIndex
, m_toElements.m_toElementSubRegion
and m_toElements.m_toElementRegion
? The situation of the current PR does not settle things for sure and that makes me uncomfortable. I would advocate for
- Pure geometry goes to
CellBlockManangerABC
side. - Simulation information is done in another place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by redundant?
CellElementSubRegion
provide their ownindex
plus someelement -> node
mappings and this is the inverted relationship? Wouldn't this argument apply to all mappings that we invert?
Redundant as in, not containing any information that is not available from another data structure. In case of nodes, each CellElementSubRegions
's elem-to-node map (which is built by cell block and copied over) acts as the single source of truth about geometrical relationship, and node-to-element map is built just as an alternative representation of the same data used for convenience and speed of access.
And indeed, the same applies to e.g. face-to-elem map, but because it's stored differently (array2d), I haven't yet found a satisfactory way to invert elem-to-face in parallel (perhaps building a temporary ArrayOfArrays
first, using the same function buildElementMaps()
, and then copying into array2d
is the way to go).
Another issue is that your PR is changing the contract of the CellBlockManagerABC without doing it.
I'm OK to make changes but those changes need to be consistent.
So who's responsible for computing m_toElements.m_toElementIndex, m_toElements.m_toElementSubRegion and m_toElements.m_toElementRegion?
Ideally, I would have CellBlockManager
classes build only downward-looking maps (cell-to-[face/edge/node]
, face-to-[edge/node]
, edge-to-node
), and recover the up-looking maps by inverting them on the MeshLevel
side of the interface, as needed. It's a pretty clean contract, in my opinion.
The downside of this approach is that we lose the ability to specialize the algorithms for the up-map construction, but if the generic algorithm is fast enough, maybe that's not too big of a deal.
Right now, we still have CellBlockManagerABC::getNodeToElements() which is used in NodeManager::setGeometricalRelations( const CellBlockManagerABC & ) to define m_toElements.m_toElementIndex = cellBlockManager.getNodeToElements().
But this action is now useless since m_toElements.m_toElementIndex is overwritten in NodeManager::buildRegionMaps( const ElementRegionManager & ).
It's an oversight for sure, I meant to delete the line
m_toElements.m_toElementIndex = cellBlockManager.getNodeToElements();
(which leaves getNodeToElements()
unused except for a single unit-test assertion, so I would propose to remove it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I would have
CellBlockManager
classes build only downward-looking maps (cell-to-[face/edge/node]
,face-to-[edge/node]
,edge-to-node
), and recover the up-looking maps by inverting them on theMeshLevel
side of the interface, as needed. It's a pretty clean contract, in my opinion.The downside of this approach is that we lose the ability to specialize the algorithms for the up-map construction, but if the generic algorithm is fast enough, maybe that's not too big of a deal.
Hi @jeannepellerin do you have any opinion about that?
Could this approach (in a nutshell, inverting the geometrical maps outside of the CellBlockManager
zone) be compatible with your developments?
It could be fruitful to get from your expertise before we modify this architecture component.
@klevzoff Does it make sense to compute the m_toElements.m_toElementIndex
, m_toElements.m_toElementSubRegion
upstream while computing m_toElements.m_toElementRegion
later on? I would expect this to be less a problem but I'm not sure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I don't insist we necessarily do this, and this PR does not touch the inversion any maps other than toElement
ones - in fact, the way some of these upward maps are being constructed currently is more efficient than an explicit inversion. It was just my proposal for a possible contract for CellBlockManagerABC
, which allows it to do most of the heavy lifting - such as identification of unique faces and edges - and return the processed connectivity information in non-redundant form.
The question is do we want the mappings to be independant from one another?
No, in my opinion. E.g. elementToNode
and nodeToElement
are not independent, they contain the exact same information (in different forms) and must be consistent with each other 100% of the time. Having a unique point of construction of nodeToElement
(rather than delegating it to multiple CellBlockManagerABC
implementations) aids in that goal. If there is an efficiency argument for the construction of nodeToElement
to be handled by CellBlockManager
rather than NodeManager
, that's a different story. With the current develop
version that is not the case, hence my changes.
@klevzoff Does it make sense to compute the
m_toElements.m_toElementIndex
,m_toElements.m_toElementSubRegion
upstream while computingm_toElements.m_toElementRegion
later on? I would expect this to be less a problem but I'm not sure?
CellBlockManager
cannot construct m_toElementSubRegion
, because it has no idea about subregion indices, only block indices (which are different!). It could potentially provide a
virtual std::pair< ArrayOfArrays< localIndex >, ArrayOfArrays< localIndex > >
getNodeToCellAndBlock();
kind of API, but in order to efficiently turn it into the final form, we also need blockIndex -> (regionIndex, subRegionIndex)
mapping that is currently missing (but could be constructed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... elementToNode and nodeToElement are not independent, they contain the exact same information (in different forms) and must be consistent with each other 100% of the time. Having a unique point of construction of nodeToElement (rather than delegating it to multiple CellBlockManagerABC implementations) aids in that goal.
I do agree with you that consistency is required.
One way to get this is to impose a map transposition for everyone.
But
- The less you see the better it is, so we could transpose all the mappings far away from the managers (which is basically what we are doing right now for most of the mappings, I presume)
- It may not be the only way. For seismic propagation where there can be very large cartesian grids, a mapping and its inverse can surely be computed "analytically" (assuming that we need the two mappings, which I do not know about).
CellBlockManager cannot construct m_toElementSubRegion, because it has no idea about subregion indices, only block indices (which are different!).
When I did my #1418 refactoring (predestined id...) I think that I did at least the two following questionable choices:
- The dynamic arrays in the computation to prepare Corner Point Grids. It was not the best idea and I'm glad we move backwards on this. There will be long since we support CPG so no need to burden the performances of everyone. I already told about it and it's not the core subject here.
- The computation of the
m_toElements
triple mappings:nodes
toelement index
,sub-region index
,region index
. I wanted to exclude the computation of theregion index
because I considered that regions are not a computational geometry concept but zones to define where to apply some physics. I decided not to compute thesub-region
mappings behind theCellBlockManagerABC
curtain, also because we do not really know about thesesub-regions
at that point. I've let this completely outside of theCellBlockManager
zone and I did perform no intermediate computation regarding these mappings. But like for the first point, the current discussion makes me think that this cannot be a final answer (vide infra).
It could potentially provide a
virtual std::pair< ArrayOfArrays< localIndex >, ArrayOfArrays< localIndex > > getNodeToCellAndBlock();kind of API
CellBlockMangerABC
does (edit: not) know about SubRegions
and this is fine.
It can be enriched and provide a getNodeToBlock()
alongside the getNodesToElements()
(probably what you mean with the std::pair
).
I'd expect that those two mappings could be computed in the same efficient way as you currently do (except that we would not compute the node to region
mapping).
Then it would be the duty of the ElementRegionManger
which manipulates the SubRegions
and the CellBlocks
to transform the mappings consistently (maybe this can be done in-place?). If ElementRegionManager
wants to keep the same indexing, great.
And I still don't know about the node to region
mapping, I naively hope it would not be very complicated to build apart but I'n not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeannepellerin Upon further discussion with @TotoGaz, we are leaving all maps and responsibilities where they are in develop
(and your PR). The only change I will implement is this: in addition to noteToCell
map, CellBlockManager
will return nodeToBlock
map to provide the complete information about this connectivity (same goes for faces and possibly edges). This map in each entry just contains the block index corresponding to the cell index in nodeToCell
, but otherwise has the same shape and can be built simultaneously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Yes, this mapping to Blocks is needed. I do not know how the clients of the class were supposed to guess the CellBlock.
These mapping are all strongly interdependant, but we can make their computations independant from one another, not all of them for sure. And a very fast map inversion is definitely the best way to do some of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... elementToNode and nodeToElement are not independent, they contain the exact same information (in different forms) and must be consistent with each other 100% of the time. Having a unique point of construction of nodeToElement (rather than delegating it to multiple CellBlockManagerABC implementations) aids in that goal.
Oh, and I forgot to mention the most obvious use case 😵💫 (listed in #1762): "Implement a PrecomputedCellBlockManager that would read data from disk and could prevent from recomputing everything every time.". If everything is precomputed, and read from disk, no need to recompute the mappings! This could be a great feature for a lot of users or researchers: when you want to test numerical strategies, you do not want to recompute the meshing part too many times.
OK. Yes, this mapping to Blocks is needed. I do not know how the clients of the class were supposed to guess the CellBlock.
Funambulist algorithm there 🙄 🤣
n1( sortedNodes[1] ), | ||
n2( sortedNodes[2] ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really a big improvement not to store n0
and retrieve all the informations for the struct
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not hugely important, I mean it saves about 15% of memory footprint by this struct which is not a big deal. But n0
would remain unused even if stored (due to how the outer ArrayOfArrays
is organized), and we don't like unused variables/members?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just that I often prefer to regroup all the information in the same place so I'm sure it's consistent. But it's most likely irrelevant here.
result.appendArray( 0 ); | ||
result.setCapacityOfArray( nodeID, elemsPerNode[ nodeID ] + getElemMapOverAllocation() ); | ||
} | ||
result.resizeFromCapacities< parallelHostPolicy >( m_numNodes, elemsPerNode.data() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you have removed all the over allocations here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No that's done above in the line
elemsPerNode.setValues< serialPolicy >( elemMapExtraSpacePerNode() );
const CellBlockABC & cb = this->getCellBlock( iCellBlock ); | ||
array2d< localIndex, cells::NODE_MAP_PERMUTATION > const & elemToNode = cb.getElemToNodes(); | ||
forAll< parallelHostPolicy >( cb.numElements(), [&elemsPerNode, totalNodeElems, &elemToNode, &cb] ( localIndex const k ) | ||
CellBlock const & cb = this->getCellBlocks().getGroup< CellBlock >( blockIndex ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe change const CellBlockABC & CellBlockManager::getCellBlock( localIndex const blockIndex ) const;
such that it returns CellBlock
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea
const CellBlockABC & cb = this->getCellBlock( iCellBlock ); | ||
array2d< localIndex, cells::NODE_MAP_PERMUTATION > const elemToNode = cb.getElemToNodes(); | ||
for( localIndex iElem = 0; iElem < cb.numElements(); ++iElem ) | ||
const CellBlock & cb = this->getCellBlocks().getGroup< CellBlock >( blockIndex ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here?
nodeManager.buildRegionMaps( elemManager ); | ||
faceManager.buildRegionMaps( elemManager ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you have to move those 2 lines up?
For the nodeManager
(and probably for the other managers in a similar way), I assume that the data (more precisely NodeManager::m_toElements.m_toElementIndex
) that is defined in NodeManager::setGeometricalRelations
are not valid anymore because they get overwritten in NodeManager::buildRegionMaps
.
The problem with this PR is that it blurs the clear computational boundaries and responsibilities.
NodeManager::setGeometricalRelations
is not alone responsible for setting up the geometrical relations anymore.NodeManager::buildRegionMaps
is not only building the region maps anymore since it also rewrites geometrical relations. This is why we end up inserting business (reservoir, mechanics) info in the middle of the geometrical process.
I think this is what concerns me the most about the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't move them above setGeometricalRelations()
calls though, just above some other unrelated stuff like buildSets()
, on which we have no dependency.
I do see your point, but I disagree that buildRegionMaps
is rewriting geometrical relations, because it's taking its source data purely from element-to-(node/face) maps that were built from CellBlockManager
, just not reusing the redundant (node/face)-to-element maps. In a way, this is better for ensuring consistency between direct/reverse mappings, rather than relying on all CellBlockManagerABC
implementations to do that.
The only question I have is whether or not some CellBlockManagerX
could build these mappings way faster than the general algorithm. I'm sure that's the case, but as I mentioned in #1847 (comment), we still have to do a variant of the general element-(node/face) loop in order to recover (sub)region indices, so even the speed advantage will be lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
array1d< localIndex > & nodeIndices ) const; | ||
localIndex getFaceNodes( localIndex const elementIndex, | ||
localIndex const localFaceIndex, | ||
Span< localIndex > const nodeIndices ) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you add span
in this PR?
(If so I guess I'll see it in just a sec...)
ArrayOfArrays< localIndex > nodeToEdges = | ||
meshMapUtilities::transposeIndexMap< parallelHostPolicy >( edgeToNodeMap, | ||
size(), | ||
edgeMapOverallocation() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like these two functions. Much cleaner.
* unique values from sub-array i in a global list of unique values, while the last value | ||
* contains the total number of unique edges (i.e. an exclusive scan + total reduction). | ||
*/ | ||
template< typename T > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would make sense to put add the execution policy as a template parameter. I imagine at some point down the line we may want to run as much of this initialization on the device as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, just added it (some changes will be needed to make forEqualRanges
host-device, as it's currently calling std::find_if_not
, but it's easy).
inline T const & value( arrayView1d< arrayView1d< T const > const > const & map, localIndex const i0, localIndex const i1 ) | ||
template< typename POLICY, typename VIEW_TYPE > | ||
ArrayOfArrays< std::remove_const_t< typename VIEW_TYPE::ValueType > > | ||
transposeIndexMap( VIEW_TYPE const & srcMap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work to put al these pieces of code in one place. Plus now it's way better documented too.
|
||
// Finally, split arrays-of-tuples into separate arrays | ||
// TODO: this resizing ends up scanning elemCounts 3 times, need new ArrayOfArrays API? | ||
toElementRegionList.resizeFromCapacities< parallelHostPolicy >( numObjects, elemCounts.data() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you do
toElementRegionList.resizeFromCapacities< parallelHostPolicy >( numObjects, elemCounts.data() );
toElementSubRegionList = toElementRegionList;
toElementList = toElementRegionList;
I'm pretty sure that would be more efficient, but I doubt it really matters.
Edit:
Actually it will most likely be slower because operator=
basically involves a memcpy
for each of in inner arrays. If this pattern becomes prevalent we could easily add a function duplicateSparsity
that would do what we want. As my naming suggests I think this would be particularly useful for the matrix classes.
result.appendArray( 0 ); | ||
result.setCapacityOfArray( nodeID, elemsPerNode[ nodeID ] + getElemMapOverAllocation() ); | ||
} | ||
result.resizeFromCapacities< parallelHostPolicy >( m_numNodes, elemsPerNode.data() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No that's done above in the line
elemsPerNode.setValues< serialPolicy >( elemMapExtraSpacePerNode() );
@@ -53,196 +54,111 @@ ArrayOfArrays< localIndex > CellBlockManager::getNodeToElements() const | |||
|
|||
// First step: how many elements for each node, stored in the elemsPerNode array. | |||
array1d< localIndex > elemsPerNode( m_numNodes ); | |||
RAJA::ReduceSum< parallelHostReduce, localIndex > totalNodeElems = 0; | |||
elemsPerNode.setValues< serialPolicy >( elemMapExtraSpacePerNode() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function getNodeToElements()
seems pretty similar to the buildToElementMaps
. Any way to combine the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to, but they produce different types of values (one ArrayOfArrays
vs a struct of three of them). With enough effort, the differences could be abstracted through user-provided functions, but I'm not sure the result would be pretty. The more important question though is whether we want to keep getNodeToElements()
at all as in my opinion it produces some non-intuitive results (see #1847 (comment))
bc0ba00
to
f7d35c4
Compare
ac9ae59
to
85df35d
Compare
This PR fixes performance regression of mesh map generation by:
byLowestNode
arrays (this affects memory footprint more than speed)In addition, it restores strict total ordering for
NodesAndElementOfFace
entries, which is required to have uniquely defined (race-condition-free) maps in case of multi-threaded construction.Timings (inclusive) for
geosx::CellBlockManager::buildMaps
, measured on anInternalMesh
with 400k elements:develop
(9f15819
)Mesh with 5M elements:
(only current PR is measured, I was unable to run meshes larger than ~500k with current
develop
as it hangs indefinitely).High watermark of mesh generation is about 2GB/1M cells (including intentional overallocations in the maps for fracture generation). I don't think we can get significantly lower than that without specialization for cell shapes.
LvArray PR: GEOS-DEV/LvArray#256
Rebaseline PR: https://github.com/GEOSX/integratedTests/pull/211
Rebaseline required for some of the tests (those using multiple cell blocks like "staircase" mesh, or any external meshes). No numeric diffs, but there are some changes in the order of mesh maps.