Vectorize coordinates._construct_face_centroids()
#1117
base: main
Conversation
Good fix; will test Welzl and see what fix is appropriate there.
ASV Benchmarking: Benchmark Comparison Results
Benchmarks that have improved: [results table not captured]
Benchmarks that have stayed the same: [results table not captured]
Nice work @erogluorhan! It appears that a few ASV benchmarks are failing (which may or may not be due to the changes in this PR). I can look into it more deeply on Monday.
I can't see any failing benchmarks though! @philipc2
If you take a look at the GitHub bot report above, it shows which ones failed. If they failed both before and after, it means something was wrong before this PR. It looks like this PR didn't break anything; the issue must have been in the benchmark as written before.
Ok, I see what you mean now, @philipc2! Also, re benchmarking, I'd expect …
```python
centroid_x = np.sum(node_x[face_nodes] * mask, axis=1) / n_nodes_per_face
centroid_y = np.sum(node_y[face_nodes] * mask, axis=1) / n_nodes_per_face
centroid_z = np.sum(node_z[face_nodes] * mask, axis=1) / n_nodes_per_face
```
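For context, here is a minimal sketch of the masking idea these lines rely on (the inputs below are hypothetical, with names assumed from the diff): the mask zeroes out the padded slots beyond each face's actual node count.

```python
import numpy as np

# Hypothetical inputs mirroring the names in the diff: face_nodes is
# (n_face, n_max_nodes), padded out to the widest face.
node_x = np.random.rand(5)
face_nodes = np.array([[0, 1, 2, 0],   # triangle; the padded slot repeats index 0
                       [0, 1, 3, 4]])  # quad
n_nodes_per_face = np.array([3, 4])

# mask[i, j] is 1.0 while j < n_nodes_per_face[i] and 0.0 afterwards,
# so padded slots contribute nothing to the per-face sum.
n_max_nodes = face_nodes.shape[1]
mask = (np.arange(n_max_nodes)[None, :] < n_nodes_per_face[:, None]).astype(float)

centroid_x = np.sum(node_x[face_nodes] * mask, axis=1) / n_nodes_per_face
```

Note that the fancy indexing `node_x[face_nodes]` happens before the mask is applied, so every entry of `face_nodes`, including the padded ones, must still be a valid index; that caveat drives the discussion below.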
This masked-sum approach is problematic for grids that have variable element sizes. `face_node_connectivity` will contain fill values, which will lead to an index error here:

IndexError: index -9223372036854775808 is out of bounds for axis 0 with size 83886080

This wasn't an issue for the SCREAM dataset because it is composed entirely of quads.
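A minimal sketch reproducing that failure mode (the fill-value constant here is assumed to be the int64 minimum, matching the error message above):

```python
import numpy as np

# Assumed to match uxarray's integer fill value; it equals the int64
# minimum seen in the error message above.
INT_FILL_VALUE = np.iinfo(np.int64).min

node_x = np.random.rand(5)
face_nodes = np.array([[0, 1, 2, INT_FILL_VALUE],  # triangle padded with fill
                       [0, 1, 3, 4]])

# Fancy indexing evaluates every entry, fill values included:
# IndexError: index -9223372036854775808 is out of bounds for axis 0 with size 5
node_x[face_nodes]
```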
How about fixing this and changing/standardizing the way we handle connectivity?
> changing/standardizing the way we handle connectivity

Can you elaborate? I've explored storing connectivity as partitioned arrays in #978, although for something as simple as computing centroids, Numba seems to offer a significantly simpler implementation, performing close to compiled-language speed without needing to create an additional mask.
My understanding is that with fill values such as -1 the mask approach fails, isn't that correct? So I was thinking that adopting a convention that allows masking to work would simplify things.

Not sure about the performance; I think NumPy's vectorized operations might be highly optimized for numerical computations and hopefully outperform the explicit loop here, even when those loops are parallelized with Numba.
Here is a minimal example that fails. Feel free to create a test case from this:

```python
import numpy as np
import uxarray as ux

node_lon = np.array([-20.0, 0.0, 20.0, -20, -40])
node_lat = np.array([-10.0, 10.0, -10.0, 10, -10])

# The first face is a triangle padded with the fill value -1
face_node_connectivity = np.array([[0, 1, 2, -1], [0, 1, 3, 4]])

uxgrid = ux.Grid.from_topology(
    node_lon=node_lon,
    node_lat=node_lat,
    face_node_connectivity=face_node_connectivity,
    fill_value=-1,
)
uxgrid.construct_face_centers()
```
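For what it's worth, one way the masked vectorization could be made safe for this example (an editor's sketch, not necessarily the approach this PR settles on): swap fill entries for a valid index before the fancy indexing, and let the mask cancel their contribution.

```python
import numpy as np

fill_value = -1
face_node_connectivity = np.array([[0, 1, 2, fill_value],
                                   [0, 1, 3, 4]])
node_x = np.random.rand(5)

valid = face_node_connectivity != fill_value
n_nodes_per_face = valid.sum(axis=1)

# Point padded slots at node 0; the zero mask entry cancels whatever
# value the substituted index pulls in.
safe_nodes = np.where(valid, face_node_connectivity, 0)
centroid_x = np.sum(node_x[safe_nodes] * valid, axis=1) / n_nodes_per_face
```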
Consider this implementation with Numba:

```python
import numpy as np
from numba import njit, prange

@njit(cache=True, parallel=True)
def _construct_face_centroids(node_x, node_y, node_z, face_nodes, n_nodes_per_face):
    centroid_x = np.zeros((face_nodes.shape[0]), dtype=np.float64)
    centroid_y = np.zeros((face_nodes.shape[0]), dtype=np.float64)
    centroid_z = np.zeros((face_nodes.shape[0]), dtype=np.float64)

    n_face = n_nodes_per_face.shape[0]
    for i_face in prange(n_face):
        # Only the first n_max_nodes entries of this face's row are valid;
        # padded fill values are never read.
        n_max_nodes = n_nodes_per_face[i_face]
        x = np.mean(node_x[face_nodes[i_face, 0:n_max_nodes]])
        y = np.mean(node_y[face_nodes[i_face, 0:n_max_nodes]])
        z = np.mean(node_z[face_nodes[i_face, 0:n_max_nodes]])

        x, y, z = _normalize_xyz_scalar(x, y, z)

        centroid_x[i_face] = x
        centroid_y[i_face] = y
        centroid_z[i_face] = z
    return centroid_x, centroid_y, centroid_z
```

Without partitioning our arrays, it's really difficult to find an elegant vectorized solution directly using NumPy.
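A hypothetical call for illustration (this assumes the jitted function and `_normalize_xyz_scalar` above are in scope): because each row is sliced to its valid length, the padded fill entries are never touched.

```python
import numpy as np

node_x = np.random.rand(5)
node_y = np.random.rand(5)
node_z = np.random.rand(5)

# Padded slots can hold any fill value; the 0:n_max_nodes slice skips them.
face_nodes = np.array([[0, 1, 2, -1], [0, 1, 3, 4]], dtype=np.int64)
n_nodes_per_face = np.array([3, 4], dtype=np.int64)

cx, cy, cz = _construct_face_centroids(node_x, node_y, node_z,
                                       face_nodes, n_nodes_per_face)
```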
Yes, it was a sticky one; I left optimizing this for a later PR.
This also might use a bit more memory due to the creation of the mask array, but that should be fine even for larger grids, and the speedup would be pretty good considering it uses vectorized operations and broadcasting. Seems to work, and I'm okay with approving this.
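As a rough back-of-envelope for that mask overhead, using the ~84M element count from the IndexError above as an assumed order-of-magnitude grid size:

```python
import numpy as np

n_face, n_max_nodes = 83_886_080, 4  # sizes assumed for illustration
mask_bytes = n_face * n_max_nodes * np.dtype(np.float64).itemsize
print(f"float64 mask: {mask_bytes / 1e9:.2f} GB")     # ~2.68 GB
print(f"bool mask:    {mask_bytes / 8 / 1e9:.2f} GB")  # ~0.34 GB if kept boolean
```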
Let me know if you have any thoughts about the Numba implementation that I shared. |
Fine with the Numba suggestion also; we can modify this PR to use that. We use Numba all over the project.
Closes #1116.
Overview
Optimizes `coordinates.py`'s `_construct_face_centroids()` by replacing the for-loop with vectorization, with the help of a masking method, thanks to @hongyuchen1030's older suggestion. Further optimization of the code would be possible, but maybe that is handled in future PRs.
FYI @philipc2: The masking method this PR uses might be useful for cases we previously dealt with via partitioning, e.g. the nodal_averaging problem we looked into last year.
FYI @rajeeja: This is where the Cartesian-based centroid calculations are being fixed. Welzl is a bit different, but you may still want to look into it.
Results
#1116 shows the profiling results for the existing code on a 4 GB SCREAM dataset, where the execution time is around 5 minutes. The optimized code brings that down to around 6 seconds.
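For reference, a minimal way to time this yourself (a sketch; the grid-file path is hypothetical):

```python
import time
import uxarray as ux

uxgrid = ux.open_grid("scream_grid.nc")  # hypothetical path to a SCREAM grid

start = time.perf_counter()
uxgrid.construct_face_centers()
print(f"construct_face_centers: {time.perf_counter() - start:.1f} s")
```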