Speeding up writing output #841

Merged

Conversation

AminTorabi-NOAA
Contributor

This PR speeds up writing output. Previously, the aggregation step used df.apply(), which can be slow for large datasets because it iterates over the rows in Python rather than using optimized, vectorized code. To improve performance, we restructured the code as follows:

Grouping without .apply(): Instead of the slower apply(), we now use groupby() combined with vectorized operations such as direct masking and conditional aggregation. This handles grouped data more efficiently.
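For illustration, here is a minimal sketch of that pattern. It is not the code changed in this PR: the column names (`segment_id`, `flow`), the `threshold` value, and both function names are hypothetical, chosen only to show a per-group `apply()` next to an equivalent mask-then-aggregate version.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per (segment, timestep). Column names are illustrative only.
df = pd.DataFrame({
    "segment_id": [1, 1, 1, 2, 2, 3],
    "flow": [0.5, 2.0, 3.5, 1.0, 4.0, 0.2],
})
threshold = 1.0  # hypothetical cutoff used for the conditional aggregation


def summarize_with_apply(frame):
    """Slow path: a Python function is called once per group."""
    def per_group(g):
        high = g["flow"] > threshold
        return pd.Series({
            "total_flow": g["flow"].sum(),
            "high_flow_total": g.loc[high, "flow"].sum(),
            "high_flow_count": float(high.sum()),
        })
    return frame.groupby("segment_id")[["flow"]].apply(per_group)


def summarize_vectorized(frame):
    """Fast path: build the mask once, then use built-in groupby aggregations."""
    high = frame["flow"] > threshold
    work = frame.assign(
        high_flow=frame["flow"].where(high, 0.0),  # zero out rows that fail the mask
        is_high=high.astype(float),                # 1.0 where the mask holds, else 0.0
    )
    return work.groupby("segment_id").agg(
        total_flow=("flow", "sum"),
        high_flow_total=("high_flow", "sum"),
        high_flow_count=("is_high", "sum"),
    )
```

Both functions return one row per `segment_id` with the same three columns. The vectorized version avoids calling Python code per group, which is where the gain on large datasets would be expected to come from.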

Additions

Removals

Changes

Testing

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

Contributor

@kumdonoaa kumdonoaa left a comment

Great speed gain. With the Lower Colorado domain and a 24-hour simulation period, output writing dropped from 18 sec to 2 sec. With the CONUS domain and a 24-hour simulation, the current code took this:
image

This PR took this:
image

@AminTorabi-NOAA AminTorabi-NOAA merged commit 470c767 into NOAA-OWP:master Sep 6, 2024
4 checks passed
taddyb33 pushed a commit to taddyb33/t-route-dev that referenced this pull request Sep 9, 2024