Releases: TACC/remora
2.0.0
Release v1.8.5
Converted all scripts to bash. Updated temperature module. Created unit tests and updated graphics for some modules.
Minor release to fix problems
This release fixes:
- Provide OPA support (changes in install.sh and fixes in opt)
- Put links to impi fraction and breakdown in remora_summary.html
- Fixed and changed numa to show THP hits (use foreign instead of miss; those are the real misses)
- Updated ib to include hfi1 devices
Minor release to fix a couple of annoying problems
This release fixes:
- The help message. Some people really care about help messages, so it's a good idea to cater for them too. We broke this a couple of releases back.
- If users are running locally, it's good to add a couple of extra checks when retrieving the hostname. If nothing works, set the hostname to localhost and give it a go.
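
The hostname fallback described in the last item can be sketched roughly as follows. This is a minimal illustration, not the code shipped in remora: the function name `resolve_hostname` and the particular order of checks are assumptions made for the example.

```shell
# Hypothetical helper (not taken from the remora source): try several ways of
# obtaining a host name and fall back to "localhost" if all of them fail.
resolve_hostname() {
    name=""
    # 1. The hostname command, if it is installed.
    if command -v hostname >/dev/null 2>&1; then
        name=$(hostname 2>/dev/null)
    fi
    # 2. The HOSTNAME variable, which bash normally sets.
    if [ -z "$name" ]; then
        name="${HOSTNAME:-}"
    fi
    # 3. uname -n, which reports the node name.
    if [ -z "$name" ] && command -v uname >/dev/null 2>&1; then
        name=$(uname -n 2>/dev/null)
    fi
    # 4. Last resort: assume a local run.
    echo "${name:-localhost}"
}
```

Calling `resolve_hostname` always prints a non-empty name, so later steps that build file names or host lists from it do not have to handle an empty string.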
House cleaning (and some bug fixing)
Adding MPI statistics
In this version we add MPI statistics for mvapich2 and Intel MPI. We are working on openmpi and cray-mpich, and they will be added in a future release. Two new plots will be generated in the MPI directory: a pie plot with the percentage of time spent in MPI (aggregated communications and MPI-IO), and a bar plot with the top 5 most time-consuming MPI calls.
Internally, we have also changed the way some of the scripts are invoked in order to be consistent. This change has the added benefit of allowing for a much cleaner and better-working verbose mode. If you are thinking of adding a new module, nothing changes for you: this modification took place only at the top level of the script hierarchy.
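
The two quantities behind the MPI plots mentioned above (the percentage of time spent in MPI and the five most expensive calls) can be computed along these lines. This is only a sketch: the input format, one `<call name> <seconds>` pair per line, and the function names `top_mpi_calls` and `mpi_percent` are assumptions for illustration and do not reflect remora's actual data files.

```shell
# Hypothetical helpers; the "<call name> <seconds>" input format is assumed.

# Print the N most time-consuming calls (default 5), largest first.
# $1: file with one "<call name> <seconds>" pair per line
# $2: number of entries to print (optional)
top_mpi_calls() {
    LC_ALL=C sort -k2,2nr "$1" | head -n "${2:-5}"
}

# Print the integer percentage of wall-clock time spent in MPI.
# $1: seconds spent in MPI, $2: total wall-clock seconds
mpi_percent() {
    awk -v mpi="$1" -v total="$2" \
        'BEGIN { if (total > 0) printf "%d\n", 100 * mpi / total; else print 0 }'
}
```

For example, `mpi_percent 25 100` prints `25`, and `top_mpi_calls timings.txt` prints the five lines of `timings.txt` with the largest second column.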
Now with csh / tcsh support
This is a bug fix release that extends remora support for users with default csh or tcsh shells. No other changes in functionality have been made.
Bells and whistles
Version 1.7.0 is a feature release that adds power and temperature monitoring, support for Infiniband devices other than mlx4, support for PBS schedulers, improved Infiniband and Ethernet network monitoring, full support for Intel Knights Landing with automated NUMA node detection, protection from output directory being overwritten, and improved graphics.
This version also includes two new scripts, one that can generate a summary after a code crash, and one that can monitor memory utilization and kill the job before it exceeds the available memory on the node.
1.6.0.1
Introducing real-time monitoring
In this release we introduce a real-time monitoring mode that should be useful to track jobs that are suspected to be problematic. We also fixed a minor bug that could produce unnecessary warnings when using over 32GB of memory. This was due to a hardcoded value that is now calculated on the fly.
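
As an illustration of replacing a hardcoded memory limit with a value computed on the fly, one possible approach on Linux is to read `MemTotal` from `/proc/meminfo`. The function name `total_mem_mb` is hypothetical, and this is not necessarily how remora computes the value.

```shell
# Hypothetical helper: report the node's total memory in MiB by reading the
# MemTotal line (given in kB) instead of relying on a hardcoded limit.
# $1: path to a meminfo-style file (optional, defaults to /proc/meminfo)
total_mem_mb() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "$meminfo"
}
```

On a Linux node, `total_mem_mb` prints the installed memory in MiB, which a warning threshold can then be derived from regardless of the node size.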