Commit
Add commentary on how the stats might be useful
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
ghalliday committed Oct 29, 2024
1 parent db09605 commit e57d0f5
Showing 1 changed file with 64 additions and 3 deletions.
tools/roxie/extract-roxie-timings.py: 64 additions & 3 deletions
@@ -408,6 +408,67 @@ def sortFunc(cur):
throughput = iops * 8192
printRow(globalTotalRow)
print()
print(f"Transactions {numGlobalRows}q {elapsed}s: Throughput={actualTransationsPerSecond:.3f}q/s Time={1/actualTransationsPerSecond:.3f}s/q ExpectedCpuLoad={expectedCpuLoad:.3f} iops={iops:.3f} throughput={throughput/1000000:.3f}MB/s")
print("All times in ms unless explicitly stated")
print()
print(f"Transactions {numGlobalRows}q {elapsed}s: Throughput={actualTransationsPerSecond:.3f}q/s Time={1/actualTransationsPerSecond:.3f}s/q")
print(f"ExpectedCpuLoad={expectedCpuLoad:.3f} iops={iops:.3f}/s DiskThroughput={throughput/1000000:.3f}MB/s")

commentary = '''
"How can the output of this script be useful? Here are some suggestions:"
""
"%BranchMiss."
" Branches are accessed probably 100x more often than leaves. I suspect <2% is a good number to aim for."
"%LeafMiss."
" An indication of what proportion of the leaves are in the cache."
"%LeafFetch."
" How often a leaf that wasn't in the cache had to be fetched from disk i.e. was not in the page cache."
""
" For maximum THROUGHPUT it may be best to reduce %LeafMiss - since any leaf miss has a cpu overhead decompressing the leaves, and to a 1st order approximation"
" disk read time should not affect throughput. (The time for a leaf page-cache read is a good estimate of the cpu impact of reading from disk.)"
" For minimum LATENCY you need to minimize %LeafMiss * (decompressTime + %LeafFetch * readTime). Reducing the leaf cache size will increase leafMiss, but reduce leafFetch."
" Because the pages are uncompressed in the page cache and compressed in the leaf cache the total number of reads will likely reduce."
" If decompresstime << readTime, then it is more important to reduce %LeafMiss than %LeafFetch - which is particularly true of the new index format."
""
"avgTimeAgentProcess"
" How long does it take to process a query on the agent? Only accurate on 9.8.x and later."
""
"TimeLocalCpu (TimeLocalExecute - TimeAgentWait - TimeSoapcall). How much cpu time is spent on the server when processing the query?"
"RemoteCpuTime (TimeAgentProcess - (TimeLeafRead - TimeBranchRead)). How much cpu time is spent on the workers processing a query?"
""
"Load,Throughput"
" The sum of these cpu times allows us to calculate an upper estimate of the cpu load and the maximum throughput (reported in the summary line). If there is contention within the"
" roxie process then the estimated cpu load will be higher than actual. If so, it suggests some investigation of the roxie server is likely to improve throughput."
""
"MaxWorkerThreads"
" Assuming the system is CPU bound, all index operations occur on the worker, and there is no thread contention (all bug assumptions), then given the stats for branch and leaf"
" misses, what is the number of worker threads that would result in a cpu load that matches the number of cpus. The actual limit can probably be slightly higher since there is"
" variation in the number of requests which actually come from disk, but much higher is in danger of overcommitting the number of cpus. Provides an initial ballpark figure."
"MaxTransactionsPerSecond, CpuLoad@10q/s"
" For the specified number of cpus, what is the maximum number of transactions per second that could be supported. What cpu load would you expect if queries were submitted at 10q/s."
"MaxServerThreads"
" Taking into account the expected cpu load and time to process queries, how many server threads should be configured to support that throughput? This is likely to be an under-estimate,"
" but also an initial ball-park figure."
""
"From the final summary line:"
""
"Transactions How many transactions in the sample, the elapsed time period, implied throughput and inverse time per transaction"
"ExpectedCpuLoad If there was no time lost to contention, what is the expected cpu load when running these queries? If the actual load is much lower then that suggests there is opportunity"
" for improving the roxie code to reduce contention and improve throughput/latency."
"iops How many iops are required to support all the *actual* disk reads (not including those satisfied from page cache)"
"DiskThroughput What rate does the disk need to support to transfer the data for all the queries?"
""
"Note: All times in ms unless explicitly stated"
'''
print(commentary)
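# Illustrative sketch only (a hypothetical helper with made-up numbers, not part of the script above):
# the latency relationship described in the commentary - estimated extra time per leaf lookup
# is %LeafMiss * (decompressTime + %LeafFetch * readTime).
def estimatedLeafLatencyMs(leafMissPercent, leafFetchPercent, decompressTimeMs, readTimeMs):
    # Convert the percentages to fractions; every miss pays the decompress cost, and a
    # fraction of misses (%LeafFetch) also pays the disk read cost.
    leafMiss = leafMissPercent / 100.0
    leafFetch = leafFetchPercent / 100.0
    return leafMiss * (decompressTimeMs + leafFetch * readTimeMs)

# e.g. 10% leaf misses, 20% of those fetched from disk, 0.01ms to decompress, 0.5ms to read:
# 0.1 * (0.01 + 0.2 * 0.5) = 0.011ms of extra latency per leaf lookup.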

# Thoughts for future enhancements:
#
# Calculate average time for a leaf page-cache read and see how it affects the derived stats.
#
# Take into account NumAgentRequests and generate some derived stats.
# avgQueue time - are the workers overloaded
# TimeAgentWait v TimeAgentProcess?
#
# Add summary stats for the network traffic (needs SizeAgentRequests from 9.8.x)
#
# Calculate some "what if" statistics
# - If the decompression time was reduced by 50%
# - If the leaf cache hit 50% less often, but reads were reduced by 5%, what is the effect?
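
# Illustrative sketch only (hypothetical helper and figures): the ballpark thread and
# throughput arithmetic described in the commentary - if each query needs roughly
# queryCpuMs of cpu time and takes avgElapsedMs of elapsed time, then for numCpus cores:
def ballparkLimits(numCpus, queryCpuMs, avgElapsedMs):
    maxTransactionsPerSecond = numCpus * 1000.0 / queryCpuMs    # cpu-bound throughput ceiling
    maxWorkerThreads = numCpus * avgElapsedMs / queryCpuMs      # threads needed to keep all cpus busy
    return maxTransactionsPerSecond, maxWorkerThreads

# e.g. 16 cpus, 4ms of cpu per query, 10ms elapsed per query gives a ceiling of ~4000 q/s
# and ~40 worker threads before the cpus are fully committed.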
