Skip to content

Commit

Permalink
WIP
Browse files Browse the repository at this point in the history
Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
  • Loading branch information
richardkchapman committed Aug 16, 2023
1 parent a563e44 commit 3bd0f56
Show file tree
Hide file tree
Showing 3 changed files with 54 additions and 28 deletions.
25 changes: 25 additions & 0 deletions devdoc/roxie.md
Original file line number Diff line number Diff line change
Expand Up @@ -281,3 +281,28 @@ Should the scope of the blacklist be different? Possible scopes are:

Options 2 and 4 above would allow all aspects of the blacklisting behaviour to be specified by options on the SOAPCALL. We could control whether or not the
blacklister is to be used at all via a SOAPCALL option with any of the above...

perftrace options
=================

The HPCC Platform includes a rudimentary performance tracing feature using periodic stack capture to generate flame graphs. Roxie supports this in 3 ways:

1. If expert/@profileStartup is set in roxie config, a flame graph is generated for operations during Roxie startup phase.
2. If @perf is set on an incoming query, a flame graph is generated for the lifetime of that query's execution, and returned along with the query results
3. If expert/perftrace is set in roxie config, one-shot roxie queries (e.g. eclagent mode) generate a flame graph (currently just to a text file).

The perf trace operates as follows:

1. A child process is launched that runs the doperf script. This samples the current stack(s) every 0.2s (configurable) to a series of text files.
2. When tracing is done, these text files are "folded" via a perl script that notes every unique stack and how many times it was seen, one line per unique stack
3. This folded stack list is filtered to suppress some stacks that are not very interesting
4. The filtered folded stack list is passed to another perl script that generates an svg file.

Gavin has indicated that the basic info captured at step 1 (or maybe 2) might also be analysed to give other insights:

1. A list of "time in function, time in children of function".
2. An expanded list of callers to __GI___lll_lock_wait and __GI___lll_lock_wake, to help spot contended critsecs.

Seems like there may also be some info present in the original stack text files that is lost in the folded summary - in particular related to the TID that the stack is on. Can we spot lifetimes of threads and/or should we treat "stacks" on different threads as different? Thread pools might render this difficult though.

I suspect to proceed I will need to recreate something not unlike stackcollapose-elfutils.pl, in Python (perl being too opaque...), thien consider extending it.
56 changes: 28 additions & 28 deletions roxie/ccd/ccdmain.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1420,6 +1420,10 @@ int CCD_API roxie_main(int argc, const char *argv[], const char * defaultYaml)
if (runOnce)
{
// Avoid delaying the release of packages or queries - otherwise stand-alone queries can take a while to terminate
PerfTracer oneShotTracer;
bool traceOneShot = topology->getPropBool("expert/@perftrace", false); // MORE - check wu options too?
if (traceOneShot)
oneShotTracer.start();
agentQueryReleaseDelaySeconds = 0;
if (wuid)
{
Expand Down Expand Up @@ -1454,35 +1458,7 @@ int CCD_API roxie_main(int argc, const char *argv[], const char * defaultYaml)
}
StringBuffer query;
query.appendf("<roxie format='%s'/>", format);
PerfTracer oneShotTracer;
bool traceOneShot = topology->getPropBool("expert/@perftrace", false); // MORE - check wu options too?
if (traceOneShot)
oneShotTracer.start();
roxieServer->runOnce(query.str()); // MORE - should use the wu listener instead I suspect
if (traceOneShot)
{
oneShotTracer.stop();
const char *fname = topology->queryProp("expert/@perftraceFileName");
if (!fname)
fname = "perftrace.svg";
Owned<IFile> iFile = createIFile(fname);
try
{
Owned<IFileIO> iFileIO = iFile->open(IFOcreate);
if (iFileIO)
{
StringBuffer &svg = oneShotTracer.queryResult();
iFileIO->write(0, svg.length(), svg.str());
DBGLOG("Flame graph for query written to %s", fname);
}
}
catch (IException *E)
{
VStringBuffer msg("Failed to write flame graph to %s", fname);
EXCLOG(E, msg);
::Release(E);
}
}
fflush(stdout); // in windows if output is redirected results don't appear without flushing
}
catch (IException *E)
Expand All @@ -1491,6 +1467,30 @@ int CCD_API roxie_main(int argc, const char *argv[], const char * defaultYaml)
E->Release();
}
}
if (traceOneShot)
{
oneShotTracer.stop();
const char *fname = topology->queryProp("expert/@perftraceFileName");
if (!fname)
fname = "perftrace.svg";
Owned<IFile> iFile = createIFile(fname);
try
{
Owned<IFileIO> iFileIO = iFile->open(IFOcreate);
if (iFileIO)
{
StringBuffer &svg = oneShotTracer.queryResult();
iFileIO->write(0, svg.length(), svg.str());
DBGLOG("Flame graph for query written to %s", fname);
}
}
catch (IException *E)
{
VStringBuffer msg("Failed to write flame graph to %s", fname);
EXCLOG(E, msg);
::Release(E);
}
}
}
else
{
Expand Down
1 change: 1 addition & 0 deletions tools/doperf/doperf
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@ EOF

finish_perf()
{
wait 2>/dev/null
stackcollapse-elfutils.pl ${tmpdir}/stack.* > ${tmpdir}/folded
grep -f ${tmpdir}/suppress -v ${tmpdir}/folded > ${tmpdir}/suppressed
flamegraph.pl ${tmpdir}/suppressed
Expand Down

0 comments on commit 3bd0f56

Please sign in to comment.