
Feature: PCP Integration

Garrett LeSage edited this page Feb 27, 2018 · 20 revisions

Intro

PCP, short for "Performance Co-Pilot", is a system performance monitoring and analysis framework. It collects performance-related statistics from multiple hosts and operating systems, including popular Linux distributions, UNIX variants, Windows, and macOS.

Through its plugin architecture, PCP records not just host data (disk, network, memory) but also statistics from Apache, MySQL, the Java VM, KVM, and more.

PCP is used for both live and historical data.

Use cases

Use PCP for viewing data in Cockpit

Without PCP, Cockpit only displays data collected while it is running. When PCP is installed, statistics can also be pulled from the time before the user signed in to Cockpit.

This is already implemented to some extent.

However, there are a few issues currently:

  • It is not obvious whether the charts are backed by an installed, active PCP or are Cockpit's standard charts
  • It is not obvious that installing PCP will improve the charts in Cockpit

Time

Graphs in Cockpit cover the past 5 minutes, and this works with and without PCP.

PCP would let us look for specific historical events as well. We might want to consider checking the past 24 hours for problematic activity.

Examples:

  • Low available memory
  • Active swap
  • High load average
  • High disk IO
  • Network-related issues (latency, outages, etc.)

The timeframe for these warnings should probably be somewhere between the past 24 hours and the past week.
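As a rough illustration of what "checking for problem events" could mean, here is a minimal sketch that groups consecutive out-of-range samples into events. It assumes the samples have already been read from PCP into plain (timestamp, value) pairs; reading them is not shown, and the names `Event` and `find_events` and the thresholds are hypothetical, not part of PCP or Cockpit.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Event:
    metric: str    # hypothetical label, e.g. "load" or "mem.available"
    start: float   # timestamp of the first out-of-range sample
    end: float     # timestamp of the last out-of-range sample
    worst: float   # most extreme value seen during the event


def find_events(
    metric: str,
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    window_start: float,
    below: bool = False,
) -> List[Event]:
    """Group consecutive samples beyond `threshold` into events.

    `samples` must be sorted by timestamp. Samples older than
    `window_start` are ignored. With below=True a value counts as a
    problem when it drops under the threshold (e.g. available memory).
    """
    events: List[Event] = []
    current = None
    for ts, value in samples:
        if ts < window_start:
            continue
        bad = value < threshold if below else value > threshold
        if bad:
            if current is None:
                current = Event(metric, ts, ts, value)
            else:
                current.end = ts
                pick = min if below else max
                current.worst = pick(current.worst, value)
        elif current is not None:
            events.append(current)
            current = None
    if current is not None:
        events.append(current)
    return events
```

For the 24-hour case, `window_start` would be the current time minus 86400 seconds; for the one-week case, minus 604800 seconds.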

PCP dashboard

View simple PCP-based stats from the current machine. We're not going to go the ultra-configurable route like Grafana. (If people want that, Grafana exists and can be used in parallel.)

This would probably replace the separate CPU/Memory/Storage views.

Install PCP for usage with another tool

PCP is useful not just for Cockpit, but for other tools.

  • Should PCP be installable via the "Applications" section as well as through the upcoming PackageKit lib?

Combined statistics from multiple machines that use PCP

Simply combining all the data from multiple machines gets noisy. It should be possible to show exceptional events from various servers here as well, similar to the host-specific view.
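One way to picture a combined view that shows only exceptional events is a single timeline merged from per-host event lists. The sketch below assumes each host already has a time-sorted list of (timestamp, description) pairs; the function name, host names, and data shape are hypothetical and are not taken from PCP or Cockpit.

```python
import heapq
from typing import Dict, List, Tuple


def merge_host_events(
    events_by_host: Dict[str, List[Tuple[float, str]]],
) -> List[Tuple[float, str, str]]:
    """Merge per-host event lists into one timeline.

    Each per-host list must already be sorted by timestamp.
    Returns (timestamp, host, description) tuples in time order.
    """
    streams = [
        [(ts, host, desc) for ts, desc in events]
        for host, events in events_by_host.items()
    ]
    # heapq.merge lazily combines already-sorted inputs into sorted output.
    return list(heapq.merge(*streams))


timeline = merge_host_events({
    "web1": [(100.0, "high load average"), (300.0, "active swap")],
    "db1": [(200.0, "low available memory")],
})
```

Because only events are merged, not raw samples, the combined list stays short even when many machines are included.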

New Concepts

In addition to modifying our charts, we might want to consider:

  • Review past 24 hours (week too?) in a sped-up playback
  • Show exceptional data (spikes and the times of spikes)
  • Instead of customization, have different modes of charts in tabs? (Example: Flip between representations of CPUs.)
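To make "spikes and the times of spikes" concrete, here is a minimal sketch that flags samples far above the series average. The rule used (more than k standard deviations above the mean) is only one possible definition chosen for illustration; `find_spikes` and its default `k` are hypothetical, and the samples are assumed to be plain (timestamp, value) pairs.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def find_spikes(
    samples: List[Tuple[float, float]], k: float = 3.0
) -> List[Tuple[float, float]]:
    """Return the (timestamp, value) samples lying more than
    k population standard deviations above the mean."""
    values = [v for _, v in samples]
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a perfectly flat series has no spikes
    return [(ts, v) for ts, v in samples if v - mu > k * sigma]
```

A chart could then mark the returned timestamps instead of asking the user to scan the whole time range.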

Recommendations

  • Avoid using the term "PCP" in the interface; it is problematic because it is also the common name for phencyclidine (aka "angel dust")
  • Rework host summary page to de-emphasize charts and highlight essential machine information first
    • Use the rest of available space for overview charts
  • Make it obvious when PCP isn't available (by default) versus when it is available
    • Also make it easy to install PCP when it isn't available
  • Main graphs should have a quick overview of recent data
  • Secondary page should have more in-depth information