
add age & machine-readable outputs for ldms_ls: json/tab #1568

Merged: 2 commits into ovis-hpc:main, Jan 21, 2025

baallan
Collaborator

@baallan baallan commented Dec 22, 2024

This extends ldms_ls (on main) to generate json or tab-delimited output for machine consumption, for example by Python (json, pandas), by cut, or by spreadsheets.

In the json and tab formats, the outputs triggered by the -v flag include each set's age, i.e. the time elapsed since its last update. Age is reported in seconds and, where an expected-interval hint is available, in update intervals.
Age data is not added to the default pretty-printed -v output, since that would break existing scripts that parse ldms_ls -v.
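As a rough illustration of what the two age columns mean, here is a minimal Python sketch. The PR does not spell out the formula, so this assumes that age_seconds is the time from the set's last update to when ldms_ls runs, and that age_intervals counts whole elapsed intervals, using the interval part of the set's updt_hint_us info value ("interval_us:offset_us"). That assumption is consistent with the examples in this thread (47.963 s with a 60-second hint reported as 0 intervals), but it is not taken from the ldms_ls source, and set_age is a hypothetical name.

```python
import math
import time


def set_age(update_ts, updt_hint_us=None, now=None):
    """Return (age_seconds, age_intervals) for a set (hypothetical helper).

    update_ts: last update time as epoch seconds, e.g. 1736359680.012762
    updt_hint_us: the set's "updt_hint_us" info value, "interval_us:offset_us"
    now: reference time; defaults to the current time
    age_intervals is None when there is no usable interval hint.
    """
    if now is None:
        now = time.time()
    age_seconds = now - update_ts
    age_intervals = None
    if updt_hint_us:
        interval_us = int(updt_hint_us.split(":")[0])
        if interval_us > 0:
            # Assumption: count whole intervals elapsed, rounding down.
            age_intervals = math.floor(age_seconds / (interval_us / 1e6))
    return age_seconds, age_intervals
```

With the update time from the -l -v json example later in this thread and a reference time 47.963 s later, this gives roughly 47.963 seconds and 0 intervals.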

@baallan baallan added this to the v4.5.1 milestone Dec 22, 2024
@baallan baallan requested review from jennfshr and bschwal December 22, 2024 00:15
@tom95858
Collaborator

@baallan, I'm assuming that this is to make the ldms_ls output easier to parse? Let's bring up this issue with the UG and see what they think. I'm fairly certain that many groups are parsing this output. For example, CSV, quoting some column values, or putting a comment character in front of the header and footer lines may also be good options. So I think that this is a really good idea, but let's get some input on the various formatting options. JSON seems an obviously good option, so maybe we do that first and then add the other formats in separate pull requests?

@baallan
Collaborator Author

baallan commented Dec 23, 2024

@tom95858 the json and tab formats are both ready to go, selected with -f json or -f tab. All the permutations and extensions you mention are easily accommodated, either by adding values other than 'tab' and 'json' (in another PR) or, immediately, by running the tab output through sed/awk/grep to replace tabs and headers as desired.

That said, there's no harm in discussing it at a UG meeting. I'm appending some examples of why tab-delimited output is immediately interesting to folks writing production scripts that manage ldmsd; so immediately, in fact, that it may warrant backporting to 4.4.x.

Interesting examples using the current implementation:

Get the list of schema names:

ldms_ls -f tab -p 411 -a ovis -v | cut -f1 | sort -u | grep -v '^#' | grep -v '^[0-9].*$'

dstat_37
filesingle_amber_login
jobid
lnet_stats
loadavg
lustre_client
lustre_mdc_ops_timing
meminfo
opa2
procnet
procnfs
procstat_224
vmstat
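For consumers scripting in Python rather than the shell, the same schema-name extraction might look like the sketch below. It assumes tab-delimited lines from ldms_ls -f tab -v, mirrors the pipeline step by step, and uses a hypothetical function name, schema_names.

```python
def schema_names(lines):
    """Return the sorted, unique schema names in `ldms_ls -f tab -v` output.

    Mirrors: cut -f1 | sort -u | grep -v '^#' | grep -v '^[0-9].*$'
    """
    names = set()
    for line in lines:
        first = line.rstrip("\n").split("\t")[0]  # cut -f1
        # Skip blank lines, '#' header lines, and rows starting with a digit
        # (such as the total_sets memory summary row).
        if not first or first.startswith("#") or first[0].isdigit():
            continue
        names.add(first)
    # Plain code-point ordering, comparable to LC_ALL=C sort -u.
    return sorted(names)
```

For example, schema_names(sys.stdin) could be fed the output of the ldms_ls command above through a pipe.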

Get the schemas sorted by sampling duration (longest first):

ldms_ls -f tab -p 411 -a ovis -v | cut -f 1,11 | sort -unr -k 2 | grep -v '^#'

filesingle_amber_login  0.013861
lustre_mdc_ops_timing   0.000565
lustre_mdc_ops_timing   0.000536
lustre_mdc_ops_timing   0.000534
lustre_mdc_ops_timing   0.000521
procstat_224    0.000306
opa2    0.000246
vmstat  0.000181
dstat_37        0.000150
meminfo 0.000027
procnfs 0.000017
lustre_client   0.000013
lustre_client   0.000011
loadavg 0.000008
procnet 0.000001
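A Python version of the duration sort, again only a sketch: it assumes the -v tab column layout shown in the man-page header later in this thread (duration is the 11th field, so index 10), and unlike sort -u it keeps sets whose durations are equal. schemas_by_duration is a hypothetical name.

```python
def schemas_by_duration(lines):
    """Return (schema, duration) pairs from `ldms_ls -f tab -v`, longest first.

    Roughly mirrors: cut -f 1,11 | sort -nr -k 2 | grep -v '^#'
    (without sort -u's de-duplication of equal durations).
    """
    pairs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("#") or len(fields) < 11:
            continue  # header, or a row from a narrower table
        try:
            pairs.append((fields[0], float(fields[10])))
        except ValueError:
            continue  # field 11 is not numeric, so not a set row
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```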

Get the sets that are late in reporting (often due to an aggregator issue):

ldms_ls -h host -f tab -p 411 -a ovis -v |cut -f 2,13 | grep host | grep -Pv '\t0$'

host/meminfo    3
host/loadavg    3
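The late-set check can be sketched in Python the same way. This assumes age_intervals is the 13th tab field in -v output (as cut -f 2,13 above implies) and holds a whole number; the examples do not show how sets without an interval hint are reported, so non-numeric values are simply skipped. late_sets and its host filter parameter are hypothetical.

```python
def late_sets(lines, host=None):
    r"""Return (instance, age_intervals) for sets whose age_intervals is non-zero.

    Roughly mirrors: cut -f 2,13 | grep host | grep -Pv '\t0$'
    Fields 2 and 13 (1-based) are indices 1 and 12; assumes the -v layout.
    """
    late = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("#") or len(fields) < 13:
            continue
        instance = fields[1]
        # Filter on the instance name (grep host matches anywhere in the line).
        if host is not None and host not in instance:
            continue
        try:
            intervals = int(float(fields[12]))
        except ValueError:
            continue  # empty or non-numeric, e.g. no interval hint (assumed)
        if intervals != 0:
            late.append((instance, intervals))
    return late
```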

Get the most recently hinted update interval and offset, to check that the aggregator configuration is working as expected:

ldms_ls -f tab -v -h host | grep host | cut -f 2,14

host/vmstat     "updt_hint_us"="60000000:0"
host/procstat_224       "updt_hint_us"="60000000:0"
host/procnfs    "updt_hint_us"="60000000:0"
host/procnet/lo "updt_hint_us"="60000000:0"
host/procnet/ib0        "updt_hint_us"="60000000:0"
host/procnet/eth4       "updt_hint_us"="60000000:0"
host/procnet/eth0       "updt_hint_us"="60000000:0"
host/procnet/cni-podman0        "updt_hint_us"="60000000:0"
host/meminfo    "updt_hint_us"="60000000:0"
host/loadavg    "updt_hint_us"="60000000:0"
host/lnet_stats "updt_hint_us"="60000000:0"
host/jobid      "updt_hint_us"="60000000:0"
host/hfi1_0/1   "updt_hint_us"="60000000:0"
host/filesingle "updt_hint_us"="60000000:0"
host/dstat      "updt_hint_us"="60000000:0"
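To automate this configuration check, the info column can be parsed with a regular expression. A minimal sketch, assuming the info field is the 14th tab field and contains text exactly like "updt_hint_us"="60000000:0" as in the output above; update_hints and off_interval are hypothetical names.

```python
import re

# Matches info text such as "updt_hint_us"="60000000:0" (interval:offset, microseconds).
HINT_RE = re.compile(r'"updt_hint_us"="(\d+):(-?\d+)"')


def update_hints(lines):
    """Map instance -> (interval_us, offset_us) from `ldms_ls -f tab -v` lines.

    Fields 2 and 14 (1-based, as in cut -f 2,14) are indices 1 and 13 here.
    """
    hints = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("#") or len(fields) < 14:
            continue
        match = HINT_RE.search(fields[13])
        if match:
            hints[fields[1]] = (int(match.group(1)), int(match.group(2)))
    return hints


def off_interval(hints, expected_us):
    """Return sorted instance names whose hinted interval is not expected_us."""
    return sorted(name for name, (interval_us, _) in hints.items()
                  if interval_us != expected_us)
```

For the output above, off_interval(update_hints(lines), 60000000) would be expected to return an empty list, since every set is hinted at 60 seconds.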

@tom95858
Collaborator

@baallan, the only one that gives me pause is TAB? Why not CSV instead?

@baallan
Collaborator Author

baallan commented Dec 24, 2024

> @baallan, the only one that gives me pause is TAB? Why not CSV instead?

Unix utilities (cut, sort, etc.) handle tab separators by default.
Excel and OpenOffice spreadsheets recognize tab-separated data as CSV-equivalent, either automatically or with an easily chosen import option (depending on the spreadsheet version).
The Python csv module parses tab-delimited data as CSV with an option (delimiter='\t' or the excel-tab dialect), if not automatically.
So for all practical purposes TSV is CSV, and, unlike CSV, TSV is not vulnerable to commas that appear in data values.
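To illustrate the Python csv point, here is a sketch that reads ldms_ls tab output with the standard csv module. One detail worth noting: the info column contains literal double quotes ("updt_hint_us"="60000000:0"), so the sketch passes csv.QUOTE_NONE; with the default quoting, the reader would treat the leading quote as CSV quoting and alter the value. read_tab_output is a hypothetical name.

```python
import csv
import io


def read_tab_output(text):
    """Parse tab-delimited ldms_ls output into rows with Python's csv module."""
    # QUOTE_NONE: treat '"' as ordinary text so info values stay verbatim.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]  # drop blank lines
```

A comma inside a value stays within its field, since only the tab splits fields.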

That said, there is an IETF RFC 4180 CSV format that was added to the csv store, and it could easily be added to the ldms_ls -f option list too. I would prefer not to hold up the json/tab outputs for it, though.
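Until such an option exists in ldms_ls itself, the tab output can be post-processed into RFC 4180-style CSV. This is a sketch, not the csv store's implementation: it relies on Python's csv.writer, whose default excel dialect quotes fields containing commas or quotes, doubles embedded quotes, and ends records with CRLF, as RFC 4180 specifies. tab_to_rfc4180 is a hypothetical name.

```python
import csv
import io


def tab_to_rfc4180(text):
    """Convert tab-delimited ldms_ls output to RFC 4180-style CSV text."""
    # Read quotes verbatim so the info column's literal quotes survive.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # Default "excel" dialect: QUOTE_MINIMAL, doubled quotes, CRLF terminators.
    writer = csv.writer(out)
    for row in reader:
        if row:
            writer.writerow(row)
    return out.getvalue()
```

The '#' header prefixes are passed through unchanged, so downstream tools can still skip them as comment lines.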

@baallan
Collaborator Author

baallan commented Jan 8, 2025

An example of the json format with the -l option (some metrics elided for length), followed by the same query with both -l and -v:

ldms_ls -p 411 -a ovis  -I node-3/dstat -f json -l |json_pp 
{
   "sets" : [],
   "long_sets" : [
      {
         "metrics" : [
            {
               "value" : 2120003,
               "units" : "",
               "kind" : "u64",
               "type" : "M",
               "name" : "component_id"
            },
            {
               "kind" : "u64",
               "type" : "D",
               "value" : 0,
               "units" : "",
               "name" : "job_id"
            },
            {
               "type" : "D",
               "kind" : "u64",
               "units" : "",
               "value" : 0,
               "name" : "app_id"
            },
            {
               "name" : "rchar",
               "kind" : "u64",
               "type" : "D",
               "value" : 2398460260,
               "units" : ""
            },
            {
               "name" : "wchar",
               "value" : 46852717,
               "units" : "",
               "kind" : "u64",
               "type" : "D"
            },
            {
               "name" : "syscr",
               "type" : "D",
               "kind" : "u64",
               "units" : "",
               "value" : 8162209
            },
            {
               "name" : "syscw",
               "units" : "",
               "value" : 8162209,
               "type" : "D",
               "kind" : "u64"
            },
            {
               "units" : "",
               "value" : 843776,
               "type" : "D",
               "kind" : "u64",
               "name" : "read_bytes"
            },
            {
               "name" : "write_bytes",
               "type" : "D",
               "kind" : "u64",
               "units" : "",
               "value" : 0
            },
            {
               "kind" : "u64",
               "type" : "D",
               "value" : 0,
               "units" : "",
               "name" : "cancelled_write_bytes"
            },
            {
               "type" : "M",
               "kind" : "u64",
               "units" : "",
               "value" : 101598,
               "name" : "pid"
            },
            {
               "value" : 1,
               "units" : "",
               "kind" : "u64",
               "type" : "M",
               "name" : "ppid"
            },
            {
               "kind" : "u64",
               "type" : "D",
               "value" : 34182,
               "units" : "",
               "name" : "minflt"
            },
            {
               "units" : "",
               "value" : 0,
               "type" : "D",
               "kind" : "u64",
               "name" : "cmajflt"
            },
            {
               "name" : "utime",
               "value" : 12336,
               "units" : "",
               "kind" : "u64",
               "type" : "D"
            },
            {
               "type" : "D",
               "kind" : "u64",
               "units" : "",
               "value" : 31744,
               "name" : "stime"
            },
            {
               "type" : "D",
               "kind" : "s64",
               "units" : "",
               "value" : 0,
               "name" : "cutime"
            },
            {
               "name" : "cstime",
               "units" : "",
               "value" : 0,
               "type" : "D",
               "kind" : "s64"
            },
            {
               "value" : 20,
               "units" : "",
               "kind" : "s64",
               "type" : "D",
               "name" : "priority"
            },
            {
               "name" : "nice",
               "kind" : "s64",
               "type" : "D",
               "value" : 0,
               "units" : ""
            },
            {
               "name" : "num_threads",
               "type" : "D",
               "kind" : "u64",
               "units" : "",
               "value" : 12
            },
            {
               "name" : "vsize",
               "kind" : "u64",
               "type" : "D",
               "value" : 846499840,
               "units" : ""
            },
            {
               "value" : 8462,
               "units" : "",
               "kind" : "u64",
               "type" : "D",
               "name" : "rss"
            },
            {
               "type" : "D",
               "kind" : "u64",
               "units" : "",
               "value" : 45,
               "name" : "fd_count"
            },
            {
               "name" : "fd_max",
               "value" : 44,
               "units" : "",
               "kind" : "u64",
               "type" : "D"
            },
            {
               "type" : "D",
               "kind" : "u64",
               "units" : "",
               "value" : 20,
               "name" : "fd_socket"
            },
            {
               "value" : 2,
               "units" : "",
               "kind" : "u64",
               "type" : "D",
               "name" : "fd_dev"
            },
            {
               "kind" : "u64",
               "type" : "D",
               "value" : 9,
               "units" : "",
               "name" : "fd_anon_inode"
            },
            {
               "name" : "fd_pipe",
               "value" : 8,
               "units" : "",
               "kind" : "u64",
               "type" : "D"
            },
            {
               "type" : "D",
               "kind" : "u64",
               "units" : "",
               "value" : 6,
               "name" : "fd_path"
            }
         ],
         "dir_info" : {
            "meta_gn" : 4,
            "duration" : {
               "sec" : 0,
               "usec" : 345
            },
            "flags" : "CR ",
            "name" : "node-3/dstat",
            "heap_size" : 0,
            "uid" : 0,
            "info" : [
               {
                  "value" : "60000000:0",
                  "key" : "updt_hint_us"
               }
            ],
            "card" : 45,
            "timestamp" : {
               "usec" : 56480,
               "sec" : 1736359380
            },
            "schema" : "dstat_37",
            "data_gn" : 3147904,
            "gid" : 0,
            "array_card" : 1,
            "digest" : "1048B15B2630F4F6616A1E4D4D44C40FFCD55DE6ABB81165601D013650A7C883",
            "meta_size" : 2296,
            "data_size" : 432,
            "perm" : "-rw-r--r--"
         }
      }
   ]
}
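A consumer of the -l json output might flatten each set's metrics into a name-to-value mapping, as in this minimal Python sketch. It relies only on the long_sets, dir_info (name, timestamp) and metrics (name, value) keys visible in the output above. It assumes scalar metric values, since sets with lists of records are reported later in this thread as not formatted correctly. metric_values and update_time are hypothetical names.

```python
import json


def metric_values(json_text):
    """Map set name -> {metric name: value} from `ldms_ls -f json -l` output.

    Assumes scalar metric values; list/record metrics are not handled here.
    """
    doc = json.loads(json_text)
    result = {}
    for entry in doc.get("long_sets", []):
        name = entry["dir_info"]["name"]
        result[name] = {m["name"]: m["value"] for m in entry.get("metrics", [])}
    return result


def update_time(entry):
    """Return a long_sets entry's dir_info timestamp as epoch seconds."""
    ts = entry["dir_info"]["timestamp"]
    return ts["sec"] + ts["usec"] / 1e6
```

For example, metric_values(sys.stdin.read()) with the ldms_ls -f json -l output piped in.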

With -l -v (again, metrics elided for length):

ldms_ls -p 411 -a ovis  -I node-3/dstat -f json -v -l | json_pp

{
   "long_sets" : [
      {
         "dir_info" : {
            "meta_size" : 2296,
            "info" : [
               {
                  "value" : "60000000:0",
                  "key" : "updt_hint_us"
               }
            ],
            "name" : "node-3/dstat",
            "meta_gn" : 4,
            "uid" : 0,
            "flags" : "CR ",
            "data_size" : 432,
            "timestamp" : {
               "sec" : 1736359680,
               "usec" : 12762
            },
            "schema" : "dstat_37",
            "duration" : {
               "sec" : 0,
               "usec" : 324
            },
            "perm" : "-rw-r--r--",
            "gid" : 0,
            "data_gn" : 3148119,
            "heap_size" : 0,
            "digest" : "1048B15B2630F4F6616A1E4D4D44C40FFCD55DE6ABB81165601D013650A7C883",
            "card" : 45,
            "array_card" : 1
         },
         "metrics" : [
            {
               "units" : "",
               "type" : "M",
               "name" : "component_id",
               "kind" : "u64",
               "value" : 2120003
            },
            {
               "name" : "job_id",
               "value" : 0,
               "kind" : "u64",
               "units" : "",
               "type" : "D"
            },
            {
               "type" : "D",
               "units" : "",
               "kind" : "u64",
               "value" : 0,
               "name" : "app_id"
            },
            {
               "units" : "",
               "type" : "D",
               "name" : "rss",
               "kind" : "u64",
               "value" : 8462
            },
            {
               "type" : "D",
               "units" : "",
               "kind" : "u64",
               "value" : 45,
               "name" : "fd_count"
            },
            {
               "name" : "fd_max",
               "value" : 44,
               "kind" : "u64",
               "units" : "",
               "type" : "D"
            },
            {
               "units" : "",
               "type" : "D",
               "name" : "fd_pipe",
               "value" : 8,
               "kind" : "u64"
            },
            {
               "kind" : "u64",
               "value" : 6,
               "name" : "fd_path",
               "type" : "D",
               "units" : ""
            }
         ]
      }
   ],
   "sets" : [
      {
         "data_size" : 432,
         "heap_size" : 0,
         "flags" : "CL ",
         "uid" : 0,
         "instance" : "node-3/dstat",
         "gid" : 0,
         "perm" : "-rw-r--r--",
         "schema" : "dstat_37",
         "duration" : 0.000324,
         "update" : "1736359680.012762",
         "info" : {
            "updt_hint_us" : "60000000:0"
         },
         "age_seconds" : 47.963,
         "age_intervals" : 0,
         "meta_size" : 2296
      }
   ],
   "memory" : {
      "memory_kb" : 2.728,
      "data_kb" : 0.43,
      "meta_data_kb" : 2.3,
      "total_sets" : 1
   }
}
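With -v, the "sets" array carries the new age_seconds and age_intervals fields, which makes a staleness report straightforward. A sketch, assuming those keys as shown above; sets without an age_intervals key are skipped because the examples do not show how sets lacking an interval hint are reported. stale_sets is a hypothetical name.

```python
import json


def stale_sets(json_text, max_intervals=0):
    """Return (instance, age_seconds, age_intervals) for sets whose
    age_intervals exceeds max_intervals, oldest first.

    Reads the "sets" array of `ldms_ls -f json -v` output.
    """
    doc = json.loads(json_text)
    stale = []
    for s in doc.get("sets", []):
        intervals = s.get("age_intervals")
        if intervals is None:
            continue  # assumed: no interval hint reported for this set
        if intervals > max_intervals:
            stale.append((s["instance"], s["age_seconds"], intervals))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

For the single set above (age_intervals 0), stale_sets would return an empty list.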

@baallan
Collaborator Author

baallan commented Jan 8, 2025

And an example of the tab output without filters (-vvl -f tab).
Note that the tab format, like -l, produces multiple tables, one per output block.
Each has its own header line prefixed with #.
In this example the set has no unit strings, so the unit column is empty.

ldms_ls -p 411 -a ovis  -I node-3/dstat -f tab  -vvl

#hostname       ip_address      port    transport
localhost       127.0.0.1       411     sock
#schema_digest  schema  instance        flags   msize   dsize   hsize   uid     gid     perm    update  duration        age_seconds     age_intervals   info
1048B15B2630F4F6616A1E4D4D44C40FFCD55DE6ABB81165601D013650A7C883        dstat_37        node-3/dstat    CL      2296    432     0       0       0       -rw-r--r--      1736360700.060544        0.000333        11.327  0       "updt_hint_us"="60000000:0"
#total_sets     meta_data_kb    data_kb memory_kb
1       2.30     0.43   2.73
#instance       consistent      update  update.usec     push    last_push
node-3/dstat    1       Wed Jan 08 11:25:00 2025 -0700  60544   0       0
#type   kind    name    value   unit
M       u64     component_id    2120003
D       u64     job_id  0
D       u64     app_id  0
D       u64     rchar   2399089400
D       u64     wchar   46866797
D       u64     syscr   8164165
D       u64     syscw   8164165
D       u64     read_bytes      843776
D       u64     write_bytes     0
D       u64     cancelled_write_bytes   0
M       u64     pid     101598
M       u64     ppid    1
D       u64     minflt  34182
D       u64     cminflt 0
D       u64     majflt  55
D       u64     cmajflt 0
D       u64     utime   12339
D       u64     stime   31753
D       s64     cutime  0
D       s64     cstime  0
D       s64     priority        20
D       s64     nice    0
D       u64     num_threads     12
D       u64     vsize   846499840
D       u64     rss     8462
D       u64     rsslim  18446744073709551615
D       u64     signal  0
D       u64     processor       6
D       u64     rt_priority     0
D       u64     policy  0
D       u64     delayacct_blkio_ticks   0
D       u64     VmSize  206665
D       u64     VmRSS   8462
D       u64     share_pages     1262
D       u64     text_pages      66
D       u64     lib_pages       0
D       u64     data_pages      30720
D       u64     dirty_pages     0
D       u64     fd_count        45
D       u64     fd_max  44
D       u64     fd_socket       20
D       u64     fd_dev  2
D       u64     fd_anon_inode   9
D       u64     fd_pipe 8
D       u64     fd_path 6
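Because each block has its own #-prefixed header, a reader can split the output into tables generically instead of relying on cut field numbers. A minimal Python sketch, assuming tab-separated fields and that a short row (such as a metric with an empty unit) may lack trailing fields; parse_tables is a hypothetical name.

```python
def parse_tables(lines):
    """Split multi-block `ldms_ls -f tab` output into tables.

    Returns a list of (column names, list of row dicts) in output order.
    Each '#'-prefixed line starts a new table and names its columns.
    """
    tables = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            columns = line[1:].split("\t")
            tables.append((columns, []))
        elif tables:  # rows before the first header are ignored
            columns, rows = tables[-1]
            fields = line.split("\t")
            # Pad short rows (e.g. an empty unit column) with empty strings.
            fields += [""] * (len(columns) - len(fields))
            rows.append(dict(zip(columns, fields)))
    return tables
```

Keeping tables in output order preserves the pairing between each set's instance block and the metric block that follows it. If pandas is available, each list of row dicts can be passed directly to pandas.DataFrame.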

@nichamon
Collaborator

@baallan I found that the -f json generates the -l output for sets containing list and records correctly. Here is an example of -l for procnetdev2.

The output for meminfo is correct.

ldms/man/ldms_ls.man
.nf
.RS
ldms_ls -h vm1 -x sock -p 60000 -v -f tab
#schema instance flags msize dsize hsize uid gid perm update duration age_seconds age_intervals info
Collaborator

Since 'age_seconds' and 'age_intervals' are not self-explanatory, they may benefit from a brief description in the man page. What do you think?

Collaborator Author

I'll fix that if I've missed it.

@baallan
Collaborator Author

baallan commented Jan 14, 2025

> @baallan I found that the -f json generates the -l output for sets containing list and records [in]correctly. Here is an example of -l for procnetdev2.

@nichamon, is procnetdev2 configured with anything special, or just with the typical instance/schema parameters?

@nichamon
Collaborator

nichamon commented Jan 14, 2025

> @baallan I found that the -f json generates the -l output for sets containing list and records [in]correctly. Here is an example of -l for procnetdev2.

> @nichamon, is procnetdev2 configured with anything special, or just with the typical instance/schema parameters?

@baallan, I passed <instance name> to tell ldms_ls to report only that set. procnetdev2 was just an example, and it is how I found that -f json -l doesn't work properly with lists of records.

@tom95858 tom95858 merged commit 0c19380 into ovis-hpc:main Jan 21, 2025
14 checks passed
@baallan
Collaborator Author

baallan commented Jan 23, 2025

@tom95858 @nichamon/the ldms-ug identified some gaps in this PR (formatting of record-list data, details in the man page, a user-specified separator instead of only tab). If you want to back out the commits, that's fine. If you would rather have a second PR fix the other items, I can open a bug to document the problems and finish the changes I'm already working on to address the feedback.

@tom95858
Collaborator

@nichamon, can you please clarify your comment regarding the json formatting with lists of records? The comment says it formats the list-of-records data 'correctly.' Did you mean 'incorrectly'?

@nichamon
Collaborator

> @nichamon, can you please clarify your comment regarding the json formatting with lists of records? The comment says it formats the list-of-records data 'correctly.' Did you mean 'incorrectly'?

@tom95858 It was a typo, as you pointed out. Yes, I meant the output was 'incorrect'.
