Skip to content

jsommers/someta

Repository files navigation

SoMeta

Automatic collection of network measurement metadata.

This is a complete rewrite of SoMeta in go. The earlier (Python) version of SoMeta can be found at https://github.com/jsommers/metameasurement.

Current version is v1.4.1.

Installing

The easiest way to build and install is to do: go install github.com/jsommers/someta@latest (or replace latest with a specific version tag). This command will download and compile all dependencies, and install the someta binary in your $GOPATH/bin directory.

You can, of course, also clone the source repository and use the go toolchain to build, etc.

Configuring

SoMeta can be configured using a YAML file. This method of configuration is new in v1.3, and subsumes (nearly) all command-line parameters indicated below. Configuring via a file allows (1) inclusion of URLs and details of associated data sets used in the measurement task(s), (2) inclusion of README-like details which aren't possible with the command-line, and (3) creation of a configuration template that can easily be reused.

An example configuration is provided in the repo, named someta_config_example.yaml. It should, in general, be self-explanatory (see command-line arguments discussion below for details regarding config parameters for individual monitors).

You can also create a new yaml config using the -I option. Any command-line flags to change defaults or to configure monitors will be parsed and included in the generated config, which will be written to stdout. The -c (command) option is required; if any monitors are configured (through -M command-line arguments), those configurations will be included in the generated yaml, but if no monitors are configured, default configurations for all monitors will be included in the generated yaml. (New in v1.4.0.)

To start SoMeta with a configuration file, use the -y option:

someta -y someta_config_example.yaml

Another useful command-line parameter is -Y, which does some simple sanity checks on the configuration and exits prior to starting any monitoring. This option can be used with or without a yaml configuration.

Note that non-monitor command-line parameters are overridden by config file options. Note also that any monitor configurations provided on the command-line are added to monitors configured via the yaml file. It is recommended for your own sanity to just use the configuration file when and where possible and not to mix the two configuration styles.

Running

There are several possible command-line options. See below for a listing of all parameters (i.e., the output of someta -a. Some additional detail is below, specifically regarding monitors and options.

Usage of ./someta:

  -C int
    	Set CPU affinity (default is not to set affinity) (default -1)
  -F duration
     	Time period after which in-memory metadata will be flushed to file (default 10m0s)
  -M value
    	Select monitors to include. Default=None. Valid monitors=cpu,io,mem,netstat,rtt
  -R duration
    	Time period after which metadata output will rollover to a new file (default 1h0m0s)
  -Y	Check configuration but don't start metadata collection
  -c string
    	Command line for external measurement program
  -d	Debug output (metadata is written to stdout)
  -f string
    	Output file basename; current date/time is included as part of the filename (default "metadata")
  -l	Send logging messages to a file (by default, they go to stdout)
  -m duration
    	Time interval on which to gather metadata from monitors (default 1s)
  -q	Quiet output
  -u duration
    	Time interval on which to show periodic status while running (default 5s)
  -v	Verbose output
  -w duration
    	Wait time before starting external tool, and wait time after external tool stops, during which metadata are collected (default 1s)
  -y string
    	Name of YAML configuration file

The -c option indicates the "external" measurement tool to start. By default, SoMeta starts sleep 5, which causes SoMeta simply to collect 5 seconds-worth of metadata, given what ever monitors have been configured. You'll almost certainly need to quote the command line for the external tool, and some escaping may be required if there are embedded quotes needed for the tool (see the example with scamper, below).

The -M option specifies a monitor to start. Standard available sources include cpu, mem, io, netstat, rtt (see the monitors/ directory).

To configure a monitor, parameters may be specified along with each monitor name, each separated by a colon (:) or a comma (,). Each parameter may be a single string, or a key=value pair. The order of parameters doesn't matter.

Note that if you are using the rtt monitor with IPv6, you'll need to use comma separators because the colon key-value separator can't be distinguished from the colon separator within an IPv6 address.

Here's an example with turning on all monitors (io, netstat, cpu, mem, rtt):

sudo ./someta -M=io,disk0 -M=netstat,en0 -M=cpu -M=me -M=rtt,type=hoplimited,dest=149.43.80.25,maxttl=3,interface=en0 -R 1m -F 20s -f fulltest -m 1s -w 2s -v -c "sleep 150"

Again, type ./someta -h for a list of command line options and their defaults.

Valid parameters for each standard monitor are:

  • -M=cpu:interval=X: set the periodic sampling interval (default 1 sec)

  • -M=io:interval=X: set the periodic sampling interval (default 1 sec)

  • -M=mem:interval=X: set the periodic sampling interval (default 1 sec)

  • -M=netstat:interval=X: set the periodic sampling interval.

    Note that the interval time value is parsed by go's time.parseDuration (https://golang.org/pkg/time/#ParseDuration), so any value must also include a unit, like interval=1s (1 second interval).

    Additional string arguments to the netstat monitor can specify interface names to monitor (all interfaces are included if none are specified). For example, to monitor en0's netstat counters every 5 seconds:

    • -M=netstat:interval=5s:en0
  • -M=rtt:interface=IfaceName:rate=R:dest=D:type=ProbeType:maxttl=MaxTTL:proto=Protocol:allhops:constflow

    Monitor RTT along a path to destination D out of interface IfaceName with probe rate R. Probe interval is gamma distributed. The default destination is 8.8.8.8 and default probe rate is 1/sec.

    ProbeType can either be ping or hoplimited (default is hoplimited)

    MaxTTL is maximum ttl for hop-limited probes (pointless for ping probes).
    Default is maxttl = 1.

    Protocol is (icmp | tcp | udp) (for hop-limited probes). Default is icmp.

    allhops: probe all hops up to maxttl (for hop-limited probes)

    constflow: manipulate packet contents to force first 4 bytes of transport header to be constant (to make probes follow a constant path). This parameter only has an affect on icmp; data are appended to force the checksum to be a constant value. Note: udp/tcp probes always have const first 4 bytes.

  • -M=ss

    Monitor socket statistics using the ss tool (linux only). Thanks to Ricky Mok (CAIDA) for contributing this module.

Here are some examples:

# Monitor only CPU performance while emitting 100 ICMP echo request (ping) probes to
# www.google.com.
$ sudo ./someta -M=cpu -c "ping -c 100 www.google.com" 

# Monitor CPU performance and netstat counters (for all interfaces) for traceroute
$ sudo ./someta -M=cpu -M=netstat -c "traceroute www.google.com" 

# Monitor CPU, IO and Netstat counters for ping
# Set the metadata output file to start with "ping_google"
$ sudo ./someta -M=io -M=netstat -c "ping www.google.com" -f ping_google

# Monitor everything, including RTT for the first 3 hops of the network path toward
# 8.8.8.8.  As the external tool, use scamper to emit ICMP echo requests, dumping
# its output to a warts file.
$ sudo ./someta -M=cpu -M=mem -M=io -M=netstat:eth0 -M=rtt:interface=eth0:type=hoplimited:maxttl=3:dest=8.8.8.8 -f ping_metadata -l -c "scamper -c \"ping -P icmp-echo -c 60 -s 64\" -o ping.warts -O warts -i 8.8.8.8"

# An example with using the RTT monitor w/IPv6 (with the dummy command `sleep`).
# Note that in my example below I used an IPv6 (6-in-4) tunnel interface.
$ sudo ./someta -c "sleep 5" -M=rtt,dest="2607:f8b0:4006:805::200e",type=hoplimited,interface=he-ipv6,maxttl=6  -v

Reconfiguring

Sending SIGHUP to SoMeta will cause it to re-read its YAML configuration file. This feature is in progress.

Analyzing metadata

The analyzemeta.py script performs some simple analysis on SoMeta metadata, printing results to the console.

Reading into a Pandas DataFrame

For more complex data analyses (or, if you prefer, metadata analyses), there is a Python module read_someta.py that provides a function read_someta for reading data in a SoMeta .json file into a dictionary of Pandas DataFrame objects. There will be a different DataFrame object associated with each monitor.

For example:

>>> from read_someta import read_someta
>>> d = read_someta('fulltest_2018-05-03T18:07:11-04:00.json')
>>> d.keys()
dict_keys(['someta', 'cpu', 'mem', 'rtt', 'io', 'netstat'])
>>> d['cpu']
                                     cpu0_idle  cpu1_idle  cpu2_idle  cpu3_idle
timestamp
2018-05-03 18:07:12.978601317-04:00  62.037037  87.735849  68.224299  89.719626
2018-05-03 18:07:13.979181597-04:00  70.000000  93.069307  71.000000  96.000000
2018-05-03 18:07:14.980990941-04:00  82.828283  97.979798  86.000000  98.000000
2018-05-03 18:07:15.980368940-04:00  74.000000  96.039604  79.000000  96.000000
2018-05-03 18:07:16.981288271-04:00  69.306931  89.000000  75.000000  91.089109
...                                        ...        ...        ...        ...
2018-05-03 18:08:08.981608769-04:00  80.808081  94.000000  83.838384  90.000000
2018-05-03 18:08:09.983457489-04:00  83.000000  94.000000  86.274510  89.000000
2018-05-03 18:08:10.981178466-04:00  87.000000  97.000000  93.000000  98.000000
2018-05-03 18:08:11.983964314-04:00  70.297030  92.079208  72.000000  91.000000
2018-05-03 18:08:12.981282530-04:00  90.909091  98.000000  95.959596  99.000000

[61 rows x 4 columns]
>>>

Plotting metadata

NB: plotting tools need some updating still from the earlier Python versions.

The plotmeta.py tool is designed to help plot various metrics collected through SoMeta monitors. To see what metrics may be plotted, you can run the following::

$ python3 plotmeta.py -l meta.json

where meta.json is a SoMeta metadata file. The output of plotmeta.py with the -l option shows various items that can be plotted. Each item is organized into groups. You can either plot any number of individual items (-i option), or plot each metric for an entire group (-g option). If you want everything, use the -a option. In addition, -t option can be used to change the type of output plot. Use ecdf for empirical CDF or timeseries for simple scatter plot with timeline (which is default output of the plot tool). See plotmeta.py -h for all options.

Here are some examples::

$ python3 plotmeta.py -t ecdf -i cpu:idle -i io:disk0_write_time meta.json
$ python3 plotmeta.py -t timeseries -g cpu meta.json
$ python3 plotmeta.py -a meta.json

Changes

Changes from the earlier Python version of SoMeta:

  • Because of Go's command-argument handling, flags to someta cannot be written like -Mcpu, but must rather be written as -M=cpu or -M cpu.
  • CPU affinity is not yet implemented
  • Metadata structure is changed to permit a less tightly-coupled architecture between the someta main and monitors
    • The plotting tool hasn't been updated yet to handle these changes, though the basic analysis tool has been updated.
  • There's even more rich data collected about the system when someta starts up

v1.3

  • Addition of yaml configuration method
  • Some minor other code cleanup
  • Documentation update
  • Addition of commandlinetool monitor, which subsumes the contributed ss monitor

v1.4

  • Addition of -I flag to create a new config and dump to stdout

Credits

I gratefully acknowledge support from the National Science Foundation. The materials here are based upon work supported by the NSF under grant 1814537 ("NeTS: Small: RUI: Automating Active Measurement Metadata Collection and Analysis").

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

License

Copyright 2018-21 SoMeta authors. All rights reserved.

The SoMeta software is distributed under terms of the GNU General Public License, version 3. See below for the standard GNU GPL v3 copying text.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.