Scripts and Helper Functions

Alicia Hotovec-Ellis edited this page Mar 6, 2018 · 5 revisions

Usage for all scripts can be viewed from the command line as follows:

>> python script.py -h

where script.py is the name of a script below. The most important optional flags are generally -v for additional verbosity and -c config.cfg to specify a configuration file.

Primary Scripts

initialize.py

Run this script first to initialize the HDF5 table where everything will be stored. It must be run before either catfill.py or backfill.py. While its primary function is to create the HDF5 file and its tables, it also creates any output folders that do not yet exist.

Warning: Running this script will overwrite an existing table with the same name defined by filename in the configuration file!

catfill.py

Run this script to fill the table with data from the past using a catalog of events in a csv file. The file must have a header row with a column named 'Time UTC'; any additional columns should be comma separated. An example is given in mshcat.csv. The script retrieves a window of data around the time of each event and then processes it the same way backfill.py does, including triggering. Orphans are given an expiration time, but are not expired until backfill.py is run. This script is mostly useful for adding known events of interest over many years without waiting to run years of continuous data through backfill.py. Because of the way it requests data, it is not well suited to processing large numbers of events that occur within a short period of time (e.g., swarms or eruptions).
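A minimal sketch of building a catalog file that catfill.py can read, using only the required 'Time UTC' header column. The file name, times, and the extra 'Magnitude' column are illustrative, not part of REDPy:

```python
import csv

# Build a minimal catalog CSV for catfill.py: a header row containing
# 'Time UTC', plus any additional comma-separated columns (here a
# hypothetical 'Magnitude' column). Times and filename are examples only.
rows = [
    {"Time UTC": "2004-10-01T12:34:56", "Magnitude": "2.1"},
    {"Time UTC": "2005-03-08T05:02:13", "Magnitude": "1.7"},
]

with open("mycat.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Time UTC", "Magnitude"])
    writer.writeheader()
    writer.writerows(rows)

# Sanity-check that the required column made it into the header row
with open("mycat.csv") as f:
    header = next(csv.reader(f))
assert "Time UTC" in header
```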

backfill.py

Run this script to fill the table with continuous data from the past. If a start time is not specified, it will check the attributes of the repeater table to pick up where it left off. If this is the first run and a start time is not specified, it will assume one time chunk prior to the end time. If an end time is not specified, "now" is assumed. The end time updates at the end of each time chunk processed (default: by hour, set in the configuration file). This script can be run as a cron job and will pick up where it left off if a chunk is missed. Use -n to set the chunk length (NSEC) when backfilling a large span of time; downloading the data in hour- or day-long chunks takes less time than downloading it a few minutes at a time, at the cost of keeping orphans around longer.
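The chunking described above can be sketched as follows; this is a conceptual illustration of stepping through a time span one NSEC-second chunk at a time, not code from backfill.py, and the dates and chunk length are assumptions:

```python
from datetime import datetime, timedelta

def chunk_times(start, end, nsec):
    """Split [start, end) into consecutive chunks of nsec seconds,
    mirroring how backfill.py works through continuous data one
    time chunk at a time. The final chunk is truncated at end."""
    chunks = []
    t = start
    while t < end:
        t_next = min(t + timedelta(seconds=nsec), end)
        chunks.append((t, t_next))
        t = t_next
    return chunks

# One day processed with NSEC = 3600 (hour-long chunks)
day = chunk_times(datetime(2018, 3, 1), datetime(2018, 3, 2), 3600)
```

With a larger NSEC (an hour or a day), the same span is covered in far fewer download requests, which is why -n speeds up long backfills.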

Helper Scripts

forcePlot.py

Run this script to manually run the plotting functions. Did you kill a run partway through and not let it finish, yet still want to see the output? Did you update some settings corresponding to plotting? Run forcePlot.py and it will plot anything that needs plotting. If you wish to plot EVERYTHING (for example, if you decided to change checkComCat to True and want to update ALL families to reflect this), you will need to append the -a flag for 'all'. Depending on the size of your table this can be time consuming, as the main script saves time by only updating plots for families that have changed.

createReport.py

Run this script to manually produce a more detailed "report" page for a given family (or families). The report currently plots all stored waveforms on all stations, makes the timeline plots interactive, and fully computes the cross-correlation matrix. My intent with the report was to allow for slightly more depth in analysis for an individual family. Let me know if you want to see additional features here!

compareCatalog.py

Takes a csv file (same format requirements as catfill.py) and compares the times of events in this catalog with the contents of the repeater, orphan, junk, and trigger (for expired orphans) tables. It associates events within winlen/samprate seconds, and appends to the csv file what it thinks is the closest match and the number of seconds it is off by. If it is a repeater, it will append the number corresponding to the cluster, otherwise text (orphan, expired, or junk). The name of the output file will be matches_groupName.csv, where groupName is defined in the configuration file.
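The association step above can be illustrated with a small sketch. The winlen and samprate values below are hypothetical configuration values, and the times and closest_match helper are invented for illustration; compareCatalog.py's actual implementation may differ:

```python
# With an assumed winlen of 1024 samples and samprate of 100 Hz, two
# events are associated if their times differ by at most
# winlen / samprate = 10.24 seconds.
winlen = 1024      # cross-correlation window length in samples (assumed)
samprate = 100.0   # sampling rate in Hz (assumed)
tolerance = winlen / samprate  # association window in seconds

def closest_match(catalog_time, table_times, tol):
    """Return (best_time, offset) for the nearest table entry within
    tol seconds of catalog_time, or (None, None) if nothing is close
    enough. Times here are plain seconds for simplicity."""
    best = min(table_times, key=lambda t: abs(t - catalog_time))
    offset = best - catalog_time
    return (best, offset) if abs(offset) <= tol else (None, None)

match, offset = closest_match(5000.0, [4800.0, 4995.5, 5100.0], tolerance)
```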

removeFamily.py

Run this script to manually remove families/clusters (e.g., correlated noise that made it past the 'junk' detector). Reclusters and remakes images when done. Basically, feed it the numbers of clusters you wish to remove and never see again. This will change the numbering scheme of your clusters!
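To make the renumbering caveat concrete, here is a minimal sketch of what happens to cluster numbers after a removal. This is an illustration of the effect, not removeFamily.py's internals, and the cluster IDs are made up:

```python
def renumber_after_removal(cluster_ids, to_remove):
    """Drop the requested clusters and renumber the survivors 0..N-1,
    returning a mapping from old cluster number to new cluster number."""
    survivors = [c for c in sorted(cluster_ids) if c not in set(to_remove)]
    return {old: new for new, old in enumerate(survivors)}

# Removing cluster 2 shifts every higher-numbered cluster down by one
mapping = renumber_after_removal([0, 1, 2, 3, 4], to_remove=[2])
```

Any notes you keep that reference cluster numbers should be updated accordingly after running the script.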

removeFamilyGUI.py

The same as removeFamily.py, but with a GUI that displays the preview image from the top of each cluster page alongside a checkbox. If checked, the image will turn red and the corresponding cluster will be flagged for deletion once the user either presses Enter or the 'Remove Checked' button at the bottom. One of the tricky parts is that there is a limited number of clusters it can show, so you may need to use the -n and -m flags. -n controls how many columns are displayed (small monitor? use -n 2 or even -n 1). Only 255 rows may be displayed, so use -m to set the minimum cluster number to view (e.g., -m 512 if you are using -n 2, have more than 512 families, and wish to delete clusters with numbers above 512). You can use the Escape key or the 'Cancel' button at the bottom to close the window. Mouse scrolling is enabled, but my experience is that it only allows scrolling down.

extendTable.py

Run this script to create space for additional stations while preserving data in an existing table. Additional stations should always be appended to the end of the station list; reordering the list is currently not supported. Running this script will overwrite any existing table with the same name defined by filename in the new .cfg file. If the table names in both .cfg files are the same, the original table will be renamed and then deleted. All output files are also remade to reflect the additional station. Use this if a new, important station is added to your network that you wish to use without rerunning anything. The data for the new station are filled with zeros (that is, even if the station existed before, no past data from it are added).
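A conceptual sketch of the zero-fill behavior described above: existing station data are untouched, and the new station's slot is created but empty. The station names, sample counts, and data layout here are illustrative, not extendTable.py's actual storage format:

```python
n_samples = 8  # illustrative waveform length

# Stand-in for one stored event's waveforms on the existing stations
event = {
    "STA1": [0.5] * n_samples,
    "STA2": [0.3] * n_samples,
}

def add_station(event, station, n_samples):
    """Add a zero-filled channel for a newly added station, leaving
    all existing station data untouched."""
    extended = dict(event)
    extended[station] = [0.0] * n_samples
    return extended

extended = add_station(event, "STA3", n_samples)
```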

plotJunk.py

Run this script to output the contents of the junk table for troubleshooting. A folder inside groupName called junk is filled with images. The names of the images are the times of the triggers with a code at the end corresponding to the type of junk trigger: 0 for possible teleseisms, 1 for spikes/harmonic signals, and 2 for triggers that were labeled as both. The contents are the waveforms from all stations/channels concatenated together. The same times and codes are also output in junk.txt.
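A small lookup for interpreting those trailing codes; the code-to-description mapping comes from the text above, but the file naming pattern parsed here (time, underscore, code) and the helper itself are assumptions for illustration:

```python
# Junk-trigger codes appended to plotJunk.py image names
JUNK_CODES = {
    0: "possible teleseism",
    1: "spike/harmonic signal",
    2: "both",
}

def describe(image_name):
    """Parse an assumed 'time_code.png' style name and return a
    human-readable description of its junk code."""
    code = int(image_name.rsplit("_", 1)[-1].split(".")[0])
    return JUNK_CODES[code]

label = describe("20180306T120000_1.png")
```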

clearJunk.py

Run this script to clear the contents of the junk table, in case you wish to free up some space and no longer wish to keep the contents for troubleshooting.

Example Shell Script for Cron Job

Edit the contents of your crontab and add a line similar to:

*/15 * * * * /bin/bash -l -c '/path/to/runREDPy.sh'

to run the script runREDPy.sh every 15 minutes. Obviously, replace /path/to/ with the actual path to the script (same below with the path to REDPy).

Below are the contents of runREDPy.sh:

#!/bin/bash
# Script for running REDPy as crontab

# cd to correct path
cd /path/to/REDPy

# (Optional) Activate environment
source activate redpy

# Run REDPy
python backfill.py -v -c config.cfg > example/out.txt

This will automatically update the run specified by config.cfg every 15 minutes (including updating all of the outputs), and will write standard out to example/out.txt with the usual stats for the run. Change the paths and file names to suit your setup. If you have many instances (e.g., multiple volcanoes), you can either use multiple shell scripts and crontab lines or call python multiple times within a single shell script. If you have enough memory to handle multiple simultaneous runs, go for the multiple scripts option; otherwise, use a single script to run them in serial.