Shredder Guide

This guide outlines the use of the UBMoD shredder command-line utility.

General Usage

To make data available to the UBMoD portal, you need to use the shredder utility. If you followed the install guide, you have already used the shredder to populate your database. Beyond the install process, the shredder is typically run once a day to add jobs from the previous day to the database.

Help

To display the shredder help text from the command line:

$ ubmod-shredder -h

Verbose Output

By default the UBMoD shredder only outputs what it considers to be warnings or errors. If you would like to see informational output about what is being performed, use the verbose option:

$ ubmod-shredder -v ...

Shredding and Updating

The UBMoD shredder performs two separate tasks. Shredding parses log files and inserts the parsed data into the database. Updating refreshes the aggregate database tables, which speed up portal queries that would otherwise be much slower. Both tasks must be completed to make data available through the portal.

It is possible to perform both tasks sequentially in a single invocation of the UBMoD shredder by specifying both options at the same time. If you have more than one cluster (and therefore more than one set of log files), you may shred the log files for each cluster separately and then perform the update once afterwards.

$ ubmod-shredder -s ...
$ ubmod-shredder -u ...
$ ubmod-shredder -s -u ...

Log Format

You must specify the format of the log files you are shredding. This depends on the resource manager you use. For TORQUE and OpenPBS use "pbs", and for Sun Grid Engine use "sge".

$ ubmod-shredder -s -f pbs ...
$ ubmod-shredder -s -f sge ...

Input Source

Files may be shredded one at a time:

$ ubmod-shredder -s -i file ...

An entire directory of files can be shredded, but the names of the files must be formatted as YYYYMMDD (e.g. 20120101).

The log file for the current day will be ignored (along with any files that correspond to future dates). This is intended to prevent the shredding of partial log files.

If the database is empty, all files that meet the above constraints will be shredded. If there is data in the database, only files dated after the date of the most recent job will be shredded. If a host name is specified during the shredding process, the database will be checked for jobs that correspond to that host. This allows the shredder to be used with multiple directories that each correspond to a different host.

$ ubmod-shredder -s -d directory ...
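The selection rules above can be illustrated with a small shell function. This is only a sketch of the behavior as described in this guide, not the shredder's actual implementation; the function name select_logs and its arguments are hypothetical, and the "most recent job" date is passed in by hand rather than read from the database.

```shell
#!/bin/sh
# Illustration only: mimics the directory selection rules described in this
# guide. It is not the shredder's own code.
#
# select_logs DIR TODAY [LAST]
#   DIR    directory holding log files named YYYYMMDD
#   TODAY  current date as YYYYMMDD (for example, from `date +%Y%m%d`)
#   LAST   date of the most recent job already in the database, as YYYYMMDD;
#          leave empty to model an empty database
select_logs() {
    dir=$1
    today=$2
    last=${3:-}
    for path in "$dir"/*; do
        [ -f "$path" ] || continue
        name=${path##*/}
        # Keep only names made of exactly eight digits.
        case $name in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) continue ;;
        esac
        # Skip today's (possibly partial) log and anything dated in the future.
        [ "$name" -lt "$today" ] || continue
        # With existing data, keep only files dated after the most recent job.
        if [ -n "$last" ]; then
            [ "$name" -gt "$last" ] || continue
        fi
        echo "$name"
    done
}
```

Calling select_logs with an empty third argument models the empty-database case, in which every correctly named file dated before today is listed.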

The UBMoD shredder can also accept input from standard input:

$ ubmod-shredder -s ... - < file
$ cat file | ubmod-shredder -s ... -

Host Name

If you are using SGE, you must specify a host name (the name of your cluster) during the shredding process. This can be any string and will be used in the list of clusters. If you are using TORQUE or PBS, it is possible to override any host name that appears in the log file and use whatever name you prefer.

$ ubmod-shredder -s -H mycluster ...

Time Interval End Date

By default, the UBMoD shredder will use yesterday's date as the end date for the various time intervals (e.g. last 30 days) that are available in the portal. An arbitrary date in the YYYY-MM-DD format can be specified during the aggregation process.

$ ubmod-shredder -u -e 2012-01-01 ...
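To tie the options in this guide together, the following sketch prints the commands for shredding several clusters and then running a single update, optionally with an explicit end date. It is a dry run that only echoes commands. The function name daily_commands, the cluster names, the /var/log/pbs directory layout, and the choice of the "pbs" format for every cluster are all assumptions made for illustration, as is the combination of options on one command line.

```shell
#!/bin/sh
# Sketch of a run covering several clusters. It only prints the commands;
# review the output before executing anything.
# Assumptions (not from this guide): the cluster names, the log directory
# layout, and that every cluster uses the "pbs" format.

# daily_commands LOG_ROOT END_DATE CLUSTER...
#   END_DATE may be empty to keep the shredder's default (yesterday).
daily_commands() {
    log_root=$1
    end_date=$2
    shift 2
    # Reject end dates that are not shaped like YYYY-MM-DD.
    case $end_date in
        ''|[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "invalid end date: $end_date" >&2; return 1 ;;
    esac
    for cluster in "$@"; do
        # Shred each cluster's directory under its own host name.
        echo "ubmod-shredder -s -f pbs -H $cluster -d $log_root/$cluster"
    done
    # Update the aggregate tables once, after all clusters are shredded.
    if [ -n "$end_date" ]; then
        echo "ubmod-shredder -u -e $end_date"
    else
        echo "ubmod-shredder -u"
    fi
}

daily_commands /var/log/pbs 2012-01-01 alpha beta
```

Printing the commands instead of running them keeps the sketch safe to try. Names containing whitespace would need quoting before the output could be executed.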