Preface

This is the manual for the NetLogger Toolkit. For more details, downloads, etc., please refer to the NetLogger web pages at http://acs.lbl.gov/NetLoggerWiki/

Conventions

Italic

Used for file and directory names, email addresses, and new terms where they are defined.

Constant Width

Used for code listings and for keywords, variables, functions, command options, parameters, class names, and HTML tags where they appear in the text. Used with double quotes for literal values like "True", "10", and "netlogger.modules". In code listings, user input to the terminal is prefixed with a $.

Constant Width Italic

Used to indicate items that should be replaced by actual values.

Link text

Used for URLs and cross-references.

Overview

Anyone who has ever tried to debug or do performance analysis of complex distributed applications knows that it can be a very difficult task. Problems may lie in any of many software components, hardware components, networks, the OS, etc.

NetLogger is designed to make this easier. NetLogger is both a methodology for analyzing distributed systems, and a set of tools to help implement the methodology.

Methodology: Logging Best Practices

The NetLogger methodology, also called the Logging Best Practices (BP), is documented in detail at http://www.cedps.net/index.php/LoggingBestPractices. The following is a brief summary:

Terminology

For clarity, here are definitions of some terms used throughout the NetLogger documentation.

event

A uniquely named point of interest within a given system occurring at a specific time. An event is also a required attribute of each NetLogger log entry.

log

A file containing logging events or a stream of such events.

log entry

A single line within a log corresponding to a single event.

attribute

A detailed characteristic of an event.

name/value pair

How attributes are identified within a log entry — a name and its value separated by an "=" (equals sign).

For example, the following shows that the log file my.log contains one log entry with an event of something.happened. This event has three attributes, represented in the log entry by the name/value pairs whose names are ts, event, and level.

$ cat my.log
ts=2008-10-10T19:24:35.508249Z event=something.happened level=Info
Practices

All logs should contain a unique event attribute and an ISO-formatted timestamp (See ISO8601). System operations that might fail or experience performance variations should be wrapped with start and end events. All logs from a given execution context should have a globally unique ID (or GUID) attribute, such as a Universal Unique Identifier (UUID) (see RFC4122). When multiple contexts are present, each one should use its own identifying attribute name ending in .id.
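As a sketch of these practices, the following standalone Python snippet (this is not the NetLogger API; the helper name bp_line and the event names are made up for illustration) wraps an operation with start and end events that share a globally unique ID:

```python
import uuid
from datetime import datetime, timezone

def bp_line(event, attrs):
    """Format one Best Practices entry: ISO timestamp, event name, then
    the remaining name=value pairs."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    pairs = " ".join("%s=%s" % (k, v) for k, v in attrs.items())
    return ("ts=%s event=%s %s" % (ts, event, pairs)).rstrip()

# Wrap an operation that might fail with start/end events sharing a
# unique id, and report success in the end event's status attribute.
op_id = uuid.uuid4()
print(bp_line("org.myapp.transfer.start", {"xfer.id": op_id}))
try:
    # ... the operation itself would go here ...
    status = 0
except Exception:
    status = 1
print(bp_line("org.myapp.transfer.end", {"xfer.id": op_id, "status": status}))
```
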

Errors

A reserved status integer attribute must be used for all end events, with "0" for success and any other value for failure or partial failure. The default severity of a log message is informational, other severities are indicated with a level attribute.

Format

Each log entry should be composed of a single line of ASCII name=value pairs (aka attributes); this format is highly portable, human-readable, and works well with line-oriented tools.
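Because each entry is a single line of whitespace-separated name=value pairs, parsing is trivial in most languages. A minimal Python sketch (assuming, for simplicity, that values contain no spaces or quoting):

```python
def parse_bp(line):
    """Split a single-line BP log entry into a dict of name/value pairs.
    Assumes values contain no embedded spaces."""
    return dict(pair.split("=", 1) for pair in line.split())

entry = parse_bp(
    "ts=2008-10-10T19:24:35.508249Z event=something.happened level=Info")
```
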

Naming

For event attribute names, we recommend using a ‘.’ as a separator and going from general to specific, similar to Java class names.

A sample job submit start/end log in this format would look like the following:

ts=2006-12-08T18:39:19.372375Z event=org.job.submit.start user=dang job.id=37900
ts=2006-12-08T18:39:23.114369Z event=org.job.submit.end user=dang job.id=37900 status=0

The addition of log file grammar, such as the name-value attribute pair structure, encourages more regular and normalized representations than the natural-language sentences commonly found in ad-hoc logs.

For example, a message like error: read from socket on foobar.org:1234: remote host baz.org:4321 returned -1 would be:

ts=2006-12-08T18:48:27.598448Z event=org.my.myapp.socket.read.end level=ERROR status=-1 host=foobar.org:1234 peer=baz.org:4321

The open source NetLogger Toolkit is a set of tools to implement this methodology.

Tools

The tools included with NetLogger can be grouped in four main areas:

  • Logging APIs: C, Java, Perl, Python, and UNIX shell

  • NetLogger Pipeline: Parse, load, and analyze logs using a relational database and the R data analysis language.

  • Bottleneck detection: Test disk/network for bottleneck in WAN transfers.

  • Utilities: Monitoring probes, a log receiver (netlogd), and some other pieces that are occasionally useful.

Installation

The NetLogger Toolkit has separate installation instructions for each language. The data parsing and analysis tools are part of the Python installation.

For general download instructions, see https://sites.google.com/a/lbl.gov/netlogger/software.

System requirements

Operating System

NetLogger has been tested on UNIX and Mac OS X. The Python code should work on Windows with some modifications, but this is not a priority for our development.

NTP

All monitored hosts should use NTP (http://www.ntp.org), or the equivalent, for clock synchronization.

Install C

Below are instructions for installing the C instrumentation API and the nlioperf program.

# Run configure; make; make install
cd c
./configure --prefix=/your_install_path
make
make install

Install Java

Prerequisites

Java 1.5 or above (http://java.sun.com) for the Java instrumentation

Install

Below are instructions for installing the Java instrumentation API.

# Build JAR file
cd java
ant jar
# Copy jarfile into desired spot
cp netlogger-java-trunk.jar /your_install_path/netlogger.jar
# Then set your classpath
csh% setenv CLASSPATH $CLASSPATH:/your_install_path
# .. OR ..
sh$ export CLASSPATH=$CLASSPATH:/your_install_path

Install Perl

Prerequisites

Perl version 5 or higher (http://www.perl.org) is required.

The Perl Data::UUID module is required. You can install it from CPAN:

perl -MCPAN -e "install Data::UUID"

Install

Below are instructions for installing the Perl instrumentation API.

cd perl
# Run PERL's standard install sequence
perl Makefile.PL
make
make test
make install

Install Python

Prerequisites

The following Python modules may be needed by the NetLogger pipeline to interact with the database. To install these modules, either use a package manager (such as Debian's APT, Red Hat's yum, or FreeBSD ports); use Python's easy_install command from setuptools; or download and install from source. The easy_install command is shown below.

Install

Below are instructions for installing the Python instrumentation API and tools.

  • Install from PyPI

    easy_install netlogger
  • Install from source

    cd python
    # Run Python's standard install sequence
    python setup.py build
    python setup.py install

Install R

There is no NetLogger R instrumentation API, but we do use R to analyze the data (see the SQL and R analysis section).

Prerequisites

Version

R version 2.6.0 or higher is required; the latest version is recommended, particularly if you are going to use ggplot2. Windows binaries and Debian, Red Hat, Ubuntu, and SuSE packages are available. For other platforms, or for the latest version, R compiles from source on most platforms. See your local Comprehensive R Archive Network (CRAN) mirror to download any of the above.

Packages

A number of R packages are required to run the NetLogger R programs. Instructions follow on how to install them from within R.

  • Start R

    $ R
  • Choose a mirror (you only need to do this once):

    > chooseCRANmirror()
  • Download and install the packages:

    > install.packages(c("lattice","latticeExtra", "Hmisc","RMySQL", "RSQLite", "ggplot2"), dependencies = TRUE)

Install

To use the package in R, simply load it by name. To get help, use the standard R help facility.

Instrumentation APIs

NetLogger has instrumentation APIs to produce Best Practices (BP) formatted logs for C/C++, Java, Perl, and Python.

C API

The C API documentation is auto-generated from the source code using Doxygen.

Java API

The Java API documentation is auto-generated from the source code using Javadoc.

Perl API

The Perl API documentation is auto-generated from the source code using pod2html.

Python API

The Python API documentation is auto-generated from the source code using epydoc.

Syslog-NG

Syslog-NG, available from http://www.balabit.com/network-security/syslog-ng/, is a flexible and scalable system logging application that can act as a drop-in replacement for standard syslog.

A syslog-ng server can send local data over the network (TCP or UDP), receive network data and log it locally, or do both. syslog-ng receivers can be configured to aggregate and filter logs based on program name, log level, or even a regular expression on message contents. It is very scalable: if a particular receiver becomes overloaded, one can simply bring up another receiver on another machine and send half the logs to each. syslog-ng supports fully qualified host names and time zones, which standard syslog does not. Standard syslog could also be used, but only for single-site deployments.

We recommend syslog-ng 2.0 over syslog-ng 1.6 because of the new ISO date option, which is needed for logging across multiple time zones. To download syslog-ng, go to: http://www.balabit.com/downloads/files/syslog-ng/sources/stable/src/.

Here is a commented sample syslog-ng 2.0 sender configuration file. For pre-packaged sample configuration files, see the next section and also look in the NetLogger source code in pacman/syslog-ng/.

# Global options

options {
   # Polling interval, in ms (helps reduce CPU)
   time_sleep(50);

   # Use fully qualified domain names
   use_fqdn(yes);

   # Use ISO8601 timestamps
   ts_format(iso);

   # Number of lines to buffer before writing to disk
   # (a) for normal load
   flush_lines (10);
   log_fifo_size(100);
   # (b) for heavy load
   #flush_lines (1000);
   #log_fifo_size(1000);

   # Number of seconds between syslog-ng internal stats events.
   # These are useful for watching the load.
   stats_freq(3600);
};

# Data sources: file, TCP or UDP socket, or internal

# Tail /var/log/gridftp.log, prefixing each copied line
# with 'gridftp_log '
source gridftp_log {
  file ("/var/log/gridftp.log" follow-freq(1) flags(no-parse) log_prefix('gridftp_log ') );
};
# ..etc..
# Syslog-ng's own logs; for testing syslog-ng config
source syslog_ng { internal(); };

# Data sinks: file, TCP, or UDP socket

# Send "grid" logs to a remote host on TCP port 5141
destination gridlog_dst {
       tcp("remote.loghost.org" port(5141));
};
# Send other logs to a local file
destination syslog_ng_dst {
  file ("/tmp/syslog-ng.log" perm(0644) );
};

# Data pipelines
# Combine a source and a destination to make a pipeline

# Send the gridftp logs to the remote "grid" host
log {
  source(gridftp_log); destination(gridlog_dst); flags(flow-control);
};
# (and so on for the other "grid" sources)

# Send the internal logs to the local file
log {
  source(syslog_ng); destination(syslog_ng_dst);
};

VDT and OSG Package

This section describes how to install and configure the syslog-ng package that the CEDPS project developed for the Virtual Data Toolkit (VDT). This package can be used alone (i.e., with no other VDT services running), but it does depend on many other components in VDT.

First you must install pacman:

$ wget http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-latest.tar.gz
$ tar xvzf pacman-latest.tar.gz
$ cd pacman-version
# For C-shell derivatives
$ source setup.csh
# For Bourne-shell derivatives
$ source setup.sh

Then install the OSG:Syslog-ng package:

$ P=/path/to/install
$ mkdir $P && cd $P
$ pacman -get OSG:Syslog-ng

To configure/start a syslog-ng sender, you need to first set your VDT_LOCATION; this is a standard part of setting up the VDT on your system. When this is done, do this:

$ P=/path/to/install
# For C-shell derivatives
$ source $P/setup.csh
# For Bourne-shell derivatives
$ source $P/setup.sh
$ $VDT_LOCATION/vdt/setup/configure_syslog_ng_sender --local-collector myloghost.foo.gov
$ $VDT_LOCATION/vdt/setup/configure_syslog_ng_sender --add-source "/tmp/testfile"
$ $VDT_LOCATION/vdt/setup/configure_syslog_ng_sender --server y
$ vdt-control --on syslog-ng-sender

To configure/start a syslog-ng receiver, do this:

$ L=/path/to/logs
$ $VDT_LOCATION/vdt/setup/configure_syslog_ng_receiver --server y
$ $VDT_LOCATION/vdt/setup/configure_syslog_ng_receiver --dir $L
$ vdt-control --on syslog-ng-receiver

NetLogger Web Services APIs

NetLogger provides Web Services APIs to give non-NetLogger clients an easy way to use the NetLogger analysis functions. Currently the only API is for troubleshooting Pegasus workflows, but more, and more general-purpose, interfaces are planned.

Pegasus Web API

The Pegasus web API provides access to NetLogger functionality for troubleshooting and analysis of Pegasus workflows. The Pegasus web API is a “REST”-style API, which means that it encodes the method and arguments directly in the URL. It is influenced by the Splunk REST API.

Getting started

All of the Pegasus web API calls use a common format which includes the name of the workflow as well as the service to be invoked.

Currently, there are five available services, all of which are within the "search" module:

  • Tasks: get information on a particular task.

  • FailedTasks: get all failed tasks.

  • Mappings: get all mappings between Pegasus tasks and Condor jobs.

  • Children: get all child tasks of a given task.

  • Parents: get all parent tasks of a given task.

All data is returned as XML, so that it can either be viewed directly or easily consumed by a client.

Tasks

To get information on a specific task, a user must know the name of the workflow and the task ID. The task information can then be retrieved via the following URL:

http://name.of.server/pegasusAPI/search/(workflow)/Tasks/(taskID)

This will return XML in the following format:

<?xml version="1.0" encoding="utf-8" ?>
  <task>
    <run> (workflowID) </run>
    <id> (TaskID) </id>
    <class> (task class) </class>
    <description>job description</description>
    <transform>type of transformation</transform>
    <status> did the job succeed?</status>
    <duration>time</duration>
  </task>
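A client can consume this XML with nothing but the standard library. The helpers below are illustrative, not part of NetLogger, and the server, workflow, and task-ID arguments are placeholders the caller supplies for a real deployment:

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def parse_task(xml_text):
    """Turn a <task> document like the one above into a dict mapping
    field name -> text content."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

def fetch_task(server, workflow, task_id):
    """Fetch one task from the Pegasus web API (hypothetical deployment)."""
    url = "http://%s/pegasusAPI/search/%s/Tasks/%s" % (server, workflow, task_id)
    with urlopen(url) as resp:
        return parse_task(resp.read())
```
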

FailedTasks

This will return all tasks that have failed (i.e. have a non-zero status). The user must know the name of the workflow. A list of tasks (using the same XML format as presented above) will be returned.

http://name.of.server/pegasusAPI/search/(workflow)/FailedTasks

returns:

<?xml version="1.0" encoding="utf-8" ?>
<tasklist>
  <task>
  ...
   </task>
  <task>
  ...
  </task>
...
</tasklist>

Mappings

This returns a list of all the mappings between Pegasus clusters and individual tasks. Often, a "merged" job is created for submission to Condor to increase parallelism. This allows a user to pull apart this merging and find out the specific tasks executed on each cluster. The user must know the name of the workflow.

http://name.of.server/pegasusAPI/search/(workflow)/Mappings

returns:

<?xml version="1.0" encoding="utf-8" ?>
<mappinglist>
  <mapping>
    <jobid> name of job </jobid>
    <xform> type of transformation </xform>
    <jobclass> class </jobclass>
    <tasks> tasks that compose this job
      <task>
        ...
      </task>
      ...
    </tasks>
  </mapping>
   ....
</mappinglist>
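To pull apart the merging programmatically, a client can walk the mappinglist with the standard library. This sketch (not part of NetLogger) assumes the layout shown above, with task IDs as element text as in the run0016 example later in this chapter:

```python
import xml.etree.ElementTree as ET

def parse_mappings(xml_text):
    """Map each merged job's id to the list of task ids it contains."""
    root = ET.fromstring(xml_text)
    result = {}
    for mapping in root.findall("mapping"):
        jobid = mapping.findtext("jobid", "").strip()
        tasks = mapping.find("tasks")
        result[jobid] = (
            [(t.text or "").strip() for t in tasks.findall("task")]
            if tasks is not None else [])
    return result
```
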

Children

Pegasus tasks are related to each other via a directed acyclic graph (DAG). Often, it is useful to know parent-child relationships within this DAG. This service returns all the child tasks of a given task. The user must know the workflow ID and task ID.

http://name.of.server/pegasusAPI/search/(workflow)/Children/(taskID)

returns:

<?xml version="1.0" encoding="utf-8" ?>
<tasklist>
  <task>
  ...
   </task>
  <task>
  ...
  </task>
...
</tasklist>

Parents

Similar to Children, Parents will return all parent tasks of a given task. The user must know the workflow ID and task ID.

http://name.of.server/pegasusAPI/search/(workflow)/Parents/(taskID)

returns:

<?xml version="1.0" encoding="utf-8" ?>
<tasklist>
  <task>
  ...
   </task>
  <task>
  ...
  </task>
...
</tasklist>

Examples

To get information on task 403 in workflow ranger0:

http://krusty.lbl.gov/pegasusAPI/search/ranger0/Tasks/403

returns:

<?xml version="1.0" encoding="utf-8" ?>
  <task>
    <run>ranger0</run>
    <id>403</id>
    <class></class>
    <description>merge_scec-PeakValCalc_Okaya-1.0_PID3_ID2</description>
    <transform>scec::PeakValCalc_Okaya:1.0</transform>
    <status>0</status>
    <duration>0.108000</duration>
  </task>

To find all failed tasks in workflow ranger0:

http://krusty.lbl.gov/pegasusAPI/search/ranger0/FailedTasks

returns:

<?xml version="1.0" encoding="utf-8" ?>
      <tasklist>
        <task>
        <run>ranger0</run>
        <id>50</id>
        <class></class>
        <description>register_ranger_0_0</description>
        <transform></transform>
            <status>2</status>
            <duration>0.0</duration>
        </task>
etc.
   </tasklist>

To find all mappings for workflow run0016:

http://krusty.lbl.gov/pegasusAPI/search/run0016/Mappings

returns:

<?xml version="1.0" encoding="utf-8" ?>
<mappinglist>
  <mapping>
    <jobid> findrange_ID000002 </jobid>
    <xform> vahi::findrange:1.0 </xform>
    <jobclass> 1 </jobclass>
    <tasks>
      <task> ID000002 </task>
    </tasks>
  </mapping>
  etc.
</mappinglist>

To find all children of task 013 for workflow run0016:

http://krusty.lbl.gov/pegasusAPI/search/run0016/Children/013

returns:

<?xml version="1.0" encoding="utf-8" ?>
<tasklist>
  <task>
    <run>run0016</run>
    <id>129</id>
    <class></class>
    <description>findrange_ID000003</description>
    <transform></transform>
    <status>0</status>
    <duration>6.032000</duration>
  </task>
etc.
</tasklist>

To find all parents of task 013 for workflow run0016:

http://krusty.lbl.gov/pegasusAPI/search/run0016/Parents/013

Frequently Asked Questions

What is NetLogger?

NetLogger is a methodology for troubleshooting and analyzing distributed applications. The NetLogger Toolkit is a set of tools that help deploy this methodology. The methodology is described in more detail here.

Is the current version compatible with previous version(s)?

In a word, no. NetLogger has been in existence, in one form or another, since 1994. Since that time it has been rewritten and renamed, so that the body of software now labeled NetLogger has little or no relation to the software distributed in the early years of research and development.

Why is it called NetLogger?

NetLogger is short for "Networked Application Logger". NetLogger is NOT just about monitoring the network.

Is NetLogger Open Source?

Yes! It is under a BSD-style open source license.

What happened to the binary format, the activation service, and other features described in some of the NetLogger papers?

They were not used by anyone, and so they were removed to make NetLogger smaller and easier to install.

Is NetLogger compatible with Java’s logging package (aka log4j)?

Yes.

What is the overhead of adding NetLogger?

The overhead is very low. You can generate up to 5000 events/second using the C API, 500 events/second using the Java API, and 80 events/second using the Python API, with negligible impact on your application.

How do I analyze NetLogger log files?

This is what the NetLogger Pipeline does. There is also a text-based viewer called "nl_view" that can make human browsing of the logs easier.

Have a question we have not yet addressed?

Please e-mail us at netlogger-dev@george.lbl.gov

Tool manual pages

This section provides a version of the manpage documentation, available via the UNIX man command, for each of the tools in the NetLogger Python distribution.

netlogd(1)

NAME

netlogd - Receive logs over TCP or UDP and write them to a file.

SYNOPSIS

netlogd [options]

DESCRIPTION

The netlogd program combines one or more streams of newline-delimited log records into a single file. No checking is done as to the format of the records. Records are freely interleaved in a first-come, first-written manner. UDP and TCP mode cannot be used together.
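Since netlogd simply accepts newline-delimited records over a socket, any client can feed it. A minimal Python sender sketch (the function name is made up; 14380 is netlogd's default port):

```python
import socket

def send_records(lines, host="localhost", port=14380):
    """Stream newline-delimited log records to a log receiver such as
    netlogd over TCP."""
    with socket.create_connection((host, port)) as sock:
        for line in lines:
            sock.sendall(line.encode("utf-8") + b"\n")
```
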

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-b, --fork

fork into the background after starting up

-f, --flush

flush all outputs after each record

-k TIME, --kill=TIME

Kill self after some time. TIME can be given in units 's', 'm', or 'h' for seconds, minutes, or hours. The default unit is minutes ('m')

-o URL, --output=URL

Output file(s), repeatable (default=stdout)

-p PORT, --port=PORT

port number (default=14380)

-r SIZE, --rollover=SIZE

roll over files at given file size (units allowed)

-U, --udp

listen on a UDP instead of TCP socket

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To receive records on the default TCP port and write them to standard output:

$ netlogd

To receive records on UDP port 44351 and write them to file /tmp/combined.log:

$ netlogd -U -p 44351 -o /tmp/combined.log

EXIT STATUS

netlogd returns zero on success, non-zero on error

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_broker(1)

NAME

nl_broker - Runs an "information broker" that accepts streams of NetLogger best-practices formatted data and forwards the streams to one or more loader clients.

SYNOPSIS

nl_broker [options]

DESCRIPTION

This program accepts incoming streams of NetLogger (a.k.a. CEDPS Best-Practices) formatted data over TCP from one or more sources. The streamed data are then passed to one or more attached nl_load processes, which reformat the data to an output format, load it into a database back-end, or filter it based on client-defined criteria.

This program would generally be invoked on the command line and run in the background. It would normally be invoked first, followed by the attachment of one or more nl_load processes, before receiving incoming streamed data. The broker does not buffer information, so if there are no nl_load processes attached to harvest and process the streams the data will not be processed.

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-l ADDR

Bind to local interface ADDR (default=localhost)

-p PORT

Listen for incoming data streams on PORT (default=14380)

-P PORT

Listen for client connections on PORT (default=15380)

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

USAGE

When run without options, nl_broker binds to the localhost interface on the local machine, with default ports for incoming data and client connections. The interface and port bindings may be overridden on the command line.

SIGNALS

  • SIGTERM, SIGINT, SIGUSR2: Terminate gracefully

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_check(1)

NAME

nl_check - Check a log file for correctness.

SYNOPSIS

nl_check [options] [filename]

DESCRIPTION

Checks that a log file is formatted according to the CEDPS project "Best Practices" guide format (see RESOURCES).

Files are read from a list given on the command line or, if no files are listed, from standard input. Each line that does not conform is reported to standard output. Warnings and errors are printed to standard error, as well as the optional "progress" (useful for large files). In addition, the user may opt to make a copy of each input file with the offending lines removed (see -c option for details).
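The core of such a check can be sketched in a few lines of Python. This is only the general idea, not the tool's actual rules, and assumes unquoted name=value pairs:

```python
import re

# ISO8601 UTC timestamp, e.g. 2008-10-10T19:24:35.508249Z
TS_PAT = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$")

def check_line(line):
    """Return True if a line looks like a valid BP entry: whitespace-separated
    name=value pairs including an event name and an ISO timestamp."""
    try:
        attrs = dict(pair.split("=", 1) for pair in line.split())
    except ValueError:  # a token with no '=' is malformed
        return False
    return "event" in attrs and TS_PAT.match(attrs.get("ts", "")) is not None
```
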

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-c, --clean

write a copy of all 'clean' lines to stdout

-f, --fast

Do a quick-and-dirty check

-p, --progress

report progress to stderr

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To print out errors in files a.log, b.log, and c.log to stdout:

nl_check a.log b.log c.log

To combine valid lines from files a.log, b.log, and c.log into cleaned.log, printing out errors to stderr:

nl_check -c a.log b.log c.log > cleaned.log

To check file big.log, copying valid lines to big.log.cleaned, showing progress (and validation errors) to stderr:

nl_check -p -c big.log > big.log.cleaned

EXIT STATUS

nl_check returns zero on success, non-zero on failure

BUGS

None known.

RESOURCES

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_config_verify(1)

NAME

nl_config_verify - Verify a configuration file using a user-provided specification file.

SYNOPSIS

nl_config_verify specification-file [files ..]

DESCRIPTION

Use a single specification file to check one or more configuration files. The results of the check are reported to standard output, and success or failure of the validations is also reflected in the exit status.

The program arguments are simply a specification file and zero or more configuration files to validate. If zero configuration files are given, then standard input is used; please note that in this case, if the configuration uses the "@include" mechanism, this will only work if the included files are in the current directory.

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

SPECIFICATION SYNTAX

The specification file must itself be a valid configuration file. Special keywords in the specification file use % as their first character, so keywords starting with % must not be used elsewhere. The overall syntax for a specification file is a list of configuration file fragments (%spec), then a list of boolean expressions using these fragments (%rule), and finally a single expression giving the order to apply the rules (%apply).

Overall specification syntax
%spec NAME1
..config file fragment..
%spec NAME2
..another fragment..
%rule RULE1 %NAME1 or %NAME2 # expression
%rule RULE2 %NAME1 and %NAME2 # expression
# order to apply rules
%apply RULE1 RULE2

A configuration file fragment starts with a line containing "%spec NAME", where NAME is a valid Python variable identifier (but otherwise arbitrary). The configuration fragment continues until another "%spec" or "%rule" is encountered at the start of a line.

Each configuration file fragment lists all valid sections and the valid keywords of each section. If you want to allow other sections, use the special section wildcard, [*]. Within a section, arbitrary keywords can be allowed by adding the keyword wildcard __ANY__. Conversely, if you want to require that a listed section or keyword is present in all inputs, prefix it with required_.

Wildcard and required sections and keywords
%spec myspec
[required_foo] # 'foo' section is required
required_bar = int # 'bar' keyword is required
__ANY__ =  # any other keywords are allowed
[*] # any other sections are allowed

The value for each keyword is a type name indicating the range of allowable values. The possible type names are:

  • str: String, i.e., anything.

  • int: Integer

  • float: Floating-point number

  • bool: Boolean value — yes/no, true/false, 0/1, on/off

  • path: Same as str; this is really just documentation, and does not cause the validator to look for the file in the current filesystem.

  • uri: Minimal URL requirements: a sequence of word characters, followed by ://, followed by one or more non-slashes, then anything. This allows http(s), ftp, and all the database URIs.

  • enum: Enumeration. This one is special in that after the type name there should be a list of one or more valid strings. The input must match one of those strings.
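The uri check described above can be sketched as a regular expression (this is an illustration of the stated rule, not the validator's actual code):

```python
import re

# Word characters, then '://', then one or more non-slash characters,
# then anything -- per the 'uri' type description above.
URI = re.compile(r"^\w+://[^/]+")

def is_uri(value):
    return URI.match(value) is not None
```
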

After all the fragments, rules are listed, each on a new line containing “%rule NAME EXPR”. The NAME should be a valid Python identifier, and the EXPR is a boolean expression using (boolean) operators, parentheses for grouping, and %NAME references to configuration fragments. The %apply directive comes after all the rules. The first token indicates when to declare success: if it is all then all rules must match; if it is any, then any matching rule stops the validation. After this token comes a list of previously defined rules; this is the order in which they will be tried.

Although the preceding paragraph may seem complex, in most cases the usage of the %rule and %apply directives will be straightforward. For example, if there is only a single %spec section, it will look like this:

Single %spec section
%spec myspec
# .. config fragment here
%rule rule1 %myspec
%apply all rule1

That's all there is to the specification syntax. For a full example, see the Examples section.

EXAMPLES

Validate my.conf with the specification my.spec.

nl_config_verify my.spec my.conf

Validate my.conf1 and my.conf2 with the specification my.spec, and print informational messages to standard error.

nl_config_verify -v my.spec my.conf1 my.conf2

Validate ./path/to/my.conf (from stdin) with the specification my.spec, and print debugging messages to standard error. Because the path to my.conf is not known to nl_config_verify, this will not work if my.conf tries to "@include" files from its own directory.

nl_config_verify -v -v my.spec < ./path/to/my.conf

Below is an example of a specification file for the nl_parser configuration. Its syntax is valid, but the contents may have drifted out of date.

%spec static
[global]
files_root = path
state_file = path
tail = bool
output_file = path
[parsers]
files = path
[[*]]
__ANY__ = str
[logging]
[[loggers]]
[[[*]]]
level = enum ERROR WARN INFO DEBUG TRACE
handlers = str
qualname = str
propagate = int
[[required_handlers]]
[[[h1]]]
level = enum ERROR WARN INFO DEBUG TRACE
handlers = str
class = str
args = str

%spec dynamic
[global]
files_root = path
state_file = path
tail = bool
output_file = path
[parsers]
files = str
pattern = str
[[bp]]
[[[match]]]
app = str
[[[parameters]]]
has_gid = bool
[logging]
[[loggers]]
[[[netlogger]]]
level = enum ERROR WARN INFO DEBUG TRACE
handlers = str
qualname = str
propagate = int
[[handlers]]
[[[h1]]]
level = enum ERROR WARN INFO DEBUG TRACE
handlers = str
class = str
args = str

%rule static_rule (%static)
%rule dynamic_rule (%dynamic)
%apply any static_rule dynamic_rule

EXIT STATUS

nl_config_verify returns zero if all validations succeeded, a positive number less than 255 if one or more configuration files failed to validate (equal to the smaller of the number that failed and 254), and 255 if there was some other error like a non-existent file or invalid specification syntax.

BUGS

None known.

RESOURCES

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_cpuprobe(1)

NAME

nl_cpuprobe - Measure CPU availability by active probing.

SYNOPSIS

nl_cpuprobe [options]

DESCRIPTION

Measure CPU availability by periodically spawning a process that spins in a tight loop, and measuring how much of the CPU it was able to get during that time. In theory this approximates the amount of resources a user application could claim.

For each probe, output is a line with a single floating-point number representing the estimated available CPU, in the range 0 to 1.
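
The measurement idea can be sketched in a few lines of Python (a simplified illustration, not nl_cpuprobe's actual implementation): spin for a fixed wall-clock interval and compare the CPU time actually accumulated to the wall time elapsed.

```python
import time

def probe_cpu(spin_ms=100):
    # Spin in a tight loop for spin_ms of wall-clock time, then report
    # the fraction of that interval we actually got as CPU time.
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    deadline = start_wall + spin_ms / 1000.0
    while time.perf_counter() < deadline:
        pass
    wall = time.perf_counter() - start_wall
    cpu = time.process_time() - start_cpu
    return min(cpu / wall, 1.0)  # clamp: estimated available CPU in [0, 1]
```

On an idle machine this returns a value near 1; under load it drops toward 0.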

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-m MS, --millis=MS

number of milliseconds out of every second to run the probe (default=100)

-n NICE, --nice=NICE

nice value to give to the process while probing (default=0)

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To run with spin-interval 50ms and nice value of 0:

$ nl_cpuprobe -m 50

To run, as root, with spin-interval 100ms and nice value of -5:

$ sudo nl_cpuprobe -m 100 -n -5

EXIT STATUS

nl_cpuprobe returns zero on success, non-zero on error.

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_date(1)

NAME

nl_date - Convert floating-point dates to NetLogger string dates, and vice-versa

SYNOPSIS

nl_date [dates…]

DESCRIPTION

This utility just converts one or more dates from the number of seconds since the Epoch (1/1/1970 00:00:00) to the ISO8601 string representation YYYY-MM-DDThh:mm:ss.ffffffZ, or vice-versa. The type of a given input is auto-detected. NetLogger’s own parsing and formatting routines are used, so this utility doubles as a sanity-check of those functions.

The date to convert is read from the command line, and output is printed to standard output in the form: "input => output". If no date is provided, then the output shows the current date in both formats, with the prefix "now => ".
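
The conversion itself can be reproduced with Python's standard library (a sketch using datetime, not NetLogger's own parsing and formatting routines):

```python
from datetime import datetime, timezone

def float_to_iso(ts):
    # Seconds since the Epoch -> YYYY-MM-DDThh:mm:ss.ffffffZ (UTC).
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"

def iso_to_float(s):
    # "Z"-suffixed ISO8601 string -> seconds since the Epoch.
    dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ")
    return dt.replace(tzinfo=timezone.utc).timestamp()
```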

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-u

interpret given date or default 'now' as being in UTC (default=False, local timezone).

-U

show result in UTC (default=False, local timezone)

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To print out the current date in both formats:

$ nl_date
now => 2008-09-24T20:17:40.594915-08:00 => 1222316260.594915

To convert a floating-point date to a string:

$ nl_date 1185733072.567627
1185733072.567627 => 2007-07-29T18:17:52.567627Z

To convert a string date to a floating-point date:

$ nl_date 2007-07-29T18:17:52.567627Z
2007-07-29T18:17:52.567627Z => 1185733072.567627

EXIT STATUS

nl_date always returns zero (success). If the arguments are not understood, it simply prints the current date.

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_dup(1)

NAME

nl_dup - Count duplicate lines in a file

SYNOPSIS

nl_dup [file]

DESCRIPTION

This utility counts the number of duplicated lines in a log file. A line is considered "duplicated" if it has the same (MD5) hash as a previously seen line.

A simple report at the end tells how many unique, total, and duplicated lines were in the file.

Each line is hashed and the hash digest is stored in a dictionary, so memory use grows with the number of unique lines; very large files will use correspondingly large amounts of memory.
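
The counting scheme can be sketched in Python (an illustration of the approach, not nl_dup's own code):

```python
import hashlib

def count_duplicates(lines):
    # A line is a duplicate if its MD5 digest was already seen;
    # the set of digests is what grows with the number of unique lines.
    seen = set()
    dups = 0
    for line in lines:
        digest = hashlib.md5(line.encode()).digest()
        if digest in seen:
            dups += 1
        else:
            seen.add(digest)
    return len(seen), dups
```

For the input "hello", "hello", "goodbye" this reports 2 unique lines and 1 duplicate, matching the example output below.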

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-g

Show a progress bar

-o FILE

Write unique lines to FILE (default=no)

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To count the number of duplicates from standard input:

$ printf "hello\nhello\ngoodbye\n" | nl_dup
2 unique lines out of 3 (1 duplicates)

To count the number of duplicates in a file, with a progress bar:

$ nl_write -n 100000 > /tmp/myfile
$ cat /tmp/myfile >> /tmp/my2files
$ cat /tmp/myfile >> /tmp/my2files
$ nl_dup /tmp/my2files -g
100000 unique lines out of 200000 (100000 duplicates)

EXIT STATUS

nl_dup returns zero (success) if the input file can be read, and it is not interrupted with a signal. If the input file cannot be read, it returns 2. If it is interrupted with a signal or by keyboard interrupt, it prints a report of what it knows so far and returns 1.

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_findbottleneck(1)

NAME

nl_findbottleneck - Find bottleneck from NetLogger transfer summary logs.

SYNOPSIS

nl_findbottleneck [options] [log-file]

DESCRIPTION

Determine the bottleneck from NetLogger logs that show the disk and network read and write bandwidths. The input is a NetLogger log, specifically the one produced by NetLogger’s "transfer" API, although in reality the only fields that need to be present are the correct event name (see below) and:

r.s: sum of bytes/sec ratio

nv: number of items in the sum for r.s

The event name is expected to contain one of four strings indicating the component being measured: "disk.read", "disk.write", "net.read", and "net.write". As long as one of these strings appears somewhere in the event name, it will be recognized.

The output is the bottleneck, or "unknown". Optionally (with -v), the sorted list of bandwidths is written as well.

Although the options provide for multiple bottleneck algorithms, at present only one is implemented: the "simple" algorithm, which selects the smallest bandwidth and labels it the bottleneck if it is more than 15% smaller than the next smallest. For details see the netlogger.analysis.bottleneck module.

Note that parse errors in the input files will be silently ignored. If the -d flag is given, then parse errors will show up as debug messages in the log, but they still will not stop the program.
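
The "simple" algorithm can be sketched as follows (an illustration of the rule described above; the actual implementation lives in netlogger.analysis.bottleneck):

```python
def simple_bottleneck(bandwidths, threshold=0.15):
    # bandwidths: component name -> measured bandwidth, e.g.
    # {"disk.read": 50.0, "net.write": 95.0}.
    # The smallest bandwidth is labeled the bottleneck only if it is
    # more than `threshold` (15%) below the next smallest.
    if len(bandwidths) < 2:
        return "unknown"
    ranked = sorted(bandwidths.items(), key=lambda kv: kv[1])
    (name, lowest), (_, next_lowest) = ranked[0], ranked[1]
    if lowest < next_lowest * (1.0 - threshold):
        return name
    return "unknown"
```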

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-a ALG, --algorithm=ALG

choose bottleneck algorithm by name (default=simple)

-d, --debug

log debugging information, including parsing errors

-r, --report

print a longer report to the console

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To determine the bottleneck from my_transfer.log:

nl_findbottleneck my_transfer.log

EXIT STATUS

nl_findbottleneck returns zero on success, and non-zero on error.

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_findmissing(1)

NAME

nl_findmissing - Find and display "missing" events in NetLogger (CEDPS Best-Practices format) logs.

SYNOPSIS

nl_findmissing [options] [files..]

DESCRIPTION

Read NetLogger logs as input and produce as output any .start/.end events that are missing their matching event. The user specifies which fields of a logged event are used for comparison; this is flexible enough to allow even different event names to be matched to each other.

Logs are read from standard input or a file, and output is written to standard output. Input lines in the logfile that are not understood are silently ignored.

The -i/--ids option can be used to specify which fields should be used to match a starting event with its ending event. Optionally, a pattern can be placed before a ":" to filter which events are considered at all. If this option is not provided, then all events are considered and the fields event and guid (i.e., as if the user had specified "-i event,guid") are used to match starting and ending events. This option may be repeated, so that different sets of events can use different sets of identifiers.

There are three output formats (see EXAMPLES):

  • Human-readable

  • Comma-separated values (CSV)

  • Best Practices logging format (BP)
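
The core matching logic can be sketched in Python (a simplified illustration; the sample entries are hypothetical dicts as a parser might produce from BP lines):

```python
def find_missing(entries, key_fields=("guid",)):
    # entries: dicts parsed from BP lines, e.g.
    # {"event": "lala.13.start", "guid": "A2C4..."}.
    # Pair NAME.start with NAME.end on the key fields; report the
    # events whose matching half never arrives.
    pending = {}
    for e in entries:
        name = e["event"]
        for suffix, other in ((".start", "end"), (".end", "start")):
            if name.endswith(suffix):
                base = name[:-len(suffix)]
                key = (base,) + tuple(e.get(f) for f in key_fields)
                if key in pending and pending[key][0] == other:
                    del pending[key]  # pair is now complete
                else:
                    pending[key] = (suffix[1:], e)
    return [(e["event"], "end" if half == "start" else "start")
            for half, e in pending.values()]
```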

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-i IDS, --ids=IDS

Set of identifying fields for a given event pattern, using the syntax: [EVENT_REGEX:]FIELD1,..,FIELDN (default='guid')

-t FMT, --type=FMT

Output type (default=human)

-p, --progress

report progress to stderr

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To process the logs and produce human-readable output:

$ nl_findmissing  -t human log2
log2: lala.13 missing end
log2: po.34 missing end

To process the logs and produce CSV output:

$ nl_findmissing  -t csv log2
file,event,missing,key
log2,lala.13,end,lala.13/A2C4144D-7684-FA3E-8F5B-F0E34D8BC18E
log2,po.34,end,po.34/6275D71E-D023-A9F6-742E-6512DD90A1F1

To process the logs and produce BP output:

$ nl_findmissing  -t log log2
ts=2008-09-25T18:42:13.635438Z event=lala.13.start level=Info
guid=A2C4144D-7684-FA3E-8F5B-F0E34D8BC18E nl.missing=end mode=random
file=log2 guid=b09f6896-8b41-11dd-964e-001b63926e0d
ts=2008-09-25T18:42:13.635929Z event=po.34.start level=Info
guid=6275D71E-D023-A9F6-742E-6512DD90A1F1 nl.missing=end mode=random
file=log2 p.guid=A2C4144D-7684-FA3E-8F5B-F0E34D8BC18E
guid=b09f6896-8b41-11dd-964e-001b63926e0d

To match events starting with airplane on attributes flightno and airline, and all other events on a combination of country and city:

nl_findmissing -t log -i airplane:flightno,airline -i country,city in.log > out.log

EXIT STATUS

nl_findmissing returns zero on success, non-zero on failure.

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_ganglia(1)

NAME

nl_ganglia - Read Ganglia in, write NetLogger out

SYNOPSIS

nl_ganglia [options]

DESCRIPTION

Contact a Ganglia gmetad, parse the returned XML document, and convert the information into NetLogger-formatted output, with one log entry per metric.

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-e REGEX, --filter=REGEX

regular expression to use as a filter. This expression operates on the formatted output, i.e. name=value pairs

-i SEC, --interval=SEC

poll interval in seconds (default=run once)

-m METRICS, --metrics=METRICS

set of metrics to display (default=base)

-o FILE, --output=FILE

output file (default=stdout)

-s SERVER, --server=SERVER

gmetad server host (default=localhost)

-p PORT, --port=PORT

gmetad server port (default=8651)

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To contact ganglia on default port and dump one set of default metrics to the console:

nl_ganglia

To contact ganglia on server foobar.org once every 15 seconds, and write the subset of returned metrics that contains cpu in the event name to the file /tmp/ganglia.out:

nl_ganglia -e event='.*cpu' -s foobar.org -o /tmp/ganglia.out -i 15

EXIT STATUS

nl_ganglia returns zero on success, non-zero on failure.

BUGS

None known.

RESOURCES

Ganglia Monitoring System - http://ganglia.info

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_interval(1)

NAME

nl_interval - Read NetLogger logs as input and output the interval between the .start and .end events.

SYNOPSIS

nl_interval [options] [files..]

DESCRIPTION

Read NetLogger logs as input and produce as output the intervals between .start/.end events. The user specifies which fields of a logged event are used for comparison; this is flexible enough to allow even different event names to be matched to each other.

Logs are read from standard input or a file, and output is written to standard output. Input lines in the logfile that are not understood are silently ignored.

The -i/--ids option can be used to specify which fields should be used to match a starting event with its ending event. Optionally, a pattern can be placed before a ":" to filter which events are considered at all. If this option is not provided, then all events are considered and the fields event and guid (i.e., as if the user had specified "-i event,guid") are used to match starting and ending events. This option may be repeated, so that different sets of events can use different sets of identifiers.

There are three output formats (see EXAMPLES):

  • Human-readable

  • Comma-separated values (CSV)

  • Best Practices logging format (BP)
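
The interval computation can be sketched in Python (a simplified illustration matching on the default event/guid fields; the sample entries are hypothetical dicts as a parser might produce):

```python
def intervals(entries):
    # entries: dicts parsed from BP lines, e.g.
    # {"ts": 1.0, "event": "lala.24.start", "guid": "C243..."}.
    # Match each NAME.start with the NAME.end carrying the same guid
    # and emit (NAME, end_ts - start_ts).
    starts = {}
    out = []
    for e in entries:
        name, guid, ts = e["event"], e["guid"], e["ts"]
        if name.endswith(".start"):
            starts[(name[:-6], guid)] = ts
        elif name.endswith(".end"):
            key = (name[:-4], guid)
            if key in starts:
                out.append((key[0], ts - starts.pop(key)))
    return out
```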

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-d, --duplicates

Allow duplicate start events without end events, or end events without a start, and match them in FIFO order. The default is to drop old .start or .end events when new ones come in.

-c COLUMNS, --columns=COLUMNS

For type 'csv', comma-separated list of additional columns that should be in the output

-g, --progress

report progress to stderr

-i IDS, --ids=IDS

Set of identifying fields for a given event pattern, using the syntax: [EVENT_REGEX:]FIELD1,..,FIELDN (default=.*:event,guid). May be repeated.

-n NBINS, --nbins=NBINS

For --type=hist, number of histogram bins. The default is to automatically choose the number of bins using the standard 'Scott' formula

-r, --ordered

Process data in file order: drop duplicate ends, replace duplicated starts

-s FILE, --save-file=FILE

Write unfinished events to FILE (default=drop them)

-t FMT, --type=FMT

Output type (default=csv). Choices: csv=comma-separated values, log=NetLogger log format, hist=histogram

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

Process in.log and produce human-readable output:

$ nl_interval < in.log
lala.24 0.000059
po.13 0.000041
tinkywinky.81 0.000039
tinkywinky.55 0.000042

Process in.log and produce CSV output:

$ nl_interval -t csv < in.log
event,key,interval_sec
lala.24,lala.24/C24391AA-4D28-78B1-D59C-9C96627F256F,0.000059
po.13,po.13/1C746366-6C8A-3238-7CF2-313C417ECF96,0.000041
tinkywinky.81,tinkywinky.81/31A15BAD-4AEE-1E63-7ACD-C6EB8CF8547B,0.000039
tinkywinky.55,tinkywinky.55/9A16401D-5643-69BF-DFE9-A95692A349A4,0.000042

Process in.log and produce log output:

$ nl_interval -t log < in.log
ts=2008-09-25T18:42:13.636326Z event=lala.24.intvl level=Info status=0
guid=C24391AA-4D28-78B1-D59C-9C96627F256F nl.intvl=0.000059
mode=random p.guid=6275D71E-D023-A9F6-742E-6512DD90A1F1
ts=2008-09-25T18:42:13.636653Z event=po.13.intvl level=Info status=0
guid=1C746366-6C8A-3238-7CF2-313C417ECF96 nl.intvl=0.000041
mode=random p.guid=6275D71E-D023-A9F6-742E-6512DD90A1F1
ts=2008-09-25T18:42:13.636927Z event=tinkywinky.81.intvl level=Info
status=-1 guid=31A15BAD-4AEE-1E63-7ACD-C6EB8CF8547B nl.intvl=0.000039
mode=random p.guid=6275D71E-D023-A9F6-742E-6512DD90A1F1
ts=2008-09-25T18:42:13.637220Z event=tinkywinky.55.intvl level=Info
status=0 guid=9A16401D-5643-69BF-DFE9-A95692A349A4 nl.intvl=0.000042
mode=random p.guid=6275D71E-D023-A9F6-742E-6512DD90A1F1

Match events starting with airplane on flightno, and all other events on a combination of country and city:

nl_interval -i airplane:flightno -i country,city < in.log > out.log

EXIT STATUS

nl_interval returns zero on success and non-zero on failure.

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_load(1)

NAME

nl_load - Process incoming streams of NetLogger formatted data.

SYNOPSIS

nl_load {-a HOST | -c HOST | -f FILE} module_name [option=value..] [prefix1 prefix2 ..]

DESCRIPTION

This program processes streams of NetLogger (a.k.a. Best-Practices) formatted data. It may transform the data to a different file format (CSV, for example), load the data into a database, or act as a filtering mechanism based on client-defined needs. Processing logic is encapsulated in "analysis modules", which are Python modules that follow some simple conventions. The framework can stream input to these modules from one of standard input, a file, the NetLogger broker (nl_broker), or an AMQP broker such as RabbitMQ.

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-c HOST, --host=HOST

Connect to NetLogger info-broker at HOST (default=localhost)

-f FILE, --infile=FILE

Read NetLogger logs from FILE (default=stdin)

-g, --progress

report progress to stderr

-i, --info

Print information on selected module

-l, --list

List available modules

-M FILE, --module-opt=FILE

Read module options from a file with one name=value pair per line (default=No file; use command-line)

-p PORT, --port=PORT

For info_broker or amqp server, the port to connect to (default=info_broker 15380, amqp broker 5672)

-r SEC, --reconnect=SEC

If connection to broker at HOST fails, try again every SEC seconds (default=10). 0=don't retry

-t, --tail

With -f, tail the file instead of stopping at EOF

AMQP-specific options:
-a HOST, --amqp-host=HOST

Connect to AMQP server at HOST (default=127.0.0.1)

-A name=val|:file, --amqp_option=name=val|:file

AMQP options; repeatable. Known options: auto_delete (delete queues/exchanges when done), durable (save messages to disk), exchange (exchange name), exchange_type (direct, fanout, or topic), insist (no redirect), pw (password), queue (queue name), route (routing key, @event to use event), user (user name), vhost (virtual host). May also be of the form ':<filename>', e.g. ':/tmp/passwd', which reads the options from a file with one name=value pair per line.
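
The ':<filename>' form of -A reads the same options from a file. A hypothetical /tmp/amqp.conf (the names and values below are illustrative only) might look like:

```
user=guest
pw=guest
exchange=myex
queue=bpdata
exchange_type=topic
```

It would then be passed as -A :/tmp/amqp.conf.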

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

USAGE

The nl_load program comprises two parts: the main program, nl_load, and the analysis modules. Every invocation of nl_load uses one analysis module to process the log. Inputs are selected by providing one of the -c/--host, -f/--infile, or -a/--amqp-host options. To write your own module, see the existing modules under netlogger/analysis/modules.

SIGNALS

  • SIGTERM, SIGINT, SIGUSR2: Terminate gracefully

EXAMPLES

No-op: load with the "bp" loader from standard input to standard output.

nl_load bp < infile > outfile

To invoke nl_load, attach to a nl_broker process on the local machine, load the csv processing module and output the transformed data to a file:

nl_load -c localhost csv > bp_outfile.csv

Load data from an AMQP broker, with given exchange and queue, into MongoDB.

nl_load -a my.data.broker -A exchange=myex -A queue=bpdata mongodb database=mydb collection=mycollection host=my.db.host

Load data from an AMQP broker, configured from a file, into MongoDB, also configured from a file.

nl_load -a my.data.broker -A :/tmp/amqp.conf -M /tmp/mongo.conf mongodb

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_notify(1)

NAME

nl_notify - Run a command and notify by email if it fails.

SYNOPSIS

nl_notify [options] command args..

DESCRIPTION

Runs a given command with its arguments. If the command's return status is non-zero, the standard output and standard error are sent, with an appropriate subject line, to the provided email address. If the return status is zero, nothing is done.

Email is sent by default to localhost, port 25. Values for the "From:" and "To:" fields must be provided by the user.

Note: If the command’s arguments include a dash then they need to be quoted (you can quote the whole command if you want).

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-b SUBJECT, --subject=SUBJECT

Email subject (default=Error on %host from '%prog')

-f user@host, --from=user@host

Set 'From:' to user@host (required)

-g, --nagios

Nagios mode. Put first line of standard output in '%status'. Add this to default subject line (default=No)

-n, --test

Print to stdout instead of sending email

-p SERVER_PORT, --port=SERVER_PORT

SMTP server port (default=25)

-s HOST, --server=HOST

SMTP server host (default=localhost)

-t user@host, --to=user@host

Set 'To:' to user@host (required)

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To write what would have happened to standard output:

$ nl_notify --from user@somehost.com --to user@otherhost.org  --test /usr/bin/false
Connect to localhost:25
To: user@otherhost.org
From: user@somehost.com
Subject: Error on 192.168.1.101 (Macintosh-8.local) from '/usr/bin/false'
Output from '/usr/bin/false':
-- stdout --

-- stderr --

To run nl_check_pipeline in “nagios mode”, which allows you to include the status in the subject line:

$ nl_notify -b "Hey: %host says \'%status\'" \
  -f user@somehost.org -t user@otherhost.com \
  -g -p 9999 nl_check_pipeline
Subject: Hey: 192.168.1.101 (Macintosh-8.local) says 'CRITICAL: 3 components not running'
Output from '../../scripts/nagios/nl_check_pipeline':
-- stdout --
CRITICAL: 3 components not running

-- stderr --

EXIT STATUS

nl_notify returns zero on success, nonzero on an error.

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_parse(1)

NAME

nl_parse - Read from a variety of log formats, reformat to NetLogger best-practices format, and send the results to a file or information broker (nl_broker).

SYNOPSIS

nl_parse [options] module [params..] [files..]

DESCRIPTION

This program converts from known log formats to NetLogger (a.k.a. CEDPS Best-Practices) format and sends the results to either a file, stdout, the NetLogger information broker (nl_broker), or an AMQP broker. There are a number of built-in parsers, which may be listed by invoking with the -l/--list flag. The nl_parse program can operate on a single file or on a set of files matched by a filename pattern, can rescan a directory for new files, and can tail files.

This program can either be invoked for a "single run" to process a file or number of files, or can be run in the background to watch new and/or changing input files.

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-c HOST, --broker=HOST

Write parsed data to NetLogger broker at HOST (default port=15380)

-d, --amqp_disconnect

send disconnect message to AMQP server when done. No effect if not used with -a.

-f INTERVAL, --flush=INTERVAL

Flush output file after INTERVAL seconds of inactivity (default=1)

-g, --progress

report progress to stderr

-i, --info

Print information on selected module

-l, --list

List available modules

-o FILE, --output=FILE

Write NetLogger logs to FILE (default=stdout)

-O FILE, --offset-file=FILE

Load/maintain file offsets in FILE, so that subsequent runs don't process duplicate data (default=none)

-p PORT, --port=PORT

For info_broker or amqp server, the port to connect to (default=info_broker 15380, amqp broker 5672)

-r SEC, --reconnect=SEC

If connection to broker at HOST fails, try again every SEC seconds (default=10). 0=don't retry

-s SEC, --rescan=SEC

Rescan directory for files matching the input patterns every SEC seconds (default=10)

-t, --tail

Tail input files instead of stopping at EOF

AMQP-specific options:
-a HOST, --amqp-host=HOST

Connect to AMQP server at HOST (default=127.0.0.1)

-A name=val|:file, --amqp_option=name=val|:file

AMQP options; repeatable. Known options: auto_delete (delete queues/exchanges when done), durable (save messages to disk), exchange (exchange name), exchange_type (direct, fanout, or topic), insist (no redirect), pw (password), queue (queue name), route (routing key, @event to use event), user (user name), vhost (virtual host). May also be of the form ':<filename>', e.g. ':/tmp/passwd', which reads the options from a file with one name=value pair per line.

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

SIGNALS

  • SIGTERM, SIGINT, SIGUSR2 - Terminate gracefully

EXAMPLES

Parse a directory (“logdir”) of gridftp formatted logs (load gridftp parser module as an arg) that end with a .log extension, and send the best-practices formatted data to an information broker running on localhost.

nl_parse -c localhost gridftp "logdir/*.log"

Similar to previous example, but watch a directory of best-practices formatted logs (bp parser module), re-scan the directory every 30 seconds looking for new logs, tail the log files rather than stopping at EOF, and write the results out to an output file.

nl_parse -o output.bp -t -s 30 bp "logdir/*.log"

Parse all the log files under directories "a", "b", and "c", and send them to an AMQP broker on some.host.org to an exchange named "logs" with routing key "log.gridftp". Also save the position in each file in the file "/tmp/offsets.dat" so that subsequent runs won’t re-send old data.

nl_parse -O /tmp/offsets.dat -t -a some.host.org -A exchange=logs \
  -A route=log.gridftp gridftp hostnames=yes "{a,b,c}/*.log"

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_view(1)

NAME

nl_view - Re-format NetLogger logs.

SYNOPSIS

nl_view [options] [files..]

DESCRIPTION

Reformats the semi-structured keyword and value pairs of the NetLogger format for readability or importing into Excel, R, or other programs that require tabular data.

The time and event are always shown, although the time can be formatted either as an absolute ISO timestamp (the default) or as a number of seconds since the first or previous event. An arbitrary prefix can be stripped from event names (names without that prefix are, of course, left alone).

The default delimiter between columns is a space, but this can be changed to make, e.g., comma-separated values. Currently no quoting is done.

Special support for identifiers is provided with the -t/--tiny-id option, which replaces the value of the identifier with a short (4-character) locally unique value. This value is random, but the seed is always the same and the algorithm is deterministic, so the chosen value will be the same for successive invocations.
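
The tiny-id behavior can be sketched as a seeded, memoized mapping (an illustration only; the seed value and alphabet here are hypothetical, not nl_view's actual choices):

```python
import random
import string

def make_shortener(seed=42):
    # Fixed seed: successive invocations over the same input assign
    # the same 4-character short id to each long id.
    rng = random.Random(seed)
    mapping = {}
    def shorten(value):
        if value not in mapping:
            mapping[value] = "".join(
                rng.choice(string.ascii_lowercase) for _ in range(4))
        return mapping[value]
    return shorten
```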

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-a ATTR, --attr=ATTR

add attribute ATTR to output line, repeatable

-A, --all

add all attributes to output line

-c, --cum-delta

show times as deltas since the first event (default=False)

-d, --delta

show times as deltas from the previous event (default=False)

-D DELIM, --delimiter=DELIM

column delimiter (default=' ')

-e, --long

Break each attribute onto its own line. Overrides other formatting options and implies '-A'.

-g, --guid

add 'guid' attribute

-H, --header

add header row (default=False)

-i, --host

add 'host' attribute

-I, --identifiers

add any attribute ending in '.id'

-l, --level

add 'level' attribute

-m

add 'msg' attribute

-n PREFIX, --namespace=PREFIX

strip namespace PREFIX if found

-N, --no-names

Do not show attribute names

-s, --status

add 'status' attribute

-t, --tiny-id

replace *.id and guid values with shorter id's, like tinyurl

-w NUM, --width=NUM

set event column width to NUM (default=40)

-x

ignore non-NetLogger lines

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To put the viewer in a pipeline between the application and a pager:

my-application | nl_view -gi | less

To run the viewer on a bunch of files, showing some user-defined attributes:

nl_view -a foo -a bar *.log > combined.log

To run the viewer so that it displays time-deltas, guid, event name with a prefix stripped, and any "identifier" attributes (this particular set of values is useful for the Globus 4.2 containerLog):

nl_view -diIgmt --namespace=org.globus. containerLog

EXIT STATUS

Always succeeds, returning 0.

BUGS

None known.

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_wflowgen(1)

NAME

nl_wflowgen - Generate simulated workflow logs.

SYNOPSIS

nl_wflowgen [options] [-h]

DESCRIPTION

Generate random workflow logs in BP (NetLogger) format.

Two distinct types of simulated workflows can be generated. The random workflow is simply a random tree of events, linked together with GUIDs. The globus workflow type loosely resembles the logs from a Globus (GT4.2+) job submission.

How deeply workflows are nested is determined by the --mindepth and --maxdepth options, whereas the probability that the next event in any given workflow will be nested (if allowed by the min/max depth) is controlled by the --nest option.

Each ending event for a workflow has an associated status attribute. The probability of that being non-zero, i.e. indicating failure, is controlled with the --fail option.

OPTIONS

--version

show program's version number and exit

-h, --help

show this help message and exit

-m MODE, --mode=MODE

Run mode (default=random). Modes: 'random' = a random workflow 'tree'; 'globus' = Globus job submit

-o OFILE, --output=OFILE

output filename. use stdout if not given

--num=NUM

[random, globus] number of events, total (default=100)

--mindepth=MIN_DEPTH

[random] minimum number of nested events in a workflow (default=1)

--maxdepth=MAX_DEPTH

[random] maximum number of nested events in a workflow (default=5)

--fail=FAIL

[random] probability of failure for a .end event (default=0.1)

--nest=NEST

[random] probability of nesting events, at any point (default=0.5)

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To produce a default random workflow to standard output:

nl_wflowgen

To produce a default globus workflow to standard output:

nl_wflowgen -m globus

EXIT STATUS

Returns zero on success, non-zero on error.

BUGS

None known.

RESOURCES

The Globus Alliance - http://www.globus.org

AUTHOR

Dan Gunter <dkgunter@lbl.gov>

............................................................

nl_write(1)

NAME

nl_write - Write a NetLogger-formatted message.

SYNOPSIS

nl_write [options] name=value..

DESCRIPTION

Write one NetLogger-formatted message to standard output, TCP, or UDP. Any number of name=value pairs can be given as arguments. These will be copied to the output along with the standard values of ts=<timestamp> and event=<event_name>, to form a properly formatted log message.
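
The resulting message format can be sketched in Python (an illustration of composing one BP line, not nl_write's own code):

```python
from datetime import datetime, timezone

def bp_line(event, **attrs):
    # ts and event are always present; any extra name=value pairs
    # are appended to form a complete log message.
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    pairs = " ".join("%s=%s" % (k, v) for k, v in attrs.items())
    return "ts=%s event=%s" % (ts, event) + (" " + pairs if pairs else "")
```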

OPTIONS

Note: single-letter options in upper-case control how things are logged, whereas lower-case options control what is logged.

--version

show program's version number and exit

-h, --help

show this help message and exit

-g, --guid

add guid=GUID to message. This is overridden by an explicit guid=GUID argument.

-i, --ip

add 'host=IP' to message. This is overridden by an explicit host=HOST argument.

-n NUM, --num=NUM

Write NUM messages, each with n=<1..NUM> in them (default=1)

-H HOST, --host=HOST

for UDP/TCP/AMQP, the remote host (default=localhost)

-P PORT, --port=PORT

For UDP/TCP/AMQP, the port to write to (default=UDP 514, TCP 14380, AMQP 5672)

-S, --syslog

add a header for syslog (default=False unless -U is given, then True)

-T, --tcp

write message to TCP (default port=14380)

-U, --udp

write message to UDP (default port=514)

-A, --amqp

write message to AMQP server (default port=5672)

-D, --amqp_disconnect

send disconnect message to AMQP server when done. No effect if not used with -A.

-O name=val, --amqp_option=name=val

Optional argument to pass name/value options to the AMQP connection/producer (repeatable: -O user=foo -O pw=bar)

Logging options:
-L FILE, --log=FILE

write logs to FILE (default=stderr)

-R TIME, --logrotate=TIME

rotate logs at an interval (<N>d or <N>h or <N>m)

-v, --verbose

more verbose logging

-q, --quiet

quiet mode, no logging

EXAMPLES

To write the default message:

nl_write

To write a message with a host, guid, and attributes foo and bar:

nl_write -g -i foo=12345 bar='hello, world'

To write a syslog-formatted message to the standard syslog UDP port (514):

nl_write -g -U msg='hello, world'

EXIT STATUS

Returns zero on success, non-zero on error.

BUGS

The host option always uses the default interface.

There is no way to write a message with a user-defined timestamp, the time is always "now".

AUTHOR

Dan Gunter <dkgunter@lbl.gov>