Preface
This is the manual for the NetLogger Toolkit. For more details, downloads, etc. please refer to the NetLogger web pages at http://acs.lbl.gov/NetLoggerWiki/
Conventions
- Italic: Used for file and directory names, email addresses, and new terms where they are defined.
- Constant Width: Used for code listings and for keywords, variables, functions, command options, parameters, class names, and HTML tags where they appear in the text. Used with double quotes for literal values like "True", "10", and "netlogger.modules". In code listings, user input to the terminal is prefixed with a $.
- Constant Width Italic: Used to indicate items that should be replaced by actual values.
- Link text: Used for URLs and cross-references.
Overview
Anyone who has ever tried to debug or analyze the performance of a complex distributed application knows that it can be a very difficult task. Problems may lie in any of many software components, hardware components, networks, the OS, etc.
NetLogger is designed to make this easier. NetLogger is both a methodology for analyzing distributed systems, and a set of tools to help implement the methodology.
Methodology: Logging Best Practices
The NetLogger methodology, also called the Logging Best Practices (BP), is documented in detail at http://www.cedps.net/index.php/LoggingBestPractices. The following is a brief summary:
- Terminology: For clarity, here are definitions of terms used throughout the NetLogger documentation.
  - event: A uniquely named point of interest within a given system, occurring at a specific time. An event is also a required attribute of each NetLogger log entry.
  - log: A file containing logging events, or a stream of such events.
  - log entry: A single line within a log, corresponding to a single event.
  - attribute: A detailed characteristic of an event.
  - name/value pair: How attributes are identified within a log entry: a name and a value separated by an "=" (equals sign).
For example, the following shows that the log file my.log contains one log entry with an event of something.happened. This event has three attributes, represented in the log entry by the name/value pairs whose names are ts, event, and level.
$ cat my.log
ts=2008-10-10T19:24:35.508249Z event=something.happened level=Info
- Practices: All log entries should contain a unique event attribute and an ISO-formatted timestamp (see ISO8601). System operations that might fail or experience performance variations should be wrapped with start and end events. All logs from a given execution context should carry a globally unique ID (GUID) attribute, such as a Universally Unique Identifier (UUID; see RFC 4122). When multiple contexts are present, each one should use its own identifying attribute name ending in .id.
- Errors: A reserved status integer attribute must be used for all end events, with 0 for success and any other value for failure or partial failure. The default severity of a log message is informational; other severities are indicated with a level attribute.
- Format: Each log entry should be composed of a single line of ASCII name=value pairs (a.k.a. attributes); this format is highly portable, human-readable, and works well with line-oriented tools.
- Naming: For event attribute names, we recommend using a '.' as a separator and going from general to specific, similar to Java class names.
A sample job submit start/end log in this format would look like the following:
ts=2006-12-08T18:39:19.372375Z event=org.job.submit.start user=dang job.id=37900
ts=2006-12-08T18:39:23.114369Z event=org.job.submit.end user=dang job.id=37900 status=0
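As an illustration of these practices, the paired start/end events with a shared unique ID can be sketched using only the Python standard library (this does not use the NetLogger APIs):

```python
from datetime import datetime, timezone
import uuid

def bp_entry(event, attrs):
    """Format one Best Practices log entry: ISO8601 timestamp, event name,
    then the remaining name=value pairs."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    pairs = " ".join("%s=%s" % (k, v) for k, v in attrs.items())
    return "ts=%s event=%s %s" % (ts, event, pairs)

# Wrap an operation that might fail with paired .start/.end events
# sharing a globally unique job.id attribute.
job_id = uuid.uuid4()
start = bp_entry("org.job.submit.start", {"user": "dang", "job.id": job_id})
end = bp_entry("org.job.submit.end", {"user": "dang", "job.id": job_id, "status": 0})
print(start)
print(end)
```

The status=0 attribute on the end event marks success, per the Errors practice above.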
Adding grammar to the log file, such as the name/value attribute-pair structure, encourages more regular and normalized representations than the natural-language sentences commonly found in ad-hoc logs.
For example, a message like error: read from socket on foobar.org:1234: remote host baz.org:4321 returned -1 would be:
ts=2006-12-08T18:48:27.598448Z event=org.my.myapp.socket.read.end level=ERROR status=-1 host=foobar.org:1234 peer=baz.org:4321
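Because each entry is a single line of name=value pairs, a log line can be parsed with very little code; a minimal Python sketch (which ignores the possibility of values containing spaces):

```python
def parse_bp_line(line):
    """Split a Best Practices log line into an attribute dictionary."""
    return dict(pair.split("=", 1) for pair in line.split())

entry = parse_bp_line(
    "ts=2006-12-08T18:48:27.598448Z event=org.my.myapp.socket.read.end "
    "level=ERROR status=-1 host=foobar.org:1234 peer=baz.org:4321")
print(entry["event"], entry["status"])
```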
The open source NetLogger Toolkit is a set of tools to implement this methodology.
Tools
The tools included with NetLogger can be grouped in four main areas:
- Logging APIs: C, Java, Perl, Python, and UNIX shell.
- NetLogger Pipeline: parse, load, and analyze logs using a relational database and the R data analysis language.
- Bottleneck detection: test disk and network for bottlenecks in WAN transfers.
- Utilities: monitoring probes, a log receiver (netlogd), and some other pieces that are occasionally useful.
Installation
The NetLogger Toolkit has separate installation instructions for each language. The data parsing and analysis tools are part of the Python installation.
For general download instructions, see https://sites.google.com/a/lbl.gov/netlogger/software.
System requirements
- Operating System
-
NetLogger has been tested on UNIX and Mac OS X. The Python code should work on Windows with some modifications, but this is not a priority for our development.
- NTP
-
All monitored hosts should use NTP (http://www.ntp.org), or the equivalent, for clock synchronization.
Install C
Below are instructions for installing the C instrumentation API and the nlioperf program.
# Run configure; make; make install
cd c
./configure --prefix=/your_install_path
make
make install
Install Java
Prerequisites
Java 1.5 or above (http://java.sun.com) for the Java instrumentation
Install
Below are instructions for installing the Java instrumentation API.
# Build JAR file
cd java
ant jar
# Copy jarfile into desired spot
cp netlogger-java-trunk.jar /your_install_path/netlogger.jar
# Then set your classpath
csh% setenv CLASSPATH $CLASSPATH:/your_install_path
# .. OR ..
sh$ export CLASSPATH=$CLASSPATH:/your_install_path
Install PERL
Prerequisites
PERL version 5 or higher (http://www.perl.org) is required.
The PERL UUID module is required. You can install this from CPAN:
perl -MCPAN -e "install Data::UUID"
Install
Below are instructions for installing the PERL instrumentation API.
cd perl
# Run PERL's standard install sequence
perl Makefile.PL
make
make test
make install
Install Python
Prerequisites
The following Python modules may be needed by the NetLogger pipeline to interact with the database. To install these modules, either use a package manager (such as Debian's APT, Red Hat's yum, or FreeBSD ports), use Python's easy_install command from setuptools, or download and install from source. The easy_install commands are given below.
- MySQLdb for MySQL:
  easy_install MySQLdb
- psycopg2 or pgdb for PostgreSQL:
  easy_install psycopg2
Install
Below are instructions for installing the Python instrumentation API and tools.
- Install from PyPI:
  easy_install netlogger
- Install from source:
  cd python
  # Run Python's standard install sequence
  python setup.py build
  python setup.py install
Install R
There is no NetLogger R instrumentation API, but we do use R to analyze the data (see the SQL and R analysis section).
Prerequisites
- Version: R version 2.6.0 or higher is required; the latest version of R is recommended, particularly if you are going to use ggplot2. Windows binaries and Debian, Red Hat, Ubuntu, and SuSE packages are available; for other platforms, or for the latest and greatest, R compiles from source on most platforms. See your local Comprehensive R Archive Network (CRAN) mirror to download any of the above.
- Packages: A number of R packages are required to run the NetLogger R programs. Instructions follow for installing them from within R.
  - Start R:
    $ R
  - Choose a mirror (you only need to do this once):
    > chooseCRANmirror()
  - Download and install the packages:
    > install.packages(c("lattice", "latticeExtra", "Hmisc", "RMySQL", "RSQLite", "ggplot2"), dependencies = TRUE)
-
Install
To use the package in R, simply load it by name. To get help, use the standard R help facility.
Instrumentation APIs
NetLogger has instrumentation APIs to produce Best Practices (BP) formatted logs for C/C++, Java, Perl, and Python.
C API
The C API documentation is auto-generated from the source code using Doxygen.
It is available online from http://acs.lbl.gov/NetLogger-releases/doc/api/c-trunk/
Java API
The Java API documentation is auto-generated from the source code using Javadoc.
It is available online from http://acs.lbl.gov/NetLogger-releases/doc/api/java-trunk/
Perl API
The Perl API documentation is auto-generated from the source code using pod2html.
It is available online from http://acs.lbl.gov/NetLogger-releases/doc/api/perl-trunk/
Python API
The Python API documentation is auto-generated from the source code using epydoc.
It is available online from http://acs.lbl.gov/NetLogger-releases/doc/api/python-trunk/
Syslog-NG
Syslog-NG, available from http://www.balabit.com/network-security/syslog-ng/, is a flexible and scalable system logging application that can act as a drop-in replacement for standard syslog.
A syslog-ng server can send local data over the network (TCP or UDP), receive network data and log it locally, or do both. syslog-ng receivers can be configured to aggregate and filter logs based on program name, log level, and even regular expressions over message contents. It is very scalable: if a particular receiver gets overloaded, one can simply bring up another receiver on another machine and send half the logs to each. syslog-ng supports fully qualified host names and time zones, which standard syslog does not. Standard syslog could also be used, but only for single-site deployments.
We recommend syslog-ng 2.0 over syslog-ng 1.6 because of the new ISO date option, which is needed for logging across multiple time zones. To download syslog-ng, go to: http://www.balabit.com/downloads/files/syslog-ng/sources/stable/src/.
Here is a commented sample syslog-ng 2.0 sender configuration file. For pre-packaged sample configuration files, see the next section and also look in the NetLogger source code in pacman/syslog-ng/.
# Global options
options {
    # Polling interval, in ms (helps reduce CPU)
    time_sleep(50);
    # Use fully qualified domain names
    use_fqdn(yes);
    # Use ISO8601 timestamps
    ts_format(iso);
    # Number of lines to buffer before writing to disk
    # (a) for normal load
    flush_lines(10);
    log_fifo_size(100);
    # (b) for heavy load
    #flush_lines(1000);
    #log_fifo_size(1000);
    # Number of seconds between syslog-ng internal stats events.
    # These are useful for watching the load.
    stats_freq(3600);
};

# Data sources: file, TCP or UDP socket, or internal
# Tail /var/log/gridftp.log, prefixing each copied input line
# with 'gridftp_log '
source gridftp_log {
    file("/var/log/gridftp.log"
         follow-freq(1)
         flags(no-parse)
         log_prefix('gridftp_log '));
};
# ..etc..
# Syslog-ng's own logs; for testing the syslog-ng config
source syslog_ng { internal(); };

# Data sinks: file, TCP, or UDP socket
# Send "grid" logs to a remote host on TCP port 5141
destination gridlog_dst { tcp("remote.loghost.org" port(5141)); };
# Send other logs to a local file
destination syslog_ng_dst { file("/tmp/syslog-ng.log" perm(0644)); };

# Data pipelines
# Combine a source and a destination to make a pipeline
# Send the gridftp logs to the remote "grid" host
log { source(gridftp_log); destination(gridlog_dst); flags(flow-control); };
# (and so on for the other "grid" sources)
# Send the internal logs to the local file
log { source(syslog_ng); destination(syslog_ng_dst); };
VDT and OSG Package
This section describes how to install and configure the syslog-ng package that the CEDPS project developed for the Virtual Data Toolkit (VDT). This package can be used alone (i.e., with no other VDT services running), but it does depend on many other components in VDT.
First you must install pacman:
$ wget http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-latest.tar.gz
$ tar xvzf pacman-latest.tar.gz
$ cd pacman-version
# For C-shell derivatives
$ source setup.csh
# For Bourne-shell derivatives
$ source setup.sh
Then install the OSG:Syslog-ng package:
$ P=/path/to/install
$ mkdir $P && cd $P
$ pacman -get OSG:Syslog-ng
To configure and start a syslog-ng sender, first set your VDT_LOCATION; this is a standard part of setting up the VDT on your system. When that is done, run:
$ P=/path/to/install
# For C-shell derivatives
$ source $P/setup.csh
# For Bourne-shell derivatives
$ source $P/setup.sh
$ $VDT_LOCATION/vdt/setup/configure_syslog_ng_sender --local-collector myloghost.foo.gov
$ $VDT_LOCATION/vdt/setup/configure_syslog_ng_sender --add-source "/tmp/testfile"
$ $VDT_LOCATION/vdt/setup/configure_syslog_ng_sender --server y
$ vdt-control --on syslog-ng-sender
To configure and start a syslog-ng receiver, run:
$ L=/path/to/logs
$ $VDT_LOCATION/vdt/setup/configure_syslog_ng_receiver --server y
$ $VDT_LOCATION/vdt/setup/configure_syslog_ng_receiver --dir $L
$ vdt-control --on syslog-ng-receiver
NetLogger Web Services APIs
NetLogger provides Web Services APIs to allow non-NetLogger clients an easy way to use the NetLogger analysis functions. Currently the only API is for troubleshooting Pegasus workflows, but plans are in the works for more, and more general-purpose, interfaces.
Pegasus Web API
The Pegasus web API provides access to NetLogger functionality for troubleshooting and analysis of Pegasus workflows. The Pegasus web API is a “REST”-style API, which means that it encodes the method and arguments directly in the URL. It is influenced by the Splunk REST API.
Getting started
All of the Pegasus web API calls use a common format which includes the name of the workflow as well as the service to be invoked.
Currently, there are five available services, all of which are within the "search" module:
- Tasks: get information on a particular task.
- FailedTasks: get all failed tasks.
- Mappings: get all mappings between Pegasus tasks and Condor jobs.
- Children: get all child tasks of a given task.
- Parents: get all parent tasks of a given task.
All data is returned as XML, so that it can either be viewed directly or easily consumed by a client.
Tasks
To get information on a specific task, a user must know the name of the workflow and the task ID. The rest of the task information can then be retrieved via the following URL:
http://name.of.server/pegasusAPI/search/(workflow)/Tasks/(taskID)
This will return XML in the following format:
<?xml version="1.0" encoding="utf-8" ?>
<task>
  <run> (workflowID) </run>
  <id> (taskID) </id>
  <class> (task class) </class>
  <description> job description </description>
  <transform> type of transformation </transform>
  <status> did the job succeed? </status>
  <duration> time </duration>
</task>
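Any standard XML parser can consume these responses. For example, in Python, parsing a hard-coded sample of the task format above (a live client would fetch the same XML from the service URL):

```python
import xml.etree.ElementTree as ET

# Sample response body, matching the documented <task> format
sample = """<task>
  <run>ranger0</run>
  <id>403</id>
  <class></class>
  <description>merge_scec-PeakValCalc_Okaya-1.0_PID3_ID2</description>
  <transform>scec::PeakValCalc_Okaya:1.0</transform>
  <status>0</status>
  <duration>0.108000</duration>
</task>"""

task = ET.fromstring(sample)
# Flatten the child elements into a tag -> text dictionary
info = {child.tag: child.text for child in task}
print(info["run"], info["id"], info["status"])
```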
FailedTasks
This will return all tasks that have failed (i.e. have a non-zero status). The user must know the name of the workflow. A list of tasks (using the same XML format as presented above) will be returned.
http://name.of.server/pegasusAPI/search/(workflow)/FailedTasks
returns:
<?xml version="1.0" encoding="utf-8" ?>
<tasklist>
  <task> ... </task>
  <task> ... </task>
  ...
</tasklist>
Mappings
This returns a list of all the mappings between Pegasus clusters and individual tasks. Often, a "merged" job is created for submission to Condor to increase parallelism. This allows a user to pull apart this merging and find out the specific tasks executed on each cluster. The user must know the name of the workflow.
http://name.of.server/pegasusAPI/search/(workflow)/Mappings
returns:
<?xml version="1.0" encoding="utf-8" ?>
<mappinglist>
  <mapping>
    <jobid> name of job </jobid>
    <xform> type of transformation </xform>
    <jobclass> class </jobclass>
    <tasks> tasks that compose this job
      <task> ... </task>
      ...
    </tasks>
  </mapping>
  ....
</mappinglist>
Children
Pegasus tasks are related to each other via a directed acyclic graph (DAG). Often, it is useful to know parent-child relationships within this DAG. This service returns all the child tasks of a given task. The user must know the workflow ID and task ID.
http://name.of.server/pegasusAPI/search/(workflow)/Children/(taskID)
returns:
<?xml version="1.0" encoding="utf-8" ?>
<tasklist>
  <task> ... </task>
  <task> ... </task>
  ...
</tasklist>
Parents
Similar to Children, Parents will return all parent tasks of a given task. The user must know the workflow ID and task ID.
http://name.of.server/pegasusAPI/search/(workflow)/Parents/(taskID)
returns:
<?xml version="1.0" encoding="utf-8" ?>
<tasklist>
  <task> ... </task>
  <task> ... </task>
  ...
</tasklist>
Examples
To get information on task 403 in workflow ranger0:
http://krusty.lbl.gov/pegasusAPI/search/ranger0/Tasks/403
returns:
<?xml version="1.0" encoding="utf-8" ?>
<task>
  <run>ranger0</run>
  <id>403</id>
  <class></class>
  <description>merge_scec-PeakValCalc_Okaya-1.0_PID3_ID2</description>
  <transform>scec::PeakValCalc_Okaya:1.0</transform>
  <status>0</status>
  <duration>0.108000</duration>
</task>
To find all failed tasks in workflow ranger0:
http://krusty.lbl.gov/pegasusAPI/search/ranger0/FailedTasks
returns:
<?xml version="1.0" encoding="utf-8" ?>
<tasklist>
  <task>
    <run>ranger0</run>
    <id>50</id>
    <class></class>
    <description>register_ranger_0_0</description>
    <transform></transform>
    <status>2</status>
    <duration>0.0</duration>
  </task>
  etc.
</tasklist>
To find all mappings for workflow run0016:
http://krusty.lbl.gov/pegasusAPI/search/run0016/Mappings
returns:
<?xml version="1.0" encoding="utf-8" ?>
<mappinglist>
  <mapping>
    <jobid> findrange_ID000002 </jobid>
    <xform> vahi::findrange:1.0 </xform>
    <jobclass> 1 </jobclass>
    <tasks>
      <task> ID000002 </task>
    </tasks>
  </mapping>
  etc.
</mappinglist>
To find all children of task 013 for workflow run0016:
http://krusty.lbl.gov/pegasusAPI/search/run0016/Children/013
returns:
<?xml version="1.0" encoding="utf-8" ?>
<tasklist>
  <task>
    <run>run0016</run>
    <id>129</id>
    <class></class>
    <description>findrange_ID000003</description>
    <transform></transform>
    <status>0</status>
    <duration>6.032000</duration>
  </task>
  etc.
</tasklist>
To find all parents of task 013 for workflow run0016:
http://krusty.lbl.gov/pegasusAPI/search/run0016/Parents/013
Frequently Asked Questions
What is NetLogger?
NetLogger is a methodology for troubleshooting and analyzing distributed applications. The NetLogger Toolkit is a set of tools that help deploy this methodology. The methodology is described in more detail here.
Is NetLogger a new project?
In a word, no. NetLogger has been in existence, in one form or another, since 1994. Since that time it has been rewritten and renamed, so that the body of software now labeled NetLogger has little or no relation to the software distributed in the early years of research and development.
Does NetLogger only monitor the network?
NetLogger is short for "Networked Application Logger". NetLogger is NOT just about monitoring the network.
Is NetLogger open source?
Yes! It is under a BSD-style open source license.
They were not used by anyone, and so they were removed to make NetLogger smaller and easier to install.
Yes.
What is the overhead of NetLogger instrumentation?
The overhead is very low. You can generate up to 5000 events/second using the C API, 500 events/second using the Java API, and 80 events/second using the Python API, with negligible impact on your application.
How do I view and analyze the logs?
This is what the NetLogger Pipeline does. There is also a text-based viewer called "nl_view" that can make human browsing of the logs easier.
How do I contact the NetLogger developers?
Please e-mail us at netlogger-dev@george.lbl.gov
Tool manual pages
This section provides a version of the manpage documentation, available via the UNIX man command, for each of the tools in the NetLogger Python distribution.
netlogd(1)
NAME
netlogd - Receive logs over TCP or UDP and write them to a file.
SYNOPSIS
netlogd [options]
DESCRIPTION
The netlogd program combines one or more streams of newline-delimited log records into a single file. No checking is done as to the format of the records. Records are freely interleaved in a first-come, first-written manner. UDP and TCP mode cannot be used together.
OPTIONS
- --version: show program's version number and exit
- -h, --help: show this help message and exit
- -b, --fork: fork into the background after starting up
- -f, --flush: flush all outputs after each record
- -k TIME, --kill=TIME: kill self after some time. TIME can be given in units 's', 'm', or 'h' for seconds, minutes, or hours; default units are minutes ('m')
- -o URL, --output=URL: output file(s), repeatable (default=stdout)
- -p PORT, --port=PORT: port number (default=14380)
- -r SIZE, --rollover=SIZE: roll over files at given file size (units allowed)
- -U, --udp: listen on a UDP instead of TCP socket
Logging options:
- -L FILE, --log=FILE: write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME: rotate logs at an interval (<N>d, <N>h, or <N>m)
- -v, --verbose: more verbose logging
- -q, --quiet: quiet mode, no logging
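The TIME values above use single-letter unit suffixes. As an illustration of this syntax (a hypothetical parser, not netlogd's actual code), such values could be converted to seconds like this:

```python
def parse_time(value, default_unit="m"):
    """Parse '<N>s', '<N>m', '<N>h', or '<N>d' into seconds.
    A bare number uses default_unit (minutes, matching -k's default)."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value) * units[default_unit]

print(parse_time("90s"), parse_time("2h"), parse_time("5"))  # 90 7200 300
```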
EXAMPLES
To receive records on the default TCP port and write them to standard output:
$ netlogd
To receive records on UDP port 44351 and write them to file /tmp/combined.log:
$ netlogd -U -p 44351 -o /tmp/combined.log
EXIT STATUS
netlogd returns zero on success, non-zero on error
BUGS
None known.
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_broker(1)
NAME
nl_broker - Runs an "information broker" that accepts streams of NetLogger best-practices formatted data and forwards the streams to one or more loader clients.
SYNOPSIS
nl_broker [options]
DESCRIPTION
This program accepts incoming streams of NetLogger (a.k.a. CEDPS Best Practices) formatted data over TCP from one or more sources. The streamed data are then passed to one or more attached nl_load processes, which reformat the data to an output format, load it into a database back-end, or filter it based on client-defined criteria.
This program would generally be invoked on the command line and run in the background. It would normally be invoked first, followed by the attachment of one or more nl_load processes, before receiving incoming streamed data. The broker does not buffer information, so if there are no nl_load processes attached to harvest and process the streams the data will not be processed.
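The no-buffering fan-out behavior can be illustrated with a toy in-memory sketch (the real broker, of course, works over TCP sockets; this is only an analogy):

```python
class Broker:
    """Toy analogue of nl_broker's fan-out: each record is forwarded to
    every attached consumer, and dropped (not buffered) if none is attached."""
    def __init__(self):
        self.consumers = []

    def attach(self, consumer):
        """Register a callable that will receive every future record."""
        self.consumers.append(consumer)

    def publish(self, record):
        """Forward one record to all currently attached consumers."""
        for consumer in self.consumers:
            consumer(record)

broker = Broker()
broker.publish("event=dropped.example")   # no consumers attached: lost
received = []
broker.attach(received.append)            # attach a "loader"
broker.publish("event=delivered.example") # now harvested
```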
OPTIONS
- --version: show program's version number and exit
- -h, --help: show this help message and exit
- -l ADDR: bind to local interface ADDR (default=localhost)
- -p PORT: listen for incoming data streams on PORT (default=14380)
- -P PORT: listen for client connections on PORT (default=15380)
Logging options:
- -L FILE, --log=FILE: write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME: rotate logs at an interval (<N>d, <N>h, or <N>m)
- -v, --verbose: more verbose logging
- -q, --quiet: quiet mode, no logging
USAGE
When run without options, nl_broker will bind to the localhost interface on the machine where it is run, with default ports for incoming data and client connections. The interface and port bindings may be overridden on the command line.
SIGNALS
- SIGTERM, SIGINT, SIGUSR2: terminate gracefully
BUGS
None known.
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_check(1)
NAME
nl_check - Check a log file for correctness.
SYNOPSIS
nl_check [options] [filename]
DESCRIPTION
Checks that a log file is formatted according to the CEDPS project "Best Practices" guide format (see RESOURCES).
Files are read from a list given on the command line or, if no files are listed, from standard input. Each line that does not conform is reported to standard output. Warnings and errors are printed to standard error, as is the optional "progress" indicator (useful for large files). In addition, the user may opt to make a copy of each input file with the offending lines removed (see the -c option for details).
OPTIONS
- --version: show program's version number and exit
- -h, --help: show this help message and exit
- -c, --clean: write a copy of all 'clean' lines to stdout
- -f, --fast: do a quick-and-dirty check
- -p, --progress: report progress to stderr
Logging options:
- -L FILE, --log=FILE: write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME: rotate logs at an interval (<N>d, <N>h, or <N>m)
- -v, --verbose: more verbose logging
- -q, --quiet: quiet mode, no logging
EXAMPLES
To print out errors in files a.log, b.log, and c.log to stdout:
nl_check a.log b.log c.log
To combine valid lines from files a.log, b.log, and c.log into cleaned.log, printing out errors to stderr:
nl_check -c a.log b.log c.log > cleaned.log
To check file big.log, copying valid lines to big.log.cleaned, showing progress (and validation errors) to stderr:
nl_check -p -c .cleaned big.log
EXIT STATUS
nl_check returns zero on success, non-zero on failure
BUGS
None known.
RESOURCES
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_config_verify(1)
NAME
nl_config_verify - Verify a configuration file using a user-provided specification file.
SYNOPSIS
nl_config_verify specification-file [files ..]
DESCRIPTION
Use a single specification file to check one or more configuration files. The results of the check are reported to standard output, and success or failure of the validations is also reflected in the exit status.
The program arguments are simply a specification file and zero or more configuration files to validate. If zero configuration files are given, then standard input is used; please note that in this case, if the configuration uses the "@include" mechanism, this will only work if the included files are in the current directory.
OPTIONS
- --version: show program's version number and exit
- -h, --help: show this help message and exit
Logging options:
- -L FILE, --log=FILE: write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME: rotate logs at an interval (<N>d, <N>h, or <N>m)
- -v, --verbose: more verbose logging
- -q, --quiet: quiet mode, no logging
SPECIFICATION SYNTAX
The specification file must itself be a valid configuration file. Special keywords in the specification file use % as their first character, so keywords starting with % must not be used elsewhere. The overall syntax for a specification file is a list of configuration file fragments (%spec), then a list of boolean expressions using these fragments (%rule), and finally a single expression giving the order to apply the rules (%apply).
%spec NAME1
..config file fragment..
%spec NAME2
..another fragment..
%rule RULE1 %NAME1 or %NAME2   # expression
%rule RULE2 %NAME1 and %NAME2  # expression
# order to apply rules
%apply RULE1 RULE2
A configuration file fragment starts with a line containing "%spec NAME", where NAME is a valid Python variable identifier (but otherwise arbitrary). The configuration fragment continues until another "%spec" or "%rule" is encountered at the start of a line.
Each configuration file fragment lists all valid sections and the valid keywords of each section. If you want to allow other sections, use the special section wildcard, [*]. Within a section, arbitrary keywords can be allowed by adding the keyword wildcard __ANY__. Conversely, if you want to require that a listed section or keyword is present in all inputs, prefix its name with required_.
%spec myspec
[required_foo]      # 'foo' section is required
required_bar = int  # 'bar' keyword is required
__ANY__ =           # any other keywords are allowed
[*]                 # any other sections are allowed
The values for the keywords are a type name indicating the range of allowable values. The possible type names are:
- str: String, i.e., anything.
- int: Integer.
- float: Floating-point number.
- bool: Boolean value: yes/no, true/false, 0/1, on/off.
- path: Same as str; really just documentation. Does not cause the validator to look for the file in the current filesystem.
- uri: Minimal URL requirements: a sequence of word characters, followed by ://, followed by one or more non-slashes, then anything. This allows http(s), ftp, and all the database URIs.
- enum: Enumeration. This one is special in that after the type name there should be a list of one or more valid strings; the input must match one of those strings.
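As an illustration of these type rules (a sketch, not nl_config_verify's actual implementation), the checks could look like:

```python
import re

def check_type(value, type_name, enum_values=()):
    """Return True if a configuration value matches the given type name."""
    if type_name in ("str", "path"):
        return True  # 'path' is documentation only; no filesystem check
    if type_name == "int":
        return re.fullmatch(r"[+-]?\d+", value) is not None
    if type_name == "float":
        try:
            float(value)
            return True
        except ValueError:
            return False
    if type_name == "bool":
        return value.lower() in ("yes", "no", "true", "false", "0", "1", "on", "off")
    if type_name == "uri":
        # word chars, then '://', then one or more non-slashes, then anything
        return re.match(r"\w+://[^/]+", value) is not None
    if type_name == "enum":
        return value in enum_values
    raise ValueError("unknown type: " + type_name)
```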
After all the fragments, rules are listed, each on a new line of the form "%rule NAME EXPR". The NAME should be a valid Python identifier, and EXPR is a boolean expression using boolean operators, parentheses for grouping, and %NAME references to configuration fragments. The %apply directive comes after all the rules. Its first token indicates when to declare success: if it is all, then all rules must match; if it is any, then any matching rule stops the validation. After this token comes a list of previously defined rules; this is the order in which they will be tried.
Although the preceding paragraph may seem complex, in most cases the usage of the %rule and %apply directive will be straightforward. For example, if there is only a single %spec section, it will look like this:
%spec myspec
# .. config fragment here
%rule rule1 %myspec
%apply all rule1
That's all there is to the specification syntax. For a full example, see the Examples section.
EXAMPLES
Validate my.conf with the specification my.spec.
nl_config_verify my.spec my.conf
Validate my.conf1 and my.conf2 with the specification my.spec, and print informational messages to standard error.
nl_config_verify -v my.spec my.conf1 my.conf2
Validate ./path/to/my.conf (from stdin) with the specification my.spec, and print debugging messages to standard error. Because the path to my.conf is not known to nl_config_verify, this will not work if my.conf tries to "@include" files from its own directory.
nl_config_verify -v -v my.spec < ./path/to/my.conf
Below is an example of a specification file for the nl_parser configuration. Its syntax is valid, but the contents may have drifted out of date.
%spec static
[global]
files_root = path
state_file = path
tail = bool
output_file = path
[parsers]
files = path
[[*]]
__ANY__ = str
[logging]
[[loggers]]
[[[*]]]
level = enum ERROR WARN INFO DEBUG TRACE
handlers = str
qualname = str
propagate = int
[[required_handlers]]
[[[h1]]]
level = enum ERROR WARN INFO DEBUG TRACE
handlers = str
class = str
args = str

%spec dynamic
[global]
files_root = path
state_file = path
tail = bool
output_file = path
[parsers]
files = str
pattern = str
[[bp]]
[[[match]]]
app = str
[[[parameters]]]
has_gid = bool
[logging]
[[loggers]]
[[[netlogger]]]
level = enum ERROR WARN INFO DEBUG TRACE
handlers = str
qualname = str
propagate = int
[[handlers]]
[[[h1]]]
level = enum ERROR WARN INFO DEBUG TRACE
handlers = str
class = str
args = str

%rule static_rule (%static)
%rule dynamic_rule (%dynamic)
%apply any static_rule dynamic_rule
EXIT STATUS
nl_config_verify returns zero if all validations succeeded, a positive number less than 255 if one or more configuration files failed to validate (equal to the smaller of the number that failed and 254), and 255 if there was some other error like a non-existent file or invalid specification syntax.
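This exit-status rule can be restated as a small function (a hypothetical restatement for illustration, not the tool's source):

```python
def exit_status(num_failed, other_error=False):
    """Map validation results onto nl_config_verify's documented exit codes."""
    if other_error:
        return 255               # e.g. non-existent file or bad specification
    if num_failed == 0:
        return 0                 # all configuration files validated
    return min(num_failed, 254)  # capped so failure counts never collide with 255
```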
BUGS
None known.
RESOURCES
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_cpuprobe(1)
NAME
nl_cpuprobe - Measure CPU availability by active probing.
SYNOPSIS
nl_cpuprobe [options]
DESCRIPTION
Measure CPU availability by periodically spawning a process that spins in a tight loop, and measuring how much CPU it was able to get during that time. In theory, this should be similar to the amount of CPU a user application could claim.
For each probe, output is a line with a single floating-point number representing the estimated available CPU, in the range 0 to 1.
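The probe idea can be sketched in a few lines of Python (spinning in-process rather than in a spawned child, as nl_cpuprobe does, so this is an approximation):

```python
import time

def cpu_probe(millis=100):
    """Spin in a tight loop for `millis` ms of wall-clock time, then compare
    the CPU time actually consumed; the ratio estimates available CPU (0 to 1)."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    while (time.monotonic() - wall_start) * 1000 < millis:
        pass  # busy-wait; competes with other processes for the CPU
    wall = time.monotonic() - wall_start
    cpu = time.process_time() - cpu_start
    return min(cpu / wall, 1.0)

available = cpu_probe(50)
print("%.3f" % available)
```

On an idle machine the estimate approaches 1.0; under load it drops toward 0.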
OPTIONS
- --version: show program's version number and exit
- -h, --help: show this help message and exit
- -m MS, --millis=MS: number of milliseconds out of every second to run the probe (default=100)
- -n NICE, --nice=NICE: nice value to give to the process while probing (default=0)
Logging options:
- -L FILE, --log=FILE: write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME: rotate logs at an interval (<N>d, <N>h, or <N>m)
- -v, --verbose: more verbose logging
- -q, --quiet: quiet mode, no logging
EXAMPLES
To run with spin-interval 50ms and nice value of 0:
$ nl_cpuprobe -m 50
To run, as root, with spin-interval 100ms and nice value of -5:
$ sudo nl_cpuprobe -m 100 -n -5
EXIT STATUS
nl_cpuprobe returns zero on success, non-zero on error.
BUGS
None known.
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_date(1)
NAME
nl_date - Convert floating-point dates to NetLogger string dates, and vice-versa
SYNOPSIS
nl_date [dates…]
DESCRIPTION
This utility converts one or more dates between the number of seconds since the Epoch (1970-01-01 00:00:00 UTC) and the ISO 8601 string representation YYYY-MM-DDThh:mm:ss.ffffffZ, in either direction. The type of a given input is auto-detected. NetLogger's own parsing and formatting routines are used, so this utility doubles as a sanity-check of those functions.
The date to convert is read from the command line, and output is printed to standard output in the form: "input => output". If no date is provided, then the output shows the current date in both formats, with the prefix "now => ".
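The conversion itself can be reproduced with the standard library. This is a sketch only; NetLogger uses its own parsing and formatting routines, and the Z-suffixed UTC format is taken from the examples on this page.

```python
# Sketch of the two conversions nl_date performs (stdlib only).
from datetime import datetime, timezone

def float_to_iso(ts):
    """Seconds-since-Epoch float -> ISO 8601 string in UTC."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"

def iso_to_float(s):
    """ISO 8601 UTC string -> seconds-since-Epoch float."""
    dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ")
    return dt.replace(tzinfo=timezone.utc).timestamp()

print(float_to_iso(1185733072.567627))               # 2007-07-29T18:17:52.567627Z
print(iso_to_float("2007-07-29T18:17:52.567627Z"))   # ~1185733072.567627
```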
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -u
-
interpret given date or default 'now' as being in UTC (default=False, local timezone).
- -U
-
show result in UTC (default=False, local timezone)
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
EXAMPLES
To print out the current date in both formats:
$ nl_date
now => 2008-09-24T20:17:40.594915-08:00 => 1222316260.594915
To convert a floating-point date to a string:
$ nl_date -s 1185733072.567627
1185733072.567627 => 2007-07-29T18:17:52.567627Z
To convert a string date to a floating-point date:
$ nl_date -d 2007-07-29T18:17:52.567627Z
2007-07-29T18:17:52.567627Z => 1185733072.567627
EXIT STATUS
nl_date always returns zero (success). If an argument is not understood, it simply prints the current date.
BUGS
None known.
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_dup(1)
NAME
nl_dup - Count duplicate lines in a file
SYNOPSIS
nl_dup [file]
DESCRIPTION
This utility counts the number of duplicated lines in a log file. A line is considered duplicated if it has the same MD5 hash as a previously seen line.
A simple report at the end tells how many unique, total, and duplicated lines were in the file.
Each line is hashed and the hash digest is stored in a dictionary, so memory usage grows with the number of unique lines; very large files can consume correspondingly large amounts of memory.
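The approach amounts to a few lines of Python (a minimal sketch of the technique, not nl_dup's code): hash each line with MD5 and keep the digests in a set.

```python
# Sketch of nl_dup's duplicate counting via MD5 digests.
import hashlib

def count_dups(lines):
    """Return (unique, total, duplicated) counts for an iterable of lines."""
    seen = set()
    total = dups = 0
    for line in lines:
        total += 1
        digest = hashlib.md5(line.encode("utf-8")).digest()
        if digest in seen:
            dups += 1
        else:
            seen.add(digest)
    return total - dups, total, dups

print(count_dups(["hello", "hello", "goodbye"]))  # (2, 3, 1)
```

Storing the 16-byte digest rather than the full line is what keeps memory proportional to the number of unique lines rather than their total length.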
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -g
-
Show a progress bar
- -o FILE
-
Write unique lines to FILE (default=no)
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
EXAMPLES
To count the number of duplicates from standard input
$ printf "hello\nhello\ngoodbye\n" | nl_dup
2 unique lines out of 3 (1 duplicates)
To count the number of duplicates in a file, with progress
$ nl_write -n 100000 > /tmp/myfile
$ cat /tmp/myfile >> /tmp/my2files
$ cat /tmp/myfile >> /tmp/my2files
$ nl_dup /tmp/my2files -g
100000 unique lines out of 200000 (100000 duplicates)
EXIT STATUS
nl_dup returns zero (success) if the input file can be read, and it is not interrupted with a signal. If the input file cannot be read, it returns 2. If it is interrupted with a signal or by keyboard interrupt, it prints a report of what it knows so far and returns 1.
BUGS
None known.
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_findbottleneck(1)
NAME
nl_findbottleneck - Find bottleneck from NetLogger transfer summary logs.
SYNOPSIS
nl_findbottleneck [options] [log-file]
DESCRIPTION
Determine the bottleneck from NetLogger logs that show the disk and network read and write bandwidths. The input is a NetLogger log, specifically the one produced by NetLogger’s "transfer" API, although in reality the only fields that need to be present are the correct event name (see below) and:
r.s: sum of bytes/sec ratio
nv: number of items in the sum for r.s
The event name is expected to contain one of four values indicating the component being measured: "disk.read", "disk.write", "net.read", or "net.write". As long as this string appears somewhere in the event name, it will be recognized.
The output is the bottleneck, or "unknown". Optionally (with -v), the sorted list of bandwidths is written as well.
Although the options provide for multiple bottleneck algorithms, at present only one is implemented: the "simple" algorithm, which finds the smallest bandwidth and labels it the bottleneck if it is more than 15% smaller than the next smallest. For details see the netlogger.analysis.bottleneck module.
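In Python, the "simple" rule just described might look like the following hypothetical sketch. The field names follow the description above (r.s is a sum of bytes/sec, nv the number of items in that sum); this is not the netlogger.analysis.bottleneck code.

```python
# Sketch of the "simple" bottleneck rule: the smallest mean bandwidth is the
# bottleneck only if it is more than 15% below the next smallest.
def find_bottleneck(sums, threshold=0.15):
    """sums: dict mapping component name -> (r.s, nv)."""
    means = {comp: rs / nv for comp, (rs, nv) in sums.items() if nv}
    ranked = sorted(means.items(), key=lambda kv: kv[1])
    if len(ranked) < 2:
        return "unknown"
    (low_name, low), (_, next_low) = ranked[0], ranked[1]
    if low < next_low * (1.0 - threshold):
        return low_name
    return "unknown"

print(find_bottleneck({"disk.read": (8e8, 10), "net.write": (2e8, 10),
                       "disk.write": (9e8, 10), "net.read": (7e8, 10)}))
# net.write
```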
Note that parse errors in the input files will be silently ignored. If the -d flag is given, then parse errors will show up as debug messages in the log, but they still will not stop the program.
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -a ALG, --algorithm=ALG
-
choose bottleneck algorithm by name (default=simple)
- -d, --debug
-
log debugging information, including parsing errors
- -r, --report
-
print a longer report to the console
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
EXAMPLES
To determine the bottleneck from my_transfer.log:
$ nl_findbottleneck my_transfer.log
EXIT STATUS
nl_findbottleneck returns zero on success, and non-zero on error.
BUGS
None known.
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_findmissing(1)
NAME
nl_findmissing - Find and display "missing" events in NetLogger (CEDPS Best-Practices format) logs.
SYNOPSIS
nl_findmissing [options] [files..]
DESCRIPTION
Read NetLogger logs as input and produce as output any .start/.end events that are missing their matching event. The user specifies which fields of a logged event are used for comparison; this is flexible enough to allow even different event names to be matched to each other.
Logs are read from standard input or a file, and output is written to standard output. Input lines in the log file that are not understood are silently ignored.
The -i/--ids option can be used to specify which fields should be used to match a starting event with its ending event. Optionally, a pattern can be placed before a ':' to filter which events are considered at all. If this option is not provided, then all events are considered and the fields 'event' and 'guid' (i.e. as if the user had specified '-i event,guid') are used to match starting and ending events. This option may be repeated, so that different sets of events can use different sets of identifiers.
There are three output formats (see EXAMPLES):
-
Human-readable
-
Comma-separated values (CSV)
-
Best Practices logging format (BP)
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -i IDS, --ids=IDS
-
Set of identifying fields for a given event pattern, using the syntax: [EVENT_REGEX:]FIELD1,..,FIELDN (default='guid')
- -t FMT, --type=FMT
-
Output type (default=human)
- -p, --progress
-
report progress to stderr
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
EXAMPLES
To process the logs and produce human-readable output:
$ nl_findmissing -t human log2
log2: lala.13 missing end
log2: po.34 missing end
To process the logs and produce CSV output:
$ nl_findmissing -t csv log2
file,event,missing,key
log2,lala.13,end,lala.13/A2C4144D-7684-FA3E-8F5B-F0E34D8BC18E
log2,po.34,end,po.34/6275D71E-D023-A9F6-742E-6512DD90A1F1
To process the logs and produce BP output:
$ nl_findmissing -t log log2
ts=2008-09-25T18:42:13.635438Z event=lala.13.start level=Info guid=A2C4144D-7684-FA3E-8F5B-F0E34D8BC18E nl.missing=end mode=random file=log2 guid=b09f6896-8b41-11dd-964e-001b63926e0d
ts=2008-09-25T18:42:13.635929Z event=po.34.start level=Info guid=6275D71E-D023-A9F6-742E-6512DD90A1F1 nl.missing=end mode=random file=log2 p.guid=A2C4144D-7684-FA3E-8F5B-F0E34D8BC18E guid=b09f6896-8b41-11dd-964e-001b63926e0d
To match events starting with airplane on attributes flightno and airline, and all other events on a combination of country and city:
$ nl_findmissing -t log -i airplane:flightno,airline -i country,city in.log > out.log
EXIT STATUS
nl_findmissing returns zero on success, non-zero on failure.
BUGS
None known.
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_ganglia(1)
NAME
nl_ganglia - Read Ganglia in, write NetLogger out
SYNOPSIS
nl_ganglia [options]
DESCRIPTION
Contact a Ganglia gmetad, parse the returned XML document, and convert the information into NetLogger-formatted output, with one log entry per metric.
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -e REGEX, --filter=REGEX
-
regular expression to use as a filter. This expression operates on the formatted output, i.e. name=value pairs
- -i SEC, --interval=SEC
-
poll interval in seconds (default=run once)
- -m METRICS, --metrics=METRICS
-
set of metrics to display (default=base)
- -o FILE, --output=FILE
-
output file (default=stdout)
- -s SERVER, --server=SERVER
-
gmetad server host (default=localhost)
- -p PORT, --port=PORT
-
gmetad server port (default=8651)
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
EXAMPLES
To contact ganglia on default port and dump one set of default metrics to the console:
$ nl_ganglia
To contact ganglia on server foobar.org once every 15 seconds, and write the subset of returned metrics that contains cpu in the event name to the file /tmp/ganglia.out:
$ nl_ganglia -e event='.*cpu' -s foobar.org -o /tmp/ganglia.out -i 15
EXIT STATUS
nl_ganglia returns zero on success, non-zero on failure.
BUGS
None known.
RESOURCES
Ganglia Monitoring System - http://ganglia.info
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_interval(1)
NAME
nl_interval - Read NetLogger logs as input and output the interval between the .start and .end events.
SYNOPSIS
nl_interval [options] [files..]
DESCRIPTION
Read NetLogger logs as input and produce as output the intervals between .start/.end events. The user specifies which fields of a logged event are used for comparison; this is flexible enough to allow even different event names to be matched to each other.
Logs are read from standard input or a file, and output is written to standard output. Input lines in the log file that are not understood are silently ignored.
The -i/--ids option can be used to specify which fields should be used to match a starting event with its ending event. Optionally, a pattern can be placed before a ':' to filter which events are considered at all. If this option is not provided, then all events are considered and the fields 'event' and 'guid' (i.e. as if the user had specified '-i event,guid') are used to match starting and ending events. This option may be repeated, so that different sets of events can use different sets of identifiers.
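The core of the interval computation can be sketched as follows (illustrative only, not nl_interval's code): remember the timestamp of each .start, and when a matching .end arrives emit the difference. The key here is the default (event, guid) pair; timestamps are seconds-since-Epoch floats as produced by nl_date.

```python
# Sketch: compute .start/.end intervals keyed on (event, guid).
def intervals(entries, ids=("event", "guid")):
    starts, out = {}, []
    for e in entries:
        base, _, suffix = e["event"].rpartition(".")
        key = tuple(base if f == "event" else e.get(f) for f in ids)
        if suffix == "start":
            starts[key] = e["ts"]
        elif suffix == "end" and key in starts:
            out.append((base, e["ts"] - starts.pop(key)))
    return out

entries = [
    {"event": "po.13.start", "guid": "1C74", "ts": 100.000000},
    {"event": "po.13.end", "guid": "1C74", "ts": 100.000041},
]
print(intervals(entries))
```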
There are three output formats (see EXAMPLES):
-
Human-readable
-
Comma-separated values (CSV)
-
Best Practices logging format (BP)
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -d, --duplicates
-
Allow duplicate start events without end events, or end events without a start, and match them in FIFO order. Default is to drop old .start or .end events when new ones come in
- -c COLUMNS, --columns=COLUMNS
-
For type 'csv', comma-separated list of additional columns that should be in the output
- -g, --progress
-
report progress to stderr
- -i IDS, --ids=IDS
-
Set of identifying fields for a given event pattern, using the syntax: [EVENT_REGEX:]FIELD1,..,FIELDN (default=.*:event,guid). May be repeated.
- -n NBINS, --nbins=NBINS
-
For --type=hist, number of histogram bins. The default is to automatically choose the number of bins using the standard 'Scott' formula
- -r, --ordered
-
Process data in file order: drop duplicate ends, replace duplicated starts
- -s FILE, --save-file=FILE
-
Write unfinished events to FILE (default=drop them)
- -t FMT, --type=FMT
-
Output type (default=csv). Choices are: csv=Comma-separated values, log=NetLogger log format, hist=Histogram
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
EXAMPLES
Process in.log and produce human-readable output:
$ nl_interval < in.log
lala.24 0.000059
po.13 0.000041
tinkywinky.81 0.000039
tinkywinky.55 0.000042
Process in.log and produce CSV output:
$ nl_interval -t csv < in.log
event,key,interval_sec
lala.24,lala.24/C24391AA-4D28-78B1-D59C-9C96627F256F,0.000059
po.13,po.13/1C746366-6C8A-3238-7CF2-313C417ECF96,0.000041
tinkywinky.81,tinkywinky.81/31A15BAD-4AEE-1E63-7ACD-C6EB8CF8547B,0.000039
tinkywinky.55,tinkywinky.55/9A16401D-5643-69BF-DFE9-A95692A349A4,0.000042
Process in.log and produce log output:
$ nl_interval -t log < in.log
ts=2008-09-25T18:42:13.636326Z event=lala.24.intvl level=Info status=0 guid=C24391AA-4D28-78B1-D59C-9C96627F256F nl.intvl=0.000059 mode=random p.guid=6275D71E-D023-A9F6-742E-6512DD90A1F1
ts=2008-09-25T18:42:13.636653Z event=po.13.intvl level=Info status=0 guid=1C746366-6C8A-3238-7CF2-313C417ECF96 nl.intvl=0.000041 mode=random p.guid=6275D71E-D023-A9F6-742E-6512DD90A1F1
ts=2008-09-25T18:42:13.636927Z event=tinkywinky.81.intvl level=Info status=-1 guid=31A15BAD-4AEE-1E63-7ACD-C6EB8CF8547B nl.intvl=0.000039 mode=random p.guid=6275D71E-D023-A9F6-742E-6512DD90A1F1
ts=2008-09-25T18:42:13.637220Z event=tinkywinky.55.intvl level=Info status=0 guid=9A16401D-5643-69BF-DFE9-A95692A349A4 nl.intvl=0.000042 mode=random p.guid=6275D71E-D023-A9F6-742E-6512DD90A1F1
Match events starting with airplane on flightno and all other events on a combination of country and city.
$ nl_interval -i airplane:flightno -i country,city < in.log > out.log
EXIT STATUS
nl_interval returns zero on success and non-zero on failure.
BUGS
None known.
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_load(1)
NAME
nl_load - Process incoming streams of NetLogger formatted data.
SYNOPSIS
nl_load {-a HOST | -c HOST | -f FILE} module_name [option=value..] [prefix1 prefix2 ..]
DESCRIPTION
This program processes streams of NetLogger (a.k.a. Best-Practices) formatted data. It may transform the data to a different file format (CSV, for example), load the data into a database, or act as a filtering mechanism based on client defined needs. Processing logic is encapsulated in “analysis modules”, which are Python modules that follow some simple conventions. The framework can stream input to these modules from one of standard input, a file, the NetLogger broker (nl_broker), or an AMQP broker such as RabbitMQ.
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -c HOST, --host=HOST
-
Connect to NetLogger info-broker at HOST (default=localhost)
- -f FILE, --infile=FILE
-
Read NetLogger logs from FILE (default=stdin)
- -g, --progress
-
report progress to stderr
- -i, --info
-
Print information on selected module
- -l, --list
-
List available modules
- -M FILE, --module-opt=FILE
-
Read module options from a file with one name=value pair per line (default=No file; use command-line)
- -p PORT, --port=PORT
-
For info_broker or amqp server, the port to connect to (default=info_broker 15380, amqp broker 5672)
- -r SEC, --reconnect=SEC
-
If connection to broker at HOST fails, try again every SEC seconds (default=10). 0=don't retry
- -t, --tail
-
With -f, tail the file instead of stopping at EOF
AMQP-specific options:
- -a HOST, --amqp-host=HOST
-
Connect to AMQP server at HOST (default=127.0.0.1)
- -A name=val|:file, --amqp_option=name=val|:file
-
AMQP options; repeatable. Known options: auto_delete (delete queues/exchanges when done), durable (save messages to disk), exchange (exchange name), exchange_type (direct, fanout, or topic), insist (no redirect), pw (password), queue (queue name), route (routing key, @event to use event), user (user name), vhost (virtual host). May also be of the form ':<filename>', e.g. ':/tmp/passwd', which reads the options from a file with one name=value pair per line.
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
USAGE
The nl_load program consists of two parts: the main program, nl_load, and the analysis modules. Every invocation of nl_load uses one analysis module to process the log. Inputs are selected by providing one of the -c/--host, -f/--infile, or -a/--amqp-host options. To write your own module, see the existing modules under netlogger/analysis/modules.
SIGNALS
-
SIGTERM, SIGINT, SIGUSR2: Terminate gracefully
EXAMPLES
No-op: load with the "bp" loader from standard input to standard output.
$ nl_load bp < infile > outfile
To invoke nl_load, attach to a nl_broker process on the local machine, load the csv processing module and output the transformed data to a file:
$ nl_load -c localhost csv > bp_outfile.csv
Load data from an AMQP broker, with given exchange and queue, into MongoDB.
$ nl_load -a my.data.broker -A exchange=myex -A queue=bpdata mongodb database=mydb collection=mycollection host=my.db.host
Load data from an AMQP broker, configured from a file, into MongoDB, also configured from a file.
$ nl_load -a my.data.broker -A :/tmp/amqp.conf -M /tmp/mongo.conf mongodb
BUGS
None known.
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_notify(1)
NAME
nl_notify - Run a command and notify by email if it fails.
SYNOPSIS
nl_notify [options] command args..
DESCRIPTION
Runs a given command with its arguments. If return status from the command is non-zero, send the standard output and standard error, with an appropriate subject line, to the provided email address. If the return status from the command is zero, do nothing.
Email is sent by default to localhost, port 25. Values for the "From:" and "To:" fields must be provided by the user.
Note: If the command's arguments include a dash, they need to be quoted (you can quote the whole command if you want).
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -b SUBJECT, --subject=SUBJECT
-
Email subject (default=Error on %host from '%prog')
- -f user@host, --from=user@host
-
Set 'From:' to user@host (required)
- -g, --nagios
-
Nagios mode. Put first line of standard output in '%status'. Add this to default subject line (default=No)
- -n, --test
-
Print to stdout instead of sending email
- -p SERVER_PORT, --port=SERVER_PORT
-
SMTP server port (default=25)
- -s HOST, --server=HOST
-
SMTP server host (default=localhost)
- -t user@host, --to=user@host
-
Set 'To:' to user@host (required)
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
EXAMPLES
To write what would have happened to standard output:
$ nl_notify --from user@somehost.com --to user@otherhost.org --test /usr/bin/false
Connect to localhost:25
To: user@otherhost.org
From: user@somehost.com
Subject: Error on 192.168.1.101 (Macintosh-8.local) from '/usr/bin/false'
Output from '/usr/bin/false':
-- stdout --
-- stderr --
To run nl_check_pipeline in “nagios mode”, which allows you to include the status in the subject line:
$ nl_notify -b "Hey: %host says \'%status\'" \
    -f user@somehost.org -t user@otherhost.com \
    -g -p 9999 nl_check_pipeline
Subject: Hey: 192.168.1.101 (Macintosh-8.local) says 'CRITICAL: 3 components not running'
Output from '../../scripts/nagios/nl_check_pipeline':
-- stdout --
CRITICAL: 3 components not running
-- stderr --
EXIT STATUS
nl_notify returns zero on success, nonzero on an error.
BUGS
None known.
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_parse(1)
NAME
nl_parse - Read from a variety of log formats, reformat to NetLogger best-practices format, and send the results to a file or information broker (nl_broker).
SYNOPSIS
nl_parse [options] module [params..] [files..]
DESCRIPTION
This program converts from known log formats to NetLogger (a.k.a. CEDPS Best-Practices) format and sends the results to a file, stdout, the NetLogger information broker (nl_broker), or an AMQP broker. There are a number of built-in parsers, which may be listed by invoking with the -l/--list flag. The nl_parse program can operate on a single file or on a directory of files (with filename pattern-matching), can rescan a directory for new files, and can tail files.
This program can either be invoked for a "single run" to process a file or number of files, or can be run in the background to watch new and/or changing input files.
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -c HOST, --broker=HOST
-
Write parsed data to NetLogger broker at HOST (default port=15380)
- -d, --amqp_disconnect
-
send disconnect message to AMQP server when done. No effect if not used with -a.
- -f INTERVAL, --flush=INTERVAL
-
Flush output file after INTERVAL seconds of inactivity (default=1)
- -g, --progress
-
report progress to stderr
- -i, --info
-
Print information on selected module
- -l, --list
-
List available modules
- -o FILE, --output=FILE
-
Write NetLogger logs to FILE (default=stdout)
- -O FILE, --offset-file=FILE
-
Load/maintain file offsets in FILE, so that subsequent runs don't process duplicate data (default=none)
- -p PORT, --port=PORT
-
For info_broker or amqp server, the port to connect to (default=info_broker 15380, amqp broker 5672)
- -r SEC, --reconnect=SEC
-
If connection to broker at HOST fails, try again every SEC seconds (default=10). 0=don't retry
- -s SEC, --rescan=SEC
-
Rescan directory for files matching the input patterns every SEC seconds (default=10)
- -t, --tail
-
Tail input files instead of stopping at EOF
AMQP-specific options:
- -a HOST, --amqp-host=HOST
-
Connect to AMQP server at HOST (default=127.0.0.1)
- -A name=val|:file, --amqp_option=name=val|:file
-
AMQP options; repeatable. Known options: auto_delete (delete queues/exchanges when done), durable (save messages to disk), exchange (exchange name), exchange_type (direct, fanout, or topic), insist (no redirect), pw (password), queue (queue name), route (routing key, @event to use event), user (user name), vhost (virtual host). May also be of the form ':<filename>', e.g. ':/tmp/passwd', which reads the options from a file with one name=value pair per line.
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
SIGNALS
-
SIGTERM, SIGINT, SIGUSR2 - Terminate gracefully
EXAMPLES
Parse a directory (“logdir”) of gridftp formatted logs (load gridftp parser module as an arg) that end with a .log extension, and send the best-practices formatted data to an information broker running on localhost.
$ nl_parse -c localhost gridftp "logdir/*.log"
Similar to previous example, but watch a directory of best-practices formatted logs (bp parser module), re-scan the directory every 30 seconds looking for new logs, tail the log files rather than stopping at EOF, and write the results out to an output file.
$ nl_parse -o output.bp -t bp -s 30 "logdir/*.log"
Parse all the log files under directories "a", "b", and "c", and send them to an AMQP broker on some.host.org to an exchange named "logs" with routing key "log.gridftp". Also save the position in each file in the file "/tmp/offsets.dat" so that subsequent runs won’t re-send old data.
$ nl_parse -O /tmp/offsets.dat -t -a some.host.org -A exchange=logs -A route=log.gridftp gridftp hostnames=yes "{a,b,c}/*.log"
BUGS
None known.
RESOURCES
ConfigObj home page - http://www.voidspace.org.uk/python/configobj.html
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_view(1)
NAME
nl_view - Re-format NetLogger logs.
SYNOPSIS
nl_view [options] [files..]
DESCRIPTION
Reformats the semi-structured keyword and value pairs of the NetLogger format for readability or importing into Excel, R, or other programs that require tabular data.
The time and event are always shown, although the time can be formatted either as an absolute ISO timestamp (the default), or as a number of seconds since the first or previous event. An arbitrary prefix can be stripped from event names (names without that prefix are of course left alone).
The default delimiter between columns is a space, but this can be changed to make, e.g., comma-separated values. Currently no quoting is done.
Special support for identifiers is provided with the -t/--tiny-id option, which replaces the value of the identifier with a short (4-character) locally unique value. This value is random, but the seed is always the same and the algorithm is deterministic, so the chosen value will be the same for successive invocations.
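One way to get deterministic short ids like this is a fixed-seed PRNG that assigns a 4-character name the first time each identifier is seen. This is an illustration of the idea, not NetLogger's actual algorithm.

```python
# Sketch of deterministic "tiny id" assignment: a fixed seed means repeated
# runs over the same input assign the same short names in the same order.
import random
import string

def tiny_id_map():
    rng = random.Random(42)  # fixed seed => reproducible across runs
    mapping = {}
    def tiny(value):
        if value not in mapping:
            mapping[value] = "".join(rng.choice(string.ascii_lowercase)
                                     for _ in range(4))
        return mapping[value]
    return tiny

tiny = tiny_id_map()
a = tiny("A2C4144D-7684-FA3E-8F5B-F0E34D8BC18E")
assert a == tiny("A2C4144D-7684-FA3E-8F5B-F0E34D8BC18E")  # stable within a run
```

Because the seed and the algorithm are fixed, a fresh map fed the same identifiers in the same order reproduces the same short names, which is the property the -t/--tiny-id option relies on.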
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -a ATTR, --attr=ATTR
-
add attribute ATTR to output line, repeatable
- -A, --all
-
add all attributes to output line
- -c, --cum-delta
-
show times as deltas since the first event (default=False)
- -d, --delta
-
show times as deltas from the previous event (default=False)
- -D DELIM, --delimiter=DELIM
-
column delimiter (default=' ')
- -e, --long
-
Break each attribute onto its own line. Overrides other formatting options and implies '-A'.
- -g, --guid
-
add 'guid' attribute
- -H, --header
-
add header row (default=False)
- -i, --host
-
add 'host' attribute
- -I, --identifiers
-
add any attribute ending in '.id'
- -l, --level
-
add 'level' attribute
- -m
-
add 'msg' attribute
- -n PREFIX, --namespace=PREFIX
-
strip namespace PREFIX if found
- -N, --no-names
-
Do not show attribute names
- -s, --status
-
add 'status' attribute
- -t, --tiny-id
-
replace *.id and guid values with shorter IDs, like tinyurl
- -w NUM, --width=NUM
-
set event column width to NUM (default=40)
- -x
-
ignore non-NetLogger lines
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
EXAMPLES
To put the viewer in a pipeline between the application and a pager:
$ my-application | nl_view -gi | less
To run the viewer on a bunch of files, showing some user-defined attributes:
$ nl_view -a foo -a bar *.log > combined.log
To run the viewer so that it displays time-deltas, guid, event name with a prefix stripped, and any "identifier" attributes (this particular set of values is useful for the Globus 4.2 containerLog):
$ nl_view -diIgmt --namespace=org.globus. containerLog
EXIT STATUS
Always succeeds, returning 0.
BUGS
None known.
RESOURCES
Apache Common Log Format - http://httpd.apache.org/docs/2.2/logs.html
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_wflowgen(1)
NAME
nl_wflowgen - Generate simulated workflow logs.
SYNOPSIS
nl_wflowgen [options]
DESCRIPTION
Generate random workflow logs in BP (NetLogger) format.
Two distinct types of simulated workflows can be generated. The random workflow is simply a random tree of events, linked together with GUIDs. The globus workflow approximates, but does not exactly reproduce, the logs from a Globus (GT4.2+) job submission.
How deeply workflows are nested is determined by the --mindepth and --maxdepth options, whereas the probability that the next event in any given workflow will be nested (if allowed by the min/max depth) is controlled by the --nest option.
Each ending event for a workflow has an associated status attribute. The probability of that being non-zero, i.e. indicating failure, is controlled with the --fail option.
OPTIONS
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -m MODE, --mode=MODE
-
Run mode (default=random). Modes: 'random' = a random workflow 'tree'; 'globus' = Globus job submit
- -o OFILE, --output=OFILE
-
output filename. use stdout if not given
- --num=NUM
-
[random, globus] number of events, total (default=100)
- --mindepth=MIN_DEPTH
-
[random] minimum number of nested events in a workflow (default=1)
- --maxdepth=MAX_DEPTH
-
[random] maximum number of nested events in a workflow (default=5)
- --fail=FAIL
-
[random] probability of failure for a .end event (default=0.1)
- --nest=NEST
-
[random] probability of nesting events, at any point (default=0.5)
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
EXAMPLES
To produce a default random workflow to standard output:
$ nl_wflowgen
To produce a default globus workflow to standard output:
$ nl_wflowgen -m globus
EXIT STATUS
Returns zero on success, non-zero on error.
BUGS
None known.
RESOURCES
The Globus Alliance - http://www.globus.org
AUTHOR
Dan Gunter <dkgunter@lbl.gov>
............................................................
nl_write(1)
NAME
nl_write - Write a NetLogger-formatted message.
SYNOPSIS
nl_write [options] name=value..
DESCRIPTION
Write one NetLogger-formatted message to standard output, TCP, or UDP. Any number of name=value pairs can be given as arguments. These will be copied to the output along with the standard values of ts=<timestamp> and event=<event_name>, to form a properly formatted log message.
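Assembling such a message is straightforward; here is a hedged Python sketch (the attribute order and the Z-suffixed timestamp format are assumptions based on the log samples elsewhere in this manual, not nl_write's code):

```python
# Sketch: build one BP-format log line with ts and event first, then any
# user-supplied name=value pairs.
from datetime import datetime, timezone

def bp_line(event, **attrs):
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"
    pairs = ["ts=" + ts, "event=" + event]
    pairs += ["%s=%s" % (k, v) for k, v in attrs.items()]
    return " ".join(pairs)

print(bp_line("myapp.task.start", foo="12345", level="Info"))
```

Note this naive join does no quoting of values containing spaces; nl_write's own formatting handles the full message syntax.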
OPTIONS
Note: single-letter options in upper-case control how things are logged, whereas lower-case options control what is logged.
- --version
-
show program's version number and exit
- -h, --help
-
show this help message and exit
- -g, --guid
-
add guid=GUID to message. This is overridden by an explicit guid=GUID argument.
- -i, --ip
-
add 'host=IP' to message. This is overridden by an explicit host=HOST argument.
- -n NUM, --num=NUM
-
Write NUM messages, each with n=<1..NUM> in them (default=1)
- -H HOST, --host=HOST
-
for UDP/TCP/AMQP, the remote host (default=localhost)
- -P PORT, --port=PORT
-
For UDP/TCP/AMQP, the port to write to (default=UDP 514, TCP 14380, AMQP 5672)
- -S, --syslog
-
add a header for syslog (default=False unless -U is given, then True)
- -T, --tcp
-
write message to TCP (default port=14380)
- -U, --udp
-
write message to UDP (default port=514)
- -A, --amqp
-
write message to AMQP server (default port=5672)
- -D, --amqp_disconnect
-
send disconnect message to AMQP server when done. No effect if not used with -A.
- -O name=val, --amqp_option=name=val
-
Optional argument to feed name/value options to the AMQP connection/producer (repeatable: -O user=foo -O pw=bar)
Logging options:
- -L FILE, --log=FILE
-
write logs to FILE (default=stderr)
- -R TIME, --logrotate=TIME
-
rotate logs at an interval (<N>d or <N>h or <N>m)
- -v, --verbose
-
more verbose logging
- -q, --quiet
-
quiet mode, no logging
EXAMPLES
To write the default message:
$ nl_write
To write a message with a host, guid, and attributes foo and bar:
$ nl_write -g -i foo=12345 bar='hello, world'
To write a syslog-formatted message to the standard syslog UDP port (514):
$ nl_write -g -U msg='hello, world'
EXIT STATUS
Returns zero on success, non-zero on error.
BUGS
The host option always uses the default interface.
There is no way to write a message with a user-defined timestamp; the time is always "now".
AUTHOR
Dan Gunter <dkgunter@lbl.gov>