Wednesday 7 March 2012

Structured logging, standardised

Moves are afoot in the Linux world to standardise structured logging – logging in such a way that log event data can be parsed in a standard fashion to extract useful information for later analysis. One such project is Fedora’s lumberjack, which aims to implement the specifications proposed in MITRE’s CEE (Common Event Expression). Another Fedora initiative is embodied in the ELAPI project, which covers a slightly wider problem domain.

The basic idea is to format messages using structured formats such as JSON or XML. These messages can be sent conventionally across existing transport mechanisms such as syslog, and the aim is that in due course, structured messages can be routed to structured message stores (e.g. MongoDB databases) rather than plain text files.

Python logging already supports structured logging in a generic way, with third-party handlers available both for RDBMS and MongoDB. However, a key element of the emerging standards is interoperability, so it may be that some Python application developers will need to support these standards at some point.

This support can be provided using a custom formatter class. For example,let’s look at using the CEE standard and assume that JSON will be used as the structured format, and syslog as the transport. A CEE event must contain, as a minimum, the following fields:

Name Description
p_proc The name of the process that produced the event.
p_sys The hostname of the system that generated the event.
time The time the event occurred. This timestamp should have microsecond granularity (1E-6 seconds) and include the timezone GMT offset.

The time format needed is not quite the same as the default provided by logging, but it should be easy enough to provide using the datetime class. The complete CEE schema is quite involved and I won’t attempt to detail it here – you can use the links provided above if you need more information.

Here is an example script showing how you can use a formatter to provide CEE-compliant log output:

#!/usr/bin/env python
import datetime
import json
import logging

import os
import socket
import sys

import pytz

class CEEFormatter(logging.Formatter):
    cee_token = ' cee: '

    def __init__(self, fmt=None, datefmt=None, tz=None):

        # We don't use super because logging classes are
        # old-style classes on 2.x
        logging.Formatter.__init__(self, fmt, datefmt)

        self.hostname = socket.gethostname()
        self.procname = os.path.basename(sys.argv[0])
        self.tz = tz

    def format(self, record):
        result = logging.Formatter.format(self, record)
        structured = self.format_structured(record)
        if structured:
            result += '%s%s' % (self.cee_token, json.dumps(structured))
        return result

    def get_event_time(self, record):
        return datetime.datetime.fromtimestamp(record.created,
                                               tz=self.tz).isoformat()

    def format_structured(self, record):
        result = {
            'Event': {
                'p_proc': self.procname,
                'p_sys': self.hostname,
                'time': self.get_event_time(record)
            },
        }
        return result

def main():
    if len(sys.argv) > 1:
        try:
            tz = pytz.timezone(sys.argv[1])
        except pytz.UnknownTimeZoneError:
            print('Unknown timezone %r, using Europe/London' % sys.argv[1])
            tz = pytz.timezone('Europe/London')
    else:
        tz = pytz.timezone('Europe/London')

    formatter = CEEFormatter('%(message)s', tz=tz)
    record = logging.makeLogRecord(dict(msg='Hello, world!'))

    print(formatter.format(record))

if __name__ == '__main__':
    sys.exit(main())

In the above script, the CEEFormatter class represents just one way in which you can implement a formatter to implement CEE-compatible logging. The CEE structured data is appended to the message, which can (for example) be sent via a SysLogHandler to a local or remote host. The cee_token is a value which indicates to a CEE-aware receiver that the message contains structured data. This is given as "cee:" in the CEE online documentation, but I’ve seen uses of "@cee:" as well. Whatever the standard decides upon, you can set this at a per-instance level for testing etc.

The format method of the CEEFormatter class calls the superclass method to format the message, and then calls format_structured with the record to get a dict indicating what the CEE structure should be. If the method returns a non-empty dict, it is formatted into JSON and appended to the message, preceded by the CEE token. In the above example, the absolutely minimal CEE-compatible message is formatted.

The get_event_time method is provided to factor out the event time format, just in case it has minor format variations you want to support – in which case you need only override this method. The example shows use of pytz to set up timezones, but this may not be needed in your case.

In practice, you can use one of the established methods (extra parameter to logging call, LoggerAdapter or Filter) to pass context information in the LogRecord, which can be used by the format_structured method to add additional, context-dependent information to the structured portion of the message.

If you run the above script (converted into an executable using ln –sT ceefmt.py ceefmt; chmod u+x ceefmt), you might see results similar to the following.

$ ./ceefmt
Hello, world! cee: {"Event": {"time": "2012-03-07T20:21:16.612857+00:00", "p_proc": "ceefmt", "p_sys": "eta-jaunty"}}
$ ./ceefmt Europe/Rome
Hello, world! cee: {"Event": {"time": "2012-03-07T21:21:29.021881+01:00", "p_proc": "ceefmt", "p_sys": "eta-jaunty"}}
$ ./ceefmt America/Chicago
Hello, world! cee: {"Event": {"time": "2012-03-07T14:21:41.745214-06:00", "p_proc": "ceefmt", "p_sys": "eta-jaunty"}}
$ ./ceefmt America/Anchorage
Hello, world! cee: {"Event": {"time": "2012-03-07T11:21:47.686027-09:00", "p_proc": "ceefmt", "p_sys": "eta-jaunty"}}
$ ./ceefmt Pacific/Auckland
Hello, world! cee: {"Event": {"time": "2012-03-08T09:21:57.687873+13:00", "p_proc": "ceefmt", "p_sys": "eta-jaunty"}}

Comments welcome.