Sunday 29 April 2012

Introspecting the logger hierarchy

Brandon Rhodes’ post about his logging_tree project got me thinking about providing this sort of functionality in the logging package. Though introspection of the logger hierarchy has been mentioned a couple of times in newsgroup postings over the years, it has never actually been requested as a fleshed-out feature, and until Brandon’s post, it was below my radar. Now, thanks to him, I have given it some attention.

There are a number of things which need to be considered when providing some way of introspecting the logger hierarchy:

  • In many systems, the logging hierarchy doesn’t change once it’s configured. However, there are also systems where it does – logging provides the facility for on-the-fly reconfiguration without having to restart a Python process. Therefore, a general introspection facility must act as a snapshot – the state of the hierarchy at a particular time when the snapshot was requested.
  • The introspection facility must allow both built-in and user-defined Logger, Handler, Formatter, Filter instances to return useful information.
  • The data returned by the introspection facility should be, as far as practicable, immutable. Users should not expect to get direct access to individual logging elements (as there are already specific APIs for doing that).
  • The introspection facility should perhaps eschew any presentation of the hierarchy; this is likely to be very application-dependent, and it is debatable whether it is worth providing a “default” presentation. Example presentations would be a textual printout (such as logging_tree provides) or a depiction of the hierarchy in a browser-based or desktop GUI.

I’ve made an attempt at addressing the above points when implementing my own take on Brandon’s work. Although I’ve no definite plan to include this functionality in Python 3.3, I’m publishing this post in order to get feedback about the approach used, and also whether something like this belongs in the core package.

The basic API is provided by a single function, get_snapshot(), which returns a dictionary. The keys of that dictionary are logger names, with '' being the key for the root logger. The corresponding values are instances of namedtuple subclasses; they are snapshots of the loggers, handlers, formatters and filters.

Here’s how it might work. Running the code snippet

logging.getLogger('logtree.test')
logging.basicConfig(filename='logtree.log')
result = get_snapshot()
tw = textwrap.TextWrapper()
tw.subsequent_indent = ' ' * 4
for name, info in sorted(result.items()):
    print(tw.fill('%r -> %r' % (name, info)))
    print('-'*40)

would display something like

'' -> logging_RootLoggerInfo(class_='logging.RootLogger', level=30,
    propagate=1, children=(('logtree',
    logging_PlaceHolderInfo(class_='logging.PlaceHolder',
    children=(('logtree.test',
    logging_LoggerInfo(class_='logging.Logger', name='logtree.test',
    level=0, handlers=(), disabled=0, propagate=1, filters=(),
    children=())),))),), filters=(),
    handlers=(logging_FileHandlerInfo(class_='logging.FileHandler',
    encoding=None, level=0, delay=None,
    filename='C:\\temp\\logtree.log', mode='a', filters=(),
    formatter=logging_FormatterInfo(class_='logging.Formatter',
    style='%', datefmt=None,
    format='%(levelname)s:%(name)s:%(message)s')),))
----------------------------------------
'logtree' -> logging_PlaceHolderInfo(class_='logging.PlaceHolder',
    children=(('logtree.test',
    logging_LoggerInfo(class_='logging.Logger', name='logtree.test',
    level=0, handlers=(), disabled=0, propagate=1, filters=(),
    children=())),))
----------------------------------------
'logtree.test' -> logging_LoggerInfo(class_='logging.Logger',
    name='logtree.test', level=0, handlers=(), disabled=0,
    propagate=1, filters=(), children=())
----------------------------------------

Of course, this is very simplified formatting, and can be much improved on, but it gives the gist of what’s being returned. The various classes whose names end in Info are the namedtuple subclasses, and the corresponding instances represent particular logging objects. You can see that specific attributes are displayed; these can be customised quite flexibly.

Customising is done via a configuration dictionary – if you don’t supply one, a default is used. The full definition of get_snapshot() is:

def get_snapshot(config=None):
    return Snapper(config).snapshot()

If no configuration dictionary is specified, a default one – available at Snapper.DEFAULT_CONFIG – is used. You can make a copy of that, add or change elements in your copy, and then pass that in to get_snapshot. (You could also subclass the Snapper class if you really need to, but in most cases, you shouldn’t need to.)

The configuration dictionary currently has two keys - attrmap and format – whose values are dictionaries. The dictionary is used when converting each logging object to its snapshot representation. Each object, and each attribute of that object, is passed recursively through the conversion process. This process works roughly as follows:

  1. A dict instance is converted to another dict instance whose keys and values have passed through the conversion process.
  2. Instances of list and tuple are converted to a tuple instance whose elements are the converted source elements.
  3. Instances of int, float, and str are passed through unchanged. None is passed through unchanged. (If needed, this could also be applied to complex and decimal.Decimal types, but it seems unlikely these would be used in a logging configuration.)
  4. For instances, a qualified class name is computed from instance.__class__.__name__ and instance.__module__ (if present). This is used as the key to the attrmap and format configuration sub-dictionaries. The format dictionary is checked first – if the key is present, the value should be a callable which takes a single argument – the instance – and returns the representation. This is used, for example, to convert internal implementation class instances such as PercentStyle, StrFormatStyle and StringTemplateStyle (in Python 3.2 and later) to the strings '%', '{' and '$'. (The value for config['attrmap']['logging.PercentStyle'] is lambda x: '%'.) If the key is not present in format, then attrmap is checked for the key. If present, the value should be a space-separated list of attribute names to bring into the snapshot; if an attribute name starts with an underscore, you can represent this as e.g. _fmt:format, where the instance attribute in the logging object will be _fmt but the field name in the corresponding namedtuple will be format. (Field names in named tuples can’t start with an underscore.) All named tuple classes have the field name class_ included, which holds the class name of the type represented by the named tuple.

This arrangement allows user-defined handler classes, for example, to be easily accommodated by just adding corresponding entries in the configuration dictionary. A part of the default configuration dictionary is shown below:

DEFAULT_CONFIG = {
    'attrmap': {
        'TextIOWrapper': 'name mode encoding', # for Python 3.x
        'file': 'name mode encoding', # for Python 2.x
        'logging.Formatter': 'datefmt _fmt:format _style:style',
        'logging.Logger': 'disabled filters handlers level name '
                          'propagate children',
        'logging.StreamHandler': 'filters formatter level stream',
        'logging.FileHandler': 'filters formatter level '
            'baseFilename:filename mode encoding delay',
        'logging.handlers.SMTPHandler': 'filters formatter level mailhost '
            'mailport fromaddr toaddrs username password subject secure '
            'timeout',
    },
    'format': {
        'logging.PercentStyle': lambda x : '%',
        'logging.StrFormatStyle': lambda x : '{',
        'logging.StringTemplateStyle': lambda x : '$',
    },
}

Just as an exercise, I coded up an implementation of a presentation which echoes Brandon’s formatting. You can see these in this gist, which contains two files: logtree.py, which contains the introspection implementation, and lttest.py, a simple test which defines a logging configuration similar to Django’s default configuration and prints out representations using Brandon’s logging_tree and the logtree.py implementation. The differences between the representations are not especially significant, and relate to the representation of individual handlers:

SHGHFV~A

Comments are welcome on the approach used, the desirability (or otherwise) of adding the logtree.py functionality to the stdlib, or any other pertinent point.