Plumber Jack: December 2010

Wednesday 29 December 2010

Colourising logging output in terminals

A nice-to-have feature when logging to terminals is to have the output colourised, typically according to the severity of individual messages. This is fairly straightforward to achieve, but the following points have to be borne in mind:

On Unix, Linux and OS X, terminals support ANSI escape codes for colourisation, and so a single solution should be able to support all of these platforms.
Windows does not support ANSI escape codes (though MS-DOS used to, using the ANSI.SYS driver), but does provide an API which can be used to achieve a similar effect.
When the terminal output is redirected to a file, colourisation should be disabled.

In this post, I’ll present a fairly simple handler which considers each of the above points in its design and implementation. I’ll be adding this handler to the logutils package soon.

We’ll assume that an entire formatted message will be colourised. If you need to colourise only certain parts of the message, you can adapt the solution presented here fairly easily – perhaps by adding a specialised formatter which is aware of ANSI escape codes.

We’ll also assume that we only want to be able to set background colour, foreground colour and foreground intensity, and that we’ll only want to set eight colours – red, green, blue, cyan, magenta, yellow, black and white. That’s more or less the lowest common denominator afforded us by ANSI escape sequences. If you have more specialised needs which are supported by your environment, you should be able to adapt the solution presented here, most likely just by subclassing.

It seems appropriate to base the handler on logging.StreamHandler. In addition to importing logging, we’ll need to import os (so that we can take appropriate action on Windows) and ctypes (so we can access the Windows console API needed to colourise terminal output).

This is how we start the handler implementation:

import ctypes
import logging
import os

class ColorizingStreamHandler(logging.StreamHandler):

    @property
    def is_tty(self):
        isatty = getattr(self.stream, 'isatty', None)
        return isatty and isatty()

    def emit(self, record):
        try:
            message = self.format(record)
            stream = self.stream
            if not self.is_tty:
                stream.write(message)
            else:
                self.output_colorized(message)
            stream.write(getattr(self, 'terminator', '\n'))
            self.flush()
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

Noting that we need to determine when terminal output is piped to a file, we provide a property called is_tty on the handler which says whether the handler’s stream is a terminal. If it is, we’ll use a method called output_colorized (whose implementation will be platform dependent, and discussed shortly) to do the colourised output; otherwise, we just write the message to the stream.

In order to add colour to the message, we’ll override the format method of the handler:

    def format(self, record):
        message = logging.StreamHandler.format(self, record)
        if self.is_tty:
            # Don't colorize any traceback
            parts = message.split('\n', 1)
            parts[0] = self.colorize(parts[0], record)
            message = '\n'.join(parts)
        return message

We first call the base class format method to do the actual formatting according to whatever formatter is set for the handler. If we are writing to a terminal, we extract the first line of the formatted text (which could contain an exception traceback), colourise that first line and add the traceback without any colouring.

I did consider putting the colourising logic into a specialised Formatter subclass, but decided against it, because:

Colourising is only done when writing to a terminal; a formatter would be unaware of specific output destinations.
Placing the colourising logic in a handler would allow a formatter to be shared with other handlers.
It’s simpler to keep everything in one place.

To do the grunt work, we’ll define the colorize helper method together with some data which drives it. We’ll make it the method’s job to add ANSI escape sequence to colour the message, even though Windows does not support ANSI escape sequences natively - we’ll just have to deal with it in the implementation of output_colorized, which we’ll come to a little later.

Let’s discuss ANSI escape sequences briefly now – for more information you can consult this article. Basically, colourising is done by placing a special character sequence in the character stream sent to the terminal. The sequence always starts with an ESC (ASCII 27) followed by a left bracket character ([). The sequence always ends with an ‘m’. In between is one or more integer parameters (as printed characters in decimal format, separated by semicolons). These parameters have the following meanings:

0 – reset background and foreground colours and intensity to the defaults.
1 – set the foreground intensity on.
between 30 and 37 – set the foreground colour to one of the values in this table, after subtracting 30 from the parameter.
between 40 and 47 - set the background colour to one of the values in this table, after subtracting 40 from the parameter.

There are other parameters with other meanings in the ANSI specification, but we won’t concern ourselves with those as they aren’t relevant to our simple colourisation scheme.

So, we use a simple color_map dictionary that allows mapping colour names to integer colour values. We also use a level_map dictionary which maps logging levels to a tuple encoding the background colour, foreground colour (colour names) and intensity (a Boolean). If no background colour is specified (None) , the default is used. Because Windows terminals usually have a dark background and Linux terminals a light background, the level_map is defined slightly differently on Windows. We also define the escape sequence beginning (called the CSI for Control Sequence Introducer/Initiator), and the reset sequence.

    # color names to indices
    color_map = {
        'black': 0,
        'red': 1,
        'green': 2,
        'yellow': 3,
        'blue': 4,
        'magenta': 5,
        'cyan': 6,
        'white': 7,
    }

    #levels to (background, foreground, bold/intense)
    if os.name == 'nt':
        level_map = {
            logging.DEBUG: (None, 'blue', True),
            logging.INFO: (None, 'white', False),
            logging.WARNING: (None, 'yellow', True),
            logging.ERROR: (None, 'red', True),
            logging.CRITICAL: ('red', 'white', True),
        }
    else:
        level_map = {
            logging.DEBUG: (None, 'blue', False),
            logging.INFO: (None, 'black', False),
            logging.WARNING: (None, 'yellow', False),
            logging.ERROR: (None, 'red', False),
            logging.CRITICAL: ('red', 'white', True),
        }
    csi = '\x1b['
    reset = '\x1b[0m'

    def colorize(self, message, record):
        if record.levelno in self.level_map:
            bg, fg, bold = self.level_map[record.levelno]
            params = []
            if bg in self.color_map:
                params.append(str(self.color_map[bg] + 40))
            if fg in self.color_map:
                params.append(str(self.color_map[fg] + 30))
            if bold:
                params.append('1')
            if params:
                message = ''.join((self.csi, ';'.join(params),
                                   'm', message, self.reset))
        return message

Once the mappings and constants have been defined, the colorize method is very simple. If the record’s level is in the level map, we get the tuple for background and foreground colours and intensity, get the appropriate commands and construct the escape sequence. We return the passed in message, bracketed by the initial escape sequence which encodes the appropriate colours and intensity for the record’s level, and the final escape sequence which resets the terminal to default colours and intensity.

The implementation thus far adds ANSI escape sequences to messages to colourise them. These are natively supported on Unix, Linux and OS X, but not so on Windows. We therefore define the output_colorized message, which is called from emit, differently on NT and non-NT platforms. The non-NT implementation is very simple: assuming ANSI support is native, we just write the colourised message to the output stream. For Windows, we need to interpret the ANSI escape sequences and implement them with Win32 API calls. Here are the implementations:

    if os.name != 'nt':
        def output_colorized(self, message):
            self.stream.write(message)
    else:
        import re
        ansi_esc = re.compile(r'\x1b\[((\d+)(;(\d+))*)m')

        nt_color_map = {
            0: 0x00,    # black
            1: 0x04,    # red
            2: 0x02,    # green
            3: 0x06,    # yellow
            4: 0x01,    # blue
            5: 0x05,    # magenta
            6: 0x03,    # cyan
            7: 0x07,    # white
        }

        def output_colorized(self, message):
            parts = self.ansi_esc.split(message)
            write = self.stream.write
            h = None
            fd = getattr(self.stream, 'fileno', None)
            if fd is not None:
                fd = fd()
                if fd in (1, 2): # stdout or stderr
                    h = ctypes.windll.kernel32.GetStdHandle(-10 - fd)
            while parts:
                text = parts.pop(0)
                if text:
                    write(text)
                if len(parts) > 4:
                    params = parts[0]
                    parts = parts[4:]
                    if h is not None:
                        params = [int(p) for p in params.split(';')]
                        color = 0
                        for p in params:
                            if 40 <= p <= 47:
                                color |= self.nt_color_map[p - 40] << 4
                            elif 30 <= p <= 37:
                                color |= self.nt_color_map[p - 30]
                            elif p == 1:
                                color |= 0x08 # foreground intensity on
                            elif p == 0: # reset to default color
                                color = 0x07
                            else:
                                pass # error condition ignored
                        ctypes.windll.kernel32.SetConsoleTextAttribute(h, color)

Because the integer values for ANSI colour codes differ slightly from those for Windows, we use an nt_color_map dictionary to map between the two. We use regular expressions to split the colourised message into escape sequences and literal text to be output. (Note that because there are four groups in the regular expression, the list will contain four entries for each ANSI sequence found.)

In order to colourise the text in the terminal, we need to obtain a Windows console handle for the output stream, using the GetStdHandle Win32 API. If this handle is not obtainable, literal text is output but the ANSI escape sequences are ignored. If the handle is obtained, then the ANSI sequences are converted to the corresponding Win32 attribute values and the SetConsoleTextAttribute API is used to write them to the console.

The whole thing is available a ready-to-run form in this gist, which should run on all Pythons >= 2.5 (and Python 2.4, too, if you have ctypes installed). Here are some screenshots of running the scripts on Ubuntu, Windows 7, Windows 2008, and Mac OS X, showing the expected results for both terminal output and output piped to a file.

Ubuntu Jaunty:

Windows 7:

Windows 2008 (Python 2.4):

Mac OS X:

If you want different colour settings, you can probably achieve this just by changing the level_map contents. If you want to insert colourisation for just part of your messages, you could change the logic in the colorize and/or format methods. Of course, you can also override output_colorized and colorize if you need to support additional ANSI escape sequences.

Your comments are welcome, particularly any suggestions for improvements. Thanks for reading.

Sunday 19 December 2010

Logging documentation for Python 3.2 reorganised

After comments that the length of logging documentation was making it hard to navigate, and after the addition of a new basic tutorial, the documentation has been reorganised. The library reference is supposed to be just that – a reference – so the tutorials have been moved to the HOWTO section. The configuration and handler documentation has also been moved into separate pages. The organisation of the Python 3.2 logging documentation is now as follows:

http://docs.python.org/dev/library/logging – reference documentation for the logging module
http://docs.python.org/dev/library/logging.config - reference documentation for the logging.config module
http://docs.python.org/dev/library/logging.handlers - reference documentation for the logging.handlers module
http://docs.python.org/dev/howto/logging – basic and advanced logging tutorials
http://docs.python.org/dev/howto/logging-cookbook – how to use logging in different scenarios

There’s a prominent sidebar at the start of the reference documentation pages pointing to the HOWTO sections.

I hope the results are easier to navigate and to understand. Please feel free to give your feedback.

Thursday 9 December 2010

Proposed changes to logging defaults

Following recent discussion on python-dev, some changes to logging’s default behaviour are likely to be implemented soon, perhaps as soon as in 3.2b2 (3.2b1 was released just a few days ago). By default behaviour is meant behaviour in the absence of any explicit logging configuration.

The current behaviour, which dates from logging’s first appearance in Python, is as follows. When a logging event needs to be output, the logging package looks for handlers which have been configured by the application. If no such handlers are found, the package prints a message “No handlers could be found for logger XXX”, naming the logger that was used to log the event. (The message is only ever printed once, even if subsequent logging calls are made. Furthermore, the message is only printed when logging.raiseExceptions is True, which is the “development” mode of operation for logging. When logging.raiseExceptions is False, the “production” mode of operation for logging, the message isn’t printed, even once.)

This creates some problems when an application (or simple script, or interactive session – I’ll use script to mean any of these) meets the following criteria:

It uses a library which does logging under the covers.
It doesn’t configure any handlers.
The library code invoked logs events of severity WARNING or above.

Because the default verbosity for logging is WARNING, the logging package looks for handlers, and, not finding any, prints the one-off “No handlers could be found …” message. This may be a surprise to the author of the script and also for its users; it can generate support calls to the author from the users. The message was provided as a means of warning developers, who would have been new to logging, about a possible misconfiguration. (Of course, when logging was introduced, everyone was new to it.) The importance of the message is less now than it was before, because many people are familiar with how logging works and provide good support on comp.lang.python to people who are newcomers to logging, or who run into issues that are simple to resolve.

To deal with the problem described above, the approach suggested has been for library developers to add a NullHandler instance to their top-level logger. This prevents messages from library code from causing the misconfiguration message to be printed, because a handler — the NullHandler — is always found when handling events from the library. If a script developer adds logging events in their own code which need to be handled, then the misconfiguration message will still appear, and that’s as it should be.

When the logging package was introduced, this was done with a philosophy that logging is an adjunct to normal application activity — meaning that a script should work in exactly the same way with or without the presence of logging. So logging tries to work as unobtrusively as possible, and is set up so that in production use, exceptions that occur in logging (e.g. mismatched format string and parameters) will not be raised, allowing a script to carry on uninterrupted. Also pertinent is the Unix philosophy that verbosity should be kept to a minimum.

The above describes the current state of play, but things are now set to change. The rationale for changing is:

The present behaviour goes against the Zen of Python, which says “Errors should never pass silently. Unless explicitly silenced.”
Python core developers want to be able to use the logging package not just as an adjunct to doing real work in the standard library, but to actually do the real work when it comes to the output of warning and error messages.
In some cases exceptions can’t be usefully raised, and writing out a message is the only way to provide information about an error condition which has been detected. Providing it via logging provides more flexibility in dealing with it than would a bare write to sys.stderr.

These changes will, in due course, bring benefits such as being able to redirect standard library warning and error messages, which are currently written to sys.stderr, to some destination of a developer’s choosing. However, it may cause some inconvenience to existing users, because the proposed behaviour change is not backwards-compatible.

How will the change be implemented? This is still being thought about, but the current front runner is to implement through the idea of a “handler of last resort”. When no handler is found when processing an event which needs to be output, instead of printing the misconfiguration message, the system will use the “last resort” handler to handle the event. The “last resort” handler will be (by default) a StreamHandler using sys.stderr as its stream, with its level set to WARNING. Details about how to preserve existing behaviour are still being worked on; one thought is that you can get existing behaviour by setting the last resort handler to None.

What’s the impact on users of logging?

If you’re using a version of Python earlier than 3.2, there’s no impact. Things will stay as they are.
Script developers: If you configure logging (i.e. add handlers), there should be no impact – the behaviour you see shouldn’t change, as long as there is no situation where a handler isn’t found. If you don’t configure logging, then in place of a misconfiguration message, you’ll see events of severity WARNING and above printed to sys.stderr without any specific formatting (i.e. as if the format string were "%(message)s").
Library developers: If the script using your library configured logging, you should see no change in behaviour. Otherwise: if you added a NullHandler to your library’s top-level logger, you should see no change in behaviour, but remember — your warning and error messages will not be shown to end-users. If you reverse the NullHandler addition or didn’t add it in the first place, then instead of a misconfiguration message, end-users will see your warning and error messages on sys.stderr.
End users: If the script being run configured logging, you should see no change in behaviour. If it didn’t, then in place of a misconfiguration message, you’ll see events of severity WARNING and above printed to sys.stderr without any specific formatting.

I hope I’ve covered everything, but should any kind reader spot some error or omission, please let me know.

Please feel free to comment giving your views on the proposed change, whether you approve of it or whether you think it affects you adversely. Thanks for reading.

Friday 3 December 2010

Getting more control over LogRecord creation

Every logging event is represented by a LogRecord instance. When an event is logged and not filtered out by a logger’s level, a LogRecord is created, populated with information about the event and then passed to the handlers for that logger (and its ancestors, up to and including the logger where further propagation up the hierarchy is disabled). Before Python 3.2, there were only two places where this creation was done:

Logger.makeRecord, which is called in the normal process of logging an event. This invoked LogRecord directly to create an instance.
logging.makeLogRecord, which is called with a dictionary containing attributes to be added to the LogRecord. This is typically invoked when a suitable dictionary has been received over the network (e.g. in pickle form via a SocketHandler, or in JSON form via an HTTPHandler).

This has usually meant that if you need to do anything special with a LogRecord, you’ve had to do one of the following.

Create your own Logger subclass, which overrides makeRecord(), and set it using logging.setLoggerClass before any loggers that you care about are instantiated.
Add a Filter to a logger or handler, which does the necessary special manipulation you need when its filter() method is called.

The first approach would be a little unwieldy in the scenario where (say) several different libraries wanted to do different things. Each would attempt to set its own Logger subclass, and the one which did this last would win. The second approach works reasonably well for many cases, but does not allow you to e.g. use a specialized subclass of LogRecord. Library developers can set a suitable filter on their loggers, but they would have to remember to do this every time they introduced a new logger (which they would do simply by adding new packages or modules and doing logger = logging.getLogger(__name__) at module level). It’s probably one too many things to think about. They could also add the filter to a NullHandler attached to their top-level logger, but this would not be invoked if an application developer attached a handler to a lower-level library logger – so output from that handler would not reflect the intentions of the library developer.

In Python 3.2, LogRecord creation will be done through a factory, which you can specify. The factory is just a callable you can set with logging.setLogRecordFactory(factory), and interrogate with logging.getLogRecordFactory(). The factory is invoked with the same signature as the LogRecord constructor, as LogRecord is the default setting for the factory:

factory(name, lvl, pathname, lineno, msg, args, exc_info, func=None, sinfo=None, **kwargs)

The parameters are as follows:

name – the name of the logger used.
lvl – the numeric value of the event’s level.
pathname, lineno – the location in the source code where the logging call appears.
msg, args – the logging message and arguments.
exc_info – any exception info, or None.
func – the name of the function or method from which the logging call was made.
sinfo – a traceback such as is produced by traceback.print_stack. (This is not the same as the exception traceback.)
kwargs – currently empty, but this could change in future versions of logging.

This approach allows a custom factory to control all aspects of LogRecord creation. For example, you could return a subclass, or just add some additional attributes to the record once created, using a pattern similar to this:

old_factory = logging.getLogRecordFactory()

def record_factory(*args, **kwargs):
    record = old_factory(*args, **kwargs)
    record.custom_attribute = 0xdecafbad 
    return record 
 
logging.setLogRecordFactory(record_factory)

This pattern allows different libraries to chain factories together, and as long as they don’t overwrite each other’s attributes or unintentionally overwrite the attributes provided as standard, there should be no surprises. However, it should be borne in mind that each link in the chain adds run-time overhead to all logging operations, and the technique should only be used when the use of a Filter does not provide the desired result.

These changes have been added to Python 3.2 (i.e. in the py3k branch) and should be in the first beta. Feel free to try them out. Your feedback is, as always, most welcome.

Plumber Jack