Tuesday 13 December 2011

Improved flexibility for log file rotation

The RotatingFileHandler and TimedRotatingFileHandler classes provide basic log file rotation functionality, but there are times when more functionality is required than is provided in these classes. One requirement is to compress log files when they are rotated. Currently, if you want this kind of functionality, you have to subclass the relevant handler and override the doRollover method. This works, but requires you to copy and paste the entire method, and then adapt it to your needs; clearly, this is not ideal. An example of this can be found in this ActiveState recipe for a TimedCompressedRotatingFileHandler. This uses the .zip format to compress log files.

In order to provide improved flexibility for log file processing (which will generally be compression, but in practice any transformation could be performed), I propose to add two new methods to these handlers (in their common base class, BaseRotatingHandler).

The first method is named rotate. It takes two arguments, source and dest. The source argument is the name of the log file about to be rotated (e.g. test.log), and the dest argument names the target of the rotation (e.g test.log.1). Under normal circumstances, the file named by dest should not exist, having been explicitly removed by the rollover logic. the default behaviour is what happens currently – a call to os.rename(source, dest).

Since we are potentially transforming log files, we might also want to change their extension, so that they are more usable with e.g. file system viewers like Nautilus or Windows Explorer. To facilitate this, a second method is provided, called filename. This takes a single argument – the filename of a destination file in a rotation (for example, test.log.1)  – and returns a modified version (for example, adding an extension – test.log.1.gz). The default behaviour is to return the name unchanged.

The definition of the new methods is as follows.

    def filename(self, name):

        """
        Modify the filename of a log file when rotating.

        This is provided so that a custom filename can be provided,
        say by adding an extension.

        The default implementation returns the name unchanged.

        :param name: The default name for the destination log file.
        """

    def rotate(self, source, dest):

        """
        When rotating, rotate the current log in the file named by
        source to the file named by dest.

        On entry, the file named by source should exist and the file
        named by dest should not exist.

        On exit, the file named by source should not exist and the

        file named by dest should exist.

        The default implementation simply renames source to dest.

        :param source: The source filename. This is normally the base
                       filename, e.g. 'test.log'
        :param dest:   The destination filename. This is normally
                       what the source is rotated to, e.g. 'test.log.1'.
        """

This will allow a custom rotation strategy to be implemented relatively easily: for example, if you want to gzip rotated log files, you can ensure that filename returns test.log.1.gz when passed test.log.1, and that rotate gzips test.log to test.log.1.gz.

The ActiveState example I cited earlier used the .zip format, but what if you want to use .gz, .bz2, .7z or some other compression algorithm? You would need to support all of them in a single handler, or write multiple handler classes for each format.

To avoid the need for users to subclass the rotating handlers, I further propose to allow configurability of the rotate and filename methods by providing two new handler attributes (both, by default, set to None in BaseRotatingHandler).

The first handler attribute is named rotator. You can set it to a callable that takes two arguments, source and dest. The meanings of these arguments are the same as for the rotate method. If a callable is provided in the rotator attribute, it is called by the rotate method to perform the rotation.

The second handler attribute is called namer. You can set this to a callable which takes a single argument – the filename of a destination file in a rotation (for example, test.log.1)  – and modifies it to, for example, add an extension. If a callable is set, then it is called by the filename method to return the new name.

An example of how you can define a namer and rotator is given in the following snippet, which shows zlib-based compression of the log file.

def namer(name):
    return name + ".gz"

def rotator(source, dest):
    with open(source, "rb") as sf:
        data = sf.read()
        compressed = zlib.compress(data, 9)
        with open(dest, "wb") as df:
            df.write(compressed)
    os.remove(source)

rh = logging.handlers.RotatingFileHandler(...)
rh.rotator = rotator
rh.namer = namer

I’m aware that these are not “true” .gz files, as they are bare compressed data, with no “container” such as you’d find in an actual gzip file. This snippet is just for illustration purposes.

Note: The namer function is called quite a few times during rollover, so it should be as simple and as fast as possible. It should also return the same output every time for a given input, otherwise the rollover behaviour may not work as expected. If either the namer or rotator function raises an exception, this will be handled in the same way as any other exception during an emit() call, i.e. via the handleError method of the handler.

The internal logic for determining which files to delete (when a backupCount is specified on a rotating file handler) uses, in the case of  TimedRotatingFileHandler, regular expressions for the date/time-based suffix. The regular expressions used have been updated to allow an optional file extension consisting of a period followed by one or more letters and digits.

These changes will soon appear in Python 3.3’s development repository. Your comments are welcome.