Thursday 26 July 2012

A potential security vulnerability in logging.fileConfig()

Although now superseded by the dictConfig() API proposed in PEP 391, the much older fileConfig() API has been widely used to configure logging. It is also possible to do on-the-fly reconfiguration of logging through logging’s listen() API, which listens on a socket for configuration data in either the format understood by fileConfig(), or JSON format for use with dictConfig(). Although listen() only ever binds to localhost and so cannot be reached from a remote machine, it is possible in certain scenarios, if you use the listen() function, that code which you don’t control will be executed in the process which calls listen(). To see how, consider a shared hosting environment where multiple users, who have no connection with each other other than being hosted on the same server, run processes. Ordinarily these processes are isolated from each other, but if one user’s process uses listen(), another user’s process could connect to the listened-on socket and send arbitrary configuration data. Because fileConfig() calls eval() under the hood to convert from text elements in the configuration text to Python objects for use in the configuration operation, potentially arbitrary code could be sent in a logging configuration from a malicious user to a victim, perhaps resulting in undesirable consequences for the victim when that code is run by eval(). (Thanks are due to Graham Dumpleton for reporting this vulnerability.)

Why does fileConfig() use eval(), and how can the vulnerability be tackled? Although recent versions of Python provide the ast module to assist with processing Python source code, this module was not available when fileConfig() was written. The type of configuration data which needs to be evaluated can be more complex than a simple parser can cope with: for example, a configuration may refer to a handler’s class, whether it be a stdlib-included one such as handlers.WatchedFileHandler, or a user-defined or third-party handler such as mailinglogger.MailingLogger. The configuration can also specify the constructor arguments for a handler, which can also be fairly involved to parse in ad hoc fashion. For this reason, eval() was used.

Users have already been alerted to the vulnerability via an update to the documentation, but that is perhaps not enough. One way of addressing the vulnerability is to not use eval(), but some other evaluation mechanism which is more sandboxed. One suggestion which has been made is to use ast.literal_eval(), but this only copes with Python literals – it does no name lookup, and so can’t cope with expressions like handlers.WatchedFileHandler.

As an experiment, I put together a simple expression evaluator which uses the ast module to parse input text and probably provides the bare minimum evaluation of AST nodes to support common use cases:

This shows a simple evaluator together with a simple test harness which evaluates user-entered expressions in the context of the logging package:

However, limiting the scope of evaluation is not the complete answer, and perhaps not the correct answer. For example, a malicious user could still send a bogus configuration which, for example, just turns the verbosity of all loggers off, or configures a huge number of bogus loggers. This would certainly be allowed even by a limited-evaluation scheme if a user legitimately wanted to do so; but if a malicious user sends the exact same “legal” configuration, it is still a security exploit because the consequences may be undesirable to the victim.

But how is the listener to know whether or not the configuration is coming from a legitimate source (a client process controlled by the same user who is running the process which uses listen())  or a malicious user (a client process controlled by some other user)? The simplest answer would appear to be a shared secret: When listen() is called, it is passed a text passphrase, which is also known to legitimate clients. When handling a configuration request via the socket, the configuration is checked to see if it contains the passphrase. If it does, the request is processed; otherwise, it is ignored.

In the fileConfig() input data, the passphrase could be provided via a passphrase=secret entry in the [default] section. In the dictConfig() input data, the passphrase could be provided against the passphrase key in the dict which is passed to dictConfig(). The checking would be done in the request handler code before calling fileConfig() or dictConfig(). If the passphrase argument to the listen() call is None (the default, preserving the current behaviour) no passphrase checking would be done.

As always, comments are welcome. Although this vulnerability is limited to specific scenarios, it would be good to address it as soon as possible. As Python 3.3 has entered beta, and is therefore in feature freeze, it is likely that a fix will go into Python 3.4 – the vulnerability is not currently considered serious enough to warrant fixing in earlier versions.