Last modified 4 years ago Last modified on 02.01.2013 18:01:24

Structured Logging


In the old, good times, when programs were bug-less, nobody needed event log or informations for debugging. These days are gone forever (probably because programmers decayed). The age of logs began.

Different log formats happen, but mostly they burn down to a single line of text. This form is the simplest idea and it doesn't require much to be printed to the screen, to flat log file or to be sent over the network or even to UNIX socket (e.g. to syslog).

Unfortunately, just plain line of text is not the ultimate format, neither for application operator nor the author.

The author needs to prepare the line of text, what means quite an effort. Apart from the message itself, programmer typically adds some context: port number, source address, MySQL error number, submitted query, path to the file, UID and GID, and so on. And the programmer has to pack all this into a single line of text, often remembering about quoting the data (e.g. space character in login). This is tedious, boring and uninteresting -- because repeatable -- work.

Operator is not satisfied either. What the programmer prepared, operator has not only read, but process with machine tools, find and extract some information, correlate the entry with others, possibly coming from different applications. This requires parsing flat string into some in-memory data structure.


There is a solution of both inconveniences: logging structured data instead of flat strings. Programmer, apart from the message, adds some context data in few named fields and doesn't worry about formatting. This is what the logging framework is for. Operator, on the other hand, gets a text that is easy to read and process, because the data format is defined in advance by the framework instead of ad-hoc programmer-defined format.

The are two requirements for logging framework. It needs to provide easy way of logging ad-hoc specified fields (zero preparation before use) and needs to support, beside from the outputs understanding structured data, flat text outputs, like (traditional) syslog and text files, with the same message (i.e. programmer specifies only one message with fields, and the framework can pass it to syslog or write directly in /var/log).


The first time I've encountered the idea was Fluentd, I was writing plugins for it.

# in_exec_yaml.rb
for m in messages
  if m.is_a?(Hash)
    # enrich message
    # ...

  $log.debug "emitting message", :type => m.class.to_s, :keys => m.keys
  Fluent::Engine.emit(tag, time, m)

The outcome looks following:

# /var/log/fluentd.log
2012-12-29 18:28:39 +0100: emitting message type="Hash" keys=["host", "source", ...]
# /var/log/fluentd/internal.log.20121229_0
2012-12-29T18:28:15+01:00    fluent.debug    {"message":"emitting message","type":"Hash","keys":["host","source",...]}

The first file contains Fluentd's STDOUT, the second is an output of out_file plugin.

As an author, I didn't need to worry how to format the message, especially the fields. I've just specified a message and added few fields that were necessary at the moment. There is little I could do manually to make a flat string more readable.

Of course, /var/log/fluentd.log is not as easy to parse by machine, but Fluentd provides the same information as structured message (in above configuration, it lands in /var/log/fluentd/internal.log.*), and this is the data meant to be processed by programs.


I wanted to have exactly the same way of logging things in Perl, but also to keep compatibility with great Log::Log4perl framework, which separates log generation (in the source code) from collection and sending to remote destinations (in the configuration).

The outcome is Log::Message::JSON module, which can be used in following manner:

use Log::Message::JSON qw{msg};

my $m = msg message => "Something went wrong",
            time => strftime("%F %T", localtime),
            host => hostname(),
            is_error => 1;

if ($m->{is_error}) {
  printf "error encountered: %s\n", $m;

Note the treating $m as a hash in if condition and as a string in printf() call.

The screen output will look similar to this:

error encountered: {"message":"Something went wrong","time":"2012-12-29 19:07:44","host":"rudolf","is_error":"1"}

With Log::Log4perl, the usage goes this way:

package My::Module;

use Log::Log4perl;
use Log::Message::JSON qw{msg};

sub handle {
  # ...

  my $logger = Log::Log4perl->get_logger();

  $logger->info('info, flat string');
  $logger->warn('warn, flat string');
  $logger->info(msg level => 'info', type => ['structured', 'hash']);
  $logger->warn(msg "some warning message",
                    level => 'warn', type => ['structured', 'hash']);

If the configured appender is capable of processing structured messages (like Fluent appender), it will get one. But if the appender only works with strings, it's fine -- the message still fits and is readable.

Example output of Screen appender:

19:17:01 [main/INFO] info, flat string
19:17:01 [main/WARN] warn, flat string
19:17:01 [main/INFO] {"level":"info","type":["structured","hash"]}
19:17:01 [main/WARN] {"message":"some warning message","level":"warn","type":["structured","hash"]}


Not everybody is convinced to structured logs. Miloslav Trmač has some valid objections to the structured logging as a movement. He doesn't believe that everybody will produce logs correctly, consistently and and with uniform field naming. Hence, all the log processing tools always will need to be able to process free-form strings.

I think he's right, but the idea behind Log::Message::JSON module doesn't clash with this belief. JSON is merely a convention that is compatible with current logging infrastructure (syslog, flat files and so on). At the same time, creating log messages is a bit easier for programmer. Adding to this the possibility of keeping field structure if the framework and configuration support, it's a triple win situation.


My tools

Others' tools