Categories
logging microservices

Cleaning up messy distributed logs, part 3

If you have been reading along it is my final part of this short story on cleaning up messy logs. I might write one on Kibana itself some day, as it is a difficult tool to use for a beginners. Previous parts of this story can be found here: part 1 and part 2.

This last couple of days at work I could spent more time on fixing logs as new project estimates has finished. Last thing I sorted was syslog parsing, with that done I could think on figuring out how to log tracebacks raised in our services.

Tracebacks are spread over many lines and when they arrive to logstash they are parsed as separate entries and then sent to elasticsearch like wise. So Kibana displays them as separate as well making it really difficult to read. In addition of being separate they are in reverse order.

My idea is to serialize each log message into JSON and then parse it in logstash. As I have probably mentioned I am working on an app using microservices pattern and there is a few services to be done. First candidate is the gateway service, as it’s easiest to test and relatively simple with no additional components beside the service. It is flask app running on gunicorn so to configure logging there is no need to change the code but rather your entrypoint by adding --log-config.

#!/bin/bash
exec gunicorn -b 0.0.0.0:80 --log-config /code/log-config.ini --access-logfile - --reload "gate.app:create_app()"

Config file should be written in ini style.

[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=default

[logger_root]
level=DEBUG
qualname=root
handlers=console

[handler_console]
class=StreamHandler
formatter=default
args=(sys.stdout, )

[formatter_default]
class=logstash_formatter.LogstashFormatter

This alone handles almost whole application configuration, logs are nicely serialized into JSON with whole traceback.

Categories
microservices

Cleaning up messy distributed logs, part 2

This is a story on sorting logs. First part can be found here, it’s where you can get all the nitty gritty. If you are not feeling like it here is a short elevator pitch. Our kibana logs are a mess, each of the services seems to have different log format and trace backs are split across many entries. My goal is to fix it.

Second part is much much shorter as it’s one of the steps I took on path to sorting this pet peeve of mine. Unfortunately, I can not spend my whole time at work to fix it as it is not a crucial part of our application. At least it is not at this moment.

My first idea was to tackle "easy" issue of grok parsing failure. Our configuration is rather bloated IMO so I trimmed it (I had mentioned that it is not my concern but it easier to reason about simple code). From this:

      input {
        gelf { }
        heartbeat { }
        syslog {
          port => 5145
        }
      }
      filter {
        ruby {
          code => "
            event.to_hash.keys.each { |k| event[ k.gsub('"'.'"','"'_'"') ] = 
            event.remove(k) if k.include?'"'.'"' }
          "
        }
        grok {
          match => {
            "message" => [
              # Logspout
              "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:%{TIMESTAMP_ISO8601:ts}|-) 
              +(?:%{HOSTNAME:containerid}|-) 
              +(?:%{WORD:process}\.%{INT:processnumber}\.%{WORD:processsha}|%{NOTSPACE:process}) 
              +(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) 
              +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}"
            ]
          }
        }
      }
      output {
        elasticsearch {
          hosts => ["elasticsearch:9200"]
        }
        stdout {
          codec => rubydebug
        }
      }

I went to this:

     input {
      syslog { }
      }
      filter {
        grok {
          match => {
            "message" => [
              # Logspout
              "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:%{TIMESTAMP_ISO8601:ts}|-) 
              +(?:%{HOSTNAME:containerid}|-) 
              +(?:%{WORD:process}\.%{INT:processnumber}\.%{WORD:processsha}|%{NOTSPACE:process}) 
              +(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) 
              +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}"
            ]
          }
        }
      }
      output {
        elasticsearch {
          hosts => ["elasticsearch:9200"]
        }
      }

We had few plugins that were doing nothing so went under the cleaver. Since I can run the swarm locally I’d verify that all is logged and displayed correctly in kibana.

So then I did what everyone does, google it hard. Actually duckduckgo it but it just does not sound right. After lot of reading finally found something relevant to my case. So the perpetrator is syslog plugin, I mentioned before that logs format might not be consistent and possibly (for sure TBH) not up to syslog’s spec. As author recommended I had replaced syslog { } with tcp and udp plugins.

     input {
          tcp {
              port => 5145
              type => syslog
          }
          udp {
              port => 5145
              type => syslog
          }
      }

Logs are now without _grokparsefailure tags. Still bit messy but at least correctly and efficiently parsed. See you in the next part.

Categories
microservices

Cleaning up messy distributed logs, part 1

Logging is an important part of software lifecycle. It is used in every stage, from the beginning, through testing, deployment, staging, production. I find neat logs soothing, they increase my confidence in running software providing visibility when required.

Having said that I have taken a task to work on logs in a distributed system I’m working on. Since we push everything through ELK stack this is what I have inherited.

Unfiltered messy kibana logs

It looks like a mess, but this is because kibana does not have any sensible default configuration in our setup. After looking more closely there is another issue. I have marked it with red.

Unfiltered messy kibana logs with tags marked with red

So time to look at how logstash is configured as there is a problem with how grok filter is set up. Before doing that it is worth explaining how can you configure logstash as it will help in understanding my configuration, and maybe explain briefly what logstash is.

Easiest way to describe purpose of this tool is by citing introduction taken from the documentation.

Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash can dynamically unify data from disparate sources and normalize the data into destinations of your choice.

How do you configure it? Simply, by taking the plugins you wish and putting them into your configuration file. Plugins are:

  • input plugins, they are sources of events fed to logstash. list
  • filter plugins, used for processing of event data. list
  • output plugins, send data to particular destination. list
  • codec plugins, can be part of input or output plugins and can change the data. list

In my case configuration is a bit complex and what’s most important grok is not configured correctly (So it seems but there might be something else as well in play). Some of the plugins may be redundant as well but this is not my concern now.

      input {
        gelf { }
        heartbeat { }
        syslog {
          port => 5145
        }
      }
      filter {
        ruby {
          code => "
            event.to_hash.keys.each { |k| event[ k.gsub('"'.'"','"'_'"') ] = 
            event.remove(k) if k.include?'"'.'"' }
          "
        }
        grok {
          match => {
            "message" => [
              # Logspout
              "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:%{TIMESTAMP_ISO8601:ts}|-) 
              +(?:%{HOSTNAME:containerid}|-) 
              +(?:%{WORD:process}\.%{INT:processnumber}\.%{WORD:processsha}|%{NOTSPACE:process}) 
              +(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) 
              +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}"
            ]
          }
        }
      }
      output {
        elasticsearch {
          hosts => ["elasticsearch:9200"]
        }
        stdout {
          codec => rubydebug
        }
      }

This is the state now, further on I’ll describe as I go, my quest to make logs great again.