Categories
logging microservices

Effective logging strategies after running in production for 2 years

I had trouble picking an appropriate title. The guidelines you are about to read are based on a true story, sometimes a horror story. I can't believe how much I have changed in these past 2 years. I changed for the better. I grew a lot; I grew with the system I built, ran, and supported.

Logging is hard, and most of the time it's an afterthought. Read these guidelines and learn from my mistakes. You can probably apply these recommendations to any language.

Categories
f-strings logging python

f-strings and python logging

There is one thing that had bothered me for a couple of months. It felt wrong when I saw it in the codebase, but I could not tell why. It was just a hunch that something was not right, not strong enough to make me look for a reason.

For the last couple of days I have been struggling to sort out my bot configuration on Azure and decided I needed a break from that. Python, being the thing I know best, was a good candidate to feel comfortable and in control again.

I decided to finally answer the question that was bugging me. Why does using f-strings in logger calls make me uneasy? Why does this feel wrong?

hero = "Mumen Rider"
logger.error(f"Class C, rank 1: {hero}")

f-strings

Most Pythonistas will know by now what f-strings are. They are a convenient way of constructing strings: values can be included directly in the string, which makes it much more readable. Here is an example from Python 3's f-Strings: An Improved String Formatting Syntax (Guide), which is worth at least skimming even if you already know f-strings.

>>> name = "Eric"
>>> age = 74
>>> f"Hello, {name}. You are {age}."
'Hello, Eric. You are 74.'

They have benefits, and my team has been using them ever since. That's fine, as they are awesome; however, I feel they should not be used when it comes to logging.

logging

I'm not talking about poor man's logging, which is print. This is an example of logging in Python:

logger.info("This is an example of a log message, and a value of %s", 42)

When the code includes such a line and it is executed, it outputs a string according to the log configuration. Of course your log level needs to match, but I'm skipping that as it is not relevant here; I'll get back to it later. The %s placeholder in the log message means that anything passed into logger.info will replace it. So the message will look like this:

INFO:MyLogger:This is an example of a log message, and a value of 42

logging + f-strings

Since logging accepts strings, and f-strings are so nice, they could be used together. Yes, it is possible of course, but I would not use f-strings for this purpose. The best way to illustrate why is an example followed by an explanation:

    import logging
    logging.basicConfig(level=logging.ERROR)
    logger = logging.getLogger('klich.dev')

    class MyClass:
        def __str__(self):
            print('Calling __str__')
            return "Hiya"

    c = MyClass()
    print("F style")
    logger.debug(f'{c}')
    logger.info(f'{c}')
    logger.warning(f'{c}')
    logger.error(f'{c}')

    print()
    print("Regular style")
    logger.debug('%s', c)
    logger.info('%s', c)
    logger.warning('%s', c)
    logger.error('%s', c)

This short example creates a logger and sets the logging level to ERROR. This means that only logger.error calls will produce output. The __str__ method of the object used in the log messages prints a message when it is called. So each logger call matching the level should print the Calling __str__ line and then Hiya; since there are two logger.error calls (one per style), we should expect four lines in total. This is what is actually printed out:

    % python3 logg.py
    F style
    Calling __str__
    Calling __str__
    Calling __str__
    Calling __str__
    ERROR:klich.dev:Hiya

    Regular style
    Calling __str__
    ERROR:klich.dev:Hiya

We can see that the logger calls using f-strings invoke __str__ even when the log message is not printed out. This is not a big penalty, but it may compound into something significant if you have many log calls with f-strings.

what is going on

According to the documentation on logging:

Formatting of message arguments is deferred until it cannot be avoided.

The logger is smart enough not to format messages when it is not needed. It will refrain from calling __str__ until it is required, i.e. when the message is actually emitted to stdout, a file, or whatever other destination the logger is configured with.
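You can see this deferral in the LogRecord class itself: the %s arguments are stored on the record and only applied when getMessage is called, which happens when the record is actually handled. A minimal sketch:

    import logging

    # Build a record by hand; normally logger.info(...) constructs it internally.
    record = logging.LogRecord(
        name="demo", level=logging.INFO, pathname="demo.py", lineno=1,
        msg="This is an example of a log message, and a value of %s",
        args=(42,), exc_info=None,
    )

    # Nothing has been formatted so far; the % substitution happens only here.
    print(record.getMessage())
    # This is an example of a log message, and a value of 42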

To dig a little deeper we can use the dis module from the Python standard library. After feeding our code to the dis.dis method we will get a list of operations that happened under the hood. For a detailed explanation of what the exact operations do, have a look at ceval.c from Python's sources.

    >>> import logging
    >>> logger = logging.getLogger()
    >>> def f1():
            logger.info("This is an example of a log message, and a value of %s", 42)

    >>> def f2():
            logger.info(f"This is an example of a log message, and a value of {42}")

    >>> import dis
    >>> dis.dis(f1)
                0 LOAD_GLOBAL              0 (logger)
                2 LOAD_METHOD              1 (info)
                4 LOAD_CONST               1 ('This is an example of a log message, and a value of %s')
                6 LOAD_CONST               2 (42)
                8 CALL_METHOD              2
                10 POP_TOP
                12 LOAD_CONST               0 (None)
                14 RETURN_VALUE
    >>> dis.dis(f2)
                0 LOAD_GLOBAL              0 (logger)
                2 LOAD_METHOD              1 (info)
                4 LOAD_CONST               1 ('This is an example of a log message, and a value of ')
                6 LOAD_CONST               2 (42)
                8 FORMAT_VALUE             0
                10 BUILD_STRING             2
                12 CALL_METHOD              1
                14 POP_TOP
                16 LOAD_CONST               0 (None)
                18 RETURN_VALUE

We won't get into much detail here; it is enough to see that f-strings add two additional operations: FORMAT_VALUE (which handles f-string value formatting) and BUILD_STRING.

After this small research I can explain why we should not be using f-strings in this specific place, which is logging. I can also put my uneasiness to rest.
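One closing tip: lazy %s formatting only defers the string formatting; the arguments themselves are still evaluated before the call. If producing a value is genuinely expensive, the logging documentation recommends an explicit level guard (expensive_stats below is a hypothetical, costly function):

    import logging

    logger = logging.getLogger(__name__)

    def expensive_stats():
        # Stand-in for something genuinely costly to compute.
        return sum(i * i for i in range(10 ** 6))

    if logger.isEnabledFor(logging.DEBUG):
        # expensive_stats() runs only when DEBUG is actually enabled.
        logger.debug('Stats: %s', expensive_stats())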

Categories
logging microservices

Cleaning up messy distributed logs, part 3

If you have been reading along, this is the final part of this short story on cleaning up messy logs. I might write one on Kibana itself some day, as it is a difficult tool for beginners. Previous parts of this story can be found here: part 1 and part 2.

These last couple of days at work I could spend more time on fixing logs, as the new project estimates were finished. The last thing I sorted out was syslog parsing; with that done, I could move on to figuring out how to log tracebacks raised in our services.

Tracebacks are spread over many lines, and when they arrive at logstash they are parsed as separate entries and then sent to elasticsearch likewise. So Kibana displays them as separate entries as well, making them really difficult to read. On top of being separate, they are in reverse order.

My idea is to serialize each log message into JSON and then parse it in logstash. As I have probably mentioned, I am working on an app using the microservices pattern, and there are a few services to be done. The first candidate is the gateway service, as it is the easiest to test and relatively simple, with no additional components besides the service itself. It is a Flask app running on gunicorn, so to configure logging there is no need to change the code, only the entrypoint, by adding --log-config.

#!/bin/bash
exec gunicorn -b 0.0.0.0:80 --log-config /code/log-config.ini --access-logfile - --reload "gate.app:create_app()"

The config file should be written in INI style.

[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=default

[logger_root]
level=DEBUG
qualname=root
handlers=console

[handler_console]
class=StreamHandler
formatter=default
args=(sys.stdout, )

[formatter_default]
class=logstash_formatter.LogstashFormatter

This alone handles almost the whole application's logging configuration; logs are nicely serialized into JSON, with the whole traceback in a single entry.
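For reference, here is roughly the same setup done in code rather than INI (a sketch, assuming the logstash_formatter package that the config above references):

import logging
import sys

from logstash_formatter import LogstashFormatter

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(LogstashFormatter())

root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(handler)

try:
    1 / 0
except ZeroDivisionError:
    # exc_info=True serializes the whole traceback into a single JSON
    # document, so logstash receives one entry instead of many lines.
    root.error('Something went wrong', exc_info=True)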

Categories
logging python

Logging and microservices

The context

As you might already know, logging is a rather important thing. If you don't know, you will probably find out, as (like with backups) there are people who log and ones who just have plenty of spare time. Good logging is hard. The line between spamming and logging is thin; the subject of useful logging is worth exploring, possibly in another post.

This post will be about logging information by tracking incoming requests within web applications based on a microservices architecture. Having a distributed system means that a request sent by the user will most probably hit more than one service. It is very useful to keep track of how the request is routed across all the services. When one service calls another, the request has to be identified and the identifier passed along, so that every call is not treated as separate but forms a bigger picture of what has been happening to the data along the way.

With the REST approach, each of the services becomes both a consumer and a provider of an API. They are third-party APIs for each other.

The goal

Distributed systems require you to have a central place for storing and viewing logs, so that you remain sane. A good choice would be something like ELK (Elasticsearch, Logstash, Kibana).

The fact is that apps are multi-threaded, but a log is not; it is single-threaded and time-indexed. It doesn't matter whether the log is kept in one place or partitioned across several machines; reading it with no indication of how records are related does not make a lot of sense. I'd compare it to trying to read a document that went through a shredder.

Here's a real-life example to illustrate the problem. Imagine a simple document search engine that allows users to search for documents stored on the server. The user types a name into the search box and the system comes up with a list of candidates. A pretty simple and common example of an app. From the user's point of view it looks straightforward; however, it gets complicated when we look at the internals. Under the hood, querying the app might open requests to one or many servers. Servers might sit behind load balancers, which introduce additional fragmentation of requests and logs. If you take into account the different response times that occur depending on server load etc., you will definitely not get a linear, chronological log of events. In addition, each service treats an incoming request as unique, without the history of what has happened up to that point. Debugging would be horrendous. So the goal is to provide a set of tools that lets you track how a request travels across services. Each request should be given an ID that travels with it and is stored with every log record being saved.

The solution

The idea is to give each request a header with a unique ID that identifies it as it travels across the system. The name of the header here is 'Correlation-Id'; you may pick whatever suits your case. The solution is based on a real-world example with a pretty simple tech stack:

  • ubuntu
  • nginx
  • uwsgi
  • flask or django

I'd like to advise you that the code here is not complete and will probably not work if copied and pasted. Hopefully it is clear enough to guide you to your own solution.
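Before the stack-specific pieces, the core idea fits in a few lines of Python (the helper name here is illustrative):

import uuid

def correlation_id_from(headers):
    # Reuse the caller's Correlation-Id, or mint a fresh one if it is missing.
    return headers.get('Correlation-Id') or uuid.uuid4().hex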

Nginx

Requests coming into the system are first handled by the Nginx server. There are two cases it needs to handle: a request with a Correlation-Id header and one without. If the request has the required Correlation-Id header, nginx passes it to uwsgi without modification. In the second case, the request does not include the Correlation-Id header and nginx has to inject it. It is possible to put a bit of logic into the nginx config file. The code has to be written in Lua (a very simple language), so it requires a module to be installed. Having the logic inside the config file makes it easy to deploy and maintain; I'd reconsider this for a more complicated case, but here it seems all right. When installing the nginx package with Ansible, the only thing that had to be modified was the name of the package: Ubuntu has an nginx-extras package that adds Lua along with a bunch of other features. After installing nginx-extras you can write Lua in config files.

    # Generate random HTTP_CORRELATION_ID if request does not provide it
    set_by_lua $correlation_id '
        if(ngx.var.http_correlation_id == nil or ngx.var.http_correlation_id == "")
            then
               return string.format("%010d", math.random(0, 10000000000))
            else
               return ngx.var.http_correlation_id
        end
    ';

This snippet checks whether the request includes the Correlation-Id header and generates a new random value if it is missing. The value being returned can then be attached to the request:

uwsgi_param HTTP_CORRELATION_ID $correlation_id;

How you inject the value depends on your setup. When running the app over uwsgi and sockets, I could not make it work with proxy_set_header, but it worked with uwsgi_param instead. After switching to serving the app on a port, it worked fine with proxy_set_header.

The next step is to handle the header in the app, as uwsgi simply passes it through.

The app

At this point all requests have a Correlation-Id header, which can be extracted and used for logging purposes. In order to simplify the life of a developer and automate all the work, an intermediary needs to do the heavy lifting. Processing of the header has to happen before the request is handed over to any views in a Django or Flask app. The logging effort also has to be minimal (as Homer Simpson said, "Can't someone else do it?"). Applications also contact other services, so such calls should include the header as well; passing the Correlation-Id on should therefore be effortless. Finally, the header should be returned to the originator, as it may also log the result on its side. Every element of this puzzle is invisible to the developer, apart from the helpers designed for when a service contacts other services. It is not possible to enforce sending the required header unless the requests library (which almost everyone loves) becomes a standard in the project. I provide examples using requests, as everyone in my team uses it; if you are using different tools you will have to write your own helpers.

Flask

As mentioned earlier, here comes the bit that does the heavy lifting. Flask provides two methods for registering functions that will run before and after each request. I'll use them to read the value of Correlation-Id from the header on the way in and to attach the value to the response headers on the way out.

import logging
import threading

from flask import request

logger = logging.getLogger(__name__)
tlocal = threading.local()

def log_incoming(*args, **kwargs):
    # Stash the incoming Correlation-Id in thread-local storage for later use.
    tlocal.corr_id = request.headers.get('Correlation-Id')
    logger.debug('corrID %s extracted', tlocal.corr_id)

Flask already uses thread-local objects, and if Flask is the only framework you use, you might get away with using the request object instead of calling threading.local.
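A hypothetical Flask-only variant using flask.g, Flask's request-scoped storage, could look like this (the thread-local version continues below):

from flask import Flask, g, request

app = Flask(__name__)

@app.before_request
def extract_corr_id():
    # g lives for the duration of the current request context.
    g.corr_id = request.headers.get('Correlation-Id')

@app.after_request
def return_corr_id(response):
    # nginx guarantees the header is present in this setup.
    response.headers['Correlation-Id'] = g.corr_id
    return response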

def cleanup_after_outgoing(response):
    response.headers['Correlation-Id'] = tlocal.corr_id
    logger.debug('Removing corrID %s from thread local storage',
                 tlocal.corr_id)
    del tlocal.corr_id
    return response

This makes sure the header is returned and does the cleanup like a good scout. The middleware has to be registered next so Flask can use it:

app = Flask(__name__)

# register middleware
app.before_request(log_incoming)
app.after_request(cleanup_after_outgoing)

Django

Middleware for Django is quite similar. Where it differs is in how the middleware is registered with the framework.

# Key names: uwsgi passes the header through as HTTP_CORRELATION_ID (see the
# uwsgi_param above), and we return it as a regular Correlation-Id header.
UWSGI_CORRELATION_HEADER = 'HTTP_CORRELATION_ID'
CORRELATION_HEADER = 'Correlation-Id'

class ThreadStorageMiddleware:
    def process_request(self, request):
        tlocal.corr_id = request.META.get(UWSGI_CORRELATION_HEADER)
        logger.debug('corrID %s extracted', tlocal.corr_id)

    def process_response(self, request, response):
        try:
            response[CORRELATION_HEADER] = tlocal.corr_id
        except AttributeError:
            # corr_id has not been set
            return response
        logger.debug('Removing corrID %s from thread local storage',
                     tlocal.corr_id)
        del tlocal.corr_id
        return response

Using thread-local storage is required in Django, as there is no mechanism for storing information in a way that is accessible to every function during the lifetime of the response. Remember to register the middleware by adding it to the MIDDLEWARE_CLASSES list.
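Registration looks roughly like this (the module path is hypothetical; point it at wherever the middleware lives in your project):

# settings.py
MIDDLEWARE_CLASSES = [
    'myapp.middleware.ThreadStorageMiddleware',  # hypothetical path
    # ... the rest of your middleware ...
]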

Utils

The whole effort of adding the middleware is not useful if you do not pass the Correlation-Id value on to the other services you contact. The solution proposed in this post does not force anyone to use specific libraries; however, a helper greatly simplifies passing the header. I've got to say, this is not ideal and may change in the future, depending on how often people forget to use it.

def add_corr_id(headers=None):
    # Create the headers dict if none is passed, then attach the Correlation-Id.
    headers = headers or {}
    headers.update({'Correlation-Id': tlocal.corr_id})
    return headers


requests.get('https://the.cheddar.com', headers=add_corr_id())

headers = {'Required-Content': 'More beans'}
requests.get('https://monty.py', headers=add_corr_id(headers))

A helper function that creates the headers dictionary if none is passed, or adds the Correlation-Id to an existing one.

Logging

Logging has not been covered yet. All the examples so far were preparing the ground for feeding data to logging, so we can log something useful and make sense of the log file. As with the examples above, logging was designed to be effortless for the developer to use.


class InjectingFilter(logging.Filter):
    def filter(self, record):
        try:
            record.corrId = tlocal.corr_id
        except AttributeError:
            record.corrId = 'Internal'
        return True

First, a filter is defined that takes the value saved in thread-local storage and injects it into the LogRecord. There will be logs that originated within the service and thus do not have the Correlation-Id header; they are marked with an 'Internal' label.

DEFAULT_LOGGING = {
    ...
    'filters': {
        'correlation_filter': {
            '()': InjectingFilter
        }
    },
    'formatters': {
        'default': {
            'format': ('%(corrId)s - %(asctime)s - %(filename)s - '
                       '%(funcName)s - %(message)s')
        }
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'default',
            'filters': ['correlation_filter']
        },
    }
    ...
}

This config includes all the needed details. A new filter is defined under the name 'correlation_filter'. The formatter configuration makes use of the header value injected into the LogRecord. Lastly, the handler configuration ties it all together with a StreamHandler. Logging can now be used as seen in the examples, by calling all the regular functions from the logging module:

import logging
logger = logging.getLogger(__name__)

logger.debug('corrID %s extracted', tlocal.corr_id)
logger.debug('Removing corrID %s from thread local storage', tlocal.corr_id)
logger.info('User created')

In the first two examples the thread-local value is used explicitly, as it is passed into the string. The third example is just a plain, simple log message that you would see in every application. What is interesting is the output that goes into the log file, or in this case the terminal:

# Making a request with the lovely httpie
http https://the.web.app Correlation-Id:123456

123456 - 2016-05-16 21:26:31,448 - log.py - process_request - corrID 123456 extracted
123456 - 2016-05-16 21:26:34,309 - log.py - process_response - Removing corrID 123456 from thread local storage
123456 - 2016-05-16 21:26:35,042 - user.py - create - User created

As we can see, the Correlation-Id is silently added to our LogRecord objects and included in the log. With many simultaneous requests, the log can be filtered without any special machinery; pure grep will do the trick if you don't have access to the ELK stack mentioned earlier.

The conclusion

There is no one-size-fits-all solution, or at least I don't think there is. You could provide libraries covering the handling of the headers at the framework level, at the server level, in log configurations, etc. There are many ways to approach this; for example, instead of having middleware at the framework level you could create one for WSGI. That would save you from writing a framework-specific solution, but it would tie you to WSGI (which could be beneficial for some users). This is why I believe there is no off-the-shelf solution. I definitely recommend implementing something along the lines of the proposed solution. I'll stress this again: it is a custom solution. The road to the goal will be a bit rocky and full of surprises, but it will definitely teach you something new.
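For the curious, the WSGI-level alternative mentioned above could look roughly like this (a sketch; the class name is illustrative):

import uuid

class CorrelationIdMiddleware:
    """Wraps any WSGI app, so no framework-specific code is needed."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Reuse the incoming ID, or mint one if the caller didn't send it.
        corr_id = environ.get('HTTP_CORRELATION_ID') or uuid.uuid4().hex
        environ['HTTP_CORRELATION_ID'] = corr_id

        def start_response_with_header(status, headers, exc_info=None):
            headers.append(('Correlation-Id', corr_id))
            return start_response(status, headers, exc_info)

        return self.app(environ, start_response_with_header)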