Categories
microservices patterns

GitOps workflow

I have been using GitOps in my last project and I like the way it changed my workflow. It had to change, as in the world of microservices the old ways have to go. This is not a post about whether that is good or bad; I may write one someday. This post is about GitOps. If you do not know what GitOps is, read here. TL;DR version: in practice, each commit deploys a new version of a service to one’s cluster.

Going back to the subject of the workflow, I’ll focus on the microservices workflow, as this is where, in my opinion, GitOps is extremely useful. One of the main pain points of a microservices architecture is deployment. When deployments are mentioned, you instinctively think about deploying the application. That may be difficult, but it is not as difficult as creating developer and QA environments.

Here comes GitOps. When applied to your project, you immediately get a new deployment per commit, for each service you have. Having this at your disposal, you can set up your application in numerous combinations of versions. You can also easily replace one of the services in your stack. Sounds good, but nothing beats a demo, so here we go.

Demo

Let’s say I have the task of verifying whether our changes to one of the microservices are correct. I’m using Rio to manage my Kubernetes cluster, as it makes things smoother. A change in one service affects another service, and I have to verify it using the UI. This adds up to three services deployed in one namespace and configured so they talk to each other. After I push a commit in the service repository, a namespace is already created on the cluster. Each commit creates a new version of the service.

% rio -n bugfix ps -q
bugfix:app

Now I need to add the missing services, and I can do that by branching off master. The branch name must match in all the services involved.

% cd other_services && git checkout -b bugfix 
% git push

After pushing the changes Rio adds them to the same namespace.

% rio -n bugfix ps -q
bugfix:app
bugfix:web
bugfix:other_app

One thing left is to wire them up so the services talk to each other. As I’m following the recommendations from https://12factor.net/config, it is dead easy, and I can use Rio to do it. The edit command allows me to modify the environment variables of each service.

% rio -n bugfix edit web

This opens up your favourite text editor, where you can edit the variables and set up where the web app can find the other services. You can make the same changes in the other services if necessary.
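On the application side, following the 12factor config approach means the service just reads those variables at startup. A minimal sketch of what that looks like; the variable name OTHER_APP_URL and the default are hypothetical, not taken from the actual project:

```python
import os

# OTHER_APP_URL is a hypothetical variable name; it stands for the kind of
# value set via `rio edit`, pointing the web app at a sibling service.
OTHER_APP_URL = os.environ.get('OTHER_APP_URL', 'http://localhost:8000')

def other_app_endpoint(path):
    """Build a URL for the other_app service in the current namespace."""
    return '{}/{}'.format(OTHER_APP_URL.rstrip('/'), path.lstrip('/'))
```

Because the location comes from the environment, the same image works unchanged in every namespace Rio creates.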

I have wired up the services, they are talking to each other, and I can proceed with my work. This is my workflow using GitOps, Rio, and microservices.

Categories
logging microservices

Cleaning up messy distributed logs, part 3

If you have been reading along, this is the final part of this short story on cleaning up messy logs. I might write one on Kibana itself some day, as it is a difficult tool for beginners to use. Previous parts of this story can be found here: part 1 and part 2.

Over the last couple of days at work I could spend more time on fixing logs, as the new project estimates were finished. The last thing I sorted out was syslog parsing; with that done, I could figure out how to log the tracebacks raised in our services.

Tracebacks are spread over many lines, and when they arrive at logstash they are parsed as separate entries and sent to elasticsearch likewise. Kibana then displays them as separate entries as well, making them really difficult to read. In addition to being separate, they are in reverse order.

My idea is to serialize each log message into JSON and then parse it in logstash. As I have probably mentioned, I am working on an app using the microservices pattern, and there are a few services to be done. The first candidate is the gateway service, as it’s the easiest to test and relatively simple, with no additional components besides the service itself. It is a Flask app running on gunicorn, so to configure logging there is no need to change the code; it’s enough to change your entrypoint by adding --log-config.

#!/bin/bash
exec gunicorn -b 0.0.0.0:80 --log-config /code/log-config.ini --access-logfile - --reload "gate.app:create_app()"

The config file should be written in INI style.

[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=default

[logger_root]
level=DEBUG
qualname=root
handlers=console

[handler_console]
class=StreamHandler
formatter=default
args=(sys.stdout, )

[formatter_default]
class=logstash_formatter.LogstashFormatter

This alone handles almost the whole logging configuration of the application; logs are nicely serialized into JSON, whole traceback included.
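The logstash_formatter package does the serialization here, but the core idea is simple enough to sketch with the standard library alone. This is a minimal stand-in to show the mechanism, not the actual LogstashFormatter implementation:

```python
import json
import logging
import traceback

class JsonFormatter(logging.Formatter):
    """Serialize each log record, including any traceback, into one JSON line."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "name": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the whole traceback in a single field, so logstash
            # receives one event instead of one per traceback line.
            payload["exc_info"] = "".join(
                traceback.format_exception(*record.exc_info))
        return json.dumps(payload)

logger = logging.getLogger("gate")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

try:
    1 / 0
except ZeroDivisionError:
    logger.exception("division failed")
```

Because the traceback travels inside a single JSON document, the multi-line splitting problem from the beginning of this story disappears at the source.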

Categories
microservices

Cleaning up messy distributed logs, part 2

This is a story on sorting out logs. The first part can be found here; it’s where you can get all the nitty gritty. If you are not feeling like it, here is a short elevator pitch: our Kibana logs are a mess, each of the services seems to have a different log format, and tracebacks are split across many entries. My goal is to fix that.

The second part is much, much shorter, as it covers just one of the steps I took on the path to sorting out this pet peeve of mine. Unfortunately, I cannot spend all my time at work on fixing it, as it is not a crucial part of our application. At least not at this moment.

My first idea was to tackle the "easy" issue of the grok parsing failure. Our configuration is rather bloated IMO, so I trimmed it (I had mentioned that this is not my concern, but it is easier to reason about simple code). From this:

      input {
        gelf { }
        heartbeat { }
        syslog {
          port => 5145
        }
      }
      filter {
        ruby {
          code => "
            event.to_hash.keys.each { |k| event[ k.gsub('.','_') ] =
            event.remove(k) if k.include? '.' }
          "
        }
        grok {
          match => {
            "message" => [
              # Logspout
              "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:%{TIMESTAMP_ISO8601:ts}|-) 
              +(?:%{HOSTNAME:containerid}|-) 
              +(?:%{WORD:process}\.%{INT:processnumber}\.%{WORD:processsha}|%{NOTSPACE:process}) 
              +(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) 
              +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}"
            ]
          }
        }
      }
      output {
        elasticsearch {
          hosts => ["elasticsearch:9200"]
        }
        stdout {
          codec => rubydebug
        }
      }

I went to this:

      input {
        syslog { }
      }
      filter {
        grok {
          match => {
            "message" => [
              # Logspout
              "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:%{TIMESTAMP_ISO8601:ts}|-) 
              +(?:%{HOSTNAME:containerid}|-) 
              +(?:%{WORD:process}\.%{INT:processnumber}\.%{WORD:processsha}|%{NOTSPACE:process}) 
              +(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) 
              +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}"
            ]
          }
        }
      }
      output {
        elasticsearch {
          hosts => ["elasticsearch:9200"]
        }
      }

We had a few plugins that were doing nothing, so they went under the cleaver. Since I can run the swarm locally, I could verify that everything is still logged and displayed correctly in Kibana.

So then I did what everyone does: googled it hard. Actually, I duckduckgo-ed it, but that just does not sound right. After a lot of reading, I finally found something relevant to my case. The perpetrator is the syslog plugin. I mentioned before that the log format might not be consistent and possibly (for sure, TBH) not up to syslog’s spec. As the author recommended, I replaced syslog { } with the tcp and udp plugins.

      input {
        tcp {
          port => 5145
          type => syslog
        }
        udp {
          port => 5145
          type => syslog
        }
      }

The logs are now free of _grokparsefailure tags. Still a bit messy, but at least correctly and efficiently parsed. See you in the next part.

Categories
microservices

Cleaning up messy distributed logs, part 1

Logging is an important part of the software lifecycle. It is used at every stage: from the beginning, through testing, to deployment, staging, and production. I find neat logs soothing; they increase my confidence in running software, providing visibility when required.

Having said that, I have taken on the task of working on the logs in a distributed system I’m part of. Since we push everything through the ELK stack, this is what I have inherited.

Unfiltered messy kibana logs

It looks like a mess, but this is because Kibana does not have any sensible default configuration in our setup. Looking more closely, there is another issue. I have marked it in red.

Unfiltered messy kibana logs with tags marked in red

So it’s time to look at how logstash is configured, as there is a problem with how the grok filter is set up. Before doing that, it is worth explaining how you can configure logstash, as that will help in understanding my configuration, and maybe briefly explaining what logstash is.

The easiest way to describe the purpose of this tool is by citing the introduction taken from the documentation.

Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash can dynamically unify data from disparate sources and normalize the data into destinations of your choice.

How do you configure it? Simply by taking the plugins you wish and putting them into your configuration file. The plugin types are:

  • input plugins: sources of events fed to logstash
  • filter plugins: used for processing event data
  • output plugins: send data to a particular destination
  • codec plugins: can be part of input or output plugins and can change the data

In my case, the configuration is a bit complex and, most importantly, grok is not configured correctly (so it seems, but there might be something else in play as well). Some of the plugins may be redundant too, but that is not my concern right now.

      input {
        gelf { }
        heartbeat { }
        syslog {
          port => 5145
        }
      }
      filter {
        ruby {
          code => "
            event.to_hash.keys.each { |k| event[ k.gsub('.','_') ] =
            event.remove(k) if k.include? '.' }
          "
        }
        grok {
          match => {
            "message" => [
              # Logspout
              "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:%{TIMESTAMP_ISO8601:ts}|-) 
              +(?:%{HOSTNAME:containerid}|-) 
              +(?:%{WORD:process}\.%{INT:processnumber}\.%{WORD:processsha}|%{NOTSPACE:process}) 
              +(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) 
              +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}"
            ]
          }
        }
      }
      output {
        elasticsearch {
          hosts => ["elasticsearch:9200"]
        }
        stdout {
          codec => rubydebug
        }
      }

This is the state now; further on, I’ll describe, as I go, my quest to make logs great again.

Categories
logging python

Logging and microservices

The context

As you might already know, logging is a rather important thing. If you don’t know, you will probably find out, as (like with backups) there are people who log and ones who just have plenty of spare time. Good logging is hard. The line between spamming and logging is thin, and the subject of useful logging is worth exploring, possibly in another post.

This post will be about logging information by tracking incoming requests within web applications based on a microservices architecture. Having a distributed system means that a request sent by the user will most probably hit more than one service. It is very useful to keep track of how the request is routed across all the services. When one service calls another, in order not to treat every call as separate but to have a bigger picture of what has been happening with the data along the way, the request has to be identified and the identifier must be passed along.

By using the REST approach, each of the services becomes both a consumer and a provider of an API. They are third-party APIs for each other.

The goal

Distributed systems require you to have a central place for storing and viewing logs, so that you remain sane. A good choice would be something like the ELK stack.

The fact is that apps are multi-threaded, but a log is not. It is single-threaded and time-indexed. It doesn’t matter if the log is kept in one place or partitioned across several computers; reading it with no indicator of how the records are related does not make a lot of sense. I’d compare it to trying to read a document that went through a shredder.

Here’s a real-life example to illustrate this problem. Imagine a simple document search engine that allows users to search for documents stored on the server. The user types a name into the search box and the system comes up with a list of candidates. A pretty simple and common example of an app. From the user’s point of view it looks straightforward; however, it gets complicated when we look at the internals. Under the hood, querying the app might open requests to one or many servers. Servers might be kept behind load balancers that introduce additional fragmentation of requests and logs. If you take into account the different response times that may occur, depending on server load etc., you will definitely not get a linear and chronological log of the events that happened. In addition, each service will treat an incoming request as unique, without the history of what may have happened up to that point. Debugging would be horrendous. So the goal is to provide a set of tools that lets you track how a request travels across services. Each request should be given an ID that travels with it and is stored with each log record being saved.

The solution

The idea is to have a header with a unique ID on each request that identifies it while travelling across the system. The name of the header here is ‘Correlation-Id’; you may pick whatever suits your case. The solution is based on a real-world example with a pretty simple tech stack.

  • ubuntu
  • nginx
  • uwsgi
  • flask or django

I’d like to advise you that the code here is not complete and will probably not work if copied and pasted. Hopefully it is clear enough to guide you to your solution.

Nginx

Requests coming into the system are first handled by the Nginx server. There are two cases it needs to handle: a request with the Correlation-Id header and one without it. If the request has the required Correlation-Id header, nginx passes it to uwsgi without modification. In the second case, when the request does not include the Correlation-Id header, nginx has to inject it. It is possible to put a bit of logic into the nginx config file; the code has to be written in Lua (a very simple language), which requires a module to be installed. Having the logic inside the config file makes it easy to deploy and maintain. I’d reconsider this in a more complicated case, but for this one it seems all right. When installing the nginx package with Ansible, the only thing that had to be modified was the name of the package. Ubuntu has an nginx-extras package that adds Lua along with a bunch of other features. After you install nginx-extras, you can write Lua in config files.

    # Generate random HTTP_CORRELATION_ID if request does not provide it
    set_by_lua $correlation_id '
        if(ngx.var.http_correlation_id == nil or ngx.var.http_correlation_id == "")
            then
               return string.format("%010d", math.random(0, 10000000000))
            else
               return ngx.var.http_correlation_id
        end
    ';

This snippet checks whether the request includes the Correlation-Id header and generates a new random value if it’s missing. The returned value can now be attached to the request.

uwsgi_param HTTP_CORRELATION_ID $correlation_id;

How you inject the value depends on your setup. When running the app using uwsgi and sockets, I could not make it work with proxy_set_header, but it worked with uwsgi_param instead. After switching to serving the app on a port, it worked fine with proxy_set_header.

The next step is to handle the header in the app, as uwsgi simply passes it through.
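To illustrate where it ends up, in a bare WSGI app the injected parameter shows up in the environ dict under its HTTP_-prefixed key. A hypothetical minimal app, not the real one:

```python
def simple_app(environ, start_response):
    # nginx's `uwsgi_param HTTP_CORRELATION_ID` surfaces here; a plain
    # 'Correlation-Id' HTTP header would too, since WSGI upper-cases
    # header names and prefixes them with HTTP_.
    corr_id = environ.get('HTTP_CORRELATION_ID', 'Internal')
    start_response('200 OK', [('Correlation-Id', corr_id)])
    return [corr_id.encode('utf-8')]
```

The frameworks below wrap exactly this lookup, just behind their own request objects.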

The app

At this point all requests have a Correlation-Id header, which can be extracted and used for logging purposes. In order to simplify the life of the developer and automate all the work, an intermediary needs to do the heavy lifting. Processing of the header has to be done before the request is handed over to any views in a Django or Flask app. The logging effort also has to be minimal (as Homer Simpson said, "Can’t someone else do it?"). Applications also contact other services, so such calls should include the header as well; thus, transferring the Correlation-Id should be effortless. Finally, the header should be returned to the originator, as it may also log the result on its side. Every element of this puzzle is invisible to the developer, apart from the helpers designed for when a service contacts other services. It is not possible to enforce sending the required header unless the requests library (which almost everyone loves) becomes a standard in the project. I provide examples using requests, as everyone in my team uses it; if you are using different tools, you will have to write your own helpers.

Flask

As mentioned earlier, here comes the bit that does the heavy lifting. Flask provides two methods for registering functions that will run before and after each request. I’ll use them to get the value of Correlation-Id from the header on the way in and to attach the value to the response headers on the way out.

import logging
import threading

from flask import request

logger = logging.getLogger(__name__)

tlocal = threading.local()

def log_incoming(*args, **kwargs):
    tlocal.corr_id = request.headers.get('Correlation-Id')
    logger.debug('corrID %s extracted', tlocal.corr_id)

Flask already uses thread-local objects, and if you are only using Flask as your framework, you might get away with using the request object instead of calling threading.local.

def cleanup_after_outgoing(response):
    response.headers['Correlation-Id'] = tlocal.corr_id
    logger.debug('Removing corrID %s from thread local storage',
                 tlocal.corr_id)
    del tlocal.corr_id
    return response

This makes sure the header is returned and does the cleanup, like a good scout. The middleware has to be registered next so Flask can use it.

app = Flask(__name__)

# register middleware
app.before_request(log_incoming)
app.after_request(cleanup_after_outgoing)

Django

The middleware for Django is quite similar. Where it differs is the registration of the middleware in the framework.

# Assumed values: uwsgi exposes the incoming header in request.META as
# HTTP_CORRELATION_ID, and the response returns it under its original name.
UWSGI_CORRELATION_HEADER = 'HTTP_CORRELATION_ID'
CORRELATION_HEADER = 'Correlation-Id'

class ThreadStorageMiddleware:
    def process_request(self, request):
        tlocal.corr_id = request.META.get(UWSGI_CORRELATION_HEADER)
        logger.debug('corrID %s extracted', tlocal.corr_id)

    def process_response(self, request, response):
        try:
            response[CORRELATION_HEADER] = tlocal.corr_id
        except AttributeError:
            # corr_id has not been set
            return response
        logger.debug('Removing corrID %s from thread local storage',
                     tlocal.corr_id)
        del tlocal.corr_id
        return response

Using thread-local storage in Django is required, as there is no mechanism for storing information in a way that is accessible in every function during the lifetime of the response. Remember to register the middleware by adding it to the MIDDLEWARE_CLASSES list.
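The registration itself is just a settings entry; for example (the dotted path below is a hypothetical placeholder for wherever the middleware class lives in your project):

```python
# settings.py -- the module path below is an assumption
MIDDLEWARE_CLASSES = [
    'myproject.middleware.ThreadStorageMiddleware',
    # ... the rest of your middleware classes ...
]
```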

Utils

The whole effort of adding the middleware is not that useful if you are not passing the Correlation-Id value on to the other services you might be contacting. The solution proposed in this post does not force anyone to use specific libraries; however, a helper greatly simplifies passing the header. I gotta say, this is not ideal and may change in the future, depending on how often people forget to use it.

def add_corr_id(headers=None):
    headers = headers or {}
    headers.update({'Correlation-Id': tlocal.corr_id})
    return headers


requests.get('https://the.cheddar.com', headers=add_corr_id())

headers = {'Required-Content': 'More beans'}
requests.get('https://monty.py', headers=add_corr_id(headers))

A helper function that creates the headers dict if none is passed, or adds the Correlation-Id to an existing one.

Logging

Logging has not been mentioned yet. All the examples above were preparing the framework for feeding data to logging, so we can log something useful and make sense of the log file. As with the examples above, logging was designed to be effortless for the developer to use.


class InjectingFilter(logging.Filter):
    def filter(self, record):
        try:
            record.corrId = tlocal.corr_id
        except AttributeError:
            record.corrId = 'Internal'
        return True

First, a filter is defined that takes the value saved in thread-local storage and injects it into the LogRecord. There will be logs that originated within the service and thus have no Correlation-Id header; these are marked with the ‘Internal’ label.

DEFAULT_LOGGING = {
    ...
    'filters': {
        'correlation_filter': {
            '()': InjectingFilter
        }
    },
    'formatters': {
        'default': {
            'format': ('%(corrId)s - %(asctime)s - %(filename)s - '
                       '%(funcName)s - %(message)s')
        }
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'default',
            'filters': ['correlation_filter']
        },
    }
    ...
}

This config includes all the needed details. A new filter is defined under the name ‘correlation_filter’. The formatter configuration makes use of the value injected into the LogRecord. Lastly, the handler configuration ties it all together with a StreamHandler. Logging can now be used as seen in the examples below, by calling all the regular functions from the logging module.
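To make the config above concrete, here is a self-contained version you can run. The 'version' key, the 'root' logger entry, and the shortened format string are additions of mine needed to feed it to logging.config.dictConfig; they stand in for the parts elided with "..." above:

```python
import logging
import logging.config
import threading

tlocal = threading.local()

class InjectingFilter(logging.Filter):
    """Copy the correlation id from thread-local storage onto each record."""
    def filter(self, record):
        try:
            record.corrId = tlocal.corr_id
        except AttributeError:
            record.corrId = 'Internal'
        return True

LOGGING = {
    'version': 1,  # dictConfig requires a schema version
    'filters': {
        'correlation_filter': {'()': InjectingFilter},
    },
    'formatters': {
        'default': {
            'format': '%(corrId)s - %(funcName)s - %(message)s',
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'default',
            'filters': ['correlation_filter'],
        },
    },
    'root': {'level': 'DEBUG', 'handlers': ['console']},
}

logging.config.dictConfig(LOGGING)
logger = logging.getLogger(__name__)

tlocal.corr_id = '123456'
logger.info('User created')        # prefixed with the request's id
del tlocal.corr_id
logger.info('Background job ran')  # falls back to the 'Internal' label
```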

import logging
logger = logging.getLogger(__name__)

logger.debug('corrID %s extracted', tlocal.corr_id)
logger.debug('Removing corrID %s from thread local storage', tlocal.corr_id)
logger.info('User created')

In the first two examples, the thread-local value is used explicitly, as it is interpolated into the string. The third example is just a plain, simple log line that you would see in every application. What is interesting is the output that goes into the log file, or in this case the terminal.

# Making request with lovely httpie
http https://the.web.app Correlation-Id:123456

123456 - 2016-05-16 21:26:31,448 - log.py - process_request - corrID 123456 extracted
123456 - 2016-05-16 21:26:34,309 - log.py - process_response - Removing corrID 123456 from thread local storage
123456 - 2016-05-16 21:26:35,042 - user.py - create - User created

As we can see, the Correlation-Id is silently added to our LogRecord objects and included in the log. With many simultaneous requests, the log can be filtered without any special machinery; pure grep will do the trick if you don’t have access to the ELK stack mentioned earlier.

The conclusion

There are no one-size-fits-all solutions, or at least I don’t think there are. You could provide libraries covering the handling of the headers at the framework level, at the server level, in log configurations, etc. But there are many ways to approach this; for example, instead of having middleware at the framework level, you could create one for WSGI. It would save you writing a framework-specific solution, but it would tie you to WSGI (which could be beneficial for some users). This is why I believe there is no off-the-shelf solution. I definitely recommend implementing the proposed solution yourself. I’ll stress this again: it is a custom solution. The road to the goal will be a bit rocky and full of surprises, but it will definitely teach you something new.