Categories
logging microservices

Effective logging strategies after running the production for 2 years

I had trouble picking an appropriate title. The guidelines you are about to read are based on a true story, sometimes a horror story. I can’t believe how much I have changed in this past 2 years. It was changed for the better. I grew a lot, I grew with the system I built, run, and supported.

Logging is hard, and it’s most of the time an afterthought. Read this guideline and learn from my mistakes. You may probably apply these recommendations to any language.

Categories
microservices

Bridge to kubernetes

When you are using kubernetes you have a cluster, and clusters are rarely small. Your gripes may come in different flavours though. The cluster won’t fit on your local box any more. You most probably have people in your team that need the cluster as well. You need the feedback faster than deploying your changes to the cluster. What to do with all those problems?

Run a bridge that will route traffic to your local version of one of the microservices. Easy. You will run your service locally that will interact with a kubernetes cluster without interrupting anyone. Sounds amazing, doesn’t it.

To illustrate this let’s run a sample app in a cluster and then do some “development” and “debugging”. This will not affect the “development” environment for other people that could use it. The code will be isolated and your team will be happy.

Categories
microservices patterns

Cloud agnostic apps with DAPR

DAPR is cool, as stated on the website “APIs for building portable and reliable microservices”, it works with many clouds, and external services. As a result you only need to configure the services and then use DAPR APIs. It is true and I’ll show you. You will find the code for this article here. A must have tool when working with micro services.

Applications can use DAPR as a sidecar container or as a separate process. I’ll show you a local version of the app, where DAPR is configured to use Redis running in a container. The repo will have Azure, AWS, and GCP configuration as well.

Before we can start the adventure you have to install DAPR.

Running app locally

Configuration

You have to start off on the right foot. Because of that we have to configure secrets store where you can keep passwords and such. I could skip this step but then there is a risk someone will never find out how to do it properly and will ship passwords in plain text. Here we go.

# save under ./components/secrets.yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: local-secret-store
  namespace: default
spec:
  type: secretstores.local.file
  version: v1
  metadata:
  - name: secretsFile
    value: ../secrets.json

The file secrets.json should have all your secrets, like connection strings, user and pass pairs, etc. Don’t commit this file.

{
    "redisPass": "just a password"
}

Next file is publish subscribe configuration. Dead simple but I’d recommend going through the docs as there is much more to pub/sub. Here you can reference your secrets as shown.

# save under ./components/pubsub.yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: order_pub_sub
spec:
  type: pubsub.redis
  version: v1
  metadata:
  - name: redisHost
    value: localhost:6379
  - name: redisPass
    secretKeyRef:
      name: redisPass
      key: redisPass
auth:
  secretStore: local-secret-store

Publisher and subscriber

With config out of the way only thing left are publisher part and subscriber part. As mentioned before the app you write talks to DAPR API. This means you may use http calls or Dapr client. Best part is that no matter what is on the other end, be it Redis or PostgreSQL, your code will not change even when you change your external services.

Here goes publisher that will send events to a topic. Topic can be hosted anywhere, here is a list of supported brokers. The list is long however only 3 are stable. I really like how DAPR is approaching components certification though. There are well defined requirements to pass to advance from Alpha, to Beta, and finally to Stable.

# save under ./publisher/app.py

import logging

from dapr.clients import DaprClient
from fastapi import FastAPI
from pydantic import BaseModel

logging.basicConfig(level=logging.INFO)


app = FastAPI()


class Order(BaseModel):
    product: str


@app.post("/orders")
def orders(order: Order):
    logging.info("Received order")
    with DaprClient() as dapr_client:
        dapr_client.publish_event(
            pubsub_name="order_pub_sub",
            topic_name="orders",
            data=order.json(),
            data_content_type="application/json",
        )
    return order

Here is a consumer.

# save under ./consumer/app.py

from dapr.ext.fastapi import DaprApp
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
dapr_app = DaprApp(app)


class CloudEvent(BaseModel):
    datacontenttype: str
    source: str
    topic: str
    pubsubname: str
    data: dict
    id: str
    specversion: str
    tracestate: str
    type: str
    traceid: str


@dapr_app.subscribe(pubsub="order_pub_sub", topic="orders")
def orders_subscriber(event: CloudEvent):
    print("Subscriber received : %s" % event.data["product"], flush=True)
    return {"success": True}

Running the apps

Now you can run both apps together in separate terminal windows and see how they talk to each other using configured broker. For this example we are using Redis as a broker. You will see how easy is to run them on different platforms.

In the first terminal run the consumer.

$ dapr run --app-id order-processor --components-path ../components/ --app-port 8000 -- uvicorn app:app

In the other terminal run the producer.

$ dapr run --app-id order-processor --components-path ../components/ --app-port 8001 -- uvicorn app:app --port 8001

After you make a HTTP call to a producer you should see both of them producing log messages as follows.

$ http :8001/orders product=falafel

# producer
== APP == INFO:root:Received order
== APP == INFO:     127.0.0.1:49698 - "POST /orders HTTP/1.1" 200 OK

# subscriber
== APP == Subscriber received : falafel
== APP == INFO:     127.0.0.1:49701 - "POST /events/order_pub_sub/orders HTTP/1.1" 200 OK

Running app in the cloud

It took us a bit to reach the clu of this post. We had to build something, and run it so then we can run it in the cloud. Above example will run on cloud with a simple change of configuration.

Simplest configuration is for Azure. Change your pubsub.yaml so it looks as follows, and update your secrets.json as well.

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: order_pub_sub
  spec:
    type: pubsub.azure.servicebus
    version: v1
    metadata:
    - name: connectionString
      secretKeyRef:
        name: connectionStrings:azure
        key: connectionStrings:azure

Your secrets.json should look like this now

{
  "connectionStrings": {
    "azure": "YOUR CONNECTION STRING"
  }
}

Rerun both commands in terminal and the output will look the same as with local env but the app will run on Azure Service Bus.

Bloody magic if you’d ask me. You can mix and match your dependencies without changing your application. In some cases you may even use features not available to a particular cloud, like message routing based on body in Azure Service Bus. This will be another post though.

Here is the repo for this post, it includes all the providers listed below:

  • Azure
  • Google Cloud
  • AWS

Please remember to update your secrets.json.

Have fun 🙂

Categories
microservices patterns

GitOps workflow

I have been using GitOps in my last project and I like the way it changed my workflow. It had to change as in the world of microservices old ways have to go. It is not a post if this is good or bad, I may write one someday. This post is about GitOps, if you do not know what GitOps is read here. TLDR version: in practice, each commit deploys a new version of a service to one’s cluster.

Going back to the subject of the workflow. I’ll focus on the microservices workflow as here in my opinion GitOps is extremely useful. One of the main pain points of microservices architecture is deployment. When deployments are mentioned you instinctively think about deploying the application. It may be difficult but it is not as difficult as creating developer environments and QA environments.

Here comes GitOps. When applied to your project you immediately get new service per each commit. This applies to each service you have. Having this at your disposal you can set up your application in numerous combinations of versions. You can also easily replace one of the services in your stack. Sounds good but nothing beats a demo, so here we go.

Demo

Let’s say I have a task of verifying if our changes to one of the microservices are correct. I’m using Rio to manage my Kubernetes cluster as it makes things smoother. Change in one service affects another service, and I have to verify it using UI. This adds up to 3 services deployed in one namespace and configured so they talk to each other. After I add commit in the service repository there is a namespace already created on a cluster. Each commit creates a new version of the service.

% rio -n bugfix ps -q
bugfix:appCode language: CSS (css)

Now I need to add missing services, and I can do it by branching off from master. The name of the branch must match in all services involved.

% cd other_services && git checkout -b bugfix 
% git push

After pushing the changes Rio adds them to the same namespace.

% rio -n bugfix ps -q
bugfix:app
bugifx:web
bugfix:other_appCode language: CSS (css)

One thing left is to wire them up so services talk to each other. As I’m using recommendations from https://12factor.net/config so it is dead easy, and I can use Rio to do it. Edit command allows me to modify the environment variables of each service.

% rio -n bugfix edit web

This opens up your favourite text editor where you can edit the variables and setup where the web app can find other services. You can do the same changes in other services if necessary.

I have wired up services, they are talking to each other and I can proceed with my work. This my workflow using GitOps, Rio, and microservices.

Categories
logging microservices

Cleaning up messy distributed logs, part 3

If you have been reading along it is my final part of this short story on cleaning up messy logs. I might write one on Kibana itself some day, as it is a difficult tool to use for a beginners. Previous parts of this story can be found here: part 1 and part 2.

This last couple of days at work I could spent more time on fixing logs as new project estimates has finished. Last thing I sorted was syslog parsing, with that done I could think on figuring out how to log tracebacks raised in our services.

Tracebacks are spread over many lines and when they arrive to logstash they are parsed as separate entries and then sent to elasticsearch like wise. So Kibana displays them as separate as well making it really difficult to read. In addition of being separate they are in reverse order.

My idea is to serialize each log message into JSON and then parse it in logstash. As I have probably mentioned I am working on an app using microservices pattern and there is a few services to be done. First candidate is the gateway service, as it’s easiest to test and relatively simple with no additional components beside the service. It is flask app running on gunicorn so to configure logging there is no need to change the code but rather your entrypoint by adding --log-config.

#!/bin/bash
exec gunicorn -b 0.0.0.0:80 --log-config /code/log-config.ini --access-logfile - --reload "gate.app:create_app()"

Config file should be written in ini style.

[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=default

[logger_root]
level=DEBUG
qualname=root
handlers=console

[handler_console]
class=StreamHandler
formatter=default
args=(sys.stdout, )

[formatter_default]
class=logstash_formatter.LogstashFormatter

This alone handles almost whole application configuration, logs are nicely serialized into JSON with whole traceback.

Categories
microservices

Cleaning up messy distributed logs, part 2

This is a story on sorting logs. First part can be found here, it’s where you can get all the nitty gritty. If you are not feeling like it here is a short elevator pitch. Our kibana logs are a mess, each of the services seems to have different log format and trace backs are split across many entries. My goal is to fix it.

Second part is much much shorter as it’s one of the steps I took on path to sorting this pet peeve of mine. Unfortunately, I can not spend my whole time at work to fix it as it is not a crucial part of our application. At least it is not at this moment.

My first idea was to tackle "easy" issue of grok parsing failure. Our configuration is rather bloated IMO so I trimmed it (I had mentioned that it is not my concern but it easier to reason about simple code). From this:

      input {
        gelf { }
        heartbeat { }
        syslog {
          port => 5145
        }
      }
      filter {
        ruby {
          code => "
            event.to_hash.keys.each { |k| event[ k.gsub('"'.'"','"'_'"') ] = 
            event.remove(k) if k.include?'"'.'"' }
          "
        }
        grok {
          match => {
            "message" => [
              # Logspout
              "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:%{TIMESTAMP_ISO8601:ts}|-) 
              +(?:%{HOSTNAME:containerid}|-) 
              +(?:%{WORD:process}\.%{INT:processnumber}\.%{WORD:processsha}|%{NOTSPACE:process}) 
              +(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) 
              +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}"
            ]
          }
        }
      }
      output {
        elasticsearch {
          hosts => ["elasticsearch:9200"]
        }
        stdout {
          codec => rubydebug
        }
      }

I went to this:

     input {
      syslog { }
      }
      filter {
        grok {
          match => {
            "message" => [
              # Logspout
              "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:%{TIMESTAMP_ISO8601:ts}|-) 
              +(?:%{HOSTNAME:containerid}|-) 
              +(?:%{WORD:process}\.%{INT:processnumber}\.%{WORD:processsha}|%{NOTSPACE:process}) 
              +(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) 
              +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}"
            ]
          }
        }
      }
      output {
        elasticsearch {
          hosts => ["elasticsearch:9200"]
        }
      }

We had few plugins that were doing nothing so went under the cleaver. Since I can run the swarm locally I’d verify that all is logged and displayed correctly in kibana.

So then I did what everyone does, google it hard. Actually duckduckgo it but it just does not sound right. After lot of reading finally found something relevant to my case. So the perpetrator is syslog plugin, I mentioned before that logs format might not be consistent and possibly (for sure TBH) not up to syslog’s spec. As author recommended I had replaced syslog { } with tcp and udp plugins.

     input {
          tcp {
              port => 5145
              type => syslog
          }
          udp {
              port => 5145
              type => syslog
          }
      }

Logs are now without _grokparsefailure tags. Still bit messy but at least correctly and efficiently parsed. See you in the next part.

Categories
microservices

Cleaning up messy distributed logs, part 1

Logging is an important part of software lifecycle. It is used in every stage, from the beginning, through testing, deployment, staging, production. I find neat logs soothing, they increase my confidence in running software providing visibility when required.

Having said that I have taken a task to work on logs in a distributed system I’m working on. Since we push everything through ELK stack this is what I have inherited.

Unfiltered messy kibana logs

It looks like a mess, but this is because kibana does not have any sensible default configuration in our setup. After looking more closely there is another issue. I have marked it with red.

Unfiltered messy kibana logs with tags marked with red

So time to look at how logstash is configured as there is a problem with how grok filter is set up. Before doing that it is worth explaining how can you configure logstash as it will help in understanding my configuration, and maybe explain briefly what logstash is.

Easiest way to describe purpose of this tool is by citing introduction taken from the documentation.

Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash can dynamically unify data from disparate sources and normalize the data into destinations of your choice.

How do you configure it? Simply, by taking the plugins you wish and putting them into your configuration file. Plugins are:

  • input plugins, they are sources of events fed to logstash. list
  • filter plugins, used for processing of event data. list
  • output plugins, send data to particular destination. list
  • codec plugins, can be part of input or output plugins and can change the data. list

In my case configuration is a bit complex and what’s most important grok is not configured correctly (So it seems but there might be something else as well in play). Some of the plugins may be redundant as well but this is not my concern now.

      input {
        gelf { }
        heartbeat { }
        syslog {
          port => 5145
        }
      }
      filter {
        ruby {
          code => "
            event.to_hash.keys.each { |k| event[ k.gsub('"'.'"','"'_'"') ] = 
            event.remove(k) if k.include?'"'.'"' }
          "
        }
        grok {
          match => {
            "message" => [
              # Logspout
              "%{SYSLOG5424PRI}%{NONNEGINT:ver} +(?:%{TIMESTAMP_ISO8601:ts}|-) 
              +(?:%{HOSTNAME:containerid}|-) 
              +(?:%{WORD:process}\.%{INT:processnumber}\.%{WORD:processsha}|%{NOTSPACE:process}) 
              +(?:%{NOTSPACE:proc}|-) +(?:%{WORD:msgid}|-) 
              +(?:%{SYSLOG5424SD:sd}|-|) +%{GREEDYDATA:msg}"
            ]
          }
        }
      }
      output {
        elasticsearch {
          hosts => ["elasticsearch:9200"]
        }
        stdout {
          codec => rubydebug
        }
      }

This is the state now, further on I’ll describe as I go, my quest to make logs great again.