I had trouble picking an appropriate title. The guidelines you are about to read are based on a true story, sometimes a horror story. I can’t believe how much I have changed in this past 2 years. It was changed for the better. I grew a lot, I grew with the system I built, run, and supported.
Logging is hard, and it’s most of the time an afterthought. Read this guideline and learn from my mistakes. You may probably apply these recommendations to any language.
tracing
Yes, it is not strictly logging but distributed tracing of your app can save you a ton of time. Don’t postpone it, you will thank me. Won’t recommend any tools, go with defaults for the platform you are on (AWS X-Ray, Azure Monitor).
logging levels
You should have logs in production and during development. Logging levels should be different though, and messages should provide a bit different details based on the environment. Production needs to run on logging.INFO
by default. While develop or testing will normally run on logging.DEBUG
. When developing you need the extra information this level provides. Logging configuration should also allow to modify log level of any module by flipping an environment variable.
Stable parts of your production environment can probably run on logging.ERROR
or higher. So that rusty old API that never raises can be set to silent. You should monitor response codes though so you won’t be totally blind.
log retention
Logging can be costly, but it is also very useful so you must strike a balance. Usually, it is best to keep production logs for longer than development logs. I’d say a sprint’s worth of logs during development is fine, but that depends on your use case. Production logs are definitely kept longer and it depends on your use case. For some types of applications, certain regulations will enforce the time of how long the logs should be kept.
logging in code
Probably stating the obvious thing here but there must be one formatter for all logs your application produces. Now a less obvious thing is each line should include an important identifier like “[ORDER ID: 123]” or “[USER ID: 456]”. Sifting through logs will be much easier. This technique is called log correlation. You can use custom LoggerAdapter for this as described in Python’s logging cookbook.
Log correlation may already be a feature of the tooling you are using like Opencensus Azure extension or its successor Opentelemetry. Don’t waste time implementing your own solution.
Below you will find more detailed information when each log level should be used.
logging.INFO
logging.INFO
must be used at the entry of a code path, as in a FastAPI router, Azure Function def main
, or apscheduler job start. Exit spots, where you return, must also use logging.INFO
.
Each line must provide at least one identifier (as mentioned in the previous paragraph) that can be used to query for more information.
Why do you need to log such info? Access logs of Azure Api Management do not provide information about the body of a request. Anything else than GET request most likely will have a body. This may be enough to debug basic issues on production, or at least confirm that you received a particular request. Check your tooling to see if you can get such information on the infrastructure level, if not do as I recommend.
logging.DEBUG
logging.DEBUG
must be used at entry, exit, and at all code branches (if/else
, etc.). Objects being output need to have full information. In case of too big objects they will be cut down by the logger. Any loops in the code should print out each iterated object. Results of third party calls should also be logged, as well as parameters used in the call (enabling DEBUG
logs for third party might produce a lot of logs so check).
The data transformation step in the method should also probably log the result, as it may assist in debugging. Unless it is done in a call to another function. In such case, the function will log with debug level as mentioned in the beginning of this chapter.
Code branches should log the code as verbatim.
if checkout_data.status == CheckoutStatus.DRAFT:
logger.debug(
"Branch: checkout_data.status == CheckoutStatus.DRAFT",
)
Code language: Python (python)
It is easier to spot the divergence of the log message with the code. If someone would pick a name to summarise the block behind if
it could easier become outdated and change without people realising it. I know this is a tough sell but logging information which branch of code base is being executed is important.
logging.WARNING
logging.WARNING
must be used for cases where we need to be notified about certain situation that is not strictly an error. Examples:
- handled connection issues
- deprecated feature flag usage
logging.EXCEPTION
logging.EXCEPTION
must be used for uncaught exceptions handler so we can get a traceback that would help in fixing the issue. It also must be used when handling general Exception
or other very broad try/except
.
try:
ext_service.make_call(
session_token=token,
)
except SpecificException:
...
except Exception as exc:
logger.exception(
"Unhandled error in session call, %s",
session_code,
exc_info=exc,
)
return JSONResponse(
status_code=500,
)
Code language: PHP (php)
logging.ERROR
logger.ERROR
must be used for cases where there is an error in execution that needs to be fixed. This will trigger integrations to raise an issue in Rollbar or another tool. Most likely this will be inside an error handling code block and will not crash or stop the application. Please make sure that logging in global exception handler is using correct formatting.
These are all recommendations I could think of. Hopefully reading them will provoke you to make your life a bit easier. As an extra bonus below is a presentation I gave regarding logging in production apps.
Peace