
I have an existing Python application, which logs like:

import logging
import json

logger = logging.getLogger()

some_var = 'abc'
data = {
   1: 2,
   'blah': ['hello']
}

logger.info(f"The value of some_var is {some_var} and data is {json.dumps(data)}")

So the logger.info function is given:

The value of some_var is abc and data is {"1": 2, "blah": ["hello"]}

Currently my logs go to AWS CloudWatch, which does some magic and renders this with indentation like:

The value of some_var is abc and data is {
   "1": 2,
   "blah": [
      "hello"
   ]
}

This makes the logs super clear to read.

Now I want to make some changes to my logging, handling it myself with another python script that wraps around my code and emails out logs when there's a failure.

What I want is some way of taking each log entry (or a stream/list of entries), and applying this indentation.

So I want a function which takes in a string, detects which subset(s) of that string are JSON, then inserts newlines and indentation to pretty-print that JSON.

example input:

Hello, {"a": {"b": "c"}} is some json data, but also {"c": [1,2,3]} is too

example output

Hello, 
{
  "a": {
    "b": "c"
  }
} 
is some json data, but also 
{
  "c": [
    1,
    2,
    3
  ]
}
is too

I have considered splitting up each entry into everything before and after the first {. Leave the left half as is, and pass the right half to json.dumps(json.loads(x), indent=4).

But what if there's stuff after the json object in the log file? Ok, we can just select everything after the first { and before the last }. Then pass the middle bit to the JSON library.

But what if there's two JSON objects in this log entry? (Like in the above example.) We'll have to use a stack to figure out whether any { appears after all prior { have been closed with a corresponding }.

But what if there's something like {"a": "\}"}? Hmm, OK, now we need to handle escaping too. At this point I find myself having to write a whole JSON parser from scratch.

Is there any easy way to do this?

I suppose I could use a regex to replace every instance of json.dumps(x) in my whole repo with json.dumps(x, indent=4). But json.dumps is sometimes used outside logging statements, and it just makes all my logging lines that extra bit longer. Is there a neat elegant solution?

(Bonus points if it can parse and indent the json-like output that str(x) produces in python. That's basically json with single quotes instead of double.)

1 Answer

In order to extract JSON objects from a string, see this answer. The extract_json_objects() function from that answer will handle JSON objects, and nested JSON objects but nothing else. If you have a list in your log outside of a JSON object, it's not going to be picked up.
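The key piece that approach relies on is JSONDecoder.raw_decode(), which parses a JSON value at the start of a string and also tells you the index just past it, ignoring any trailing text instead of raising an error for it — so none of the hand-rolled brace/escape tracking from the question is needed. A quick illustration:

```python
from json import JSONDecoder

decoder = JSONDecoder()
# raw_decode returns the parsed object plus the index just past it,
# rather than rejecting the string for having trailing text
obj, end = decoder.raw_decode('{"a": 1} and some trailing text')
# obj == {'a': 1}, end == 8
```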

In your case, modify the function to also return the strings/text around all the JSON objects, so that you can put them all into the log together (or replace the logline):

from json import JSONDecoder

def extract_json_objects(text, decoder=JSONDecoder()):
    pos = 0   # start of the text we haven't yielded yet
    scan = 0  # where to search for the next '{'
    while True:
        match = text.find('{', scan)
        if match == -1:
            yield text[pos:]  # yield the trailing non-JSON text
            break
        try:
            result, index = decoder.raw_decode(text[match:])
        except ValueError:
            # this '{' doesn't start valid JSON; keep it as plain text
            scan = match + 1
            continue
        yield text[pos:match]  # modification: yield the non-JSON text before this object
        yield result
        pos = scan = match + index

Use that function to process your loglines, add them to a list of strings, which you then join together to produce a single string for your output, logger, etc.:

import json

def jsonify_logline(line):
    line_parts = []
    for result in extract_json_objects(line):
        if isinstance(result, dict):  # got a JSON object
            line_parts.append(json.dumps(result, indent=4))
        else:                         # got text/non-JSON-obj
            line_parts.append(result)
    # (don't turn this into a list comprehension; it's far less readable)

    return ''.join(line_parts)

Example:

>>> demo_text = """Hello, {"a": {"b": "c"}} is some json data, but also {"c": [1,2,3]} is too"""
>>> print(jsonify_logline(demo_text))
Hello, {
    "a": {
        "b": "c"
    }
} is some json data, but also {
    "c": [
        1,
        2,
        3
    ]
} is too
>>>
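For the bonus ask in the question — the single-quoted, JSON-like output that str(x) produces for a Python dict — the standard library's ast.literal_eval can safely parse Python literals, so a small helper can re-dump them as indented JSON. This is a sketch, and pythonrepr_to_pretty_json is just an illustrative name:

```python
import ast
import json

def pythonrepr_to_pretty_json(text):
    # ast.literal_eval safely evaluates Python literals (dicts, lists,
    # strings, numbers...) without executing arbitrary code
    obj = ast.literal_eval(text)
    return json.dumps(obj, indent=4)

print(pythonrepr_to_pretty_json(str({'a': {'b': 'c'}})))
```

Note this handles a string that *is* a Python repr; locating such fragments inside a larger log line would still need a scanning step like extract_json_objects() above.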

Other suggestions, not directly related, that would also help:

  • Instead of using json.dumps(x) for all your log lines, follow the DRY principle and create a function like logdump(x) which does whatever you want, like json.dumps(x), or json.dumps(x, indent=4), or jsonify_logline(x). That way, if you ever need to change the JSON format for all your logs, you just change that one function; no need for a mass "search & replace", which comes with its own issues and edge cases.
    • You can even add an optional parameter like pretty=True to decide whether you want it indented or not.
  • You could mass search & replace all your existing loglines to do logger.blah(jsonify_logline(<previous log f-string or text>))
  • If you are JSON-dumping custom objects/class instances, then use their __str__ method to always output pretty-printed JSON. And the __repr__ to be non-pretty/compact.
    • Then you wouldn't need to modify the logline at all. Doing logger.info(f'here is my object {x}') would directly invoke obj.__str__.
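Putting the first suggestion together with awkward-to-serialize values such as a dict containing a datetime, one possible sketch of such a helper (the name logdump and the pretty flag are illustrative; default=str is a real json.dumps parameter that falls back to str() for values JSON can't serialize natively):

```python
import json
from datetime import datetime

def logdump(x, pretty=True):
    # default=str stringifies anything json.dumps can't handle on its own,
    # e.g. datetime objects, instead of raising a TypeError
    if pretty:
        return json.dumps(x, indent=4, default=str)
    return json.dumps(x, separators=(',', ':'), default=str)

logdump({'when': datetime(2024, 1, 1)})  # pretty-printed, datetime stringified
logdump({'a': [1, 2]}, pretty=False)     # compact: {"a":[1,2]}
```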

2 Comments

Great! A logdump function like that would be good. In particular I have a function I use sometimes when I want to dump to JSON a dict containing a datetime (which makes json.dumps crash)
jsonify_logline is your new logdump. It shouldn't raise errors for datetimes, since a datetime's string form isn't a JSON object and simply passes through as text. But that part could also be wrapped in a try/except block. Update: fixed a bug where the trailing text after the last JSON object wasn't returned.
