79

So I'm using Python 2.7, using the json module to encode the following data structure:

'layer1': {
    'layer2': {
        'layer3_1': [ long_list_of_stuff ],
        'layer3_2': 'string'
    }
}

My problem is that I'm printing everything out using pretty printing, as follows:

json.dumps(data_structure, indent=2)

Which is great, except that I want to indent everything but the content of "layer3_1" — it's a massive list of coordinate dicts, and giving each value its own line makes pretty printing produce a file with thousands of lines. For example:

{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}

What I really want is something similar to the following:

{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x":1,"y":7},{"x":0,"y":4},{"x":5,"y":3},{"x":6,"y":9}],
      "layer3_2": "string"
    }
  }
}

I hear it's possible to extend the json module: Is it possible to set it to only turn off indenting when inside the "layer3_1" object? If so, would somebody please tell me how?

5 Comments

  • Your first code snippet is neither JSON nor Python. Commented Nov 6, 2012 at 10:51
  • Indentation is a matter of printing, not of representation. Commented Nov 6, 2012 at 10:53
  • For "pretty printing" you mean you're using the pprint module? Commented Nov 6, 2012 at 10:55
  • Amended the first snippet to something recognisable. And I'm using json.dumps(data_structure, indent=2) - Added that as an example. Commented Nov 6, 2012 at 10:57
  • I've posted a solution that works on 2.7 and plays nicely with options such as sort_keys and does not have special case implementation for sort order and instead relies on (composition with) collections.OrderedDict. Commented Sep 19, 2014 at 14:21

16 Answers

32

(Note: The code in this answer only works with json.dumps() which returns a JSON formatted string, but not with json.dump() which writes directly to file-like objects. There's a modified version of it that works with both in my answer to the question Write two-dimensional list to JSON file.)
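To illustrate the distinction (a small sketch of my own, not part of the original answer): json.dump() writes directly to a file-like object, while json.dumps() returns the same text as a string, so when an encoder only supports dumps() you can always write the returned string out yourself.

```python
import io
import json

data = {"layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}]}

# json.dump() writes straight to a file-like object...
buf = io.StringIO()
json.dump(data, buf, indent=2)

# ...while json.dumps() returns the identical text as a string, which can
# be written to a file manually when only dumps() is supported.
text = json.dumps(data, indent=2)
assert buf.getvalue() == text
```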

Updated

Below is a version of my original answer that has been revised several times. The original, which I posted only to show how to get the first idea in J.F. Sebastian's answer to work, returned (like his) a non-indented string representation of the object; the latest version returns the wrapped Python object JSON-formatted in isolation.

The keys of each coordinate dict will appear in sorted order, per one of the OP's comments, but only if a sort_keys=True keyword argument is specified in the initial json.dumps() call driving the process. The encoder also no longer changes the object's type to a string along the way; in other words, the actual type of the "wrapped" object is now maintained.

I think not understanding the original intent of my post led a number of folks to downvote it, so, primarily for that reason, I have "fixed" and improved my answer several times. The current version is a hybrid of my original answer coupled with some of the ideas @Erik Allik used in his answer, plus useful feedback from other users shown in the comments below this answer.

The following code appears to work unchanged in both Python 2.7.16 and 3.7.4.

from _ctypes import PyObj_FromPtr
import json
import re

class NoIndent(object):
    """ Value wrapper. """
    def __init__(self, value):
        self.value = value


class MyEncoder(json.JSONEncoder):
    FORMAT_SPEC = '@@{}@@'
    regex = re.compile(FORMAT_SPEC.format(r'(\d+)'))

    def __init__(self, **kwargs):
        # Save copy of any keyword argument values needed for use here.
        self.__sort_keys = kwargs.get('sort_keys', None)
        super(MyEncoder, self).__init__(**kwargs)

    def default(self, obj):
        return (self.FORMAT_SPEC.format(id(obj)) if isinstance(obj, NoIndent)
                else super(MyEncoder, self).default(obj))

    def encode(self, obj):
        format_spec = self.FORMAT_SPEC  # Local var to expedite access.
        json_repr = super(MyEncoder, self).encode(obj)  # Default JSON.

        # Replace any marked-up object ids in the JSON repr with the
        # value returned from the json.dumps() of the corresponding
        # wrapped Python object.
        for match in self.regex.finditer(json_repr):
            # see https://stackoverflow.com/a/15012814/355230
            id = int(match.group(1))
            no_indent = PyObj_FromPtr(id)
            json_obj_repr = json.dumps(no_indent.value, sort_keys=self.__sort_keys)

            # Replace the matched id string with json formatted representation
            # of the corresponding Python object.
            json_repr = json_repr.replace(
                            '"{}"'.format(format_spec.format(id)), json_obj_repr)

        return json_repr


if __name__ == '__main__':
    from string import ascii_lowercase as letters

    data_structure = {
        'layer1': {
            'layer2': {
                'layer3_1': NoIndent([{"x":1,"y":7}, {"x":0,"y":4}, {"x":5,"y":3},
                                      {"x":6,"y":9},
                                      {k: v for v, k in enumerate(letters)}]),
                'layer3_2': 'string',
                'layer3_3': NoIndent([{"x":2,"y":8,"z":3}, {"x":1,"y":5,"z":4},
                                      {"x":6,"y":9,"z":8}]),
                'layer3_4': NoIndent(list(range(20))),
            }
        }
    }

    print(json.dumps(data_structure, cls=MyEncoder, sort_keys=True, indent=2))

Output:

{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}, {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5, "g": 6, "h": 7, "i": 8, "j": 9, "k": 10, "l": 11, "m": 12, "n": 13, "o": 14, "p": 15, "q": 16, "r": 17, "s": 18, "t": 19, "u": 20, "v": 21, "w": 22, "x": 23, "y": 24, "z": 25}],
      "layer3_2": "string",
      "layer3_3": [{"x": 2, "y": 8, "z": 3}, {"x": 1, "y": 5, "z": 4}, {"x": 6, "y": 9, "z": 8}],
      "layer3_4": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    }
  }
}

16 Comments

Nice, I got this working, but wanted to sort the x and y for vanity's sake (parts of the JSON produced needs to be hand edited later on, don't ask why :(), so I tried using an OrderedDict. Now my problem is that I get the following in my output: "layer3_1": "[OrderedDict([('x', 804), ('y', 622)]), OrderedDict([('x', 817), ('y', 635)]), OrderedDict([('x', 817), ('y', 664)]), OrderedDict([('x', 777), (' y', 664)]), OrderedDict([('x', 777), ('y', 622)]), OrderedDict([('x', 804), ('y' , 622)])]", I think I'm missing something...
This still prints the list as a string instead.
@ErikAllik was exactly right. The list became a string: "[{'x':1, 'y':7}, {'x':0, 'y':4}, {'x':5, 'y':3}, {'x':6, 'y':9}]". This is a wrong answer!
Not working with deserialisation (json.loads()) due to using single quote. I have to use @ErikAllik 's answer instead. -- github.com/patarapolw/pyexcel-formatter/blob/master/…
@Polv: Thanks for the feedback. I've updated my answer to address the issue.
18

A bodge, but once you have the string from dumps(), you can perform a regular expression substitution on it, if you're sure of the format of its contents. Something along the lines of:

import re

s = json.dumps(data_structure, indent=2)
s = re.sub(r'\s*{\s*"(.)": (\d+),\s*"(.)": (\d+)\s*}(,?)\s*', r'{"\1":\2,"\3":\4}\5', s)

3 Comments

Thanks, this worked too, and is indeed smaller, but decided to go with the solution provided by @martineau
Your solution is very funny!:) I love it, and it doesn't require any "NoIdent" tagging, works out of the box. I'll probably test it for large input files tomorrow, I'm looking for a simple solution to break out of the csv world since it doesn't really allow for metadata, yet keep the readability.
hey, amazing answer! I built on your ideas by providing a more generic solution with the following regex: re.sub(r'(?:\n\s{8,}(.*))|(?:\n\s{6,}(]|}))', r'\1\2', s). Or read it at regex101.com/r/xWT7I1/2
12

The following solution seems to work correctly on Python 2.7.x. It uses a workaround taken from Custom JSON encoder in Python 2.7 to insert plain JavaScript code to avoid custom-encoded objects ending up as JSON strings in the output by using a UUID-based replacement scheme.

import json
import uuid


class NoIndent(object):
    def __init__(self, value):
        self.value = value


class NoIndentEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        super(NoIndentEncoder, self).__init__(*args, **kwargs)
        self.kwargs = dict(kwargs)
        del self.kwargs['indent']
        self._replacement_map = {}

    def default(self, o):
        if isinstance(o, NoIndent):
            key = uuid.uuid4().hex
            self._replacement_map[key] = json.dumps(o.value, **self.kwargs)
            return "@@%s@@" % (key,)
        else:
            return super(NoIndentEncoder, self).default(o)

    def encode(self, o):
        result = super(NoIndentEncoder, self).encode(o)
        for k, v in self._replacement_map.iteritems():
            result = result.replace('"@@%s@@"' % (k,), v)
        return result

Then this

obj = {
  "layer1": {
    "layer2": {
      "layer3_2": "string", 
      "layer3_1": NoIndent([{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}])
    }
  }
}
print json.dumps(obj, indent=2, cls=NoIndentEncoder)

produces the following output:

{
  "layer1": {
    "layer2": {
      "layer3_2": "string", 
      "layer3_1": [{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}]
    }
  }
}

It also correctly passes all options (except indent) e.g. sort_keys=True down to the nested json.dumps call.

obj = {
    "layer1": {
        "layer2": {
            "layer3_1": NoIndent([{"y": 7, "x": 1, }, {"y": 4, "x": 0}, {"y": 3, "x": 5, }, {"y": 9, "x": 6}]),
            "layer3_2": "string",
        }
    }
}    
print json.dumps(obj, indent=2, sort_keys=True, cls=NoIndentEncoder)

correctly outputs:

{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}], 
      "layer3_2": "string"
    }
  }
}

It can also be combined with e.g. collections.OrderedDict:

obj = {
    "layer1": {
        "layer2": {
            "layer3_2": "string",
            "layer3_3": NoIndent(OrderedDict([("b", 1), ("a", 2)]))
        }
    }
}
print json.dumps(obj, indent=2, cls=NoIndentEncoder)

outputs:

{
  "layer1": {
    "layer2": {
      "layer3_3": {"b": 1, "a": 2}, 
      "layer3_2": "string"
    }
  }
}

UPDATE: In Python 3, there is no iteritems. You can replace encode with this:

def encode(self, o):
    result = super(NoIndentEncoder, self).encode(o)
    for k, v in iter(self._replacement_map.items()):
        result = result.replace('"@@%s@@"' % (k,), v)
    return result

5 Comments

For those who don't understand how this solution works: The two lines for k, v in self._replacement_map.iteritems(): result = result.replace('"@@%s@@"' % (k,), v) inside encode(), is to replace "layer3_1": "@@d4e06719f9cb420a82ace98becab5ff8@@" to "layer3_1": [{"y": 7, "x": 1}, {"y": 4, "x": 0}, {"y": 3, "x": 5}, {"y": 9, "x": 6}]. I think this solution in some sense equals to @M Somerville's re substitution solution.
This works in Python 3 as well. The only caveat is that you must use json.dumps, not json.dump! In the latter case you would have to override iterencode() as well and I couldn't get that working.
This answer requires polluting source object with NoIndent classes, which is a bad approach. It doesn't work with arbitrary data structures. It's also not dynamic: encode doesn't adjust behavior depending on length of result (keep short items on one line, break up longer items). Nice attempt but not a general solution.
@Ed_ you're right, but I'd argue my answer is useful nevertheless in many cases, and will pave way for more general solutions. If you post a more general solution, I'll surely upvote it. A more general solution would most likely use some pattern/path-matching based side-annotation to direct the processing of the JSON without "polluting" the JSON itself, so that even externally sourced JSON data could be custom-formatted.
@ErikKaplun Hi Erik, I understand your perspective. From my perspective, does it meet OP's narrow stated goal? yes. is it good practice? no. I don't see how adding unnecessary classes ever leads to a good general solution. SO promotes good solutions, not every possible approach. If you want to vote, there are already several better solutions on this page that don't alter source data, which is separate and independent from the JSON representation. Think about expressing that distinction more clearly. If you have a case where your approach is better, post a new question and answer it.
10

This yields the OP's expected result:

import json

class MyJSONEncoder(json.JSONEncoder):

  def iterencode(self, o, _one_shot=False):
    list_lvl = 0
    for s in super(MyJSONEncoder, self).iterencode(o, _one_shot=_one_shot):
      if s.startswith('['):
        list_lvl += 1
        s = s.replace('\n', '').rstrip()
      elif 0 < list_lvl:
        s = s.replace('\n', '').rstrip()
        if s and s[-1] == ',':
          s = s[:-1] + self.item_separator
        elif s and s[-1] == ':':
          s = s[:-1] + self.key_separator
      if s.endswith(']'):
        list_lvl -= 1
      yield s

o = {
  "layer1":{
    "layer2":{
      "layer3_1":[{"y":7,"x":1},{"y":4,"x":0},{"y":3,"x":5},{"y":9,"x":6}],
      "layer3_2":"string",
      "layer3_3":["aaa\nbbb","ccc\nddd",{"aaa\nbbb":"ccc\nddd"}],
      "layer3_4":"aaa\nbbb",
    }
  }
}

jsonstr = json.dumps(o, indent=2, separators=(',', ':'), sort_keys=True,
    cls=MyJSONEncoder)
print(jsonstr)
o2 = json.loads(jsonstr)
print('identical objects: {}'.format((o == o2)))

1 Comment

This approach is better than the accepted answers and doesn't require changing the source data. The limitation is that it always puts lists on a single line, regardless of the data. It could be improved with dynamic size detection to condense short elements of any data type. However, that would be difficult with iterencode alone; it needs look-ahead ability. Maybe more extensive overriding of JSONEncoder would work. +1 for a better solution though.
7

Answer for me and Python 3 users

import re

def jsonIndentLimit(jsonString, indent, limit):
    regexPattern = re.compile(f'\n({indent}){{{limit}}}(({indent})+|(?=(}}|])))')
    return regexPattern.sub('', jsonString)

if __name__ == '__main__':
    jsonString = '''{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}'''
    print(jsonIndentLimit(jsonString, '  ', 3))

'''print
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1,"y": 7},{"x": 0,"y": 4},{"x": 5,"y": 3},{"x": 6,"y": 9}],
      "layer3_2": "string"
    }
  }
}'''

3 Comments

This could be the accepted answer. To pretty-print a dictionary, combine it with json.dumps and it looks like this: jsonString = json.dumps(thedict, indent=4); print(jsonIndentLimit(jsonString, ' ', 3))
Nice. To add spaces after commas and around brackets, replace the matches with a space: regexPattern.sub(" ", jsonString)
Nice approach, more general than accepted answers. Lacks dynamic size detection: solution stops indenting after a fixed level regardless of data length. Could be added though with more complex processing.
2

You could try:

  • mark lists that shouldn't be indented by replacing them with NoIndentList:

    class NoIndentList(list):
        pass
    
  • override the json.JSONEncoder.default method to produce a non-indented string representation for NoIndentList.

    You could just cast it back to list and call json.dumps() without indent to get a single line

It seems the above approach doesn't work for the json module:

import json
import sys

class NoIndent(object):
    def __init__(self, value):
        self.value = value

def default(o, encoder=json.JSONEncoder()):
    if isinstance(o, NoIndent):
        return json.dumps(o.value)
    return encoder.default(o)

L = [dict(x=x, y=y) for x in range(1) for y in range(2)]
obj = [NoIndent(L), L]
json.dump(obj, sys.stdout, default=default, indent=4)

It produces invalid output (the list is serialized as a string):

[
    "[{\"y\": 0, \"x\": 0}, {\"y\": 1, \"x\": 0}]", 
    [
        {
            "y": 0, 
            "x": 0
        }, 
        {
            "y": 1, 
            "x": 0
        }
    ]
]

If you can use yaml then the method works:

import sys
import yaml

class NoIndentList(list):
    pass

def noindent_list_presenter(dumper, data):
    return dumper.represent_sequence(u'tag:yaml.org,2002:seq', data,
                                     flow_style=True)
yaml.add_representer(NoIndentList, noindent_list_presenter)


obj = [
    [dict(x=x, y=y) for x in range(2) for y in range(1)],
    [dict(x=x, y=y) for x in range(1) for y in range(2)],
    ]
obj[0] = NoIndentList(obj[0])
yaml.dump(obj, stream=sys.stdout, indent=4)

It produces:

- [{x: 0, y: 0}, {x: 1, y: 0}]
-   - {x: 0, y: 0}
    - {x: 0, y: 1}

i.e., the first list is serialized using [] and all items are on one line, the second list uses one line per item.
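As an aside (my own addition, assuming plain PyYAML): if you wanted the whole document in flow style rather than only the tagged lists, you don't need a representer at all; the default_flow_style flag applies it globally.

```python
import yaml

# default_flow_style=True puts every container inline, on a single line,
# instead of registering a representer per wrapper class.
print(yaml.dump([{"x": 0, "y": 0}, [1, 2]], default_flow_style=True))
```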

7 Comments

I think I get half of what you're saying, though I am a little confused, probably down to me not having had to override methods in Python before. I'll do a bit more reading, but if you could provide a more complete example, it would be appreciated!
Bad solution, requires changes to source data. Not a good idea.
@Ed_ I agree, could you provide a link to a non-source-data-modifying solution for comparison?
@jfs There's a few mentioned on this page. Check other answers. I'll post more soon.
@Ed_ I'm just interested in your justification of "bad solution". Is it just talk, or do you have specific code that is better?
2

Here's a post-processing solution if you have too many different types of objects contributing to the JSON to attempt the JSONEncoder method and too many varying types to use a regex. This function collapses whitespace after a specified level, without needing to know the specifics of the data itself.

def collapse_json(text, indent=12):
    """Compacts a string of json data by collapsing whitespace after the
    specified indent level

    NOTE: will not produce correct results when indent level is not a multiple
    of the json indent level
    """
    initial = " " * indent
    out = []  # final json output
    sublevel = []  # accumulation list for sublevel entries
    pending = None  # holder for consecutive entries at exact indent level
    for line in text.splitlines():
        if line.startswith(initial):
            if line[indent] == " ":
                # found a line indented further than the indent level, so add
                # it to the sublevel list
                if pending:
                    # the first item in the sublevel will be the pending item
                    # that was the previous line in the json
                    sublevel.append(pending)
                    pending = None
                item = line.strip()
                sublevel.append(item)
                if item.endswith(","):
                    sublevel.append(" ")
            elif sublevel:
                # found a line at the exact indent level *and* we have sublevel
                # items. This means the sublevel items have come to an end
                sublevel.append(line.strip())
                out.append("".join(sublevel))
                sublevel = []
            else:
                # found a line at the exact indent level but no items indented
                # further, so possibly start a new sub-level
                if pending:
                    # if there is already a pending item, it means that
                    # consecutive entries in the json had the exact same
                    # indentation and that last pending item was not the start
                    # of a new sublevel.
                    out.append(pending)
                pending = line.rstrip()
        else:
            if pending:
                # it's possible that an item will be pending but not added to
                # the output yet, so make sure it's not forgotten.
                out.append(pending)
                pending = None
            if sublevel:
                out.append("".join(sublevel))
            out.append(line)
    return "\n".join(out)

For example, using this structure as input to json.dumps with an indent level of 4:

text = json.dumps({"zero": ["first", {"second": 2, "third": 3, "fourth": 4, "items": [[1,2,3,4], [5,6,7,8], 9, 10, [11, [12, [13, [14, 15]]]]]}]}, indent=4)

here's the output of the function at various indent levels:

>>> print collapse_json(text, indent=0)
{"zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]}
>>> print collapse_json(text, indent=4)
{
    "zero": ["first", {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}]
}
>>> print collapse_json(text, indent=8)
{
    "zero": [
        "first",
        {"items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]], "second": 2, "fourth": 4, "third": 3}
    ]
}
>>> print collapse_json(text, indent=12)
{
    "zero": [
        "first", 
        {
            "items": [[1, 2, 3, 4], [5, 6, 7, 8], 9, 10, [11, [12, [13, [14, 15]]]]],
            "second": 2,
            "fourth": 4,
            "third": 3
        }
    ]
}
>>> print collapse_json(text, indent=16)
{
    "zero": [
        "first", 
        {
            "items": [
                [1, 2, 3, 4],
                [5, 6, 7, 8],
                9,
                10,
                [11, [12, [13, [14, 15]]]]
            ], 
            "second": 2, 
            "fourth": 4, 
            "third": 3
        }
    ]
}

1 Comment

Only works with space indents not tabs. json.dumps allows arbitrary indent chars.
1

Best-performing code (dumping a 10 MB string takes about 1 s):

import json
def dumps_json(data, indent=2, depth=2):
    assert depth > 0
    space = ' '*indent
    s = json.dumps(data, indent=indent)
    lines = s.splitlines()
    N = len(lines)
    # determine which lines to be shortened
    is_over_depth_line = lambda i: i in range(N) and lines[i].startswith(space*(depth+1))
    is_open_bracket_line = lambda i: not is_over_depth_line(i) and is_over_depth_line(i+1)
    is_close_bracket_line = lambda i: not is_over_depth_line(i) and is_over_depth_line(i-1)
    # 
    def shorten_line(line_index):
        if not is_open_bracket_line(line_index):
            return lines[line_index]
        # shorten over-depth lines
        start = line_index
        end = start
        while not is_close_bracket_line(end):
            end += 1
        has_trailing_comma = lines[end][-1] == ','
        _lines = [lines[start][-1], *lines[start+1:end], lines[end].replace(',','')]
        d = json.dumps(json.loads(' '.join(_lines)))
        return lines[line_index][:-1] + d + (',' if has_trailing_comma else '')
    # 
    s = '\n'.join([
        shorten_line(i)
        for i in range(N) if not is_over_depth_line(i) and not is_close_bracket_line(i)
    ])
    #
    return s

UPDATE: Here's my explanation:

First we use json.dumps to get an indented JSON string. Example:

>>>  print(json.dumps({'0':{'1a':{'2a':None,'2b':None},'1b':{'2':None}}}, indent=2))
[0]  {
[1]    "0": {
[2]      "1a": {
[3]        "2a": null,
[4]        "2b": null
[5]      },
[6]      "1b": {
[7]        "2": null
[8]      }
[9]    }
[10] }

If we set indent=2 and depth=2, then lines nested deeper than the depth limit start with 6 spaces.

We have 4 types of lines:

  1. Normal line
  2. Open bracket line (2,6)
  3. Exceed depth line (3,4,7)
  4. Close bracket line (5,8)

We will try to merge a sequence of lines (type 2 + 3 + 4) into one single line. Example:

[2]      "1a": {
[3]        "2a": null,
[4]        "2b": null
[5]      },

will be merged into:

[2]      "1a": {"2a": null, "2b": null},

NOTE: a close-bracket line may have a trailing comma

2 Comments

But they did not ask about speed and performance! Please explain more.
I have to compute statistics over a huge data matrix, so I focused on performance and accuracy.
1

I know this question is fairly old, both in time and in Python versions, but while searching on a similar issue I came across compact-json, which simply works:

> compact-json -l 80 sample.txt
{
    "layer1": {
        "layer2": {
            "layer3_1": [ {"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9} ],
            "layer3_2": "string"
        }
    }
}

and works just as easily in a script.

import json
from compact_json import Formatter


str = """
{
  "layer1": {
    "layer2": {
      "layer3_1": [
        {
          "x": 1,
          "y": 7
        },
        {
          "x": 0,
          "y": 4
        },
        {
          "x": 5,
          "y": 3
        },
        {
          "x": 6,
          "y": 9
        }
      ],
      "layer3_2": "string"
    }
  }
}"""

json_str = json.loads(str)
print(Formatter().serialize(json_str)) # same result as above

1 Comment

Finally, a solution that understands the general problem and doesn't modify the source data. This should be the accepted answer.
0

Indeed, this is one of the things YAML does better than JSON.

I can't get NoIndentEncoder to work, but I can use a regex on the JSON string:

import re

def collapse_json(text, list_length=5):
    for length in range(list_length):
        re_pattern = r'\[' + (r'\s*(.+)\s*,' * length)[:-1] + r'\]'
        re_repl = r'[' + ''.join(r'\{}, '.format(i+1) for i in range(length))[:-2] + r']'

        text = re.sub(re_pattern, re_repl, text)

    return text

The question is, how do I perform this on a nested list?

Before:

[
  0,
  "any",
  [
    2,
    3
  ]
]

After:

[0, "any", [2, 3]]
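One way to sidestep the regex and its nested-list limitation entirely (a sketch of an alternative approach, with a hypothetical helper name, not from the original answer) is to round-trip through the json module: parse the string, then re-dump every list compactly while keeping dicts indented.

```python
import json

def collapse_json_lists(text, step=2):
    """Hypothetical helper: re-serialize *text* with dicts indented
    but every list (nested or not) on a single line."""
    def dump(obj, level=0):
        pad = " " * (step * level)
        if isinstance(obj, list):
            return json.dumps(obj)  # lists are always compact, even nested
        if isinstance(obj, dict):
            if not obj:
                return "{}"
            items = ",\n".join(
                "{}{}: {}".format(pad + " " * step, json.dumps(k), dump(v, level + 1))
                for k, v in obj.items())
            return "{\n" + items + "\n" + pad + "}"
        return json.dumps(obj)
    return dump(json.loads(text))

print(collapse_json_lists('[\n  0,\n  "any",\n  [\n    2,\n    3\n  ]\n]'))
# -> [0, "any", [2, 3]]
```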

Comments

0

An alternate method if you would like to specifically indent arrays differently, could look something like this:

import json

# Should be unique and never appear in the input
REPLACE_MARK = "#$ONE_LINE_ARRAY_{0}$#"

example_json = {
    "test_int": 3,
    "test_str": "Test",
    "test_arr": [ "An", "Array" ],
    "test_obj": {
        "nested_str": "string",
        "nested_arr": [{"id": 1},{"id": 2}]
    }
}

# Replace all arrays with the indexed markers.
a = example_json["test_arr"]
b = example_json["test_obj"]["nested_arr"]
example_json["test_arr"] = REPLACE_MARK.format("a")
example_json["test_obj"]["nested_arr"] = REPLACE_MARK.format("b")

# Generate the JSON without any arrays using your pretty print.
json_data = json.dumps(example_json, indent=4)

# Generate the JSON arrays without pretty print.
json_data_a = json.dumps(a)
json_data_b = json.dumps(b)

# Insert the flat JSON strings into the parent at the indexed marks.
json_data = json_data.replace(f"\"{REPLACE_MARK.format('a')}\"", json_data_a)
json_data = json_data.replace(f"\"{REPLACE_MARK.format('b')}\"", json_data_b)

print(json_data)

You could generalize this into a function that would walk through each element of your JSON object scanning for arrays and performing the replacements dynamically.

Pros:

  • Simple and expandable
  • No use of Regex
  • No custom JSON Encoder

Cons:

  • Take care that user input never contains the replacement placeholders.
  • Might not be performant on JSON structures containing lots of arrays.

Motivation for this solution was a fixed-format generation of animation frames, where each element of the array was an integer index. This solution worked well for me and was easy to adjust.

Here is the more generic and optimized version:

import json
import copy

REPLACE_MARK = "#$ONE_LINE_ARRAY_$#"

def dump_arrays_single_line(json_data):
    # Deep copy prevents modifying the original data.
    json_data = copy.deepcopy(json_data)

    # Walk the dictionary, putting every JSON array into arr.
    def walk(node, arr):
        for key, item in node.items():
            if type(item) is dict:
                walk(item, arr)
            elif type(item) is list:
                arr.append(item)
                node[key] = REPLACE_MARK
            else:
                pass

    arr = []
    walk(json_data, arr)

    # Pretty-print, then splice each array back in on a single line.
    # Escape '{' and '}' so the markers can be filled via str.format(), and
    # json.dumps() each array so the spliced text is valid JSON (plain
    # str.format() would insert the Python repr, with single quotes).
    json_data = json.dumps(json_data, indent=4)
    json_data = json_data.replace('{', '{{').replace('}', '}}')
    json_data = json_data.replace(f'"{REPLACE_MARK}"', '{}', len(arr))
    json_data = json_data.format(*(json.dumps(a) for a in arr))

    return json_data
                

example_json = {
    "test_int": 3,
    "test_str": "Test",
    "test_arr": [ "An", "Array" ],
    "test_obj": {
        "nested_str": "string",
        "nested_arr": [{"id": 1},{"id": 2}]
    }
}

print(dump_arrays_single_line(example_json))

1 Comment

Modifying source data is a bad idea.
0

I find the other answers on this page lacking. They either require changing the source data object (e.g. adding NoIndent wrappers) or use a static wrapping strategy (e.g. put every list on one line, or special-case certain keys). @Thell has the best general solution, which wraps each field dynamically based on output length. Unfortunately it has terrible performance.

compact_json is great functionally; it solves the general problem exactly how it should. Lines are wrapped or combined according to line length and many other configurable criteria, such as object complexity (think nested dict/list levels). This is the right approach. But its performance is terrible.

Performance

The main issue is that compact_json is s-l-o-w. We're talking 50 times slower than stdlib json encoding. Here's a test with a 4 MB json file:

# stdlib json
> python3 -m timeit -s 'import json ; import compact_json ; data = json.load (open ("test.json", "r")) ; fmt = compact_json.Formatter ()' 'json.dumps (data)'
2 loops, best of 5: 164 msec per loop

# compact_json
> python3 -m timeit -s 'import json ; import compact_json ; data = json.load (open ("test.json", "r")) ; fmt = compact_json.Formatter ()' 'fmt.serialize (data)'
1 loop, best of 5: 7.85 sec per loop

compact_json takes 8 seconds to dump 4 MB, vs 165 ms for stdlib json. If your data is more than toy size, take a nap - it'll be a while. For apps with large data, compact_json won't work.

Solution

I found a more performant solution in CompactJSONEncoder. It has many fewer features than compact_json, but it solves the wrapping problem and is significantly faster.

Usage is simple: just pass it as the cls param to the stdlib call, json.dumps (data, cls = CompactJSONEncoder). Here's the same 4 MB test:

# CompactJSONEncoder
> python3 -m timeit -s 'import json ; data = json.load (open ("test.json", "r"))' 'json.dumps (data, cls = CompactJSONEncoder)'
1 loop, best of 5: 1.58 sec per loop

Only 10x slower than stdlib json. That's a naive implementation; it could probably be reduced to 5x or less with optimizations. And no external libs are needed: just one short class and stdlib json.

Code

Here's the CompactJSONEncoder class linked above, with a slight modification. The stock version only flattens lists/dicts whose contents are primitives, which gives good results. But the OP wants the entire layer3_1 entry on a single line; for that, just remove the _primitives_only test, as I did below, and any object up to MAX_WIDTH characters will be flattened.

class CompactJSONEncoder  (json.JSONEncoder)  :
    '''A JSON Encoder that puts small containers on single lines.'''

    CONTAINER_TYPES =  (list, tuple, dict)
    '''Container datatypes include primitives or other containers.'''

    MAX_WIDTH = 70
    '''Maximum width of a container that might be put on a single line.'''

    MAX_ITEMS = 12
    '''Maximum number of items in container that might be put on single line.'''

    def __init__ (me, *args, **kwargs) :
        super ().__init__ (*args, **kwargs)
        me.indentation_level = 0

    def encode (me, o) :
        '''Encode JSON object *o* with respect to single line lists.'''
        if isinstance (o,  (list, tuple)) :
            return me._encode_list (o)
        if isinstance (o, dict) :
            return me._encode_object (o)
        if isinstance (o, float) :  # 'g' picks compact fixed or scientific notation
            return format (o, 'g')
        return json.dumps (
            o,
            skipkeys       = me.skipkeys,
            ensure_ascii   = me.ensure_ascii,
            check_circular = me.check_circular,
            allow_nan      = me.allow_nan,
            sort_keys      = me.sort_keys,
            indent         = me.indent,
            separators     = (me.item_separator, me.key_separator),
            default        = me.default if hasattr (me, 'default') else None,
        )

    def _encode_list (me, o) :
        if me._put_on_single_line (o) :
            return '[' + ', '.join (me.encode (el) for el in o) + ']'
        me.indentation_level += 1
        output = [me.indent_str + me.encode (el) for el in o]
        me.indentation_level -= 1
        return '[\n' + ',\n'.join (output) + '\n' + me.indent_str + ']'

    def _encode_object (me, o) :
        if not o :
            return '{}'

        # ensure keys are converted to strings
        o = {str (k) if k is not None else 'null' : v for k, v in o.items ()}

        if me.sort_keys :
            o = dict (sorted (o.items (), key=lambda x : x[0]))

        if me._put_on_single_line (o) :
            return  ('{ ' + 
                ', '.join (f'{json.dumps (k)} : {me.encode (el)}' for k, el in o.items ())
                + ' }'
            )

        me.indentation_level += 1
        output = [
            f'{me.indent_str}{json.dumps (k)} : {me.encode (v)}' for k, v in o.items ()
        ]
        me.indentation_level -= 1

        return '{\n' + ',\n'.join (output) + '\n' + me.indent_str + '}'

    def iterencode (me, o, **kwargs) :
        '''Required to also work with `json.dump` (which iterates the
        returned string one character at a time).'''
        return me.encode (o)

    def _put_on_single_line (me, o) :
        return  (
            #me._primitives_only (o) and  ## changed for OP's requirements
            len (o) <= me.MAX_ITEMS
            and len (str (o)) - 2 <= me.MAX_WIDTH
        )

    def _primitives_only (me, o) :
        if isinstance (o,  (list, tuple)) :
            return not any (isinstance (el, me.CONTAINER_TYPES) for el in o)
        elif isinstance (o, dict) :
            return not any (isinstance (el, me.CONTAINER_TYPES) for el in o.values ())

    @property
    def indent_str (me) -> str :
        if isinstance (me.indent, int) :
            return ' ' *  (me.indentation_level * me.indent)
        elif isinstance (me.indent, str) :
            return me.indentation_level * me.indent
        else :
            raise ValueError (
                f'indent must either be of type int or str  (is : {type (me.indent)})'
            )
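To see why the four-point layer3_1 list flattens once the _primitives_only check is removed: the single-line test reduces to an item-count cap plus a rough width estimate. Restated standalone (fits_single_line is my name for it, not part of the class):

```python
def fits_single_line(container, max_items=12, max_width=70):
    # Mirrors CompactJSONEncoder._put_on_single_line with _primitives_only
    # removed: few enough items, and the rough rendered width
    # (str() of the container, minus its two brackets) under the cap.
    return len(container) <= max_items and len(str(container)) - 2 <= max_width

coords = [{"x": 1, "y": 7}, {"x": 0, "y": 4}, {"x": 5, "y": 3}, {"x": 6, "y": 9}]
print(fits_single_line(coords))           # four short dicts fit on one line
print(fits_single_line(list(range(13))))  # too many items: stays multi-line
```

Note the width check uses Python's repr, not the final JSON text, so it is only an approximation, exactly as in the class above.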

Comments

0

With jsonyx 2.0 (a third-party library) you can set a maximum indent level:

>>> jsonyx.dump(_, indent=2, max_indent_level=3, separators=(",", ": "))
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1,"y": 7},{"x": 0,"y": 4},{"x": 5,"y": 3},{"x": 6,"y": 9}],
      "layer3_2": "string"
    }
  }
}

Comments

-1

This solution is not as elegant and generic as the others, and you will not learn much from it, but it's quick and simple.

def custom_print(data_structure, indent):
    for key, value in data_structure.items():
        print "\n%s%s:" % (' '*indent,str(key)),
        if isinstance(value, dict):
            custom_print(value, indent+1)
        else:
            print "%s" % (str(value)),

Usage and output:

>>> custom_print(data_structure,1)

 layer1:
  layer2:
   layer3_2: string
   layer3_1: [{'y': 7, 'x': 1}, {'y': 4, 'x': 0}, {'y': 3, 'x': 5}, {'y': 9, 'x': 6}]
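The snippet uses Python 2 print statements, matching the question's Python 2.7. For reference, a direct Python 3 port (my sketch, not from the original answer) looks like:

```python
def custom_print(data_structure, indent):
    # Python 3 port: print(..., end=' ') stands in for the Python 2
    # trailing-comma print statement, which suppressed the newline.
    for key, value in data_structure.items():
        print('\n%s%s:' % (' ' * indent, str(key)), end=' ')
        if isinstance(value, dict):
            custom_print(value, indent + 1)
        else:
            print(str(value), end=' ')
```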

Comments

-1

As a side note, this website has a built-in JavaScript that will avoid line feeds in JSON strings when lines are shorter than 70 chars:

http://www.csvjson.com/json_beautifier

(was implemented using a modified version of JSON-js)

Select "Inline short arrays"

Great for quickly viewing data that you have in the copy buffer.

1 Comment

The question is, how do I implement "Inline short arrays" in Python?
-1

This is a rather old question, but the following is a solution that indents the JSON only up to a maximum nesting depth. Anything nested deeper than indent_max_depth is emitted flat.

The code is a modification of CPython's Lib/json/encoder.py. Sorry, it is a bit long.

import json
from json.encoder import encode_basestring, encode_basestring_ascii, INFINITY


class JSONMaxDepthEncoder(json.JSONEncoder):
    def __init__(
        self,
        *,
        skipkeys: bool = False,
        ensure_ascii: bool = True,
        check_circular: bool = True,
        allow_nan: bool = True,
        sort_keys: bool = False,
        indent: int | str | None = None,
        separators: tuple[str, str] | None = None,
        default: callable = None,
        indent_max_depth: int = 3
        ) -> None:
        """
        JSON encoder that indents upto indent_max_depth.
        """
        super().__init__(
            skipkeys=skipkeys,
            ensure_ascii=ensure_ascii,
            check_circular=check_circular,
            allow_nan=allow_nan,
            sort_keys=sort_keys,
            indent=indent,
            separators=separators,
            default=default,
        )
        self.indent_max_depth = indent_max_depth

    def iterencode(self, o, _one_shot=False):
        """Encode the given object and yield each string
        representation as available.

        For example::

            for chunk in JSONEncoder().iterencode(bigobject):
                mysocket.write(chunk)

        """
        if self.check_circular:
            markers = {}
        else:
            markers = None
        if self.ensure_ascii:
            _encoder = encode_basestring_ascii
        else:
            _encoder = encode_basestring

        def floatstr(o, allow_nan=self.allow_nan,
                _repr=float.__repr__, _inf=INFINITY, _neginf=-INFINITY):
            # Check for specials.  Note that this type of test is processor
            # and/or platform-specific, so do tests which don't depend on the
            # internals.

            if o != o:
                text = 'NaN'
            elif o == _inf:
                text = 'Infinity'
            elif o == _neginf:
                text = '-Infinity'
            else:
                return _repr(o)

            if not allow_nan:
                raise ValueError(
                    "Out of range float values are not JSON compliant: " +
                    repr(o))

            return text

        _iterencode = _make_iterencode(
            markers, self.default, _encoder, self.indent, floatstr,
            self.key_separator, self.item_separator, self.sort_keys,
            self.skipkeys, _one_shot, self.indent_max_depth)
        return _iterencode(o, 0)
    

def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
        _key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot,
        indent_max_depth,
        ## HACK: hand-optimized bytecode; turn globals into locals
        ValueError=ValueError,
        dict=dict,
        float=float,
        id=id,
        int=int,
        isinstance=isinstance,
        list=list,
        str=str,
        tuple=tuple,
        _intstr=int.__repr__,
    ):

    if _indent is not None and not isinstance(_indent, str):
        _indent = ' ' * _indent

    def _iterencode_list(lst, current_indent_level, indent_max_depth):
        if not lst:
            yield '[]'
            return
        if markers is not None:
            markerid = id(lst)
            if markerid in markers:
                raise ValueError("Circular reference detected")
            markers[markerid] = lst
        buf = '['
        if _indent is not None:
            current_indent_level += 1
            newline_indent = (
                '\n' + _indent * current_indent_level
                if current_indent_level <= indent_max_depth
                else ''
            )
            separator = _item_separator + newline_indent
            buf += newline_indent
        else:
            newline_indent = None
            separator = _item_separator
        first = True
        for value in lst:
            if first:
                first = False
            else:
                buf = separator
            if isinstance(value, str):
                yield buf + _encoder(value)
            elif value is None:
                yield buf + 'null'
            elif value is True:
                yield buf + 'true'
            elif value is False:
                yield buf + 'false'
            elif isinstance(value, int):
                yield buf + _intstr(value)
            elif isinstance(value, float):
                yield buf + _floatstr(value)
            else:
                yield buf
                if isinstance(value, (list, tuple)):
                    chunks = _iterencode_list(value, current_indent_level, indent_max_depth)
                elif isinstance(value, dict):
                    chunks = _iterencode_dict(value, current_indent_level, indent_max_depth)
                else:
                    chunks = _iterencode(value, current_indent_level, indent_max_depth)
                yield from chunks
        if newline_indent is not None:
            current_indent_level -= 1
            if current_indent_level < indent_max_depth:
                yield '\n' + _indent * current_indent_level
        yield ']'
        if markers is not None:
            del markers[markerid]

    def _iterencode_dict(dct, current_indent_level, indent_max_depth):
        if not dct:
            yield '{}'
            return
        if markers is not None:
            markerid = id(dct)
            if markerid in markers:
                raise ValueError("Circular reference detected")
            markers[markerid] = dct
        yield '{'
        if _indent is not None:
            current_indent_level += 1
            newline_indent = (
                '\n' + _indent * current_indent_level
                if current_indent_level <= indent_max_depth
                else ''
            )
            item_separator = _item_separator + newline_indent
            yield newline_indent
        else:
            newline_indent = None
            item_separator = _item_separator
        first = True
        if _sort_keys:
            items = sorted(dct.items())
        else:
            items = dct.items()
        for key, value in items:
            if isinstance(key, str):
                pass
            elif isinstance(key, float):
                key = _floatstr(key)
            elif key is True:
                key = 'true'
            elif key is False:
                key = 'false'
            elif key is None:
                key = 'null'
            elif isinstance(key, int):
                key = _intstr(key)
            elif _skipkeys:
                continue
            else:
                raise TypeError(f'keys must be str, int, float, bool or None, '
                                f'not {key.__class__.__name__}')
            if first:
                first = False
            else:
                yield item_separator
            yield _encoder(key)
            yield _key_separator
            if isinstance(value, str):
                yield _encoder(value)
            elif value is None:
                yield 'null'
            elif value is True:
                yield 'true'
            elif value is False:
                yield 'false'
            elif isinstance(value, int):
                yield _intstr(value)
            elif isinstance(value, float):
                yield _floatstr(value)
            else:
                if isinstance(value, (list, tuple)):
                    chunks = _iterencode_list(value, current_indent_level, indent_max_depth)
                elif isinstance(value, dict):
                    chunks = _iterencode_dict(value, current_indent_level, indent_max_depth)
                else:
                    chunks = _iterencode(value, current_indent_level, indent_max_depth)
                yield from chunks
        if newline_indent is not None:
            current_indent_level -= 1
            if current_indent_level < indent_max_depth:
                yield '\n' + _indent * current_indent_level
        yield '}'
        if markers is not None:
            del markers[markerid]

    def _iterencode(o, current_indent_level, indent_max_depth=indent_max_depth):
        if isinstance(o, str):
            yield _encoder(o)
        elif o is None:
            yield 'null'
        elif o is True:
            yield 'true'
        elif o is False:
            yield 'false'
        elif isinstance(o, int):
            yield _intstr(o)
        elif isinstance(o, float):
            yield _floatstr(o)
        elif isinstance(o, (list, tuple)):
            yield from _iterencode_list(o, current_indent_level, indent_max_depth)
        elif isinstance(o, dict):
            yield from _iterencode_dict(o, current_indent_level, indent_max_depth)
        else:
            if markers is not None:
                markerid = id(o)
                if markerid in markers:
                    raise ValueError("Circular reference detected")
                markers[markerid] = o
            o = _default(o)
            yield from _iterencode(o, current_indent_level, indent_max_depth)
            if markers is not None:
                del markers[markerid]
    return _iterencode

The usage is as follows:

data = {
    'layer1': {
        'layer2': {
            'layer3_1': [
                {'x': 1, 'y': 7},
                {'x': 0, 'y': 4},
                {'x': 5, 'y': 3},
                {'x': 6, 'y': 9}
            ],
            'layer3_2': 'string'
        }
    }
}

encoder = JSONMaxDepthEncoder(indent=2, indent_max_depth=3)
print(encoder.encode(data))
# prints:
{
  "layer1": {
    "layer2": {
      "layer3_1": [{"x": 1,"y": 7},{"x": 0,"y": 4},{"x": 5,"y": 3},{"x": 6,"y": 9}],
      "layer3_2": "string"
    }
  }
}

To write directly to file:

with open('data.json', 'w') as fp:
    for chunk in encoder.iterencode(data):
        fp.write(chunk)

1 Comment

indent is not an int, it can also be a string such as tab.
