[{"foo": 1},
 {"foo": 2},
 {"foo": 3},
 {"foo": 4},
 {"foo": 5},
 {"foo": 6},
 {"foo": 7},
 {"foo": 8},
 {"foo": 9},
 {"foo": 10},
 {"foo": 11},
 {"foo": 12},
 {"foo": 13},
 {"foo": 14},
 {"foo": 15}
]

I want to break this array into smaller array files using jq.

So far I have tried this:

 cat foo.json | jq -c -M -s '.[]' | split -l 5 - charded/

This results in three separate files of five objects each, but nothing wraps each group of objects back into an array.

2 Answers


jq IO is rather primitive, so I'd suggest starting with:

def chunks(n):
  def c: .[0:n], (if length > n then .[n:]|c else empty end);
  c;

chunks(5)

The key now is to use the -c command-line option:

jq -c -f chunk.jq foo.json

With your data, this will produce a stream of three arrays, one per line.
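For concreteness, with the fifteen-object sample above, this should produce:

[{"foo":1},{"foo":2},{"foo":3},{"foo":4},{"foo":5}]
[{"foo":6},{"foo":7},{"foo":8},{"foo":9},{"foo":10}]
[{"foo":11},{"foo":12},{"foo":13},{"foo":14},{"foo":15}]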

You can pipe that into split or awk or whatever, to send each line to a separate file, e.g.

awk '{n++; print > "out" n ".json"}'
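If the stream contains very many chunks, a variant that closes each file as it goes (a sketch; the "out" prefix is illustrative) avoids running into the per-process open-file limit:

awk '{f = "out" ++n ".json"; print > f; close(f)}'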

If you want the arrays to be pretty-printed in each file, you could then use jq on each, perhaps with sponge, along the lines of:

for f in out*.json ; do jq . "$f" | sponge "$f" ; done
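If sponge (from the moreutils package) isn't available, a temp-file rewrite (a sketch) achieves the same in-place effect:

for f in out*.json ; do
  tmp=$(mktemp) &&
  jq . "$f" > "$tmp" &&
  mv "$tmp" "$f"
done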

def-free solution

If you don't want to define a function, or prefer a one-liner for the jq component of the pipeline, consider the following, which uses recurse to repeatedly drop the first $n elements, and takes the leading $n-element slice of each intermediate array:

jq -c --argjson n 5 'recurse(.[$n:]; length > 0) | .[0:$n]' foo.json

Notes

  1. chunks will also work on strings, as illustrated below.
  2. chunks defines the 0-arity function, c, to take advantage of jq's support for tail-call optimization.
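To illustrate the first note, here is a quick check (a sketch, inlining the def):

jq -cn 'def chunks(n): def c: .[0:n], (if length > n then .[n:]|c else empty end); c; "abcdefgh" | chunks(3)'

which should emit:

"abc"
"def"
"gh"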

1 Comment

Thank you for the detailed explanation and for providing options. This is what I ended up using: jq -c --argjson n 5 'recurse(.[$n:]; length > 0) | .[0:$n]' foo.json | awk '{n++}{filename = "out" n ".json"; print > filename}'

If data.json is VERY large (e.g., too big to fit comfortably into RAM), and if you have a version of jq that includes the so-called streaming parser, then you could use jq first to split up data.json into its top-level component elements, then regroup them, and finally use awk or split or whatever as described elsewhere on this page.

Invocation

Here first is the pipeline you'd use:

jq -cn --stream 'fromstream(1|truncate_stream(inputs))' data.json |
  jq -cn -f groups.jq
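The first stage streams out the top-level elements, one per line, without ever loading the whole array; with the sample data above, the intermediate stream would look like (an expectation, not verified output):

{"foo":1}
{"foo":2}
{"foo":3}
...

The second jq then regroups those lines into arrays of five.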

groups.jq

# Use nan as an end-of-stream (EOS) marker: nan can never occur
# in a valid JSON stream, so it cannot be confused with data.
def groups(stream; n):
  foreach (stream, nan) as $x ([];
    # state: accumulate up to n items, starting afresh after a full group
    if length < n then . + [$x] else [$x] end;
    # emit each full group; at EOS, emit any leftover partial group
    # (dropping the trailing nan marker)
    if (.[-1]|isnan) and length > 1 then .[:-1]
    elif length == n then .
    else empty end) ;

groups(inputs; 5)
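Putting it all together with the awk step described in the other answer, the complete pipeline would look like this (a sketch; the "out" prefix is illustrative):

jq -cn --stream 'fromstream(1|truncate_stream(inputs))' data.json |
  jq -cn -f groups.jq |
  awk '{f = "out" ++n ".json"; print > f; close(f)}'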

1 Comment

Yes, these JSON files are huge (~20 GB). As expected, jq -c --argjson n 5 'recurse(.[$n:]; length > 0) | .[0:$n]' foo.json | awk '{n++}{filename = "out" n ".json"; print > filename}' terminated partway through. I will try what you posted above; I started looking at jq just recently.
