[{"foo": 1},
 {"foo": 2},
 {"foo": 3},
 {"foo": 4},
 {"foo": 5},
 {"foo": 6},
 {"foo": 7},
 {"foo": 8},
 {"foo": 9},
 {"foo": 10},
 {"foo": 11},
 {"foo": 12},
 {"foo": 13},
 {"foo": 14},
 {"foo": 15}
]

I want to break this array into smaller array files using jq.

So far I have tried this:

 cat foo.json | jq -c -M -s '.[]' | split -l 5 - charded/

This results in three separate files of five objects each, but nothing wraps each group of objects back into an array.

2 Answers


jq IO is rather primitive, so I'd suggest starting with:

def chunks(n):
  def c: .[0:n], (if length > n then .[n:]|c else empty end);
  c;

chunks(5)

The key now is to use the -c command-line option:

jq -c -f chunk.jq foo.json

With your data, this will produce a stream of three arrays, one per line.
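For concreteness, with the fifteen-object sample above, this should produce:

[{"foo":1},{"foo":2},{"foo":3},{"foo":4},{"foo":5}]
[{"foo":6},{"foo":7},{"foo":8},{"foo":9},{"foo":10}]
[{"foo":11},{"foo":12},{"foo":13},{"foo":14},{"foo":15}]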

You can pipe that into split or awk or whatever, to send each line to a separate file, e.g.

awk '{n++; print > "out" n ".json"}'
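If the stream contains very many chunks, a variant that closes each file as it goes (a sketch; the "out" prefix is illustrative) avoids running into the per-process open-file limit:

awk '{f = "out" ++n ".json"; print > f; close(f)}'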

If you want the arrays to be pretty-printed in each file, you could then use jq on each, perhaps with sponge, along the lines of:

for f in out*.json ; do jq . "$f" | sponge "$f" ; done
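If sponge (from the moreutils package) isn't available, a temp-file rewrite (a sketch) achieves the same in-place effect:

for f in out*.json ; do
  tmp=$(mktemp) &&
  jq . "$f" > "$tmp" &&
  mv "$tmp" "$f"
done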

def-free solution

If you don't want to define a function, or prefer a one-liner for the jq component of the pipeline, consider the following, which uses recurse to repeatedly drop the first $n elements, and takes the leading $n-element slice of each intermediate array:

jq -c --argjson n 5 'recurse(.[$n:]; length > 0) | .[0:$n]' foo.json

Notes

  1. chunks will also work on strings, as illustrated below.
  2. chunks defines the 0-arity function, c, to take advantage of jq's support for tail-call optimization.
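To illustrate the first note, here is a quick check (a sketch, inlining the def):

jq -cn 'def chunks(n): def c: .[0:n], (if length > n then .[n:]|c else empty end); c; "abcdefgh" | chunks(3)'

which should emit:

"abc"
"def"
"gh"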

1 Comment

Thank you for the detailed explanation and for providing options. This is what I ended up using: jq -c --argjson n 5 'recurse(.[$n:]; length > 0) | .[0:$n]' foo.json | awk '{n++}{filename = "out" n ".json"; print > filename}'

If data.json is VERY large (e.g., too big to fit comfortably into RAM), and if you have a version of jq that includes the so-called streaming parser, then you could use jq first to split up data.json into its top-level component elements, then regroup them, and finally use awk or split or whatever as described elsewhere on this page.

Invocation

Here first is the pipeline you'd use:

jq -cn --stream 'fromstream(1|truncate_stream(inputs))' data.json |
  jq -cn -f groups.jq
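The first stage streams out the top-level elements, one per line, without ever loading the whole array; with the sample data above, the intermediate stream would look like (an expectation, not verified output):

{"foo":1}
{"foo":2}
{"foo":3}
...

The second jq then regroups those lines into arrays of five.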

groups.jq

# Use nan as an end-of-stream (EOS) marker: nan can never occur
# in a valid JSON stream, so it cannot be confused with data.
def groups(stream; n):
  foreach (stream, nan) as $x ([];
    # state: accumulate up to n items, starting afresh after a full group
    if length < n then . + [$x] else [$x] end;
    # emit each full group; at EOS, emit any leftover partial group
    # (dropping the trailing nan marker)
    if (.[-1]|isnan) and length > 1 then .[:-1]
    elif length == n then .
    else empty end) ;

groups(inputs; 5)
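Putting it all together with the awk step described in the other answer, the complete pipeline would look like this (a sketch; the "out" prefix is illustrative):

jq -cn --stream 'fromstream(1|truncate_stream(inputs))' data.json |
  jq -cn -f groups.jq |
  awk '{f = "out" ++n ".json"; print > f; close(f)}'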

1 Comment

Yes, these JSON files are huge (~20 GB). As expected, jq -c --argjson n 5 'recurse(.[$n:]; length > 0) | .[0:$n]' foo.json | awk '{n++}{filename = "out" n ".json"; print > filename}' terminated partway through. I will try what you posted above; I started looking at jq just recently.
