I have a huge JSON file that contains records like this:

{"callsign":"abc","kruxSegmentIds":{"0":"q2d9nn1qv","1":"rle4kfgsf"},"liveFlag":"Y"}

I need to rename the keys inside the nested object "kruxSegmentIds" so that "0" becomes "zero" and "1" becomes "one", like this:

{"callsign":"abc","kruxSegmentIds":{"zero":"q2d9nn1qv","one":"rle4kfgsf"},"liveFlag":"Y"}

Is this possible using sed? I don't want to write a script because the file is huge and may not fit into memory.

Any help/support is greatly appreciated.

  • I'm a little confused by why you don't want to write a script. What does that have to do with the size of the file? Commented Oct 8, 2018 at 3:57
  • Use the JSON parser jq, not a tool that is unaware of JSON syntax. Commented Oct 8, 2018 at 4:07
  • @DavidZ My JSON file is 20 GB, so a script (in Python or any other language) would fail because loading the input JSON would exhaust memory. Commented Oct 8, 2018 at 4:10
  • @Sains I still don't see how you're getting the idea that the size of the input has anything to do with whether you use a script or not. You understand that writing a script does not mean having to include the entire 20 GB of JSON in the script file itself, right? (Sorry if I sound harsh, I don't mean to be; it's just that I'm really really struggling to understand why you're saying what you did about scripts.) Commented Oct 8, 2018 at 4:16

1 Answer

From the problem description (and from the fact that the proposed awk solution has been accepted), it seems clear that although the file itself is large, each JSON document is relatively small, or at least small enough to fit in memory. If that is indeed the case, then a straightforward solution using jq would have similar performance characteristics to a sed or awk solution, but without the potential complications. Here therefore is such a solution:

jq '.kruxSegmentIds |= with_entries(.key |= if .=="0" then "zero" elif .=="1" then "one" else . end)'
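On the question raised in the comments about whether a script must hold the whole file in memory: a minimal Python sketch of the same per-record idea, assuming the file holds one JSON document per line (the source does not confirm this). The names KEY_NAMES, rename_segment_keys, and convert are hypothetical, chosen for illustration. Only one record is parsed at a time, so memory use depends on the largest record rather than on the file size.

```python
import io
import json

# Hypothetical mapping, taken from the two keys shown in the question.
KEY_NAMES = {"0": "zero", "1": "one"}


def rename_segment_keys(record):
    """Rename the keys of record["kruxSegmentIds"]; leave everything else alone."""
    segments = record.get("kruxSegmentIds")
    if isinstance(segments, dict):
        record["kruxSegmentIds"] = {
            KEY_NAMES.get(key, key): value for key, value in segments.items()
        }
    return record


def convert(src, dst):
    """Read one JSON document per line from src and write the result to dst.

    Assumes line-delimited JSON; blank lines are skipped.
    """
    for line in src:
        if line.strip():
            record = rename_segment_keys(json.loads(line))
            dst.write(json.dumps(record, separators=(",", ":")) + "\n")


if __name__ == "__main__":
    sample = io.StringIO(
        '{"callsign":"abc","kruxSegmentIds":{"0":"q2d9nn1qv","1":"rle4kfgsf"},"liveFlag":"Y"}\n'
    )
    out = io.StringIO()
    convert(sample, out)
    print(out.getvalue(), end="")
```

For a real file, src and dst would be ordinary file objects opened with open(); iterating over a file object reads it line by line.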

If jq empty hugefile fails because of the file's size, then jq might still be useful because of its streaming parser, which is designed precisely for such cases.

Variations

In the comments, the OP posted another example, so it might be useful to define a filter for performing the key-to-key transformation:

def twiddle:
  with_entries(.key |= if .=="0" then "zero" elif .=="1" then "one" else . end);

With this, the solution to the original problem is:

 .kruxSegmentIds |= twiddle

and the solution to the variant is:

(.users.L3AVIcqaDpZxLf6ispK.kruxSegmentIds) |= twiddle 

Generalizing even further, if the task is to perform the transformation on all objects, wherever they occur, the solution is:

walk(if type == "object" then twiddle else . end)
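To make the "all objects, wherever they occur" behaviour concrete, here is a rough Python analogue of that walk-based filter. It is a sketch, not a translation of jq's implementation; the names KEY_NAMES and rename_keys_everywhere and the sample document (including the user id "u1") are hypothetical.

```python
# Hypothetical mapping, taken from the two keys shown in the question.
KEY_NAMES = {"0": "zero", "1": "one"}


def rename_keys_everywhere(value):
    """Recursively rename "0" and "1" keys in every object of a parsed JSON value."""
    if isinstance(value, dict):
        return {
            KEY_NAMES.get(key, key): rename_keys_everywhere(child)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [rename_keys_everywhere(item) for item in value]
    # Strings, numbers, booleans, and None are returned unchanged.
    return value


if __name__ == "__main__":
    doc = {
        "users": {"u1": {"kruxSegmentIds": {"0": "q2d9nn1qv", "1": "rle4kfgsf"}}},
        "0": [{"1": "x"}],
    }
    print(rename_keys_everywhere(doc))
```

As with the walk version, every object is affected, so a key named "0" or "1" outside kruxSegmentIds is renamed as well; the path-specific filters above are the safer choice when that is not wanted.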

If your jq does not have walk predefined, you can copy its definition from https://raw.githubusercontent.com/stedolan/jq/master/src/builtin.jq

7 Comments

Thanks for the suggestion. jq indeed seems to be a better solution for a large JSON file. However, I'm trying to fit the streaming logic in my JSON file for each object but couldn't get the required output.
Could you describe the contents of the file in more detail? Is it "line-delimited JSON"? What happens when you try jq empty inputfile?
Many thanks for the comment. I will ask a separate question with jq stream as the center of solution to avoid the confusion on the accepted answer using awk.
@Sains, no, I believe you could remove the awk and sed tags from this question, since the jq tag is already there. Then you could accept this answer and I could delete mine, since this one is a correct and good answer.
@Sains - See update. Everything you've written indicates you do not need the --stream option, but if jq empty YOURFILE fails, let us know :-)