I have a huge JSON array with several thousand objects, and I need to filter out all objects whose text field is too long (say, more than 200 characters).

I've found a lot of sed/awk advice on finding a line of a certain length, but how can I delete that line AND the one before and the two after it, so that the whole JSON object is deleted?

The structure is as follows:

{
"text": "blah blah blah",
"author": "John Doe"
}

Thanks!

  • Next time you need to process JSON, also have a look at jq. Commented May 24, 2018 at 11:57

1 Answer

Here's a Python script that does what you want:

#!/usr/bin/env python
# -*- coding: ascii -*-
"""filter.py"""

import sys

# Get the file and the maximum line length as command-line arguments
filepath = sys.argv[1]
maxlen = int(sys.argv[2])

# Initialize a list to store the lines that are kept
lines = []

# Read the data file line by line
with open(filepath, 'r') as jsonfile:
    for line in jsonfile:

        # Only consider non-blank lines
        if not line.strip():
            continue

        # For "text" lines that are too long, remove the previous line
        # (the opening brace) and skip the next two lines.
        # Note: len(line) counts the whole line, including indentation,
        # the key, and the trailing newline.
        if line.lstrip().startswith('"text"') and len(line) > maxlen:
            lines.pop()
            next(jsonfile)
            next(jsonfile)
        # Add all other lines to the list
        else:
            lines.append(line)

# Strip the trailing comma from the last object's closing brace
if len(lines) >= 2:
    lines[-2] = lines[-2].rstrip().rstrip(',') + '\n'

# Output the lines from the list
for line in lines:
    sys.stdout.write(line)

You could run it like this:

python filter.py data.json 34

Suppose you had the following data file:

[
    {
    "text": "blah blah blah one",
    "author": "John Doe"
    },
    {
    "text": "blah blah blah two",
    "author": "John Doe"
    },
    {
    "text": "blah blah blah three",
    "author": "John Doe"
    }
]

Then running the script as described would produce the following output:

[
    {
    "text": "blah blah blah one",
    "author": "John Doe"
    },
    {
    "text": "blah blah blah two",
    "author": "John Doe"
    }
]
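
If the file does not always follow the exact four-lines-per-object layout, a structure-aware variant may be more robust. The following is only a sketch using Python's standard json module; the function names drop_long_text and filter_file and the output filename are hypothetical, and it assumes the file holds a single JSON array of objects whose "text" values are strings.

```python
import json


def drop_long_text(objects, max_chars, field="text"):
    """Keep only objects whose `field` string has at most max_chars characters.

    Assumption: the field value, when present, is a string.
    Objects without the field are kept.
    """
    return [obj for obj in objects if len(obj.get(field, "")) <= max_chars]


def filter_file(in_path, out_path, max_chars):
    """Read a JSON array from in_path and write the filtered array to out_path."""
    with open(in_path) as src:
        objects = json.load(src)
    with open(out_path, "w") as dst:
        json.dump(drop_long_text(objects, max_chars), dst, indent=4)
        dst.write("\n")
```

Calling filter_file("data.json", "filtered.json", 200) would then write the filtered array to filtered.json. Unlike the line-based script, this counts only the characters of the text value itself, not the indentation, key, or trailing comma, and it needs no separate step to fix up the last comma.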