
I have a huge JSON file (1 GB) which is basically an array of objects in the format below:

[{"x":"y", "p":"q"}, {"x1":"y1", "p1":"q1"},....]

I want to parse this file without loading all the data into memory.
Basically, I want to read, for example, the first 1000 objects of the array into memory, process them, then read the next 1000 objects, process them, and so on until all the data has been read.
Is there a JSON library that supports this use case? I currently use Gson, but it loads all the data into memory when I call gson.fromJson().

Thanks in advance for the help.

2 Answers


It looks like Gson has a streaming API, which is what you want: https://sites.google.com/site/gson/streaming
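A minimal sketch of the batching the question asks for, built on Gson's streaming `JsonReader`. It assumes the file is UTF-8, that every array element is a flat object with string values, and a Gson version that provides `Gson.fromJson(JsonReader, Type)`; the class `GsonBatchReader` and method `processInBatches` are hypothetical names for illustration. Only one batch of objects is held in memory at a time.

```java
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.lang.reflect.Type;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class GsonBatchReader {
    // Assumption: each array element is a flat object of string values
    private static final Type ELEMENT_TYPE = new TypeToken<Map<String, String>>() {}.getType();

    /** Streams the top-level array and hands its elements to handler, batchSize at a time. */
    public static void processInBatches(Path file, int batchSize,
                                        Consumer<List<Map<String, String>>> handler) throws IOException {
        Gson gson = new Gson();
        try (JsonReader reader = new JsonReader(Files.newBufferedReader(file, StandardCharsets.UTF_8))) {
            List<Map<String, String>> batch = new ArrayList<>(batchSize);
            reader.beginArray();                        // consume '['
            while (reader.hasNext()) {                  // more elements before ']'
                // Deserialize exactly one element; the rest of the file stays unread
                Map<String, String> element = gson.fromJson(reader, ELEMENT_TYPE);
                batch.add(element);
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batch = new ArrayList<>(batchSize); // fresh list, so the handler may keep the old one
                }
            }
            reader.endArray();                          // consume ']'
            if (!batch.isEmpty()) {
                handler.accept(batch);                  // final, partially filled batch
            }
        }
    }
}
```

For the question's case this would be called as `GsonBatchReader.processInBatches(path, 1000, batch -> ...)`. Mixing streaming (`beginArray`/`hasNext`) with `fromJson` for each element keeps the per-object code short while never materializing the whole array.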


4 Comments

Is it possible in Jackson or Gson, given a byte offset from the beginning of the file, to read the next token from that offset? Basically, I have a byte offset from the beginning of the file and I want to read the next item starting at that offset.
Did you try using BufferedReader and skip? docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
Is it possible to get the position of the file pointer in a BufferedReader? Basically, I want the char or byte offset from the beginning of the file.
Well, it starts at the beginning of the file by default. For example, to skip 7 characters (which equal 7 bytes only in a single-byte encoding such as ASCII): br = new BufferedReader(new FileReader("file.txt")); br.skip(7);
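Because `BufferedReader.skip(n)` counts characters, it lands on the wrong place in a UTF-8 file that contains multi-byte characters. A minimal sketch of seeking by bytes instead, using only the JDK: it positions the `FileChannel` of a `FileInputStream` (which also moves the stream's read position) and then decodes from there. The class `ByteOffsetReader` and method `openAt` are hypothetical names, and the file is assumed to be UTF-8 with the offset on a character boundary. The returned reader could then be wrapped in Gson's `new JsonReader(reader)`, provided the offset points at the start of a JSON value.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class ByteOffsetReader {
    /**
     * Opens a reader whose first character is the one starting at byteOffset.
     * Assumes UTF-8 and an offset that falls on a character boundary.
     */
    public static BufferedReader openAt(Path file, long byteOffset) throws IOException {
        FileInputStream in = new FileInputStream(file.toFile());
        try {
            // Moving the channel position also moves the stream's read position
            in.getChannel().position(byteOffset);
        } catch (IOException e) {
            in.close(); // don't leak the file handle if seeking fails
            throw e;
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```

As for reading the position back: a `BufferedReader` does not expose one, and the channel's position runs ahead of what the reader has returned because of read-ahead buffering, so offsets are easier to obtain from the JSON parser itself.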

With Jackson you can use a SAX-like (streaming) approach with a JsonParser object. In your case it would be something like this:

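import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// (inside a method declared to throw IOException)
JsonFactory jsonFactory = new JsonFactory();

// try-with-resources closes the parser (and the file) even if parsing fails
try (JsonParser parser = jsonFactory.createParser(new File("/path/to/my/jsonFile"))) {

    // Map where to store your field-value pairs per object
    Map<String, String> fields = new HashMap<>();

    JsonToken token;
    // nextToken() returns null at end of input, so stop there as well as at END_ARRAY
    while ((token = parser.nextToken()) != null && token != JsonToken.END_ARRAY) {
        switch (token) {

            // Starts a new object, clear the map
            case START_OBJECT:
                fields.clear();
                break;

            // For each field-value pair, store it in the map 'fields'
            case FIELD_NAME:
                String field = parser.getCurrentName();
                parser.nextToken(); // move to the value
                fields.put(field, parser.getValueAsString());
                break;

            // Do something with the field-value pairs (your own processing method)
            case END_OBJECT:
                doSomethingWithTheObject(fields);
                break;

            default: // START_ARRAY and scalar tokens need no action here
                break;
        }
    }
}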
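The loop above handles one object at a time. To get the "1000 objects at a time" behaviour from the question with jackson-core alone, a minimal sketch could collect each object into its own map and flush a batch when it is full. It assumes a top-level array of flat objects with scalar values (a nested object or array value would break the inner loop); the class `JacksonBatchReader` and method `processInBatches` are hypothetical names.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class JacksonBatchReader {
    /** Streams a top-level array of flat objects, passing batchSize objects at a time to handler. */
    public static void processInBatches(File file, int batchSize,
                                        Consumer<List<Map<String, String>>> handler) throws IOException {
        try (JsonParser parser = new JsonFactory().createParser(file)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("Expected a top-level JSON array");
            }
            List<Map<String, String>> batch = new ArrayList<>(batchSize);
            // Each outer iteration consumes one object; END_ARRAY (or end of input) stops it
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                Map<String, String> fields = new HashMap<>(); // new map per object, since the batch keeps it
                // Inner loop stops at the object's END_OBJECT
                while (parser.nextToken() == JsonToken.FIELD_NAME) {
                    String name = parser.getCurrentName();
                    parser.nextToken();                       // move to the (scalar) value
                    fields.put(name, parser.getValueAsString());
                }
                batch.add(fields);
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                handler.accept(batch);                        // remaining objects
            }
        }
    }
}
```

Unlike the original loop, this allocates a new map per object instead of clearing a shared one, because the batch holds on to the maps until it is handed off.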

2 Comments

Hi morgano. Is it possible in Jackson or Gson, given a byte offset from the beginning of the file, to read the next token from that offset? Basically, I have a byte offset from the beginning of the file and I want to read the next item starting at that offset.
If you want to skip some bytes from the beginning, try this: instead of createParser(File), use createParser(InputStream). First create a FileInputStream for your file, then read (and discard) the number of bytes you want to skip, and then call createParser with that same FileInputStream.
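A minimal sketch of that idea, under stated assumptions: a first pass records the byte offset of each top-level object via `JsonParser.getTokenLocation().getByteOffset()` (available for byte-based input such as a `File`; it may be -1 for character-based input), and a second call jumps straight to one of those offsets and parses a single object through `createParser(InputStream)`. Instead of reading and discarding bytes, it positions the stream's `FileChannel`, which has the same effect. The class `JacksonOffsets`, its methods, UTF-8 input, and flat objects with scalar values are all assumptions for illustration.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JacksonOffsets {
    private static final JsonFactory FACTORY = new JsonFactory();

    /** First pass: byte offset of the '{' of every object in the top-level array. */
    public static List<Long> objectOffsets(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (JsonParser parser = FACTORY.createParser(file.toFile())) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("Expected a top-level JSON array");
            }
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                offsets.add(parser.getTokenLocation().getByteOffset());
                parser.skipChildren(); // jump to the matching END_OBJECT
            }
        }
        return offsets;
    }

    /** Parses the single flat object that starts at byteOffset (must point at a '{'). */
    public static Map<String, String> readObjectAt(Path file, long byteOffset) throws IOException {
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            in.getChannel().position(byteOffset); // also moves the stream's read position
            try (JsonParser parser = FACTORY.createParser(in)) {
                if (parser.nextToken() != JsonToken.START_OBJECT) {
                    throw new IOException("No object starts at byte " + byteOffset);
                }
                Map<String, String> fields = new HashMap<>();
                while (parser.nextToken() == JsonToken.FIELD_NAME) {
                    String name = parser.getCurrentName();
                    parser.nextToken();               // move to the value
                    fields.put(name, parser.getValueAsString());
                }
                // Stop at END_OBJECT: reading further would hit the ',' separator
                return fields;
            }
        }
    }
}
```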
