Does Gson have a way to read in non-standard JSON files?

Instead of a typical file like:

[{obj1},{objN}]

I have a file like this:

{obj1}
{objN}

Where there are no square brackets or commas and each object is separated by a newline character.

  • Can you just read it in line-by-line, or in chunks, and parse it line-by-line? Commented Apr 20, 2017 at 19:30
  • Probably, as it's a plain old file, but I didn't know whether Gson had a flag that would automagically do everything. Commented Apr 20, 2017 at 19:35
  • Can you provide a longer example of what these input files look like? For instance, one that has both lists and key-value pairs, to get an idea of what you're trying to do? Commented Apr 20, 2017 at 20:35

3 Answers

Yes, it does: Gson supports lenient reading. For example, given the following JSON document (non-standard.json):

{
    "foo": 1
}
{
    "bar": 1
}

you can read it like this:

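import static com.google.gson.stream.JsonToken.END_DOCUMENT;

import java.io.IOException;
import java.io.Reader;

import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.TypeAdapter;
import com.google.gson.stream.JsonReader;

private static final Gson gson = new Gson();
private static final TypeAdapter<JsonElement> jsonElementTypeAdapter = gson.getAdapter(JsonElement.class);

public static void main(final String... args)
        throws IOException {
    // getPackageResourceReader is a helper of this answer (not shown); any java.io.Reader over the file works here
    try ( final Reader reader = getPackageResourceReader(Q43528208.class, "non-standard.json") ) {
        final JsonReader jsonReader = new JsonReader(reader);
        jsonReader.setLenient(true); // this makes it work
        // In lenient mode the reader accepts several top-level values, so keep reading until the input ends
        while ( jsonReader.peek() != END_DOCUMENT ) {
            final JsonElement jsonElement = jsonElementTypeAdapter.read(jsonReader);
            System.out.println(jsonElement);
        }
    }
}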

Output:

{"foo":1}  
{"bar":1}  

I'm not sure if you can write a robust deserializer this way though.

Update

In order to simplify the Gson support, we can implement a few convenient reading methods:

// A shortcut method for the below implementation: aggregates the whole result into a single list
private static <T> List<T> parseToListLenient(final JsonReader jsonReader, final IMapper<? super JsonReader, ? extends T> mapper)
        throws IOException {
    final List<T> list = new ArrayList<>();
    parseLenient(jsonReader, in -> list.add(mapper.map(in)));
    return list;
}

// A strategy-accepting method that makes the JsonReader lenient for the duration of the read and restores its previous mode afterwards
// The consumer defines what to do with the JsonReader at its current token
private static void parseLenient(final JsonReader jsonReader, final IConsumer<? super JsonReader> consumer)
        throws IOException {
    final boolean isLenient = jsonReader.isLenient();
    try {
        jsonReader.setLenient(true);
        while ( jsonReader.peek() != END_DOCUMENT ) {
            consumer.accept(jsonReader);
        }
    } finally {
        jsonReader.setLenient(isLenient);
    }
}

// Needed because the Java 8 Consumer interface does not allow checked exceptions to be thrown
private interface IConsumer<T> {

    void accept(T value)
            throws IOException;

}

private interface IMapper<T, R> {

    R map(T value)
            throws IOException;

}

With these helpers, reading becomes simple:

final Gson gson = new Gson();
final TypeToken<Map<String, Integer>> typeToken = new TypeToken<Map<String, Integer>>() {
};
final TypeAdapter<Map<String, Integer>> typeAdapter = gson.getAdapter(typeToken);
try ( final JsonReader jsonReader = getPackageResourceJsonReader(Q43528208.class, "non-standard.json") ) {
    final List<Map<String, Integer>> maps = parseToListLenient(jsonReader, typeAdapter::read);
    System.out.println(maps);
}
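
The IConsumer and IMapper interfaces above are needed because java.util.function.Consumer and Function cannot declare checked exceptions, while TypeAdapter.read throws IOException. As a rough illustration of the same pattern without Gson, the sketch below uses only the JDK; the names CheckedLambdaSketch, IOMapper, mapAll and linesOf are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CheckedLambdaSketch {

    // Like java.util.function.Function, but the single method may throw IOException
    @FunctionalInterface
    public interface IOMapper<T, R> {
        R map(T value) throws IOException;
    }

    // Calls the mapper repeatedly until it returns null (end of input) and collects the results
    public static <R> List<R> mapAll(final BufferedReader reader, final IOMapper<BufferedReader, R> mapper)
            throws IOException {
        final List<R> results = new ArrayList<>();
        R next;
        while ( (next = mapper.map(reader)) != null ) {
            results.add(next);
        }
        return results;
    }

    // Convenience wrapper for callers that cannot deal with checked exceptions
    public static List<String> linesOf(final String text) {
        try ( final BufferedReader reader = new BufferedReader(new StringReader(text)) ) {
            // BufferedReader::readLine throws IOException, so it would not fit java.util.function.Function
            return mapAll(reader, BufferedReader::readLine);
        } catch ( final IOException ex ) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(final String... args) {
        System.out.println(linesOf("a\nb\n")); // prints [a, b]
    }
}
```

The method reference is accepted only because IOMapper.map declares IOException; passing it where a plain Function is expected would not compile.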

Deserializing via Gson directly requires a more complicated implementation:

// This is just a marker not meant to be instantiated but to create a sort of "gateway" to dispatch types in Gson
@SuppressWarnings("unused")
private static final class LenientListMarker<T> {
    private LenientListMarker() {
        throw new AssertionError("must not be instantiated");
    }
}

private static void doDeserialize()
        throws IOException {
    final Gson gson = new GsonBuilder()
            .registerTypeAdapterFactory(new TypeAdapterFactory() {
                @Override
                public <T> TypeAdapter<T> create(final Gson gson, final TypeToken<T> typeToken) {
                    // Check if the given type is the lenient list marker class
                    if ( !LenientListMarker.class.isAssignableFrom(typeToken.getRawType()) ) {
                        // Not the case? Just delegate the job to Gson
                        return null;
                    }
                    final Type listElementType = getTypeParameter0(typeToken.getType());
                    final TypeAdapter<?> listElementAdapter = gson.getAdapter(TypeToken.get(listElementType));
                    @SuppressWarnings("unchecked")
                    final TypeToken<List<?>> listTypeToken = (TypeToken<List<?>>) TypeToken.getParameterized(List.class, listElementType);
                    final TypeAdapter<List<?>> listAdapter = gson.getAdapter(listTypeToken);
                    final TypeAdapter<List<?>> typeAdapter = new TypeAdapter<List<?>>() {
                        @Override
                        public void write(final JsonWriter out, final List<?> value)
                                throws IOException {
                            // Always write a well-formed list
                            listAdapter.write(out, value);
                        }

                        @Override
                        public List<?> read(final JsonReader in)
                                throws IOException {
                            // Delegate the job to the reading method - we only have to tell how to obtain the list values
                            return parseToListLenient(in, listElementAdapter::read);
                        }
                    };
                    @SuppressWarnings("unchecked")
                    final TypeAdapter<T> castTypeAdapter = (TypeAdapter<T>) typeAdapter;
                    return castTypeAdapter;
                }

                // A simple method to resolve actual type parameter
                private Type getTypeParameter0(final Type type) {
                    if ( !(type instanceof ParameterizedType) ) {
                        // List or List<?>
                        return Object.class;
                    }
                    return ((ParameterizedType) type).getActualTypeArguments()[0];
                }
            })
            .create();
    // This type declares a marker specialization to be used during deserialization
    final Type type = new TypeToken<LenientListMarker<Map<String, Integer>>>() {
    }.getType();
    try ( final JsonReader jsonReader = getPackageResourceJsonReader(Q43528208.class, "non-standard.json") ) {
        // This is where we cheat a little:
        // We tell Gson to deserialize LenientListMarker<Map<String, Integer>> but the type adapter above will return a list
        final List<Map<String, Integer>> maps = gson.fromJson(jsonReader, type);
        System.out.println(maps);
    }
}

The output now consists of Map<String, Integer> values rather than JsonElements:

[{foo=1}, {bar=1}]

Update 2

A workaround for Gson versions that lack TypeToken.getParameterized (such as 2.2.4, mentioned in the comments):

@SuppressWarnings("unchecked")
final TypeToken<List<?>> listTypeToken = (TypeToken<List<?>>) TypeToken.get(new ParameterizedType() {
    @Override
    public Type getRawType() {
        return List.class;
    }

    @Override
    public Type[] getActualTypeArguments() {
        return new Type[]{ listElementType };
    }

    @Override
    public Type getOwnerType() {
        return null;
    }
});

15 Comments

Thanks, that's good to know! Yeah, I'm looking for a deserializer as I'm trying to take json strings and turn them into objects
@ekjcfn3902039 What do you expect the deserialized value to look like? A List<T> where each line represents a single T value?
(When I was saying that I'm not sure about a robust deserializer for such a case, I meant nested lists, because it would require more "intelligent" delimiter analysis, if it can work in Gson, of course.)
Yep, a List<T> where each line represents a single T value
Thanks, but my TypeToken doesn't have a getParameterized() method. I am stuck using 2.2.4. Is that a newer feature? I am also not sure what the getPackageResourceJsonReader() method and the Q43528208 class are.

Another option is a small preprocessing program that inserts commas between the objects and wraps them in square brackets, producing well-formed JSON.
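
A minimal sketch of such a preprocessing step, using only the JDK. It assumes that every non-blank line holds one complete JSON value, as in the question; objects spread over several lines are not handled. The class and method names (JsonLinesToArray, wrapAsJsonArray) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class JsonLinesToArray {

    // Joins one-value-per-line input into a single JSON array string.
    // Assumption: each non-blank line is a complete JSON value on its own.
    public static String wrapAsJsonArray(final String text) {
        final List<String> values = new ArrayList<>();
        // \R matches any line break sequence (\n, \r\n, ...)
        for ( final String line : text.split("\\R") ) {
            final String trimmed = line.trim();
            if ( !trimmed.isEmpty() ) {
                values.add(trimmed);
            }
        }
        return "[" + String.join(",", values) + "]";
    }

    public static void main(final String... args) {
        final String input = "{\"foo\":1}\n{\"bar\":1}\n";
        System.out.println(wrapAsJsonArray(input)); // prints [{"foo":1},{"bar":1}]
    }
}
```

The resulting string is a regular JSON array, so it can be handed to any standard parser. Note that this reads the whole input into memory, which may not suit very large files.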

With Spark 2, we can set multiline as a read option:

spark.read.option("multiline", "true").json("data.json")
