I have a JSON String that I wrap in an InputStream, as shown below, and then decode into a GenericRecord, because I am trying to convert my JSON object to Avro using my schema.

InputStream input = new ByteArrayInputStream(jsonString.getBytes());
DataInputStream din = new DataInputStream(input);

Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);

DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
// below line is throwing exception
GenericRecord datum = reader.read(null, decoder);   

Below is the exception I am getting:

org.codehaus.jackson.JsonParseException: Invalid UTF-8 middle byte 0x2d at [Source: java.io.DataInputStream@562aee31; line: 1, column: 74]

And here is the actual JSON string on which this exception is happening:

{"name":"car_test","attr_value":"2006|Renault|Megane II Coupé-Cabriolet|null|null|null|null|0|Wed Feb 03 10:00:59 GMT-07:00 2016|1|77|null|null|null|null","data_id":900}

I did some research and found that I need to encode the String explicitly as UTF-8 when creating the ByteArrayInputStream, as shown below:

InputStream input = new ByteArrayInputStream(jsonString.getBytes(StandardCharsets.UTF_8));

My question is: what causes this exception, and why does it happen for the JSON string above? Is using UTF-8 the right fix?

What does the error "Invalid UTF-8 middle byte 0x2d" mean?

2 Answers

You start with a Java Unicode String jsonString.

You then convert it into a byte stream using String.getBytes(). Since you didn't specify a charset, the platform default is used, which in your case is most likely ISO 8859-1 (or a similar single-byte encoding such as windows-1252).
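To check which charset the no-argument getBytes() picks up on a given machine, a small sketch like the following may help. The class and method names (DefaultCharsetCheck, usesDefaultCharset) are made up for illustration. Note that the default depends on the JVM: on JDK 18 and later it is UTF-8 unless overridden, while older JDKs derive it from the operating system settings.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper class, not part of Avro or Jackson.
public class DefaultCharsetCheck {

    // True when the no-argument getBytes() produces the same bytes as
    // encoding with the JVM's default charset.
    public static boolean usesDefaultCharset(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // The printed charset varies by JVM version and platform settings.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("getBytes() uses it: " + usesDefaultCharset("Coup\u00e9"));
    }
}
```

If the first line prints something other than UTF-8, any code that calls getBytes() without a charset and then hands the bytes to a UTF-8 parser is exposed to the problem described here.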

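Next you parse the JSON from the (Data)InputStream. The Jackson parser that Avro uses underneath treats the bytes as UTF-8. In ISO 8859-1, é is the single byte 0xe9. In UTF-8, 0xe9 is the lead byte of a three-byte sequence, so the parser expects two continuation bytes (in the range 0x80 to 0xbf) to follow. The next byte in your string is the hyphen - (0x2d), which is not a continuation byte, hence "Invalid UTF-8 middle byte 0x2d".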

So in the end it is a mismatch between the actual encoding (ISO 8859-1) and the expected encoding (UTF-8).
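The mismatch can be reproduced with the standard library alone. The sketch below is illustrative only: the class and method names (EncodingMismatchDemo, hex, isValidUtf8) are made up, and it uses a strict JDK UTF-8 decoder as a stand-in for the check the JSON parser performs, not the parser itself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Avro or Jackson.
public class EncodingMismatchDemo {

    // Formats bytes as space-separated lowercase hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Returns true if the bytes form a well-formed UTF-8 sequence.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String fragment = "\u00e9-"; // the "é-" from "Coupé-Cabriolet"
        byte[] latin1 = fragment.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = fragment.getBytes(StandardCharsets.UTF_8);
        // ISO 8859-1 yields e9 2d: a UTF-8 lead byte followed by a non-continuation byte.
        System.out.println(hex(latin1) + " valid UTF-8? " + isValidUtf8(latin1));
        // UTF-8 yields c3 a9 2d: a complete two-byte sequence, then the hyphen.
        System.out.println(hex(utf8) + " valid UTF-8? " + isValidUtf8(utf8));
    }
}
```

Encoding with an explicit StandardCharsets.UTF_8 produces bytes that a UTF-8 reader accepts regardless of the platform default, which is why the fix in the question works.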

You can solve this as you did, by encoding explicitly as UTF-8, or you can avoid the conversion from String to bytes altogether:

Decoder decoder = DecoderFactory.get().jsonDecoder(schema, jsonString);

Instead of using the low-level Avro decoder functionality, you might be interested in a more convenient approach with the Jackson Avro module:

https://github.com/FasterXML/jackson-dataformat-avro/issues

With it you can provide an Avro schema but still work with POJOs. You can also use regular Jackson JSON databinding (a different ObjectMapper) for the JSON part: read the JSON into a POJO, then write the POJO as Avro (or other combinations).
