I'm using javax.xml.stream.XMLStreamReader to parse XML documents. Unfortunately, some of the documents I'm parsing use non-IANA encoding names, like "macroman" and "ms-ansi". For example:

<?xml version="1.0" encoding="macroman"?>
<foo />

This causes the parse to blow up with an exception:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,42]
Message: Invalid encoding name "macroman".

Is there any way to provide a custom encoding handler to my XMLStreamReader so that I can add support for the encodings I need?

  • I'm assuming you don't have the ability to alter the stream so that it doesn't contain the encoding line? XMLStreamReader has its limitations, and this is one of them. Commented Mar 14, 2019 at 16:45
  • It's unfortunate, but you may be better served by choosing a different XML library. Commented Mar 14, 2019 at 16:45
  • @Dylan I'm not producing these documents, just consuming them, so I have no control over the encoding line unfortunately. Are there other XML libraries that are more flexible? Commented Mar 15, 2019 at 22:40

1 Answer

You could wrap the input stream in a filter that rewrites the non-standard charset name in the XML declaration to an equivalent name that XMLStreamReader does understand.

See the related question "Filter (search and replace) array of bytes in an InputStream".
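As a rough illustration of that idea, here is a minimal sketch (Java 11+) that patches only the encoding name in the XML declaration and passes every other byte through untouched. The class name `EncodingAliasFilter`, the `ALIASES` table and the 256-byte `PROLOG_LIMIT` are all made up for this example. It assumes the documents use an ASCII-compatible encoding with no byte order mark, and that the declaration sits within the first 256 bytes. Whether your parser accepts the replacement names (in particular `macintosh`, the IANA name for Mac Roman) is something you would need to verify yourself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class EncodingAliasFilter {

    // Hypothetical alias table: non-standard name (lower case) -> IANA name.
    // Assumption: these are the right equivalents for your documents.
    private static final Map<String, String> ALIASES = Map.of(
            "ms-ansi", "windows-1252",
            "macroman", "macintosh");

    // Assumption: the XML declaration fits in the first 256 bytes.
    private static final int PROLOG_LIMIT = 256;

    // Captures the encoding name inside a declaration at the very start.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([^\"']+)[\"']");

    private EncodingAliasFilter() {
    }

    public static InputStream wrap(InputStream in) throws IOException {
        // Read the start of the document; the rest stays in the stream.
        byte[] head = in.readNBytes(PROLOG_LIMIT);

        // ISO-8859-1 maps every byte to exactly one char and back,
        // so the bytes we do not touch survive the round trip unchanged.
        String prolog = new String(head, StandardCharsets.ISO_8859_1);

        Matcher m = ENCODING.matcher(prolog);
        if (m.find()) {
            String standard = ALIASES.get(m.group(1).toLowerCase(Locale.ROOT));
            if (standard != null) {
                prolog = prolog.substring(0, m.start(1))
                        + standard
                        + prolog.substring(m.end(1));
            }
        }

        byte[] patched = prolog.getBytes(StandardCharsets.ISO_8859_1);
        // Patched head first, then whatever is left of the original stream.
        return new SequenceInputStream(new ByteArrayInputStream(patched), in);
    }
}
```

You would then hand the wrapped stream to the factory, e.g. `XMLInputFactory.newFactory().createXMLStreamReader(EncodingAliasFilter.wrap(in))`. If the parser rejects a replacement name as well, a variation worth trying is to decode the bytes yourself with a `java.nio.charset.Charset` and give the parser a `Reader` instead of an `InputStream`.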
