XMLStreamReader Encoding

Question

I have some characters from the CJK (Chinese/Japanese/Korean) Unified Ideographs Extension B block in my XML:

𠀀𠀁𠀂𠀃𠀄𪛔𪛕𪛖

But when I use streamReader.getText() it returns:

ࠀ

Does anyone know whether the encoding that Java's XMLStreamReader uses for Unicode characters can be changed?

It works with common East Asian characters, just not with the ones in Unicode Extension B.

How are you constructing the XMLStreamReader? What do XMLStreamReader#getEncoding() and XMLStreamReader#getCharacterEncodingScheme() return? What encoding is the XML actually stored with?
– Matt Ball, Commented Jan 30, 2013 at 20:46
Hi Matt, the XML is UTF-8 and XMLStreamReader#getCharacterEncodingScheme() returns utf-8 as well. XMLStreamReader#getEncoding() returns null. The XMLStreamReader is created by XMLInputFactory.createXMLStreamReader().
– daniely, Commented Jan 30, 2013 at 21:09

user1006080 · Accepted Answer · 2018-09-25 17:20:48Z


When you create the XMLStreamReader, you can specify the encoding as UTF-8 explicitly, using the XMLInputFactory method below:

abstract XMLStreamReader createXMLStreamReader(InputStream stream, String encoding)
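
For illustration, here is a minimal sketch of how that overload might be used. The class name Utf8StaxDemo, the helper readText, and the sample document are hypothetical and not from the question. It assumes the XML bytes really are UTF-8 and that the JDK's default StAX implementation is in use. Whether this resolves the Extension B problem in the asker's setup is not confirmed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; the name is an assumption for this sketch.
class Utf8StaxDemo {

    // Collects all character data from the stream, decoding the bytes as UTF-8.
    static String readText(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character events into one.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        // Pass the encoding explicitly instead of relying on auto-detection.
        XMLStreamReader reader = factory.createXMLStreamReader(in, "UTF-8");
        StringBuilder sb = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    sb.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // U+20000 and U+2A6D6, written as UTF-16 surrogate pairs.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<root>\uD840\uDC00\uD869\uDED6</root>";
        String text = readText(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Print each code point so surrogate pairs are shown as one character.
        text.codePoints().forEach(cp -> System.out.printf("U+%X%n", cp));
    }
}
```

Note that each Extension B character is a single code point but occupies two Java chars (a surrogate pair), so code that inspects the result of getText() one char at a time can make correctly parsed text look corrupted. Iterating with codePoints(), as above, avoids that.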
