I have a super simple XML document encoded in UTF-16 LE.
<?xml version="1.0" encoding="utf-16"?><X id="1" />
I'm loading it like this (using jcabi-xml):
BOMInputStream bomIn = new BOMInputStream(Main.class.getResourceAsStream("resources/test.xml"), ByteOrderMark.UTF_16LE);
String firstNonBomCharacter = Character.toString((char)bomIn.read());
Reader reader = new InputStreamReader(bomIn, "UTF-16");
String xmlString = IOUtils.toString(reader);
xmlString = xmlString.trim();
xmlString = firstNonBomCharacter + xmlString;
bomIn.close();
reader.close();
final XML xml = new XMLDocument(xmlString);
I have checked that there are no extra BOM/junk symbols (leading or anywhere) by saving out the file and inspecting it with a hex editor. The XML is properly formatted.
However, I still get the following error:
[Fatal Error] :1:40: Content is not allowed in prolog.
Exception in thread "main" java.lang.IllegalArgumentException: Invalid XML: "<?xml version="1.0" encoding="utf-16"?><X id="1" />"
at com.jcabi.xml.DomParser.document(DomParser.java:115)
at com.jcabi.xml.XMLDocument.<init>(XMLDocument.java:155)
at Main.getTransformedString(Main.java:47)
at Main.main(Main.java:26)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 40; Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at com.jcabi.xml.DomParser.document(DomParser.java:105)
... 3 more
I have googled up and down for this error, but every result says that it's the BOM's fault, which I have confirmed (to the best of my knowledge) is not the case. What else could be wrong?
… bomIn.read(); for discarding the second byte?
… BOMInputStream, you should remove the bomIn.read() call altogether because the stream discards the BOM for you.
… bomIn.read() my string turns into something made of nothing but question marks. Truthfully, I'm not too sure exactly how to use BOMInputStream, but this answer (stackoverflow.com/questions/1835430/…) says that calling read skips to the first non-BOM character (which I forgot to include in my sample code).
The InputStreamReader should be told about the endianness: Reader reader = new InputStreamReader(bomIn, StandardCharsets.UTF_16LE);
… bomIn.read(), thanks! However, the actual error itself persists.
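To make the two suggestions above concrete (let the BOM be skipped as a whole, and name the byte order explicitly), here is a minimal standard-library sketch. It is not the jcabi-xml or commons-io API: the class Utf16LeDecodeSketch and the helpers prependBom and decodeUtf16Le are hypothetical names made up for illustration, and the sketch works on a byte array rather than on the BOMInputStream from the question. It only covers the decoding step, not the parser error itself.

```java
import java.nio.charset.StandardCharsets;

final class Utf16LeDecodeSketch {

    // Hypothetical helper: returns a copy of the payload with the UTF-16LE BOM (FF FE) in front.
    static byte[] prependBom(byte[] payload) {
        byte[] out = new byte[payload.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(payload, 0, out, 2, payload.length);
        return out;
    }

    // Hypothetical helper: skips a leading FF FE BOM (both bytes or none, never just one)
    // and decodes everything after it as UTF-16LE.
    static String decodeUtf16Le(byte[] raw) {
        boolean hasBom = raw.length >= 2
                && (raw[0] & 0xFF) == 0xFF
                && (raw[1] & 0xFF) == 0xFE;
        int offset = hasBom ? 2 : 0;
        return new String(raw, offset, raw.length - offset, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><X id=\"1\" />";
        // Encoding with UTF_16LE does not write a BOM, so this is the bare payload.
        byte[] payload = xml.getBytes(StandardCharsets.UTF_16LE);

        // With or without a BOM, an explicit UTF-16LE decode returns the original text.
        System.out.println(decodeUtf16Le(prependBom(payload)).equals(xml)); // prints true
        System.out.println(decodeUtf16Le(payload).equals(xml));             // prints true

        // Plain "UTF-16" falls back to big-endian when no BOM is present,
        // so the same little-endian bytes decode to different characters.
        System.out.println(new String(payload, StandardCharsets.UTF_16).equals(xml)); // prints false
    }
}
```

The last line of main is the likely reason the string looked like nothing but question marks once the BOM was gone: without a BOM, "UTF-16" is decoded as big-endian, so every character of a little-endian file comes out wrong.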