public XMLParser(InputStream is) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db;
        db = dbf.newDocumentBuilder();
        Document doc = db.parse(is);
        node = doc.getDocumentElement();
    } catch (Exception e) {
        DebugLog.log(e);
    }
}

The InputStream contains content like: "Hey there this is a ü character." In the stream, the 'ü' is encoded as an HTML entity.

When I read the node's content with System.out.println(node.getTextContent()), I get "Hey there this is a character." The ü is cut off.

2 Answers


Is this a valid document, and does it specify an encoding? See http://www.w3schools.com/XML/xml_encoding.asp

These might help:

Howto let the SAX parser determine the encoding from the xml declaration? http://www.coderanch.com/t/127052/XML/XML-parsers-encoding-byte-order
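
To illustrate the point about declaring the encoding, here is a minimal sketch. It assumes the document is well-formed XML that carries an XML declaration; the class and method names (`DeclaredEncodingDemo`, `rootText`) are made up for this example. When the parser is given a raw byte stream, it can pick the encoding up from the declaration itself:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original question.
class DeclaredEncodingDemo {

    // Parses raw bytes and returns the text of the root element.
    // The parser reads the encoding from the XML declaration in the bytes.
    static String rootText(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // The declaration names ISO-8859-1, and the bytes really are ISO-8859-1.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<p>Hey there this is a \u00fc character.</p>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(bytes));
    }
}
```

If the declaration is missing or names the wrong encoding, this approach no longer helps and the encoding has to be supplied some other way.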


3 Comments

It's an HTML web page, encoded as ISO-8859-1.
What is the default charset on the device/machine?
Ah, I just noticed the tag. IIRC, if no encoding is specified, the reader/parser assumes the device encoding (UTF-8 in this case). You need to specify the encoding (docs.oracle.com/javase/1.4.2/docs/api/java/io/…) or create a custom InputStream that peeks at the encoding.
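
A rough sketch of the "specify the encoding" suggestion, assuming the truncated link refers to something like `java.io.InputStreamReader` (that is a guess). The class name `ExplicitCharsetParse` and its `parse` helper are hypothetical. The idea is to decode the bytes yourself with a known charset and hand the parser characters instead of bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original question.
class ExplicitCharsetParse {

    // Decodes the stream with the given charset before parsing, so the
    // parser never has to guess the byte encoding.
    static Document parse(InputStream is, Charset charset) {
        try {
            Reader reader = new InputStreamReader(is, charset);
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(reader));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // No XML declaration here; the charset is supplied by the caller.
        byte[] latin1 = "<p>Hey there this is a \u00fc character.</p>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This only addresses byte-level encoding. It does not help if the character is written in the markup as a named HTML entity.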

The problem was XML entities versus HTML entities. The web page I request returns data containing HTML entities. I had to convert the HTML entities to XML entities, and then it worked!

Check this answer for some code
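
For illustration only, here is one way such a conversion could look. This is not the code from the linked answer; `HtmlEntityToXml`, `toNumericReferences`, and the small entity table are assumptions made for this sketch. It rewrites named HTML entities as numeric character references, which an XML parser understands without a DTD:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not the code from the linked answer.
class HtmlEntityToXml {

    // Deliberately tiny table; a real page may use many more named entities.
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("auml", 0xE4);
        NAMED.put("ouml", 0xF6);
        NAMED.put("uuml", 0xFC);
        NAMED.put("Auml", 0xC4);
        NAMED.put("Ouml", 0xD6);
        NAMED.put("Uuml", 0xDC);
        NAMED.put("szlig", 0xDF);
        NAMED.put("nbsp", 0xA0);
    }

    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replaces known named entities with numeric references such as &#252;.
    // Unknown names, and XML's own &amp; &lt; &gt; etc., are left untouched.
    static String toNumericReferences(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = NAMED.get(m.group(1));
            String replacement = (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Hey there this is a &#252; character.
        System.out.println(toNumericReferences("Hey there this is a &uuml; character."));
    }
}
```

The converted string can then be passed to the parser, for example through a `StringReader` wrapped in an `InputSource`.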
