Most XML documents, such as RSS feeds, start with a prolog:

<?xml version="1.0" encoding="UTF-8" ?>

What I can't understand is why this is needed: if an application parses the XML and reads the "encoding" value, it is already reading text, decoded with the application's own encoding.

1 Answer

because if application parses XML and reads "encoding" value, it is already reading text,

That's not necessarily true. The XML parser will read the bytes up to the first newline (which is why the XML declaration must always be on the first line of an XML file), convert them to text in order to parse the encoding, and then read the remaining bytes using the specified encoding.

4 Comments

Sweet. Are there any open-source XML parser implementations where I can see this behavior? Is there a specification for the first line's encoding?
The answer to the duplicate question has a link to the XML standard, which describes how this is to be done: stackoverflow.com/a/5165423/676877
This answer is not really accurate. The newline has nothing to do with it. If the encoding is not UTF-8 or UTF-16, a prolog is required. The prolog can be identified from the first 2 or 4 bytes of data. Basically, the goal is to brute-force the reading of <?xml. This is all covered in the XML spec: w3.org/TR/REC-xml/#sec-guessing
@DmitrijA did you read the linked XML spec, which describes how the prolog is used to "guess" the encoding? What are you still finding confusing?
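
To make the byte-level "guessing" from the linked spec section concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular parser does it: the names `sniff_xml_encoding` and `_SIGNATURES` are made up for this example, EBCDIC and the unusual UTF-32 byte orders are left out, and the declaration is assumed to fit in the first 200 bytes.

```python
import codecs
import re

# Hypothetical table for this sketch:
# (leading bytes, codec used to read the declaration, fallback encoding).
# The UTF-32 LE BOM starts with the UTF-16 LE BOM, so UTF-32 is checked first.
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be", "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le", "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8", "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be", "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le", "utf-16-le"),
    # No BOM: recognise how "<" and "<?" look in each encoding family.
    (b"\x00\x00\x00<", "utf-32-be", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le", "utf-16-le"),
    # ASCII-compatible family: latin-1 can decode any byte, which is enough
    # to read the ASCII-only declaration; UTF-8 is the default if none is given.
    (b"<?xm", "latin-1", "utf-8"),
]

_DECL_RE = re.compile(
    r'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes."""
    for signature, head_codec, fallback in _SIGNATURES:
        if data.startswith(signature):
            break
    else:
        # No BOM and no recognisable "<?xml": assume the UTF-8 default.
        return "utf-8"

    # Decode just enough bytes to read the declaration, dropping any BOM.
    head = data[:200].decode(head_codec, errors="replace").lstrip("\ufeff")
    match = _DECL_RE.match(head)
    return match.group(1).lower() if match else fallback


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(sniff_xml_encoding('<?xml version="1.0"?><a/>'.encode("utf-16-le")))
```

The point of the two-step design is that the first few bytes only need to narrow the input down to an encoding family. That is enough to decode the ASCII-only declaration, and the declared name then selects the exact encoding for the rest of the document.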
