2

As in this question, I am trying to record the exact position when parsing XML.

I already use the SAX Locator passed to setDocumentLocator() to record the line and column number, but that doesn't give the offset from the beginning of the file. Is there a way to find the number of bytes the SAX parser has read so far, or the offset of each line, without re-reading the whole file?

2 Answers

1

In principle, you can wrap the input in a CountingInputStream from Apache Commons IO.
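
A minimal sketch of the idea, using only the JDK: `CountingStream` below is a hypothetical stand-in for the Commons IO class (not its actual source), and `OffsetHandler` and `ByteCountingSax` are names made up for this example. The handler keeps its own reference to the stream, since SAX does not expose it. Note that the count is the number of bytes the parser has pulled from the stream, and parsers usually read in buffered chunks, so the value can run ahead of the element currently being reported.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ByteCountingSax {

    /** Hypothetical stand-in for a counting stream: tallies bytes handed to the reader. */
    static final class CountingStream extends FilterInputStream {
        private long count;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            count += skipped;
            return skipped;
        }

        long getCount() {
            return count;
        }
    }

    /** Handler that holds a reference to the stream it is being fed from. */
    static final class OffsetHandler extends DefaultHandler {
        private final CountingStream in;
        final List<String> log = new ArrayList<>();

        OffsetHandler(CountingStream in) {
            this.in = in;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Bytes consumed by the parser so far; may be ahead of this element.
            log.add(qName + " bytesRead=" + in.getCount());
        }
    }

    static List<String> parse(byte[] xml) throws Exception {
        CountingStream in = new CountingStream(new ByteArrayInputStream(xml));
        OffsetHandler handler = new OffsetHandler(in);
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<a><b>caf\u00e9</b></a>".getBytes(StandardCharsets.UTF_8);
        parse(xml).forEach(System.out::println);
    }
}
```

On a small document the parser may well have read the whole input before the first element is reported, so treat the logged value as an upper bound on the element's position rather than its exact byte offset.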

2 Comments

Thanks. The input stream cannot be accessed from within my DefaultHandler descendant, right? That means I'll need to retain some other reference to it?
This is a good answer but CountingInputStream isn't reliable enough.
1

I found another question and answer that suggests using an XMLStreamReader instead of a SAXParser, because it offers getLocation().getCharacterOffset(). That gives me exactly what I need.
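
A rough sketch of that approach with the JDK's StAX API follows. The class name `StaxOffsets` and the method `elementOffsets` are made up for this example. The reader is fed from a `StringReader`, so the reported values are counted in characters, and exactly where within a tag the offset points depends on the StAX implementation.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxOffsets {

    /** Maps each element name to the offset reported at its START_ELEMENT event. */
    static Map<String, Integer> elementOffsets(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Map<String, Integer> offsets = new LinkedHashMap<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Offset as reported by the implementation; -1 means unavailable.
                    offsets.put(reader.getLocalName(),
                            reader.getLocation().getCharacterOffset());
                }
            }
        } finally {
            reader.close();
        }
        return offsets;
    }

    public static void main(String[] args) throws Exception {
        elementOffsets("<a><b>caf\u00e9</b></a>")
                .forEach((name, offset) -> System.out.println(name + " -> " + offset));
    }
}
```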

2 Comments

This is not correct. This way you get the CHARACTER offset, not the BYTE offset. If your XML file contains even one multi-byte character, you are in big trouble.
Please take a look at this question: stackoverflow.com/questions/43366566
