I have an XML file that I read from a URLConnection. The file grows over time. I want to read only the first 100k of the file (today the file is 1.3MB) and then parse the part I have read. How can I do that?

  • Are you sure the first 100k of the file contains what you want? Also, why not read just the first 100k instead of the whole XML from the URLConnection? Commented Apr 7, 2011 at 9:11

2 Answers

(From scratch)

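int length = 100 * 1024;
byte[] buf = new byte[length];
InputStream raw = urlConnection.getInputStream();
int total = 0, n;
// read() may return fewer bytes than requested, so loop until the buffer is full or EOF
while (total < length && (n = raw.read(buf, total, length - total)) != -1) total += n;
// Wrap the raw bytes so the parser can honour the encoding declared in the XML prolog
InputStream in = new ByteArrayInputStream(buf, 0, total);
// new SAXParser() does not compile (abstract class); obtain one from the factory instead
SAXParserFactory.newInstance().newSAXParser().parse(in, myHandler);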

1 Comment

+1: read() is only guaranteed to read one byte. Perhaps DataInputStream.readFully() is a better choice.

As far as I understand, you're interested not just in 100k of a stream but in 100k of a stream from which you can extract the data you need. This means taking the first 100k, as proposed by Peter, won't work, because the truncated data is unlikely to be well-formed XML.
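To illustrate the well-formedness problem, here is a minimal sketch that feeds a complete document and a cut-off copy of it to a SAX parser. The class name TruncatedXmlDemo, the helper isWellFormed and the sample document are made up for this example; only the JDK's standard SAX API is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TruncatedXmlDemo {
    /** Returns true if the bytes parse as a complete, well-formed document. */
    static boolean isWellFormed(byte[] xml) throws Exception {
        try {
            // DefaultHandler ignores all content; we only care whether parsing succeeds
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The parser reports the missing end tags as a fatal error
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, shaped like the report in the question
        String full = "<report><data><result>a</result><result>b</result></data></report>";
        byte[] whole = full.getBytes(StandardCharsets.UTF_8);
        byte[] cut = Arrays.copyOf(whole, 30); // simulate stopping after a fixed byte count

        System.out.println("complete:  " + isWellFormed(whole));
        System.out.println("truncated: " + isWellFormed(cut));
    }
}
```

A SAX handler will still receive the events that precede the cut, so a fixed-size read can work if you are prepared to catch the exception at the end, but it is easier to stop the parser yourself.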

Instead, I'd suggest using a StAX parser, which lets you read and parse XML directly from the stream and stop once you've reached (or come near) the 100k limit.

For further information, take a look at the XMLStreamReader interface (and samples of its usage). For example, you could loop until you reach the START_ELEMENT named "result" and then call getTextCharacters(int sourceStart, char[] target, int targetStart, int length), specifying 100k as the buffer size.
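The loop described above could look roughly like this. It is only a sketch: the class FirstResults, the method readFirstResults and its limit parameter are hypothetical names, and it stops after a given number of "result" elements rather than after a byte count. It collects text with getText() on each CHARACTERS event instead of getTextCharacters(), which keeps the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class FirstResults {
    /** Collects the text content of at most `limit` result elements, then stops pulling events. */
    static List<String> readFirstResults(InputStream in, int limit) throws XMLStreamException {
        List<String> results = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            StringBuilder text = null; // non-null only while inside a result element
            int depth = 0;             // nesting depth relative to the current result
            while (results.size() < limit && reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (text != null) {
                            depth++; // a child element inside the current result
                        } else if ("result".equals(reader.getLocalName())) {
                            text = new StringBuilder();
                            depth = 1;
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (text != null) text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (text != null && --depth == 0) {
                            results.add(text.toString()); // the result element is complete
                            text = null;
                        }
                        break;
                    default:
                        break; // comments, processing instructions, etc. are ignored
                }
            }
        } finally {
            reader.close(); // frees parser resources; does NOT close the underlying stream
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input, shaped like the report in the question
        String xml = "<report><layout/><data><result>a</result>"
                + "<result><x>b</x>c</result><result>d</result></data></report>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readFirstResults(in, 2));
    }
}
```

With a real URLConnection you would pass urlConnection.getInputStream() and close that stream yourself once the method returns, which abandons the rest of the download.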

Since you mentioned Android: it currently doesn't have a StAX parser available. However, it does have XmlPullParser, which offers similar functionality.

4 Comments

I want to read a stream that carries an XML file downloaded over HTTP. It contains 0-3000 elements of the form <result>(more elements inside) </result>. I only want to read the first 20 elements, and then the stream should be dropped.
How can you be sure that <result>...</result> is contained in the first 100k? Or is your goal actually to read 100k of character data between the <result></result> tags?
I know the structure of the file: <?xml version="1.0" encoding="UTF-8"?> <report> <layout> </layout> <data> <result></result> </data> </report>
So it looks like you want to read only 100k of text data inside the <result> tag. In that case you can use XMLStreamReader: loop until you reach the START_ELEMENT named "result" and then call getTextCharacters(int sourceStart, char[] target, int targetStart, int length), specifying 100k as the buffer size.
