
I have a text file (XML created with XStream) that is 63,000 lines (3.5 MB) long. I'm trying to read it using a BufferedReader:

BufferedReader br = new BufferedReader(new FileReader(file));
try {
    String s = "";
    String tempString;
    int i = 0;
    while ((tempString = br.readLine()) != null) {
        s = s.concat(tempString);
//      s = s + tempString;
        i = i + 1;
        if (i % 1000 == 0) {
            System.out.println(Integer.toString(i));
        }
    }
    br.close();

Here you can see my attempt to measure the reading speed, and it is very low: past the 10,000th line it takes seconds to read each additional 1,000 lines. I'm clearly doing something wrong, but I can't understand what. Thanks in advance for your help.

  • Is your intent to parse this file? Why not just load it with Xerces/SAX/some other parsing tool? Commented Apr 6, 2013 at 10:26
  • String + and concat are very inefficient if the Strings are large. Use StringBuilder or pass the InputStream/Reader straight to the XML parser. Commented Apr 6, 2013 at 10:27
  • Or if you truly need lines, use something like this - commons.apache.org/proper/commons-io/javadocs/api-2.4/org/…. Commented Apr 6, 2013 at 10:29
  • Yes, I'm trying to parse this file and feed it into XStream again to read back the saved class. Lines are not critical. Commented Apr 6, 2013 at 10:31
  • If you need it in XStream, why don't you just pass the reader directly to XStream instead of reading it yourself and then passing the string? Commented Apr 6, 2013 at 10:36

4 Answers


@PaulGrime is right. You are copying the whole string each time the loop reads a line: concat allocates a new String and copies everything accumulated so far, so the total work grows quadratically with the number of lines. Once the string gets big (say 10,000 lines big), each iteration is doing a lot of work just for that copying.

Try this:

StringBuilder sb = new StringBuilder();
String tempString;
while ((tempString = br.readLine()) != null) {
    sb.append(tempString);  // should also append a line separator here; see the note below
}

String s = sb.toString();

Note: read Paul's answer below on why stripping newlines makes this a bad way to read in a file. Also, as mentioned in the question comments, XStream can read the file directly, and even if it could not, IOUtils.toString(reader) would be a safer way to read a file (a rough sketch of both follows).
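For illustration, a minimal sketch of those two alternatives, assuming the file was written by XStream and commons-io is on the classpath; the class name, the args[0] file path, and the omission of XStream alias/security setup are my own simplifications, not part of the answer:

    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;

    import org.apache.commons.io.IOUtils;

    import com.thoughtworks.xstream.XStream;

    public class LoadWithXStream {
        public static void main(String[] args) throws IOException {
            File file = new File(args[0]);   // path to the XML that XStream wrote
            XStream xstream = new XStream(); // configure aliases/permissions for your classes as needed

            // Option 1: hand the Reader straight to XStream -- no intermediate String at all
            try (Reader reader = new FileReader(file)) {
                Object obj = xstream.fromXML(reader);
                System.out.println(obj);
            }

            // Option 2: slurp the file into one String with commons-io, then parse that
            try (Reader reader = new FileReader(file)) {
                String xml = IOUtils.toString(reader);
                Object obj = xstream.fromXML(xml);
                System.out.println(obj);
            }
        }
    }

Option 1 avoids building the large String in memory at all, which is why the comments recommend it.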


4 Comments

Thanks! That really sped the loading up.
-1: the performance penalty is not just the copying; StringBuilder is the one advised in the documentation. "@PaulGrime is right" is not really an answer that deserves to be accepted... and 10,000? Why?
I said "say 10,000", meaning "for example, when it is 10,000 lines big". I also explained why Paul was right and gave a code example. Also, please clarify what you mean by "not just the copying".
I also changed StringBuffer to StringBuilder - they have the same API, but StringBuilder is not synchronized (which is fine in this single-threaded case, and makes it a bit faster).

Some immediate improvements you can make:

  • Use a StringBuilder instead of concat and +. Using + and concat can really hurt performance, especially when used in loops.
  • Reduce the number of disk accesses by giving the reader a larger buffer (a combined sketch follows this list):

    BufferedReader br = new BufferedReader(new FileReader("someFile.txt"), SIZE);
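A minimal sketch combining both bullets; the 64 KB figure, the class name, and the file name are arbitrary choices of mine, not values from the answer:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadWithLargeBuffer {
        public static void main(String[] args) throws IOException {
            final int bufferSize = 64 * 1024; // larger than BufferedReader's 8192-char default
            StringBuilder sb = new StringBuilder();
            try (BufferedReader br = new BufferedReader(new FileReader("someFile.txt"), bufferSize)) {
                String line;
                while ((line = br.readLine()) != null) {
                    sb.append(line).append('\n'); // put back the newline that readLine() strips
                }
            }
            String s = sb.toString();
            System.out.println(s.length() + " characters read");
        }
    }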



You should use a StringBuilder, as repeated String concatenation is extremely slow even when the individual strings are small.

Further, try using NIO rather than a BufferedReader.

public static void main(String[] args) throws IOException {
    final File file = //some file
    try (final FileChannel fileChannel = new RandomAccessFile(file, "r").getChannel()) {
        final StringBuilder stringBuilder = new StringBuilder();
        final ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
        final CharsetDecoder charsetDecoder = Charset.forName("UTF-8").newDecoder();
        while (fileChannel.read(byteBuffer) > 0) {
            byteBuffer.flip();    // switch the buffer from filling to draining
            stringBuilder.append(charsetDecoder.decode(byteBuffer));
            byteBuffer.clear();   // make room for the next read
        }
        // Caveat: decode() expects each chunk to end on a character boundary; a multi-byte
        // UTF-8 character split across two reads will throw a MalformedInputException.
    }
}

You can tune the buffer size if it's still too slow; which size works best is heavily system dependent. For me it makes very little difference whether the buffer is 1 KB or 4 KB, but on other systems I have known that change to increase speed by an order of magnitude.
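A rough, hedged way to compare sizes on your own machine; the sizes tried, the class name, and the timing approach below are my own, not the answer's:

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;

    public class BufferSizeComparison {
        public static void main(String[] args) throws IOException {
            File file = new File(args[0]); // the file to experiment with
            for (int bufferSize : new int[] {1024, 4 * 1024, 64 * 1024}) {
                long start = System.nanoTime();
                StringBuilder sb = new StringBuilder();
                try (FileChannel channel = new RandomAccessFile(file, "r").getChannel()) {
                    ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
                    CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
                    while (channel.read(buffer) > 0) {
                        buffer.flip();
                        sb.append(decoder.decode(buffer)); // same multi-byte caveat as above applies
                        buffer.clear();
                    }
                }
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.println(bufferSize + "-byte buffer: " + sb.length()
                        + " chars in " + millis + " ms");
            }
        }
    }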



In addition to what has already been said, depending on your use of the XML, your code is potentially incorrect as it discards line endings. For example, this code:

package temp.stackoverflow.q15849706;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

import com.thoughtworks.xstream.XStream;

public class ReadXmlLines {
    public String read1(BufferedReader br) throws IOException {
        try {
            String s = "";
            String tempString;
            int i = 0;
            while ((tempString = br.readLine()) != null) {
                s = s.concat(tempString);
                // s=s+tempString;
                i = i + 1;
                if (i % 1000 == 0) {
                    System.out.println(Integer.toString(i));
                }
            }
            return s;
        } finally {
            br.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ReadXmlLines r = new ReadXmlLines();

        URL url = ReadXmlLines.class.getResource("xml.xml");
        String xmlStr = r.read1(new BufferedReader(new InputStreamReader(url
                .openStream())));

        Object ob = null;

        XStream xs = new XStream();
        xs.alias("root", Root.class);

        // This is incorrectly read/parsed, as the line endings are not
        // preserved.
        System.out.println("----------1");
        System.out.println(xmlStr);
        ob = xs.fromXML(xmlStr);
        System.out.println(ob);

        // This is correctly read/parsed, when passing in the URL directly
        ob = xs.fromXML(url);
        System.out.println("----------2");
        System.out.println(ob);

        // This is correctly read/parsed, when passing in the InputStream
        // directly
        ob = xs.fromXML(url.openStream());
        System.out.println("----------3");
        System.out.println(ob);
    }

    public static class Root {
        public String script;

        public String toString() {
            return script;
        }
    }
}

and this xml.xml file on the classpath (in the same package as the class):

<root>
    <script>
<![CDATA[
// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
  {
  return 1;
  }
else
  {
  return 0;
  }
}
]]>
    </script>
</root>

produces the following output. The first two lines show that the line endings have been removed, which makes the JavaScript in the CDATA section invalid (the first JS comment now comments out the entire script, because all the JS lines have been merged onto one line).

----------1
<root>    <script><![CDATA[// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then  {  return 1;  }else  {  return 0;  }}]]>    </script></root>
// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then  {  return 1;  }else  {  return 0;  }}    
----------2


// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
  {
  return 1;
  }
else
  {
  return 0;
  }
}
...
