What is the most efficient way (in terms of time) to read a text file into a list or array? The files range from 100 MB to 2 GB. Each file contains data in the following format:

From      TO          time     

a         b      13 decc 2009
b         c      13 decc 2009
c         d      13 decc 2009
f         h      13 decc 2009
f         g      13 decc 2009

Edit: The following is the code I use to read the file:

public List<InputDataBean> readInputData() throws Exception {
    List<InputDataBean> dataSet = new ArrayList<InputDataBean>();
    FileInputStream fstream = null;
    BufferedReader br = null;
    try {
        fstream = new FileInputStream(filePath);
        br = new BufferedReader(new InputStreamReader(fstream));
        String strLine;
        Set<String> users = new TreeSet<String>();
        while ((strLine = br.readLine()) != null) {
            // Skip lines for which validateRecord returns null
            InputDataBean data = validateRecord(strLine);
            if (data == null)
                continue;
            dataSet.add(data);
            // Collect every distinct user name seen in either column
            users.add(data.getFromName());
            users.add(data.getToName());
        }
        UserKeys.setUsers(users);
    } finally {
        // Closing the reader also closes the underlying file stream
        try {
            if (null != br)
                br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return dataSet;
}

After reading the file, I want to store the data in an array, not in a database.

Is there a better alternative for reading the file? Would it be a good idea to call a script from the Java program, read the data with the script, and store the result in a Java array?

P.S.: I would really appreciate it if anybody could edit or improve the tags.

  • First, how exactly are you reading your files? There is no sample code that anyone could use as a basis for suggestions. Second, what is your expected standard? Commented Dec 13, 2011 at 6:01
  • Don't forget to try something like ensureCapacity(). Commented Dec 13, 2011 at 6:02
  • There are some questions on Stack Overflow about parsing tab-delimited files in Java. I found one here: stackoverflow.com/questions/1635764/… Commented Dec 13, 2011 at 6:03
  • What are you doing with the data? If it goes to a database, you should use a tool that your database provides (most databases do). Storing about 2 GB of data in the heap (as you read the file) may not be a great idea... Generally, buffered readers are fine if you have to do this in Java. Commented Dec 13, 2011 at 6:05
  • @thotheolh: Thanks for the suggestion. Sorry, I want to read the file in an efficient way (in terms of time). Commented Dec 13, 2011 at 6:14

1 Answer

Possibly wrapping a BufferedInputStream around the FileInputStream will further improve performance a bit (because reads will be buffered in multiples of 4 KB). You could also play a bit with the buffer size.
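As a rough sketch of the buffering idea above, the stream chain could be built with an explicit buffer size so that different sizes can be timed against each other. The class name `BufferedLineReader`, the method `readLines`, and the choice of US-ASCII are assumptions made for illustration; they are not part of the question's code, and whether a larger buffer actually helps has to be measured on the real files.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the question's code.
class BufferedLineReader {

    // Reads every line of the file, using bufferSize for both the
    // byte buffer (BufferedInputStream) and the char buffer (BufferedReader).
    static List<String> readLines(String path, int bufferSize) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(
                        new BufferedInputStream(new FileInputStream(path), bufferSize),
                        StandardCharsets.US_ASCII), // assumption: the file is plain ASCII
                bufferSize)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Times a few buffer sizes against the file given as the first argument.
    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BufferedLineReader <file>");
            return;
        }
        int[] sizes = {4 * 1024, 64 * 1024, 1024 * 1024};
        for (int size : sizes) {
            long start = System.nanoTime();
            int count = readLines(args[0], size).size();
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(size + " bytes: " + count + " lines in " + millis + " ms");
        }
    }
}
```

Timings from a single run like this are only indicative, since the operating system's file cache and JIT warm-up both affect the first pass.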

If you know it's just ASCII, you could avoid using a Reader and possibly avoid creating a String for each line.
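One way to read the ASCII suggestion is to scan raw bytes and only build a String for the fields that are actually needed. The sketch below assumes pure ASCII input with fields separated by spaces or tabs, and it extracts just the first field of each line; `AsciiLineScanner` and `firstFields` are made-up names for illustration, and a real version would have to handle all three columns.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from the question's code.
class AsciiLineScanner {

    // Returns the first whitespace-delimited field of every non-empty line,
    // without going through a Reader or creating a String per line.
    static List<String> firstFields(String path) throws IOException {
        List<String> fields = new ArrayList<>();
        byte[] chunk = new byte[64 * 1024]; // bytes read from the file in one call
        byte[] line = new byte[256];        // reusable buffer for the current line
        int lineLen = 0;
        try (InputStream in = new FileInputStream(path)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                for (int i = 0; i < n; i++) {
                    byte b = chunk[i];
                    if (b == '\n') {
                        addFirstField(line, lineLen, fields);
                        lineLen = 0;
                    } else if (b != '\r') {
                        if (lineLen == line.length) {
                            line = Arrays.copyOf(line, line.length * 2);
                        }
                        line[lineLen++] = b;
                    }
                }
            }
        }
        // Handle a final line that has no trailing newline.
        addFirstField(line, lineLen, fields);
        return fields;
    }

    // Decodes only the first field of the line; blank lines add nothing.
    private static void addFirstField(byte[] line, int len, List<String> out) {
        int start = 0;
        while (start < len && (line[start] == ' ' || line[start] == '\t')) {
            start++;
        }
        int end = start;
        while (end < len && line[end] != ' ' && line[end] != '\t') {
            end++;
        }
        if (end > start) {
            out.add(new String(line, start, end - start, StandardCharsets.US_ASCII));
        }
    }
}
```

The line buffer is reused across lines, so the only per-record allocation is the String for the extracted field.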

If you have the time, I would compare the performance of your solution with existing CSV reader tools, such as the CSV tool from the H2 database (disclosure: I wrote it).
