
Hi, thanks for your attention.

I want to export a lot of data, really a lot of data (6 million rows), to a .csv file using Java. The app is a Swing application with JPA, using TopLink (ojdbc14).

I have tried to use:

  • BufferedWriter
  • RandomAccessFile
  • FileChannel

etc., but memory consumption remains very high, causing a Java heap space OutOfMemoryError, even though I set the maximum heap size to 800m (-Xmx800m).

My latest version of the source code:

...(more lines of code)

    FileChannel channel = getRandomAccessFile(tempFile).getChannel();
    Object[][] data = pag.getRawData(); // database data in a multidimensional array

    for (int j = 0; j < data.length; j++) {
        write(data[j], channel);  // write data[j] (an array) into the channel
        freeStringLine(data[j]);  // sets all positions of the array to null
        data[j] = null;           // drop the reference to the row
    }

    channel.force(false); // force writing to the file system (HD)
    channel.close();      // close the channel
    pag = null;

...(more lines of code)

    private void write(Object[] row, FileChannel channel) throws DatabaseException {
        if (byteBuff == null) {
            byteBuff = ByteBuffer.allocateDirect(1024 * 1024);
        }
        for (int j = 0; j < row.length; j++) {
            if (row[j] != null) {
                byteBuff.put(row[j].toString().getBytes());
            }
            if (j < row.length - 1) {
                byteBuff.put(SPLITER_BYTES); // column separator
            }
        }
        byteBuff.put("\n".getBytes()); // row terminator
        byteBuff.flip();
        try {
            channel.write(byteBuff);
        } catch (IOException ex) {
            throw new DatabaseException("Unable to write to temporary export file: " + ex.getMessage(), ex.getCause());
        }
        byteBuff.clear();
    }

With 6 million rows, I don't want to keep all that data in memory while the file is created. I wrote many temp files (with 5000 rows each) and, at the end of the process, append all those temp files into a single one using two FileChannels. However, the out-of-memory exception is thrown before the joining.
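
To make the join step concrete, this is roughly what I mean: each temp file is copied into the final CSV channel-to-channel, so the contents never pass through the heap. The mergeTempFiles method and the file list are only illustrative, not my exact code:

    // Needs: java.io.*, java.nio.channels.FileChannel, java.util.List
    // Copy each temp file into the final CSV channel-to-channel, so the
    // file contents never pass through the Java heap.
    private void mergeTempFiles(List<File> tempFiles, File target) throws IOException {
        FileChannel out = new FileOutputStream(target).getChannel();
        try {
            for (File temp : tempFiles) {
                FileChannel in = new FileInputStream(temp).getChannel();
                try {
                    long position = 0;
                    long size = in.size();
                    while (position < size) {
                        // transferTo may copy fewer bytes than requested, so loop
                        position += in.transferTo(position, size - position, out);
                    }
                } finally {
                    in.close();
                }
            }
            out.force(false); // flush to disk, as with the per-page channel above
        } finally {
            out.close();
        }
    }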

Do you know another strategy for exporting a lot of data?

Thanks a lot for any answer. Sorry for my English, I'm improving xD

  • You'd be better off loading the data from the database as pages and writing each page to the file, then discarding the page before you load the next one (see the sketch after these comments). Commented Aug 9, 2011 at 21:12
  • Have you profiled the application to view the memory usage across objects/classes? I suspect that it is due to the size of the persistence context. Also, joining files is not a worthwhile exercise if you can write to the end of the CSV file. Commented Aug 9, 2011 at 21:13
  • Thanks to Jeremy and Vineet for your replies. Jeremy, I created many files because I had used BufferedWriter in an earlier version of the source code, but I'll try to use a single file with FileChannel :). Vineet, I use the NetBeans 6.9 profiler to analyze the memory consumption. Unfortunately, the "live results" tab shows that the biggest memory consumer is the byte array class (byte[]); maybe I don't know the profiler completely. About the persistence context: could disabling the caching reduce the memory consumption? Commented Aug 11, 2011 at 20:34
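
A rough sketch of the page-by-page idea Jeremy describes, using a JPA query. SomeEntity is borrowed from the answer below; csvFile and writeRow(...) are placeholders, not from the actual application:

    // Rough sketch of page-by-page export.
    // Needs: java.io.*, java.util.List, javax.persistence.EntityManager
    int pageSize = 5000;
    int first = 0;
    Writer out = new BufferedWriter(new FileWriter(csvFile));
    try {
        while (true) {
            List<?> page = entityManager
                    .createQuery("select e from SomeEntity e order by e.id")
                    .setFirstResult(first)
                    .setMaxResults(pageSize)
                    .getResultList();
            if (page.isEmpty()) {
                break;                 // no more rows
            }
            for (Object row : page) {
                writeRow(row, out);    // format one entity as a CSV line and write it
            }
            entityManager.clear();     // detach the page so it can be garbage collected
            first += pageSize;
        }
    } finally {
        out.close();
    }

One thing to keep in mind: with setFirstResult the database still has to skip over the earlier rows, so very deep pages can get slow; the cursor approach in the answer below avoids that.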

1 Answer


The answer is to use a "stream" approach: read one row, write one row as you scroll through the dataset. You'll need to get the query result as a cursor and iterate through it, not fetch the whole result set at once.

With JPA (here via the underlying Hibernate Session API), use code something like this:

ScrollableResults cursor = session.createQuery("from SomeEntity x").scroll();

while (cursor.next()) {
    writeToFile(cursor);
}

This means you only have one row in memory at a time, which is totally scalable to any number of rows and uses minimal memory (it's faster anyway).

Getting all rows at once in a result set is a convenience approach which works for small result sets (which is most of the time), but as usual, convenience comes at a cost and it doesn't work in all situations.
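
If a scrollable cursor isn't available (for example with a native query on TopLink, as in the question), the same row-at-a-time idea can be sketched with plain JDBC; the query, separator, file and method name below are illustrative only:

    // Minimal sketch of the streaming idea with plain JDBC.
    // Needs: java.io.*, java.sql.*
    void exportToCsv(Connection con, File csvFile) throws SQLException, IOException {
        PreparedStatement ps = con.prepareStatement("select c1, c2, c3 from some_view");
        ps.setFetchSize(500);                        // fetch rows in small batches
        ResultSet rs = ps.executeQuery();
        Writer out = new BufferedWriter(new FileWriter(csvFile));
        try {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) {
                        line.append(';');            // column separator
                    }
                    Object value = rs.getObject(i);
                    if (value != null) {
                        line.append(value.toString());
                    }
                }
                line.append('\n');
                out.write(line.toString());          // only the current row is in memory
            }
        } finally {
            rs.close();
            ps.close();
            out.close();
        }
    }

setFetchSize is only a hint, but the Oracle driver uses it to limit how many rows are pulled per network round trip, so the whole result never has to be materialised at once.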


8 Comments

Oh, thank you very much for your reply. Nevertheless, I'm not using the createQuery() method (I suppose that session is an instance of the Query class), but the createNativeQuery() method, which executes a complex query (joining 3 tables). Is your approach correct for my problem? (I don't use an entity name in the query :S)
@Bohemian need your help... I am using Hibernate and facing the same issue. I have 2 million records in a table and want to write all 2 million records from the table to a csv file. Going through the above solution, it will fire the query for approximately 2 million times... just want to know if it is the right and only way to do so, or whether we can create multiple threads and allow them to read from the table to enhance the speed. But then the question arises: how will we keep track of which records have already been read by a thread? Please help... I haven't had an answer to this question for the last 3 weeks.
@sparsh610 I believe an operational need should have an operational solution. Java is great for applications that handle single requests; it doesn't work well for handling bulk operations. My strong recommendation is to not use Java for this. Use a shell command to invoke a database utility function. Every database has a simple way to dump data to csv - use that. It will be simple and easy to code, reliable, and way faster than anything you can do in Java.
@Bohemian thanks a lot... was looking eagerly for your reply. In the worst-case scenario, if we have to do it via Java, what would be the best and fastest way to implement it? Please help me :)
I also tried googling but didn't find much on this.
