
I have 9 million records in SQL Server. I am trying to export them to CSV files so that I can load the data into MongoDB. I have written Java code for the SQL-to-CSV export, but I have two issues:

  1. If I read all the data into a list and then try to write it to the CSV, I get an OutOfMemoryError.
  2. If I read line by line and write every line to the CSV, it takes a very long time to export the data.

My code is something like:

    List<SubReportsBean> list = new ArrayList<>();
    try {
        Class.forName(driver).newInstance();
        conn = DriverManager.getConnection(url, databaseUserName, databasePassword);
        stmt = conn.prepareStatement("select OptimisationId from SubReports");
        result = stmt.executeQuery();
        result.setFetchSize(1000);

        while (result.next()) {
            SubReportsBean bean = new SubReportsBean();
            bean.setOptimisationId(result.getLong("OptimisationId"));

            list.add(bean);
            generateExcel(list);
        }
        // generateExcel(list);
        conn.close();
    } catch (Exception e) {
        e.printStackTrace();
    }

Is there a faster approach to export all the data quickly? Or, even better, can it be exported directly into MongoDB instead of CSV?

  • You can combine your two approaches by adding a counter, so that you write your data in batches of (for example) 1000 beans. Also, you can export your data to CSV directly from SQL Server Management Studio. Commented Feb 27, 2015 at 14:12
  • I know I can do that directly from SQL Server Management Studio, but I don't have full access to the studio, so my only option is to write some code. Commented Feb 27, 2015 at 14:16
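The counter idea from the first comment can be sketched independently of JDBC. `BatchCollector` is a hypothetical helper name (not from the thread), and 1000 is just the batch size the comment suggests:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Buffers items and hands them to `flush` in fixed-size batches, so the
// JDBC loop never holds more than one batch in memory at a time.
class BatchCollector<T> {
    private final int batchSize;
    private final Consumer<List<T>> flush;
    private final List<T> buffer = new ArrayList<>();

    BatchCollector(int batchSize, Consumer<List<T>> flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    void add(T item) {
        buffer.add(item);
        if (buffer.size() == batchSize) {
            flushNow();
        }
    }

    // Flush whatever remains once the ResultSet is exhausted.
    void finish() {
        if (!buffer.isEmpty()) {
            flushNow();
        }
    }

    private void flushNow() {
        flush.accept(new ArrayList<>(buffer)); // hand out a copy
        buffer.clear();                        // reuse the same buffer
    }
}
```

In the question's loop you would call `collector.add(bean)` instead of `list.add(bean)` plus `generateExcel(list)`, then `collector.finish()` once after the loop.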

2 Answers


Maybe you should paginate your data by reading only a little at a time, using SQL Server's OFFSET ... FETCH clause (note that it is only valid after an ORDER BY):

select OptimisationId from SubReports order by OptimisationId OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY;
select OptimisationId from SubReports order by OptimisationId OFFSET 1000 ROWS FETCH NEXT 1000 ROWS ONLY;
select OptimisationId from SubReports order by OptimisationId OFFSET 2000 ROWS FETCH NEXT 1000 ROWS ONLY;
...

Just keep a counter of the offset.
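The counter can be kept in a small helper that builds each page's query (`pagedQuery` is a made-up name, not code from this answer); SQL Server only accepts OFFSET/FETCH after an ORDER BY, so the id column is used for ordering:

```java
// Builds the query for one page of the export. The ORDER BY is required:
// SQL Server's OFFSET/FETCH clause is only valid after an ORDER BY, and a
// stable ordering also keeps the pages from overlapping.
class PagedExport {
    static String pagedQuery(long offset, int pageSize) {
        return "select OptimisationId from SubReports"
             + " order by OptimisationId"
             + " offset " + offset + " rows"
             + " fetch next " + pageSize + " rows only";
    }
}
```

Loop with `offset += pageSize`, executing each query and appending its rows to the file, until a page comes back with fewer than `pageSize` rows.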


If you use this solution then you'd need to modify your code to append to the end of the CSV file -- don't keep all your results in memory, otherwise you'll still run into the OutOfMemoryError.
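Appending a page at a time could look like this sketch (`appendPage` and the one-column layout are assumptions, not code from the answer); passing `true` as the second `FileWriter` argument opens the file in append mode, so earlier pages stay on disk and out of memory:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

class CsvAppender {
    // Appends one page of ids to the CSV file. FileWriter's second
    // argument (true) selects append mode instead of truncating.
    static void appendPage(String path, List<Long> ids) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(path, true))) {
            for (Long id : ids) {
                out.write(Long.toString(id));
                out.newLine();
            }
        }
    }
}
```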




Definitely, when dealing with so many records, collecting all the data in a list before dumping it into CSV is bound to fail.

So your solution 2 is the way to go.

Your code seems to correspond to this solution, but I think you've just forgotten to move your list declaration, or to empty your list, inside the loop. You could do:

try {
    Class.forName(driver).newInstance();
    conn = DriverManager.getConnection(url, databaseUserName, databasePassword);
    stmt = conn.prepareStatement("select OptimisationId from SubReports");
    result = stmt.executeQuery();
    result.setFetchSize(1000);

    while (result.next()) {
        SubReportsBean bean = new SubReportsBean();
        bean.setOptimisationId(result.getLong("OptimisationId"));

        // a fresh one-element list per row, so nothing accumulates in memory
        List<SubReportsBean> list = new ArrayList<>();
        list.add(bean);
        generateExcel(list);
    }
    conn.close();
} catch (Exception e) {
    e.printStackTrace();
}

2 Comments

Also be sure to use a BufferedWriter in your CSV-writing code.
If there is only ever going to be 1 record in the list, then what is the purpose of the list?
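Both comments point the same way: with a buffered writer there is no need for the one-element list at all; each row can go straight to the writer. A minimal sketch (`writeRow` is a made-up name):

```java
import java.io.IOException;
import java.io.Writer;

class RowWriter {
    // Writes one row directly to the (ideally buffered) writer --
    // no per-row List allocation needed.
    static void writeRow(Writer out, long optimisationId) throws IOException {
        out.write(Long.toString(optimisationId));
        out.write('\n');
    }
}
```

Inside the ResultSet loop this becomes `RowWriter.writeRow(out, result.getLong("OptimisationId"));`, with `out` a `BufferedWriter` opened once, in a try-with-resources, around the whole loop.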
