
During my quest to find the fastest method to get data from Java into SQL Server, I have noticed that the fastest Java method I can come up with is still 12 times slower than using BULK INSERT.

My data is generated from within Java, and BULK INSERT only supports reading data from a text file, so using BULK INSERT is not an option unless I first write my data out to a temporary text file. That, in turn, would obviously be a huge performance hit.

When inserting from Java, insert speeds are around 2500 rows per second, even when I measure the time after the for loop and just before the executeBatch. So "creating" the data in memory is not the bottleneck.

When inserting with BULK INSERT, insert speeds are around 30000 rows per second.

Both tests were done on the server itself, so the network is not a bottleneck either. Any clue as to why BULK INSERT is faster, and whether the same performance can be attained from within Java?

This is just a big dataset that needs to be loaded once, so it would be fine to temporarily disable any kind of logging (I already tried the simple recovery model), indexes (the table has none), locking, whatever...

My test setup so far:

Database:

CREATE TABLE TestTable   
   (  Col1 varchar(50)
    , Col2 int);  

Java:

// This seems to be essential to get good speeds, otherwise batching is not used.
conn.setAutoCommit(false);

PreparedStatement prepStmt = conn.prepareStatement("INSERT INTO TestTable (Col1, Col2) VALUES (?, ?)");
for (int i = 1; i <= 10000; i++) {
    prepStmt.setString(1,"X");            
    prepStmt.setInt(2,100);
    prepStmt.addBatch();
}
prepStmt.executeBatch();
conn.commit();

BULK INSERT:

-- A text file containing "X<tab>100" over and over again, i.e. the same data as generated in Java
BULK INSERT TestTable FROM 'c:\test\test.txt';
  • A batch size of 10000 rows is quite large; you might get better performance by doing an executeBatch() every 100 rows or so (a sketch follows these comments). Commented Nov 7, 2016 at 17:27
  • Sorry, I meant memory issues on the client that sends the data. Your question is valid and interesting, but for a one-time job I'd say "screw it": export the data to a text file and do the bulk insert on the server. Commented Nov 7, 2016 at 17:39
  • I believe there's no way to beat the bulk-load methods (LOAD INFILE, COPY, BULK INSERT, etc.) of databases. They just require the data to be in a specific format for it to work, so it's not really even an honest comparison. Commented Nov 7, 2016 at 17:40
  • @Wouter Yes, a text file, whether it's CSV or fixed width, etc. I'd imagine that the speed difference comes from the fact that when you're using regular SQL to insert data, there are a lot of things that need to be considered (transactions, locks, etc.), whereas with bulk inserts the database engine is free to optimize for speed (although it still needs to consider some things; it can't just corrupt the db if it's in use). Commented Nov 7, 2016 at 18:20
  • If you're running the BULK INSERT locally from SQL Server Management Studio, it can communicate with the DB using the Local Named Pipes protocol, which is way faster than JDBC over TCP/IP (even within localhost). Also, BULK INSERT is designed and optimized for loading massive amounts of data, so it's really not a fair comparison. However, (based on the provided snippet) it looks like you're re-declaring the prepared statement for each batch; you could declare it once at the beginning to save some time. Also, commit only once, after all batches have been processed. Commented Nov 7, 2016 at 20:56
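Putting the first and last suggestions together (one PreparedStatement, a periodic executeBatch(), and a single commit at the end), a sketch of the adjusted loop could look as follows; the flush interval of 1000 rows is an arbitrary starting point to tune, not a measured optimum:

// One PreparedStatement, flushed in chunks, committed once at the end.
conn.setAutoCommit(false);
PreparedStatement prepStmt = conn.prepareStatement("INSERT INTO TestTable (Col1, Col2) VALUES (?, ?)");
for (int i = 1; i <= 10000; i++) {
    prepStmt.setString(1, "X");
    prepStmt.setInt(2, 100);
    prepStmt.addBatch();
    if (i % 1000 == 0) {
        prepStmt.executeBatch(); // send a chunk to the server without committing
    }
}
prepStmt.executeBatch(); // send any remaining rows
conn.commit();           // commit once, after all batches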

2 Answers


While BULK INSERT is the fastest way of doing a bulk insert, SQL Server supports remote (client-driven) bulk insert operations through both the native driver and ODBC. From version 4.2 of the JDBC driver onwards, this functionality is exposed through the SQLServerBulkCopy class, which does not read directly from files but does support reading from a RowSet, a ResultSet, or a custom implementation of ISQLServerBulkRecord for generated data. This functionality is equivalent to the .NET SqlBulkCopy class, with largely the same interface, and should be the fastest way of performing bulk operations short of a server-side BULK INSERT.
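Since the question's data is generated in memory, a custom ISQLServerBulkRecord is the natural fit. Below is a minimal sketch written against the 4.2-era interface; the class name GeneratedDataRecord and the hard-coded row contents are hypothetical illustrations, and later driver versions have reworked this interface, so verify the method set against the version you actually use.

import java.sql.Types;
import java.util.HashSet;
import java.util.Set;
import com.microsoft.sqlserver.jdbc.ISQLServerBulkRecord;

// Hypothetical in-memory record source: produces totalRows rows of ("X", 100),
// the same test data as above, without touching the file system.
public class GeneratedDataRecord implements ISQLServerBulkRecord {
    private final int totalRows;
    private int currentRow = 0;

    public GeneratedDataRecord(int totalRows) {
        this.totalRows = totalRows;
    }

    public Set<Integer> getColumnOrdinals() {
        Set<Integer> ordinals = new HashSet<Integer>();
        ordinals.add(1); // Col1
        ordinals.add(2); // Col2
        return ordinals;
    }

    public String getColumnName(int column) {
        return column == 1 ? "Col1" : "Col2";
    }

    public int getColumnType(int column) {
        return column == 1 ? Types.NVARCHAR : Types.INTEGER;
    }

    public int getPrecision(int column) {
        return column == 1 ? 50 : 0; // matches varchar(50); unused for the int column
    }

    public int getScale(int column) {
        return 0;
    }

    public boolean isAutoIncrement(int column) {
        return false;
    }

    public Object[] getRowData() {
        // One row of generated data; a real implementation would pull from
        // whatever in-memory structure holds the generated rows.
        return new Object[] { "X", 100 };
    }

    public boolean next() {
        return ++currentRow <= totalRows;
    }
}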

EDIT: Example by OP

Below is an example use case that can be used to test the performance of SQLServerBulkCSVFileRecord, an ISQLServerBulkRecord implementation that feeds SQLServerBulkCopy from a text file instead of from generated data. In my test case, test.txt contained a million rows of "X<tab>100".

CREATE TABLE TestTable (Col1 varchar(50), Col2 int);

The table should not have any indexes enabled.

In Java:

// Make sure to use version 4.2, as SQLServerBulkCSVFileRecord is not included in version 4.1
import com.microsoft.sqlserver.jdbc.*;

long startTime = System.currentTimeMillis();
SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord("C:\\temp\\test.txt", true);
fileRecord.addColumnMetadata(1, null, java.sql.Types.NVARCHAR, 50, 0);
fileRecord.addColumnMetadata(2, null, java.sql.Types.INTEGER, 0, 0);
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
Connection destinationConnection = DriverManager.getConnection("jdbc:sqlserver://Server\\Instance:1433", "user", "pass");
SQLServerBulkCopyOptions copyOptions = new SQLServerBulkCopyOptions();  

// Depending on the size of the data being uploaded, and the amount of RAM, an optimum can be found here. Play around with this to improve performance.
copyOptions.setBatchSize(300000); 

// This is crucial to get good performance
copyOptions.setTableLock(true);  

SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection);
bulkCopy.setBulkCopyOptions(copyOptions);
bulkCopy.setDestinationTableName("TestTable");
bulkCopy.writeToServer(fileRecord);
bulkCopy.close();

long endTime = System.currentTimeMillis();
long totalTime = endTime - startTime;
System.out.println(totalTime + "ms");

Using this example, I was able to get insert speeds of up to 30000 rows per second.
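For generated data, the same SQLServerBulkCopy setup should work with an in-memory record source such as the hypothetical GeneratedDataRecord sketched above; only the record creation and the write call change:

// Hypothetical: stream one million generated rows instead of reading test.txt.
bulkCopy.writeToServer(new GeneratedDataRecord(1000000));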


Below is the fastest method I could find that does not use SQLServerBulkCopy. It is a lot slower than SQLServerBulkCopy, though: 2500 rows per second instead of 30000. For a lot of use cases this might still be interesting. The main things to keep in mind are to set AutoCommit to false, to use large batches with PreparedStatements, and to disable any indexes.

Connection db_connection = DriverManager.getConnection("jdbc:sqlserver://Server\\Instance:1433", "User", "Pass");

// This is crucial to getting good performance
db_connection.setAutoCommit(false);

PreparedStatement prepStmt = db_connection.prepareStatement("INSERT INTO TestTable (Col1, Col2) VALUES (?, ?)");
for (int i = 1; i <= 10000; i++) {
    prepStmt.setString(1,"X");            
    prepStmt.setInt(2,100);
    prepStmt.addBatch();
}
prepStmt.executeBatch();
db_connection.commit();
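As a side note not covered in either answer: newer versions of the Microsoft JDBC driver (7.0 and later, if memory serves; it did not exist in the 4.2 driver discussed above) add a connection-string property, useBulkCopyForBatchInsert, that routes regular batched prepared-statement inserts through the bulk copy API. Supported targets have varied across driver releases, so treat this as a pointer to verify against the driver documentation rather than a guarantee:

// Assumption: requires a recent mssql-jdbc driver; check the docs for your version.
Connection db_connection = DriverManager.getConnection(
        "jdbc:sqlserver://Server\\Instance:1433;useBulkCopyForBatchInsert=true",
        "User", "Pass");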

