Can I get "BULK INSERT"-like speeds when inserting from Java into SQL Server?

Question

While looking for the fastest way to get data from Java into SQL Server, I noticed that the fastest Java method I can come up with is still 12 times slower than BULK INSERT.

My data is generated within Java, and BULK INSERT only supports reading data from a text file, so BULK INSERT is not an option unless I write my data to a temporary text file first. That, in turn, would of course be a huge performance hit.

When inserting from Java, insert speeds are around 2500 rows per second, even when I start timing after the for loop and just before executeBatch. So "creating" the data in memory is not the bottleneck.

When inserting with BULK INSERT, insert speeds are around 30000 rows per second.

Both tests were run on the server itself, so the network is not a bottleneck either. Any clue as to why BULK INSERT is faster, and whether the same performance can be attained from within Java?

This is just a big dataset that needs to be loaded once, so it would be OK to temporarily disable any kind of logging (already tried simple logging), indexes (the table has none), locking, whatever...

My test-setup so far

Database:

CREATE TABLE TestTable   
   (  Col1 varchar(50)
    , Col2 int);

Java:

// This seems to be essential to get good speeds, otherwise batching is not used.
conn.setAutoCommit(false);

PreparedStatement prepStmt = conn.prepareStatement("INSERT INTO TestTable (Col1, Col2) VALUES (?, ?)");
for (int i = 1; i <= 10000; i++) {
    prepStmt.setString(1,"X");            
    prepStmt.setInt(2,100);
    prepStmt.addBatch();
}
prepStmt.executeBatch();
conn.commit();

BULK INSERT:

-- A text file containing "X 100" over and over again, so the same data as generated in Java
bulk insert TestTable FROM 'c:\test\test.txt';

A batch size of 10000 rows is quite large; you might get better performance by doing an executeBatch() every 100 rows or so.
– Kayaman, Commented Nov 7, 2016 at 17:27
Soz, I meant mem-issues from the client that sends the data. Your question is valid & interesting, but for a one-time job I'd say "screw it", export data in text file & do bulk insert on the server.
– TT., Commented Nov 7, 2016 at 17:39
I believe there's no way to beat the bulk load methods (LOAD INFILE, COPY, BULK INSERT, etc.) of databases. They just require that the data is in a specific format for it to work, so it's not really even an honest comparison.
– Kayaman, Commented Nov 7, 2016 at 17:40
@Wouter Yes a text file, whether it's CSV or fixed width etc. I'd imagine that the speed difference comes from the fact that when you're using regular SQL to insert data, there are a lot of things that need to be considered (transactions, locks, etc.) whereas with bulk inserts the database engine is free to optimize for speed (although of course it still needs to consider some things, it can't just corrupt the db if it's in use).
– Kayaman, Commented Nov 7, 2016 at 18:20
If you're running the BULK INSERT locally from SQL Server Management Studio, it can communicate with the DB using Local Named Pipes protocol, which is way faster than JDBC over TCP/IP (even within localhost). Also, BULK INSERT is designed and optimized for loading massive amounts of data, so it's really not a fair comparison. However, (based on the provided snippet) it looks like you're re-declaring the prepared statement for each batch; you could only declare it once in the beginning to save some time. Also, commit only once, after all batches have been processed.
– Mick Mnemonic, Commented Nov 7, 2016 at 20:56

Wouter · Accepted Answer · 2016-11-14 12:58:00Z

While BULK INSERT is the fastest way of doing bulk insert, SQL Server supports remote (client-driven) bulk insert operations both through the native driver and ODBC. From version 4.2 onwards of the JDBC driver, this functionality is exposed through the SQLServerBulkCopy class, which does not directly read from files but does support reading from a RowSet, ResultSet or a custom implementation of ISQLServerBulkRecord for generated data. This functionality is equivalent to the .NET SqlBulkCopy class, with largely the same interface, and should be the fastest way of performing bulk operations short of a server-based BULK INSERT.
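
For data generated in memory, one option that avoids writing a custom ISQLServerBulkRecord is to fill a standard javax.sql.rowset.CachedRowSet and hand it to SQLServerBulkCopy.writeToServer, which accepts a RowSet. The sketch below only builds the row set, so it runs without the driver on the classpath. The class name GeneratedRows and the method buildRows are made up for illustration, and the column layout assumes the TestTable from the question. A CachedRowSet keeps every row in memory, so treat this as a sketch for moderately sized chunks, not a benchmarked recommendation.

```java
import java.sql.SQLException;
import java.sql.Types;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetMetaDataImpl;
import javax.sql.rowset.RowSetProvider;

// Hypothetical helper: builds an in-memory RowSet shaped like TestTable (Col1 varchar(50), Col2 int).
public class GeneratedRows {

    public static CachedRowSet buildRows(int rowCount) throws SQLException {
        // Describe the two columns so a consumer can read names and types from the metadata.
        RowSetMetaDataImpl meta = new RowSetMetaDataImpl();
        meta.setColumnCount(2);
        meta.setColumnName(1, "Col1");
        meta.setColumnType(1, Types.VARCHAR);
        meta.setPrecision(1, 50);
        meta.setColumnName(2, "Col2");
        meta.setColumnType(2, Types.INTEGER);

        CachedRowSet rows = RowSetProvider.newFactory().createCachedRowSet();
        rows.setMetaData(meta);

        // Same generated data as in the question: "X" and 100 for every row.
        for (int i = 0; i < rowCount; i++) {
            rows.moveToInsertRow();
            rows.updateString(1, "X");
            rows.updateInt(2, 100);
            rows.insertRow();
        }
        rows.moveToCurrentRow();

        // Rewind so that a reader starts at the first row.
        rows.beforeFirst();
        return rows;
    }
}
```

The result would then be passed to the driver in place of a file record, for example bulkCopy.writeToServer(GeneratedRows.buildRows(10000)), with bulkCopy configured as in the example below.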

EDIT: Example by OP

Below is an example that can be used to test the performance of SQLServerBulkCopy when it is fed by a SQLServerBulkCSVFileRecord, a record source that reads its rows from a text file. In my test case, test.txt contained a million rows, each consisting of "X", a tab, and "100".

CREATE TABLE TestTable (Col1 varchar(50), Col2 int);

The table should not have any indexes enabled.

In JAVA

// Requires version 4.2 or later of the driver; SQLServerBulkCSVFileRecord is not included in version 4.1
import java.sql.Connection;
import java.sql.DriverManager;
import com.microsoft.sqlserver.jdbc.*;

long startTime = System.currentTimeMillis();

// The second argument (true) marks the first line of the file as column names.
// This two-argument constructor takes no delimiter; check the driver docs for the
// overload that does if your file is not in the default format.
SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord("C:\\temp\\test.txt", true);
fileRecord.addColumnMetadata(1, null, java.sql.Types.NVARCHAR, 50, 0);
fileRecord.addColumnMetadata(2, null, java.sql.Types.INTEGER, 0, 0);

Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
// A single backslash (written "\\" in Java source) separates server and instance name.
Connection destinationConnection = DriverManager.getConnection("jdbc:sqlserver://Server\\Instance:1433", "user", "pass");
SQLServerBulkCopyOptions copyOptions = new SQLServerBulkCopyOptions();

// The optimum depends on the size of the data being uploaded and the amount of RAM. Experiment with this value to improve performance.
copyOptions.setBatchSize(300000);

// This is crucial to get good performance
copyOptions.setTableLock(true);

SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection);
bulkCopy.setBulkCopyOptions(copyOptions);
bulkCopy.setDestinationTableName("TestTable");
bulkCopy.writeToServer(fileRecord);

// Release the bulk copy and connection resources once the copy has finished.
bulkCopy.close();
destinationConnection.close();

long endTime = System.currentTimeMillis();
long totalTime = endTime - startTime;
System.out.println(totalTime + "ms");

Using this example, I was able to get insert speeds of up to 30000 rows per second.

Wouter · Accepted Answer · 2016-11-14 13:30:25Z

Below is the fastest method I could find that does not use SQLServerBulkCopy. It is a lot slower, though: it inserts at around 2500 rows per second instead of 30000. For a lot of use cases, this might still be interesting. The main things to keep in mind are to set AutoCommit to false, use large batches, use PreparedStatements, and disable any indexes.

// A single backslash (written "\\" in Java source) separates server and instance name.
Connection db_connection = DriverManager.getConnection("jdbc:sqlserver://Server\\Instance:1433", "User", "Pass");

// This is crucial to getting good performance
db_connection.setAutoCommit(false);

PreparedStatement prepStmt = db_connection.prepareStatement("INSERT INTO TestTable (Col1, Col2) VALUES (?, ?)");
for (int i = 1; i <= 10000; i++) {
    prepStmt.setString(1,"X");            
    prepStmt.setInt(2,100);
    prepStmt.addBatch();
}
prepStmt.executeBatch();
db_connection.commit();
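
The comments on the question suggest flushing smaller batches, preparing the statement once, and committing only once at the end. A minimal sketch of that variant follows. The class name ChunkedBatchInsert, the method insertInChunks and its parameters are hypothetical, and whether a smaller chunk size actually helps is something to measure rather than assume.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper illustrating chunked batching on an already prepared statement.
public class ChunkedBatchInsert {

    /**
     * Adds totalRows rows to the statement and calls executeBatch() every
     * chunkSize rows, plus once more for any remainder.
     * Returns the number of executeBatch() calls that were made.
     */
    public static int insertInChunks(PreparedStatement stmt, int totalRows, int chunkSize)
            throws SQLException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int flushes = 0;
        int pending = 0;
        for (int i = 0; i < totalRows; i++) {
            // Same generated data as in the question.
            stmt.setString(1, "X");
            stmt.setInt(2, 100);
            stmt.addBatch();
            pending++;
            if (pending == chunkSize) {
                stmt.executeBatch();
                flushes++;
                pending = 0;
            }
        }
        // Send whatever is left over after the last full chunk.
        if (pending > 0) {
            stmt.executeBatch();
            flushes++;
        }
        return flushes;
    }
}
```

The caller would still call setAutoCommit(false) first, prepare the INSERT statement once, pass it to insertInChunks, and call commit() a single time after it returns.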

Panagiotis Kanavos Over a year ago

It's now possible to use bulk copy for batch inserts too
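
The comment does not name the mechanism. As far as I know, recent versions of the Microsoft JDBC driver expose it as the useBulkCopyForBatchInsert connection property, which makes executeBatch() on a fully parameterized INSERT go through the bulk copy API. The sketch below assumes that property is available in your driver version (check its documentation); the class name, method names, and the server and database values are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical example class; only the JDBC calls themselves are standard.
public class BulkCopyBatchInsert {

    // Builds a connection URL with the bulk-copy-for-batch option switched on.
    // Assumption: the driver in use supports the useBulkCopyForBatchInsert property.
    public static String buildUrl(String server, int port, String database) {
        return "jdbc:sqlserver://" + server + ":" + port
                + ";databaseName=" + database
                + ";useBulkCopyForBatchInsert=true";
    }

    // Same batch insert as in the answer above; only the connection URL differs.
    public static void insertRows(String url, String user, String password, int rowCount)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            conn.setAutoCommit(false);
            try (PreparedStatement stmt = conn.prepareStatement(
                    "INSERT INTO TestTable (Col1, Col2) VALUES (?, ?)")) {
                for (int i = 0; i < rowCount; i++) {
                    stmt.setString(1, "X");
                    stmt.setInt(2, 100);
                    stmt.addBatch();
                }
                stmt.executeBatch();
            }
            conn.commit();
        }
    }
}
```

A call such as insertRows(buildUrl("localhost", 1433, "TestDb"), "user", "pass", 10000) would exercise it; I have not measured how close this gets to the SQLServerBulkCopy numbers above.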
