
Background

We have a VB.NET application that loads text files into database tables. Text files can be in a variety of formats (flat file with fixed column widths, character delimited) and can have complex structures (e.g. a variable number of columns). The format and structure of the file being imported is known in advance.

Before commencing a file import, the application loads the "file format/structure" configuration settings. The application then reads the file and builds an SQL string (INSERT( [col list] ) VALUES( 'value1', 'value2', .. ), ..), which is then sent to the target table.
Update: Because all data comes as text (from a text file), the application also inserts it as text into the database, letting the database do all the implicit conversion work. That said, there is some formatting being done (e.g. the American-style date 06-05-2016 would be converted to 2016-06-05) based on configuration settings for a given type of file.
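
For illustration, here is a minimal sketch of the string-building approach, assuming hypothetical names (the LoadTable columns below and a parsedRows collection of string arrays); the real application drives all of this from configuration:

' Minimal sketch of the existing approach (hypothetical names; not the actual code).
' Every value is embedded as a quoted string literal, so SQL Server performs the conversions.
Dim sql As New System.Text.StringBuilder()
sql.Append("INSERT INTO LoadTable ( IntVal, BitVal, DateVal, DecVal ) VALUES ")

Dim firstRow As Boolean = True
For Each fields As String() In parsedRows       ' lines already split per the file configuration
    If Not firstRow Then sql.Append(", ")
    sql.Append("( ")
    For i As Integer = 0 To fields.Length - 1
        If i > 0 Then sql.Append(", ")
        sql.Append("'").Append(fields(i).Replace("'", "''")).Append("'")
    Next
    sql.Append(" )")
    firstRow = False
Next

' sql.ToString() is then executed against the target database with ExecuteNonQuery.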

Target table columns can have non-VARCHAR data types, e.g. INT, DATETIME, DECIMAL( x, y ), etc. So inserting string data causes implicit VARCHAR-to-other-data-type conversions.
It is of note that SQL Server is able to convert empty strings to default values for certain types (the most notable examples are '' -> '1900-01-01' and '' -> '0').

This application is (was) a bit of a spaghetti monster, which I had to untangle before I could make my changes. As part of this "untanglement process" I was looking to replace the SQL string building with SqlBulkCopy (using a DataTable as the cache object).

Question

SqlBulkCopy seems to first convert strings into CLR types before sending the data over to SQL Server. CLR types have different rules for implicit string-to-non-string conversions, so my data cannot be imported as-is (even though it imports just fine using the SQL string building approach above).

Is there a way to work around this?

What I have found so far: this question mentions that custom "clean up" functions may need to be implemented: SqlBulkCopy - The given value of type String from the data source cannot be converted to type money of the specified target column.
IMO, a custom "clean up" / conversion function solution that replicates SQL Server behaviour is awkward.

I have also tried to play around with DataTable column types but to no avail:

SQL Table is declared as follows:

CREATE TABLE LoadTable( IntVal INT NULL, BitVal BIT NULL, DateVal DATETIME NULL, DecVal DECIMAL( 12, 3 ) NULL )

.NET Code:

'DTColumns(1) = New System.Data.DataColumn("BitVal", GetType(SqlTypes.SqlString))
'DTColumns(1) = New System.Data.DataColumn("BitVal", GetType(System.String))
DTColumns(1) = New System.Data.DataColumn("BitVal")
CacheRow.Item("BitVal") = "1"
BulkCopier.WriteToServer(_DataCache)

But I always get exceptions telling me the value cannot be converted to Boolean:

The given value of type String from the data source cannot be converted to type bit of the specified target column.
Failed to convert parameter value from a String to a Boolean.

Another one:

Failed to convert parameter value from a SqlString to a Boolean.

Implicit default values do not work either (not to be confused with a DEFAULT CONSTRAINT):

DTColumns(2) = New System.Data.DataColumn("DateVal", GetType(SqlTypes.SqlString))
CacheRow.Item("DateVal") = ""

On the other hand, this works in SQL:

INSERT LoadTable( IntVal, BitVal, DateVal )
SELECT '', '', ''
/* Result
IntVal  BitVal  DateVal
0       0       1900-01-01 00:00:00.000
*/

It clearly shows that SqlBulkCopy internally converts an incoming value to the CLR equivalent of the destination type, whereas a plain INSERT leaves the conversion to SQL Server.

Bonus question: does anyone know the reason why SqlBulkCopy must convert everything to a CLR type? It seems strange, given that it is writing data to a non-CLR target.

Clarification: I fully understand that data types have to be converted (implicitly or explicitly). I am looking for a solution that converts strings to other types in exactly the same way as SQL Server does. My problem is that .NET does it differently, so simply using Integer.Parse (or equivalents) will fail in some cases where the same conversion succeeds in SQL. I have provided a few examples where such conversions fail, but there are plenty more that I know of (and still more that I don't).
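
A couple of hypothetical examples of the mismatch (illustration only, not code from the application): SQL Server happily converts values that the obvious .NET equivalents reject.

' SQL Server:  SELECT CAST('' AS INT)   --> 0
'              SELECT CAST('2' AS BIT)  --> 1
' The straightforward .NET conversions throw instead:
Dim i As Integer = Integer.Parse("")        ' FormatException
Dim b As Boolean = Boolean.Parse("2")       ' FormatException ("2" is not "True"/"False")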

  • You could import to a staging table first consisting of all VARCHAR columns, then execute a stored proc that imports to the target table and performs all the conversions explicitly. Commented Jun 23, 2016 at 10:15
  • Just to be sure I understand the issue - you provide a "1" and want it into a BIT or varchar column? Commented Jun 23, 2016 at 10:17
  • @Chris Pickford, thanks for reply. I could but that would require creating "custom" staging tables / procedures for every file type OR lots of dynamic code. I should also mention that this program loads to multiple different servers / databases. Commented Jun 23, 2016 at 10:18
  • @Allan S. Hansen, one is supposed to go into a BIT column but it comes as string from a text file. There are also "problems" with empty string to default value conversions ('' -> '1900-01-01' and '' -> '0') which do not work in .NET. Commented Jun 23, 2016 at 10:20
  • For the "default" values - you most likely need to handle them yourself in your DataTable, so an empty string gets mapped to the value you want. Also - have you tested having "true" as a value instead of 1? Commented Jun 23, 2016 at 10:28

2 Answers


No, there is no (direct) way to work around this and keep the bulk copy. SqlBulkCopy contains code that analyzes the target table and performs its own conversions, and if these are improper according to its rules, it will complain. There is no flag or setting that will convince it not to do this. Source: the source.

As to your bonus question, why does it need to do this? Well, it doesn't, really. It could just send over the data with exactly the types you define and have SQL Server figure it out. It could simply send over, say, an SqlInt32 as an INT, and then have SQL Server figure out that you actually want 2 converted to the BIT value 1. The protocol allows for this, but SqlBulkCopy insists on having its ducks in a row before it passes the data on to the server.

This isn't completely unreasonable, because the alternative is ignoring the target types and using conservative, overly broad types to convert to before sending over data (all strings would need to be NVARCHAR(MAX), all DateTime values DATETIME2, just on the off chance the target column needs the whole precision), which reduces performance; or else requiring the programmer to specify, down to the exact length and scale, what the SQL types are supposed to be, which is tedious, error-prone, and would yield the obvious question "why doesn't it just ask the server?" And, well, that's what it does now. To do what you want, it would need a separate mode where it checks whether you're "really sure" about using a different exact data type than the target, just in case you wanted to leave it to SQL Server to do the conversion for you. I guess that feature just didn't make the cut.

Mostly it's considered a Good Thing to do your own conversions to the exact data type in advance. Assuming that "SQL Server knows best" is terrible practice to begin with, because the conversions SQL Server applies are only partially documented, language-dependent, and often downright weird. The example you cite, that '' is converted to 1900-01-01, is a clear argument against relying on them. Rather than assuming your application definitely needs bug-for-bug compatibility, it's probably better to review if you can switch to a properly typed data model, with the few exceptions that may be implicitly relied on explicitly implemented. Maybe your application is counting on SQL Server converting True to the BIT value 1. Maybe it's even relying on it converting 2 to 1. Maybe it even needs the conversion from -,. to the MONEY value 0.00. Maybe, but it probably doesn't, and if it does, it probably shouldn't.

Here endeth the sermon. What if for some ungodly reason you needed this anyway?

First of all, you could simply not use SqlBulkCopy. Generating a huge INSERT statement is very inefficient (the T-SQL parser does not like big statements), but you could issue a lot of individual (parameterized!) INSERT statements and wrap these in a transaction. This is less efficient than a bulk copy, but still far more efficient than individual inserts outside a transaction or ginormous statements, and may be good enough for your purposes.
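
A minimal sketch of that approach, assuming hypothetical names (connectionString, parsedRows, and the LoadTable columns from the question) and Imports System.Data / System.Data.SqlClient; because every value is sent as an NVARCHAR parameter, SQL Server still applies its own conversion rules:

Using conn As New SqlClient.SqlConnection(connectionString)
    conn.Open()
    Using tran As SqlClient.SqlTransaction = conn.BeginTransaction()
        Using cmd As New SqlClient.SqlCommand(
                "INSERT INTO LoadTable ( IntVal, BitVal, DateVal, DecVal ) VALUES ( @p0, @p1, @p2, @p3 )",
                conn, tran)
            For i As Integer = 0 To 3
                cmd.Parameters.Add("@p" & i, SqlDbType.NVarChar, 4000)
            Next
            For Each fields As String() In parsedRows      ' one string array per file row
                For i As Integer = 0 To 3
                    cmd.Parameters(i).Value = fields(i)    ' sent as NVARCHAR; converted server-side
                Next
                cmd.ExecuteNonQuery()
            Next
        End Using
        tran.Commit()
    End Using
End Using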

You could create a staging table (but not a #temporary table) with the desired columns and make all columns (N)VARCHAR(MAX). Then you could SqlBulkCopy into that, and then have SQL Server do the conversions from this table. This can be made to work even if you don't know the columns in advance, with the help of some dynamic SQL. The main objection is that you need additional space for the staging table, and the final update/insert operation isn't bulk logged. In short, this is quite inefficient. It may very well be less efficient than doing the inserts directly, despite the bulk copy step.
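
A minimal sketch of that variant, assuming a pre-created staging table dbo.LoadTable_Staging whose columns are all NVARCHAR(MAX) and match the all-string DataTable (_DataCache) from the question; names are hypothetical:

Using conn As New SqlClient.SqlConnection(connectionString)
    conn.Open()

    ' 1. Bulk copy the all-string DataTable into the staging table (no client-side conversions needed).
    Using bulk As New SqlClient.SqlBulkCopy(conn)
        bulk.DestinationTableName = "dbo.LoadTable_Staging"
        bulk.WriteToServer(_DataCache)
    End Using

    ' 2. Let SQL Server convert the strings with its own rules while moving rows to the target table.
    Dim moveSql As String =
        "INSERT INTO LoadTable ( IntVal, BitVal, DateVal, DecVal ) " &
        "SELECT IntVal, BitVal, DateVal, DecVal FROM dbo.LoadTable_Staging; " &
        "TRUNCATE TABLE dbo.LoadTable_Staging;"
    Using cmd As New SqlClient.SqlCommand(moveSql, conn)
        cmd.ExecuteNonQuery()
    End Using
End Using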

You could have your application spit out a flat file with the desired columns and all values in string format, then programmatically invoke bcp. Of course this requires installing bcp (it can be installed separately from SQL Server, as part of the Command Line Utilities package) and you may lose a chunk of performance writing the intermediate files, but it will probably still beat generating huge INSERT statements or intermediate tables. As long as you specify that everything is a (N)VARCHAR in your input, bcp will dutifully leave the conversions to SQL Server.
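
A minimal sketch of that invocation, assuming bcp.exe is on the PATH, a pipe-delimited character-mode file written by the application, and hypothetical server/database/file names (Imports System.Diagnostics):

Dim psi As New ProcessStartInfo() With {
    .FileName = "bcp",
    .Arguments = "MyDatabase.dbo.LoadTable in ""C:\Temp\LoadTable.txt"" -S MyServer -T -c -t|",
    .UseShellExecute = False
}
' -c sends every field as character data, so SQL Server performs the conversions on its side.
Using proc As Process = Process.Start(psi)
    proc.WaitForExit()
    If proc.ExitCode <> 0 Then
        Throw New ApplicationException("bcp failed with exit code " & proc.ExitCode)
    End If
End Using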

Last but not least, for completeness, you could conceivably write your own code to wrap around the native bulk copy functions, which don't have this restriction. But that would be quite a bit of work, and all that interop is also not good for performance. SqlBulkCopy doesn't wrap around these functions -- it's a from-scratch implementation of the bulk load operation of TDS, also known as the mysterious INSERT BULK statement. I recommend writing your own implementation of that even less, but now that .NET is open source, you could (say) fork .NET Core and have your own version of SqlRawBulkCopy. With blackjack and hookers.


7 Comments

Many thanks for posting this comprehensive answer. I searched for more info re "Generating a huge INSERT statement is very inefficient ...." and found this comprehensive article: simple-talk.com/sql/performance/… . Reading this article leads me to believe that the current method used by my app is not bad at all (we are using 2008 R2), though tuning will be necessary on case by case basis. I was surprised by the TVP results though (previously I read TVP was only marginally slower than BulkInsert).
One note for future reference is that the "Table Value Constructor" (VALUES()()) is limited to 1000 rows as per msdn.microsoft.com/en-us/library/dd776382.aspx
Just to clarify: the app already uploads files in multiple "batches" as some files are 500MB+. Rows are cached until "MaxRowsToCache" is exceeded, at which point the SQL string is built and sent to the server. From your answer and the link I have found, I can see that for each file type I must tune "MaxRowsToCache".
RE "sermon": This is how I have inherited it. Changing is hard: as you say, SQL Server conversion rules are poorly documented, so for me to replicate this behaviour in .NET is very risky. At the same time I do not fully buy into "data has to satisfy .NET validation/conversion limitations". This data is consumed by batch processing logic in SQL Server (.NET is only a minor transit point) and therefore has to meet SQL Server rules but I fail to see a reason why it needs to meet .NET rules. When you use SQL Server native import (e.g. bcp) SQL Server rules are used.
@Alex: that's a great article. I'll delve into it when I have the time, to see what parts of my answer need to be revised. I've had bad experiences with big statements in the past, but I've never had to work around it with anything other than bulk insert.

I know this is an old question, but hopefully this helps someone: I found this IBM Q&A ticket.

You can create a schema.ini file in the same folder as the file you're reading from, so you can force SqlBulkCopy to handle the fields as strings and let SQL Server handle the conversions.

Example:

[FILENAME.csv]
Format=Delimited()
Col1=COL1NAME Text Width 100
Col2=COL2NAME Text Width 100
...
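
For context, here is a hedged sketch of how such a schema.ini file is typically consumed: the OLE DB Text driver reads the columns declared as Text back as strings, and the resulting reader is handed to SqlBulkCopy. This assumes the Microsoft.ACE.OLEDB.12.0 provider is installed, Imports System.Data, and hypothetical folder/connection/table names:

Dim folder As String = "C:\Temp"    ' folder containing FILENAME.csv and schema.ini
Dim oleConnStr As String =
    "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & folder &
    ";Extended Properties=""text;HDR=Yes;FMT=Delimited"""

Using oleConn As New OleDb.OleDbConnection(oleConnStr)
    oleConn.Open()
    Using oleCmd As New OleDb.OleDbCommand("SELECT * FROM [FILENAME.csv]", oleConn)
        Using reader As OleDb.OleDbDataReader = oleCmd.ExecuteReader()
            Using bulk As New SqlClient.SqlBulkCopy(connectionString)
                bulk.DestinationTableName = "LoadTable"
                bulk.WriteToServer(reader)
            End Using
        End Using
    End Using
End Using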

1 Comment

Seeing as that's from IBM, who produce the DB2 database, I suspect that it does not apply to SQL Server, from Microsoft
