
I am currently facing an issue with loading CSV files directly from Blob Storage into our Azure SQL instance. The credentials are set up correctly and I am able to access the files without any issue. However, the BULK INSERT fails at a certain row. I have tried the exact same file with the exact same BULK INSERT configuration on our local SQL Server and it works without any issue.

    BULK INSERT dbo.this_is_a_table FROM
        'transformed/this_is_a_csv.csv'
    WITH (
        DATA_SOURCE = 'this_is_a_blob_storage_data_source'
        ,FORMAT = 'CSV' 
        ,FIELDTERMINATOR = '|' 
        ,FIELDQUOTE = '"'
        ,ROWTERMINATOR = '0x0a' 
        ,BATCHSIZE=100000 
        ,TABLOCK
        ,FIRSTROW = 2
        ,CODEPAGE = '65001'
    )

I have tried replacing FORMAT = 'CSV' with DATAFILETYPE = 'char', rotating the ROWTERMINATOR values, etc., without any luck so far.

The error is as follows:

Failed to execute query. Error: Bulk load failed due to invalid column value in CSV data file transformed/this_is_a_csv.csv in row 3128, column 10. The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

Loading the data up to LASTROW = 3127 works perfectly fine. I checked the CSV file and nothing stands out in row 3128 - it is basically identical to any other row in the file.
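For reference, the load can be bracketed to the single suspect row by pairing FIRSTROW with LASTROW, which makes it quick to confirm that the row fails on its own:

    BULK INSERT dbo.this_is_a_table FROM
        'transformed/this_is_a_csv.csv'
    WITH (
        DATA_SOURCE = 'this_is_a_blob_storage_data_source'
        ,FORMAT = 'CSV'
        ,FIELDTERMINATOR = '|'
        ,FIELDQUOTE = '"'
        ,ROWTERMINATOR = '0x0a'
        ,CODEPAGE = '65001'
        ,FIRSTROW = 3128  -- start at the failing row...
        ,LASTROW = 3128   -- ...and stop there, loading exactly one row
    )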

Any ideas or suggestions on what else I could try?

Sample data:

"MANDT"|"EBELN"|"BUKRS"|"BSTYP"|"BSART"|"BSAKZ"|"LOEKZ"|"STATU"|"AEDAT"|"ERNAM"|"LIFNR"|"EKORG"|"WAERS"|"BEDAT"|"FRGGR"|"FRGSX"|"FRGKE"|"FRGZU"|"FRGRL"|"ABSGR"|"PROCSTAT"
"100"|"4620739302"|"1000"|"F"|"UB"|""|""|"L"|"20230118"|"84HD937J3J392H371842CAZEF91374923HDTAZ32"|"0001000001"|"AB01"|"USD"|"20230118"|"A1"|"S1"|"F"|"XX"|""|"00"|"05"
"100"|"5729302783"|"1000"|"F"|"UB"|""|""|"L"|"20230118"|"84HD937J3J392H371842CAZEF91374923HDTAZ32"|"0001000002"|"AB01"|"USD"|"20230118"|"A1"|"S1"|"F"|"XX"|""|"00"|"05"
"100"|"5720562944"|"1000"|"F"|"UB"|""|""|"L"|"20230118"|"84HD937J3J392H371842CAZEF91374923HDTAZ32"|"0001000003"|"AB01"|"USD"|"20230118"|"A1"|"S1"|"F"|"XX"|""|"00"|"05"
"100"|"8401817494"|"1000"|"F"|"UB"|""|""|"L"|"20230118"|"84HD937J3J392H371842CAZEF91374923HDTAZ32"|"0001000004"|"AB01"|"USD"|"20230118"|"A1"|"S1"|"F"|"XX"|""|"00"|"05"
7 Comments

  • If you create a copy of the CSV file, remove that row, and then process the copy, what happens? Does the file get processed successfully, or does it fail on the next row? Commented Aug 4, 2023 at 21:35
  • Have you actually looked at that row in a hex editor/viewer? It sounds like there's an invalid byte sequence on that row. One possibility comes to mind: have you been concatenating UTF-8 files together? There could be a UTF-8 BOM (Byte Order Mark) on the row that's confusing the reader; BOMs should only appear at the very start of a file. (See the staging sketch after these comments for one way to inspect the bytes.) Commented Aug 4, 2023 at 23:15
  • Can you provide sample input of rows that get an error and rows that don't? Commented Aug 5, 2023 at 9:24
  • @AlbertoMorillo I tried removing both the faulty row and the row prior to it. It then fails at the exact same row again, regardless of which one I removed. Commented Aug 5, 2023 at 18:22
  • @PratikLad Added an example to the post. The insert fails on the second row. Commented Aug 5, 2023 at 18:23
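Following up on the hex-inspection suggestion above, one way to examine the suspect row without leaving Azure SQL is to stage it as a single raw column and dump its bytes. A minimal sketch, assuming the byte 0x01 never occurs in the data (so each line lands in one column) and that hex notation is accepted for FIELDTERMINATOR as it is for ROWTERMINATOR:

    -- Stage just the failing row as one wide column, bypassing CSV parsing.
    CREATE TABLE #raw_line (line NVARCHAR(MAX));

    BULK INSERT #raw_line FROM
        'transformed/this_is_a_csv.csv'
    WITH (
        DATA_SOURCE = 'this_is_a_blob_storage_data_source'
        ,FIELDTERMINATOR = '0x01'  -- assumed absent from the data
        ,ROWTERMINATOR = '0x0a'
        ,CODEPAGE = '65001'
        ,FIRSTROW = 3128
        ,LASTROW = 3128
    );

    -- A stray BOM or control character stands out in the byte dump;
    -- the CAST shows the UTF-16 bytes of the decoded string.
    SELECT line,
           LEN(line) AS char_count,
           CAST(line AS VARBINARY(MAX)) AS utf16_bytes
    FROM #raw_line;

If even this single-column load fails on that row, the byte sequence itself is likely invalid UTF-8 rather than a quoting or terminator problem.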

2 Answers


I also tried to load a CSV file from a storage account into Azure SQL with your sample data and got a similar error.

(screenshot of the error)

This error can occur for many reasons, such as:

  • Incorrect row or field terminators.
  • Table columns that are too small for the incoming data.

To resolve it, I checked that every column's size and data type matched the incoming data, and changed ROWTERMINATOR to '\n'.

BULK INSERT sample2
FROM 'csv1.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage'
        ,FORMAT = 'CSV'
        ,FIELDTERMINATOR = '|'
        ,FIELDQUOTE = '"'
        ,ROWTERMINATOR = '\n'
        ,BATCHSIZE = 100000
        ,TABLOCK
        ,FIRSTROW = 2  -- skip the header row (FIRSTROW is 1-based)
        ,CODEPAGE = '65001');

OUTPUT:

(screenshot of the successful output)
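For reference, a minimal sketch of the target table, sized so every column comfortably fits the sample data (the column names come from the header row; the NVARCHAR lengths are assumptions, anything at least as wide as the data will do):

    CREATE TABLE dbo.sample2 (
        MANDT NVARCHAR(10),  EBELN NVARCHAR(20),  BUKRS NVARCHAR(10),
        BSTYP NVARCHAR(5),   BSART NVARCHAR(10),  BSAKZ NVARCHAR(5),
        LOEKZ NVARCHAR(5),   STATU NVARCHAR(5),   AEDAT NVARCHAR(10),
        ERNAM NVARCHAR(50),  LIFNR NVARCHAR(20),  EKORG NVARCHAR(10),
        WAERS NVARCHAR(10),  BEDAT NVARCHAR(10),  FRGGR NVARCHAR(10),
        FRGSX NVARCHAR(10),  FRGKE NVARCHAR(5),   FRGZU NVARCHAR(10),
        FRGRL NVARCHAR(5),   ABSGR NVARCHAR(10),  PROCSTAT NVARCHAR(5)
    );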


1 Comment

Thanks for the response! A quick search of the raw file gives the following: "\n\r" - 0 hits, "\r\n" - 0 hits, "\r" - 0 hits, "\n" - 21168 hits. I have set the data types and field lengths all to NVARCHAR(MAX) (and also tried e.g. VARCHAR(1000) just in case) and ensured that all columns are correctly created and in the correct order (though order shouldn't matter with NVARCHAR(MAX)). Still getting the same error. Also tried '0x0A' instead of '\n' without any luck. Slowly but steadily running out of ideas here...

I found a solution that helped me:

change

ROWTERMINATOR = '\n'

to

ROWTERMINATOR = '0x0A'

When '\n' is specified as the row terminator, bulk import operations actually look for the pair '\r\n', so a file with bare line feeds needs the hex form '0x0A'.

