3

I have CSV files that I want to copy from a blob to DW. The CSV files have a trailing comma after the last column (see the example below). Using ADF, I tried to copy the CSV files to a SQL table in DW, but I got an error, which I think is caused by that trailing comma (as I only have 15 columns):


A few rows of the CSV file:

Code,Last Trading Date,Bid Price,Bid Size,Ask Price,Ask Size,Last Price,Traded Volume,Open Price,High Price,Low Price,Settlement Price,Settlement Date,Implied Volatility,Last Trade Time,
BNH2021F,31/03/2021,37.750000,1,38.000000,1,,0,,,,37.750000,29/03/2021,,,
BNM2021F,30/06/2021,44.500000,6,44.700000,2,44.400000,4,44.300000,44.400000,44.300000,44.500000,29/03/2021,,15-55-47.000,
BNU2021F,30/09/2021,46.250000,2,47.000000,1,47.490000,2,47.490000,47.490000,47.490000,46.920000,29/03/2021,,15-59-10.000,
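To make the mismatch concrete, here is a minimal Python sketch (not part of the original question) that parses the sample lines above with the standard csv module. Because of the trailing comma, the header and every data row parse into 16 fields, one more than the 15 real columns the DW table expects:

```python
import csv

# Header and one data row copied verbatim from the sample above;
# note the trailing comma on both lines.
sample_lines = [
    "Code,Last Trading Date,Bid Price,Bid Size,Ask Price,Ask Size,Last Price,"
    "Traded Volume,Open Price,High Price,Low Price,Settlement Price,"
    "Settlement Date,Implied Volatility,Last Trade Time,",
    "BNH2021F,31/03/2021,37.750000,1,38.000000,1,,0,,,,37.750000,29/03/2021,,,",
]

for row in csv.reader(sample_lines):
    # Prints "16 True" for both lines: 16 fields, and the 16th is empty.
    print(len(row), row[-1] == "")
```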

Note that these CSVs are the original files and I can't change them. I also tried different quote and escape characters in the dataset, and it didn't work. Also, I want to do this using ADF, not Azure Functions.

I couldn't find any solution to this; please help.

Update: Interestingly, the dataset preview works.

3 Comments
  • Hey @mas, since you have a SQL DW as the sink, you can use a PolyBase external table to read the blob file and, via ADF, leverage the external table to copy the data. With a CSV, an additional column at the end would make your SQL DW SELECT query fail. Commented Mar 30, 2021 at 5:47
  • Thanks, but this is the answer to a different question. Commented Mar 30, 2021 at 6:05
  • Generally speaking, the CSV in the blob should not end with a comma; ADF will interpret the trailing comma as a null column. Commented Mar 30, 2021 at 8:19

4 Answers

1

I think you can use a data flow to achieve this.

  1. Azure Data Factory will interpret the trailing comma as an extra column with a null value, so we can use a Select transformation to filter out that last column (see the sketch after these steps).

  2. Set the mapping manually at the sink.

  3. Then we can sink the data to our DW or SQL table.
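To illustrate the idea outside ADF, here is a rough Python sketch of what the Select transformation effectively does: it drops the extra null column created by the trailing comma and keeps only the 15 real columns before they reach the sink. The file names and the column count are assumptions for illustration only; in ADF itself this pruning is configured in the Select transformation and the sink mapping.

```python
import csv

# Hypothetical file names, used only for this sketch.
SOURCE = "prices_with_trailing_comma.csv"
CLEANED = "prices_clean.csv"
REAL_COLUMNS = 15  # the 16th field is the empty one produced by the trailing comma

with open(SOURCE, newline="") as src, open(CLEANED, "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # Keep only the first 15 fields, mirroring the Select step that
        # filters out the null column before the data is sunk to DW.
        writer.writerow(row[:REAL_COLUMNS])
```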


3 Comments

Good solution, thanks! But I still have one issue: "Linked service with Self-hosted Integration Runtime is not supported in data flow." Our connection to the storage account is through a VNet :(
The Data Flow source dataset must use a Linked Service that uses an Azure IR, not a self-hosted IR. :(
yep, that's my issue. Thanks for your answer
1

Your column counts don't match: because of the trailing comma, the CSV effectively has 16 columns while your destination expects 15. Either add another column to your DW table or adjust the mapping so the counts match.


1

There is a simple solution to this.

Step 1:

Uncheck the "First Row as header" option in your source dataset enter image description here

Step 2: Sink it first to another CSV file. In the sink CSV dataset, import the schema and map only the 15 real columns. The Copy activity will create a new CSV file with 15 clean columns, i.e. the extra trailing comma will not be present in the new file.


Step 3: Copy from the newly created CSV file with "First row as header" checked and sink it to DW.
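As a rough Python analogue of this two-step approach (file names are hypothetical), the first copy is equivalent to writing a staged file in which every line, header included, has its trailing comma removed; the second copy then reads that staged file normally, with the first row treated as the header:

```python
# Hypothetical file names for illustration only.
SOURCE = "original_export.csv"
STAGED = "staged_clean.csv"

with open(SOURCE) as src, open(STAGED, "w") as dst:
    for line in src:
        line = line.rstrip("\n")
        # "First row as header" is unchecked, so every line is treated the
        # same way: drop the single trailing comma that creates the 16th column.
        if line.endswith(","):
            line = line[:-1]
        dst.write(line + "\n")
# The staged file now has 15 clean columns and can be copied to DW
# with "First row as header" checked.
```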


0

I had the same issue, with the last column of a CSV file being empty:

col1,col2,col3

"a","b",

"c","d",

The pipeline is dynamic so I can't do manual mapping.

In my case, I solved the issue by replacing the NULL value in the dataset with "".

