1

I am trying to load multiple files via polybase and its throwing me an error while creating view.

select * from dbo.test

fname lname
a.csv null
b.csv null

I am able to successfully create data source and external table without any issue.

Code Error

Error in the code.
Msg 207, Level 16, State 1, Line 4
Invalid column name 'DATA_FILE_NAME'.
Invalid column name 'DATA_FILE_NAME'.
Invalid column name 'DATA_FILE_NAME'.
Invalid column name 'DATA_FILE_NAME'.
Invalid column name 'DATA_FILE_NAME'.
Invalid column name 'DATA_FILE_NAME'.
Invalid column name 'DATA_FILE_NAME'.
Invalid column name 'DATA_FILE_NAME'.
Invalid column name 'DATA_FILE_NAME'.
Invalid column name 'DATA_FILE_NAME'.

CODE

 IF OBJECT_ID('dbo.TEST_V', 'V') IS NOT NULL
DROP VIEW [dbo].[TEST_V]
GO

 -- Create new view
CREATE VIEW [dbo].[TEST_V]
AS

-- Add dynamic T-SQL to temporary table 

SELECT 

-- Auto increment number
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ROW_ID,

-- Data file name
DATA_FILE_NAME AS DATA_FILE,

-- Drop external table if it exists
'IF EXISTS (SELECT * FROM sys.external_tables WHERE NAME = ' + 
CHAR(39) + SUBSTRING(DATA_FILE_NAME, 1, LEN(DATA_FILE_NAME) - 11) + CHAR(39) + ') ' + 
'DROP EXTERNAL TABLE [dbo].[' + SUBSTRING(DATA_FILE_NAME, 1, LEN(DATA_FILE_NAME) - 11) + '];' AS DROP_STMT,

-- Create new external table
'CREATE EXTERNAL TABLE [dbo].[' + SUBSTRING(DATA_FILE_NAME, 1, LEN(DATA_FILE_NAME) - 11) + '] ' + 
'( ' + 
'fname [varchar] (70), ' +
'lname [varchar] (70) ' +
   ') ' + 
'WITH ' +
'( ' +
'LOCATION=' + CHAR(39) + '/NEW/' + DATA_FILE_NAME + CHAR(39) + ' ' +
', DATA_SOURCE = SIMPLE' +
', FILE_FORMAT = SIMPLE_HEADER' +
', REJECT_TYPE = VALUE ' + 
', REJECT_VALUE = 1 ' +
') ' AS CREATE_STMT,

-- Move data into staging table
'INSERT INTO dbo.fname ' + 
'SELECT ' +
'  fname ,' +
'  lname ' +
'FROM [dbo].[' + SUBSTRING(DATA_FILE_NAME, 1, LEN(DATA_FILE_NAME) - 11) + '] ' +
';' AS INSERT_STMT

FROM 
[dbo].[test];
GO

I am trying to create procedure using the above view to load data using multiple files dynamically.

4
  • So the table dbo.test does not have a column called ... DATA_FILE_NAME. Commented Feb 5, 2019 at 22:34
  • dbo.test contains only fname and lname columns. Commented Feb 6, 2019 at 0:25
  • Have you considered adding the column DATA_FILE_NAME to your table called dbo.test? Or changing all the references in your view which point to DATA_FILE_NAME to one of the columns in your table, say fname? Commented Feb 6, 2019 at 1:20
  • If I change this one DATA_FILE_NAME AS fname, then I am getting same error. Commented Feb 6, 2019 at 2:18

1 Answer 1

2

Since they are all targeting the same table, you don't need to use this file-by-file method. Polybase will load a wildcard-like set of files. If you put all your data files of the same type in a folder, and set the FOLDER as the location, all the files in the folder will be loaded in parallel. You will find your loads hugely faster … at the moment you're processing files in series, using the method I suggest will process them in parallel as fast as readers are available.

Sign up to request clarification or add additional context in comments.

6 Comments

Thanks Ron for your comment but How to achieve incremental loads in PolyBase?
Load into a transient table, then insert into the target table with a WHERE NOT EXISTS clause. A slightly better approach may be to CTAS into a new table from a union of the old table, and a WHERE NOT EXISTS from the new table.
You mean to say, If I have a daily file a_02_05_19.txt and a_02_06_19.txt. You mean to say load both the files into a temporary table and select only today's data from external table and load into permanent table.
No, sorry if I wasn't clear. Put all your files into a folder "a". Set this as your external data source. One CTAS statement will load all files from this folder into a temp table. Then CTAS or INSERT the rows from the temp table into your permanent table, using a WHERE NOT EXISTS clause to ignore the rows already in that permanent table.
Thanks Ron for your response. Let's say if I have 100 files and each file contain 1 million record. If I loaded 99 files already and If I need to insert 100th one then do we need to load all the 100 files and remove the 99 file records from the temp table and load it into permanent table. In this case, How the performance will be?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.