
I am new to Azure Data Lakes and currently using ADLS Gen2. I have configured pipelines from single source datasets to destination datasets, and these work fine. I then configured a pipeline to upload/ingest a whole database from on-prem to ADLS Gen2.

Here is the process: Data Factory Studio -> Ingest -> Built-in copy task -> SQL Server as the data store (with the configured integration runtime) -> Select all existing tables -> and so on.

When I trigger this activity from the pipeline, it successfully uploads all tables to the container as separate files (named after the source tables).

When I update data in one of the source tables, it successfully updates the data in the destination file.

The problem is that when I add a new table to the database and then trigger the activity, this new table is not uploaded. Is there a way to update the source dataset so that it includes the new table?

I have looked through all the properties of the source dataset and of the activity in the pipeline, and searched for a solution, but I am stuck on this scenario.

  • "When I update data in one of the tables in sources it successfully updates the data in destination file." When you update the tables, are you running the pipeline again? Commented Nov 28, 2022 at 11:17
  • Yes, I rerun the pipeline. Is there a way to get the updated schema? Commented Nov 28, 2022 at 11:20
  • The pipeline has a pre-built parameter for the table names, so even when you add new tables to the database, this parameter's value does not change and the new tables won't be copied to ADLS. Commented Nov 28, 2022 at 12:10
  • Yeah, that is exactly the problem I'm facing. Is there a way to change it dynamically, or some other way to set up my pipeline so that it tracks new changes in the database? Commented Nov 28, 2022 at 14:14
  • If your database is SQL Server, use a Lookup activity with SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE' as the query to get all the tables. Other databases should have something similar. Commented Nov 28, 2022 at 14:44

1 Answer


To dynamically get the list of all tables and copy them to your Data Lake storage account, you can use the following procedure:

I used a Script activity on my Azure SQL database (for demonstration) to get the list of tables in the database, using the following query as suggested by @Scott Mildenberger:

SELECT TABLE_SCHEMA,TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE'

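For reference, the Script activity returns its query results under resultSets in the activity output, which is what the ForEach items expression in the next step reads. A minimal sketch of that output shape, with two placeholder tables (dbo.Customers and dbo.Orders) standing in for whatever the query returns:

```json
{
  "resultSetCount": 1,
  "resultSets": [
    {
      "rowCount": 2,
      "rows": [
        { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "Customers" },
        { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "Orders" }
      ]
    }
  ]
}
```

So @activity('get tables').output.resultSets[0].rows is an array of objects whose TABLE_SCHEMA and TABLE_NAME properties the loop can read through @item().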

  • Now, in a ForEach loop, use the output rows of the above Script activity as the items value, i.e. @activity('get tables').output.resultSets[0].rows. Inside the ForEach loop, use a Copy activity whose source is your database and whose sink is ADLS (a trimmed pipeline JSON sketch of this loop follows these steps).

  • I created 2 parameters on the source dataset, called schema and table, and used them as shown below:


  • In the source settings of the Copy activity, I gave these dataset parameters the following values:
schema: @item().TABLE_SCHEMA
table: @item().TABLE_NAME


  • For the ADLS sink, I created 2 parameters called schema and table and used them to build the file name dynamically as @{dataset().schema}@{dataset().table}.csv (also shown in the dataset sketch below).


  • I gave their values in the sink settings of the Copy activity, in the same way as above.


  • When I run the pipeline, it gives the desired results. Even when you add a new table to the database, the query will pick it up and copy it as a file to your ADLS container.

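Putting the loop together, the ForEach and Copy activity could look roughly like the trimmed pipeline JSON sketch below. This is illustrative rather than an exported definition: the activity and dataset names (ForEachTable, Copy table to ADLS, SourceTableDataset, AdlsSinkDataset) are placeholders, and the source/sink types assume an Azure SQL source and a delimited-text sink as in the demonstration above.

```json
{
  "name": "ForEachTable",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "get tables", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('get tables').output.resultSets[0].rows",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "Copy table to ADLS",
        "type": "Copy",
        "typeProperties": {
          "source": { "type": "AzureSqlSource" },
          "sink": { "type": "DelimitedTextSink" }
        },
        "inputs": [
          {
            "referenceName": "SourceTableDataset",
            "type": "DatasetReference",
            "parameters": {
              "schema": { "value": "@item().TABLE_SCHEMA", "type": "Expression" },
              "table": { "value": "@item().TABLE_NAME", "type": "Expression" }
            }
          }
        ],
        "outputs": [
          {
            "referenceName": "AdlsSinkDataset",
            "type": "DatasetReference",
            "parameters": {
              "schema": { "value": "@item().TABLE_SCHEMA", "type": "Expression" },
              "table": { "value": "@item().TABLE_NAME", "type": "Expression" }
            }
          }
        ]
      }
    ]
  }
}
```

On the sink dataset itself, the parameterized file name from the steps above sits in the dataset location, along these lines (the container name is a placeholder):

```json
"typeProperties": {
  "location": {
    "type": "AzureBlobFSLocation",
    "fileSystem": "your-container",
    "fileName": {
      "value": "@{dataset().schema}@{dataset().table}.csv",
      "type": "Expression"
    }
  }
}
```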


1 Comment

Right on the mark, worked perfectly.
