
I am new to Azure Data Lakes and currently using ADLS Gen2. I have configured pipelines from single source datasets to destination datasets, and these work fine. I then configured a pipeline to upload/ingest a whole database from on-prem to ADLS Gen2.

Here is the process: Data Factory Studio -> Ingest -> Built-in copy task -> SQL Server as the data store (with the configured integration runtime) -> Select all existing tables -> and so on.

When I trigger this activity from the pipeline, it successfully uploads all tables to the container as separate files (named after the source tables).

When I update data in one of the source tables, it successfully updates the data in the destination file.

The problem is that when I add a new table to the database and then trigger the activity, this new table is not uploaded. Is there a way to update the source dataset so that it includes the new table?

I have looked through all the properties of the source dataset and of the activity in the pipeline, and searched for a solution, but I am stuck on this scenario.

  • "When I update data in one of the tables in sources it successfully updates the data in destination file." When you update the tables, are you running the pipeline again? Commented Nov 28, 2022 at 11:17
  • Yes, I rerun the pipeline. Is there a way to get the updated schema? Commented Nov 28, 2022 at 11:20
  • The pipeline has a pre-built parameter for the table names, so even when you add new tables to the database, this parameter's value does not change and the new tables won't be copied to ADLS. Commented Nov 28, 2022 at 12:10
  • Yeah, that is exactly the problem I'm facing. Is there a way to change it dynamically, or some other way to set up my pipeline so that it tracks new changes in the database? Commented Nov 28, 2022 at 14:14
  • If your database is SQL Server, use a Lookup activity with SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE' as the query to get all the tables. Other databases should have something similar. Commented Nov 28, 2022 at 14:44

1 Answer


To dynamically get the list of all tables and copy them to your Data Lake storage account, you can use the following procedure:

I used a Script activity on my Azure SQL database (for demonstration) to get the list of tables in the database, using the following query as suggested by @Scott Mildenberger:

SELECT TABLE_SCHEMA,TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE'

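For reference, the Script activity returns its query results under resultSets in the activity output, which is what the ForEach items expression in the next step reads. A minimal sketch of that output shape, with two placeholder tables (dbo.Customers and dbo.Orders) standing in for whatever the query returns:

```json
{
  "resultSetCount": 1,
  "resultSets": [
    {
      "rowCount": 2,
      "rows": [
        { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "Customers" },
        { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "Orders" }
      ]
    }
  ]
}
```

So @activity('get tables').output.resultSets[0].rows is an array of objects whose TABLE_SCHEMA and TABLE_NAME properties the loop can read through @item().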

  • Now, in a ForEach loop, use the output rows of the above Script activity as the items value, i.e. @activity('get tables').output.resultSets[0].rows. Inside the ForEach loop, use a Copy activity whose source is your database and whose sink is ADLS (a trimmed pipeline JSON sketch of this loop follows these steps).

  • I created 2 parameters on the source dataset, called schema and table, and used them as shown below:


  • In the source settings of the Copy activity, I gave these dataset parameters the following values:
schema: @item().TABLE_SCHEMA
table: @item().TABLE_NAME


  • For the ADLS sink, I created 2 parameters called schema and table and used them to build the file name dynamically as @{dataset().schema}@{dataset().table}.csv (also shown in the dataset sketch below).


  • I gave their values in the sink settings of the Copy activity, in the same way as above.


  • When I run the pipeline, it gives the desired results. Even when you add a new table to the database, the query will pick it up and copy it as a file to your ADLS container.

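Putting the loop together, the ForEach and Copy activity could look roughly like the trimmed pipeline JSON sketch below. This is illustrative rather than an exported definition: the activity and dataset names (ForEachTable, Copy table to ADLS, SourceTableDataset, AdlsSinkDataset) are placeholders, and the source/sink types assume an Azure SQL source and a delimited-text sink as in the demonstration above.

```json
{
  "name": "ForEachTable",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "get tables", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('get tables').output.resultSets[0].rows",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "Copy table to ADLS",
        "type": "Copy",
        "typeProperties": {
          "source": { "type": "AzureSqlSource" },
          "sink": { "type": "DelimitedTextSink" }
        },
        "inputs": [
          {
            "referenceName": "SourceTableDataset",
            "type": "DatasetReference",
            "parameters": {
              "schema": { "value": "@item().TABLE_SCHEMA", "type": "Expression" },
              "table": { "value": "@item().TABLE_NAME", "type": "Expression" }
            }
          }
        ],
        "outputs": [
          {
            "referenceName": "AdlsSinkDataset",
            "type": "DatasetReference",
            "parameters": {
              "schema": { "value": "@item().TABLE_SCHEMA", "type": "Expression" },
              "table": { "value": "@item().TABLE_NAME", "type": "Expression" }
            }
          }
        ]
      }
    ]
  }
}
```

On the sink dataset itself, the parameterized file name from the steps above sits in the dataset location, along these lines (the container name is a placeholder):

```json
"typeProperties": {
  "location": {
    "type": "AzureBlobFSLocation",
    "fileSystem": "your-container",
    "fileName": {
      "value": "@{dataset().schema}@{dataset().table}.csv",
      "type": "Expression"
    }
  }
}
```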


1 Comment

Right on the mark, worked perfectly.
