Orchestration control in Data Pipeline in Azure

Question

I am designing the Data Pipeline which consumes data from Salesforce using bulk API endpoint (pull mechanism).

The data comes and lands in an ADLS Gen2 Bronze Layer.

Next transformation job will start and clean the data and push to Silver layer ADLS Gen2. The transformation will be performed by Databricks.

Push the clean records to ADLS Gen2 Silver layer, then using Databricks, I push the clean records to another Databricks environment.

My questions are :

How to handle orchestration?

I have to pull one time full data records, then every 1 hour incremental records where records detected if it is not already present.

Then how to make sure once all the records have arrived, the transformation starts? The records were processed using Databricks.
How to make sure the next step after processing is push records in ADLS Gen2 Silver?
And lastly, how does Databricks know it has to move those records to instance B Databricks as shown in figure?

May someone please suggest how to achieve this.

Which option is scalable, reliable and can handle high throughput?

Option #1: connect and ingest using Azure function, orchestrated through ADF, Bronze to silver using Databricks
Option #2: connect and ingest using Databricks, orchestrated through ADF, Bronze to silver using Databricks [Native Databricks connector to SF Lakeflow]
Option #3: connect and ingest using ADF, orchestrated through ADF, Bronze to silver using Databricks [Native ADF connector to SF]
Option #4: connect and ingest using Databricks, orchestrated through Databricks, Bronze to silver using Databricks [no ADF at all]

Image : Logical Flow

Thanks a lot.

Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. — Community
– Community Bot, Commented Jul 15 at 14:35

Patrick R · Accepted Answer · 2025-07-16 03:20:58Z

1

For your scenario, the best approach is option 1 : use an azure function for ingestion, orchestrated end-to-end with azure data factory (adf), then transform bronze to silver using databricks.

adf will handle the full orchestration: triggering the azure function, checking file arrival in adls bronze, and kicking off databricks jobs only when all data is ready.
inside databricks, use a simple control table to log which files are processed this ensures transformations only run on complete data.
when transformation finishes, let adf move the cleaned data forward or trigger the next databricks job this chaining is reliable, scalable, and easy to monitor.

this pattern keeps ingestion, transformation, and orchestration loosely coupled but fully automated ideal for high-throughput pipelines.

references:

custom activities with azure function

answered Jul 16 at 3:20

Patrick R

7,0061 gold badge26 silver badges27 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Binod Kumar Jul 16 at 11:22

Out of the below 4 option: Opt1: Connect and ingest using Azure function, orchestrated through ADF, Bronze to silver using Databricks Opt2: Connect and ingest using Databricks, orchestrated through ADF, Bronze to silver using Databricks [Native Databricks connector to SF Lakeflow] Opt3: Connect and ingest using ADF, orchestrated through ADF, Bronze to silver using Databricks [Native ADF connector to SF] Opt4: Connect and ingest using Databricks, orchestrated through Databricks, Bronze to silver using Databricks [no ADF]. which one should I prefer and what could be pros and cons. Please hel

Patrick R Jul 16 at 11:42

option 1, is the strongest for most cases: azure function + adf gives you clean orchestration, flexible retry logic, easy monitoring, and full separation of concerns. databricks focuses only on heavy processing this keeps your pipeline modular and easy to scale.

Collectives™ on Stack Overflow

Orchestration control in Data Pipeline in Azure

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related