I developed an Azure Data Factory pipeline/data flow that compares source data with the existing data in a Delta table; the comparison matched about 150 million records. The data flow can't handle updating that huge volume of rows, and the pipeline errors out with the message below.
Job failed due to reason: at Sink 'updateDeltaTable': Failed to execute dataflow with internal server error, please retry later. If issue persists, please contact Microsoft support for further assistance.
Operation on target ForLoopControlTable failed: Activity failed because an inner activity failed; Inner activity name: DFCallSoftDelete, Error: {"StatusCode":"DF-Executor-InternalServerError","Message":"Job failed due to reason: at Sink 'updateDeltaTable': Failed to execute dataflow with internal server error, please retry later. If issue persists, please contact Microsoft support for further assistance","Details":"org.apache.spark.SparkException: Job aborted due to stage failure: Task 179 in stage 11.0 failed 1 times, most recent failure: Lost task 179.0 in stage 11.0 (TID 1910) (vm-67d14591 executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Container from a bad node: container_1698779096775_0001_01_000004 on host: vm-67d14591. Exit status: 143. Diagnostics: [2023-10-31 19:11:37.455]Container killed on request. Exit code is 143\n[2023-10-31 19:11:37.457]Container exited with a non-zero exit code 143. \n[2023-10-31 19:11:37.464]Killed by external signal\n.\nDriver stacktrace:\n\tat org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2313)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2262)\n\tat org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2261)\n\tat scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)\n\tat scala.collection.mutable.ResizableArray.for"}
From what I have already found on Google, people say the fix is to increase the number of compute nodes.
FYI: this is a historical comparison, so only the first run has this huge volume; subsequent runs will be much smaller.
Is there a way to update the records in chunks, e.g., partitioned by year, key, or something else, to achieve this?
Can you please suggest the best way to implement this?
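To make the question concrete, here is a rough sketch of what I mean by chunked updates, written directly in PySpark with Delta Lake rather than as an ADF data flow; the table path, the `year` and `key` columns, and the `is_deleted` soft-delete flag are placeholder names, not my actual schema:

```python
# Rough sketch: apply the soft-delete update one year partition at a time
# so each MERGE only touches a bounded slice of the ~150M matched rows.
# All paths and column names below are placeholders.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forPath(
    spark, "abfss://container@account.dfs.core.windows.net/delta/my_table")   # placeholder path
source = spark.read.parquet(
    "abfss://container@account.dfs.core.windows.net/staging/matched_rows")    # placeholder source

# Drive the loop from the distinct partition values present in the source.
years = [r["year"] for r in source.select("year").distinct().collect()]

for y in years:
    chunk = source.filter(F.col("year") == y)
    (target.alias("t")
           .merge(chunk.alias("s"),
                  f"t.key = s.key AND t.year = {y}")   # year predicate lets Delta prune files per chunk
           .whenMatchedUpdate(set={
               "is_deleted": F.lit(True),              # placeholder soft-delete flag
               "updated_at": F.current_timestamp()})
           .execute())
```

In ADF terms I imagine the equivalent would be a Lookup/ForEach over the distinct years (or key ranges) that passes each value as a parameter into the data flow, so each run updates only one slice. Is that a reasonable approach, or is there a better pattern?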
DataFlow diagram