
I am currently creating an ingest pipeline to copy data from a delta table to a postgres table. When selecting the sink, I am asked to enable staging.

Direct copying data from Azure Databricks Delta Lake is only supported when sink dataset is DelimitedText, Parquet or Avro with Azure Blob Storage linked service or Azure Data Lake Storage Gen2, for other dataset or linked service, please enable staging

This will turn my pipeline into a 2-step process where my delta table data is copied to a staging location and then from there it is inserted into postgres. How can I take the delta table data and load it directly into postgres using an ingest pipeline in ADF without staging? Is this possible?

  • Do you have Databricks in your environment? If so, simply read that Delta table location and write it back into Postgres through the JDBC connector. Ref: stackoverflow.com/questions/38825836/… Commented Nov 10, 2021 at 23:47
  • We do use Databricks. We just wanted to see if creating an ingest pipeline without Databricks would be faster. When I tried to create the ingest pipeline I ran into the issue I posted. Commented Nov 12, 2021 at 16:20
  • If the answer was helpful, you can accept it as an answer so that others who encounter the same issue can find this solution and fix their problem. Commented Nov 24, 2021 at 7:15

1 Answer


As suggested by @Karthikeyan Rasipalay Durairaj in the comments, you can copy data directly from Databricks to PostgreSQL.

To copy data from Azure Databricks to PostgreSQL, use the code below:

df.write.option('driver', 'org.postgresql.Driver').jdbc(url_connect, table, mode, properties)
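
For a more complete picture, here is a minimal, self-contained sketch of that approach in a Databricks notebook. The Delta table path, JDBC URL, target table name, and credentials are placeholders, and the PostgreSQL JDBC driver is assumed to be installed on the cluster:

# Minimal PySpark sketch: read a Delta table and write it to PostgreSQL over JDBC.
# All paths, names, and credentials below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the Delta table from its storage location (or use spark.table("db.table")).
df = spark.read.format("delta").load("/mnt/datalake/my_delta_table")

url_connect = "jdbc:postgresql://<host>:5432/<database>"
properties = {
    "user": "<username>",
    "password": "<password>",
    "driver": "org.postgresql.Driver",
}

# Append the rows to the target PostgreSQL table.
df.write.jdbc(url_connect, table="public.my_target_table", mode="append", properties=properties)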

Staged copy from delta lake

When your sink data store or format does not match the direct copy criteria, the service enables a built-in staged copy using an interim Azure storage instance. The staged copy feature can also give you better throughput. The service exports the data from Azure Databricks Delta Lake into the staging storage, then copies it to the sink, and finally cleans up the temporary data from the staging storage.
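
For reference, staging is configured on the Copy activity itself. As a rough illustration only (the linked service name and staging path are placeholders, and the sink type assumes Azure Database for PostgreSQL), the relevant typeProperties look roughly like this, shown as a Python dict mirroring the pipeline JSON:

# Rough sketch of the staged-copy settings on an ADF Copy activity,
# written as a Python dict that mirrors the pipeline JSON.
# The linked service name, staging path, and sink type are placeholders/assumptions.
copy_activity_type_properties = {
    "source": {"type": "AzureDatabricksDeltaLakeSource"},
    "sink": {"type": "AzurePostgreSqlSink"},  # assumes Azure Database for PostgreSQL as the sink
    "enableStaging": True,
    "stagingSettings": {
        "linkedServiceName": {
            "referenceName": "StagingBlobStorage",  # placeholder linked service
            "type": "LinkedServiceReference",
        },
        "path": "staging-container/delta-to-postgres",  # placeholder staging folder
    },
}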

Direct copy from delta lake

If your sink data store and format meet the criteria described below, you can use the Copy activity to copy directly from an Azure Databricks Delta table to the sink (see the sketch after this list).

• The sink linked service is Azure Blob storage or Azure Data Lake Storage Gen2. The account credential should be pre-configured in Azure Databricks cluster configuration.

• The sink data format is Parquet, delimited text, or Avro, and the sink dataset points to a folder instead of a file.

• In the Copy activity source, additionalColumns is not specified.

• If copying data to delimited text, fileExtension in the copy activity sink needs to be ".csv".
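
Putting those criteria together, a qualifying Copy activity would look roughly like the sketch below, again as a Python dict mirroring the pipeline JSON, with a Parquet sink on ADLS Gen2; the names are placeholders:

# Rough sketch of a Copy activity that meets the direct-copy criteria:
# Parquet sink on ADLS Gen2 pointing to a folder, no additionalColumns on the source.
direct_copy_type_properties = {
    "source": {
        "type": "AzureDatabricksDeltaLakeSource"
        # no "additionalColumns" here, or direct copy is disabled
    },
    "sink": {
        "type": "ParquetSink",
        "storeSettings": {"type": "AzureBlobFSWriteSettings"},  # ADLS Gen2 write settings
    },
    "enableStaging": False,  # not needed when the criteria above are met
}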

Refer to this documentation.


1 Comment

Thanks, I found this answer helpful. One note on syntax: write is a property, not a method, so the call is df.write.option('driver', 'org.postgresql.Driver').jdbc(url_connect, table, mode, properties).
