
I am currently creating an ingest pipeline to copy data from a delta table to a postgres table. When selecting the sink, I am asked to enable staging.

Direct copying data from Azure Databricks Delta Lake is only supported when sink dataset is DelimitedText, Parquet or Avro with Azure Blob Storage linked service or Azure Data Lake Storage Gen2, for other dataset or linked service, please enable staging

This will turn my pipeline into a 2-step process where my delta table data is copied to a staging location and then from there it is inserted into postgres. How can I take the delta table data and load it directly into postgres using an ingest pipeline in ADF without staging? Is this possible?

  • Do you have Databricks in your environment? If so, simply read that Delta table location and write it back into Postgres through the JDBC connector. Ref: stackoverflow.com/questions/38825836/… Commented Nov 10, 2021 at 23:47
  • We do use Databricks. We just wanted to see if creating an ingest pipeline without Databricks would be faster. When I tried to create the ingest pipeline I ran into the issue I posted. Commented Nov 12, 2021 at 16:20
  • If the answer was helpful, you can accept it as an answer so that others who encounter the same issue can find this solution and fix their problem. Commented Nov 24, 2021 at 7:15

1 Answer


As suggested by @Karthikeyan Rasipalay Durairaj in the comments, you can copy data directly from Databricks to PostgreSQL.

To copy data from Azure Databricks to PostgreSQL, use the code below:

df.write.option('driver', 'org.postgresql.Driver').jdbc(url_connect, table, mode, properties)
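
For a more complete picture, here is a minimal, self-contained sketch of that approach in a Databricks notebook. The Delta table path, JDBC URL, target table name, and credentials are placeholders, and the PostgreSQL JDBC driver is assumed to be installed on the cluster:

# Minimal PySpark sketch: read a Delta table and write it to PostgreSQL over JDBC.
# All paths, names, and credentials below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the Delta table from its storage location (or use spark.table("db.table")).
df = spark.read.format("delta").load("/mnt/datalake/my_delta_table")

url_connect = "jdbc:postgresql://<host>:5432/<database>"
properties = {
    "user": "<username>",
    "password": "<password>",
    "driver": "org.postgresql.Driver",
}

# Append the rows to the target PostgreSQL table.
df.write.jdbc(url_connect, table="public.my_target_table", mode="append", properties=properties)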

Staged copy from delta lake

When your sink data store or format does not match the direct copy criteria, the service enables a built-in staged copy using an interim Azure storage instance. The staged copy feature can also give you better throughput. The service exports the data from Azure Databricks Delta Lake into the staging storage, then copies it to the sink, and finally cleans up the temporary data from the staging storage.
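
For reference, staging is configured on the Copy activity itself. As a rough illustration only (the linked service name and staging path are placeholders, and the sink type assumes Azure Database for PostgreSQL), the relevant typeProperties look roughly like this, shown as a Python dict mirroring the pipeline JSON:

# Rough sketch of the staged-copy settings on an ADF Copy activity,
# written as a Python dict that mirrors the pipeline JSON.
# The linked service name, staging path, and sink type are placeholders/assumptions.
copy_activity_type_properties = {
    "source": {"type": "AzureDatabricksDeltaLakeSource"},
    "sink": {"type": "AzurePostgreSqlSink"},  # assumes Azure Database for PostgreSQL as the sink
    "enableStaging": True,
    "stagingSettings": {
        "linkedServiceName": {
            "referenceName": "StagingBlobStorage",  # placeholder linked service
            "type": "LinkedServiceReference",
        },
        "path": "staging-container/delta-to-postgres",  # placeholder staging folder
    },
}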

Direct copy from delta lake

If your sink data store and format meet the criteria described below, you can use the Copy activity to copy directly from an Azure Databricks Delta table to the sink (see the sketch after this list).

• The sink linked service is Azure Blob storage or Azure Data Lake Storage Gen2. The account credential should be pre-configured in Azure Databricks cluster configuration.

• The sink data format is Parquet, delimited text, or Avro, and the sink dataset points to a folder instead of a file.

• In the Copy activity source, additionalColumns is not specified.

• If copying data to delimited text, fileExtension in the copy activity sink needs to be ".csv".
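
Putting those criteria together, a qualifying Copy activity would look roughly like the sketch below, again as a Python dict mirroring the pipeline JSON, with a Parquet sink on ADLS Gen2; the names are placeholders:

# Rough sketch of a Copy activity that meets the direct-copy criteria:
# Parquet sink on ADLS Gen2 pointing to a folder, no additionalColumns on the source.
direct_copy_type_properties = {
    "source": {
        "type": "AzureDatabricksDeltaLakeSource"
        # no "additionalColumns" here, or direct copy is disabled
    },
    "sink": {
        "type": "ParquetSink",
        "storeSettings": {"type": "AzureBlobFSWriteSettings"},  # ADLS Gen2 write settings
    },
    "enableStaging": False,  # not needed when the criteria above are met
}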

Refer to this documentation.


1 Comment

Thanks, I found this answer helpful. One note on syntax: write is a property, not a method, so the call is df.write.option('driver', 'org.postgresql.Driver').jdbc(url_connect, table, mode, properties).
