I have a DataFrame in PySpark (on Databricks) that I want to write to a SQL DB (Azure SQL Database in my case). This works fine, except that it appears to trigger a row-by-row insert into the SQL DB, which is of course not feasible for 10M+ rows. Is there any way to force PySpark to use bulk inserts instead?

Currently I simply use this command:

df.write.jdbc(url=jdbcUrl, table=targetTable, mode="append", properties=connectionProperties)
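
For reference, jdbcUrl and connectionProperties are defined roughly like this (the server, database, secret scope, and key names below are placeholders, not my real values):

jdbcHostname = "myserver.database.windows.net"   # placeholder server name
jdbcDatabase = "MyDatabase"                      # placeholder database name
jdbcPort = 1433

jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)

connectionProperties = {
    # credentials pulled from a Databricks secret scope (names are placeholders)
    "user": dbutils.secrets.get(scope="my-scope", key="sql-user"),
    "password": dbutils.secrets.get(scope="my-scope", key="sql-password"),
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

targetTable = "dbo.MyTable"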

The statement that gets executed on the SQL side looks like this:

(@P0 int,@P1 bit,@P2 bit,@P3 float,@P4 float,@P5 nvarchar(4000),@P6 int,@P7 int,@P8 int)INSERT INTO dbo.MyTable("Index","Sampling10pct","Sampling1pct","Latitude","Longitude","SessionID","Year","Month","Day") VALUES (@P0,@P1,@P2,@P3,@P4,@P5,@P6,@P7,@P8)
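
One documented option for the Spark JDBC writer is batchsize (how many rows are inserted per round trip; the default is 1000). For reference, using it would look like the sketch below, though I'm not sure whether it produces a true bulk insert rather than just batched row-by-row inserts:

connectionProperties["batchsize"] = "100000"   # rows per JDBC round trip (Spark default is 1000)
df.write.jdbc(url=jdbcUrl, table=targetTable, mode="append", properties=connectionProperties)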
