Connecting from Azure Synapse Analytics Spark Pool to Azure SQL Database

Question

Has anyone had luck connecting from an Azure Synapse Analytics provisioned Spark Pool to Azure SQL Database?

Problem 1:

I've uploaded the Spark SQL Connector (https://github.com/microsoft/sql-spark-connector) as a Workspace Library and linked it to the Spark Pool. Installing it causes a Livy error when I try to start a Spark Pool session. In the Monitor section the error is:

This application failed due to the total number of errors: 1.
Error code 1
LIBRARY_MANAGEMENT_FAILED

Message
[...] Cleaning up the Spark service job because the cluster has failed.

Edit: This actually works on another Spark Pool. I don't know the root cause of the failure, but I was able to run the same setup on a different pool.

Problem 2: I'm trying to use the TokenLibrary with an Azure SQL Linked Service. This code:

conn = TokenLibrary.getConnectionString("MyAzureSQLDev")
print(conn)

Displays something that looks like a Base64-encoded JWT token plus some unknown characters. This is not a connection string.

I am looking for any working solution.

I've done that recently using a SQL login - have a look here: stackoverflow.com/a/66546617/1527504
– wBob, Commented Mar 12, 2021 at 9:37
SQL Login should be fine. There is an open request to add documentation on how to use Token Library with SQL since, as you have noticed, it doesn't work as expected: github.com/MicrosoftDocs/azure-docs/issues/72077
– PerfectlyPanda, Commented Mar 22, 2021 at 16:46
Yes, I've opened that ticket ;-) I don't want to use a SQL user but AAD and managed identity. Let's see what happens on the MS side.
– Piotr Gwiazda, Commented Mar 23, 2021 at 17:39
We'll use JDBC, hopefully with AAD rather than a SQL login, but store credentials in KV. No response from MS yet.
– Piotr Gwiazda, Commented Mar 31, 2021 at 15:26

devrogs · Accepted Answer · 2021-04-16 11:05:55Z

TokenLibrary.getConnectionString("MyAzureSQLDev") returns the access token of the workspace identity (MSI). To use the token and write to the database I uploaded sql-spark-connector to workspace packages and wrote this code:

df.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
    .option("dbtable", db_table) \
    .option("accessToken", mssparkutils.credentials.getConnectionStringOrCreds("MyAzureSQLDev")) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .mode("append") \
    .save()
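
If you want to confirm that the value you pass as accessToken really is a JWT rather than a connection string, you can decode its payload locally. This is only a minimal sketch: decode_jwt_payload is a hypothetical helper (not part of mssparkutils or TokenLibrary), it does not verify the signature, and it assumes the value is a standard three-segment JWT.

```python
import base64
import json


def decode_jwt_payload(token):
    """Return the claims of a JWT as a dict, WITHOUT verifying the signature.

    Hypothetical helper for inspection only; never use it to trust a token.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with the '=' padding stripped,
    # so restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

In a Synapse notebook you could pass it the value returned by mssparkutils.credentials.getConnectionStringOrCreds("MyAzureSQLDev") and look at standard claims such as aud and exp. If the call raises ValueError, the value is not a plain JWT.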

The workspace identity has to be added to Azure SQL Database this way:

CREATE USER [your workspace identity] FROM EXTERNAL PROVIDER;

However...

...the currently released version of sql-spark-connector (version 1.0.1, from November 2020) is not compatible with the version of Spark 2.4 used by Azure Synapse Analytics. The primary problem is the version of the SQL Server JDBC driver: Spark 2.4 on Azure Synapse provides version 8.4.1.jre8, whereas spark-mssql-connector:1.0.1 depends on version 7.2.1.jre8. Hence, installing spark-mssql-connector:1.0.1 on Azure Synapse and running the code above yields a NoSuchMethodError when writing batches of data to the database.

Although spark-mssql-connector has not had a release in a couple of months, it is still in active development, and proper support for Spark 2.4 on Azure Synapse was added in March 2021. I built the latest version from source and used the resulting jar instead of the one in the Maven repository.

Good answer, mate. Are you able to help with the following? stackoverflow.com/questions/67329558/…

GaZ · Accepted Answer · 2021-12-20 14:44:42Z


Just to update @mateharu's answer, the following works in Synapse "out of the box" as of December 2021:

sql_server_name = "SOMETHING"
db_port = 1433
db_table = "SOMETHING"
db_name = "SOMETHING"
linked_service_name = "LINKEDSERVICENAME"

access_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)

# Write
df.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
    .option("dbtable", db_table) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .mode("append") \
    .save()

# Read
df2 = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
    .option("dbtable", db_table) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()
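
Since the read and the write share the same options, they can be collected once in a dict and passed with .options(**...). This is just a sketch of that refactoring, assuming the same option names as above; azure_sql_options is a hypothetical helper name, not something provided by Synapse or the connector.

```python
def azure_sql_options(sql_server_name, db_name, db_table, access_token, db_port=1433):
    """Build the connector options used by both the read and the write above.

    Hypothetical helper; the option names are the ones from the snippets above.
    """
    return {
        "url": "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port),
        "dbtable": db_table,
        "accessToken": access_token,
        "encrypt": "true",
        "databaseName": db_name,
        "hostNameInCertificate": "*.database.windows.net",
    }


# Usage in a Synapse notebook (needs a live Spark session, so left as comments):
# opts = azure_sql_options(sql_server_name, db_name, db_table, access_token)
# df.write.format("com.microsoft.sqlserver.jdbc.spark").options(**opts).mode("append").save()
# df2 = spark.read.format("com.microsoft.sqlserver.jdbc.spark").options(**opts).load()
```

Keeping the options in one place means a change such as a different port only has to be made once.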

Is there any way we can make this work with Azure SQL Managed Instance? I get the error "Linked Service Type 'AzureSqlMI' not supported".
– Shashank Karam, Commented over a year ago

Travis · Accepted Answer · 2023-09-19 20:03:02Z

This answer expands on the previous two. Both work fine, but they require hard-coding values in variables that can instead be retrieved from the linked service. Here is an example that gets the server, port, and database from the linked service.

import json

linked_service_name = "MyLinkedServiceName"
access_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)

# getPropertiesAll returns a JSON string, so parse it into a dict
linkedServiceProperties = json.loads(mssparkutils.credentials.getPropertiesAll(linked_service_name))
endPoint = linkedServiceProperties.get('Endpoint')

# Split the Endpoint string at the first colon (:) to separate the protocol prefix ('tcp') from the rest of the string
_, server_and_port = endPoint.split(':', 1)

# Further split the server_and_port string at the comma (,) to extract the server name and the port number
sql_server_name, db_port = server_and_port.split(',')

db_name = linkedServiceProperties.get('Database')

db_table = "MySchema.MyTable"

# Read the table using the server, port, and database taken from the linked service
tasks_ct = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s:%s" % (sql_server_name, db_port)) \
    .option("accessToken", access_token) \
    .option("dbtable", db_table) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()
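
The two splits above assume the Endpoint value always looks like tcp:<server>,<port>. If that is not guaranteed for your linked service, a slightly more defensive parse might look like the sketch below. parse_sql_endpoint is a hypothetical helper, and the fallback to port 1433 when no port is present is my assumption, not something the linked service documents.

```python
def parse_sql_endpoint(endpoint, default_port=1433):
    """Split an Endpoint such as 'tcp:myserver.database.windows.net,1433'
    into (host, port).

    Hypothetical helper: tolerates a missing 'tcp:' prefix and a missing port
    (in which case default_port is returned, which is an assumption).
    """
    value = endpoint.strip()
    # Drop the optional protocol prefix.
    if value.lower().startswith("tcp:"):
        value = value[len("tcp:"):]
    # Everything before the first comma is the host, the rest is the port.
    host, _, port = value.partition(",")
    port = port.strip()
    return host.strip(), int(port) if port else default_port
```

The returned host and port can then be used to build the same JDBC URL, e.g. "jdbc:sqlserver://%s:%s" % parse_sql_endpoint(endPoint).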
