1

Has anyone had luck connecting from an Azure Synapse Analytics provisioned Spark pool to Azure SQL Database?

Problem 1:

I've uploaded the Spark SQL Connector (https://github.com/microsoft/sql-spark-connector) as a workspace library and attached it to the Spark pool. With the library installed, starting a Spark pool session fails with a Livy error. The Monitor section shows:

This application failed due to the total number of errors: 1.
Error code 1
LIBRARY_MANAGEMENT_FAILED

Message
[...] Cleaning up the Spark service job because the cluster has failed.

Edit: The same library installs and runs fine on another Spark pool. I don't know the root cause of the failure on the first pool.

Problem 2: I'm trying to use the TokenLibrary with Azure SQL Linked Service. This code:

conn = TokenLibrary.getConnectionString("MyAzureSQLDev")
print(conn)

displays something that looks like a Base64-encoded JWT followed by some unrecognized characters. It is not a connection string.
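One way to check what the returned value actually is would be to decode its middle segment. The sketch below assumes the value really is a three-part JWT, as it appears to be; `decode_jwt_payload` is a hypothetical helper name, and it does not verify the signature, so it is only useful for inspection.

```python
import base64
import json


def decode_jwt_payload(token):
    """Return the claims of a JWT as a dict, without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with the '=' padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


# In the notebook (assumption: conn holds the value printed above):
# claims = decode_jwt_payload(conn)
# print(claims.get("aud"), claims.get("exp"))
```

If the `aud` claim names a database resource and `exp` is a timestamp in the near future, the value is most likely an access token rather than a connection string.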

I am looking for any working solution.

5
  • I've done that recently using a SQL login - have a look here: stackoverflow.com/a/66546617/1527504 Commented Mar 12, 2021 at 9:37
  • SQL Login should be fine. There is an open request to add documentation on how to use Token Library with SQL since, as you have noticed, it doesn't work as expected: github.com/MicrosoftDocs/azure-docs/issues/72077 Commented Mar 22, 2021 at 16:46
  • Yes, I've opened that ticket ;-) I don't want to use SQL user but AAD and managed identity. Let's see what happens on MS side. Commented Mar 23, 2021 at 17:39
  • Hi @PiotrGwiazda, did you make any progress? Commented Mar 30, 2021 at 2:14
  • We'll use JDBC, hopefully with AAD rather than a SQL login, and store the credentials in Key Vault. No response from MS yet. Commented Mar 31, 2021 at 15:26

3 Answers

5

TokenLibrary.getConnectionString("MyAzureSQLDev") returns the access token of the workspace identity (MSI). To use the token and write to the database I uploaded sql-spark-connector to workspace packages and wrote this code:

df.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
    .option("dbtable", db_table) \
    .option("accessToken", mssparkutils.credentials.getConnectionStringOrCreds("MyAzureSQLDev")) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .mode("append") \
    .save()

The workspace identity has to be added to Azure SQL Database this way:

CREATE USER [your workspace identity] FROM EXTERNAL PROVIDER;

However...

...the currently released version of sql-spark-connector (version 1.0.1 from November 2020; see here) is not compatible with the current version of Spark 2.4 used by Azure Synapse Analytics. The primary problem is the version of the SQL Server JDBC driver: Spark 2.4 on Azure Synapse provides version 8.4.1.jre8, whereas spark-mssql-connector:1.0.1 depends on version 7.2.1.jre8. Hence, installing spark-mssql-connector:1.0.1 on Azure Synapse and running the code above yields a NoSuchMethodError when writing batches of data to the database.

Although no new release of spark-mssql-connector has been published for a couple of months, it is still under active development, and proper support for Spark 2.4 on Azure Synapse was added in March 2021. I built the latest version from source and used the resulting jar instead of the one from the Maven repository.


1 Comment

Good answer mate. Are you able to help with the following? stackoverflow.com/questions/67329558/…
1

Just to update @mateharu's answer, the following works in Synapse "out of the box" as of December 2021:

sql_server_name = "SOMETHING"
db_port = 1433
db_table = "SOMETHING"
db_name = "SOMETHING"
linked_service_name = "LINKEDSERVICENAME"

access_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)

# Write
df.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
    .option("dbtable", db_table) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .mode("append") \
    .save()

# Read
df2 = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
    .option("dbtable", db_table) \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()
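Since the read and write calls above share the same options, they could be collected in one place. This is only a sketch: `sql_connector_options` is a hypothetical helper, not part of the connector, and it assumes the same `*.database.windows.net` host pattern used above. The resulting dict can be passed through PySpark's `options(**...)` on both the reader and the writer.

```python
def sql_connector_options(server_name, port, database, table, access_token):
    """Build the option dict shared by the read and write calls above."""
    return {
        "url": "jdbc:sqlserver://%s.database.windows.net:%d" % (server_name, port),
        "dbtable": table,
        "accessToken": access_token,
        "encrypt": "true",
        "databaseName": database,
        "hostNameInCertificate": "*.database.windows.net",
    }


# In the notebook (df, spark and the variables above are assumed to exist):
# opts = sql_connector_options(sql_server_name, db_port, db_name, db_table, access_token)
# df.write.format("com.microsoft.sqlserver.jdbc.spark").options(**opts).mode("append").save()
# df2 = spark.read.format("com.microsoft.sqlserver.jdbc.spark").options(**opts).load()
```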

1 Comment

Is there any way we can make this work with Azure SQL Managed Instance? I get an error "Linked Service Type 'AzureSqlMI' not supported"
0

This answer expands on the previous two. Both work fine, but they require hard-coding values in variables that can instead be retrieved from the linked service. Here is an example that gets the server, port, and database from the linked service.

import json
linked_service_name = "MyLinkedServiceName"
access_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)
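# Pass the variable defined above; the bare name MyLinkedServiceName is undefined and raises NameError
linkedServiceProperties = json.loads(mssparkutils.credentials.getPropertiesAll(linked_service_name))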
endPoint = linkedServiceProperties.get('Endpoint')

# Split the Endpoint string at the colon (:) to separate the protocol prefix ('tcp') from the rest
_, server_and_port = endPoint.split(':')

# Split the remainder at the comma (,) to extract the server name and the port number
sql_server_name, db_port = server_and_port.split(',')

db_name = linkedServiceProperties.get('Database')

db_table = "MySchema.MyTable"


tasks_ct = spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://%s:%s" % (sql_server_name, db_port)) \
    .option("accessToken", access_token) \
    .option("dbtable", db_table) \
    .option("encrypt", "true") \
    .option("databaseName", db_name) \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()
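The two `split` calls above raise a ValueError if the endpoint has no `tcp:` prefix or no port. A slightly more forgiving version might look like the sketch below. It assumes the `tcp:host,port` shape described in the comments above; `parse_endpoint` and its `default_port` fallback are hypothetical names introduced here, not part of mssparkutils.

```python
def parse_endpoint(endpoint, default_port=1433):
    """Split an endpoint such as 'tcp:host,1433' into (host, port)."""
    rest = endpoint.strip()
    # Drop the optional protocol prefix.
    if rest.lower().startswith("tcp:"):
        rest = rest[len("tcp:"):]
    # Everything before the first comma is the host; the rest, if any, is the port.
    host, _, port = rest.partition(",")
    port = port.strip()
    return host.strip(), int(port) if port else default_port


# In the notebook (endPoint is assumed to come from the linked service properties):
# sql_server_name, db_port = parse_endpoint(endPoint)
```

The port comes back as an int, which still works with the `%s` placeholder in the JDBC URL above.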
