
I am learning PySpark and trying to connect to a MySQL database.

But I am getting a java.lang.ClassNotFoundException: com.mysql.jdbc.Driver exception while running the code. I have spent a whole day trying to fix it; any help would be appreciated :)

I am using PyCharm Community Edition with Anaconda and Python 3.6.3.

Here is my code:

from pyspark import SparkContext, SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://192.168.0.11:3306/my_db_name",
    driver="com.mysql.jdbc.Driver",
    dbtable="billing",
    user="root",
    password="root").load()

Here is the error:

py4j.protocol.Py4JJavaError: An error occurred while calling o27.load.
: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
  • Did you add the MySQL Connector/J driver to the classpath? Commented Feb 27, 2018 at 14:45
  • No, I didn't. Could you please direct me on how to do it? I am a newbie at this. Commented Feb 28, 2018 at 3:58
  • I don't know pyspark; maybe the examples in this question (and the ones linked from it) are helpful: stackoverflow.com/questions/36326066/… Commented Feb 28, 2018 at 9:38
  • Thank you for replying. I ended up running it from the terminal by giving the path explicitly, and it worked. Commented Feb 28, 2018 at 12:35
  • This is in an interactive environment, not a compiled program. It cannot find the driver, you need to look in the documentation to find out how to provide the path or if there is configuration file to state where that particular driver is. Commented Aug 8, 2018 at 12:17
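A comment above mentions running from the terminal and giving the path explicitly. A sketch of what that can look like with spark-submit and the pyspark shell (the jar path and script name here are placeholders, not from the original post):

```shell
# Make the Connector/J jar visible at launch time (placeholder paths):
spark-submit --jars /path/to/mysql-connector-java-5.1.47.jar your_script.py

# The same flag works for an interactive session:
pyspark --jars /path/to/mysql-connector-java-5.1.47.jar

# On some setups the driver JVM also needs the jar on its own classpath:
spark-submit --driver-class-path /path/to/mysql-connector-java-5.1.47.jar \
             --jars /path/to/mysql-connector-java-5.1.47.jar your_script.py
```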

4 Answers


This was asked 9 months ago at the time of writing, but since there's no answer, here it goes. I was in the same situation, searched Stack Overflow over and over, and tried different suggestions, but the answer is absurdly simple: you just have to copy the MySQL driver into the "jars" folder of Spark!

Download it here: https://dev.mysql.com/downloads/connector/j/5.1.html

I'm using the 5.1 version even though 8.0 exists, because I had some other problems running the latest version with Spark 2.3.2 (and also other problems running Spark 2.4 on Windows 10).

Once downloaded, you can just copy it into your Spark folder, e.g. E:\spark232_hadoop27\jars\ (use your own drive:\folder_name -- this is just an example).

You should end up with two files: E:\spark232_hadoop27\jars\mysql-connector-java-5.1.47-bin.jar and E:\spark232_hadoop27\jars\mysql-connector-java-5.1.47.jar
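If you're unsure whether the copy worked, a quick sanity check is to list the connector jars in the folder Spark actually loads from. A minimal sketch (the helper name is my own, not part of Spark):

```python
import glob
import os

def find_connector_jars(jars_dir, pattern="mysql-connector-*.jar"):
    """Return any MySQL Connector/J jars sitting in Spark's jars folder."""
    return sorted(glob.glob(os.path.join(jars_dir, pattern)))

# Example (adjust to your own Spark installation path):
# print(find_connector_jars(r"E:\spark232_hadoop27\jars"))
```

If this returns an empty list, the driver jar isn't where Spark will look for it.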

After that, the following code launched through PyCharm or a Jupyter notebook should work (as long as you have a MySQL database set up, that is):

import findspark
findspark.init()

import pyspark  # only import after findspark.init()
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dataframe_mysql = spark.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/uoc2",
    driver="com.mysql.jdbc.Driver",
    dbtable="company",
    user="root",
    password="password").load()

dataframe_mysql.show()

Bear in mind that I'm currently working locally with my Spark setup, so there are no real clusters involved, and no "production"-style code that gets submitted to such a cluster. For something more elaborate, this answer could help: MySQL read with PySpark
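As an aside, for anything beyond a toy table the JDBC source also accepts partitioning options (partitionColumn, lowerBound, upperBound, numPartitions) so the read is split across parallel tasks. A sketch reusing the connection settings above; the column name and bounds here are hypothetical:

```python
# Options for a partitioned JDBC read. partitionColumn/lowerBound/upperBound/
# numPartitions are standard Spark JDBC options; the bounds below are made up.
jdbc_opts = {
    "url": "jdbc:mysql://localhost:3306/uoc2",
    "driver": "com.mysql.jdbc.Driver",
    "dbtable": "company",
    "user": "root",
    "password": "password",
    "partitionColumn": "id",   # must be a numeric, date, or timestamp column
    "lowerBound": "1",
    "upperBound": "100000",
    "numPartitions": "4",
}

# With a live server this would run as:
# spark.read.format("jdbc").options(**jdbc_opts).load()
```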


1 Comment

If you're trying to connect to SQL Server, the process is similar; your read option is then .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver').

On my computer, @Kondado's solution works only if I change the driver in the options:

driver = 'com.mysql.cj.jdbc.Driver'  

I am using the 8.0 connector with Spark 2.4.0 on Windows. I downloaded mysql-connector-java-8.0.15.jar, the Platform Independent version, from here, and copied it to 'C:\spark-2.4.0-bin-hadoop2.7\jars\'.
My code in PyCharm looks like this:

#import findspark  # not necessary
#findspark.init()  # not necessary
from pyspark import SparkConf, SparkContext, sql
from pyspark.sql import SparkSession

sc = SparkSession.builder.getOrCreate()
sqlContext = sql.SQLContext(sc)

source_df = sqlContext.read.format('jdbc').options(
    url='jdbc:mysql://localhost:3306/database1',
    driver='com.mysql.cj.jdbc.Driver',  # not com.mysql.jdbc.Driver
    dbtable='table1',
    user='root',
    password='****').load()

print(source_df)
source_df.show()



I don't know how to add the jar file to the classpath (can someone tell me how?), so I put it in the SparkSession config and it works fine.

spark = SparkSession \
        .builder \
        .appName('test') \
        .master('local[*]') \
        .enableHiveSupport() \
        .config("spark.driver.extraClassPath", "<path to mysql-connector-java-5.1.49-bin.jar>") \
        .getOrCreate()
df = spark.read.format("jdbc") \
        .option("url", "jdbc:mysql://localhost/<database_name>") \
        .option("driver", "com.mysql.jdbc.Driver") \
        .option("dbtable", <table_name>) \
        .option("user", <user>) \
        .option("password", <password>) \
        .load()
df.show()



This worked for me: PySpark with MSSQL.

Java version: 1.7.0_191

PySpark version: 2.1.2

Download the following jar files:

sqljdbc41.jar

mssql-jdbc-6.2.2.jre7.jar

Paste the above jars inside the jars folder of the virtual environment:

test_env/lib/python3.6/site-packages/pyspark/jars
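The jars folder of a pip-installed pyspark can be derived from the package's own location. A small helper (my own naming, assuming the standard site-packages layout shown above):

```python
import os

def pyspark_jars_dir(pyspark_init_path):
    """Given the path to pyspark/__init__.py, return the bundled jars folder."""
    return os.path.join(os.path.dirname(pyspark_init_path), "jars")

# Typical usage in a live environment:
# import pyspark
# print(pyspark_jars_dir(pyspark.__file__))
```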

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Practise').getOrCreate()

url = 'jdbc:sqlserver://your_host_name:your_port;databaseName=YOUR_DATABASE_NAME;useNTLMV2=true;'

df = spark.read.format('jdbc') \
        .option('url', url) \
        .option('user', 'your_db_username') \
        .option('password', 'your_db_password') \
        .option('dbtable', 'YOUR_TABLE_NAME') \
        .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver') \
        .load()

