
I am using Python 2.7 on Ubuntu, running Spark from a Python script through a SparkContext.

My database is a remote MySQL server with a username and password.

I try to query it using this code:

from pyspark.sql import SQLContext

sc = createSparkContext()  # my own helper that builds the SparkContext
sql = SQLContext(sc)
df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port?user=user&password=password',
    dbtable='(select * from tablename limit 100) as tablename').load()
print df.head()

and get this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o32.load. : java.sql.SQLException: No suitable driver

I found that I need the MySQL JDBC driver.

I downloaded the platform-independent one from here.

I tried including it with this line when creating the Spark context:

conf.set("spark.driver.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")

I also tried to install it using

sudo apt-get install libmysql-java

on the master machine, on the DB machine, and on the machine running the Python script, with no luck.
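For what it's worth, libmysql-java installs the connector under /usr/share/java/, so I assume that route would need the classpath pointed there instead of at the NFS share, e.g.:

conf.set("spark.driver.extraClassPath", "/usr/share/java/mysql-connector-java.jar")
conf.set("spark.executor.extraClassPath", "/usr/share/java/mysql-connector-java.jar")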

Edit 2:

I tried using

conf.set("spark.executor.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")

Judging by the output of

print sc.getConf().getAll()

which is

[(u'spark.driver.memory', u'3G'),
 (u'spark.executor.extraClassPath', u'file:///var/nfs/general/mysql-connector-java-5.1.43.jar'),
 (u'spark.app.name', u'spark-basic'),
 (u'spark.app.id', u'app-20170830'),
 (u'spark.rdd.compress', u'True'),
 (u'spark.master', u'spark://127.0.0.1:7077'),
 (u'spark.driver.port', u''),
 (u'spark.serializer.objectStreamReset', u'100'),
 (u'spark.executor.memory', u'2G'),
 (u'spark.executor.id', u'driver'),
 (u'spark.submit.deployMode', u'client'),
 (u'spark.driver.host', u''),
 (u'spark.driver.cores', u'3')]

the correct path is included, but I still get the same "No suitable driver" error...

What am I missing here?

Thanks


2 Answers


You need to set the classpath for both the driver and the worker nodes. Add the following to the Spark configuration:

conf.set("spark.executor.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")
conf.set("spark.driver.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")

Or you can pass it using

import os
os.environ['SPARK_CLASSPATH'] = "/path/to/driver/mysql.jar"
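Note that SPARK_CLASSPATH has been deprecated since Spark 1.0 in favor of the extraClassPath properties above, so on recent versions prefer the first approach.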

For Spark >= 2.0.0 you can add a comma-separated list of jars to the spark-defaults.conf file located in the $SPARK_HOME/conf directory, like this:

spark.jars     path_2_jar1,path_2_jar2
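The same property can also be set programmatically instead of in the config file; a sketch, assuming the jar sits on the NFS share from the question (the spark-submit --jars flag is the command-line equivalent, and it places the jar on both driver and executor classpaths):

conf.set("spark.jars", "/var/nfs/general/mysql-connector-java-5.1.43.jar")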


from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("Word Count")\
    .config("spark.driver.extraClassPath", "/home/tuhin/mysql.jar")\
    .getOrCreate()

dataframe_mysql = spark.read\
    .format("jdbc")\
    .option("url", "jdbc:mysql://ip:port/db_name")\
    .option("driver", "com.mysql.jdbc.Driver")\
    .option("dbtable", "employees")\
    .option("user", "root")\
    .option("password", "12345678")\
    .load()

print(dataframe_mysql.columns)

"/home/tuhin/mysql.jar" is the location of mysql jar file

