I am using Python 2.7 on Ubuntu and running Spark from a Python script via a SparkContext.
My database is a remote MySQL server that requires a username and password.
I try to query it with this code:
from pyspark.sql import SQLContext

sc = createSparkContext()
sql = SQLContext(sc)
df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port?user=user&password=password',
    dbtable='(select * from tablename limit 100) as tablename'
).load()
print df.head()
and I get this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o32.load. : java.sql.SQLException: No suitable driver
I found that I need the JDBC driver for MySQL.
I downloaded the platform-independent one from here.
I tried including it with this line when creating the Spark context:
conf.set("spark.driver.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")
and tried to install it using
sudo apt-get install libmysql-java
on the master machine, on the db machine and on the machine running the python script with no luck.
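As far as I can tell, that package installs the connector jar under /usr/share/java, so presumably I could also point the classpath there (an untested guess on my part):

# assuming libmysql-java places the connector at this path
conf.set("spark.driver.extraClassPath", "/usr/share/java/mysql-connector-java.jar")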
Edit 2:
I tried using
conf.set("spark.executor.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")
Judging by the output of
print sc.getConf().getAll()
which is
[(u'spark.driver.memory', u'3G'), (u'spark.executor.extraClassPath', u'file:///var/nfs/general/mysql-connector-java-5.1.43.jar'), (u'spark.app.name', u'spark-basic'), (u'spark.app.id', u'app-20170830'), (u'spark.rdd.compress', u'True'), (u'spark.master', u'spark://127.0.0.1:7077'), (u'spark.driver.port', u''), (u'spark.serializer.objectStreamReset', u'100'), (u'spark.executor.memory', u'2G'), (u'spark.executor.id', u'driver'), (u'spark.submit.deployMode', u'client'), (u'spark.driver.host', u''), (u'spark.driver.cores', u'3')]
it seems that the correct path is included, but I still get the same "No suitable driver" error...
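Do I also need to name the driver class explicitly in the read options? Something like this (an untried guess; com.mysql.jdbc.Driver is the class name for Connector/J 5.1):

df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port?user=user&password=password',
    driver='com.mysql.jdbc.Driver',  # explicit driver class, guessing this is needed
    dbtable='(select * from tablename limit 100) as tablename'
).load()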
What am I missing here?
Thanks