I am using Python 2.7 on Ubuntu and running Spark from a Python script via a SparkContext.
My database is a remote MySQL server that requires a username and password.
I try to query it with this code:
from pyspark.sql import SQLContext

sc = createSparkContext()
sql = SQLContext(sc)
df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port?user=user&password=password',
    dbtable='(select * from tablename limit 100) as tablename'
).load()
print df.head()
and I get this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o32.load. : java.sql.SQLException: No suitable driver
I found that I need the JDBC driver for MySQL.
I downloaded the platform-independent one from here.
I tried including it with this line when creating the Spark context:
conf.set("spark.driver.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")
and tried to install it using
sudo apt-get install libmysql-java
on the master machine, on the db machine and on the machine running the python script with no luck.
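As far as I can tell, that package installs the connector jar under /usr/share/java, so presumably I could also point the classpath there (an untested guess on my part):

# assuming libmysql-java places the connector at this path
conf.set("spark.driver.extraClassPath", "/usr/share/java/mysql-connector-java.jar")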
Edit 2:
I tried using
conf.set("spark.executor.extraClassPath", "/var/nfs/general/mysql-connector-java-5.1.43")
Judging by the output of
print sc.getConf().getAll()
which is
[(u'spark.driver.memory', u'3G'), (u'spark.executor.extraClassPath', u'file:///var/nfs/general/mysql-connector-java-5.1.43.jar'), (u'spark.app.name', u'spark-basic'), (u'spark.app.id', u'app-20170830'), (u'spark.rdd.compress', u'True'), (u'spark.master', u'spark://127.0.0.1:7077'), (u'spark.driver.port', u''), (u'spark.serializer.objectStreamReset', u'100'), (u'spark.executor.memory', u'2G'), (u'spark.executor.id', u'driver'), (u'spark.submit.deployMode', u'client'), (u'spark.driver.host', u''), (u'spark.driver.cores', u'3')]
it seems that the correct path is included, but I still get the same "No suitable driver" error...
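Do I also need to name the driver class explicitly in the read options? Something like this (an untried guess; com.mysql.jdbc.Driver is the class name for Connector/J 5.1):

df = sql.read.format('jdbc').options(
    url='jdbc:mysql://ip:port?user=user&password=password',
    driver='com.mysql.jdbc.Driver',  # explicit driver class, guessing this is needed
    dbtable='(select * from tablename limit 100) as tablename'
).load()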
What am I missing here?
Thanks