I am using Python 2.7 and Spark 2.0.2 in a Jupyter notebook, trying to access a MySQL database in another Docker container. I have tried just about every remedy I could find and am still coming up short. This is my model, so at least something similar has been done before. I put my notebook and Dockerfile in a public repository for reference, in the 'mysql' branch, here.
Code that fails:
df = (spark.read.format('jdbc')
      .options(
          url='jdbc:mysql://172.17.0.8:6603/giskard',
          user='root',
          password='datascience',
          dbtable='supers',
          driver='com.mysql.jdbc.Driver')
      .load()
      )
It fails with (excerpt; the full trace is in the notebook referenced above):
Py4JJavaError: An error occurred while calling o42.load.
: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
Inside the Dockerfile, I added every possible solution I found:
RUN apt-get update && apt-get install -y --no-install-recommends apt-utils && \
apt-get install -y mysql-client && \
apt-get install -y python-dev && \
apt-get install -y libmysqlclient-dev && \
apt-get install -y libmysql-java && \
apt-get clean
RUN pip2 install MySQL-python
I verified that the MySQL jar file exists inside the container and then added it to SPARK_OPTS, such that a %env inside the notebook yields:
'SPARK_OPTS': '--driver-java-options=-Xms1024M --driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info --spark-jars=/usr/share/java/mysql-connector-java.jar',
Other possibly relevant parts of the environment:
'PATH': '/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
'PYSPARK_PYTHON': '/opt/conda/envs/python2/bin/python',
'PYTHONPATH': '/usr/local/spark/python:/usr/local/spark/python/lib/py4j-0.10.4-src.zip:/usr/lib/python2.7/dist-packages',
'SPARK_HOME': '/usr/local/spark',
The database I'm trying to reach does exist with data. The process I used is documented in the first cell of my notebook. Am I making this too complicated? What am I missing and what else can I try? I appreciate any direction you can offer towards a solution!
mysql-connector-java.jars is probably a typo; it is usually jar (without an s), and the file name usually has a version number in it.
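One way to check the jar name from inside the notebook is sketched below. It is only a sketch under some assumptions: the helper names `find_connector_jars` and `build_submit_args` are made up for illustration, the directory `/usr/share/java` is taken from the SPARK_OPTS value in the question, and it assumes the Python kernel starts Spark through pyspark, which reads the `PYSPARK_SUBMIT_ARGS` environment variable. `--jars` and `--driver-class-path` are standard spark-submit options, but whether this resolves the ClassNotFoundException in this particular image is not confirmed.

```python
import glob
import os


def find_connector_jars(directory="/usr/share/java"):
    """Return sorted paths of MySQL connector jars found in `directory`.

    The directory default comes from the question's SPARK_OPTS; adjust as needed.
    """
    pattern = os.path.join(directory, "mysql-connector-java*.jar")
    return sorted(glob.glob(pattern))


def build_submit_args(jar_path):
    """Build a PYSPARK_SUBMIT_ARGS value that adds `jar_path` to the classpath.

    The trailing 'pyspark-shell' token is required by pyspark.
    """
    return "--jars {0} --driver-class-path {0} pyspark-shell".format(jar_path)


jars = find_connector_jars()
print(jars)  # shows the exact file names, including any version suffix
if jars:
    # Assumption: this runs before the SparkSession (and its JVM) is created;
    # setting it afterwards has no effect on an already running JVM.
    os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(jars[0])
```

If the printed list is empty, the path in SPARK_OPTS does not point at a real file, which would be consistent with the ClassNotFoundException above.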