
I am using Python 2.7 and Spark 2.0.2 in a Jupyter notebook, trying to access a MySQL database in another Docker container. I have implemented just about every fix I can find and am still coming up short. This project is my model, so something at least similar has been done before. I put my notebook and Dockerfile in a public repository for reference, in the 'mysql' branch, here.

Code that fails:

df = (spark.read.format('jdbc')
      .options(
        url='jdbc:mysql://172.17.0.8:6603/giskard',
        user='root',
        password='datascience',
        dbtable='supers',driver='com.mysql.jdbc.Driver')
      .load()
     )

failing with (excerpt; the full traceback is in the notebook referenced above):

Py4JJavaError: An error occurred while calling o42.load.
: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

Inside the Dockerfile, I added every possible solution I found:

RUN apt-get update && apt-get install -y --no-install-recommends apt-utils && \
    apt-get install -y mysql-client     && \
    apt-get install -y python-dev       && \
    apt-get install -y libmysqlclient-dev   && \
    apt-get install -y libmysql-java        && \
    apt-get clean

RUN pip2 install MySQL-python

I verified that the MySQL connector jar exists inside the container, then added it to SPARK_OPTS so that a %env inside the notebook yields:

'SPARK_OPTS': '--driver-java-options=-Xms1024M --driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info --spark-jars=/usr/share/java/mysql-connector-java.jar',
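(For context: `--spark-jars` is not a standard spark-submit flag; the usual flag is `--jars`. Another way to hand Spark a jar from inside a notebook is the `PYSPARK_SUBMIT_ARGS` environment variable, which pyspark reads when the SparkContext is created, so it must be set before pyspark is first imported. A minimal sketch, assuming the jar path from my image:)

```python
import os

# Path from my image; adjust if the connector jar lives elsewhere.
jar_path = '/usr/share/java/mysql-connector-java.jar'

# pyspark reads this variable when the SparkContext starts, so set it
# before the first `import pyspark` in the notebook. The trailing
# 'pyspark-shell' token is required.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars {0} pyspark-shell'.format(jar_path)
```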

other possible relevant parts of the environment:

 'PATH': '/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
 'PYSPARK_PYTHON': '/opt/conda/envs/python2/bin/python',
 'PYTHONPATH': '/usr/local/spark/python:/usr/local/spark/python/lib/py4j-0.10.4-src.zip:/usr/lib/python2.7/dist-packages',
 'SPARK_HOME': '/usr/local/spark',

The database I'm trying to reach does exist with data. The process I used is documented in the first cell of my notebook. Am I making this too complicated? What am I missing and what else can I try? I appreciate any direction you can offer towards a solution!

  • mysql-connector-java.jars is probably a typo; it is usually .jar (without an s), and usually has a version number in it. Commented Jan 17, 2017 at 12:30
  • Thank you, Mark. I corrected the typo; unfortunately it had no effect on the error. Commented Jan 17, 2017 at 15:46

1 Answer


I figured out the issue(s) by stepping back to see whether I could access the database through plain Python, and by checking the Spark process inside the container with ps -aux.

1) All containers must be on the same network to communicate; linking is apparently not enough. I created a new one: docker network create --driver bridge dbnet
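The commands I used looked roughly like this (the container names here are illustrative; substitute your own):

```shell
# Create a user-defined bridge network; containers on it can reach each
# other by container name as well as by IP.
docker network create --driver bridge dbnet

# Attach both containers to the network (names are placeholders).
docker network connect dbnet mysql_container
docker network connect dbnet notebook_container
```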

2) I installed python-mysqldb to access the database from Python. I did this within the notebook instead of adding it to the Dockerfile.

!sudo apt-get update && sudo apt-get install -y python-mysqldb

# from https://pypi.python.org/pypi/MySQL-python/1.2.5
import MySQLdb

db = MySQLdb.connect(host=DB_SERVER_IP,     # your host, usually localhost
                     user=MYSQL_USER,       # your username
                     passwd=MYSQL_PASSWORD, # your password
                     db=MYSQL_DATABASE)     # name of the database

3) Spark needed libmysql-java, and the jar had to be visible in /usr/local/spark/jars (a symlink works). As far as I can tell, the SPARK_OPTS setting in the Docker image has no effect. I added to the Dockerfile:

RUN apt-get update && apt-get install -y --no-install-recommends apt-utils && \
    apt-get install -y libmysql-java            && \
    apt-get clean
RUN ln -s /usr/share/java/mysql-connector-java.jar /usr/local/spark/jars

And now all is well. I'll leave the sample notebook in the mysql branch of the Spark 2 Docker repository, should anyone else need my exact steps.
