I installed gettyimages/spark docker image and jupyter/pyspark-notebook inside my machine.
However as the gettyimage/spark python version is 3.5.3 while jupyter/pyspark-notebook python version is 3.7, the following error come out:
Exception: Python in worker has different version 3.5 than that in driver 3.7, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
So, i have tried to upgrade the python version of gettyimage/spark image OR downgrade the python version of jupyter/pyspark-notebook docker image to fix it.
- Lets talk about method 1, downgrade
jupyter/pyspark-notebookpython version first:
I use conda install python=3.5 to downgrade the python version of jupyter/pyspark-notebook docker image. However, after i do so , my jupyter notebook cannot connect to any single ipynb and the kernel seems dead. Also, when i type conda again, it shows me conda command not found, but python terminal work well
I have compare the sys.path before the downgrade and after it
['', '/usr/local/spark/python', '/usr/local/spark/python/lib/py4j-0.10.7-src.zip', '/opt/conda/lib/python35.zip', '/opt/conda/lib/python3.5', '/opt/conda/lib/python3.5/plat-linux', '/opt/conda/lib/python3.5/lib-dynload', '/opt/conda/lib/python3.5/site-packages']
['', '/usr/local/spark/python', '/usr/local/spark/python/lib/py4j-0.10.7-src.zip', '/opt/conda/lib/python37.zip', '/opt/conda/lib/python3.7', '/opt/conda/lib/python3.7/lib-dynload', '/opt/conda/lib/python3.7/site-packages']
I think more or less, it is correct. So why i cannot use my jupyter notebook to connect to the kennel?
- So i use another method, i tried upgrade the
gettyimage/sparkimage
sudo docker run -it gettyimages/spark:2.4.1-hadoop-3.0 apt-get install python3.7.3 ; python3 -v
However, I find that even i do so, i cannot run the spark well.
I am not quite sure what to do. May you share with me how to modify the docker images internal package version
gettyimages/sparkandjupyter/pyspark-notebookwould be two different images that should not interact with each other unless you are setting up a connection between them through a docker network.