
I just upgraded to Spark 2.0 from 1.4 and downloaded the ec2 directory from github.com/amplab/spark-ec2/tree/branch-2.0

To spin up some clusters I go to my ec2 directory and run these commands:

./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> launch <cluster-name>

./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>

I have my clusters up and I'm logged into the master, but I don't know how to launch a PySpark notebook. With Spark 1.4 I would run the command

IPYTHON_OPTS="notebook --ip=0.0.0.0" /root/spark/bin/pyspark --executor-memory 4G --driver-memory 4G &

and I have my notebook up and running fine, but with Spark 2.0 the IPYTHON_OPTS approach no longer works for launching bin/pyspark. Can anyone help with this?

1 Answer


According to the source comments:

https://apache.googlesource.com/spark/+/master/bin/pyspark

In Spark 2.0, IPYTHON and IPYTHON_OPTS are removed and pyspark fails to launch if either option is set in the user's environment. Instead, users should set PYSPARK_DRIVER_PYTHON=ipython to use IPython and set PYSPARK_DRIVER_PYTHON_OPTS to pass options when starting the Python driver (e.g. PYSPARK_DRIVER_PYTHON_OPTS='notebook'). This supports full customization of the IPython and executor Python executables.

The following link will take you step by step. Along with upgrading to Spark 2.0, you should also upgrade to Jupyter Notebooks (formerly IPython Notebooks).
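Concretely, the Spark 1.4 command from the question maps to something like the following under Spark 2.0. This is a sketch that assumes the same /root/spark install path, notebook options, and memory settings used in the question:

```shell
# Spark 2.0 removed IPYTHON and IPYTHON_OPTS; set these variables instead.
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=0.0.0.0"

# Same pyspark launcher and memory settings as the Spark 1.4 command above.
/root/spark/bin/pyspark --executor-memory 4G --driver-memory 4G &
```

If you use Jupyter rather than classic IPython notebooks, setting PYSPARK_DRIVER_PYTHON=jupyter with the same PYSPARK_DRIVER_PYTHON_OPTS should work equivalently.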


2 Comments

Does that mean I have to add the lines export PYSPARK_DRIVER_PYTHON=ipython and export PYSPARK_DRIVER_PYTHON_OPTS="notebook" to my .bash_profile?
Short answer: yes, but please see my edited answer with a link that takes you step by step. Take what you need; disregard the rest.
