
I'm testing Turi with this example on my MacBook (OS X 10.10.5): https://turi.com/learn/gallery/notebooks/spark_and_graphlab_create.html

When I get to this step:

# Set up the SparkContext object
# this can be 'local' or 'yarn-client' in PySpark
# Remember if using yarn-client then all the paths should be accessible
# by all nodes in the cluster.
sc = SparkContext('local')

the following error comes up:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-12-dc1befb4186c> in <module>()
      3 # Remember if using yarn-client then all the paths should be accessible
      4 # by all nodes in the cluster.
----> 5 sc = SparkContext()

/usr/local/Cellar/apache-spark/1.6.2/libexec/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    110         """
    111         self._callsite = first_spark_call() or CallSite(None, None, None)
--> 112         SparkContext._ensure_initialized(self, gateway=gateway)
    113         try:
    114             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/usr/local/Cellar/apache-spark/1.6.2/libexec/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
    243         with SparkContext._lock:
    244             if not SparkContext._gateway:
--> 245                 SparkContext._gateway = gateway or launch_gateway()
    246                 SparkContext._jvm = SparkContext._gateway.jvm
    247 

/usr/local/Cellar/apache-spark/1.6.2/libexec/python/pyspark/java_gateway.pyc in launch_gateway()
     92                 callback_socket.close()
     93         if gateway_port is None:
---> 94             raise Exception("Java gateway process exited before sending the driver its port number")
     95 
     96         # In Windows, ensure the Java child processes do not linger after Python has exited.

Exception: Java gateway process exited before sending the driver its port number

A quick Google search hasn't helped so far.

Here is my .bash_profile:

# added by Anaconda2 4.1.1 installer
export PATH="/Users/me/anaconda/bin:$PATH"

export SCALA_HOME=/usr/local/Cellar/scala/2.11.8/libexec
export SPARK_HOME=/usr/local/Cellar/apache-spark/1.6.2/libexec
export PYTHONPATH=$SPARK_HOME/python/pyspark:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH 
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
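
For reference, a small sanity-check sketch (not part of the original post) that flags which of the Spark-related variables are missing from an environment mapping; check_spark_env is a hypothetical helper name:

```python
def check_spark_env(env):
    """Return a list of problems with a Spark-related environment mapping."""
    problems = []
    if not env.get("SPARK_HOME"):
        problems.append("SPARK_HOME is not set")
    if "PYSPARK_SUBMIT_ARGS" not in env:
        problems.append("PYSPARK_SUBMIT_ARGS is not set")
    return problems

# With a mapping mimicking the .bash_profile above (SPARK_HOME set,
# PYSPARK_SUBMIT_ARGS absent):
print(check_spark_env({"SPARK_HOME": "/usr/local/Cellar/apache-spark/1.6.2/libexec"}))
# -> ['PYSPARK_SUBMIT_ARGS is not set']
```

In this case the check would report that PYSPARK_SUBMIT_ARGS is missing, which turns out to be the relevant gap (see the comments below).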

Does anyone know how to fix this error?

Thanks.

  • Is the SPARK_HOME path correct? Have you set PYSPARK_SUBMIT_ARGS="--master spark://<host>:<port>" in your environment variables? The port number could be what's missing. Commented Jul 13, 2016 at 14:58
  • SPARK_HOME is correct; I haven't configured PYSPARK_SUBMIT_ARGS. What should I specify in this case? Commented Jul 13, 2016 at 15:03
  • Try this: export PYSPARK_SUBMIT_ARGS="--master local[2]" Commented Jul 13, 2016 at 15:05
  • @KartikKannapur I think that actually worked. Can you post it as an answer so that I can accept it? Thanks a lot. Commented Jul 13, 2016 at 15:08
  • Sure, glad to help. Will add the answer. Commented Jul 13, 2016 at 15:08

1 Answer


This could happen for one of two reasons:

  1. The SPARK_HOME environment variable may be pointing to the wrong path.
  2. PYSPARK_SUBMIT_ARGS is not set. Set export PYSPARK_SUBMIT_ARGS="--master local[2]"; this is the configuration you want PySpark to start with.
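
A sketch of the resulting .bash_profile additions (the SPARK_HOME path assumes the Homebrew install from the question; adjust the Spark version to match yours):

```shell
# Point Spark at the Homebrew install (path from the question)
export SPARK_HOME=/usr/local/Cellar/apache-spark/1.6.2/libexec
# Tell PySpark to start a local master with 2 worker threads
export PYSPARK_SUBMIT_ARGS="--master local[2]"
echo "$PYSPARK_SUBMIT_ARGS"
```

After adding these lines, open a new shell (or source ~/.bash_profile) before relaunching the notebook so the variables take effect.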
