
I have followed the steps to set up pyspark in IntelliJ from this question:

Write and run pyspark in IntelliJ IDEA

Here is the simple code I attempted to run:

#!/usr/bin/env python
from pyspark import SparkContext, SparkConf

import numpy as np

def p(msg):
    print("%s\n" % repr(msg))

# Build a small NumPy array and print it locally.
a = np.array([[1, 2, 3], [4, 5, 6]])
p(a)

# Create a local SparkContext, parallelize the array, and collect it back.
sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))

ardd = sc.parallelize(a)
p(ardd.collect())

Here is the result of submitting the code:

NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
  File "/git/misc/python/ptest.py", line 14, in <module>
    sc = SparkContext("local","ptest",SparkConf().setAppName("x"))
  File "/shared/spark16/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/shared/spark16/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/shared/spark16/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

However, I do not understand how this could be expected to work: to run on Spark, the code needs to be bundled up and submitted via spark-submit.

So I doubt that the other question actually addressed submitting pyspark code through IntelliJ to Spark.

Is there a way to submit pyspark code to Spark from IntelliJ? It would effectively be:

  spark-submit myPysparkCode.py

The pyspark executable itself has been deprecated since Spark 1.0. Does anyone have this working?
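In other words, the IDE would effectively just need to shell out to spark-submit. A minimal sketch of that idea, where the spark-submit path is an assumption based on the /shared/spark16 install visible in the traceback above:

import subprocess

# Invoke spark-submit the way a shell would.
# The path below is an assumption; substitute your own $SPARK_HOME/bin.
subprocess.check_call(["/shared/spark16/bin/spark-submit", "myPysparkCode.py"])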

  • Could you add your run configuration? Commented Feb 26, 2017 at 22:48
  • @zero323 Ah, I found the missing link: PYSPARK_SUBMIT_ARGS needs to be set to pyspark-shell in the run configuration; the other settings for PYTHONPATH and SPARK_HOME are as shown in that other question. I have added an answer for this now. If you have other info to add, please feel free to post your own. BTW: I am still looking for how to run pyspark in the IntelliJ Python console. Commented Feb 26, 2017 at 22:52
  • Me too. The pyspark shell and spark-submit work fine for me, but I get the "Exception: Java gateway process" error when I try to run it in IntelliJ. Commented Mar 10, 2017 at 6:41

1 Answer


In my case, the variable settings from the other Q&A Write and run pyspark in IntelliJ IDEA covered most, but not all, of the required settings, and I tried them many times.

Only after adding:

  PYSPARK_SUBMIT_ARGS = pyspark-shell

to the run configuration did pyspark finally quiet down and succeed.
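The same fix can also be expressed in the script itself, which sidesteps the run configuration entirely. A minimal sketch, assuming the Spark 1.6 layout under /shared/spark16 from the traceback above (the py4j zip name varies by Spark version):

import os
import sys

# These mirror the IntelliJ run-configuration settings; the paths are
# assumptions based on the traceback above, not universal values.
os.environ["SPARK_HOME"] = "/shared/spark16"
os.environ["PYSPARK_SUBMIT_ARGS"] = "pyspark-shell"  # the missing setting
sys.path.insert(0, "/shared/spark16/python")
sys.path.insert(0, "/shared/spark16/python/lib/py4j-0.9-src.zip")  # check the exact name

from pyspark import SparkContext, SparkConf

# With PYSPARK_SUBMIT_ARGS ending in pyspark-shell, launch_gateway()
# can start the JVM and the context comes up cleanly.
sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
print(sc.parallelize([1, 2, 3]).collect())
sc.stop()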

