
I'm facing an issue when trying to use pyspark=3.1.2. I have Java 1.8 installed and added to my user PATH. According to the docs, it should not need any other dependency.

My question is, do I have to install anything else, like Spark itself or something?

I'm using conda environments in VS Code.
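For reference, the cell that triggers the error is just the standard session bootstrap (reconstructed from the traceback below; the import is implied):

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("test-wretrwrwe") \
    .getOrCreate()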

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
k:\Deep Learning\Github\stock-pred\test_spark.ipynb Cell 2' in <cell line: 1>()
----> 1 spark = SparkSession \
      2     .builder \
      3         .appName("test-wretrwrwe") \
      4             .getOrCreate()

File ~\anaconda3\envs\prepro\lib\site-packages\pyspark\sql\session.py:228, in SparkSession.Builder.getOrCreate(self)
    226         sparkConf.set(key, value)
    227     # This SparkContext may be an existing one.
--> 228     sc = SparkContext.getOrCreate(sparkConf)
    229 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    230 # by all sessions.
    231 session = SparkSession(sc)

File ~\anaconda3\envs\prepro\lib\site-packages\pyspark\context.py:384, in SparkContext.getOrCreate(cls, conf)
    382 with SparkContext._lock:
    383     if SparkContext._active_spark_context is None:
--> 384         SparkContext(conf=conf or SparkConf())
    385     return SparkContext._active_spark_context

File ~\anaconda3\envs\prepro\lib\site-packages\pyspark\context.py:144, in SparkContext.__init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    139 if gateway is not None and gateway.gateway_parameters.auth_token is None:
    140     raise ValueError(
    141         "You are trying to pass an insecure Py4j gateway to Spark. This"
    142         " is not allowed as it is a security risk.")
--> 144 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    145 try:
    146     self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
    147                   conf, jsc, profiler_cls)

File ~\anaconda3\envs\prepro\lib\site-packages\pyspark\context.py:331, in SparkContext._ensure_initialized(cls, instance, gateway, conf)
    329 with SparkContext._lock:
    330     if not SparkContext._gateway:
--> 331         SparkContext._gateway = gateway or launch_gateway(conf)
    332         SparkContext._jvm = SparkContext._gateway.jvm
    334     if instance:

File ~\anaconda3\envs\prepro\lib\site-packages\pyspark\java_gateway.py:108, in launch_gateway(conf, popen_kwargs)
    105     time.sleep(0.1)
    107 if not os.path.isfile(conn_info_file):
--> 108     raise Exception("Java gateway process exited before sending its port number")
    110 with open(conn_info_file, "rb") as info:
    111     gateway_port = read_int(info)

Exception: Java gateway process exited before sending its port number

1 Answer

Using Windows as an example:

Method 1 (temporary solution):

import os
os.environ['JAVA_HOME'] = r"C:\Program Files\Java\jdk1.8.0_331"  # raw string so the backslashes aren't treated as escapes
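This must run before the SparkSession is created, in the same Python process. A minimal end-to-end sketch, assuming a JDK 1.8 install at that path (adjust to your own):

import os
os.environ['JAVA_HOME'] = r"C:\Program Files\Java\jdk1.8.0_331"  # set before any Spark code runs

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()
print(spark.version)  # should print 3.1.2 once the Java gateway starts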

Method 2:

Set a system variable in Environment Variables: add a new variable named "JAVA_HOME" with the value "C:\Program Files\Java\jdk1.8.0_331" (your JDK install path).
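After adding the system variable, restart VS Code (or the terminal) so the new environment is picked up. A quick check, not part of the original answer, to confirm the variable is visible inside the conda environment:

import os
import subprocess

print(os.environ.get('JAVA_HOME'))    # should print the JDK path
subprocess.run(['java', '-version'])  # should report java version "1.8.0_..."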
