
I run Windows 10 and have installed Python 3 through Anaconda3. I am using Jupyter Notebook. I have installed Spark from here (spark-2.3.0-bin-hadoop2.7.tgz). I have extracted the files and placed them in my directory D:\Spark. I have amended the Environment Variables:

User variable:

Variable: SPARK_HOME

Value: D:\Spark

System variable:

Variable: PATH

Value: D:\Spark\bin
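For completeness, the same variables can also be set from inside the notebook itself, before any Spark object is created. This is just a sketch using the paths from my setup; adjust them if your layout differs:

import os

# Point Spark at the extracted distribution (these paths match my D:\Spark layout)
os.environ["SPARK_HOME"] = r"D:\Spark"
os.environ["PATH"] = r"D:\Spark\bin;" + os.environ["PATH"]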

I have installed/updated via conda the following modules:

pandas

numpy

pyarrow

pyspark

py4j
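To confirm that the notebook actually picks up these packages, a quick version check can be run in a cell (a minimal sketch):

import pandas, numpy, pyarrow, pyspark
from py4j import version as py4j_version

# Print the version of each module as seen by this Python interpreter
for mod in (pandas, numpy, pyarrow, pyspark):
    print(mod.__name__, mod.__version__)
print("py4j", py4j_version.__version__)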

Java is installed:

[screenshot showing the installed Java version]

I don't know if this is relevant, but the following two variables also appear in my Environment Variables:

[screenshot showing the two environment variables]
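Since the traceback below complains about the Java gateway, it may also help to confirm that Python can see the Java installation at all. A small check like this (output will of course differ per machine):

import os, subprocess

# JAVA_HOME should point at the JDK/JRE folder; None means it is not set
print(os.environ.get("JAVA_HOME"))

# Ask the java on the PATH for its version; a FileNotFoundError here
# would mean java is not reachable from this Python process
subprocess.run(["java", "-version"])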

Having done all this, I rebooted and ran the following piece of code, which results in the error message pasted below:

import pandas as pd

import seaborn as sns

# These lines enable running Spark commands

from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')
spark = SparkSession(sc)

import pyspark

data = sns.load_dataset('iris')

data_sp = spark.createDataFrame(data)

data_sp.show()

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-1-ec964ecd39a2> in <module>()
      7 from pyspark.context import SparkContext
      8 from pyspark.sql.session import SparkSession
----> 9 sc = SparkContext('local')
     10 spark = SparkSession(sc)
     11 

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    113         """
    114         self._callsite = first_spark_call() or CallSite(None, None, None)
--> 115         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    116         try:
    117             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
    296         with SparkContext._lock:
    297             if not SparkContext._gateway:
--> 298                 SparkContext._gateway = gateway or launch_gateway(conf)
    299                 SparkContext._jvm = SparkContext._gateway.jvm
    300 

C:\ProgramData\Anaconda3\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf)
     92 
     93             if not os.path.isfile(conn_info_file):
---> 94                 raise Exception("Java gateway process exited before sending its port number")
     95 
     96             with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number

How can I make PySpark work?

  • You should also install Java. Also, what is the output of D:\spark\sbin\start-master.sh? Commented Jan 14, 2019 at 10:19
  • @BlackBear: Thank you for your comment. Java is installed -- see my updated post. As for your question, I am sorry, I don't understand -- what exactly would you like me to do? The file you mention exists in my directory, but what should I do with it? Commented Jan 14, 2019 at 10:27
  • @user8270077 Did you solve the problem? Commented Aug 18, 2021 at 16:39

1 Answer


I resolved the problem by following the instructions found here: https://changhsinlee.com/install-pyspark-windows-jupyter/
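To summarise the steps from that guide that mattered in my case (so the answer keeps its value if the link breaks): install a JDK into a path without spaces, point JAVA_HOME at it, and let findspark locate the Spark distribution before anything from pyspark is imported. A minimal sketch; the JDK path below is just an example from my machine:

import os
import findspark

# JAVA_HOME must point at a JDK installed in a path without spaces;
# this exact folder is an example, not a requirement
os.environ["JAVA_HOME"] = r"C:\jdk1.8.0_201"

# findspark uses SPARK_HOME to put pyspark on sys.path
findspark.init()

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").getOrCreate()
spark.range(5).show()  # quick smoke test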


1 Comment

Please include the content; once the link breaks, this answer will have no value.
