
Here are the dependencies, which all installed successfully:

!apt-get install openjdk-8-jre
!apt-get install scala
!pip install py4j
!wget -q https://downloads.apache.org/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz
!tar xf spark-2.4.8-bin-hadoop2.7.tgz
!pip install -q findspark

Now, to create the Spark context:

# Setting up environment variables
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.8-bin-hadoop2.7"
# export PYSPARK_SUBMIT_ARGS ="--master local[2]"

# Importing and initiating Spark
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").appName("Test Setup").getOrCreate()
sc = spark.sparkContext

I'm getting this error:

RuntimeError: Java gateway process exited before sending its port number

Please note that this is a Colab notebook. Any help would be great.
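For reference, this error usually means PySpark failed to launch the JVM at all. A minimal stdlib-only check (using the same paths as the snippet above) that `JAVA_HOME` actually resolves to a working JDK install:

```python
import os

def java_home_ok(env=os.environ):
    """Return True if JAVA_HOME points at a directory containing bin/java."""
    java_home = env.get("JAVA_HOME", "")
    return os.path.exists(os.path.join(java_home, "bin", "java"))

# False here would explain the "Java gateway" error: Spark cannot find java.
print(java_home_ok())
```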


1 Answer


As an alternative, you can install PySpark from PyPI:

For Python users, PySpark also provides pip installation from PyPI. This is usually for local usage or as a client to connect to a cluster instead of setting up a cluster itself.

Install pyspark + openjdk
%pip install pyspark==2.4.8
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
Create spark session
from pyspark.sql import SparkSession

spark = SparkSession.builder\
        .master("local[*]")\
        .appName("Test Setup")\
        .getOrCreate()

Tested in a Google Colab notebook.

