
I use MS Windows 7.

Initially, I tried a program using Scala in Spark 1.6 and it worked fine (the SparkContext object was available as sc automatically).

When I tried Spark 2.2, sc was not available automatically, so I created one with the following steps:

import org.apache.spark.SparkContext  
import org.apache.spark.SparkConf  
val sc = new SparkConf().setAppName("myname").setMaster("mast")  
new SparkContext(sc) 

Now when I try to execute the parallelize method below, it gives me an error:

val data = Array(1, 2, 3, 4, 5)  
val distData = sc.parallelize(data) 

Error:

Value parallelize is not a member of org.apache.spark.SparkConf  

I followed these steps using the official documentation only, so can anybody explain where I went wrong? Thanks in advance. :)

  • I understand from "where I am getting SparkContext object as sc automatically" that you use spark-shell, don't you? Have you defined HADOOP_HOME and/or saved winutils.exe in $HADOOP_HOME/bin? Commented Dec 24, 2017 at 20:05

5 Answers


If spark-shell doesn't show this line on startup:

Spark context available as 'sc' (master = local[*], app id = local-XXX).

Run

val sc = SparkContext.getOrCreate()
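
For reference, a minimal sketch of what this looks like in a fresh spark-shell session; the explicit import and the small parallelize check are added here for illustration and are not part of the original answer:

import org.apache.spark.SparkContext

// Reuse the context spark-shell already started, or create one with default
// settings if none exists yet.
val sc = SparkContext.getOrCreate()

// Quick sanity check: distribute a small collection and count it.
sc.parallelize(1 to 5).count()  // should return 5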

Comments

Thanks, but it throws an error, as I have already mentioned in the comments on the answer below.
Which version of spark-shell are you using?
Read the answer; it's important to try it in a new spark-shell session.
Did you try this val sc = SparkContext.getOrCreate()?
spark-shell already has a SparkContext, but you don't have it properly assigned to a variable. If you try to create a new SparkContext on your own inside spark-shell, the JVM shows an error saying that a SparkContext already exists. The SparkContext.getOrCreate() function handles this situation: if a SparkContext exists it returns it, otherwise it creates a new one and returns it.

The issue is that you created sc as a SparkConf, not a SparkContext (the two names look very similar).


To use the parallelize method in Spark 2.0 or any other version, sc should be a SparkContext and not a SparkConf. The correct code should look like this:

import org.apache.spark.SparkContext  
import org.apache.spark.SparkConf  
val sparkConf = new SparkConf().setAppName("myname").setMaster("mast")  
val sc = new SparkContext(sparkConf)
val data = Array(1, 2, 3, 4, 5)  
val distData = sc.parallelize(data)  

This will give you the desired result.
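
As the comments below point out, running this inside spark-shell fails with an error that only one SparkContext may be running per JVM, because the shell has already created one. A minimal sketch of a variant that avoids that, using SparkContext.getOrCreate to reuse any existing context (and assuming a local[*] master instead of the placeholder "mast"), could look like this:

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf().setAppName("myname").setMaster("local[*]")

// Returns the SparkContext spark-shell already created, or builds a new one
// from sparkConf when none exists (e.g. in a standalone application).
val sc = SparkContext.getOrCreate(sparkConf)

val distData = sc.parallelize(Array(1, 2, 3, 4, 5))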

Comments

Should I pass sparkConf in new SparkContext() instead of sc?
Oops! My mistake. Yes, you should pass sparkConf in new SparkContext instead of sc. I have updated my answer.
It throws an error that only one SparkContext object may be running in the JVM. How can I avoid this?
Are you using spark-shell?
Then you don't need to create sc there. It is already created for you. Spark context Web UI available at http://192.168.1.13:4040 Spark context available as 'sc' (master = local[*], app id = local-1514126801063). Spark session available as 'spark'.

You should prefer to use SparkSession, as it is the entry point for Spark from version 2 onwards. You could try something like this:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
    .master("local")
    .appName("spark session example")
    .getOrCreate()
val sc = spark.sparkContext
val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)

This is what I tried in Databricks
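
If it still fails, a quick sanity check (as one of the comments below suggests) is to confirm which Spark version the shell is actually running; in a Spark 2.x spark-shell or Databricks notebook the session is already available as spark:

// Prints the running Spark version, e.g. 2.2.1.
println(spark.version)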

Comments

Nope, it doesn't work. It throws an error on the getOrCreate method. As far as I remember you posted a comment above but then deleted it; I don't know why. If that concept is correct, then why don't you post it again?
Edited and tested in Databricks: I was using a sparkSession variable to get the SparkSession but using spark when getting the sparkContext; I have edited it to use the spark variable in both places. That was a mistake on my end, sorry for that. Now this code will work if you are using Spark 2.
Thanks for your efforts, but it still throws an error. It gives a long error which can't be posted here.
This is very basic code. Are you able to create a SparkSession? If you are running in spark-shell, the SparkSession would be available as spark (in version 2). Check by using spark.version.
Yes, I replaced my version with 2.2.1 and the problem is solved.

There is some problem with version 2.2.0 of Apache Spark. I replaced it with version 2.2.1, which is the latest one, and now I get the sc and spark variables automatically when I start spark-shell via cmd in Windows 7. I hope it will help someone.
I executed the code below, which creates an RDD, and it works perfectly. No need to import any packages.

val dataOne = sc.parallelize(1 to 10)
dataOne.collect()  // will show the numbers 1 to 10 as an array in the shell


Your code should look like this:

val conf = new SparkConf()
conf.setMaster("local[*]")
conf.setAppName("myname")
val sc = new SparkContext(conf)

NOTE: the master URL should be local[*] for a local run.
