
I use MS Windows 7.

Initially, I tried a program using Scala in Spark 1.6 and it worked fine (the SparkContext object was available as sc automatically).

When I tried Spark 2.2, sc was not available automatically, so I created one with the following steps:

import org.apache.spark.SparkContext  
import org.apache.spark.SparkConf  
val sc = new SparkConf().setAppName("myname").setMaster("mast")  
new SparkContext(sc) 

Now when I try to execute the parallelize method below, it gives me an error:

val data = Array(1, 2, 3, 4, 5)  
val distData = sc.parallelize(data) 

Error:

Value parallelize is not a member of org.apache.spark.SparkConf  

I followed these steps using the official documentation only, so can anybody explain where I went wrong? Thanks in advance. :)

  • I understand from "where I am getting SparkContext object as sc automatically" that you use spark-shell, don't you? Have you defined HADOOP_HOME and/or saved winutils.exe in $HADOOP_HOME/bin? Commented Dec 24, 2017 at 20:05

5 Answers


If spark-shell doesn't show this line on startup:

Spark context available as 'sc' (master = local[*], app id = local-XXX).

Run

val sc = SparkContext.getOrCreate()
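
For reference, a minimal sketch of what this looks like in a fresh spark-shell session; the explicit import and the small parallelize check are added here for illustration and are not part of the original answer:

import org.apache.spark.SparkContext

// Reuse the context spark-shell already started, or create one with default
// settings if none exists yet.
val sc = SparkContext.getOrCreate()

// Quick sanity check: distribute a small collection and count it.
sc.parallelize(1 to 5).count()  // should return 5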

Comments

Thanks, but it throws an error, as I have already mentioned in the comments on the answer below.
Which version of spark-shell are you using?
Read the answer; it's important to try it in a new spark-shell session.
Did you try this val sc = SparkContext.getOrCreate()?
spark-shell already has a SparkContext, but you don't have it properly assigned to a variable. If you try to create a new SparkContext on your own inside spark-shell, the JVM shows an error saying that a SparkContext already exists. The SparkContext.getOrCreate() function handles this situation: if a SparkContext exists it returns it, otherwise it creates a new one and returns it.

The issue is that you created sc as a SparkConf, not a SparkContext (the two names look very similar).


To use the parallelize method in Spark 2.0 or any other version, sc should be a SparkContext and not a SparkConf. The correct code should look like this:

import org.apache.spark.SparkContext  
import org.apache.spark.SparkConf  
val sparkConf = new SparkConf().setAppName("myname").setMaster("mast")  
val sc = new SparkContext(sparkConf)
val data = Array(1, 2, 3, 4, 5)  
val distData = sc.parallelize(data)  

This will give you the desired result.
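
As the comments below point out, running this inside spark-shell fails with an error that only one SparkContext may be running per JVM, because the shell has already created one. A minimal sketch of a variant that avoids that, using SparkContext.getOrCreate to reuse any existing context (and assuming a local[*] master instead of the placeholder "mast"), could look like this:

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf().setAppName("myname").setMaster("local[*]")

// Returns the SparkContext spark-shell already created, or builds a new one
// from sparkConf when none exists (e.g. in a standalone application).
val sc = SparkContext.getOrCreate(sparkConf)

val distData = sc.parallelize(Array(1, 2, 3, 4, 5))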

Comments

Should I pass sparkConf in new SparkContext() instead of sc?
Oops! My mistake. Yes, you should pass sparkConf in new SparkContext instead of sc. I have updated my answer.
It throws an error that only one SparkContext object may be running in the JVM. How can I avoid this?
Are you using spark-shell?
Then you don't need to create sc there. It is already created for you. Spark context Web UI available at http://192.168.1.13:4040 Spark context available as 'sc' (master = local[*], app id = local-1514126801063). Spark session available as 'spark'.

You should prefer to use SparkSession, as it is the entry point for Spark from version 2 onwards. You could try something like this:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
    .master("local")
    .appName("spark session example")
    .getOrCreate()
val sc = spark.sparkContext
val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)

This is what I tried in Databricks
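
If it still fails, a quick sanity check (as one of the comments below suggests) is to confirm which Spark version the shell is actually running; in a Spark 2.x spark-shell or Databricks notebook the session is already available as spark:

// Prints the running Spark version, e.g. 2.2.1.
println(spark.version)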

Comments

Nope, it doesn't work. It throws an error on the getOrCreate method. As far as I remember you posted a comment above but then deleted it; I don't know why. If that concept is correct, then why don't you post it again?
Edited and tested in Databricks: I was using a sparkSession variable to get the SparkSession but using spark when getting the sparkContext; I have edited it to use the spark variable in both places. That was a mistake on my end, sorry for that. Now this code will work if you are using Spark 2.
Thanks for your efforts, but it still throws an error. It gives a long error which can't be posted here.
This is very basic code. Are you able to create a SparkSession? If you are running in spark-shell, the SparkSession would be available as spark (in version 2). Check by using spark.version.
Yes, I replaced my version with 2.2.1 and the problem is solved.

There is some problem with version 2.2.0 of Apache Spark. I replaced it with version 2.2.1, which is the latest one, and now I get the sc and spark variables automatically when I start spark-shell via cmd in Windows 7. I hope it will help someone.
I executed the code below, which creates an RDD, and it works perfectly. No need to import any packages.

val dataOne = sc.parallelize(1 to 10)
dataOne.collect()  // will show the numbers 1 to 10 as an array in the shell


Your code should look like this:

val conf = new SparkConf()
conf.setMaster("local[*]")
conf.setAppName("myname")
val sc = new SparkContext(conf)

NOTE: the master URL should be local[*] for a local run.
