
I am new to Spark and I am trying to submit the "quick-start" job from my app. I emulate standalone mode by starting a master and a worker on my localhost.

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {

  def main(args: Array[String]): Unit = {

    val logFile = "/opt/spark-2.0.0-bin-hadoop2.7/README.md"
    val conf = new SparkConf().setAppName("SimpleApp")
    conf.setMaster("spark://10.49.30.77:7077")
    val sc = new SparkContext(conf)

    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, lines with b: %s".format(numAs, numBs))

  }

}

I run my Spark app in my IDE (IntelliJ).

Looking at the logs on the worker node, it seems Spark cannot find the job's classes.

16/09/15 17:50:58 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1912.0 B, free 366.3 MB)
16/09/15 17:50:58 INFO TorrentBroadcast: Reading broadcast variable 1 took 137 ms
16/09/15 17:50:58 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 366.3 MB)
16/09/15 17:50:58 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassNotFoundException: SimpleApp$$anonfun$1
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)

1. Does this mean the job resources (classes) are not transmitted to the worker node?

2. In standalone mode, must I submit jobs using the spark-submit CLI? If so, how do I submit Spark jobs from an app (for example, a web app)?

3. An unrelated question: I see in the logs that the driver program starts a server on port 4040. What is the purpose of this? The driver program being the client, why does it start this service?

16/09/15 17:50:52 INFO SparkEnv: Registering OutputCommitCoordinator
16/09/15 17:50:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/09/15 17:50:53 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.49.30.77:4040

1 Answer

You should either set the paths of your application's jars in SparkConf using the setJars method, or provide them with the --jars option of the spark-submit command when running from the CLI.
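For example, a minimal sketch of the first approach, assuming the application jar was built with sbt package (the jar path below is an assumption; adjust it to whatever your build actually produces):

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()
      .setAppName("SimpleApp")
      .setMaster("spark://10.49.30.77:7077")
      // Ship the application jar to the executors so they can load the
      // job's classes. The path is a placeholder for your build output.
      .setJars(Seq("target/scala-2.11/simpleapp_2.11-1.0.jar"))
    val sc = new SparkContext(conf)

    // ... rest of the job as in the question ...

    sc.stop()
  }

}

With the jar shipped this way, the executors can load the anonymous function classes (such as SimpleApp$$anonfun$1 in the stack trace above) that the driver sends them as serialized tasks.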
