
I am new to Spark and I am trying to submit the "quick-start" job from my app. I emulate standalone mode by starting a master and a worker on my localhost.

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {

  def main(args: Array[String]): Unit = {

    val logFile = "/opt/spark-2.0.0-bin-hadoop2.7/README.md"
    val conf = new SparkConf().setAppName("SimpleApp")
    conf.setMaster("spark://10.49.30.77:7077")
    val sc = new SparkContext(conf)

    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, lines with b: %s".format(numAs, numBs))

  }

}

I run my Spark app in my IDE (IntelliJ).

Looking at the logs on the worker node, it seems Spark cannot find the job's classes.

16/09/15 17:50:58 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1912.0 B, free 366.3 MB)
16/09/15 17:50:58 INFO TorrentBroadcast: Reading broadcast variable 1 took 137 ms
16/09/15 17:50:58 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 366.3 MB)
16/09/15 17:50:58 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassNotFoundException: SimpleApp$$anonfun$1
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)

1. Does this mean the job resources (classes) are not transmitted to the worker node?

2. In standalone mode, must I submit jobs using the spark-submit CLI? If so, how do I submit Spark jobs from an app (for example, a web app)?

3. An unrelated question: I see in the logs that the driver program starts a server on port 4040. What is the purpose of this? The driver program being the client, why does it start this service?

16/09/15 17:50:52 INFO SparkEnv: Registering OutputCommitCoordinator
16/09/15 17:50:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/09/15 17:50:53 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.49.30.77:4040

1 Answer

You should either set the paths of your application's jars in SparkConf using the setJars method, or provide them with the --jars option of the spark-submit command when running from the CLI.
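For example, a minimal sketch of the first approach, assuming the application jar was built with sbt package (the jar path below is an assumption; adjust it to whatever your build actually produces):

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()
      .setAppName("SimpleApp")
      .setMaster("spark://10.49.30.77:7077")
      // Ship the application jar to the executors so they can load the
      // job's classes. The path is a placeholder for your build output.
      .setJars(Seq("target/scala-2.11/simpleapp_2.11-1.0.jar"))
    val sc = new SparkContext(conf)

    // ... rest of the job as in the question ...

    sc.stop()
  }

}

With the jar shipped this way, the executors can load the anonymous function classes (such as SimpleApp$$anonfun$1 in the stack trace above) that the driver sends them as serialized tasks.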
