
I was trying to follow the Spark standalone application example described here: https://spark.apache.org/docs/latest/quick-start.html#standalone-applications

The example ran fine with the following invocation:

spark-submit  --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar

However, when I tried to introduce some third-party libraries via --jars, it threw a ClassNotFoundException.

$ spark-submit --jars /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/* \
  --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.ClassNotFoundException: SimpleApp
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:300)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

If I remove the --jars option, the program runs again (I haven't actually started using those libraries yet). What's the problem here? How should I add the external jars?

2 Answers


According to spark-submit's --help, the --jars option expects a comma-separated list of local jars to include on the driver and executor classpaths.

I think that what's happening here is that /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/* is expanding into a space-separated list of jars and the second JAR in the list is being treated as the application jar.
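To make that concrete, here is roughly what spark-submit ends up seeing once the shell has expanded the glob (the jar names below are hypothetical, just to illustrate the expansion):

# After the shell expands the glob, spark-submit is effectively invoked as:
spark-submit --jars target/pack/lib/a.jar target/pack/lib/b.jar target/pack/lib/c.jar \
  --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar
# --jars consumes only a.jar; b.jar is then taken as the application jar, and
# the real application jar (simple-project_2.10-1.0.jar) ends up as a plain
# program argument, so SimpleApp is never on the classpath when it is loaded.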

One solution is to use your shell to build a comma-separated list of jars; here's a quick way of doing it in bash, based on this answer on StackOverflow (see that answer for more complex approaches that handle filenames that contain spaces):

spark-submit --jars $(echo /dir/of/jars/*.jar | tr ' ' ',') \
    --class "SimpleApp" --master local[4] path/to/myApp.jar
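Applied to the directory from your command, that would look something like this (assuming everything under target/pack/lib is a regular .jar file and none of the names contain spaces):

spark-submit --jars $(echo /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/*.jar | tr ' ' ',') \
    --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar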

5 Comments

Thanks Josh! That was the problem.
I just spent some time with my boss even tracing into the Scala source and we didn't figure this out. THANKS!!! Could this not be a standalone question and answer? I will submit a bug against the spark-submit docs page.
@jimlohse Sure! You can even submit a pull request yourself in order to update the documentation; see cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
@JoshRosen I noticed that even in the 1.6.0 docs spark.apache.org/docs/latest/submitting-applications.html the --repositories and --packages parameters specify commas, but it's not clear for --jars. Thanks; I submitted a pull request with the suggested edit and posted this question here on SO: stackoverflow.com/questions/34738296/…
You saved my day :)

Is your SimpleApp class in any specific package? It seems that you need to include the full package name in the command line. So, if the SimpleApp class is located in com.yourcompany.yourpackage, you'd have to submit the Spark job with --class "com.yourcompany.yourpackage.SimpleApp" instead of --class "SimpleApp". I had the same problem and changing the name to the full package and class name fixed it. Hope that helps!
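A minimal sketch of what that looks like, assuming (purely as a placeholder) that SimpleApp.scala starts with package com.yourcompany.yourpackage:

spark-submit --class "com.yourcompany.yourpackage.SimpleApp" --master local[4] \
    target/scala-2.10/simple-project_2.10-1.0.jar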

1 Comment

No. It was in the default package. I just tried putting it in a specific package, but still no luck. As I stated, it only failed when I tried to add the third-party libraries with the --jars option.
