
I am trying to use Spark as the Hive execution engine, but I am getting the error below. Spark 1.5.0 is installed, and I am working with Hive 1.1.0 and Hadoop 2.7.0.

The hive_emp table is created as an ORC-format table in Hive.
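For reference, the DDL was roughly along these lines (a sketch only, not the exact statement I ran; the column types match the desc hive_emp output further below):

    CREATE TABLE hive_emp (
      empid  INT,
      empnm  VARCHAR(50),
      deptid INT
    )
    STORED AS ORC;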

hive (Koushik)> insert into table hive_emp values (2,'Koushik',1);
Query ID = hduser_20150921072727_feba8363-258d-4d0b-8976-662e404bca88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:140)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:56)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    ... 25 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org/apache/spark/SparkConf

I also set the Spark path and the execution engine in the Hive shell.

hduser@ubuntu:~$ spark-shell
    Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> exit;
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
hduser@ubuntu:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/lib/hive/conf/hive-log4j.properties
hive (default)> use Koushik;
OK
Time taken: 0.593 seconds
hive (Koushik)> set spark.home=/usr/local/src/spark;

I have also created a .hiverc file as below:

hduser@ubuntu:/usr/lib/hive/conf$ cat .hiverc
SET hive.cli.print.header=true;
set hive.cli.print.current.db=true;
set hive.auto.convert.join=true;
SET hbase.scan.cacheblock=0;
SET hbase.scan.cache=10000;
SET hbase.client.scanner.cache=10000;
SET hive.execution.engine=spark;

The DEBUG mode error details are given below:

hduser@ubuntu:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/lib/hive/conf/hive-log4j.properties
hive (default)> use Koushik;
OK
Time taken: 0.625 seconds
hive (Koushik)> set hive --hiveconf hive.root.logger=DEBUG
              > ;
hive (Koushik)> set hive.execution.engine=spark;
hive (Koushik)> desc hive_emp;
OK
col_name    data_type   comment
empid                   int                                         
empnm                   varchar(50)                                 
deptid                  int                                         
Time taken: 0.173 seconds, Fetched: 3 row(s)
hive (Koushik)> select * from hive_emp;
OK
hive_emp.empid  hive_emp.empnm  hive_emp.deptid
Time taken: 1.689 seconds
hive (Koushik)> insert into table hive_emp values (2,'Koushik',1);
Query ID = hduser_20151015112525_c96a458b-34f8-42ac-ab11-52c32479a29a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
    at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.<init>(LocalHiveSparkClient.java:85)
    at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.getInstance(LocalHiveSparkClient.java:69)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:56)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
hive (Koushik)> 

I have executed the above insert twice and it failed both times. Please find the hive.log generated today: hive.log

  • Where can I check which Spark version is compatible with Hive 1.1.0 and Hadoop 2.7.0? Commented Oct 23, 2015 at 5:32
  • I am facing the exact same issue. Did you get a resolution yet? Commented Nov 6, 2015 at 15:00

4 Answers


The reason for this error is that Hive is not able to find the Spark assembly jar.

Either export SPARK_HOME=/usr/local/src/spark, or add the Spark assembly jar to the Hive lib folder. This will resolve the issue.
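A rough sketch of both options (the paths and jar name are taken from the question's own logs and assume a pre-built Spark 1.5.0 distribution, where the assembly sits under $SPARK_HOME/lib; adjust to your layout):

    # option 1: point Hive at the Spark install before starting the Hive CLI
    export SPARK_HOME=/usr/local/src/spark

    # option 2: put the Spark assembly jar on Hive's classpath
    cp $SPARK_HOME/lib/spark-assembly-1.5.0-hadoop2.6.0.jar /usr/lib/hive/lib/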


5 Comments

I got this error after adding the Spark assembly to the Hive lib folder: hive (Koushik)> insert into table hive_emp values (2,'Koushik',1); Query ID = hduser_20151012230101_83f8304d-868a-4186-9380-416c1de40f45 Total jobs = 1 Launching Job 1 out of 1 Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Can you run Hive in debug mode (hive --hiveconf hive.root.logger=DEBUG,console) and provide more details about the error?
I appended the DEBUG mode error details to the original post. Please see above.
I have also provided hive.log (https://drive.google.com/file/d/0B_Ed4jUfln0SNkhwNnVUMG9neGs/view?usp=sharing), which was generated today after executing the above insert twice.
I think in Spark 2.0 there is no spark-assembly jar in the lib/jars folder, so that is why you got that error.

I was facing the same issue on my Ubuntu 14.04 VirtualBox. Here are the steps I followed to fix it:

  1. hive> set spark.home=/usr/local/spark;

  2. hive> set spark.master=local;

  3. hive> SET hive.execution.engine=spark;

  4. Added the spark-assembly jar file as shown below:

    hive> ADD jar /usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar;
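After these, you can confirm the session actually picked up the settings before retrying the insert; in the Hive CLI, set <property>; with no value simply echoes the current value:

    hive> set spark.home;
    hive> set spark.master;
    hive> set hive.execution.engine;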

7 Comments

I followed the steps, but it didn't resolve the issue for me. The execution error and hive.log for this run can be found here: hive.log
In addition to following the steps described in the answer, try running this (I forgot to add it): export SPARK_HOME=/usr/local/src/spark
Does 'spark-assembly-1.5.0-hadoop2.6.0.jar' exist in the /hive/lib folder?
With or without the jar placed in the hive/lib folder I am getting an error. When the jar is placed there, the error is: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
When you followed the steps mentioned in the answer, was spark-assembly-1.5.0-hadoop2.6.0.jar in the /hive/lib folder, or did you follow the steps without spark-assembly-1.5.0-hadoop2.6.0.jar in /hive/lib?

Like you, I encountered the same problem when deploying Hive on Spark. After some research, I found it was because Hive could not load the Spark jars, so I made the following changes to hive-env.sh.

Add the following to hive-env.sh:

# Pay attention to your Spark path

export SPARK_HOME=/opt/module/spark-2.4.5-bin-without-hive
export SPARK_JARS=""
for jar in `ls $SPARK_HOME/jars`; do
    export SPARK_JARS=$SPARK_JARS:$SPARK_HOME/jars/$jar
done
export HIVE_AUX_JARS_PATH=$SPARK_JARS

This happens because your Hive did not load Spark's jars at startup, so configuring the environment in hive-env.sh is all that is needed. Pay attention to the paths here; you can also leave out the lzo jar on the last line below and just use the configuration above (which is the same, minus lzo):

export SPARK_HOME=/opt/module/spark-2.4.5-bin-without-hive
export SPARK_JARS=""
for jar in `ls $SPARK_HOME/jars`; do
    export SPARK_JARS=$SPARK_JARS:$SPARK_HOME/jars/$jar
done
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar$SPARK_JARS
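To sanity-check that the loop built the jar list you expect, you can source the file and inspect the variable (the conf path below is only an example; use wherever your hive-env.sh lives):

    source /usr/lib/hive/conf/hive-env.sh
    echo "$HIVE_AUX_JARS_PATH" | tr ':' '\n' | head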



I was running into the same issue, and it was because Hive is not able to find the Spark files. There is a well-detailed set of steps if you are running Spark on YARN; I followed it for Spark 2.3 on YARN 3.0 with Hive 3.1:

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

To run with YARN mode (yarn-client or yarn-cluster), link the following jars to HIVE_HOME/lib.

    scala-library
    spark-core
    spark-network-common

The steps I used:

  • Got all the files for my version from /usr/hdp/current/spark2-client/jars.
  • Copied those files from the Spark directory to /usr/hdp/current/hive-client.
  • Created symlinks to those files in the same directory, named without the version suffix and ending in just .jar (see the sketch below).
  • Restarted hiveserver2 to load the new files.
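A rough sketch of the copy-and-symlink steps, reduced to just the three jars the wiki page names (the HDP-style paths are taken from this answer, and the hive-client/lib target is an assumption; adjust to your cluster):

    SPARK_JARS_DIR=/usr/hdp/current/spark2-client/jars
    HIVE_LIB_DIR=/usr/hdp/current/hive-client/lib

    for name in scala-library spark-core spark-network-common; do
        # the shipped files carry a version suffix, e.g. spark-core_2.11-2.3.0.jar ...
        src=$(ls $SPARK_JARS_DIR/${name}*.jar | head -1)
        cp "$src" "$HIVE_LIB_DIR/"
        # ... so add a version-less symlink that matches the plain name Hive looks for
        ln -sf "$HIVE_LIB_DIR/$(basename "$src")" "$HIVE_LIB_DIR/${name}.jar"
    done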

It worked; now Hive can load the files, and I can submit a Hive query as a Spark job on YARN. Note that the copied files keep a version suffix at the end, which is why the version-less symlinks are needed.

