
I am trying to use Spark as the Hive execution engine, but I am getting the error below. Spark 1.5.0 is installed, and I am working with Hive 1.1.0 and Hadoop 2.7.0.

The hive_emp table is created as an ORC-format table in Hive.
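For reference, the DDL was roughly along these lines (a sketch only, not the exact statement I ran; the column types match the desc hive_emp output further below):

    CREATE TABLE hive_emp (
      empid  INT,
      empnm  VARCHAR(50),
      deptid INT
    )
    STORED AS ORC;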

hive (Koushik)> insert into table hive_emp values (2,'Koushik',1);
Query ID = hduser_20150921072727_feba8363-258d-4d0b-8976-662e404bca88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:140)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:56)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    ... 25 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org/apache/spark/SparkConf

I also set the Spark path and the execution engine in the Hive shell.

hduser@ubuntu:~$ spark-shell
    Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> exit;
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
hduser@ubuntu:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/lib/hive/conf/hive-log4j.properties
hive (default)> use Koushik;
OK
Time taken: 0.593 seconds
hive (Koushik)> set spark.home=/usr/local/src/spark;

I have also created a .hiverc file as below:

hduser@ubuntu:/usr/lib/hive/conf$ cat .hiverc
SET hive.cli.print.header=true;
set hive.cli.print.current.db=true;
set hive.auto.convert.join=true;
SET hbase.scan.cacheblock=0;
SET hbase.scan.cache=10000;
SET hbase.client.scanner.cache=10000;
SET hive.execution.engine=spark;

The DEBUG mode error details are given below:

hduser@ubuntu:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/auxlib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/lib/hive/conf/hive-log4j.properties
hive (default)> use Koushik;
OK
Time taken: 0.625 seconds
hive (Koushik)> set hive --hiveconf hive.root.logger=DEBUG
              > ;
hive (Koushik)> set hive.execution.engine=spark;
hive (Koushik)> desc hive_emp;
OK
col_name    data_type   comment
empid                   int                                         
empnm                   varchar(50)                                 
deptid                  int                                         
Time taken: 0.173 seconds, Fetched: 3 row(s)
hive (Koushik)> select * from hive_emp;
OK
hive_emp.empid  hive_emp.empnm  hive_emp.deptid
Time taken: 1.689 seconds
hive (Koushik)> insert into table hive_emp values (2,'Koushik',1);
Query ID = hduser_20151015112525_c96a458b-34f8-42ac-ab11-52c32479a29a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
    at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.<init>(LocalHiveSparkClient.java:85)
    at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.getInstance(LocalHiveSparkClient.java:69)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:56)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:116)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:113)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:95)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
hive (Koushik)> 

I have executed the above insert twice and it failed both times. Please find the hive.log generated today: hive.log

  • Where can I check which Spark version is compatible with Hive 1.1.0 and Hadoop 2.7.0? Commented Oct 23, 2015 at 5:32
  • I am facing the exact same issue. Did you get a resolution yet? Commented Nov 6, 2015 at 15:00

4 Answers


The reason for this error is that Hive is not able to find the Spark assembly jar.

Either export SPARK_HOME=/usr/local/src/spark, or add the Spark assembly jar to the Hive lib folder. This will resolve the issue.
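A rough sketch of both options (the paths and jar name are taken from the question's own logs and assume a pre-built Spark 1.5.0 distribution, where the assembly sits under $SPARK_HOME/lib; adjust to your layout):

    # option 1: point Hive at the Spark install before starting the Hive CLI
    export SPARK_HOME=/usr/local/src/spark

    # option 2: put the Spark assembly jar on Hive's classpath
    cp $SPARK_HOME/lib/spark-assembly-1.5.0-hadoop2.6.0.jar /usr/lib/hive/lib/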


5 Comments

I got this error after adding the Spark assembly to the Hive lib folder: hive (Koushik)> insert into table hive_emp values (2,'Koushik',1); Query ID = hduser_20151012230101_83f8304d-868a-4186-9380-416c1de40f45 Total jobs = 1 Launching Job 1 out of 1 Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Can you run Hive in debug mode (hive --hiveconf hive.root.logger=DEBUG,console) and provide more details about the error?
I appended the DEBUG mode error details to the original post. Please see above.
I have also provided hive.log (https://drive.google.com/file/d/0B_Ed4jUfln0SNkhwNnVUMG9neGs/view?usp=sharing), which was generated today after executing the above insert twice.
I think in Spark 2.0 there is no spark-assembly jar in the lib/jars folder, so that is why you got that error.

I was facing the same issue on my Ubuntu 14.04 VirtualBox. Here are the steps I followed to fix it:

  1. hive> set spark.home=/usr/local/spark;

  2. hive> set spark.master=local;

  3. hive> SET hive.execution.engine=spark;

  4. Added the spark-assembly jar file as shown below:

    hive> ADD jar /usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar;
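After these, you can confirm the session actually picked up the settings before retrying the insert; in the Hive CLI, set <property>; with no value simply echoes the current value:

    hive> set spark.home;
    hive> set spark.master;
    hive> set hive.execution.engine;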

7 Comments

I followed the steps, but it didn't resolve the issue for me. The execution error and hive.log for this run can be found here: hive.log
In addition to following the steps described in the answer, try running this (I forgot to add it): export SPARK_HOME=/usr/local/src/spark
Does 'spark-assembly-1.5.0-hadoop2.6.0.jar' exist in the /hive/lib folder?
With or without the jar placed in the hive/lib folder I am getting an error. When the jar is placed there, the error is: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org.apache.spark.scheduler.LiveListenerBus.addListener(Lorg/apache/spark/scheduler/SparkListener;)V
When you followed the steps mentioned in the answer, was spark-assembly-1.5.0-hadoop2.6.0.jar in the /hive/lib folder, or did you follow the steps without spark-assembly-1.5.0-hadoop2.6.0.jar in /hive/lib?

Like you, I encountered the same problem when deploying Hive on Spark. After some research, I found it was because Hive could not load the Spark jars, so I made the following changes to hive-env.sh.

Add the following to hive-env.sh:

# Pay attention to your Spark path

export SPARK_HOME=/opt/module/spark-2.4.5-bin-without-hive
export SPARK_JARS=""
for jar in `ls $SPARK_HOME/jars`; do
    export SPARK_JARS=$SPARK_JARS:$SPARK_HOME/jars/$jar
done
export HIVE_AUX_JARS_PATH=$SPARK_JARS

This happens because your Hive did not load Spark's jars at startup, so configuring the environment in hive-env.sh is all that is needed. Pay attention to the paths here; you can also leave out the lzo jar on the last line below and just use the configuration above (which is the same, minus lzo):

export SPARK_HOME=/opt/module/spark-2.4.5-bin-without-hive
export SPARK_JARS=""
for jar in `ls $SPARK_HOME/jars`; do
    export SPARK_JARS=$SPARK_JARS:$SPARK_HOME/jars/$jar
done
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar$SPARK_JARS
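To sanity-check that the loop built the jar list you expect, you can source the file and inspect the variable (the conf path below is only an example; use wherever your hive-env.sh lives):

    source /usr/lib/hive/conf/hive-env.sh
    echo "$HIVE_AUX_JARS_PATH" | tr ':' '\n' | head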



I was running into the same issue, and it was because Hive is not able to find the Spark files. There is a well-detailed set of steps if you are running Spark on YARN; I followed it for Spark 2.3 on YARN 3.0 with Hive 3.1:

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

To run with YARN mode (yarn-client or yarn-cluster), link the following jars to HIVE_HOME/lib.

    scala-library
    spark-core
    spark-network-common

The steps I used:

  • Got all the files for my version from /usr/hdp/current/spark2-client/jars.
  • Copied those files from the Spark directory to /usr/hdp/current/hive-client.
  • Created symlinks to those files in the same directory, named without the version suffix and ending in just .jar (see the sketch below).
  • Restarted hiveserver2 to load the new files.
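A rough sketch of the copy-and-symlink steps, reduced to just the three jars the wiki page names (the HDP-style paths are taken from this answer, and the hive-client/lib target is an assumption; adjust to your cluster):

    SPARK_JARS_DIR=/usr/hdp/current/spark2-client/jars
    HIVE_LIB_DIR=/usr/hdp/current/hive-client/lib

    for name in scala-library spark-core spark-network-common; do
        # the shipped files carry a version suffix, e.g. spark-core_2.11-2.3.0.jar ...
        src=$(ls $SPARK_JARS_DIR/${name}*.jar | head -1)
        cp "$src" "$HIVE_LIB_DIR/"
        # ... so add a version-less symlink that matches the plain name Hive looks for
        ln -sf "$HIVE_LIB_DIR/$(basename "$src")" "$HIVE_LIB_DIR/${name}.jar"
    done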

It worked; now Hive can load the files, and I can submit a Hive query as a Spark job on YARN. Note that the copied files keep a version suffix at the end, which is why the version-less symlinks are needed.

