
I am trying to use a Jersey REST API to fetch records from an HBase table through a Java Spark program, and I get the error below. However, when I access the HBase table by submitting the Spark jar directly, the code executes without errors.

I have 2 worker nodes for HBase and 2 worker nodes for Spark, managed by the same master.

WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.31.16.140): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
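For reference, this failure typically surfaces from a Spark-over-HBase read like the following. This is only a minimal sketch, not the original poster's code: the table name is a placeholder, and TableInputFormat comes from the hbase-server artifact.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseRead {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("HBaseRead"));

        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table"); // placeholder table name

        // The executors must deserialize the task, which references HBase
        // classes such as TableInputFormat and Result; if the HBase jars are
        // absent from the executor classpath, deserialization fails with the
        // "unread block data" IllegalStateException shown above.
        JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
                hbaseConf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);

        System.out.println("rows: " + rows.count());
        sc.stop();
    }
}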

  • Can you provide the code you have written as well? The question is not informative enough. – Commented Jan 20, 2016 at 16:16

3 Answers


OK, I think I know your problem, because I have just experienced it myself.

The cause is very likely some missing HBase jars. While the Spark job is running, the executors need the HBase jars to read the data; if they are not on the classpath, exceptions like this are thrown. The fix is easy.

Before submitting the job, add the --jars parameter and include the following jars:

--jars /ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,
/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/guava-12.0.1.jar,
/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/htrace-core-2.04.jar
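For example, a complete spark-submit invocation might look like the following. The application class, master URL, and application jar name are placeholders; note that the --jars value must be a single comma-separated list with no spaces:

spark-submit --class com.example.HBaseRead \
  --master spark://master:7077 \
  --jars /ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/guava-12.0.1.jar,/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/htrace-core-2.04.jar \
  my-app.jar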

Hope it helps. Enjoy!


1 Comment

I am using a REST API which calls the above function to fetch the data from HBase through Spark, so please let me know how to pass these jars. I tried to set the jars in spark-env.sh, but it is not working:

SPARK_CLASSPATH=/hbase-1.1.2/lib/hbase-protocol-1.1.2.jar:/hbase-1.1.2/lib/hbase-common-1.1.2.jar:/hbase-1.1.2/lib/htrace-core-3.1.0-incubating.jar:/hbase-1.1.2/lib/hbase-server-1.1.2.jar:/hbase-1.1.2/lib/hbase-client-1.1.2.jar:/hbase/hive-1.2.1/lib/hive-hbase-handler-1.2.1.jar:/hive-1.2.1/lib/hive-common-1.2.1.jar:/hive-1.2.1/lib/hive-exec-1.2.1.jar
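A note on the comment above: SPARK_CLASSPATH is deprecated in Spark 1.x. When the SparkContext is created inside a long-running service rather than via spark-submit, one possible programmatic equivalent (a sketch, not tested against the poster's setup; the app name is hypothetical, and the paths are the ones from the comment, assumed to exist on the driver machine) is to list the jars on the SparkConf before the context is built:

// Distribute the HBase jars to the executors when no spark-submit is involved.
SparkConf conf = new SparkConf()
        .setAppName("RestHBaseRead") // hypothetical app name
        .setJars(new String[]{
                "/hbase-1.1.2/lib/hbase-protocol-1.1.2.jar",
                "/hbase-1.1.2/lib/hbase-common-1.1.2.jar",
                "/hbase-1.1.2/lib/htrace-core-3.1.0-incubating.jar",
                "/hbase-1.1.2/lib/hbase-server-1.1.2.jar",
                "/hbase-1.1.2/lib/hbase-client-1.1.2.jar"
        });
JavaSparkContext sc = new JavaSparkContext(conf);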

I've met the same problem on CDH 5.4.0 when submitting a Spark job implemented with the Java API. Here are my solutions:

Solution 1: using spark-submit:

--jars zookeeper-3.4.5-cdh5.4.0.jar,
hbase-client-1.0.0-cdh5.4.0.jar,
hbase-common-1.0.0-cdh5.4.0.jar,
hbase-server-1.0.0-cdh5.4.0.jar,
hbase-protocol-1.0.0-cdh5.4.0.jar,
htrace-core-3.1.0-incubating.jar,
// plus any custom jars that are needed on the Spark executors

Solution 2: set the jars with SparkConf in code:

SparkConf conf = new SparkConf().setAppName("HBaseRead");
conf.setJars(new String[]{"zookeeper-3.4.5-cdh5.4.0.jar",
        "hbase-client-1.0.0-cdh5.4.0.jar",
        "hbase-common-1.0.0-cdh5.4.0.jar",
        "hbase-server-1.0.0-cdh5.4.0.jar",
        "hbase-protocol-1.0.0-cdh5.4.0.jar",
        "htrace-core-3.1.0-incubating.jar",
        // plus any custom jars that are needed on the Spark executors
});
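One caveat worth noting: setJars ships the listed jars to the executors, but it does not alter the classpath of the already-running driver JVM, so the driver itself still needs these classes available (for example, as dependencies bundled with your application).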

To summarize: the problem is caused by jars missing from the Spark project's runtime classpath. You need to add these jars to your project classpath, and additionally use one of the two solutions above to distribute them to your Spark cluster.



CDP/CDH:

Step 1: Copy the hbase-site.xml file into the /etc/spark/conf/ directory:

cp /opt/cloudera/parcels/CDH/lib/hbase/conf/hbase-site.xml /etc/spark/conf/

Step 2: Add the following libraries to spark-submit/spark-shell:

/opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar
/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar
/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar

Spark-shell:

sudo -u hive spark-shell --master yarn \
  --jars /opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar,/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar,/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar \
  --files /etc/spark/conf/hbase-site.xml
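The same flags should carry over if you submit an application jar instead of starting an interactive shell. A sketch, where the class name and application jar are placeholders:

sudo -u hive spark-submit --master yarn \
  --class com.example.HBaseRead \
  --jars <the same comma-separated jar list as above> \
  --files /etc/spark/conf/hbase-site.xml \
  my-app.jar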

