
I am trying to get the compression to work.

The original table is defined as:

create external table orig_table (col1 String ...... coln String) 
.
.
.
partitioned by (pdate string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( "separatorChar" = "|")
STORED AS TEXTFILE location '/user/path/to/table/';

The table orig_table has about 10 partitions with 100 rows each.

To compress it, I created a similar table, the only change being TEXTFILE to ORCFILE:

create external table orig_table_orc (col1 String ...... coln String) 
.
.
.
partitioned by (pdate string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( "separatorChar" = "|")
STORED AS ORCFILE location '/user/path/to/table/';

I am trying to copy the records across with:

set hive.exec.dynamic.partition.mode=nonstrict;
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec;
-- I have tried other codecs as well, with the same error
set mapred.output.compression.type=RECORD;
insert overwrite table zip_test.orig_table_orc partition(pdate) select * from default.orig_table;

The error I get is:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col1":value ... "coln":value}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
        ... 8 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:81)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:689)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
        ... 9 more

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

The same thing works if I make the Hive table a SEQUENCEFILE, but not with ORC. Is there any workaround? I have seen a couple of questions with the same error, but in a Java program rather than in Hive QL.

1 Answer


Gaah! ORC is nothing like CSV!!!

Explaining what you did wrong would take a couple of hours and a good many book excerpts about Hadoop and about DB technology in general, so the short answer is: ROW FORMAT and SERDE do not make sense for a columnar format. And since you are populating that table from within Hive, it should be a "managed" table rather than an EXTERNAL one, IMHO.

create table orig_table_orc (col1 String ...... coln String)
partitioned by (pdate string)
stored as ORC
location '/where/ever/you/want'
tblproperties ("orc.compress"="ZLIB");
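
To repopulate it, a dynamic-partition insert like the one in the question should then work as-is; the mapred.output.compress settings are not needed, since ORC compresses its own files according to orc.compress. A minimal sketch, reusing the database and table names from the question:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- pdate is the partition column, so it is the last column produced by select *
insert overwrite table zip_test.orig_table_orc partition(pdate)
select * from default.orig_table;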

4 Comments

Thanks @Samson, I figured that out yesterday. I have now created an ORC table without the SerDe properties and it works fine. Could you maybe give some links or the books you refer to? I wouldn't mind spending a couple of days understanding.
Start with slideshare.net/oom65/orc-andvectorizationhadoopsummit (a bit old, does not cover recent features e.g. "streaming" inserts) then delve into cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC
If you have demanding perf requirements, look into streever.atlassian.net/wiki/display/HADOOP/… (tuning ORC table for hot/cold data) and thinkbig.teradata.com/… (setting bytes.per.reducer to match your compression ratio)
For some background on the concepts behind the ORC design, and specifically the "stripes", read up on Infobright "data packs" and Netezza "zone maps" (and, incidentally, on the way Oracle Exadata does "smart scan").
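
For reference, the tuning hooks mentioned in these comments are ordinary table properties and session settings. A minimal sketch with purely illustrative values (SNAPPY and 256 MB here are assumptions, not recommendations):

-- switch the codec used for ORC files written from now on
alter table orig_table_orc set tblproperties ("orc.compress"="SNAPPY");

-- size reducers against the post-compression data volume (bytes)
set hive.exec.reducers.bytes.per.reducer=268435456;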
