
I imported data with Sqoop as a sequence file, and I am loading that data in spark-shell. The code Sqoop generated references classes in the com.cloudera.sqoop.lib package, so running the command below in spark-shell produces the following warnings and error:

  val ordersRDD = sc.sequenceFile("/user/pawinder/problem1-seq/orders",classOf[org.apache.hadoop.io.IntWritable],classOf[com.problem1.retaildb.orders])
    warning: Class com.cloudera.sqoop.lib.SqoopRecord not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.LargeObjectLoader not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.LargeObjectLoader not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.DelimiterSet not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.DelimiterSet not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.DelimiterSet not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.RecordParser not found - continuing with a stub.
    error: Class com.cloudera.sqoop.lib.SqoopRecord not found - continuing with a stub.

Can I instruct Sqoop to generate the code without a dependency on the Cloudera package? Do I need to add the jar containing the com.cloudera.sqoop.lib package when starting spark-shell? If so, where can I find that jar? Or should I write the value class myself so that it has no dependency on com.cloudera.sqoop.lib?

I am using the Cloudera QuickStart VM. Many thanks for your help.

EDIT: The issue is resolved by adding sqoop-1.4.6.2.6.5.0-292.jar when launching spark-shell:

 spark-shell --jars problem1/bin/orders.jar,/usr/hdp/2.6.5.0-292/sqoop/sqoop-1.4.6.2.6.5.0-292.jar

I had first tried to resolve this by defining a case class for Orders, but that did not work: the underlying Hadoop job still referenced the com.cloudera.sqoop package classes.

scala> case class Orders(order_id:Int,order_date:java.sql.Timestamp,customer_id:Int,status:String)
defined class Orders
scala> val ordersRDD = sc.sequenceFile("/user/pawinder/problem1-seq/orders",classOf[org.apache.hadoop.io.IntWritable],classOf[Orders])
 ordersRDD: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.IntWritable, Orders)] = /user/pawinder/problem1-seq/orders HadoopRDD[0] at sequenceFile at <console>:26

scala> ordersRDD.count
    19/05/14 14:29:21 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
    java.lang.NoClassDefFoundError: com/cloudera/sqoop/lib/SqoopRecord
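Once the Sqoop jar is on the classpath, one way to get rid of the generated class early is to convert each value to a plain case class right after reading. This is only a sketch: it assumes the Sqoop-generated orders record's toString renders the four fields comma-delimited in schema order (Sqoop's default delimiter set), and the Orders case class and parseOrder helper are hypothetical names, not part of the generated code.

```scala
import java.sql.Timestamp

// Hypothetical case class mirroring the orders schema from the question.
case class Orders(orderId: Int, orderDate: Timestamp, customerId: Int, status: String)

// Parse one record's delimited string form into the case class.
// Assumes Sqoop's default comma delimiter and no embedded commas in fields.
def parseOrder(line: String): Orders = {
  val Array(id, date, cust, status) = line.split(",", -1)
  Orders(id.toInt, Timestamp.valueOf(date), cust.toInt, status)
}

// In spark-shell this would be applied as (not runnable outside Spark):
// val orders = ordersRDD.map { case (_, rec) => parseOrder(rec.toString) }
```

After this map, the RDD holds only plain case-class instances, so later stages no longer need the Sqoop-generated class or its com.cloudera.sqoop.lib dependencies on the executors.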
  • Can you tell me how to do the same in pyspark? Commented Jan 29, 2020 at 14:20
  • Check this link to add jar files to pyspark. stackoverflow.com/questions/27698111/… Commented Jan 31, 2020 at 14:29
  • Even though I am adding the jar files, when I enter the class name in the value position (i.e. while reading the sequence file), pyspark still throws an error that the class name is not defined. Commented Jan 31, 2020 at 15:01
