0

I am trying to insert a dataframe into a Hive table using the following code:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql._
val hiveCont = new org.apache.spark.sql.hive.HiveContext(sc)
val empfile = sc.textFile("empfile")
val empdata = empfile.map(p => p.split(","))
case class empc(id:Int, name:String, salary:Int, dept:String, location:String)
val empRDD  = empdata.map(p => empc(p(0).toInt, p(1), p(2).toInt, p(3), p(4)))
val empDF   = empRDD.toDF()
empDF.registerTempTable("emptab")

I have a table in Hive with the following DDL:

# col_name              data_type               comment             

id                      int                                         
name                    string                                      
salary                  int                                         
dept                    string                                      

# Partition Information      
# col_name              data_type               comment             

location                string           

I'm trying to insert the temporary table into the hive table as follows:

hiveCont.sql("insert into parttab select id, name, salary, dept from emptab")

This is giving an exception:

org.apache.spark.sql.AnalysisException: Table not found: emptab.

'emptab' is the temporary table created from the DataFrame.

I understand that the HiveContext runs the query against Hive from Spark and does not find the table there, which causes the exception. But I don't understand how to fix this. Could anyone tell me how to fix it?

7
  • Is table party a Hive table or a temp table created from a DataFrame? I see that you have created a temporary table named emptab from the DataFrame. Commented Jul 3, 2017 at 9:53
  • @SandeepSingh Updated the table name. Commented Jul 3, 2017 at 9:58
  • There are methods to save to a Hive table directly. I think saveAsTable and insertInto work for Spark 1.6. Did you try using them instead? Commented Jul 3, 2017 at 10:13
  • Which versions of Spark and Scala are you using? Commented Jul 3, 2017 at 10:19
  • @SandeepSingh Spark version: 1.6.0 Commented Jul 3, 2017 at 10:25

2 Answers

1

registerTempTable("emptab") creates a temporary table in Spark, not in Hive. To store data in Hive, you first have to create a table in Hive explicitly. To write the DataFrame's data to a Hive table, use the code below:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql._

val hiveCont = new org.apache.spark.sql.hive.HiveContext(sc)
val empfile = sc.textFile("empfile")
val empdata = empfile.map(p => p.split(","))
case class empc(id:Int, name:String, salary:Int, dept:String, location:String)
val empRDD  = empdata.map(p => empc(p(0).toInt, p(1), p(2).toInt, p(3), p(4)))
val empDF   = empRDD.toDF()
empDF.write.saveAsTable("emptab")
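
If the goal is to keep the SQL insert from the question, one possible sketch for Spark 1.6 is shown below. It rests on an assumption about the cause: the "Table not found" error is consistent with the temp table having been registered in a different SQLContext than hiveCont, so here the DataFrame is built through hiveCont itself. It also assumes that parttab already exists and is partitioned by location, and that enabling dynamic partitioning is acceptable in your environment. The class name Empc is only illustrative, and this has not been run against the asker's cluster.

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes a running spark-shell (Spark 1.6) where `sc` is already defined.
val hiveCont = new HiveContext(sc)

case class Empc(id: Int, name: String, salary: Int, dept: String, location: String)

val empRDD = sc.textFile("empfile")
  .map(_.split(","))
  .map(p => Empc(p(0).trim.toInt, p(1), p(2).trim.toInt, p(3), p(4)))

// Create the DataFrame through hiveCont so that the temp table is
// registered in the same context that later runs the INSERT.
val empDF = hiveCont.createDataFrame(empRDD)
empDF.registerTempTable("emptab")

// Assumption: the partition value comes from the data, so dynamic
// partitioning is switched on for this context.
hiveCont.setConf("hive.exec.dynamic.partition", "true")
hiveCont.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

// The partition column (location) goes last in the SELECT list.
hiveCont.sql(
  "INSERT INTO TABLE parttab PARTITION (location) " +
  "SELECT id, name, salary, dept, location FROM emptab")
```

The key design choice is that the temp table and the query share one context. A temp table is only visible to the context that registered it, so registering it elsewhere and querying through hiveCont would not find it.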

0

You are implicitly converting the RDD to a DataFrame, but you are not importing the implicit conversions, so the RDD is not converted. Add the line below to your imports.

// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._

Also, case classes must be defined at the top level; they cannot be nested. So your final code should look like this:

import org.apache.spark._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.DataFrame
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import sqlContext.implicits._

val hiveCont = new org.apache.spark.sql.hive.HiveContext(sc)
case class Empc(id:Int, name:String, salary:Int, dept:String, location:String)
val empFile = sc.textFile("/hdfs/location/of/data/")
val empData = empFile.map(p => p.split(","))
val empRDD = empData.map(p => Empc(p(0).trim.toInt, p(1), p(2).trim.toInt, p(3), p(4)))
val empDF = empRDD.toDF()
empDF.registerTempTable("emptab")

Also, trim all whitespace before converting a String to an Int. I have included that in the code above as well.
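
To illustrate why the trim matters, here is a minimal sketch in plain Scala, with no Spark required. The helper parseEmp is hypothetical and not part of the code above. Unlike that code, it trims every field rather than only the numeric ones, and it returns None for malformed rows instead of throwing.

```scala
import scala.util.Try

case class Empc(id: Int, name: String, salary: Int, dept: String, location: String)

// Hypothetical helper: parse one comma-separated line into an Empc.
// Returns None when the row does not have five fields or when a
// numeric field cannot be parsed.
def parseEmp(line: String): Option[Empc] = {
  val p = line.split(",", -1).map(_.trim)
  if (p.length != 5) None
  else Try(Empc(p(0).toInt, p(1), p(2).toInt, p(3), p(4))).toOption
}

// Without trim, surrounding whitespace makes toInt fail.
val untrimmed = Try(" 1 ".toInt)   // a Failure wrapping NumberFormatException
val parsed = parseEmp(" 1 , John , 5000 , IT , NY ")
```

In a Spark job, such a helper could be applied with flatMap so that malformed rows are dropped rather than failing the whole task. Whether to drop or fail is a choice that depends on your data.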

2 Comments

After struggling for a week, your answer finally helped. You may need to correct this line though "val hiveCont = val hiveCont = new org.apache.spark.sql.hive.HiveContext(sc)"
If you'd care to look at another issue of mine in Spark version 2: stackoverflow.com/questions/44888348/…
