1

I have a dataframe: yearDF obtained from reading an RDBMS table on Postgres which I need to ingest in a Hive table on HDFS.

  val yearDF = spark.read.format("jdbc").option("url", connectionUrl)
                         .option("dbtable", s"(${execQuery}) as year2017")
                         .option("user", devUserName)
                         .option("password", devPassword)
                         .option("numPartitions",10)
                         .load()

Before ingesting it, I have to add a new column: delete_flag of datatype: IntegerType to it. This column is used to mark a primary-key whether the row is deleted in the source table or not. To add a new column to an existing dataframe, I know that there is the option: dataFrame.withColumn("del_flag",someoperation) but there is no such option to specify the datatype of new column.

I have written the StructType for the new column as:

val delFlagColumn = StructType(List(StructField("delete_flag", IntegerType, true)))

But I don't understand how to add this column with the existing dataFrame: yearDF. Could anyone let me know how to add a new column along with its datatype, to an existing dataFrame ?

2
  • 3
    As long as someoperation returns a Column type, I believe the data type should be inferred. If someoperation generates a literal Integer, just wrap it in a lit() call, to make it a Column. Commented Aug 29, 2018 at 19:17
  • I understand that there is lit() to generate a constant value. Just wanted to learn if there is a way to add column with datatype. Commented Aug 30, 2018 at 6:41

1 Answer 1

2
import org.apache.spark.sql.types.IntegerType
df.withColumn("a", lit("1").cast(IntegerType)).show()

Though casting is not required if you are passing lit(1) as spark will infer the schema for you. But if you are passing as lit("1") it will cast it to Int

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.