I have a Spark DataFrame with more than 100 columns, and I would like to convert all of its DoubleType columns to DecimalType(18,5). I am able to convert one specific datatype to another in the following way:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataType, DecimalType, DoubleType}

def castAllTypedColumnsTo(inputDF: DataFrame, sourceType: DataType): DataFrame = {
  val targetType = sourceType match {
    case DoubleType => DecimalType(18, 5)
    case _          => sourceType
  }
  // cast every column whose type matches sourceType to targetType
  inputDF.schema.filter(_.dataType == sourceType).foldLeft(inputDF) {
    case (acc, col) => acc.withColumn(col.name, inputDF(col.name).cast(targetType))
  }
}
// toDF needs the implicits from an active SparkSession (spark is predefined in spark-shell)
import spark.implicits._

val inputDF = Seq((1, 1.0), (2, 2.0)).toDF("id", "amount")
inputDF.printSchema()

root
 |-- id: integer (nullable = true)
 |-- amount: double (nullable = true)

val finalDF: DataFrame = castAllTypedColumnsTo(inputDF, DoubleType)
finalDF.printSchema()

root
 |-- id: integer (nullable = true)
 |-- amount: decimal(18,5) (nullable = true)
Here I'm filtering the DoubleType columns and casting them to DecimalType(18,5). Now suppose I also want to convert other datatypes; how can I implement that without passing the datatype as an input parameter?
I was expecting something like this:
def convertDataType(inputDF: DataFrame): DataFrame = {
  inputDF.dtypes.map {
    case (colName, colType) => (colName, colType match {
      case "DoubleType" => DecimalType(18, 5).toString
      case _            => colType
    })
  }
  // finalDF to be created with the new datatypes.
}
val finalDF = convertDataType(inputDF)
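One more concrete direction I've been considering is to keep every conversion in a map keyed by source type and build all the casts in a single select, so that no datatype has to be passed in. The typeConversions map and the convertDataTypes name below are just placeholders of mine, and I'm not sure whether this is correct or idiomatic:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, DecimalType, DoubleType}

// Placeholder map of all the conversions I want to apply; more entries could be added here.
val typeConversions: Map[DataType, DataType] = Map(
  DoubleType -> DecimalType(18, 5)
)

def convertDataTypes(inputDF: DataFrame): DataFrame = {
  // one cast expression per column; columns without a mapping are selected unchanged
  val castedColumns = inputDF.schema.fields.map { field =>
    typeConversions.get(field.dataType) match {
      case Some(targetType) => col(field.name).cast(targetType).as(field.name)
      case None             => col(field.name)
    }
  }
  inputDF.select(castedColumns: _*)
}

My thinking is that a single select would also avoid chaining 100+ withColumn calls, but I don't know if this is the right approach.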
Can someone help me handle this scenario?