I have a task to validate and cleanse a decimal field. I read the file into a DataFrame and pass the decimal column through the validation.
Sample input column:
0.1
NA
123-
.54
Null
text123test
3453$
test123.49
Expected output:
0.1
0
-123
0.54
0
123
3453
123.49
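For reference, the input-to-output mapping above can be sketched as a plain Scala function, independent of Spark. This is only a sketch under assumptions that the samples do not state explicitly: the first run of digits (with an optional fractional part) is taken as the number, a value with no digits at all becomes 0, and a leading or trailing "-" makes the value negative. The names DecimalCleaner and clean are hypothetical.

```scala
object DecimalCleaner {
  // First numeric token in the text: "123", "123.49" or ".54"
  private val NumberPattern = """\d+(?:\.\d+)?|\.\d+""".r

  // Hypothetical helper: returns the cleansed value, or 0 when no digits are present
  def clean(raw: String): BigDecimal = {
    // Treat a null reference like an empty string
    val text = Option(raw).map(_.trim).getOrElse("")
    NumberPattern.findFirstIn(text) match {
      case Some(digits) =>
        val value = BigDecimal(digits)
        // Assumption: a "-" at either end of the text marks a negative value (e.g. "123-")
        if (text.startsWith("-") || text.endsWith("-")) -value else value
      case None =>
        // "NA", "Null" and other text without digits
        BigDecimal(0)
    }
  }
}
```

Applied to the sample column, DecimalCleaner.clean("123-") gives -123 and DecimalCleaner.clean("test123.49") gives 123.49. A function like this could presumably be wrapped with udf from org.apache.spark.sql.functions and applied to DECIMALFIELD, though I have not verified how that performs on large files.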
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

object decimalfieldvalidation {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    val spark = SparkSession.builder.master("local[*]").appName("Decimal Field Validation").getOrCreate()
    // Read each non-empty line of the file as one value of DECIMALFIELD
    val sourcefile = spark.read.textFile("C:/Users/phadpa01/Desktop/InputFiles/decimal.csv").filter(!_.isEmpty).toDF("DECIMALFIELD")
    // Replace known placeholder values one at a time
    val updatedDf = sourcefile.withColumn("DECIMALFIELD", regexp_replace(col("DECIMALFIELD"), "#N/A", "0"))
    val updatedDf1 = updatedDf.withColumn("DECIMALFIELD", regexp_replace(col("DECIMALFIELD"), "NA", "0"))
  }
}
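At the moment I am replacing each invalid value individually, with a separate regexp_replace call for every case. Is there a more general way to handle all of these cases? Kindly help me with this.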
Regards,
Pravin