I wrote the following function, which concatenates two string columns and adds the result to a DataFrame as a new column:
def idCol(firstCol: String, secondCol: String, IdCol: String = FUNCTIONAL_ID): DataFrame = {
df.withColumn(IdCol,concat(col(firstCol),lit("."),col(secondCol))).dropDuplicates(IdCol)
}
My aim is to replace the separate string parameters with a single array of strings, and then build the new column by concatenating all the elements of that array. I am using an array on purpose, so that the function keeps working if the number of columns to concatenate changes. Do you have any idea how to do this? The function would be changed to:
def idCol(cols:Array[String], IdCol: String = FUNCTIONAL_ID): DataFrame = {
df.withColumn(IdCol,concat(col(cols(0)),lit("."),col(cols(1)))).dropDuplicates(IdCol)
}
I want to avoid hard-coding cols(0) and cols(1), and instead apply a generic transformation that takes all the elements of the array and separates them with the character ".".
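One possible way to express this, as a minimal sketch rather than a definitive answer, is Spark's `concat_ws`, which takes a separator and a variable number of columns. The sketch below makes a few assumptions that are not in the code above: `df` is passed in as a parameter instead of being taken from the enclosing scope, the wrapper object `IdColSketch` is hypothetical, and `FUNCTIONAL_ID` is given a placeholder value because its real definition is not shown.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat_ws}

object IdColSketch {
  // Placeholder only: the real FUNCTIONAL_ID is defined elsewhere in the original code.
  val FUNCTIONAL_ID: String = "functional_id"

  // Builds the id column from every name in `cols`, joined with ".",
  // then drops rows that share the same id.
  // `df` is an explicit parameter here (an assumption made to keep the sketch self-contained).
  def idCol(df: DataFrame, cols: Array[String], IdCol: String = FUNCTIONAL_ID): DataFrame = {
    // cols.map(col) turns each column name into a Column;
    // `: _*` expands the array into concat_ws's varargs parameter.
    df.withColumn(IdCol, concat_ws(".", cols.map(col): _*))
      .dropDuplicates(IdCol)
  }
}
```

A call would then look like `IdColSketch.idCol(df, Array("a", "b", "c"))`, where the column names are made up for illustration. One behavioural difference to keep in mind: `concat` returns null as soon as any input is null, whereas `concat_ws` skips null inputs, so the two versions can produce different ids for rows containing nulls.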