
I wrote the following function, which concatenates two columns and adds the result as a new DataFrame column:

def idCol(firstCol: String, secondCol: String, IdCol: String = FUNCTIONAL_ID): DataFrame = {
  df.withColumn(IdCol, concat(col(firstCol), lit("."), col(secondCol))).dropDuplicates(IdCol)
}

My aim is to replace the separate string parameters with a single array of strings, and to define the new column by concatenating the elements of that array. I am using an array on purpose, so the collection can grow if the number of elements to concatenate changes. Do you have any idea how to do this? The function would become:

def idCol(cols: Array[String], IdCol: String = FUNCTIONAL_ID): DataFrame = {
  df.withColumn(IdCol, concat(col(cols(0)), lit("."), col(cols(1)))).dropDuplicates(IdCol)
}

I want to avoid the explicit cols(0), cols(1) and apply a generic transformation that takes all the elements of the array and separates them with the character ".".

1 Answer
You can use concat_ws which has the following definition:

def concat_ws(sep: String, exprs: Column*): Column

You need to convert your column names, which are Strings, to Column type:

import org.apache.spark.sql.functions._

def idCol(cols: Array[String], IdCol: String = FUNCTIONAL_ID): DataFrame = {
  val concatCols = cols.map(col)
  df.withColumn(IdCol, concat_ws(".", concatCols: _*)).dropDuplicates(IdCol)
}
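For intuition, the `: _*` ascription expands a Scala collection into a varargs call, and `concat_ws(".", …)` joins the column values of each row much like `mkString(".")` joins plain strings. Here is a minimal plain-Scala sketch of that mechanism (no Spark session needed; `concatWs` and the row values are hypothetical stand-ins, not Spark API):

```scala
object ConcatSketch {
  // Varargs function standing in for concat_ws: joins values with a separator,
  // mirroring how concat_ws(sep, cols: _*) joins column values per row.
  def concatWs(sep: String, values: String*): String =
    values.mkString(sep)

  def main(args: Array[String]): Unit = {
    val rowValues = Array("ABC", "123", "xyz") // hypothetical values of one row
    // The `: _*` ascription expands the array into the varargs parameter,
    // just like `concatCols: _*` in the Spark version above.
    val id = concatWs(".", rowValues: _*)
    println(id) // ABC.123.xyz
  }
}
```

The same pattern generalizes to any number of columns: because `concat_ws` takes `Column*`, the function no longer cares how many elements `cols` contains.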