
I wrote the following function, which concatenates two columns and adds the result as a new DataFrame column:

def idCol(firstCol: String, secondCol: String, IdCol: String = FUNCTIONAL_ID): DataFrame = {
  df.withColumn(IdCol, concat(col(firstCol), lit("."), col(secondCol))).dropDuplicates(IdCol)
}

My aim is to replace the separate string parameters with a single array of strings, and to define the new column by concatenating the elements of that array. I am using an array on purpose, so the collection can grow if the number of elements to concatenate changes. Do you have any idea how to do this? The function would become:

def idCol(cols: Array[String], IdCol: String = FUNCTIONAL_ID): DataFrame = {
  df.withColumn(IdCol, concat(col(cols(0)), lit("."), col(cols(1)))).dropDuplicates(IdCol)
}

I want to avoid the explicit cols(0), cols(1) and apply a generic transformation that takes all the elements of the array and separates them with the character ".".

1 Answer
You can use concat_ws which has the following definition:

def concat_ws(sep: String, exprs: Column*): Column

You need to convert your column names, which are Strings, to Column type:

import org.apache.spark.sql.functions._

def idCol(cols: Array[String], IdCol: String = FUNCTIONAL_ID): DataFrame = {
  val concatCols = cols.map(col)
  df.withColumn(IdCol, concat_ws(".", concatCols: _*)).dropDuplicates(IdCol)
}
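For intuition, the `: _*` ascription expands a Scala collection into a varargs call, and `concat_ws(".", …)` joins the column values of each row much like `mkString(".")` joins plain strings. Here is a minimal plain-Scala sketch of that mechanism (no Spark session needed; `concatWs` and the row values are hypothetical stand-ins, not Spark API):

```scala
object ConcatSketch {
  // Varargs function standing in for concat_ws: joins values with a separator,
  // mirroring how concat_ws(sep, cols: _*) joins column values per row.
  def concatWs(sep: String, values: String*): String =
    values.mkString(sep)

  def main(args: Array[String]): Unit = {
    val rowValues = Array("ABC", "123", "xyz") // hypothetical values of one row
    // The `: _*` ascription expands the array into the varargs parameter,
    // just like `concatCols: _*` in the Spark version above.
    val id = concatWs(".", rowValues: _*)
    println(id) // ABC.123.xyz
  }
}
```

The same pattern generalizes to any number of columns: because `concat_ws` takes `Column*`, the function no longer cares how many elements `cols` contains.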