
I have a sequence of strings:

val listOfString : Seq[String] = Seq("a","b","c")

How can I write a transform like the following?

def addColumn(example: Seq[String]): DataFrame => DataFrame = {
  // some code which returns a transform that adds these strings as columns to a DataFrame
}
input

+---+
| id|
+---+
|  1|
+---+

output

+---+---+---+---+
| id|  a|  b|  c|
+---+---+---+---+
|  1|  0|  0|  0|
+---+---+---+---+

I am specifically interested in expressing this as a transform (a DataFrame => DataFrame function).


2 Answers


You can use the transform method of Dataset together with a single select statement:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

def addColumns(extraCols: Seq[String])(df: DataFrame): DataFrame = {
  // keep the existing columns and append each extra name as a literal-0 column
  val selectCols = df.columns.map(col(_)) ++ extraCols.map(c => lit(0).as(c))
  df.select(selectCols: _*)
}


// usage example
val yourExtraColumns : Seq[String] = Seq("a","b","c")

df.transform(addColumns(yourExtraColumns))

Resources

https://towardsdatascience.com/dataframe-transform-spark-function-composition-eb8ec296c108

https://mungingdata.com/apache-spark/chaining-custom-dataframe-transformations/


2 Comments

Thanks for posting, but can you make an intermediate function with type DataFrame => DataFrame? The reason is that in my code I have a List[DataFrame => DataFrame].
You can also write the function above as def addColumns(extraCols: Seq[String]): DataFrame => DataFrame = { df => val selectCols = df.columns.map(col(_)) ++ extraCols.map(c => lit(0).as(c)); df.select(selectCols: _*) } — is this what you mean? It is the same as above, just declared with a different type in Scala.
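Spelled out fully, the curried variant discussed in this comment thread could be sketched as below (the `pipeline` value is an illustrative example of holding such transforms in a List, not part of the original answer; it assumes an existing DataFrame named `df`):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

// Returns a DataFrame => DataFrame, so it can be stored in a
// List[DataFrame => DataFrame] and composed or chained later.
def addColumns(extraCols: Seq[String]): DataFrame => DataFrame = { df =>
  val selectCols = df.columns.map(col(_)) ++ extraCols.map(c => lit(0).as(c))
  df.select(selectCols: _*)
}

// Illustrative usage: fold a list of transforms over a starting DataFrame.
val pipeline: List[DataFrame => DataFrame] = List(addColumns(Seq("a", "b", "c")))
val result = pipeline.foldLeft(df)((acc, t) => t(acc))
```

Note the body closes over the `df` parameter introduced by the lambda, so `df.columns` is evaluated per DataFrame rather than at function-construction time.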

Use .toDF() and pass your listOfString to rename the existing columns.

Example:

//sample dataframe
df.show()
//+---+---+---+
//| _1| _2| _3|
//+---+---+---+
//|  0|  0|  0|
//+---+---+---+


df.toDF(listOfString:_*).show()
//+---+---+---+
//|  a|  b|  c|
//+---+---+---+
//|  0|  0|  0|
//+---+---+---+

UPDATE:

Use foldLeft to add the columns with default values to the existing DataFrame.

import org.apache.spark.sql.functions.lit

val df = Seq(("1")).toDF("id")

val listOfString: Seq[String] = Seq("a", "b", "c")

val new_df = listOfString.foldLeft(df)((df, colName) => df.withColumn(colName, lit("0")))

new_df.show()
//+---+---+---+---+
//| id|  a|  b|  c|
//+---+---+---+---+
//|  1|  0|  0|  0|
//+---+---+---+---+

//or creating a function
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

def addColumns(extraCols: Seq[String], df: DataFrame): DataFrame = {
  extraCols.foldLeft(df)((df, colName) => df.withColumn(colName, lit("0")))
}

addColumns(listOfString,df).show()
//+---+---+---+---+
//| id|  a|  b|  c|
//+---+---+---+---+
//|  1|  0|  0|  0|
//+---+---+---+---+
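To get the DataFrame => DataFrame shape requested in the comments below, the same foldLeft approach could be written in curried form (a sketch; it assumes `df` and `listOfString` are defined as in the snippet above):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Curried: addColumns(names) is itself a DataFrame => DataFrame,
// so it works with df.transform and fits in a List[DataFrame => DataFrame].
def addColumns(extraCols: Seq[String])(df: DataFrame): DataFrame =
  extraCols.foldLeft(df)((acc, colName) => acc.withColumn(colName, lit("0")))

df.transform(addColumns(listOfString))
```

Spark's Dataset.transform takes exactly a `DataFrame => DataFrame`, which is what partially applying `addColumns(listOfString)` produces.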

6 Comments

I want to make a transform of type DataFrame => DataFrame
@AdityaSeth, use toDF() on the existing DataFrame and the new DataFrame will have the new column names!
How are you initializing the value? I mean the default value.
@AdityaSeth, check my updated answer, which uses foldLeft to add the columns.
Thanks for posting, but can you make an intermediate function with type DataFrame => DataFrame? The reason is that in my code I have a List[DataFrame => DataFrame].
