I am practising adding a list to a DataFrame column. I know I can define a UDF, register it, and apply it to the DataFrame, but I want to try a different approach: extract a list from a DataFrame column, map over it, and then add the result back to the original DataFrame as a new column.

val df = spark.createDataFrame(Seq(("A",1),("B",2),("C",3))).toDF("Str", "Num")
+---+---+
|Str|Num|
+---+---+
|  A|  1|
|  B|  2|
|  C|  3|
+---+---+

List collected:

scala> var ls : List[String] = df.select("Str").collect().map(f=>f.getString(0)).toList
var ls: List[String] = List(A, B, C)

Transformation:

def f(x : String) : String = {
  if (x=="A") {x + "100"}
  else {x + x.length.toString}
  }

Apply transformation:

scala> ls.map(x => f(x))
val res95: List[String] = List(A100, B1, C1)

Add a column from the list (this fails with an error):

import org.apache.spark.sql.functions.{lit,col}
df.withColumn("new", lit(ls)).show()

error: org.apache.spark.SparkRuntimeException: The feature is not supported: literal for 'List(A100, B1, C1)' of class scala.collection.immutable.$colon$colon. 

//Please correct here
  • f must return String, not Unit. Commented Sep 13, 2022 at 14:21
  • Thanks for that. Can you please help with creating the column as well? Commented Sep 14, 2022 at 8:09

1 Answer

Create the udf:

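import org.apache.spark.sql.functions.{col, udf}

// Same logic as f in the question, wrapped as a Spark UDF
val myUdf = udf { x: String =>
  if (x == "A") x + "100"
  else x + x.length.toString
}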

and then apply it to the df:

df.withColumn("new", myUdf(col("Str")))

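To add a new column from a List (every row gets the whole list as an array):

import org.apache.spark.sql.functions.{array, lit}

// lit cannot take a List directly, so wrap each element and build an array column
df.withColumn("fromListColumn", array(Seq("one", "two").map(lit(_)): _*))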
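If the goal is instead to place one list element in each row, as the question describes, a rough sketch is to index both the DataFrame rows and the list and join on that index. The names `idx`, `allValues`, `mapped`, `indexedDf`, and `listDf` are made up for illustration, and the sketch assumes the DataFrame is small and that its row order stays the same between `collect()` and `zipWithIndex`, which Spark does not guarantee in general. It also shows `typedLit`, which accepts a Scala `List` where `lit` does not, for the case where the same list should appear in every row.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.typedLit
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// In spark-shell a session named `spark` already exists; this line is for standalone runs.
val spark = SparkSession.builder().master("local[*]").appName("list-to-column").getOrCreate()
import spark.implicits._

val df = Seq(("A", 1), ("B", 2), ("C", 3)).toDF("Str", "Num")

// Transformation from the question
def f(x: String): String = if (x == "A") x + "100" else x + x.length.toString

// Collect the column to the driver and map it (map returns a new list)
val ls: List[String] = df.select("Str").collect().map(_.getString(0)).toList
val mapped: List[String] = ls.map(f)

// (a) Same list in every row: typedLit handles List/Seq/Map, unlike lit
val withArray = df.withColumn("allValues", typedLit(mapped))

// (b) One element per row: add a positional index to the DataFrame ...
val indexedRows = df.rdd.zipWithIndex.map { case (row, i) => Row.fromSeq(row.toSeq :+ i) }
val indexedSchema = StructType(df.schema.fields :+ StructField("idx", LongType, nullable = false))
val indexedDf = spark.createDataFrame(indexedRows, indexedSchema)

// ... index the list the same way, then join on the index (hypothetical column name "idx")
val listDf = mapped.zipWithIndex.map { case (v, i) => (i.toLong, v) }.toDF("idx", "new")
val result = indexedDf.join(listDf, Seq("idx")).orderBy("idx").drop("idx")

result.show()
```

For anything larger than a toy DataFrame, the UDF above (or a built-in column expression) is usually the safer choice, since it avoids collecting data to the driver and does not depend on row order.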