3

I have case class

case class MyCaseClass(City : String, Extras : Map[String, String])

and user defined function which returns scala.collection.immutable.Map

def extrasUdf = spark.udf.register(
   "extras_udf", 
   (age : Int, name : String) => Map("age" -> age.toString, "name" -> name)
)

but this breaks with Exception:

import spark.implicits._

spark.read.options(...).load(...)
      .select('City, 'Age, 'Name)
      .withColumn("Extras", extrasUdf('Age, 'Name))
      .drop('Age)
      .drop('Name)
      .as[MyCaseClass]

I should use spark sql's MapType(DataTypes.StringType, DataTypes.IntegerType) but I can't find any working example...

And this works if I use scala.collection.Map but I need immutable Map

1 Answer 1

4

There are many problems with your code:

  • You are using def extrastUdf =, which creates a function for registering a UDF as opposed to actually creating/registering a UDF. Use val extrasUdf = instead.

  • You are mixing value types in your map (String and Int), which makes the map be Map[String, Any] as Any is the common superclass of String and Int. Spark does not support Any. You can do at least two things: (a) switch to using a string map (with age.toString, in which case you don't need a UDF as you can simply use map()) or (b) switch to using named structs using named_struct() (again, without the need for a UDF). As a rule, only write a UDF if you cannot do what you need to do with the existing functions. I prefer to look at the Hive documentation because the Spark docs are rather sparse.

  • Also, keep in mind that type specification in Spark schema (e.g., MapType) is completely different from Scala types (e.g., Map[_, _]) and separate from how types are represented internally and mapped between Scala & Spark data structures. In other words, this has nothing to do with mutable vs. immutable collections.

Hope this helps!

Sign up to request clarification or add additional context in comments.

2 Comments

Is it possible without/with UDF to put inside WithColumn() expression to put 'Age, 'Name in map only if they are not null cause I don't want to have null values in map?
You could use if() or a case when ... then expression.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.