0

I have created this currying function to check for null values for endDateStr inside an udf, the code is as follows:(Type of col x is ArrayType[TimestampType]):

    def _getCountAll(dates: Seq[Timestamp]) = Option(dates).map(_.length)
    def _getCountFiltered(endDate: Timestamp)(dates: Seq[Timestamp]) = Option(dates).map(_.count(!_.after(endDate)))

    val getCountUDF = udf((endDateStr: Option[String]) => {
      endDateStr match {
        case None => _getCountAll _
        case Some(value) => _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
      }
    })
    df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"))(col("x")))

But I am getting this exception while executing:

java.lang.UnsupportedOperationException: Schema for type Seq[java.sql.Timestamp] => Option[Int] is not supported

Can anyone please help me to figure out my mistake?

1 Answer 1

1

You cannot curry udf like this. If you want curry-like behavior you should return udf from the outer function:

def getCountUDF(endDateStr: Option[String]) = udf {
  endDateStr match {
    case None => _getCountAll _
    case Some(value) => 
      _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
  }
}

df.withColumn("distinct_dx_count", getCountUDF(Some("2009-09-10"))(col("x")))

otherwise just drop currying and provide both arguments at the same time:

val getCountUDF = udf((endDateStr: String, dates: Seq[Timestamp]) => 
  endDateStr match {
    case null => _getCountAll(dates)
    case _ => 
      _getCountFiltered(Timestamp.valueOf(endDateStr + " 23:59:59"))(dates)
  }
)

df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"), col("x")))
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.