Could you please help me understand the following method?

def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
  val getGlobId = udf[String,Seq[GenericRowWithSchema]](genArr => {
    val globId: List[String] =
      genArr.toList
        .filter(_(0) == custDimIndex)
        .map(custDim => custDim(1).toString)

    globId match {
      case Nil => ""
      case x :: _ => x
    }
  })

  gaData.withColumn("globalId", getGlobId('customDimensions))
}
  • What do you want to know about it? Commented Feb 8, 2019 at 18:03
  • Is it just me or is this rather poor code? Surely collectFirst followed by fold would be cleaner and faster? Commented Feb 8, 2019 at 18:29

1 Answer

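The method applies a UDF to the DataFrame and stores the result in a new column, globalId. The UDF seems intended to extract a single ID from a column of type array<struct>, where the first element of each struct is an index and the second is an ID. It returns the ID of the first struct whose index equals custDimIndex, or an empty string if none matches.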
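To make the data shape concrete, here is a minimal plain-Scala sketch of the same lookup without Spark. The names CustomDimension, GlobalIdSketch, globalIdFor and sample are hypothetical stand-ins, and the sample values are made up for illustration.

```scala
object GlobalIdSketch {
  // Hypothetical stand-in for one struct in the customDimensions array:
  // the first field is the index, the second the ID value.
  final case class CustomDimension(index: Int, value: String)

  // Returns the value of the first entry whose index matches, or "" if none does.
  def globalIdFor(custDimIndex: Int)(dims: Seq[CustomDimension]): String =
    dims.find(_.index == custDimIndex).map(_.value).getOrElse("")

  // Made-up sample row content: index 4 appears twice, only the first one counts.
  val sample: Seq[CustomDimension] = Seq(
    CustomDimension(1, "foo"),
    CustomDimension(4, "abc-123"),
    CustomDimension(4, "ignored")
  )
}
```

With this sample, looking up index 4 gives the first matching value, and an index that is not present gives the empty string.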

You could rewrite the code to be more readable:

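import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.udf

def extractGlobalID(custDimIndex: Int)(gaData: DataFrame): DataFrame = {
  // Take the first struct whose first field equals custDimIndex
  // and return its second field as a string, or "" if none matches.
  val getGlobId = udf((genArr: Seq[Row]) => {
    genArr
      .find(_(0) == custDimIndex)
      .map(_(1).toString)
      .getOrElse("")
  })

  // The 'customDimensions symbol syntax needs spark.implicits._ in scope.
  gaData.withColumn("globalId", getGlobId('customDimensions))
}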

or even shorter with collectFirst:

def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
  val getGlobId = udf((genArr : Seq[Row]) => {
    genArr
      .collectFirst { case r if r.getInt(0) == custDimIndex => r.getString(1) }
      .getOrElse("")
  })

  gaData.withColumn("globalId", getGlobId('customDimensions))
}
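As a rough check that the three styles behave the same, the sketch below runs them side by side on plain tuples instead of Spark rows. LookupVariants, Dim, the three function names and the sample data are hypothetical and used only for illustration. The filter version traverses the whole list before taking the head, while find and collectFirst stop at the first match.

```scala
object LookupVariants {
  // (index, value); a hypothetical stand-in for a Row with two fields.
  type Dim = (Int, String)

  // Original style: filter every element, map, then take the head.
  def viaFilter(idx: Int)(dims: Seq[Dim]): String =
    dims.toList.filter(_._1 == idx).map(_._2) match {
      case Nil    => ""
      case x :: _ => x
    }

  // find stops at the first element that satisfies the predicate.
  def viaFind(idx: Int)(dims: Seq[Dim]): String =
    dims.find(_._1 == idx).map(_._2).getOrElse("")

  // collectFirst combines the test and the projection in one partial function.
  def viaCollectFirst(idx: Int)(dims: Seq[Dim]): String =
    dims.collectFirst { case (i, v) if i == idx => v }.getOrElse("")

  // Made-up sample data.
  val sample: Seq[Dim] = Seq((1, "foo"), (4, "abc-123"), (4, "ignored"))
}
```

One difference this sketch cannot show: in the Spark version, r.getString(1) assumes the second field really is a string, whereas _(1).toString accepts any non-null type.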
1 Comment

@Jay1991 if this solved your problem, you may accept it - see What should I do when someone answers my question?.
