
In the Spark shell (1.6), I was converting a List of strings into an array like this:

val mapData = List("column1", "column2", "column3")
val values = array(mapData.map(col): _*)

The type of values is:

values: org.apache.spark.sql.Column = array(column1,column2,column3)

Everything was fine, but when I started developing in Eclipse I got the error:

not found: value array

So I changed it to this:

val values = Array(mapData.map(col): _*)

The problem I then faced was that the type of values changed, and the udf consuming it no longer accepts this new type:

values: Array[org.apache.spark.sql.Column] = Array(column1, column2, column3)

Why am I not able to use array() in my IDE as in the shell (what import am I missing)? And why does array produce an org.apache.spark.sql.Column without the Array[] wrapper?

Edit: The udf function:

def replaceFirstMapOfArray =
  udf((p: Seq[Map[String, String]], o: Seq[Map[String, String]]) => {
    if (null != o && null != p) {
      if (o.size == 1) p
      else p ++ o.drop(1)
    } else {
      o
    }
  })
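
For context, the udf is applied to Column arguments; a minimal sketch of the call site (the DataFrame df and the column names p and o are hypothetical):

import org.apache.spark.sql.functions.col

// Hypothetical: df has two columns "p" and "o" of type array<map<string,string>>
val result = df.withColumn("merged", replaceFirstMapOfArray(col("p"), col("o")))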
4 Comments

  • where is your udf function?
  • yeah, let me add it
  • A udf expects Column arguments, but values is now a plain Scala Array, not a Column. That's why the error is happening. Shankar's answer below should clear up the confusion.
  • Yes. Thanks Ramesh for your comment.

1 Answer

val mapData = List("column1", "column2", "column3")
val values = array(mapData.map(col): _*)

Here, Array or List is a Scala collection of objects,

whereas array in array(mapData.map(col): _*) is a Spark SQL function that creates a single new column of array type from input columns that all share the same data type.

For this to work you need the import

import org.apache.spark.sql.functions.array
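
Putting it together, something along these lines should compile outside the shell (a minimal sketch, assuming only that Spark is on the classpath):

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{array, col}

val mapData = List("column1", "column2", "column3")

// array(...) yields one Column whose SQL type is an array,
// not a Scala Array[Column]
val values: Column = array(mapData.map(col): _*)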

You can see the definition of array in the Spark source:

/**
 * Creates a new array column. The input columns must all have the same data type.
 *
 * @group normal_funcs
 * @since 1.4.0
 */
@scala.annotation.varargs
def array(cols: Column*): Column = withExpr {
  CreateArray(cols.map(_.expr))
}
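
That signature is the whole story: array takes Columns and returns a single Column, whereas Scala's Array(...) merely builds a collection of Column objects. A minimal sketch of the contrast (someUdf and df are hypothetical placeholders):

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{array, col}

val scalaArr: Array[Column] = Array(col("column1"), col("column2")) // Scala collection
val sparkArr: Column        = array(col("column1"), col("column2")) // single array-typed Column

// A udf call site expects Column arguments, so only the second form fits:
// df.select(someUdf(sparkArr)) // compiles
// df.select(someUdf(scalaArr)) // type error: found Array[Column], required Column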

3 Comments

Thanks. So how could I modify my code so Eclipse doesn't complain?
Did you import org.apache.spark.sql.functions.array?
Yes, it was that import. Thanks.
