
How can I convert a single column in Spark 2.0.1 into an array?

+---+-----+
| id| dist|
+---+-----+
|1.0|  2.0|
|2.0|  4.0|
|3.0|  6.0|
|4.0|  8.0|
+---+-----+

Selecting the id column should return Array(1.0, 2.0, 3.0, 4.0).

My attempt:

import scala.collection.JavaConverters._ 
df.select("id").collectAsList.asScala.toArray

fails with

java.lang.RuntimeException: Unsupported array type: [Lorg.apache.spark.sql.Row;

2 Answers


Why use JavaConverters if you then transform the Java list back into a Scala collection? You just need to collect the DataFrame and then map the resulting array of Rows to an array of doubles, like this:

df.select("id").collect.map(_.getDouble(0))
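For context, here's a minimal self-contained sketch of this approach (assuming a local SparkSession named spark and recreating the sample data from the question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("column-to-array")
  .getOrCreate()
import spark.implicits._

// Recreate the sample DataFrame from the question
val df = Seq((1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)).toDF("id", "dist")

// collect() brings every selected Row to the driver;
// getDouble(0) extracts the single column from each Row
val ids: Array[Double] = df.select("id").collect.map(_.getDouble(0))
// ids == Array(1.0, 2.0, 3.0, 4.0)

Note that collect() materializes every selected row on the driver, so this only makes sense when the column comfortably fits in driver memory.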

2 Comments

collect() on a DataFrame is not a scalable approach.
Who talked about scalability here?

I'd try something like this with the DataFrame aggregate function collect_list() to avoid memory overhead on the driver JVM. With this approach, only the selected column's values are copied to the driver JVM, not whole Rows.

import org.apache.spark.sql.functions.collect_list

df.select(collect_list("id")).first().getList[Double](0)

This returns java.util.List[Double].
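If you need a Scala Array rather than a java.util.List, one way (a sketch, reusing the JavaConverters import from the question) is:

import scala.collection.JavaConverters._
import org.apache.spark.sql.functions.collect_list

// collect_list aggregates the whole column into a single array-typed row,
// so only the id values (not full Rows) are shipped to the driver
val javaList = df.select(collect_list("id")).first().getList[Double](0)
val ids: Array[Double] = javaList.asScala.toArray
// ids == Array(1.0, 2.0, 3.0, 4.0)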

