I have a Dataset[Array[String]], which is effectively a structure with a single field of array type. Is there a way to convert it into a DataFrame with each array item placed in a separate column?

With an RDD[Array[String]] I can do it this way:

val rdd: RDD[Array[String]] = ???
rdd.map(arr => Row.fromSeq(arr))

But surprisingly I cannot do the same with Dataset[Array[String]] – it says that there's no encoder for Row.

And I cannot replace the array with a Tuple or a case class, because the size of the array is unknown at compile time.

  • Do you get the same exception using ds.toDF? Commented May 21, 2019 at 10:00
  • which exception? there's no exception thrown in my code sample – it just doesn't compile. Commented May 21, 2019 at 10:40
  • Sorry I meant error, but now realize toDF isn’t what you are looking for Commented May 21, 2019 at 10:44

2 Answers

If all arrays have the same size, select can be used, with one getItem column per array position:

val original: Dataset[Array[String]] = Seq(Array("One", "Two"), Array("Three", "Four")).toDS()
val arraySize = original.head.size
val result = original.select(
  (0 until arraySize).map(r => original.col("value").getItem(r)): _*)
result.show(false)

Output:

+--------+--------+
|value[0]|value[1]|
+--------+--------+
|One     |Two     |
|Three   |Four    |
+--------+--------+
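
If the arrays may differ in length, the same select idea can probably be extended by padding shorter rows with nulls. The following is a minimal sketch, not part of the answer above: the helper name toColumns and the col0, col1, ... column names are hypothetical, and it assumes the Dataset uses the default column name value. Note that it runs an extra Spark job to find the longest array.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, max, size, when}

object ArrayToColumns {
  // Hypothetical helper: one column per array position, null where a row's array is shorter.
  def toColumns(ds: Dataset[Array[String]]): DataFrame = {
    // Length of the longest array; this triggers a Spark job.
    val maxRow = ds.agg(max(size(col("value")))).head
    // The aggregate is null for an empty Dataset, so fall back to zero columns.
    val maxSize = if (maxRow.isNullAt(0)) 0 else maxRow.getInt(0)

    val columns = (0 until maxSize).map { i =>
      // Guard the index so that short arrays yield null instead of an out-of-range lookup.
      when(size(col("value")) > i, col("value").getItem(i)).as(s"col$i")
    }
    ds.select(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("array-to-columns").getOrCreate()
    import spark.implicits._

    val original = Seq(Array("One", "Two"), Array("Three")).toDS()
    toColumns(original).show(false)

    spark.stop()
  }
}
```

The explicit when guard is a design choice: it avoids relying on how getItem treats an out-of-range index, which may depend on the Spark version and ANSI settings.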

Alternatively, you can use a foldLeft to add the columns one by one. Start from the Dataset:

val df = Seq(Array("Hello", "world"), Array("another", "row")).toDS()

Then compute the size of the arrays from the first row (this assumes every row has the same length):

val size_array = df.first.length

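Then add the columns to the DataFrame with a foldLeft:

0.until(size_array).foldLeft(df.toDF()){(acc, number) => acc.withColumn(s"col$number", $"value".getItem(number))}.show

Here the accumulator starts as df.toDF(), so its type stays DataFrame at every step, and each iteration adds one column to the accumulated result (acc) rather than to the original df. The original value column is kept; call .drop("value") on the result if it is not needed.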
