
I have the following DataFrame:

+-----+-----+-----+-----+-----+
|item1|item2|item3| ... |itemN|
+-----+-----+-----+-----+-----+
|   v1|   v2|   v3| ... |   vN|
|   v4|   v5|   v6| ... |  v2N|
+-----+-----+-----+-----+-----+

Here item1, item2, item3, ..., itemN are the column names, and v1, v2, v3, ... are the values in each row.

I want to transform it into:

colA   colB
item1  v1
item2  v2
item3  v3
...    ...

The result has two columns, say colA and colB: colA holds the original column name and colB holds the corresponding value.
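To make the target shape concrete, here is a minimal plain-Scala sketch of the same reshaping on local collections (no Spark involved). Each row is zipped with the column names, so every cell becomes one (column name, value) pair. The object MeltSketch and its melt method are hypothetical names used only for illustration.

```scala
// Plain-Scala model of the reshaping: no Spark, just local collections.
object MeltSketch {
  // Pairs every cell with its column name, row by row and in column order.
  def melt(columns: Seq[String], rows: Seq[Seq[String]]): Seq[(String, String)] =
    rows.flatMap(row => columns.zip(row))

  def main(args: Array[String]): Unit = {
    val columns = Seq("item1", "item2", "item3")
    val rows = Seq(Seq("v1", "v2", "v3"), Seq("v4", "v5", "v6"))
    // Print as two tab-separated columns: colA (name) and colB (value)
    melt(columns, rows).foreach { case (name, value) => println(s"$name\t$value") }
  }
}
```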

How can I do this transformation in Spark using Scala?

  • Could you please paste your code? Commented Oct 18, 2016 at 7:55
  • stackoverflow.com/questions/35603689/… is similar, but its answer is Java code. Can you help me do the same in Scala, as I am new to Scala? Commented Oct 18, 2016 at 8:30

1 Answer


You can use explode:

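import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
import spark.implicits._ // enables the $"..." column syntax (assumes a SparkSession named spark)

input.show()
// +-----+-----+-----+
// |item1|item2|item3|
// +-----+-----+-----+
// |   v1|   v2|   v3|
// |   v4|   v5|   v6|
// +-----+-----+-----+

// Column names are read at runtime, so this works for any number of columns
val columns: Array[String] = input.columns

// DataFrame.explode(cols*)(f) emits one output row per element returned by f;
// here f pairs each column name with that row's value (all values assumed String).
// Note: this DataFrame.explode overload is deprecated since Spark 2.0.
val result = input.explode(columns.map(s => col(s)): _*) {
  r: Row => columns.zipWithIndex.map { case (name, index) => (name, r.getAs[String](index)) }
}.select($"_1" as "colA", $"_2" as "colB")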

result.show()
// +-----+----+
// | colA|colB|
// +-----+----+
// |item1|  v1|
// |item2|  v2|
// |item3|  v3|
// |item1|  v4|
// |item2|  v5|
// |item3|  v6|
// +-----+----+
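Since DataFrame.explode is deprecated from Spark 2.0 onward, the same reshaping can also be sketched with only functions.explode, array and struct: each column becomes a (name, value) struct, the structs are collected into one array per row, and that array is exploded. This sketch assumes every column has the same type (array() needs a single element type) and plain column names without dots; MeltWithExplode and melt are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object MeltWithExplode {
  // Turns every column of df into rows of (colA = column name, colB = value).
  // Assumes all columns share one type and names contain no dots.
  def melt(df: DataFrame): DataFrame = {
    // One struct per column: the column name as a literal plus the cell value
    val pairs = df.columns.toSeq.map(c => struct(lit(c).as("colA"), col(c).as("colB")))
    df.select(explode(array(pairs: _*)).as("kv")) // one output row per struct
      .select(col("kv.colA"), col("kv.colB"))     // flatten the struct fields
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("melt-sketch").getOrCreate()
    import spark.implicits._
    val input = Seq(("v1", "v2", "v3"), ("v4", "v5", "v6")).toDF("item1", "item2", "item3")
    melt(input).show()
    spark.stop()
  }
}
```

Because the column list comes from df.columns, nothing in this sketch is tied to three columns.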

7 Comments

Thanks for the above code. I have one more question: if the number of columns (i.e. item1, item2, ..., itemN) in the initial table is very large, how do I do the above?
How large? And why wouldn't the above code work for any (valid) number of columns?
I don't know the number of columns in advance; let's say there are 100 columns. I think the above code binds the columns to case variables i1, i2, i3, so how will this work if there are many columns?
Thanks. If the values v1, v2, ... are of complex types (say, arrays) rather than strings, what do I have to change in the above code?
Are they all the same type at least? If they are, just change r.getAs[String] to r.getAs[T], where T is the type you expect. Note that Spark returns array columns as a Seq rather than an Array, so for an array of strings T would be Seq[String] (and Seq[Row] for an array of structs).
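A minimal sketch of both follow-ups, assuming every column holds array<string> values with no nulls: the column names come from df.columns at runtime (so 3 or 100 columns make no difference), and each cell is read as a scala.collection.Seq, which is how Spark hands back ArrayType values. It uses the typed Dataset.flatMap instead of the deprecated explode; MeltArrayColumns and melt are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object MeltArrayColumns {
  // Assumes every column of df is array<string> and no cell is null.
  def melt(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._ // tuple encoder for (String, Seq[String])
    val names = df.columns // read at runtime: works for any number of columns
    df.flatMap { (r: Row) =>
      names.indices.map { i =>
        // ArrayType cells arrive as a scala.collection.Seq, not an Array;
        // copy into an immutable List so the encoder gets a plain Seq.
        val values: Seq[String] = r.getAs[scala.collection.Seq[String]](i).toList
        (names(i), values)
      }
    }.toDF("colA", "colB")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("melt-arrays").getOrCreate()
    import spark.implicits._
    val input = Seq((Seq("a", "b"), Seq("c"), Seq.empty[String])).toDF("item1", "item2", "item3")
    melt(spark, input).show(false)
    spark.stop()
  }
}
```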
