1

SCALA

I have a table with this struct:

FName SName Email Jan 2021 Feb 2021 Mar 2021 Total 2021
Micheal Scott [email protected] 4000 5000 3400 50660
Dwight Schrute [email protected] 1200 6900 1000 35000
Kevin Malone [email protected] 9000 6000 18000 32000

And i want to transform it to: enter image description here

I tried with 'stack' method but i couldn't get it to work. Thanks

1
  • What version of spark & scala are you using? Commented Jun 21, 2022 at 15:57

2 Answers 2

2

You can flatten the monthly/total columns via explode as shown below:

val df = Seq(
  ("Micheal", "Scott", "[email protected]", 4000, 5000, 3400, 50660),
  ("Dwight", "Schrute", "[email protected]", 1200, 6900, 1000, 35000),
  ("Kevin", "Malone", "[email protected]", 9000, 6000, 18000, 32000)
).toDF("FName","SName", "Email", "Jan 2021", "Feb 2021", "Mar 2021", "Total 2021")

val moYrCols = Array("Jan 2021", "Feb 2021", "Mar 2021", "Total 2021")  // (**)
val otherCols = df.columns diff moYrCols
val structCols = moYrCols.map{ c =>
    val moYr = split(lit(c), "\\s+")
    struct(moYr(1).as("Year"), moYr(0).as("Month"), col(c).as("Value"))
  }

df.
  withColumn("flattened", explode(array(structCols: _*))).
  select(otherCols.map(col) :+ $"flattened.*": _*).
  show
/*
+-------+-------+------------------+----+-----+-----+
|  FName|  SName|             Email|Year|Month|Value|
+-------+-------+------------------+----+-----+-----+
|Micheal|  Scott| [email protected]|2021|  Jan| 4000|
|Micheal|  Scott| [email protected]|2021|  Feb| 5000|
|Micheal|  Scott| [email protected]|2021|  Mar| 3400|
|Micheal|  Scott| [email protected]|2021|Total|50660|
| Dwight|Schrute|[email protected]|2021|  Jan| 1200|
| Dwight|Schrute|[email protected]|2021|  Feb| 6900|
| Dwight|Schrute|[email protected]|2021|  Mar| 1000|
| Dwight|Schrute|[email protected]|2021|Total|35000|
|  Kevin| Malone| [email protected]|2021|  Jan| 9000|
|  Kevin| Malone| [email protected]|2021|  Feb| 6000|
|  Kevin| Malone| [email protected]|2021|  Mar|18000|
|  Kevin| Malone| [email protected]|2021|Total|32000|
+-------+-------+------------------+----+-----+-----+
*/

(**) Use pattern matching in case there are many columns; for example:

val moYrCols = df.columns.filter(_.matches("[A-Za-z]+\\s+\\d{4}"))
Sign up to request clarification or add additional context in comments.

1 Comment

really elegant solution!
1
val data = Seq( 
     ("Micheal","Scott","[email protected]",4000,5000,3400,50660),
     ("Dwight","Schrute","[email protected]",1200,6900,1000,35000),
     ("Kevin","Malone","[email protected]",9000,6000,18000,32000)) )
val columns = Seq("FName","SName","Email","Jan 2021","Feb 2021","Mar 2021","Total 2021")
val newColumns = Array( "FName", "SName", "Email","Total 2021" )

val df = spark.createDataFrame( data ).toDF(columns:_*)
df
 .select( 
  struct( 
   (for {column <- df.columns } yield col(column)).toSeq :_* 
  ).as("mystruct")) // create your data set with a column as a struct.
 .select( 
  $"mystruct.Fname", // refer to sub element of struct with '.' operator
  $"mystruct.sname",
  $"mystruct.Email",
  explode( /make rows for every entry in the array.
   array( 
    (for {column <- df.columns if !(newColumns contains column) } //filter out the columns we already selected
     yield // for each element yield the following expression (similar to map)
      struct(
       col(s"mystruct.$column").as("value"), // create the value column
       lit(column).as("date_year")) // create a date column
   ).toSeq :_* ) // shorthand to pass scala array into varargs for array function 
  ) 
 )
.select( 
 col("*"), // just being lazy instead of typing.
 col("col.*") // create columns from new column.  Seperating the year/date should be easy from here.
).drop($"col")
.show(false)
+--------------+--------------+------------------+-----+---------+
|mystruct.Fname|mystruct.sname|mystruct.Email    |value|date_year|
+--------------+--------------+------------------+-----+---------+
|Micheal       |Scott         |[email protected] |4000 |Jan 2021 |
|Micheal       |Scott         |[email protected] |5000 |Feb 2021 |
|Micheal       |Scott         |[email protected] |3400 |Mar 2021 |
|Dwight        |Schrute       |[email protected]|1200 |Jan 2021 |
|Dwight        |Schrute       |[email protected]|6900 |Feb 2021 |
|Dwight        |Schrute       |[email protected]|1000 |Mar 2021 |
|Kevin         |Malone        |[email protected] |9000 |Jan 2021 |
|Kevin         |Malone        |[email protected] |6000 |Feb 2021 |
|Kevin         |Malone        |[email protected] |18000|Mar 2021 |
+--------------+--------------+------------------+-----+---------

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.