
Say I have this original dataframe:

  var df1 = Seq(("John","Jameson","TRUE","TRUE","FALSE"),("Kevin","Smith","TRUE","FALSE","TRUE"))
    .toDF("First Name","Last Name","Married","Employed","Children")


and I want to convert it so that it fits into this template:

[Image: target template]

The output dataframe will look like this:

[Image: expected output dataframe]

I want to iterate over the columns "Married", "Employed" and "Children" using "when" conditions, and then populate the template as in the screenshot above.

Any help would truly be appreciated!

Have a great day.


2 Answers


You could pair each selected column's value with its name in a struct, collect those structs into an array, and flatten the array into rows with explode, as shown below:

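// Needed outside spark-shell; `spark` is the active SparkSession
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(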
  ("John", "Jameson", "TRUE", "TRUE", "FALSE"),
  ("Kevin", "Smith", "TRUE", "FALSE", "TRUE")
).toDF("First Name", "Last Name", "Married", "Employed", "Children")

val cols = df.columns.filterNot(_.endsWith("Name"))
// cols: Array[String] = Array(Married, Employed, Children)

df.
  withColumn("Temp", explode(array(cols.map(
    c => struct(col(c).as("Value"), lit(c).as("Criteria"))): _*))
  ).
  select($"First Name" :: $"Last Name" :: $"Temp.*" :: Nil: _*).
  show
// +----------+---------+-----+--------+
// |First Name|Last Name|Value|Criteria|
// +----------+---------+-----+--------+
// |      John|  Jameson| TRUE| Married|
// |      John|  Jameson| TRUE|Employed|
// |      John|  Jameson|FALSE|Children|
// |     Kevin|    Smith| TRUE| Married|
// |     Kevin|    Smith|FALSE|Employed|
// |     Kevin|    Smith| TRUE|Children|
// +----------+---------+-----+--------+
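If the same reshaping is needed for more than one dataframe, the struct/array/explode steps above could be wrapped in a small helper. The following is only a sketch: `UnpivotSketch`, `unpivot` and `idCols` are hypothetical names, not part of Spark, and it assumes every non-id column has the same data type (here all strings), because `array` needs uniformly typed elements.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object UnpivotSketch {
  // Hypothetical helper: turns every column not listed in idCols into
  // (Value, Criteria) rows. Assumes those columns share one data type.
  def unpivot(df: DataFrame, idCols: Seq[String]): DataFrame = {
    val valueCols = df.columns.filterNot(c => idCols.contains(c))
    // One struct per column: the cell value plus the column name as a literal
    val pairs = valueCols.map(c => struct(col(c).as("Value"), lit(c).as("Criteria")))
    df.select(idCols.map(col) :+ explode(array(pairs: _*)).as("Temp"): _*)
      .select(idCols.map(col) :+ col("Temp.*"): _*)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for trying this out on one machine
    val spark = SparkSession.builder().master("local[*]").appName("unpivot-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("John", "Jameson", "TRUE", "TRUE", "FALSE"),
      ("Kevin", "Smith", "TRUE", "FALSE", "TRUE")
    ).toDF("First Name", "Last Name", "Married", "Employed", "Children")

    unpivot(df, Seq("First Name", "Last Name")).show()
    spark.stop()
  }
}
```

Passing the id columns explicitly avoids relying on the `endsWith("Name")` naming convention used above.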

Another solution uses the stack() function:

val df = Seq(
              ("John", "Jameson", "TRUE", "TRUE", "FALSE"),
              ("Kevin", "Smith", "TRUE", "FALSE", "TRUE")
).toDF("First Name", "Last Name", "Married", "Employed", "Children")
df.show(false)
df.createOrReplaceTempView("df")

+----------+---------+-------+--------+--------+
|First Name|Last Name|Married|Employed|Children|
+----------+---------+-------+--------+--------+
|John      |Jameson  |TRUE   |TRUE    |FALSE   |
|Kevin     |Smith    |TRUE   |FALSE   |TRUE    |
+----------+---------+-------+--------+--------+

spark.sql("""
select `First Name`, `Last Name`, stack(3,Married,"Married",Employed,"Employed",Children,"Children") (Value,Criteria) from df
""").show(false)

+----------+---------+-----+--------+
|First Name|Last Name|Value|Criteria|
+----------+---------+-----+--------+
|John      |Jameson  |TRUE |Married |
|John      |Jameson  |TRUE |Employed|
|John      |Jameson  |FALSE|Children|
|Kevin     |Smith    |TRUE |Married |
|Kevin     |Smith    |FALSE|Employed|
|Kevin     |Smith    |TRUE |Children|
+----------+---------+-----+--------+

If you prefer the DataFrame API:

df.selectExpr( "`First Name`", "`Last Name`",  """ stack(3,Married,"Married",Employed,"Employed",Children,"Children") (value,criteria) """ ).show(false)

+----------+---------+-----+--------+
|First Name|Last Name|value|criteria|
+----------+---------+-----+--------+
|John      |Jameson  |TRUE |Married |
|John      |Jameson  |TRUE |Employed|
|John      |Jameson  |FALSE|Children|
|Kevin     |Smith    |TRUE |Married |
|Kevin     |Smith    |FALSE|Employed|
|Kevin     |Smith    |TRUE |Children|
+----------+---------+-----+--------+

Or:

df.select( $"First Name", $"Last Name", expr(""" stack(3,Married,"Married",Employed,"Employed",Children,"Children") (value,criteria) """) ).show(false)
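The stack() calls above hard-code the three column names. As a rough sketch, the expression string could instead be generated from the dataframe's columns. `StackSketch` and `stackExpr` are hypothetical names, and the code assumes that the column names contain no backticks or single quotes and that all stacked columns share one data type.

```scala
import org.apache.spark.sql.SparkSession

object StackSketch {
  // Hypothetical helper: builds e.g.
  //   stack(2, `A`, 'A', `B`, 'B') as (Value, Criteria)
  // Assumes column names contain no backticks or single quotes.
  def stackExpr(valueCols: Seq[String]): String = {
    val pairs = valueCols.map(c => s"`$c`, '$c'").mkString(", ")
    s"stack(${valueCols.length}, $pairs) as (Value, Criteria)"
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for trying this out on one machine
    val spark = SparkSession.builder().master("local[*]").appName("stack-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("John", "Jameson", "TRUE", "TRUE", "FALSE"),
      ("Kevin", "Smith", "TRUE", "FALSE", "TRUE")
    ).toDF("First Name", "Last Name", "Married", "Employed", "Children")

    // Same column selection rule as in the first answer
    val cols = df.columns.filterNot(_.endsWith("Name")).toSeq

    df.selectExpr("`First Name`", "`Last Name`", stackExpr(cols)).show(false)
    spark.stop()
  }
}
```

This way the row count passed to stack() always matches the number of column/name pairs.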
