I am a bit new to functional programming. How can I generate the output shown below from the input dataset?

The input dataset has the following columns:

INPUT

ID       PARENT_ID     AMT      NAME
 1       none          1000     A
 2       1            -5000     B
 3       2            -2000     C
 5       3             7000     E
 6       4            -7000     F
 4       none          7000     D

OUTPUT

ID       PARENT_ID     AMT       AMT_1     AMT_2     AMT_3   NAME_1  ...
 1       none          1000      none      none      none    none
 2       1            -5000      1000      none      none    A
 3       2            -2000     -5000      1000      none    B
 4       none          7000      none      none      none    none
 5       3             7000     -2000     -5000      1000    C
 6       4            -7000      7000      none      none    D

1 Answer

Here's one way to perform the recursive join up to a specific level:

import org.apache.spark.sql.functions._
// Needed for toDF; assumes a SparkSession named `spark` (already in scope in spark-shell)
import spark.implicits._

val df = Seq(
  (Some(1), None, Some(1000), Some("A")),
  (Some(2), Some(1), Some(-5000), Some("B")),
  (Some(3), Some(2), Some(-2000), Some("C")),
  (Some(4), None, Some(7000), Some("D")),
  (Some(5), Some(3), Some(7000), Some("E")),
  (Some(6), Some(4), Some(-7000), Some("F"))
).toDF("id", "parent_id", "amt", "name")

// Number of ancestor levels to attach to each row
val nestedLevel = 3

// Left-join df onto itself once per level:
// alias d$i holds the parent of the row aliased d$(i-1)
(1 to nestedLevel).foldLeft( df.as("d0") ){ (accDF, i) =>
    val j = i - 1
    accDF.join(df.as(s"d$i"), col(s"d$j.parent_id") === col(s"d$i.id"), "left_outer")
  }.
  // Keep the original columns, then amt_1..amt_n and name_1..name_n from the ancestors
  select(
    col("d0.id") :: col("d0.parent_id") ::
    col("d0.amt").as("amt") :: col("d0.name").as("name") :: (
      (1 to nestedLevel).toList.map(i => col(s"d$i.amt").as(s"amt_$i")) :::
      (1 to nestedLevel).toList.map(i => col(s"d$i.name").as(s"name_$i"))
    ): _*
  ).
  show
// +---+---------+-----+----+-----+-----+-----+------+------+------+
// | id|parent_id|  amt|name|amt_1|amt_2|amt_3|name_1|name_2|name_3|
// +---+---------+-----+----+-----+-----+-----+------+------+------+
// |  1|     null| 1000|   A| null| null| null|  null|  null|  null|
// |  2|        1|-5000|   B| 1000| null| null|     A|  null|  null|
// |  3|        2|-2000|   C|-5000| 1000| null|     B|     A|  null|
// |  4|     null| 7000|   D| null| null| null|  null|  null|  null|
// |  5|        3| 7000|   E|-2000|-5000| 1000|     C|     B|     A|
// |  6|        4|-7000|   F| 7000| null| null|     D|  null|  null|
// +---+---------+-----+----+-----+-----+-----+------+------+------+
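
Since the question mentions being new to functional programming, here is a rough plain-Scala sketch (no Spark) of what the foldLeft above is doing: each step looks up the parent of the row found in the previous step, just as each left join does. The names `Record`, `AncestorSketch`, `byId` and `ancestors` are made up for this illustration and are not part of the Spark code.

```scala
object AncestorSketch {
  // Hypothetical in-memory stand-in for one DataFrame row
  final case class Record(id: Int, parentId: Option[Int], amt: Int, name: String)

  val rows: List[Record] = List(
    Record(1, None, 1000, "A"),
    Record(2, Some(1), -5000, "B"),
    Record(3, Some(2), -2000, "C"),
    Record(4, None, 7000, "D"),
    Record(5, Some(3), 7000, "E"),
    Record(6, Some(4), -7000, "F")
  )

  // Lookup by id plays the role of the join condition d(i-1).parent_id === d(i).id
  val byId: Map[Int, Record] = rows.map(r => r.id -> r).toMap

  val nestedLevel = 3

  // Returns exactly nestedLevel slots; slot i-1 holds the level-i ancestor, if there is one.
  // The accumulator carries the row reached so far plus the ancestors collected so far.
  def ancestors(row: Record): List[Option[Record]] =
    (1 to nestedLevel).foldLeft((Option(row), List.empty[Option[Record]])) {
      case ((current, acc), _) =>
        val parent = current.flatMap(_.parentId).flatMap(byId.get)
        (parent, acc :+ parent) // a missing parent stays None, like a left outer join
    }._2

  def main(args: Array[String]): Unit =
    rows.foreach { r =>
      val amts = ancestors(r).map(_.map(_.amt.toString).getOrElse("null"))
      println((List(r.id.toString, r.amt.toString) ++ amts).mkString(" | "))
    }
}
```

For id 5 this yields the ancestor amounts -2000, -5000 and 1000, matching the amt_1 to amt_3 columns in the Spark output. The Map lookup is only meant to show the logic; on a real DataFrame the join-based version is the one to use.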

1 Comment

@Sampat Kumar, please see updated answer per your expanded requirement.