Spark scala - convert an array into values from the same table in a Hierarchy type table

Question

I have a data table with a hierarchical data model (a tree structure). Here are some sample rows:

-------------------------------------------
Id | name    |parentId | path       | depth
-------------------------------------------
55 | Canada  | null    | null       | 0
77 | Ontario |  55     | /55        | 1
100| Toronto |  77     | /55/77     | 2
104| Brampton| 100     | /55/77/100 | 3

I want to flatten those rows so that each row also lists the names of its ancestors. Sample output:

-------------------------------------------------------
Id | name    |parentId | path       | depth | pathNames
-------------------------------------------------------
55 | Canada  | null    | null       | 0     | None
77 | Ontario |  55     | /55        | 1     | Canada
100| Toronto |  77     | /55/77     | 2     | Canada, Ontario
104| Brampton| 100     | /55/77/100 | 3     | Canada, Ontario, Toronto

To clarify how pathNames is generated: it comes from the same table, by matching each id in path against the Id column and taking that row's name. So in the example above, /55/77/100 corresponds to /Canada/Ontario/Toronto.

Hope that makes sense.

Possible duplicate of Scala spark - Dealing with Hierarchy data tables
– abiratsis, Commented Mar 22, 2018 at 21:02
It would make your question clearer if you explained where the pathNames come from (i.e. looked up by Id) rather than making the reader figure this out for themselves.
– DNA, Commented Mar 22, 2018 at 21:24
Oh okay, sure I can make it more clear. I thought it was obvious from looking at the Path column to understand pathFullName columns
– Shivakanth Komatreddy, Commented Mar 22, 2018 at 21:26

illak zapata · Accepted Answer · 2018-03-22 23:47:24Z

Maybe this will help with your specific problem:

You can create a map (lookup table) from the Id and name columns:

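// Build a lookup map: Id -> name. collectAsMap pulls every pair to the driver, so the table must fit in memory.
// Assumes Id is an integer column; outside spark-shell, `$` needs `import spark.implicits._`.
val idMap = test.distinct.select($"Id", $"name").rdd.map(r => (r.getInt(0), r.getString(1))).collectAsMap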

then define a UDF (user defined function) that will map the string

/55/77

to the string

Canada,Ontario

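// Needs org.apache.spark.sql.functions.udf; an id that is missing from idMap throws NoSuchElementException.
val pathMap = udf((p: String) => p.split("/").filter(_.nonEmpty).map(id => idMap(id.toInt)).mkString(","))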
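
The lookup logic inside the UDF can be tried out without a Spark session. What follows is only a minimal sketch in plain Scala, not part of the original answer: `PathNames`, `pathToNames` and the sample `idMap` are hypothetical names, and it assumes that a null path should produce "None" and that ids missing from the map can simply be skipped (the UDF above throws in that case).

```scala
import scala.util.Try

// Hypothetical helper that mirrors the UDF's lookup in plain Scala (no Spark needed).
object PathNames {
  // Sample lookup built by hand from the question's table: Id -> name
  val idMap: Map[Int, String] =
    Map(55 -> "Canada", 77 -> "Ontario", 100 -> "Toronto", 104 -> "Brampton")

  // Turns "/55/77" into "Canada,Ontario".
  // Assumptions: a null path yields "None"; unknown or non-numeric ids are skipped.
  def pathToNames(path: String, names: collection.Map[Int, String]): String =
    Option(path) match {
      case None => "None"
      case Some(p) =>
        p.split("/").toList
          .filter(_.nonEmpty)                        // drop the empty segment before the first "/"
          .flatMap(id => Try(id.toInt).toOption)     // ignore segments that are not integers
          .flatMap(id => names.get(id))              // ignore ids that have no matching row
          .mkString(",")
    }
}
```

For example, `PathNames.pathToNames("/55/77/100", PathNames.idMap)` gives `Canada,Ontario,Toronto`. Whether skipping unknown ids is better than failing fast depends on how much you trust the path column.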

finally, add a new column using this UDF and the path column

test.select(col("*"), when($"path".isNull, "None").otherwise(pathMap($"path")).as("pathNames")).show(false)

this gives you the dataframe you want:

+---+--------+--------+----------+-----+----------------------+
|Id |name    |parentId|path      |depth|pathNames             |
+---+--------+--------+----------+-----+----------------------+
|55 |Canada  |null    |null      |0    |None                  |
|77 |Ontario |55      |/55       |1    |Canada                |
|100|Toronto |77      |/55/77    |2    |Canada,Ontario        |
|104|Brampton|100     |/55/77/100|3    |Canada,Ontario,Toronto|
+---+--------+--------+----------+-----+----------------------+

Hope this will help you!

P.S. Sorry for my English.
