Let's say I have a table:

id array_col
101 [{"system": "x", "value": "1"}, {"system": "y", "value": "2"},{"system": "z", "value": "3"}]

Here array_col contains an array of structs:

0: {"system": "x", "value": "1"}

1: {"system": "y", "value": "2"}

2: {"system": "z", "value": "3"}

I need the output like the following table:

id system value
101 x 1
101 y 2
101 z 3

Right now I'm using explode in subqueries (since I can't have multiple explode calls in a single select statement) and then joining the results on id. But that gives me an output where every system is paired with every value, so instead of 3 rows I'm getting 9:

id system value
101 x 1
101 x 2
101 x 3
101 y 1
101 y 2
101 y 3
101 z 1
101 z 2
101 z 3

Help me get the output with 3 rows, instead of 9.

1 Answer

Try inline:

df.selectExpr('id', 'inline(array_col)').show()
+---+------+-----+
| id|system|value|
+---+------+-----+
|101|     x|    1|
|101|     y|    2|
|101|     z|    3|
+---+------+-----+

The above assumes that the array contains structs, not structs serialized as strings. If your structs are strings, you need to parse them with from_json first:

df2 = df.selectExpr(
    'id', 'explode(array_col) array_col'
).selectExpr(
    'id', "inline(array(from_json(array_col, 'struct<system:string, value:string>')))"
)

df2.show()
+---+------+-----+
| id|system|value|
+---+------+-----+
|101|     x|    1|
|101|     y|    2|
|101|     z|    3|
+---+------+-----+
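
As for why the join in the question returned 9 rows: exploding system and value separately and then joining on id pairs every system with every value that shares that id. The plain-Python sketch below only mimics that behavior (it is not Spark code, and the variable names are made up for illustration):

```python
from itertools import product

# One input row, shaped like the table in the question.
row = {
    "id": 101,
    "array_col": [
        {"system": "x", "value": "1"},
        {"system": "y", "value": "2"},
        {"system": "z", "value": "3"},
    ],
}

# Two separate "explodes", each keeping only the id and one field.
systems = [(row["id"], s["system"]) for s in row["array_col"]]
values = [(row["id"], s["value"]) for s in row["array_col"]]

# Joining on id alone: every system row matches every value row.
joined = [
    (sid, system, value)
    for (sid, system), (vid, value) in product(systems, values)
    if sid == vid
]

# Expanding each struct once keeps system and value together.
inlined = [(row["id"], s["system"], s["value"]) for s in row["array_col"]]

print(len(joined), len(inlined))  # prints "9 3"
```

The join has no way to tell which system belonged to which value, which is why expanding the struct in one step avoids the problem.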