Exploding an array into 2 columns

Question

Suppose we want to track the hops made by a package from the warehouse to the customer. We have a table that stores the data, but all of it is in a single column, say Route. The package starts at a warehouse (YYY, TTT, or MMM). The hops end when the package is delivered to the CUSTOMER. The values in the Route column are separated by spaces.

ID  Route   
1   TTT A B X Y Z CUSTOMER
2   YYY E Y F G I P B X Q CUSTOMER
3   MMM R T K L CUSTOMER

Expected Output

ID  START   END
1   TTT     A
1   A       B
1   B       X
.
.
.
1   Z       CUSTOMER
2   YYY     E
2   E       Y
2   Y       F
.
.
2   Q       CUSTOMER
3   MMM     R
.
.
3   L       CUSTOMER

Is there any way to achieve this in PySpark?

What's your Spark version?

– jxc, Commented Dec 17, 2020 at 13:43

mck · Accepted Answer · 2020-12-17 14:59:49Z

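Split the route and add an index to each element using posexplode, then use lead over a window partitioned by ID and ordered by index to get the next location for each starting location. The last element of each route has no next location, so its end is null and dropna() removes that row. If you want to remove the index, simply add .drop('index') at the end.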

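import pyspark.sql.functions as F
from pyspark.sql.window import Window

# Explode the split route into one row per stop, keeping each stop's position.
df2 = df.select(
    'ID',
    F.posexplode(F.split('Route', ' ')).alias('index', 'start')
).withColumn(
    # The next stop within the same ID, in route order.
    'end',
    F.lead('start').over(Window.partitionBy('ID').orderBy('index'))
).orderBy('ID', 'index').dropna()  # the final stop has no next stop, so drop it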

df2.show(99, truncate=False)
+---+-----+-----+--------+
|ID |index|start|end     |
+---+-----+-----+--------+
|1  |0    |TTT  |A       |
|1  |1    |A    |B       |
|1  |2    |B    |X       |
|1  |3    |X    |Y       |
|1  |4    |Y    |Z       |
|1  |5    |Z    |CUSTOMER|
|2  |0    |YYY  |E       |
|2  |1    |E    |Y       |
|2  |2    |Y    |F       |
|2  |3    |F    |G       |
|2  |4    |G    |I       |
|2  |5    |I    |P       |
|2  |6    |P    |B       |
|2  |7    |B    |X       |
|2  |8    |X    |Q       |
|2  |9    |Q    |CUSTOMER|
|3  |0    |MMM  |R       |
|3  |1    |R    |T       |
|3  |2    |T    |K       |
|3  |3    |K    |L       |
|3  |4    |L    |CUSTOMER|
+---+-----+-----+--------+
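
To make the pairing logic easier to follow, here is a minimal plain-Python sketch of the same idea, without Spark: each stop is paired with the stop that follows it, and the final stop is left unpaired. The helper name route_to_hops and the rows list are hypothetical and only for illustration; the sample data is copied from the question. One assumption to note: str.split() with no argument splits on any run of whitespace, whereas F.split('Route', ' ') splits on single spaces.

```python
def route_to_hops(route):
    """Split a space-separated route into (start, end) pairs of consecutive stops."""
    stops = route.split()
    # Pair each stop with its successor. The last stop has no successor,
    # which corresponds to the null row that dropna() removes in the Spark version.
    return list(zip(stops, stops[1:]))


# Sample rows copied from the question (hypothetical in-memory stand-in for the table).
rows = [
    (1, "TTT A B X Y Z CUSTOMER"),
    (2, "YYY E Y F G I P B X Q CUSTOMER"),
    (3, "MMM R T K L CUSTOMER"),
]

# One (ID, start, end) tuple per hop, in route order.
hops = [
    (row_id, start, end)
    for row_id, route in rows
    for start, end in route_to_hops(route)
]

for hop in hops:
    print(*hop)
```

This only illustrates what posexplode plus lead compute per ID; it is not a replacement for the Spark version on a distributed table.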
